## Detecting active/passive voice of a sentence using a spaCy model

In [1]:
import spacy
from spacy import displacy
import pandas as pd

nlp = spacy.load("en_core_web_sm")

### How do we get a dependency parse?

In [7]:
# to print children of a word

sent = "It was the best of times and it was the worst of times."
d = nlp(sent)
print(d[3], len(list(d[3].children)))

best 2


In [8]:
# another example
sent = "Dole was defeated by Clinton"
d = nlp(sent)
displacy.render(d, style='dep', jupyter=True)

In [2]:
active = ['Hens lay eggs.',
         'Birds build nests.',
         'The batter hit the ball.',
         'The computer transmitted a copy of the manual']
passive = ['Eggs are laid by hens',
           'Nests are built by birds',
           'The ball was hit by the batter',
           'A copy of the manual was transmitted by the computer.']

In [3]:
doc = nlp(active[0])
for tok in doc:
    print(tok.text,tok.dep_)

Hens nsubj
lay ROOT
eggs dobj
. punct
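
Each token stores a single `head`, and a token's `children` (as used with `d[3].children` earlier) are the tokens whose head points back at it. The sketch below reproduces that relationship in plain Python for the parse printed above. The head indices are an assumption typed in by hand (every token attached to `lay`, which is what the `nsubj`/`dobj`/`punct` labels suggest); they are not model output.

```python
# Hand-written parse of "Hens lay eggs." as (text, dep, head_index) triples.
# ASSUMPTION: head indices are typed in by hand, not produced by the model.
# The ROOT token is its own head, mirroring spaCy's convention.
parse = [
    ("Hens", "nsubj", 1),
    ("lay", "ROOT", 1),
    ("eggs", "dobj", 1),
    (".", "punct", 1),
]

def children(parse, i):
    """Return the texts of tokens whose head is token i (excluding i itself)."""
    return [text for j, (text, _dep, head) in enumerate(parse)
            if head == i and j != i]

print(children(parse, 1))  # children of "lay"
print(children(parse, 0))  # "Hens" has no children
```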


### visualize the parse

In [4]:
displacy.render(doc, style="dep", jupyter = True)

To understand what these dependency relationships mean, refer to the Universal Dependencies documentation: https://universaldependencies.org/docs/en/dep/


Going through the dependency relationships, it may seem that one needs a background in linguistics and grammar to do this kind of analysis. This is not entirely true. Often, spotting patterns in the dependency relationships is enough to perform the task at hand.


In [5]:
for sent in active:
    doc = nlp(sent)
    displacy.render(doc, style="dep")

In [6]:
for sent in passive:
    doc = nlp(sent)
    displacy.render(doc, style="dep")

## We can observe that all of the passive sentences above contain an `nsubjpass` (passive nominal subject) dependency, while none of the active ones do
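This observation can be made mechanical: collect the set of labels in each parse and keep the labels that occur in every passive sentence but in no active one. A minimal sketch in plain Python; the active label list is the parse of `Hens lay eggs.` printed earlier, while the passive label lists are an assumed typical parse written by hand for illustration (only the `nsubjpass` on the first token is confirmed by the matcher output further down).

```python
# Dependency labels per sentence.
# The active list is the parse printed earlier in the notebook.
# ASSUMPTION: the passive lists are hand-written, typical spaCy-style parses.
active_deps = [
    ["nsubj", "ROOT", "dobj", "punct"],                  # Hens lay eggs.
]
passive_deps = [
    ["nsubjpass", "auxpass", "ROOT", "agent", "pobj"],   # Eggs are laid by hens
    ["nsubjpass", "auxpass", "ROOT", "agent", "pobj"],   # Nests are built by birds
]

def distinguishing_labels(positive, negative):
    """Labels present in every positive parse and absent from every negative one."""
    common = set.intersection(*(set(deps) for deps in positive))
    seen_in_negative = set().union(*(set(deps) for deps in negative))
    return common - seen_in_negative

print(sorted(distinguishing_labels(passive_deps, active_deps)))
```

With only a handful of sentences, several labels survive the filter; `nsubjpass` is the one this notebook settles on, and feeding in more varied sentences would narrow the candidates further.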

In [10]:
from spacy.matcher import Matcher

## Read about the Matcher here: https://spacy.io/api/matcher
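To get a feel for what `Matcher` does, here is a deliberately simplified, library-free re-implementation: a pattern is a list of dicts, one per token, and a match is any run of consecutive tokens whose attributes equal the values in those dicts. This is not spaCy's implementation; it ignores operators such as `OP` and any non-exact attribute matching. The POS tags for `Eggs are laid by hens` are an assumption typed in by hand.

```python
def match(tokens, pattern):
    """Return (start, end) spans where consecutive tokens satisfy the pattern.

    tokens:  list of dicts of token attributes, e.g. {"TEXT": "Eggs", "POS": "NOUN"}
    pattern: list of dicts, one constraint dict per token (exact equality only)
    """
    n = len(pattern)
    spans = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        # every attribute in every constraint dict must match its token
        if all(tok.get(key) == value
               for tok, spec in zip(window, pattern)
               for key, value in spec.items()):
            spans.append((start, start + n))
    return spans

# ASSUMPTION: hand-written POS tags for "Eggs are laid by hens".
tokens = [
    {"TEXT": "Eggs", "POS": "NOUN"},
    {"TEXT": "are", "POS": "AUX"},
    {"TEXT": "laid", "POS": "VERB"},
    {"TEXT": "by", "POS": "ADP"},
    {"TEXT": "hens", "POS": "NOUN"},
]
print(match(tokens, [{"POS": "NOUN"}]))
```

The single-token NOUN rule finds the spans `(0, 1)` and `(4, 5)`, the same offsets the real `Matcher` reports for this sentence below (spaCy additionally prefixes each match with a rule ID).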

In [8]:
doc = nlp(passive[0])
displacy.render(doc, style="dep")

In [11]:
# creating rule with matcher
rule = [{'POS':'NOUN'}]
matcher = Matcher(nlp.vocab)
matcher.add('Rule',[rule])

In [12]:
matcher(doc)

[(15740618714089435985, 0, 1), (15740618714089435985, 4, 5)]

In [13]:
doc[0:1]

Eggs

In [15]:
doc[4:5]

hens
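Each match is a `(match_id, start, end)` triple, where `start` and `end` are token offsets with `end` exclusive; that is why `doc[0:1]` and `doc[4:5]` recover `Eggs` and `hens`. The sketch below does the same slicing over a plain token list, reusing the match list printed above and a hand-written tokenization of the sentence (assumed to mirror spaCy's). In spaCy itself, `nlp.vocab.strings[match_id]` maps the ID back to the rule name.

```python
# Matches as printed by matcher(doc) above: (match_id, start, end).
matches = [(15740618714089435985, 0, 1), (15740618714089435985, 4, 5)]
# ASSUMPTION: hand-written tokenization of "Eggs are laid by hens".
words = ["Eggs", "are", "laid", "by", "hens"]

# end is exclusive, exactly like Python slicing and doc[start:end]
spans = [" ".join(words[start:end]) for _match_id, start, end in matches]
print(spans)
```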

In [16]:
# creating rule for nsubjpass
passive_rule = [{'DEP':'nsubjpass'}]
matcher = Matcher(nlp.vocab)
matcher.add('Rule',[passive_rule])

In [20]:
print(active[0], "----", matcher(nlp(active[0])))
print(passive[0], "----", matcher(nlp(passive[0])))

Hens lay eggs. ---- []
Eggs are laid by hens ---- [(15740618714089435985, 0, 1)]


In [21]:
# function to check whether a doc contains a passive construction
def is_passive(doc, matcher):
    # the doc counts as passive if the matcher finds at least one nsubjpass token
    return len(matcher(doc)) > 0
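The same check can be written without a `Matcher`, directly over the dependency labels, which also makes the rule easy to extend. A sketch in plain Python; the passive label list is an assumed hand-written parse, and treating `auxpass` as an extra passive signal is an illustrative assumption, not part of the rule used in this notebook.

```python
PASSIVE_LABELS = {"nsubjpass"}  # the rule used in this notebook

def is_passive_deps(deps, passive_labels=PASSIVE_LABELS):
    """True if any token carries one of the passive dependency labels."""
    return any(dep in passive_labels for dep in deps)

# "Hens lay eggs." as printed earlier
active_deps = ["nsubj", "ROOT", "dobj", "punct"]
# ASSUMPTION: hand-written parse of "Eggs are laid by hens"
passive_deps = ["nsubjpass", "auxpass", "ROOT", "agent", "pobj"]

print(is_passive_deps(active_deps), is_passive_deps(passive_deps))
# ASSUMPTION: a wider rule that also accepts auxpass as a passive signal
print(is_passive_deps(["auxpass", "ROOT"], {"nsubjpass", "auxpass"}))
```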

In [22]:
for sent in active:
    doc = nlp(sent)
    print(is_passive(doc,matcher))

False
False
False
False


In [23]:
for sent in passive:
    doc = nlp(sent)
    print(is_passive(doc,matcher))

True
True
True
True
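With more sentences than these eight, it helps to score the rule rather than read printed booleans one by one. A minimal sketch of accuracy over labeled examples; the predictions are the eight values printed above, and the gold labels follow from which list (`active` or `passive`) each sentence came from.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that equal the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Predictions printed above: 4 active sentences, then 4 passive sentences.
predictions = [False] * 4 + [True] * 4
gold = [False] * 4 + [True] * 4  # active -> False, passive -> True

print(accuracy(predictions, gold))
```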



## Summary

- One can go a long way by observing patterns in linguistic data; you don't always need to know the details of linguistics very well.
- One can use the `Matcher` object to check whether certain linguistic patterns exist in the data.



In [24]:
rule = [{'DEP':'nsubjpass'}]
tmatcher = Matcher(nlp.vocab)
tmatcher.add('Rule',[rule])

doc = nlp('I am learning NLP from upGrad.')
tmatcher(doc)

[]