# Contexts

This notebook allows us to examine the contexts in which words occur in sentences throughout the two subcorpora. Contexts vary according to the method used:

- Using spaCy we can explore contexts by specifying both a word and its part-of-speech.
- Using NLTK's `concordance` functionality we can explore all of a word's contexts in the conventional KWIC (keyword-in-context) format. There is also code here for those interested in lemmatizing words before generating an NLTK `Text`.
- Using the already-generated SVOs, we can quickly glimpse the related subjects, verbs, and objects for a particular word, though the word has to appear somewhere as an S, V, or O.

## spaCy

The code below uses spaCy's `children` attribute on each sentence's root token to find the subject of the sentence, and then returns the sentences in which a particular word appears as that subject. It could be adapted to a wide variety of uses.

Development of this code was based on insights from this Stack Overflow thread: [How to get the dependency tree with spaCy?](https://stackoverflow.com/questions/36610179/how-to-get-the-dependency-tree-with-spacy)

In [5]:
# IMPORTS
import pandas as pd, spacy
# from spacy.lang.en import English

# Load the data, already partitioned by speaker gender:
talks_m = pd.read_csv('../output/talks_male.csv', index_col='Talk_ID')
talks_f = pd.read_csv('../output/talks_female.csv', index_col='Talk_ID')

# And then grab only the texts of the talks:
texts_women = talks_f.text.tolist()
texts_men = talks_m.text.tolist()

# Lowercase everything before we create spaCy docs.
# Keep one string per talk: nlp.pipe expects an iterable of texts,
# and a single joined string would be consumed one character at a time.
texts_w = [text.lower() for text in texts_women]
texts_m = [text.lower() for text in texts_men]

In [1]:
nlp = spacy.load('en_core_web_sm')

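# spaCy is fussy about memory allocation.
# Use the pipe method to feed documents; wrapping the result in list()
# keeps the docs so they can be indexed and looped over more than once.
docs_w = list(nlp.pipe(texts_w))
# docs_m = list(nlp.pipe(texts_m))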
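
Because `nlp.pipe` loops over whatever it is given, the shape of its argument matters. The sketch below uses no spaCy at all: it only illustrates, with two made-up strings standing in for talks, what a consumer that iterates over its input sees when handed a joined string versus a list of strings. The names `talks`, `joined`, and `as_list` are illustrative and do not come from the notebook.

```python
# Two made-up strings standing in for the real talks.
talks = ["My father smiled.", "She said hello."]

joined = " ".join(talks)   # one long string
as_list = list(talks)      # one string per talk

# Anything that iterates over its argument receives
# characters from a string and whole texts from a list.
units_from_string = [unit for unit in joined]
units_from_list = [unit for unit in as_list]

print(units_from_string[:3])  # the first three items are single characters
print(units_from_list[0])     # the first item is a whole talk
```

Under that assumption, a list of talks gives one doc per talk, which is what the sentence-level functions in this section need.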

In [4]:
# This function allows us to specify the subject of the sentence
# and to see all the sentences in which it appears as the subject.
def find_subject(subject, doc):
    subject_sents = []
    sentences = doc.sents
    for sentence in sentences:
        root_token = sentence.root
        for child in root_token.children:
            if child.dep_ == 'nsubj':
                subj = child
                if subj.text == subject:
                    subject_sents.append(sentence)
    return subject_sents

In [12]:
# Collect the matches: a bare call inside a loop discards its return value
[sent for doc in docs_w for sent in find_subject("father", doc)]

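If looping over the piped docs returns nothing while the single-talk call **below** works, the likely cause is the input to `nlp.pipe`: it expects an iterable of texts, so a single joined string is consumed one character at a time and no doc contains a full sentence. A `for` loop in a notebook cell also displays nothing unless the results are collected or printed. Parsing one talk directly with `nlp` works: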

In [10]:
doc_one = nlp(texts_women[1])

In [11]:
find_subject('father', doc_one)

[And my father smiled and said, "Well, you know what that means, don't you?"]

In [9]:
texts_women[1]

'  On September 10, the morning of my seventh birthday, I came downstairs to the kitchen, where my mother was washing the dishes and my father was reading the paper or something, and I sort of presented myself to them in the doorway, and they said, "Hey, happy birthday!" And I said, "I\'m seven." And my father smiled and said, "Well, you know what that means, don\'t you?" And I said, "Yeah, that I\'m going to have a party and a cake and get a lot of presents?" And my dad said, "Well, yes. But more importantly, being seven means that you\'ve reached the age of reason, and you\'re now capable of committing any and all sins against God and man."    (Laughter)    Now, I had heard this phrase, "age of reason," before. Sister Mary Kevin had been bandying it about my second-grade class at school. But when she said it, the phrase seemed all caught up in the excitement of preparations for our first communion and our first confession, and everybody knew that was really all about the white dress 

In [None]:
def find_root(root, doc):
    # Assumed intent: the draft body was a copy of find_subject, so this
    # version returns the sentences whose syntactic root (usually the
    # main verb) has the given text.
    return [sentence for sentence in doc.sents if sentence.root.text == root]

In [None]:
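# `sentences` only exists inside the functions above, so iterate over a doc's sentences directly
for sentence in doc_one.sents:
    print(sentence)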

In [None]:
find_subject("i", docs_w[0])

## NLTK

To change the display, adjust the arguments to the concordance method: `concordance(word, width=79, lines=25)`. `width` is an integer giving the total width of each displayed line in characters, with the keyword centered in it. `lines` is the maximum number of matches to print: the default is 25, and it can be set to any integer. To see every match, pass an integer at least as large as the number of matches reported in the `Displaying ... of ... matches` header.
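
To make the roles of `width` and `lines` concrete, here is a minimal pure-Python sketch of a keyword-in-context display. It is not NLTK's implementation, and details such as padding and truncation may differ. The function name `kwic` and the sample tokens are hypothetical.

```python
def kwic(tokens, word, width=79, lines=25):
    """Return up to `lines` keyword-in-context rows, each roughly `width` characters wide."""
    # Characters available on each side of the keyword.
    half = (width - len(word) - 2) // 2
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() != word.lower():
            continue
        # Every token is at least one character long, so `half` tokens
        # on each side are always enough to fill the window.
        left = " ".join(tokens[max(0, i - half):i])[-half:].rjust(half)
        right = " ".join(tokens[i + 1:i + 1 + half])[:half]
        rows.append(f"{left} {token} {right}")
        if len(rows) == lines:
            break
    return rows


# Made-up tokens echoing one of the concordance lines in this notebook.
tokens = "They kill for love . They die for love .".split()
for row in kwic(tokens, "love", width=40, lines=5):
    print(row)
```

A larger `width` shows more context around each hit, while `lines` only caps how many hits are printed.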

In [6]:
import nltk
from nltk import word_tokenize

# Create NLTK texts for concordances
words_w = word_tokenize(" ".join(talks_f.text.tolist()))
women = nltk.Text(words_w)

words_m = word_tokenize(" ".join(talks_m.text.tolist()))
men = nltk.Text(words_m)

In [9]:
women.concordance('kill', lines=50)

Displaying 45 of 45 matches:
, because the Lamanites were able to kill all the Nephites . All but one guy ,
ifice , that 's revenge ! ] [ If you kill , there 's no difference between vic
now — I put this in today — he would kill me . But the thing was , my friends 
s public audience . And I started to kill my blog slowly . I 'm like , I do n'
 would you stroke my vagina , '' you kill the act right there . ( Laughter ) I
 thank you , '' you certainly do n't kill yourself or slip into a clinical dep
people who are rejected in love will kill for it . People live for love . They
for it . People live for love . They kill for love . They die for love . They 
ble . They want to commit suicide or kill somebody else . I would recommend it
ress the dopamine circuit , but they kill the sex drive . And when you kill th
ey kill the sex drive . And when you kill the sex drive , you kill orgasm . An
nd when you kill the sex drive , you kill orgasm . And when you kill orgasm , 
ive , you kill orgasm .

In [8]:
men.concordance("kill", lines=100)

Displaying 100 of 128 matches:
omebody else says , `` I 'm going to kill people to do it . '' They 're trying
s way ; shoot ; and you 're bound to kill something . So , this is the promise
hem might carry a disease that could kill you , for which you had no antiviral
iseases is to find them early and to kill them before they spread . So , my TE
ings tried , and when they crash and kill the pilot , do n't try that again . 
an delay technology , but you ca n't kill it . So this makes sense , because i
eart and blood vessel diseases still kill more people — not only in this count
 they discovered there was a plot to kill him , they dressed him up like a beg
n't , you know , get the army out to kill terrorists . So I 'm not really worr
s death genes that have to go in and kill cell reproduction . You know , that 
t 200 times the radiation that would kill you . And by now you should be getti
xample , which reserves the right to kill , the right to be duplicitous and so
on the question , `` 

### Lemmatized Concordance: a spaCy / NLTK hybrid

In [None]:
# IMPORTS
import nltk, pandas as pd, spacy

# Load the spaCy pipeline to be used
nlp = spacy.load('en_core_web_sm')

# Use the pipe method to feed documents 
docs_w = list(nlp.pipe(texts_w))
docs_m = list(nlp.pipe(texts_m))

In [13]:
def contextualize(spacy_docs):
    '''contextualize takes a list of spaCy docs
       and converts them to a single NLTK text'''
    all_lemmas = []
    # Grab the lemmas from each of the documents
    # and append to a list of all the lemmas
    for doc in spacy_docs:
        lemmas = [token.lemma_ for token in doc]
        all_lemmas.append(lemmas)
    # all_lemmas is a list of lists that needs to be flattened
    flattened = [item for sublist in all_lemmas for item in sublist]
    # all our texts are now one long list of words to be fed into NLTK Text
    contextualized = nltk.Text(flattened)
    return contextualized
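
The flattening step in `contextualize` can also be written with `itertools.chain.from_iterable` from the standard library. The sketch below uses made-up lemma lists in place of real spaCy docs: `lemma_lists` is illustrative and not drawn from the corpus.

```python
from itertools import chain

# Hypothetical per-document lemma lists, standing in for
# [token.lemma_ for token in doc] computed on each spaCy doc.
lemma_lists = [
    ["she", "kill", "the", "act"],
    ["they", "kill", "for", "love"],
]

# chain.from_iterable walks each inner list in turn,
# producing one flat sequence of lemmas.
flattened = list(chain.from_iterable(lemma_lists))
print(flattened)
```

The resulting flat list is the kind of input `nltk.Text` receives at the end of `contextualize`.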

In [None]:
women = contextualize(docs_w)
men = contextualize(docs_m)

## SVO

Loading the already-generated SVOs was appealing at first for its simplicity and accuracy, since we would be querying the SVOs themselves. In practice, the resulting contexts were too impoverished to be of much use for hand inspection.

In [2]:
# IMPORTS
import pandas as pd

# LOAD DATAFRAMES
# the `lem` suffix indicates the verbs have been lemmatized
svos_m = pd.read_csv("../output/svos_m_lem.csv", index_col=0)
svos_w = pd.read_csv("../output/svos_w_lem.csv", index_col=0)

In [10]:
svos_w.query(' subject=="he" & verb=="kill" ')

| Unnamed: 0 | subject | verb | object |
|---|---|---|---|
| 917 | he | kill | [me] |
| 3820 | he | kill | [that] |
| 4877 | he | kill | [hector] |
| 5839 | he | kill | [me] |
| 11239 | he | kill | [22] |
| 11240 | he | kill | [28] |
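
Since a word only turns up in the SVO data if it was extracted as a subject, verb, or object, it can be handy to search all three columns at once. The sketch below does this on a tiny made-up dataframe. It assumes the `object` column holds bracketed strings such as `[me]`, as the output above suggests. The helper name `svo_contexts` and the sample rows are hypothetical.

```python
import pandas as pd

# Made-up rows mimicking the layout of the SVO dataframes.
svos = pd.DataFrame({
    "subject": ["he", "she", "they"],
    "verb": ["kill", "love", "kill"],
    "object": ["[me]", "[him]", "[time]"],
})


def svo_contexts(df, word):
    """Return the rows in which `word` appears as subject, verb, or object."""
    in_subject = df["subject"] == word
    in_verb = df["verb"] == word
    # Assumption: objects are stored as strings like "[a, b]", so strip
    # the brackets and split on ", " before testing membership.
    object_items = df["object"].str.strip("[]").str.split(", ")
    in_object = object_items.apply(lambda items: word in items)
    return df[in_subject | in_verb | in_object]


print(svo_contexts(svos, "kill"))
```

Matching on whole items rather than substrings keeps a search for `me` from also returning `[time]`.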
