# Contexts

This notebook allows us to examine the contexts in which words occur in sentences throughout the two subcorpora. Contexts vary according to the method used:

- Using spaCy we can explore contexts by specifying both a word and its part-of-speech.
- Using NLTK's `concordance` functionality we can explore all of a word's contexts in the conventional KWIC format. There is also code here for those interested in lemmatizing words before generating an NLTK `Text`.
- Using the already-generated SVOs, we can quickly glimpse the related subjects, verbs, and objects for a particular word, though it has to appear somewhere as an S, V, or O.

## spaCy

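The code below uses spaCy's lemmas and part-of-speech tags to return the sentences in which a given verb occurs alongside a given word. It does not yet check that the word is the verb's grammatical subject; spaCy's dependency parse (each token's `children`) could be used for that. The code could be adapted to a wide variety of uses.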

Development of this code was based on insights from this Stack Overflow thread: [How to get the dependency tree with spaCy?](https://stackoverflow.com/questions/36610179/how-to-get-the-dependency-tree-with-spacy)
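As a rough illustration of what a `children`-based subject check might look like, here is a minimal sketch. It assumes spaCy v3 and builds a tiny `Doc` by hand (the words, heads, dependency labels, tags, and lemmas are all supplied manually for illustration), so no trained model is needed; `subjects_of` is a hypothetical helper, not part of this notebook or of spaCy.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-annotated toy sentence (assumption: in real use, these annotations
# would come from a trained pipeline such as en_core_web_sm)
words = ["he", "turned", "the", "page"]
doc = Doc(
    Vocab(),
    words=words,
    heads=[1, 1, 3, 1],                      # absolute index of each token's head
    deps=["nsubj", "ROOT", "det", "dobj"],   # dependency labels
    pos=["PRON", "VERB", "DET", "NOUN"],
    lemmas=["he", "turn", "the", "page"],
)

def subjects_of(doc, verb_lemma):
    """Hypothetical helper: return the subjects attached to a given verb lemma."""
    subjects = []
    for token in doc:
        if token.lemma_ == verb_lemma and token.pos_ == "VERB":
            # A verb's children are the tokens that depend on it directly
            for child in token.children:
                if child.dep_ in ("nsubj", "nsubjpass"):
                    subjects.append(child.text)
    return subjects

print(subjects_of(doc, "turn"))
```

Unlike a plain co-occurrence check, this only reports a word when the parse attaches it to the verb as a subject.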

In [2]:
# IMPORTS
import pandas as pd, spacy
# from spacy.lang.en import English

nlp = spacy.load("en_core_web_sm")


# Load the data, partitioned by speaker gender:
talks_m = pd.read_csv('../output/talks_male.csv', index_col='Talk_ID')
talks_f = pd.read_csv('../output/talks_female.csv', index_col='Talk_ID')

# And then grab only the texts of the talks:
texts_women = talks_f.text.tolist()
texts_men = talks_m.text.tolist()

# Lowercase everything and join each subcorpus into one string
texts_w = ' '.join([text.lower() for text in texts_women])
texts_m = ' '.join([text.lower() for text in texts_men])

In [3]:
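# spaCy is fussy about memory allocation, so use the pipe method to feed it
# one talk at a time. pipe expects an iterable of texts, so we pass the
# lowercased talks rather than the joined strings (texts_w, texts_m).
docs_w = list(nlp.pipe(text.lower() for text in texts_women))
docs_m = list(nlp.pipe(text.lower() for text in texts_men))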

In [4]:
def find_svpairs(doc, noun, verb):
    # doc is an already-parsed spaCy Doc;
    # the verb is matched by lemma, the noun by its surface form
    matching_sentences = []
    for sent in doc.sents:
        # Does the verb (as a lemma tagged VERB) occur in this sentence?
        verb_in_sentence = any(
            token.lemma_ == verb and token.pos_ == "VERB" for token in sent
        )
        if verb_in_sentence and noun in sent.text.split():
            matching_sentences.append(sent.text)
    return matching_sentences

In [6]:
noun = "he"
verb = "turn"

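matches = []
for doc in docs_w:
    # extend, not append, so that we get one flat list of sentences
    matches.extend(find_svpairs(doc, noun, verb))

# Printing every match can exceed Jupyter's output rate limit,
# so report the count and show only the first few
print(len(matches))
print(matches[:20])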



## NLTK

To change the display, adjust the arguments to the `concordance` method: `(word, width, lines)`. `width` is an integer giving the total number of characters in each displayed line, keyword included, so the context on each side is roughly half of it. `lines` defaults to 25 and can be set to any integer; to display every match, pass an integer at least as large as the number of matches.
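To make the roles of the width and line-count settings concrete, here is a minimal pure-Python sketch of a keyword-in-context display. It is not NLTK's implementation: `kwic` is a hypothetical helper, and the case-insensitive matching and the 30-token context window are assumptions made for illustration.

```python
def kwic(tokens, word, width=79, lines=25):
    """Return up to `lines` keyword-in-context rows, each about `width` characters wide."""
    # Characters of context on each side, leaving room for the keyword and two spaces
    half = max(1, (width - len(word) - 2) // 2)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() != word.lower():
            continue
        # Take a window of neighbouring tokens, then trim it to `half` characters
        left = " ".join(tokens[max(0, i - 30):i])[-half:].rjust(half)
        right = " ".join(tokens[i + 1:i + 31])[:half]
        rows.append(f"{left} {token} {right}")
        if len(rows) == lines:
            break
    return rows

tokens = "we can turn it into a fuel and then turn around".split()
for row in kwic(tokens, "turn", width=31):
    print(row)
```

Because the left context is padded to a fixed length, the keyword lines up in the same column on every row, which is what makes the format easy to scan.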

The cells below simply use NLTK's `concordance` method to find a word:

In [7]:
import nltk
from nltk import word_tokenize

# Create NLTK texts for concordances
words_w = word_tokenize(" ".join(talks_f.text.tolist()))
women = nltk.Text(words_w)

words_m = word_tokenize(" ".join(talks_m.text.tolist()))
men = nltk.Text(words_m)

In [8]:
women.concordance('turn', lines=50)

Displaying 50 of 105 matches:
 degrade under the sun , which we in turn breathe . Green roofs also retain up
at the age of reason starts when you turn seven , and then you 're capable of 
p ? ] [ Woman : Perhaps . We have to turn it into a moral war . ] [ Man : How 
y to harness this waste resource and turn it into a fuel that would be somethi
 who were angry , and anger that can turn to violence , and we 're all familia
ithecine would have walked . As they turn toward us you 'll see that the pelvi
 or not . And so therefore , this in turn meant that there was an extremely in
ts of tweaking and torquing , and we turn our puffer into the Mola . You know 
 words make that difficult left-hand turn into the dictionary , and keep the b
e 's another male on top waiting his turn . Often the queens mate more than on
t with a bit of sand , put it down , turn around , and go back in . And finall
oing to be the tool that 's going to turn our usage overnight . And what kind 
hat follows you around

In [9]:
men.concordance('turn', lines=50)

Displaying 50 of 388 matches:
ita instead of family income , and I turn these individual data into regional 
 sort of `` shotgun flexibility '' — turn your head this way ; shoot ; and you
rpose Driven Life . '' And I want to turn now , briefly , to talk about that b
XML is going to do is it 's going to turn those pages into Lego blocks . XML a
ng to aspire to . It 's to make them turn their back on what they think they l
 in Pennsylvania , and I took a left turn trying to get back to the highway . 
an appropriate part of the brain can turn off a brain disorder . And consider 
n actually change the world . We can turn the inevitable outcomes , and transf
f years ago . We 'd like to actually turn that program off . They tried that i
ersonalization . So , with that , in turn , 20 million dollars today does this
hin client computers . And then , in turn , businesses started to grow , like 
had to use it , channel its energy , turn it into something that would clarify
y moment , `` OK , the

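The next function attempts to match on the noun and verb, but it treats them simply as two words that occur in the same sentence.

WARNING: Tense and conjugation matter here. The match is on the exact form given, so `turn` will not find `turned` or `turns`.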

In [10]:
# nltk.download('punkt')  # Download the necessary tokenizer data

def find_sentences(text, noun, verb):
    sentences = nltk.sent_tokenize(text)
    matching_sentences = []
    for sentence in sentences:
        if (noun in sentence.split()) and (verb in sentence.split()):
            matching_sentences.append(sentence)
    return matching_sentences
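Because the function above compares whitespace-split strings, it misses inflected forms (`turned`) and words with punctuation attached (`he,`). One possible workaround, sketched below under stated assumptions, is to tokenize with a regular expression and accept a set of verb forms. `find_sentences_forms` is a hypothetical helper that takes sentences already split into a list, and the toy sentences are invented for illustration.

```python
import re

def find_sentences_forms(sentences, noun, verb_forms):
    """Hypothetical helper: keep sentences containing `noun` and any of `verb_forms`."""
    forms = {form.lower() for form in verb_forms}
    matches = []
    for sentence in sentences:
        # Whole-word tokens only, so "he," matches "he" but "they" does not
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
        if noun.lower() in tokens and forms.intersection(tokens):
            matches.append(sentence)
    return matches

# Invented toy sentences, not drawn from the corpus
sentences = [
    "He turned the page.",
    "Then he, too, turns away.",
    "They turn left.",
    "he will turn it off",
]
result = find_sentences_forms(sentences, "he", {"turn", "turns", "turned", "turning"})
print(result)
```

Listing the verb forms by hand is crude; lemmatizing, as the spaCy version does, avoids having to enumerate them.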

In [11]:
w_he_turn = find_sentences(texts_w, "he", "turn")

In [12]:
print(len(w_he_turn))
print(w_he_turn[0:20])

1
['and there\'d be that sound in the signal — it\'s like (screeching) — and he thought, "oh, what if i could control that sound and turn it into an instrument, because there are pitches in it."']


In [21]:
m_i_think = find_sentences(texts_m, "i", "think")

In [23]:
print(len(m_i_think))
print(m_i_think[0:20])

1914
["i don't think you understand.", 'i think your phone lines are unmanned.', 'and the truth is, for years i was a little depressed, because americans obviously did not value it, because the mac had three percent market share, windows had 95 percent market share — people did not think it was worth putting a price on it.', 'so i have a big interest in education, and i think we all do.', "and she's exceptional, but i think she's not, so to speak, exceptional in the whole of childhood.", 'we were sitting there and i think they just went out of sequence, because we talked to the little boy afterward and we said, "you ok with that?"', 'i think this is rather important.', 'i think math is very important, but so is dance.', "i think you'd have to conclude, if you look at the output, who really succeeds by this, who does everything that they should, who gets all the brownie points, who are the winners — i think you'd have to conclude the whole purpose of public education throughout the worl

The next version checks the verb's part of speech, though the noun is still matched as a plain word:

In [7]:
# nltk.download('punkt')  # Download the necessary tokenizer data
# nltk.download('averaged_perceptron_tagger')  # Download the POS tagger data

def find_sents(text, noun, verb):
    sentences = nltk.sent_tokenize(text)
    matching_sentences = []
    for sentence in sentences:
        words = nltk.word_tokenize(sentence)
        pos_tags = nltk.pos_tag(words)
        verb_in_sentence = False
        for word, tag in pos_tags:
            if word == verb and tag.startswith('VB'):
                verb_in_sentence = True
                break
        if noun in sentence.split() and verb_in_sentence:
            matching_sentences.append(sentence)
    return matching_sentences

In [15]:
find_sents(texts_m, "she", "eat")

["because she can't believe i can't eat this penguin.",
 'so she asked, "eat what?']

## SVO

At first this approach was appealing for its simplicity and accuracy, since we would be loading the SVOs themselves. The resulting contexts, however, were too impoverished to be of much use for hand inspection.

In [13]:
# IMPORTS
import pandas as pd

# LOAD DATAFRAMES
# the `lem` suffix indicates the verbs have been lemmatized
svos_m = pd.read_csv("../output/svos_m_lem.csv", index_col=0)
svos_w = pd.read_csv("../output/svos_w_lem.csv", index_col=0)

In [14]:
svos_w.query(' subject=="he" & verb=="turn" ')

| Unnamed: 0 | doc | subject | verb | object |
|---|---|---|---|---|
| 12790 | 834 | he | turn | [party, invitations] |
| 14539 | 852 | he | turn | [flashlight] |
| 16020 | 868 | he | turn | [sustainability, project] |
| 18894 | 898 | he | turn | [18] |
| 18904 | 898 | he | turn | [11] |
| 26071 | 973 | he | turn | [you] |
