# Contexts

This notebook allows us to examine the contexts in which words occur in sentences throughout the two subcorpora. Contexts vary according to the method used:

- Using spaCy, we can explore contexts by specifying both a word and its part of speech.
- Using NLTK's `concordance` functionality, we can explore all of a word's contexts in the conventional KWIC (keyword-in-context) format. There is also code here for those interested in lemmatizing words before generating an NLTK `Text`.
- Using the already-generated SVOs, we can quickly glimpse the related subjects, verbs, and objects for a particular word, though the word has to appear somewhere as an S, V, or O.

## spaCy

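The code below uses spaCy's token annotations (`lemma_` and `pos_`) to find the sentences in which a particular verb occurs and returns those that also contain a particular noun, typically a subject such as `he`. It does not yet walk the dependency tree (a token's `children`) to confirm that the noun is the subject of the verb, but it could be adapted to do so and to a wide variety of other uses.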

Development of this code was based on insights from the Stack Overflow thread [How to get the dependency tree with spaCy?](https://stackoverflow.com/questions/36610179/how-to-get-the-dependency-tree-with-spacy).
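As one illustration of how dependency children can tie a subject to its verb, here is a minimal sketch. It makes several assumptions: the helper `is_subject_of` and the label set `SUBJECT_DEPS` are hypothetical names, and the toy `Doc` is annotated by hand so that the example runs without a trained pipeline. With a loaded model, the same function would be applied to docs produced by the parser instead.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Dependency labels treated as "subject" (an assumption; adjust to the parser's label set)
SUBJECT_DEPS = {"nsubj", "nsubjpass"}

def is_subject_of(doc, subject, verb_lemma):
    """True if `subject` is a subject child of a verb whose lemma is `verb_lemma`."""
    for token in doc:
        if token.pos_ == "VERB" and token.lemma_ == verb_lemma:
            for child in token.children:
                if child.dep_ in SUBJECT_DEPS and child.text.lower() == subject:
                    return True
    return False

# Hand-annotated toy sentence: "he said she killed it"
# heads are absolute token indices; the root token points to itself
doc = Doc(
    Vocab(),
    words=["he", "said", "she", "killed", "it"],
    pos=["PRON", "VERB", "PRON", "VERB", "PRON"],
    lemmas=["he", "say", "she", "kill", "it"],
    heads=[1, 1, 3, 1, 3],
    deps=["nsubj", "ROOT", "nsubj", "ccomp", "dobj"],
)

print(is_subject_of(doc, "he", "kill"))
print(is_subject_of(doc, "she", "kill"))
```

In this toy sentence both `he` and a form of `kill` are present, so a plain co-occurrence check would accept the pair, whereas the dependency check only accepts `she` because `he` is attached to `said`.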

In [1]:
# IMPORTS
import pandas as pd, spacy
# from spacy.lang.en import English

nlp = spacy.load("en_core_web_sm")


# Load the data, partitioned by speaker gender:
talks_m = pd.read_csv('../output/talks_male.csv', index_col='Talk_ID')
talks_f = pd.read_csv('../output/talks_female.csv', index_col='Talk_ID')

# Then grab only the texts of the talks:
texts_women = talks_f.text.tolist()
texts_men = talks_m.text.tolist()

# Lowercase everything before we create spaCy docs
texts_w = ' '.join([text.lower() for text in texts_women])
texts_m = ' '.join([text.lower() for text in texts_men])

In [None]:
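# spaCy is fussy about memory allocation, so use the pipe method to feed documents.
# nlp.pipe expects an iterable of texts: passing the joined string `texts_w`
# would iterate over its characters, so we feed the lowercased talks one by one.
docs_w = list(nlp.pipe(text.lower() for text in texts_women))
# docs_m = list(nlp.pipe(text.lower() for text in texts_men))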

In [None]:
def find_svpairs(doc, noun, verb):
    """Return sentences in a parsed spaCy doc that contain `noun` and a verb whose lemma is `verb`."""
    matching_sentences = []
    for sent in doc.sents:
        verb_in_sentence = False
        for token in sent:
            if token.lemma_ == verb and token.pos_ == "VERB":
                verb_in_sentence = True
                break
        if noun in sent.text.split() and verb_in_sentence:
            matching_sentences.append(sent.text)
    return matching_sentences

In [None]:
noun = "he"
verb = "kill"

matches = []
for doc in docs_w:
    # extend (rather than append) keeps `matches` a flat list of sentences
    matches.extend(find_svpairs(doc, noun, verb))
    
print(matches)

## NLTK

To change the display, adjust the arguments of the `concordance` method: `(word, width=79, lines=25)`. `width` is an integer giving the total number of characters in each displayed line, with the search word centered in it. `lines` is the maximum number of matches to display: the default is 25, and it can be set to any integer, so a value at least as large as the number of matches shows them all.
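To make the roles of the width and `lines` settings concrete, here is a minimal pure-Python sketch of a KWIC display. It is not NLTK's implementation: the `kwic` function is a hypothetical stand-in, and it assumes that `width` is the total line width with the keyword centered.

```python
def kwic(tokens, word, width=79, lines=25):
    """Return up to `lines` keyword-in-context strings, each at most `width` characters."""
    half = (width - len(word) - 2) // 2  # characters available on each side
    if half <= 0:
        raise ValueError("width is too small for this word")
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() != word.lower():
            continue
        # Left context is right-aligned so the keyword always starts in the same column
        left = " ".join(tokens[:i])[-half:].rjust(half)
        right = " ".join(tokens[i + 1:])[:half]
        rows.append(f"{left} {tok} {right}")
        if len(rows) == lines:
            break
    return rows

tokens = "I think that we think too much about what others think".split()
rows = kwic(tokens, "think", width=31, lines=2)
for row in rows:
    print(row)
```

Although the toy text contains three matches, only two rows are returned because `lines=2` caps the display.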

The cells below use NLTK's `concordance` method to find a word:

In [None]:
import nltk
from nltk import word_tokenize

# Create NLTK texts for concordances
words_w = word_tokenize(" ".join(talks_f.text.tolist()))
women = nltk.Text(words_w)

words_m = word_tokenize(" ".join(talks_m.text.tolist()))
men = nltk.Text(words_m)

In [19]:
women.concordance('think', lines=50)

Displaying 50 of 1239 matches:
well as local economic development . Think bike shops , juice stands . We secur
oth hotbeds of cultural innovation : think hip-hop and jazz . Both are waterfro
ponse was a grant program . I do n't think he understood that I was n't asking 
dad said , `` Yeah , but , honey , I think that 's technically just between Tha
ad another child to come , so what I think she — understandably — really meant 
oked really happy , because I do n't think this happens to them all that often 
come back and check in on me , and I think I said something like , `` Please do
the son of God , '' I mean , I would think that 's equally ridiculous . I 'm ju
t would have been much different , I think I would have instantly answered , ``
t . I ca n't help but this wish : to think about when you 're a little kid , an
ze . '' ( Laughter ) But I really do think it makes sense . And I think that th
ally do think it makes sense . And I think that the first step to world peace i
ow to do 

In [18]:
men.concordance('eat', lines=130)

Displaying 130 of 155 matches:
and we started looking for a place to eat . We were on I-40 . We got to Exit 23
d pretty much do anything , including eat , yell , play chess and so forth . No
llion copies of this book . Al Gore , eat your heart out . ( Laughter ) Just ex
ers , you cut down the clinic and you eat it . It 's an eat-your-own-clinic . S
e way to find out what people want to eat , what will make people happy , is to
he same idea . It turns out people do eat hamburgers , and Ray Kroc , for a whi
curring , that people are starting to eat like us , and live like us , and die 
ut the people in Asia are starting to eat like we are , which is why they 're s
and hip and crunchy and convenient to eat healthier foods , like — I chair the 
ca because countries are beginning to eat like us , live like us and die like u
 , when you 're done , well — you can eat it . But the thing that I 'm really ,
 snatch a motorist off the street and eat his liver . So every effort is made t
 . It has

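The function below attempts to match on the noun and verb, but it simply treats them as two words that must both occur in a sentence; it does not check that the noun is the subject of the verb.

WARNING: Tense and conjugation matter here. Matching is on exact, whitespace-separated word forms, so `think` will not match `thinks`, `thought`, or a token with punctuation attached such as `think,`.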

In [5]:
# nltk.download('punkt')  # Download the necessary tokenizer data

def find_sentences(text, noun, verb):
    sentences = nltk.sent_tokenize(text)
    matching_sentences = []
    for sentence in sentences:
        if (noun in sentence.split()) and (verb in sentence.split()):
            matching_sentences.append(sentence)
    return matching_sentences
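The effect of exact, whitespace-based matching can be seen on a few toy sentences. The sketch below is illustrative only: `naive_match` mirrors the token test used in `find_sentences` but takes a ready-made list of sentences, while `loose_match` and the `verb_forms` set are hypothetical additions showing one way to tolerate punctuation and conjugation.

```python
import re

def naive_match(sentences, noun, verb):
    """Keep sentences where both words occur as whitespace-separated tokens."""
    return [s for s in sentences if noun in s.split() and verb in s.split()]

def loose_match(sentences, noun, verb_forms):
    """Strip punctuation before matching and accept any of the listed verb forms."""
    hits = []
    for s in sentences:
        words = re.findall(r"[a-z']+", s.lower())
        if noun in words and any(form in words for form in verb_forms):
            hits.append(s)
    return hits

sentences = [
    "i think so.",
    "i thought so.",
    "she said i think, not you.",
    "i really do think",
]

exact = naive_match(sentences, "i", "think")
loose = loose_match(sentences, "i", {"think", "thinks", "thought", "thinking"})
print(len(exact), len(loose))
```

Here the exact match keeps only the first and last sentences: `thought` is a different word form, and `think,` carries a trailing comma. The looser variant keeps all four.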

In [22]:
w_i_think = find_sentences(texts_w, "i", "think")

In [24]:
print(len(w_i_think))
print(w_i_think[0:20])

633
["i don't think he understood that i wasn't asking for funding.", 'and my dad said, "yeah, but, honey, i think that\'s technically just between thanksgiving and christmas."', 'i thought about it, and when i was four, i was already the oldest of four children, and my mother even had another child to come, so what i think she — understandably — really meant was that she was so ready, she was so ready.', "and they looked really happy, because i don't think this happens to them all that often.", 'then they gave me this book of mormon, told me to read this chapter and that chapter, and said they\'d come back and check in on me, and i think i said something like, "please don\'t hurry," or maybe just, "please don\'t," and they were gone.', '(laughter)    "and she had a baby, and that\'s the son of god," i mean, i would think that\'s equally ridiculous.', 'well, that would have been much different, i think i would have instantly answered, "yes, yes, i feel it all the time.', 'i can\'t help

In [21]:
m_i_think = find_sentences(texts_m, "i", "think")

In [23]:
print(len(m_i_think))
print(m_i_think[0:20])

1914
["i don't think you understand.", 'i think your phone lines are unmanned.', 'and the truth is, for years i was a little depressed, because americans obviously did not value it, because the mac had three percent market share, windows had 95 percent market share — people did not think it was worth putting a price on it.', 'so i have a big interest in education, and i think we all do.', "and she's exceptional, but i think she's not, so to speak, exceptional in the whole of childhood.", 'we were sitting there and i think they just went out of sequence, because we talked to the little boy afterward and we said, "you ok with that?"', 'i think this is rather important.', 'i think math is very important, but so is dance.', "i think you'd have to conclude, if you look at the output, who really succeeds by this, who does everything that they should, who gets all the brownie points, who are the winners — i think you'd have to conclude the whole purpose of public education throughout the worl

This version also checks part of speech: the verb must carry a verb tag (any tag starting with `VB`), while the noun is still matched as a plain word.

In [7]:
# nltk.download('punkt')  # Download the necessary tokenizer data
# nltk.download('averaged_perceptron_tagger')  # Download the part-of-speech tagger model

def find_sents(text, noun, verb):
    sentences = nltk.sent_tokenize(text)
    matching_sentences = []
    for sentence in sentences:
        words = nltk.word_tokenize(sentence)
        pos_tags = nltk.pos_tag(words)
        verb_in_sentence = False
        for word, tag in pos_tags:
            if word == verb and tag.startswith('VB'):
                verb_in_sentence = True
                break
        if noun in sentence.split() and verb_in_sentence:
            matching_sentences.append(sentence)
    return matching_sentences

In [15]:
find_sents(texts_m, "she", "eat")

["because she can't believe i can't eat this penguin.",
 'so she asked, "eat what?']

## SVO

Loading the already-generated SVOs was appealing at first because of its simplicity and accuracy, but the resulting contexts proved too impoverished to be of much use for hand inspection.

In [None]:
# IMPORTS
import pandas as pd

# LOAD DATAFRAMES
# the `lem` suffix indicates the verbs have been lemmatized
svos_m = pd.read_csv("../output/svos_m_lem.csv", index_col=0)
svos_w = pd.read_csv("../output/svos_w_lem.csv", index_col=0)

In [None]:
svos_w.query(' subject=="he" & verb=="kill" ')
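Because a word may appear as a subject, a verb, or an object, it can be handy to look it up in any role at once. The following is a minimal sketch on invented toy rows, not the project's data: the helper `rows_with_word` is hypothetical, and the `object` column name is an assumption (only `subject` and `verb` appear in the query above).

```python
import pandas as pd

# Toy rows invented for illustration; the `object` column name is an assumption
svos = pd.DataFrame({
    "subject": ["he", "she", "they", "dog"],
    "verb": ["kill", "see", "kill", "bite"],
    "object": ["time", "he", "weeds", "he"],
})

def rows_with_word(df, word, roles=("subject", "verb", "object")):
    """Return the rows where `word` fills any of the given roles."""
    mask = df[list(roles)].eq(word).any(axis=1)
    return df[mask]

as_any_role = rows_with_word(svos, "he")
as_subject = rows_with_word(svos, "he", roles=("subject",))
print(as_any_role)
print(as_subject)
```

Restricting `roles` narrows the lookup to a single slot, which is equivalent in spirit to the `subject=="he"` condition in the query above.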