# NLTK Contexts
Using NLTK's `concordance` method, we can explore every context in which a word appears, displayed in the conventional keyword-in-context (KWIC) format. There is also code here for those interested in lemmatizing words before generating an NLTK `Text`.

**Nota bene**: *To change the display, pass arguments to the method as `concordance("word", width, lines)`. `width` is an integer giving the total width of each displayed line in characters, keyword included, so the context on either side is roughly half of that. `lines` defaults to 25 and can be set to any integer; to see every match, pass a number at least as large as the match count.*
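
To make the `width` parameter concrete, here is a minimal, standard-library sketch of how a keyword-in-context display can be built. It is not NLTK's implementation; the function name `kwic` and its details are assumptions made for illustration.

```python
def kwic(tokens, word, width=79, lines=25):
    """Return up to `lines` keyword-in-context rows, each about `width` characters.

    Hypothetical helper for illustration only; not part of NLTK.
    """
    # Characters of context on each side: total width minus the keyword and two spaces.
    half = max((width - len(word) - 2) // 2, 1)
    rows = []
    for i, token in enumerate(tokens):
        if len(rows) >= lines:
            break
        if token.lower() != word.lower():
            continue
        # Each token is at least one character, so `half` tokens give enough context to slice.
        left = " ".join(tokens[max(0, i - half):i])[-half:]
        right = " ".join(tokens[i + 1:i + 1 + half])[:half]
        # Right-align the left context so the keyword lines up in a column.
        rows.append(f"{left:>{half}} {token} {right}")
    return rows


tokens = "they kill for love . they die for love .".split()
for row in kwic(tokens, "kill", width=21, lines=5):
    print(row)
```

Doubling `width` roughly doubles the context shown on each side, while `lines` only caps how many rows come back.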

In [1]:
import nltk, pandas as pd

# Load from the gendered corpora
talks_m = pd.read_csv('../output/talks_male.csv', index_col='Talk_ID')
talks_f = pd.read_csv('../output/talks_female.csv', index_col='Talk_ID')
talks_nog = pd.read_csv('../output/talks_nog.csv', index_col='Talk_ID')

# Create gendered NLTK "Texts"
words_w = nltk.word_tokenize(" ".join(talks_f.text.tolist()))
women = nltk.Text(words_w)
words_m = nltk.word_tokenize(" ".join(talks_m.text.tolist()))
men = nltk.Text(words_m)

In [4]:
women.concordance('kill', lines=50)

Displaying 45 of 45 matches:
, because the Lamanites were able to kill all the Nephites . All but one guy ,
ifice , that 's revenge ! ] [ If you kill , there 's no difference between vic
now — I put this in today — he would kill me . But the thing was , my friends 
s public audience . And I started to kill my blog slowly . I 'm like , I do n'
 would you stroke my vagina , '' you kill the act right there . ( Laughter ) I
 thank you , '' you certainly do n't kill yourself or slip into a clinical dep
people who are rejected in love will kill for it . People live for love . They
for it . People live for love . They kill for love . They die for love . They 
ble . They want to commit suicide or kill somebody else . I would recommend it
ress the dopamine circuit , but they kill the sex drive . And when you kill th
ey kill the sex drive . And when you kill the sex drive , you kill orgasm . An
nd when you kill the sex drive , you kill orgasm . And when you kill orgasm , 
ive , you kill orgasm .

In [6]:
men.concordance('kill', lines=50)

Displaying 50 of 128 matches:
omebody else says , `` I 'm going to kill people to do it . '' They 're trying
s way ; shoot ; and you 're bound to kill something . So , this is the promise
hem might carry a disease that could kill you , for which you had no antiviral
iseases is to find them early and to kill them before they spread . So , my TE
ings tried , and when they crash and kill the pilot , do n't try that again . 
an delay technology , but you ca n't kill it . So this makes sense , because i
eart and blood vessel diseases still kill more people — not only in this count
 they discovered there was a plot to kill him , they dressed him up like a beg
n't , you know , get the army out to kill terrorists . So I 'm not really worr
s death genes that have to go in and kill cell reproduction . You know , that 
t 200 times the radiation that would kill you . And by now you should be getti
xample , which reserves the right to kill , the right to be duplicitous and so
on the question , `` D

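This function looks for sentences that contain both a given noun (or pronoun) and a given verb, but it treats them simply as two whitespace-delimited words occurring anywhere in the same sentence. It does not check part of speech or whether the noun is the verb's subject.

WARNING: Matching is on the exact string, so tense and conjugation matter: searching for "turn" will not find "turned" or "turns", and a word with punctuation attached (such as "think,") is also missed.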

In [10]:
# nltk.download('punkt')  # Download the necessary tokenizer data

def find_sentences(text, noun, verb):
    """Return the sentences in `text` that contain both `noun` and `verb` as whole words."""
    matching_sentences = []
    for sentence in nltk.sent_tokenize(text):
        # Split once; whitespace tokens keep any attached punctuation
        tokens = set(sentence.split())
        if noun in tokens and verb in tokens:
            matching_sentences.append(sentence)
    return matching_sentences
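
Because the match is on exact strings, one way to soften the tense problem is to pass every inflected form you care about. The sketch below assumes the text has already been split into sentences (for example with `nltk.sent_tokenize`); the function name `find_sentences_any_form` and the punctuation-stripping step are illustrative assumptions, not part of the original notebook.

```python
import string


def find_sentences_any_form(sentences, noun, verb_forms):
    """Return sentences containing `noun` and at least one of `verb_forms`.

    Hypothetical variant of `find_sentences`; expects pre-split sentences.
    """
    forms = set(verb_forms)
    matches = []
    for sentence in sentences:
        # Strip surrounding punctuation so "think," still counts as "think".
        tokens = {tok.strip(string.punctuation) for tok in sentence.split()}
        if noun in tokens and forms & tokens:
            matches.append(sentence)
    return matches


sentences = [
    "he turned the noise into an instrument.",
    "she thought about it, and he turns away.",
    "they turn left.",
]
print(find_sentences_any_form(sentences, "he", {"turn", "turns", "turned"}))
```

Listing forms by hand keeps the dependency footprint small; lemmatizing the tokens first would be the more general route.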

In [11]:
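# NOTE: `texts_w` (and `texts_m` below) are not defined in the cells shown here;
# they are assumed to be each corpus joined into a single lowercased string.
w_he_turn = find_sentences(texts_w, "he", "turn")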

In [12]:
print(len(w_he_turn))
print(w_he_turn[0:20])

1
['and there\'d be that sound in the signal — it\'s like (screeching) — and he thought, "oh, what if i could control that sound and turn it into an instrument, because there are pitches in it."']


In [21]:
m_i_think = find_sentences(texts_m, "i", "think")

In [23]:
print(len(m_i_think))
print(m_i_think[0:20])

1914
["i don't think you understand.", 'i think your phone lines are unmanned.', 'and the truth is, for years i was a little depressed, because americans obviously did not value it, because the mac had three percent market share, windows had 95 percent market share — people did not think it was worth putting a price on it.', 'so i have a big interest in education, and i think we all do.', "and she's exceptional, but i think she's not, so to speak, exceptional in the whole of childhood.", 'we were sitting there and i think they just went out of sequence, because we talked to the little boy afterward and we said, "you ok with that?"', 'i think this is rather important.', 'i think math is very important, but so is dance.', "i think you'd have to conclude, if you look at the output, who really succeeds by this, who does everything that they should, who gets all the brownie points, who are the winners — i think you'd have to conclude the whole purpose of public education throughout the worl

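This version goes a step further and checks part of speech: the verb must be tagged as a verb (a tag beginning with `VB`). The noun is still matched as a plain string, not by its tag: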

In [7]:
# nltk.download('punkt')  # Download the necessary tokenizer data
# nltk.download('averaged_perceptron_tagger')  # Download the part-of-speech tagger model

def find_sents(text, noun, verb):
    """Return sentences containing `noun` as a word and `verb` tagged as a verb (VB*)."""
    matching_sentences = []
    for sentence in nltk.sent_tokenize(text):
        pos_tags = nltk.pos_tag(nltk.word_tokenize(sentence))
        # True if any token equals `verb` and carries a verb tag (VB, VBD, VBG, ...)
        verb_in_sentence = any(
            word == verb and tag.startswith('VB') for word, tag in pos_tags
        )
        # The noun is still matched as a plain whitespace-delimited string
        if verb_in_sentence and noun in sentence.split():
            matching_sentences.append(sentence)
    return matching_sentences

In [15]:
find_sents(texts_m, "she", "eat")

["because she can't believe i can't eat this penguin.",
 'so she asked, "eat what?']
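
The function above checks a tag only for the verb. A sketch that also requires the other word to be tagged as a personal pronoun (`PRP` in the Penn Treebank tag set) could look like the following. It works on already-tagged `(word, tag)` pairs of the kind `nltk.pos_tag` returns; the function name `has_pronoun_and_verb` is hypothetical, and the tagged sentences are hand-written for illustration rather than real tagger output.

```python
def has_pronoun_and_verb(tagged_sentence, pronoun, verb):
    """True if `pronoun` is tagged PRP and `verb` has a VB* tag in the sentence.

    Hypothetical helper; `tagged_sentence` is a list of (word, tag) pairs.
    """
    pronoun_found = any(w == pronoun and t == "PRP" for w, t in tagged_sentence)
    verb_found = any(w == verb and t.startswith("VB") for w, t in tagged_sentence)
    return pronoun_found and verb_found


# Hand-written Penn Treebank-style tags, for illustration only.
verb_use = [("she", "PRP"), ("can", "MD"), ("eat", "VB"), ("fish", "NN")]
noun_use = [("she", "PRP"), ("wants", "VBZ"), ("a", "DT"), ("good", "JJ"), ("eat", "NN")]

print(has_pronoun_and_verb(verb_use, "she", "eat"))
print(has_pronoun_and_verb(noun_use, "she", "eat"))
```

Neither version checks that the pronoun is the verb's subject; that would need a syntactic parse rather than tags alone.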