# SVO Contexts

This notebook is intended primarily for inspecting, by hand, the contexts of results found in other explorations. We try two approaches:

- using the SVOs (This turns out to be rather underwhelming.)
- using NLTK's concordance method

In the *SVOs as Context* section below, we load only the men's subcorpus to explore how useful the complete SVOs would be. We do not take this exploration any further.

Afterwards, we turn to the texts themselves as context, loading the two gendered dataframes and collecting the texts from each. We process the two lists of texts with spaCy's NLP pipeline to produce feature-rich documents from which the lemmas are easy to extract. Using spaCy's lemmatization parallels how we built the SVOs, so the lemmas here should match those in the SVOs, which makes it easier to compare the two subcorpora.

Finally, we treat the texts as sequences of lemmas and create a single NLTK Text per subcorpus, which lets us use NLTK's concordance functionality to see words in context. To do this, we define a function that takes a list of spaCy docs, builds a list of lemmas for each doc, collects those lists into one list for the subcorpus, and flattens that into a single, very long list of tokens, from which an NLTK Text is created. We create two NLTK Texts: one for female speakers, `women`, and one for male speakers, `men`.

**TO DO**: Find a way to save either the list of spaCy docs or the NLTK Text. The spaCy NLP pipeline takes time to run, and we shouldn't need to create spaCy docs or NLTK Texts every time we want to use this notebook.

## SVOs as Context

This approach was appealing at first for its simplicity and accuracy, since we would be loading the SVOs themselves, but the resulting contexts were too impoverished to be of much use for hand inspection. Only the men's subcorpus is loaded here.

In [1]:
import csv, re
with open('../output/svos_m_lem.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    # Drop the first row
    next(reader)
    # Skip the first column
    contexts_m = [row[1:4] for row in reader]

Each row in `contexts_m` is a list of three strings. The list comprehension below joins the items in each row into a single string and then strips any parentheses, square brackets, or curly braces, which sometimes occur in the third item.

In [2]:
sentences = [re.sub(r"[()\[\]{}]", "", ' '.join(item)) for item in contexts_m]
print(sentences[0:10])

['i blow conference', 'i want "to', 'i need that', 'laughter put yourselves', 'i fly two', 'i have "to', 'i tell story', 'i leave "the', 'i look me', 'it hit me']
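
As a small, self-contained illustration of the join-and-strip step, here is the same pattern applied to a few made-up rows. The rows in `sample_rows` are invented for the example and do not come from the SVO file.

```python
import re

# Made-up rows standing in for contexts_m: [subject, verb, object]
sample_rows = [
    ["i", "tell", "story"],
    ["we", "build", "[the] park"],
    ["she", "write", "(a) book"],
]

# Join each row into one string, then strip (), [] and {} characters
cleaned = [re.sub(r"[()\[\]{}]", "", " ".join(row)) for row in sample_rows]
print(cleaned)
# ['i tell story', 'we build the park', 'she write a book']
```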


## Fuller Contexts

In [3]:
# IMPORTS
import nltk, pandas as pd, re, spacy

# from nltk.stem import WordNetLemmatizer
# wnl = WordNetLemmatizer()

# Load the data, partitioned by speaker gender:
talks_m = pd.read_csv('talks_male.csv', index_col='Talk_ID')
talks_f = pd.read_csv('talks_female.csv', index_col='Talk_ID')

# And then grabbing the texts of the talks:
texts_w = [text.lower() for text in talks_f.text.tolist()]
texts_m = [text.lower() for text in talks_m.text.tolist()]

The cell below takes the most time to run. Finding a way to save either the spacy docs or the resulting NLTK text object so that we could retrieve them for exploration would be really useful.

In [4]:
# Load the spaCy pipeline to be used
nlp = spacy.load('en_core_web_sm')

# Use the pipe method to feed documents 
docs_w = list(nlp.pipe(texts_w))
docs_m = list(nlp.pipe(texts_m))

In [5]:
def contextualize(spacy_docs):
    '''contextualize takes a list of spaCy docs
       and converts them to a single NLTK text'''
    all_lemmas = []
    # Grab the lemmas from each of the documents
    # and append to a list of all the lemmas
    for doc in spacy_docs:
        lemmas = [token.lemma_ for token in doc]
        all_lemmas.append(lemmas)
    # all_lemmas is a list of lists that needs to be flattened
    flattened = [item for sublist in all_lemmas for item in sublist]
    # all our texts are now one long list of words to be fed into NLTK Text
    contextualized = nltk.Text(flattened)
    return contextualized
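
The flattening step inside `contextualize` can be tried on its own without running spaCy. The sketch below uses made-up lemma lists in place of real docs and shows that the nested comprehension gives the same result as `itertools.chain.from_iterable`, which is a common alternative.

```python
from itertools import chain

# Made-up lemma lists standing in for [token.lemma_ for token in doc]
lemmas_per_doc = [
    ["I", "love", "design"],
    ["he", "love", "it"],
]

# Nested comprehension, as used in contextualize()
flattened = [lemma for doc in lemmas_per_doc for lemma in doc]

# Equivalent flattening with itertools.chain
flattened_chain = list(chain.from_iterable(lemmas_per_doc))

print(flattened)
# ['I', 'love', 'design', 'he', 'love', 'it']
```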

In [7]:
women = contextualize(docs_w)
men = contextualize(docs_m)

## Save / Load NLTK Texts

In [13]:
import dill
# Use context managers so the files are closed once writing finishes
with open("contexts_w.pickle", "wb") as f:
    dill.dump(women, f)
with open("contexts_m.pickle", "wb") as f:
    dill.dump(men, f)

In [1]:
# To pick up where we left off without having to re-run everything:
import dill, nltk
with open("contexts_w.pickle", "rb") as f:
    women = dill.load(f)
with open("contexts_m.pickle", "rb") as f:
    men = dill.load(f)
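
Pickled objects can be sensitive to library versions, so one possible alternative is to store only the flat list of lemma strings and rebuild the NLTK Text on load. The following is a minimal sketch under that assumption: `save_lemmas`, `load_lemmas`, and the file names are hypothetical helpers, not part of NLTK or spaCy.

```python
import json
import tempfile
from pathlib import Path

def save_lemmas(lemmas, path):
    """Write a flat list of lemma strings to a JSON file."""
    Path(path).write_text(json.dumps(lemmas, ensure_ascii=False), encoding="utf-8")

def load_lemmas(path):
    """Read the list of lemma strings back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

# Round-trip check on a made-up token list, written to a temporary directory
demo = ["I", "love", "its", "design"]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "lemmas_demo.json"
    save_lemmas(demo, demo_path)
    restored = load_lemmas(demo_path)

print(restored == demo)
```

In this notebook that would look something like `save_lemmas(list(women.tokens), "lemmas_w.json")` to save and `women = nltk.Text(load_lemmas("lemmas_w.json"))` to restore, since an NLTK Text keeps its token list in `tokens`.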

## Words in Context

In [2]:
women.concordance('love')

Displaying 25 of 579 matches:
nd luckily for I , that home and the love inside of it , along with help from 
hat ruth build by destroy two well - love community park . now , we 'll have e
they say , " do you believe that god love you with all his heart ? " and I thi
k in my head : do I believe that god love I with all his heart ? because I be 
 have ask I , " do you feel that god love you with all his heart ? " well , th
feel it all the time . I feel god 's love when I be hurt and confuse , and I f
 care for . I take shelter in god 's love when I do not understand why tragedy
 why tragedy hit , and I feel god 's love when I look with gratitude at all th
 it bring out the complexity of this love - hate relationship that the arab wo
er )     ( applause )     design — I love its design . I remember when I be li
e same model .     entertainment — I love the entertainment . but actually , t
go , you could not stop they . woman love to talk about their vagina , they do
, and great sex life ,

In [3]:
men.concordance("love")

Displaying 25 of 798 matches:
ou who be good at branding , I would love to get your advice and help on how t
 a technology nut , and I absolutely love it . the job , though , come with on
   I hear a great story recently — I love tell it — of a little girl who be in
not want to come to los angeles . he love it , but he have a girlfriend in eng
 girlfriend in england . this be the love of his life , sarah . he would know 
ut the good computer . you give they love , joy , be there to comfort they . t
the rest of their life with all this love , education , money and background g
ere to go to work , and you meet the love of your life there , a career decisi
ople , to build , create something , love someone . there be the fuel you pick
need surprise . how many of you here love surprise ? say , " aye . "     audie
what we really need : connection and love , fourth need . we all want it ; mos
nt it ; most settle for connection , love 's too scary . who here have be hurt
 money or friend you h

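To change what is displayed, pass further arguments: `concordance(word, width=79, lines=25)`. `width` is an integer giving the total width of each displayed line in characters, keyword included, so the context on either side is roughly half of what remains. `lines` is the maximum number of matches to print and defaults to 25. As far as we can tell, in recent NLTK releases it needs to be an integer, so to see every match pass a number at least as large as the match count reported in the first line of output (for example, `lines=798` for `men`).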
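
To make the effect of the width and line-count settings concrete, here is a rough plain-Python sketch of a keyword-in-context display. It is not NLTK's implementation: `kwic` is a hypothetical helper, and it assumes that the width bounds the whole line, with the space left over after the keyword split evenly between left and right context.

```python
def kwic(tokens, word, width=79, lines=25):
    """Return up to `lines` keyword-in-context strings of at most `width` characters."""
    # Characters available on each side, after the keyword and two spaces
    side = max((width - len(word) - 2) // 2, 1)
    rows = []
    for i, token in enumerate(tokens):
        if token != word:
            continue
        # `side` tokens on each side are enough to fill `side` characters
        left = " ".join(tokens[max(0, i - side):i])[-side:]
        right = " ".join(tokens[i + 1:i + 1 + side])[:side]
        # Right-align the left context so the keyword lines up in every row
        rows.append(f"{left:>{side}} {word} {right:<{side}}")
        if len(rows) == lines:
            break
    return rows

# Made-up token list in the style of the lemmatized talks
tokens = "i feel god 's love when i be hurt and i love its design".split()
for row in kwic(tokens, "love", width=30, lines=5):
    print(row)
```

A smaller `width` shortens the context on both sides, and a smaller `lines` cuts off the list of matches earlier.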