# Tense

After loading our modules and data, and creating a list from the text column so that we can work with subsets of the talks, we explore part-of-speech tagging: how to pick out the verbs, identify their tenses, and then count them.

In [1]:
import nltk, re, pandas as pd
from collections import Counter

In [2]:
df = pd.read_csv('../output/TEDall_speakers.csv')
texts = df.text.tolist()

## Working through POS-tagging

The function below splits a text into sentences, and each sentence into lowercase tokens. This lets us pull sentences from various TED talks to test our POS-tagging script as we develop it.

In [3]:
def sentience(the_string):
    """Split a text into sentences, and each sentence into lowercase word tokens."""
    sentences = [
        [word.lower() for word in nltk.word_tokenize(sentence)]
        for sentence in nltk.sent_tokenize(the_string)
    ]
    return sentences

The first thing we do is grab the first two sentences of a talk and flatten them into a single list of tokens, our `test`. There has to be a more elegant, or even pythonic, way to do this, but nothing I tried with nested list comprehensions or itertools' `chain` worked. This did:

In [42]:
# Assumed: text2 holds the tokenized sentences of texts[2] (Majora Carter's talk)
text2 = sentience(texts[2])
test = []
for sentence in text2[:2]:  # the first two sentences
    for word in sentence:   # every token in that sentence
        test.append(word)
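For reference, both of those approaches can flatten one level of nesting. The usual stumbling block with the comprehension is the order of its `for` clauses, which must match the order of the nested loops. A minimal sketch, using a small hypothetical `sample_sentences` list in place of `text2` so it does not overwrite the notebook's variables:

```python
from itertools import chain

# Hypothetical stand-in for text2: a list of tokenized sentences
sample_sentences = [
    ['if', 'you', "'re", 'here', 'today'],
    ['however', ',', 'when', 'we', "'re", 'not', 'at', 'ted'],
    ['a', 'third', 'sentence'],
]

# chain.from_iterable removes exactly one level of nesting
flat_chain = list(chain.from_iterable(sample_sentences[:2]))

# The comprehension's for-clauses run in the same order as nested loops:
# outer loop (sentence) first, inner loop (word) second
flat_comp = [word for sentence in sample_sentences[:2] for word in sentence]

print(flat_chain == flat_comp)  # True
print(len(flat_chain))          # 13
```

Either line could replace the loop above, for example `test = list(chain.from_iterable(text2[:2]))`.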

In [43]:
print(test)

['if', 'you', "'re", 'here', 'today', '—', 'and', 'i', "'m", 'very', 'happy', 'that', 'you', 'are', '—', 'you', "'ve", 'all', 'heard', 'about', 'how', 'sustainable', 'development', 'will', 'save', 'us', 'from', 'ourselves', '.', 'however', ',', 'when', 'we', "'re", 'not', 'at', 'ted', ',', 'we', 'are', 'often', 'told', 'that', 'a', 'real', 'sustainability', 'policy', 'agenda', 'is', 'just', 'not', 'feasible', ',', 'especially', 'in', 'large', 'urban', 'areas', 'like', 'new', 'york', 'city', '.']


In [45]:
print(nltk.pos_tag(test))

[('if', 'IN'), ('you', 'PRP'), ("'re", 'VBP'), ('here', 'RB'), ('today', 'NN'), ('—', 'NNP'), ('and', 'CC'), ('i', 'JJ'), ("'m", 'VBP'), ('very', 'RB'), ('happy', 'JJ'), ('that', 'IN'), ('you', 'PRP'), ('are', 'VBP'), ('—', 'JJ'), ('you', 'PRP'), ("'ve", 'VBP'), ('all', 'DT'), ('heard', 'NN'), ('about', 'IN'), ('how', 'WRB'), ('sustainable', 'JJ'), ('development', 'NN'), ('will', 'MD'), ('save', 'VB'), ('us', 'PRP'), ('from', 'IN'), ('ourselves', 'PRP'), ('.', '.'), ('however', 'RB'), (',', ','), ('when', 'WRB'), ('we', 'PRP'), ("'re", 'VBP'), ('not', 'RB'), ('at', 'IN'), ('ted', 'VBN'), (',', ','), ('we', 'PRP'), ('are', 'VBP'), ('often', 'RB'), ('told', 'VBN'), ('that', 'IN'), ('a', 'DT'), ('real', 'JJ'), ('sustainability', 'NN'), ('policy', 'NN'), ('agenda', 'NN'), ('is', 'VBZ'), ('just', 'RB'), ('not', 'RB'), ('feasible', 'JJ'), (',', ','), ('especially', 'RB'), ('in', 'IN'), ('large', 'JJ'), ('urban', 'JJ'), ('areas', 'NNS'), ('like', 'IN'), ('new', 'JJ'), ('york', 'NN'), ('city',

## A Toy Example

In [50]:
tenses = '''
Molly wanted to leave.
Molly wants to leave.
Molly will want to leave.
'''
print(nltk.pos_tag([word.lower() for word in nltk.word_tokenize(tenses)]))

[('molly', 'RB'), ('wanted', 'VBD'), ('to', 'TO'), ('leave', 'VB'), ('.', '.'), ('molly', 'RB'), ('wants', 'VBZ'), ('to', 'TO'), ('leave', 'VB'), ('.', '.'), ('molly', 'RB'), ('will', 'MD'), ('want', 'VB'), ('to', 'TO'), ('leave', 'VB'), ('.', '.')]


For now, we are interested only in the tense of the main verb. NLTK's default tagger uses the Penn Treebank tag set, which has the following verb tags:


| Tag | Description                           | Example |
|:----|:--------------------------------------|:--------|
| VB  | verb, base form                       | take    |
| VBD | verb, past tense                      | took    |
| VBG | verb, gerund/present participle       | taking  |
| VBN | verb, past participle                 | taken   |
| VBP | verb, non-3rd person singular present | take    |
| VBZ | verb, 3rd person singular present     | takes   |
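Since every verb tag in the table starts with `VB`, a prefix test is enough to pull all the verbs out of a tagged list. The sketch below uses a handful of tuples copied from the tagged Majora Carter sentences above:

```python
# A slice of the nltk.pos_tag output shown earlier (tags copied verbatim)
tagged = [('you', 'PRP'), ("'ve", 'VBP'), ('all', 'DT'), ('heard', 'NN'),
          ('will', 'MD'), ('save', 'VB'), ('is', 'VBZ'), ('told', 'VBN')]

# Every Penn Treebank verb tag begins with "VB", so a prefix test finds them all
verbs = [(word, tag) for word, tag in tagged if tag.startswith('VB')]
print(verbs)
# [("'ve", 'VBP'), ('save', 'VB'), ('is', 'VBZ'), ('told', 'VBN')]
```

Note that *heard* does not make the cut: the tagger labelled it `NN`, so any tag-based count inherits the tagger's mistakes.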

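We are looking for present-tense verbs (VB, VBP, VBZ) and past-tense verbs (VBD). VB is really the base form rather than a present tense, though: in the toy example above it tags *leave* in "to leave" and *want* in "will want", so counting it as present will inflate the present-tense totals. We will ignore the complexities introduced by participles (VBG, VBN). We will also have to explore how to track the future (*will*) and, possibly, conditional or subjunctive uses of the modals *would*, *could*, and *should*.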
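One way to start on the future is to look for *will* tagged `MD` and immediately followed by a base-form verb tagged `VB`, as in "will want" in the toy example. The sketch below assumes that simple adjacency rule; `modal_verb_pairs`, `FUTURE_MODALS`, and `CONDITIONAL_MODALS` are hypothetical names, and the tagged list is copied from the toy output above:

```python
def modal_verb_pairs(tagged, modals):
    """Return (modal, verb) pairs where a word from `modals`, tagged MD,
    is immediately followed by a base-form verb tagged VB.

    `tagged` is a list of (word, tag) tuples as returned by nltk.pos_tag.
    """
    pairs = []
    # Walk through each token alongside the token that follows it
    for (word, tag), (next_word, next_tag) in zip(tagged, tagged[1:]):
        if word in modals and tag == 'MD' and next_tag == 'VB':
            pairs.append((word, next_word))
    return pairs

# Tagged toy sentences, copied from the nltk.pos_tag output above
toy = [('molly', 'RB'), ('wanted', 'VBD'), ('to', 'TO'), ('leave', 'VB'), ('.', '.'),
       ('molly', 'RB'), ('wants', 'VBZ'), ('to', 'TO'), ('leave', 'VB'), ('.', '.'),
       ('molly', 'RB'), ('will', 'MD'), ('want', 'VB'), ('to', 'TO'), ('leave', 'VB'),
       ('.', '.')]

# Hypothetical modal sets: "'ll" assumes word_tokenize splits contractions off
FUTURE_MODALS = {'will', "'ll"}
CONDITIONAL_MODALS = {'would', 'could', 'should'}

print(modal_verb_pairs(toy, FUTURE_MODALS))       # [('will', 'want')]
print(modal_verb_pairs(toy, CONDITIONAL_MODALS))  # []
```

Because it only checks the very next token, this sketch misses futures with an intervening word, such as "will not leave" or "will also want"; skipping over adverbs (`RB`) would be a natural next refinement.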

The NLTK POS-tagger returns a list of `(word, tag)` tuples with punctuation still in place, so we need to keep only the tuples whose tag is one of our four possibilities: VB, VBP, or VBZ for present tense and VBD for past tense.
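A minimal sketch of that filter, assuming the four-tag grouping just described: map each tag of interest to a tense label, skip everything else (punctuation included), and tally with a `Counter`. `TENSE_OF_TAG` and `tense_counts` are hypothetical names, and the input is the tagged toy example from above:

```python
from collections import Counter

# Tag-to-tense mapping following the grouping in the text (VB counted as present)
TENSE_OF_TAG = {'VB': 'present', 'VBP': 'present', 'VBZ': 'present', 'VBD': 'past'}

def tense_counts(tagged):
    """Count present- and past-tense verbs in a list of (word, tag) tuples.

    Punctuation and every other tag fall through the filter and are ignored.
    """
    return Counter(TENSE_OF_TAG[tag] for _, tag in tagged if tag in TENSE_OF_TAG)

# Tagged toy sentences, copied from the nltk.pos_tag output above
toy = [('molly', 'RB'), ('wanted', 'VBD'), ('to', 'TO'), ('leave', 'VB'), ('.', '.'),
       ('molly', 'RB'), ('wants', 'VBZ'), ('to', 'TO'), ('leave', 'VB'), ('.', '.'),
       ('molly', 'RB'), ('will', 'MD'), ('want', 'VB'), ('to', 'TO'), ('leave', 'VB'),
       ('.', '.')]

print(tense_counts(toy))  # Counter({'present': 5, 'past': 1})
```

Five of the six verbs come out as present even though only one of the three sentences is in the present tense: four of them are VB tokens, from "to leave" (three times) and "will want". This is the base-form problem in miniature.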

In [None]:
# Tag groups from the table above: VB, VBP, VBZ as present; VBD as past
PRESENT_TAGS = {'VB', 'VBP', 'VBZ'}
PAST_TAGS = {'VBD'}

def sentenser(sentence_words):
    """Label one tokenized sentence 'present', 'past', 'mixed', or 'none'.

    REQUIRES: import nltk
    """
    tags = [tag for _, tag in nltk.pos_tag(sentence_words)]
    present_length = sum(tag in PRESENT_TAGS for tag in tags)
    past_length = sum(tag in PAST_TAGS for tag in tags)

    if present_length == past_length == 0:
        tense = 'none'      # no tensed verb found
    elif present_length > past_length:
        tense = 'present'
    elif past_length > present_length:
        tense = 'past'
    else:
        tense = 'mixed'     # a tie between present and past
    return tense

In [None]:
def count_tense(sentences):
    """Tally sentences and words (tokens) per label returned by sentenser.

    REQUIRES: from collections import Counter
    """
    sents = Counter()
    words = Counter()

    for sentence in sentences:
        tense = sentenser(sentence)
        sents[tense] += 1
        words[tense] += len(sentence)

    return sents, words

In [None]:
def parper(text):
    """Print each label's share of tokens, with its sentence count in parentheses.

    REQUIRES: import nltk
    """
    sentences = [
        [word.lower() for word in nltk.word_tokenize(sentence)]
        for sentence in nltk.sent_tokenize(text)
    ]

    sents, words = count_tense(sentences)
    total = sum(words.values())

    for tense, count in words.items():
        pcent = (count / total) * 100
        nsents = sents[tense]
        print(f"{pcent:.2f} {tense} ( {nsents} ) ")
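`parper` only prints its results, and it divides by `total`, so a text with no tokens would raise a `ZeroDivisionError`. If you would rather keep the numbers, for example to compare talks side by side, a small variant could return them instead. This is a sketch; `tense_shares` is a hypothetical helper that takes the two Counters (`sents`, `words`) that `parper` works with, and the counts below are made up for illustration:

```python
from collections import Counter

def tense_shares(sents, words):
    """Return {label: (percent_of_tokens, n_sentences)} instead of printing.

    Returns an empty dict when there are no tokens, avoiding a ZeroDivisionError.
    """
    total = sum(words.values())
    if total == 0:
        return {}
    return {label: (count / total * 100, sents[label]) for label, count in words.items()}

# Hypothetical counts, for illustration only
example_sents = Counter({'present': 3, 'past': 1})
example_words = Counter({'present': 60, 'past': 20})

print(tense_shares(example_sents, example_words))
# {'present': (75.0, 3), 'past': (25.0, 1)}
```

Returning a dict makes it easy to collect one result per talk, say in a list that later becomes a DataFrame.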

As a sanity check, the example below runs `parper` on the three toy sentences from above, where we already know the answer: one past, one present, and one future.

In [None]:
parper(tenses)

Al Gore's talk serves as our bellwether simply because it sits at index `0` in the list. Here we mix things up with Majora Carter's talk on urban renewal:

In [None]:
parper(texts[2])