# Perspectives

This notebook is based on Neal Caren's ["Using Python to see how the Times writes about men and women"][1], as simplified by Bengfort, Bilbro, and Ojeda in _Applied Text Analysis with Python_. The book's [GitHub repo][2] allows comparison with both Caren's original and the current implementation, which adapts the approach from gender to grammatical perspective (first, second, and third person).


[1]: https://nbviewer.jupyter.org/gist/nealcaren/5105037
[2]: https://github.com/foxbook/atap

In [6]:
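import re
from collections import Counter

import nltk
import pandas as pd

# sent_tokenize / word_tokenize need NLTK's Punkt models; if missing, run once:
# nltk.download('punkt')  # newer NLTK releases ask for 'punkt_tab' instead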

In [7]:
df = pd.read_csv('../output/TEDall_speakers.csv')
print(list(df))

['Set', 'Talk_ID', 'public_url', 'headline', 'description', 'event', 'duration', 'published', 'tags', 'views', 'text', 'speaker_1', 'speaker1_occupation', 'speaker1_introduction', 'speaker1_profile', 'speaker_2', 'speaker2_occupation', 'speaker2_introduction', 'speaker2_profile', 'speaker_3', 'speaker3_occupation', 'speaker3_introduction', 'speaker3_profile', 'speaker_4', 'speaker4_occupation', 'speaker4_introduction', 'speaker4_profile']


In [8]:
texts = df.text.tolist()

## Defining the Perspective Functions

In [2]:
# Perspective labels. perse_sentence() below returns a single 'third person'
# label, so the feminine/masculine/plural labels are kept for finer-grained work.
FIRSTS = 'first person singular'
FIRSTP = 'first person plural'
SECOND = 'second person'
THIRD_F = 'third person feminine'
THIRD_M = 'third person masculine'
THIRD_P = 'third person plural'

# Perspective word lists (lowercase, to match the lowercased tokens)
first_singular_words = {'i', 'me', 'my', 'mine'}
first_plural_words = {'we', 'us', 'our', 'ours'}
second_words = {'you', 'your', 'yours'}
third_singular_m_words = {'he', 'him', 'his'}
third_singular_f_words = {'she', 'her', 'hers'}
third_plural_words = {'they', 'them', 'their', 'theirs'}

# All third-person forms pooled into one set
third_person_words = third_singular_f_words | third_singular_m_words | third_plural_words
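Because the word lists are Python sets, `perse_sentence` below measures overlap with `set.intersection`, which counts each listed form at most once per sentence. A small sketch of the difference between distinct forms and occurrences, using a hand-tokenized sentence chosen purely for illustration:

```python
from collections import Counter

first_singular_words = {'i', 'me', 'my', 'mine'}

# A hand-tokenized, lowercased sentence (illustrative only)
tokens = ['i', 'told', 'my', 'sister', 'that', 'i', 'would', 'call', 'her']

# Distinct first-person-singular forms present: 'i' and 'my'
distinct = len(first_singular_words.intersection(tokens))

# Occurrences of those forms: 'i' twice, 'my' once
counts = Counter(tok for tok in tokens if tok in first_singular_words)
occurrences = sum(counts.values())

print(distinct, occurrences)  # prints: 2 3
```

For short sentences the two measures usually agree; they diverge when a speaker repeats the same pronoun.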

In [16]:
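def perse_sentence(sentence_words):
    """Classify one tokenized, lowercased sentence by grammatical perspective.

    Each count is the number of *distinct* listed forms present (set
    intersection), not the number of occurrences. A perspective wins only
    if its count exceeds the other three combined; otherwise the sentence
    is 'mixed', a label that also covers sentences with no listed pronouns.
    """
    fps_length = len(first_singular_words.intersection(sentence_words))
    fpp_length = len(first_plural_words.intersection(sentence_words))
    sec_length = len(second_words.intersection(sentence_words))
    third_length = len(third_person_words.intersection(sentence_words))

    if fps_length > (fpp_length + sec_length + third_length):
        perspective = 'first person singular'
    elif fpp_length > (fps_length + sec_length + third_length):
        perspective = 'first person plural'
    elif sec_length > (fps_length + fpp_length + third_length):
        perspective = 'second person'
    elif third_length > (fps_length + fpp_length + sec_length):
        perspective = 'third person'
    else:
        perspective = 'mixed'
    return perspective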
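A variant sketch, not the method used in this notebook: it counts pronoun occurrences rather than distinct forms, and it gives pronoun-free sentences their own hypothetical `'none'` label instead of folding them into `'mixed'`. The majority rule (one perspective must outnumber the other three combined) is unchanged. The names `PERSPECTIVE_WORDS` and `classify_by_occurrence` are illustrative.

```python
from collections import Counter

# Same word lists as above, keyed by the labels perse_sentence() returns
PERSPECTIVE_WORDS = {
    'first person singular': {'i', 'me', 'my', 'mine'},
    'first person plural': {'we', 'us', 'our', 'ours'},
    'second person': {'you', 'your', 'yours'},
    'third person': {'he', 'him', 'his', 'she', 'her', 'hers',
                     'they', 'them', 'their', 'theirs'},
}


def classify_by_occurrence(sentence_words):
    """Hypothetical variant: count pronoun tokens; label pronoun-free sentences 'none'."""
    counts = Counter()
    for word in sentence_words:
        for label, vocab in PERSPECTIVE_WORDS.items():
            if word in vocab:
                counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        return 'none'
    for label, n in counts.items():
        # Same strict-majority rule: this perspective outnumbers all others combined
        if n > total - n:
            return label
    return 'mixed'
```

For example, `['i', 'think', 'i', 'told', 'you']` comes out as first person singular here (two occurrences against one), whereas `perse_sentence` sees one distinct form on each side and returns `'mixed'`.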

In [4]:
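def count_perspective(sentences):
    """Tally sentences and tokens per perspective.

    `sentences` is a list of token lists. Returns two Counters keyed by
    perspective label: the number of sentences, and the number of tokens
    (punctuation included) in those sentences.

    REQUIRES: from collections import Counter
    """
    sents = Counter()
    words = Counter()

    for sentence in sentences:
        perspective = perse_sentence(sentence)
        sents[perspective] += 1
        words[perspective] += len(sentence)

    return sents, words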
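`count_perspective` weights each perspective two ways, by sentences and by tokens, and the two can diverge when one perspective tends to occur in longer sentences. A toy illustration with hand-made Counters in the same shape (the numbers are invented for the example, not taken from the TED data; `shares` is a hypothetical helper):

```python
from collections import Counter

# Invented tallies in the shape count_perspective() returns
sents = Counter({'first person singular': 3, 'mixed': 1})
words = Counter({'first person singular': 15, 'mixed': 45})


def shares(counter):
    """Percentage share of each key in a Counter."""
    total = sum(counter.values())
    return {key: 100 * value / total for key, value in counter.items()}


sentence_share = shares(sents)  # by sentences: 75.0 first singular, 25.0 mixed
token_share = shares(words)     # by tokens:    25.0 first singular, 75.0 mixed
```

`parper` reports the token share as its percentage; the sentence tally appears only as the raw count in parentheses.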

In [13]:
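def parper(text):
    """Print each perspective's share of tokens and its sentence count.

    REQUIRES: import nltk (plus the Punkt sentence tokenizer models)
    """
    # Split into sentences, then into lowercased word tokens
    sentences = [
        [word.lower() for word in nltk.word_tokenize(sentence)]
        for sentence in nltk.sent_tokenize(text)
    ]

    sents, words = count_perspective(sentences)
    total = sum(words.values())
    if total == 0:  # empty text: nothing to report, avoid dividing by zero
        return

    for perspective, count in words.items():
        pcent = (count / total) * 100
        nsents = sents[perspective]
        # percent of tokens, perspective label, (number of sentences)
        print(f"{pcent:.2f} {perspective} ( {nsents} ) ")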
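`parper` prints its results, which suits a single talk but makes comparing many talks awkward. A sketch of a variant that returns them instead, under two stated assumptions: a crude regular-expression tokenizer stands in for NLTK's `sent_tokenize`/`word_tokenize` so the block runs without the Punkt models (it splits contractions differently and drops punctuation tokens), and the classifier is passed in as a parameter. The names `simple_tokenize` and `perspective_shares` are hypothetical.

```python
import re
from collections import Counter


def simple_tokenize(text):
    """Crude stand-in for NLTK: split sentences on . ! ? and keep lowercase letter runs."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [re.findall(r'[a-z]+', s.lower()) for s in sentences if s]


def perspective_shares(text, classify, tokenize=simple_tokenize):
    """Return {perspective: (percent of tokens, number of sentences)}."""
    sents, words = Counter(), Counter()
    for sentence in tokenize(text):
        label = classify(sentence)
        sents[label] += 1
        words[label] += len(sentence)
    total = sum(words.values())
    if total == 0:
        return {}
    return {label: (100 * n / total, sents[label]) for label, n in words.items()}
```

With the notebook's own pieces, `perspective_shares(texts[2], perse_sentence)` would return a dictionary rather than printed lines; its figures may differ somewhat from `parper`'s because the tokenizer differs.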

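The example below is a reminder that word lists are a blunt instrument, and that gender in particular is not straightforward: there is no guarantee that a gendered word is actually about a woman. Here *girl* describes a dog. Because the lists above contain only pronouns, *girl* is ignored entirely, and the sentence is classified as first person singular on the strength of *my* alone.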

In [9]:
text = "My dog Molly is a good girl."
words = re.sub(r"\W+", " ", text.lower())  # replace each run of non-word characters with a space
print(words)

my dog molly is a good girl 


In [14]:
parper(words)

100.00 first person singular ( 1 ) 
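One way to look more closely at third-person sentences, sketched here rather than taken from the notebook: break them down with the separate feminine, masculine, and plural lists and the `THIRD_F`, `THIRD_M`, and `THIRD_P` labels defined earlier, which `perse_sentence` does not currently use. Because only pronouns are listed, a word like *girl* still contributes nothing, whoever it refers to. The function name `third_person_subtype` is hypothetical.

```python
THIRD_F = 'third person feminine'
THIRD_M = 'third person masculine'
THIRD_P = 'third person plural'

third_singular_f_words = {'she', 'her', 'hers'}
third_singular_m_words = {'he', 'him', 'his'}
third_plural_words = {'they', 'them', 'their', 'theirs'}


def third_person_subtype(sentence_words):
    """Hypothetical refinement: which third-person list dominates a sentence?

    Uses the same distinct-form counting and strict-majority rule as
    perse_sentence(); returns None when no subtype wins.
    """
    counts = {
        THIRD_F: len(third_singular_f_words.intersection(sentence_words)),
        THIRD_M: len(third_singular_m_words.intersection(sentence_words)),
        THIRD_P: len(third_plural_words.intersection(sentence_words)),
    }
    total = sum(counts.values())
    for label, n in counts.items():
        if n > total - n:
            return label
    return None
```

Applied to the tokens of the dog sentence above, it returns `None`: the sentence contains no third-person pronoun, so it says nothing about gender either way.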


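Al Gore's talk, at index `0` of the list, is the usual test case (our bellwether). Here we mix things up with Majora Carter's talk on urban renewal, at index `2`. In each line of output, the first number is the percentage of the talk's tokens that fall in sentences of that perspective, and the number in parentheses is how many sentences received that label. Note that `mixed` also absorbs sentences containing none of the listed pronouns: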

In [17]:
parper(texts[2])

44.51 mixed ( 89 ) 
20.62 first person plural ( 36 ) 
10.87 third person ( 18 ) 
19.30 first person singular ( 37 ) 
4.71 second person ( 9 ) 
