# NLP (Natural Language Processing) library testing

### Step 1 : Importing and initial test

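We found many recommendations for the spaCy and NLTK packages, both of which give access to pre-trained sentiment analysis: spaCy through the spacytextblob extension, and NLTK through its VADER-based SentimentIntensityAnalyzer.

Here, we test both and compare them on a few of the quotes we have extracted so far.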

An easy NLP test to start :

In [90]:
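import spacy
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the "spacytextblob" pipe

# spaCy pipeline extended with the TextBlob-based sentiment component
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe("spacytextblob")

# NLTK's VADER analyzer needs its lexicon; fetch it once if it is missing
nltk.download('vader_lexicon', quiet=True)
sia = SentimentIntensityAnalyzer()

text = "I had the worst day ever"
print('Spacy:', round(nlp(text)._.polarity,4))
print('NLTK :', sia.polarity_scores(text)["compound"])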

Spacy: -1.0
NLTK : -0.6249
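
Both scores lie between -1 and 1, whereas our own analysis of each quote is expressed as a label (negative, neutral, positive). A minimal sketch of how a score could be mapped to such a label is given below. The helper name `to_label` is hypothetical, and the ±0.05 cutoff is the convention usually suggested for VADER compound scores; applying the same cutoff to the spacytextblob polarity is our assumption.

```python
def to_label(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse sentiment label.

    The threshold is an assumption (the usual VADER convention),
    not something tuned on our quotes.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

# Scores printed above for "I had the worst day ever"
print("Spacy:", to_label(-1.0))
print("NLTK :", to_label(-0.6249))
```

With such a mapping, both libraries agree on the label of the first test sentence even though the raw scores differ.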


### Step 2 : Trying interesting quotes :

1) "There were protections in executive orders that beneficiaries of grantees and contractors were not to be discriminated against on the basis of sexual orientation and gender identity."

Our analysis : neutral-positive

Justification : No real opinion on the topic; this is just a "legal" statement.
The score could be > 0, as the quote says there should be no discrimination (so nothing bad).

In [91]:
text = "there were protections in executive orders that beneficiaries of grantees and contractors were not to be discriminated against on the basis of sexual orientation and gender identity." 
print('Spacy:', round(nlp(text)._.polarity,4))
print('NLTK :', sia.polarity_scores(text)["compound"])

Spacy: 0.5
NLTK : 0.4215


2) "To claim that homosexual behavior is wrong would be to hold others to a moral standard to which one's own heterosexual behavior does not conform. Whether bi -, homo -, hetero -, all forms of hyphenated sexuality want the same thing: sex without moral or generative limits, relationships without cultural or familial constraints. We are in flight from sexuality and we are using sex as the vehicle for that flight."

Our analysis : no real opinion

Justification : The author's thoughts are actually not easy for us to deduce...

In [92]:
text = "To claim that homosexual behavior is wrong would be to hold others to a moral standard to which one's own heterosexual behavior does not conform. Whether bi -, homo -, hetero -, all forms of hyphenated sexuality want the same thing: sex without moral or generative limits, relationships without cultural or familial constraints. We are in flight from sexuality and we are using sex as the vehicle for that flight." 
print('Spacy:', round(nlp(text)._.polarity,4))
print('NLTK :', sia.polarity_scores(text)["compound"])

Spacy: 0.0286
NLTK : -0.4215


3) "Wasn't sure if homosexuality was a choice."

Our analysis : no real opinion

Justification : Hard to say. It could mean "not a choice", as the author does not picture homosexuality as a possibility for him/her, or it could mean the opposite.

In [93]:
text = "wasn't sure if homosexuality was a choice." 
print('Spacy:', round(nlp(text)._.polarity,4)) 
print('NLTK :', sia.polarity_scores(text)["compound"])

Spacy: 0.5
NLTK : -0.2411


4) "The misogyny and the racism, those two key facts are something you can't really ignore"

Our analysis : neutral - hard to say

Justification : Objectively, the sentence only says that misogyny and racism are topics of interest.
A large part of the interpretation is subjective : it all depends on which sentence came next !

In [94]:
text = "The misogyny and the racism, those two key facts are something you can't really ignore" 
print('Spacy:', round(nlp(text)._.polarity,4))
print('NLTK :', sia.polarity_scores(text)["compound"])

Spacy: 0.1
NLTK : -0.4163


5) "I just can't remember when LGBT people were not in my life. You know, gosh. My piano teachers when I was 11 and 12 were two gay men in a little town in New Jersey who had a collection of Mexican art and pinatas and silver lantern covers, and their house was wonderful, not like anybody else's house in Berkeley Heights, New Jersey"

Our analysis : positive

Justification : The opinion is overall good : good relations with LGBT people.

In [95]:
text = "I just can't remember when LGBT people were not in my life. You know, gosh. My piano teachers when I was 11 and 12 were two gay men in a little town in New Jersey who had a collection of Mexican art and pinatas and silver lantern covers, and their house was wonderful, not like anybody else's house in Berkeley Heights, New Jersey," 
print('Spacy:', round(nlp(text)._.polarity,4))
print('NLTK :', sia.polarity_scores(text)["compound"])

Spacy: 0.2503
NLTK : 0.3798


6) "I am grateful to this incredible organization for what you've done, in such a smart, systematic, and strategic way, to secure and safeguard the fundamental rights of LGBTQ Americans. Much of the credit for the advances in acceptance, advocacy, and law comes in a straight line from your efforts"

Our analysis : positive

Justification : The author is very grateful for the progress of the LGBTQ cause.

In [96]:
text = "I am grateful to this incredible organization for what you've done, in such a smart, systematic, and strategic way, to secure and safeguard the fundamental rights of LGBTQ Americans. Much of the credit for the advances in acceptance, advocacy, and law comes in a straight line from your efforts" 
print('Spacy:', round(nlp(text)._.polarity,4)) 
print('NLTK :', sia.polarity_scores(text)["compound"])

Spacy: 0.319
NLTK : 0.9451


### Step 3 : Some final raw comparisons - pushing libraries towards their limits

In [97]:
# Single words and short keywords, scored one after the other
words = ["good", "bad", "adore", "hate", "gay", "gay marriage", "LGBT", "feminism"]

for i, text in enumerate(words):
    if i > 0:
        print()  # blank line between two entries
    print(text)
    print('Spacy:', round(nlp(text)._.polarity,4))
    print('NLTK :', sia.polarity_scores(text)["compound"])

good
Spacy: 0.7
NLTK : 0.4404

bad
Spacy: -0.7
NLTK : -0.5423

adore
Spacy: 0.0
NLTK : 0.5574

hate
Spacy: -0.8
NLTK : -0.5719

gay
Spacy: 0.4167
NLTK : 0.0

gay marriage
Spacy: 0.4167
NLTK : 0.0

LGBT
Spacy: 0.0
NLTK : 0.0

feminism
Spacy: 0.0
NLTK : 0.0


### Step 4 : Making conclusions

For the examples we considered "easy", that is, **1)** and the start of **Step 3**, we are slightly more satisfied with spaCy's results, which are more logical and easier to interpret.

Results from **2)**, **3)** and **4)** are hard to interpret, as these quotes are more subjective to us, while the results for **1)** and **5)** nearly match between the two libraries.
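
The agreement between the two libraries on the six quotes can be checked directly from the scores printed in Step 2. The sketch below only reuses those printed values; the helper `same_sign` is a hypothetical name introduced for the illustration.

```python
# (Spacy, NLTK) scores copied from the outputs of Step 2
scores = {
    1: (0.5, 0.4215),
    2: (0.0286, -0.4215),
    3: (0.5, -0.2411),
    4: (0.1, -0.4163),
    5: (0.2503, 0.3798),
    6: (0.319, 0.9451),
}

def same_sign(a, b):
    """True when both scores fall on the same side of zero."""
    return (a > 0) == (b > 0) and (a < 0) == (b < 0)

for quote, (spacy_score, nltk_score) in scores.items():
    gap = round(abs(spacy_score - nltk_score), 4)
    print(f"{quote}) gap={gap} same_sign={same_sign(spacy_score, nltk_score)}")

# Quotes on which both libraries agree about the direction of the sentiment
agreeing = [q for q, (s, n) in scores.items() if same_sign(s, n)]
print("Same sign:", agreeing)
```

The two libraries disagree on the sign exactly for the quotes we found subjective ourselves, and the smallest gaps are those of quotes 1) and 5).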


However, the one-word examples are quite far from the quotes we expect to read in Quotebank, as they are not real sentences.
In this respect, test **6)**, which is a real quote, shows that NLTK's result is closer to our expectations.

We are also happy to see that both spaCy and NLTK give a score of 0 to the keywords LGBT and feminism, but we are disappointed by spaCy's result for "gay": it probably interpreted this word as an emotion, which is a problem in our specific context.
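
This keyword check can be written down as a small test, sketched here on the scores printed in Step 3. The function name `biased_keywords` and the `tolerance` parameter are hypothetical; in the notebook, the recorded values could be replaced by live calls to either library.

```python
# Scores copied from the Step 3 output for our topic keywords
recorded = {
    "Spacy": {"gay": 0.4167, "gay marriage": 0.4167, "LGBT": 0.0, "feminism": 0.0},
    "NLTK": {"gay": 0.0, "gay marriage": 0.0, "LGBT": 0.0, "feminism": 0.0},
}

def biased_keywords(keyword_scores, tolerance=0.0):
    """Return the keywords whose score deviates from neutral by more than tolerance."""
    return [kw for kw, score in keyword_scores.items() if abs(score) > tolerance]

for library, keyword_scores in recorded.items():
    print(library, biased_keywords(keyword_scores))
```

A library for which this list is empty does not, by itself, attach a sentiment to the topic keywords we search for.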

We will use **NLTK** for the rest of our project.