# Text Sentiment Analysis
NLTK includes a few packages that help address the issues we ran into with the gender classifier model.

* The first is the SentimentAnalyzer module, which lets you add extra features using built-in functions.
* The second is VADER, which stands for Valence Aware Dictionary and sEntiment Reasoner.
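
VADER reports `neg`, `neu`, and `pos` proportions plus a single `compound` score. The sketch below illustrates how the summed word valences appear to be squashed into the range [-1, 1]. It is based on the normalization used in NLTK's `vader` module, where the constant `alpha` defaults to 15; the function name `normalize_valence` is made up for this illustration and is not part of the library API.

```python
import math

def normalize_valence(total, alpha=15.0):
    """Squash an unbounded valence sum into [-1, 1] (illustrative, not library code)."""
    norm = total / math.sqrt(total * total + alpha)
    # Clamp to guard against floating-point overshoot.
    return max(-1.0, min(1.0, norm))

# A small sum stays moderate; a large sum from a long text approaches +/-1.
for total in (1.0, -5.0, -30.0):
    print(total, round(normalize_valence(total), 4))
```

Because the sum grows with the number of sentiment-bearing words, a long article can end up with a compound score close to -1 or +1 even when its `neg` and `pos` proportions are nearly equal.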

In [1]:
# Warnings
import warnings
warnings.filterwarnings('ignore')

# BEGIN: fix Python or Notebook SSL CERTIFICATE_VERIFY_FAILED
import os, ssl
if (not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None)):
    ssl._create_default_https_context = ssl._create_unverified_context
# END: fix Python or Notebook SSL CERTIFICATE_VERIFY_FAILED

In [2]:
!pip -q install -U setuptools wheel spacy nltk twython pandas numpy beautifulsoup4 html2text



In [3]:
# Install the spaCy English model. More information at https://spacy.io/models/en
!pip -q install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz



## Read website content

In [2]:
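import urllib.request  # urlopen lives in the urllib.request submodule, which must be imported explicitly
import urllib as url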
import bs4 as bs

from IPython.display import HTML, display

In [3]:
content = "https://www.scu.edu/ethics-in-technology-practice/ethical-toolkit/"
article_html = url.request.urlopen(content)
article_html = article_html.read()

type(article_html), len(article_html)
# article

(bytes, 85614)

In [4]:
import html2text

html_2_text = html2text.HTML2Text()
html_2_text.ignore_links = True
article_txt = html_2_text.handle(article_html.decode('utf-8'))

# display(HTML(article_txt))
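
The cell above decodes the page as UTF-8, which is an assumption about the server. A minimal sketch of picking the charset from the `Content-Type` header instead, using only the standard library, might look like the following. The helper names `charset_from_content_type` and `decode_body` are hypothetical, and the sample bytes are made up for the demonstration.

```python
from email.message import Message

def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default

def decode_body(raw, content_type):
    """Decode response bytes with the declared charset, falling back to UTF-8."""
    charset = charset_from_content_type(content_type)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The header named an encoding Python does not know.
        return raw.decode("utf-8", errors="replace")

# Made-up sample: Latin-1 bytes with a matching header.
print(decode_body("café".encode("latin-1"), "text/html; charset=ISO-8859-1"))
```

With a live response object from `urlopen`, the header value should be available via `response.headers.get("Content-Type", "")`, so the response would need to be kept in its own variable before calling `.read()`.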

## Analyze the sentiment score of the web content

### Remove stop words 

In [7]:
import spacy

In [8]:
# load English tokenizer, tagger, parser and NER
print(f" Spacy version: {spacy.__version__} ")

nlp = spacy.load("en_core_web_sm")
doc = nlp(article_txt)

# Build lists explicitly: a bare generator expression inside an f-string prints only the generator object
print(f"Noun phrases: {[chunk.text for chunk in doc.noun_chunks]}")
print(f"Verbs: {[token.lemma_ for token in doc if token.pos_ == 'VERB']}")

# find named entities, phrases 
# for entity in doc.ents:
#     print(entity.text, entity.label_)

 Spacy version: 3.0.6 
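
The cell above lists noun phrases and verbs but does not filter anything out. As a rough sketch of the stop-word removal this section is named for, the tokens can be filtered on spaCy's `is_stop`, `is_punct`, and `is_space` attributes. The helper `remove_stop_words` is a hypothetical name, and the demo uses `spacy.blank("en")` (tokenizer only) with a made-up sentence so that it runs without the downloaded model.

```python
import spacy

def remove_stop_words(doc):
    """Return lowercased tokens that are not stop words, punctuation, or whitespace."""
    return [
        token.text.lower()
        for token in doc
        if not (token.is_stop or token.is_punct or token.is_space)
    ]

# Assumption: a blank English pipeline is enough for this demo,
# because stop-word and punctuation flags are lexical attributes.
nlp_blank = spacy.blank("en")
sample = nlp_blank("this is an example of the ethical toolkit.")
print(remove_stop_words(sample))
```

In the notebook, `remove_stop_words(doc)` should work on the `doc` built from `article_txt`. One caveat: spaCy's English stop list appears to include negations such as "not", which VADER uses when scoring, so filtering before sentiment analysis may change the result.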


### Predict sentiment on the content

In [9]:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/krishnamanchikalapudi/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [11]:
nltk.__version__

'3.6.2'

In [10]:
analyzer = SentimentIntensityAnalyzer()
content_analyzer = analyzer.polarity_scores(article_txt)

# Vader score
print(content_analyzer)
print(f"Positive score: {content_analyzer['pos']}")
print(f"Negative score: {content_analyzer['neg']}")
print(f"Neutral score: {content_analyzer['neu']}")

# Map the compound score to a label using this notebook's thresholds
predict_sentiment = ''
if content_analyzer['compound'] >= 0.3:
    predict_sentiment = 'POSITIVE'
elif 0 <= content_analyzer['compound'] < 0.3:
    predict_sentiment = 'NEUTRAL'
else:
    predict_sentiment = 'NEGATIVE'
    
print(f"\n\nPredicted sentiment {predict_sentiment} and score is {content_analyzer['compound']} \nurl: {content} ")

{'neg': 0.098, 'neu': 0.812, 'pos': 0.09, 'compound': -0.9947}
Positive score: 0.09
Negative score: 0.098
Neutral score: 0.812


Predicted sentiment NEGATIVE and score is -0.9947 
url: https://nymag.com/intelligencer/2020/12/moderna-covid-19-vaccine-design.html 
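
The `neg` and `pos` proportions above are nearly equal, yet the compound score is close to -1. VADER was designed for short texts, and scoring a whole article in one call tends to saturate the compound value. One possible workaround, sketched here under assumptions, is to score each sentence and average the results. The helper names and the stand-in scorer `fake_score` are hypothetical; the default thresholds mirror the ones used in this notebook.

```python
from statistics import mean

def label_from_compound(compound, positive_at=0.3, negative_below=0.0):
    """Map a compound score to a label using the notebook's thresholds."""
    if compound >= positive_at:
        return "POSITIVE"
    if compound < negative_below:
        return "NEGATIVE"
    return "NEUTRAL"

def mean_sentence_compound(sentences, score_fn):
    """Average the 'compound' value that score_fn returns for each non-empty sentence."""
    scores = [score_fn(s)["compound"] for s in sentences if s.strip()]
    return mean(scores) if scores else 0.0

# Stand-in scorer (hypothetical) so the sketch runs without NLTK data.
def fake_score(sentence):
    return {"compound": -0.5 if "risk" in sentence else 0.5}

sentences = [
    "Ethics tools help teams.",
    "Ignoring risk causes harm.",
    "Good design matters.",
]
avg = mean_sentence_compound(sentences, fake_score)
print(round(avg, 4), label_from_compound(avg))
```

In the notebook, `analyzer.polarity_scores` could be passed as `score_fn`, with sentences taken from `doc.sents` or `nltk.sent_tokenize(article_txt)` (the latter needs the `punkt` data). The VADER authors' README suggests cutoffs of 0.05 and -0.05 for positive and negative, which can be tried through the `positive_at` and `negative_below` parameters.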
