<a href="https://colab.research.google.com/github/osullik/bc-autoreporter/blob/main/SentimentAnalyzer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Imports

In [63]:
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')
nltk.download('stopwords')
nltk.download('names')
nltk.download('averaged_perceptron_tagger')
#nltk.download('movie_reviews')

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package names to /root/nltk_data...
[nltk_data]   Package names is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!


True

# Utility Functions

In [45]:
# Lowercase stopwords and first names to exclude; a set gives fast membership checks
unwanted = set(nltk.corpus.stopwords.words("english"))
unwanted.update(w.lower() for w in nltk.corpus.names.words())

In [46]:
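# Filter predicate for (word, tag) pairs produced by nltk.pos_tag.
# Returns False for tokens to drop: non-alphabetic tokens, stopwords, names, and nouns.
# Not currently used
def skip_unwanted(pos_tuple):
    word, tag = pos_tuple
    # Entries in `unwanted` are lowercase, so compare case-insensitively
    if not word.isalpha() or word.lower() in unwanted:
        return False
    if tag.startswith("NN"):
        return False
    return True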

In [34]:
# positive_words = [word for word, tag in filter(skip_unwanted, nltk.pos_tag(nltk.corpus.movie_reviews.words(categories=["pos"])))]
# negative_words = [word for word, tag in filter(skip_unwanted,nltk.pos_tag(nltk.corpus.movie_reviews.words(categories=["neg"])))]

In [48]:
# VADER (Valence Aware Dictionary and sEntiment Reasoner) is an open-source, lexicon- and rule-based sentiment analysis tool
# that is specifically attuned to sentiments expressed in social media.
# The compound score is used to determine sentiment. It ranges from -1 (extremely negative) to +1 (extremely positive).
def sentiment_score(text):
  sia = SentimentIntensityAnalyzer()
  return sia.polarity_scores(text)
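
The compound score can be reduced to a coarse label. The following is a minimal sketch, assuming the commonly cited VADER cutoffs of +0.05 and -0.05; the function name `label_from_compound` and its `threshold` parameter are hypothetical and not part of NLTK.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    # Assumption: scores within +/- threshold of zero are treated as neutral
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# A dict shaped like the return value of sentiment_score(); values are illustrative
scores = {"neg": 0.0, "neu": 0.761, "pos": 0.239, "compound": 0.8442}
label = label_from_compound(scores["compound"])
print(label)
```

The threshold is a tunable choice; a wider neutral band may suit formal report text better than the social media text VADER was tuned on.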

In [61]:
# Return the adjectives and adverbs found in the report text.
# Descriptive words (adjectives and adverbs) are the most likely to influence the sentiment of the text.
# Note: tokens come from a plain whitespace split, so attached punctuation and hashtags reach the tagger as-is.
def extract_adj(text):
  tagged_text = nltk.pos_tag(text.split())
  # Penn Treebank tags: JJ, JJR, JJS (adjectives) and RB, RBR, RBS (adverbs)
  return [word for word, tag in tagged_text if tag.startswith(("JJ", "RB"))]
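
One way to keep tokens such as `#competence` out of the result is to clean the whitespace-split tokens before tagging. The following is a minimal sketch using only the standard library; `clean_tokens` is a hypothetical helper, and dropping `#` and `@` tokens entirely is an assumption about what the report needs.

```python
import string

def clean_tokens(text):
    """Split on whitespace, drop #hashtags and @mentions, strip edge punctuation."""
    tokens = []
    for raw in text.split():
        if raw.startswith(("#", "@")):
            continue  # assumption: tags and mentions are not descriptive words
        word = raw.strip(string.punctuation)
        if word:  # skip tokens that were punctuation only
            tokens.append(word)
    return tokens

sample = "His actions show #competence and a calm, steady demeanor."
tokens = clean_tokens(sample)
print(tokens)
```

The resulting list could be passed to `nltk.pos_tag` in place of `text.split()`.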

# Sample data to test functions

In [50]:
observations = ["On January 15th, 2021, @HomerSimpson was observed to perform to a satisfactory standard. This was evidenced by his successful completion of a routine safety check on Reactor 4. His actions show #competence #attentiontodetail #safetyconsciousness.",
"On February 2nd, 2021, @HomerSimpson was observed to perform to a poor standard. This was evidenced by his failure to properly label hazardous waste containers in the storage room, leading to a safety hazard. His actions show #carelessness #lackofattention #safetyoversight.",
"On March 12th, 2021, @HomerSimpson was observed to perform to an excellent standard. This was evidenced by his quick thinking and calm demeanor during an unexpected power surge in the control room. His actions show #resilience #problem-solving #teamwork."]


In [62]:
for observation in observations:
  print(observation)
  print(sentiment_score(observation))
  print(extract_adj(observation))

On January 15th, 2021, @HomerSimpson was observed to perform to a satisfactory standard. This was evidenced by his successful completion of a routine safety check on Reactor 4. His actions show #competence #attentiontodetail #safetyconsciousness.
{'neg': 0.0, 'neu': 0.761, 'pos': 0.239, 'compound': 0.8442}
['satisfactory', 'successful', 'routine', '#competence']
On February 2nd, 2021, @HomerSimpson was observed to perform to a poor standard. This was evidenced by his failure to properly label hazardous waste containers in the storage room, leading to a safety hazard. His actions show #carelessness #lackofattention #safetyoversight.
{'neg': 0.2, 'neu': 0.739, 'pos': 0.061, 'compound': -0.7506}
['poor', 'properly', 'hazardous', '#carelessness']
On March 12th, 2021, @HomerSimpson was observed to perform to an excellent standard. This was evidenced by his quick thinking and calm demeanor during an unexpected power surge in the control room. His actions show #resilience #problem-solving #te