# Anonalyze: An NLP-Enhanced and ML-Driven Platform for Sentiment and Insight Extraction
A platform modeled on an online discussion board, where users can share their thoughts and opinions anonymously. It uses AI, machine learning (ML), and natural language processing (NLP) tools to analyze posts, helping to gauge the overall mood and key ideas of each discussion.

## Initialization

In [20]:
import pickle
import nltk
import re
import numpy as np

from nltk import pos_tag
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import WordPunctTokenizer

In [9]:
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('omw-1.4')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package stopwords to
[nltk_data]     /home/cabrera/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /home/cabrera/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to /home/cabrera/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /home/cabrera/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /home/cabrera/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/cabrera/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/cabrera/nltk_data...
[nltk_data]   Package averaged_perceptron_ta

True

Loading the pickled TF-IDF vectorizer, feature selectors, and models

In [4]:
with open('./models/sentiment-emotion-classification/pkl/tfidf_vectorizer.pkl', 'rb') as file:
  vectorizer = pickle.load(file)
  
with open('./models/sentiment-emotion-classification/pkl/selector_sentiment.pkl', 'rb') as file:
  selector_sentiment = pickle.load(file)
  
with open('./models/sentiment-emotion-classification/pkl/selector_emotion.pkl', 'rb') as file:
  selector_emotion = pickle.load(file)
  
with open('./models/sentiment-emotion-classification/pkl/model_sentiment.pkl', 'rb') as file:
  model_sentiment = pickle.load(file)
  
with open('./models/sentiment-emotion-classification/pkl/model_emotion.pkl', 'rb') as file:
  model_emotion = pickle.load(file)
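The same `tfidf_vectorizer.pkl` file is unpickled in this cell and again inside both `ModelSentiment` and `ModelEmotion`, so three independent copies of the vectorizer can end up in memory. A minimal sketch, assuming the artifacts are only read and never mutated, of a cached loader that every caller could share; `load_artifact` and `PKL_DIR` are hypothetical names, not part of the original notebook. Note that `pickle.load` can execute arbitrary code, so it should only be used on trusted files.

```python
import pickle
from functools import lru_cache
from pathlib import Path

# Hypothetical constant; mirrors the directory used throughout the notebook.
PKL_DIR = Path("./models/sentiment-emotion-classification/pkl")


@lru_cache(maxsize=None)
def load_artifact(path: str):
    """Unpickle `path` once; later calls with the same path return the same object."""
    with open(path, "rb") as file:
        return pickle.load(file)


# Usage (paths as in the notebook):
# vectorizer = load_artifact(str(PKL_DIR / "tfidf_vectorizer.pkl"))
# model_sentiment = load_artifact(str(PKL_DIR / "model_sentiment.pkl"))
```

Because the cache is keyed on the path argument, passing the same string everywhere ensures the file is read only once per process.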

## Execution

Building a text pre-processing pipeline composed of:
* Denoising: removing @mentions, URLs, and non-alphabetical characters, then lowercasing the text
* Removing stopwords: removing common words such as `[a, an, the, and, but]`
* Lemmatizing: reducing words to their base form, e.g. `[changing, changed, change] -> change`
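To make the order of the three steps concrete, here is a dependency-free toy version of the same pipeline. It is not the notebook's implementation: `TOY_STOPWORDS` and `TOY_LEMMAS` are hypothetical stand-ins for NLTK's English stopword list and the POS-aware WordNet lemmatizer, which cover far more words.

```python
import re

# Hypothetical, tiny stand-ins for NLTK's stopword list and WordNet lemmas.
TOY_STOPWORDS = {"a", "an", "the", "and", "but", "is", "are", "i"}
TOY_LEMMAS = {"changing": "change", "changed": "change", "users": "user"}


def denoise(text: str) -> str:
    text = re.sub(r"@\w+", "", text)          # drop @mentions
    text = re.sub(r"https?://\S+", "", text)  # drop URLs
    text = re.sub(r"[^a-zA-Z\s]", "", text)   # keep letters and whitespace only
    return text.strip().lower()


def remove_stopwords(text: str) -> str:
    return " ".join(w for w in text.split() if w not in TOY_STOPWORDS)


def lemmatize(text: str) -> str:
    # Dictionary lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEMMAS.get(w, w) for w in text.split())


def process_text(text: str) -> str:
    return lemmatize(remove_stopwords(denoise(text)))


print(process_text("@mod I changed my mind, and the users are changing too! https://example.com"))
# → change my mind user change too
```

Denoising runs first so that stopword matching and lemma lookup see clean, lowercase tokens.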

In [36]:
class Preprocessor:
  @staticmethod
  def denoiser(text: str) -> str:
    text = re.sub(r'@\w+', '', text)            # remove @mentions
    text = re.sub(r'https?://\S+', '', text)    # remove URLs before punctuation is stripped
    text = re.sub(r'[^a-zA-Z\s]', '', text)     # keep letters and whitespace (so newlines don't merge words)
    text = text.strip()
    text = text.lower()
    return text

  @staticmethod
  def stopwords_remover(text: str) -> str:
    matcher = re.compile(r"|".join([fr"\b{word}\b" for word in stopwords.words("english")]))
    text = " ".join(matcher.sub('', text).split())
    return text

  @staticmethod
  def lemmatizer(text: str) -> str:
    wordnet_lemmatizer = WordNetLemmatizer()
    tokenizer = WordPunctTokenizer()

    wordnet_pos_tag_map = {
        "J": wordnet.ADJ,
        "N": wordnet.NOUN,
        "V": wordnet.VERB,
        "R": wordnet.ADV,
    }

    tokens = tokenizer.tokenize(text)
    pos_tags = pos_tag(tokens)

    lemmatized_tokens = []
    for token, tag in pos_tags:
        wordnet_tag = wordnet_pos_tag_map.get(tag[0].upper())
        if wordnet_tag is None:
            lemmatized_tokens.append(token)
        else:
            lemmatized_tokens.append(wordnet_lemmatizer.lemmatize(token, wordnet_tag))
            
    return ' '.join(lemmatized_tokens)
  
  @staticmethod
  def process_text(text: str) -> str:
    text = Preprocessor.denoiser(text)
    text = Preprocessor.stopwords_remover(text)
    text = Preprocessor.lemmatizer(text)
    return text
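`lemmatizer` uses the first letter of each Penn Treebank tag to choose a WordNet part of speech and leaves tokens with any other tag (determiners, pronouns, prepositions) unchanged. The sketch below isolates that mapping without an NLTK dependency. The single-letter values are the strings held by NLTK's `wordnet.ADJ`, `wordnet.NOUN`, `wordnet.VERB`, and `wordnet.ADV` constants; `to_wordnet_pos` is a hypothetical helper name, and the tagged tokens are hand-written for illustration rather than produced by `pos_tag`.

```python
from typing import Optional

# Same values as nltk.corpus.wordnet.ADJ / NOUN / VERB / ADV.
WORDNET_POS = {"J": "a", "N": "n", "V": "v", "R": "r"}


def to_wordnet_pos(treebank_tag: str) -> Optional[str]:
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS, or None."""
    return WORDNET_POS.get(treebank_tag[:1].upper())


# Hand-written (token, tag) pairs in the format nltk.pos_tag returns.
tagged = [("users", "NNS"), ("are", "VBP"), ("changing", "VBG"),
          ("quickly", "RB"), ("the", "DT"), ("honest", "JJ")]

for token, tag in tagged:
    print(token, tag, to_wordnet_pos(tag))
```

A token that maps to `None` is appended unchanged, which is what the `if wordnet_tag is None` branch in `lemmatizer` does.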

Wrapping the sentiment model in a static class that lazily loads its vectorizer, feature selector, and classifier, and maps the predicted label ID to a name

In [71]:
class ModelSentiment:
  vectorizer = None
  selector_sentiment = None
  model_sentiment = None
  
  sentiment_label_description_map = {
    0: 'negative',
    1: 'positive',
    2: 'neutral',
  }

  @staticmethod
  def _initialize():
    if ModelSentiment.vectorizer is None:
      with open('./models/sentiment-emotion-classification/pkl/tfidf_vectorizer.pkl', 'rb') as file:
          ModelSentiment.vectorizer = pickle.load(file)
    
    if ModelSentiment.selector_sentiment is None:
      with open('./models/sentiment-emotion-classification/pkl/selector_sentiment.pkl', 'rb') as file:
          ModelSentiment.selector_sentiment = pickle.load(file)
    
    if ModelSentiment.model_sentiment is None:
      with open('./models/sentiment-emotion-classification/pkl/model_sentiment.pkl', 'rb') as file:
          ModelSentiment.model_sentiment = pickle.load(file)
  
  @staticmethod
  def _vectorize(text: str):
    ModelSentiment._initialize()
    return ModelSentiment.vectorizer.transform([text])

  @staticmethod
  def _select_best_features(vector):
    return ModelSentiment.selector_sentiment.transform(vector)

  @staticmethod
  def predict(text: str):
    ModelSentiment._initialize()
    vector = ModelSentiment._vectorize(text)
    vector = ModelSentiment._select_best_features(vector)
    return (
      ModelSentiment
      .sentiment_label_description_map
      .get(ModelSentiment.model_sentiment.predict(vector)[0]))

Wrapping the emotion model in a static class that lazily loads its vectorizer, feature selector, and classifier, and maps the predicted label ID to a name

In [65]:
class ModelEmotion:
  vectorizer = None
  selector_emotion = None
  model_emotion = None
  
  emotion_label_description_map = {
    0: 'sadness',
    1: 'joy',
    2: 'love',
    3: 'anger',
    4: 'fear',
    5: 'surprise',
  }

  @staticmethod
  def _initialize():
    if ModelEmotion.vectorizer is None:
      with open('./models/sentiment-emotion-classification/pkl/tfidf_vectorizer.pkl', 'rb') as file:
          ModelEmotion.vectorizer = pickle.load(file)
    
    if ModelEmotion.selector_emotion is None:
      with open('./models/sentiment-emotion-classification/pkl/selector_emotion.pkl', 'rb') as file:
          ModelEmotion.selector_emotion = pickle.load(file)
    
    if ModelEmotion.model_emotion is None:
      with open('./models/sentiment-emotion-classification/pkl/model_emotion.pkl', 'rb') as file:
          ModelEmotion.model_emotion = pickle.load(file)
  
  @staticmethod
  def _vectorize(text: str):
    ModelEmotion._initialize()
    return ModelEmotion.vectorizer.transform([text])

  @staticmethod
  def _select_best_features(vector):
    return ModelEmotion.selector_emotion.transform(vector)

  @staticmethod
  def predict(text: str):
    ModelEmotion._initialize()
    vector = ModelEmotion._vectorize(text)
    vector = ModelEmotion._select_best_features(vector)
    
    return (
      ModelEmotion
      .emotion_label_description_map
      .get(ModelEmotion.model_emotion.predict(vector)[0]))

### Simulating the platform
**Thread question**: What do you think about the impact of online anonymity on user behavior on social media platforms?

In [69]:
responses = [
  "I believe online anonymity encourages more honest and open communication, allowing users to express their true opinions",
  "In my view, online anonymity can lead to a significant increase in negative behaviors, such as trolling and cyberbullying, because users feel shielded from accountability.",
  "I think anonymity provides a double-edged sword; while it allows for free expression, it also creates an environment where people may engage in harmful or deceitful actions.",
  "Online anonymity empowers marginalized voices to speak out, but it also makes it difficult to identify and address harmful content effectively.",
  "I see online anonymity as a critical factor in fostering diverse discussions, but it also contributes to the spread of misinformation, as sources cannot always be verified.",
  "I think that online anonymity can lead to more genuine interactions in certain communities, but it may also reduce the quality of discourse by enabling users to avoid responsibility for their words.",
  "Anonymity online is essential for privacy, but it can also encourage users to engage in behavior they might avoid if their identity were known.",
  "In my opinion, the impact of online anonymity is largely context-dependent; it can promote both positive and negative behaviors depending on the platform and community norms.",
  "I believe online anonymity amplifies both the best and worst aspects of human behavior, providing a space for both creativity and cruelty.",
  "I think online anonymity allows people to connect more authentically, but it can also lead to a lack of trust and credibility in online interactions."
]

In [80]:
for response in responses:
  preprocessed_text = Preprocessor.process_text(response)
  predicted_sentiment = ModelSentiment.predict(preprocessed_text)
  predicted_emotion = ModelEmotion.predict(preprocessed_text)
  print(f"{response[:90]}... \tsentiment: {predicted_sentiment} \temotion: {predicted_emotion}")

I believe online anonymity encourages more honest and open communication, allowing users t... 	sentiment: positive 	emotion: joy
In my view, online anonymity can lead to a significant increase in negative behaviors, suc... 	sentiment: positive 	emotion: joy
I think anonymity provides a double-edged sword; while it allows for free expression, it a... 	sentiment: positive 	emotion: joy
Online anonymity empowers marginalized voices to speak out, but it also makes it difficult... 	sentiment: positive 	emotion: joy
I see online anonymity as a critical factor in fostering diverse discussions, but it also ... 	sentiment: negative 	emotion: joy
I think that online anonymity can lead to more genuine interactions in certain communities... 	sentiment: positive 	emotion: joy
Anonymity online is essential for privacy, but it can also encourage users to engage in be... 	sentiment: positive 	emotion: joy
In my opinion, the impact of online anonymity is largely context-dependent; it can promote... 	se
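The platform description promises an overall mood for each discussion, while the loop above labels responses one at a time. A minimal sketch of aggregating per-response predictions into thread-level counts and majority labels; `summarize_thread` is a hypothetical helper that takes `(sentiment, emotion)` pairs rather than calling the models itself.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_thread(predictions: List[Tuple[str, str]]) -> Dict[str, object]:
    """Count sentiment and emotion labels across a thread's responses."""
    sentiments = Counter(s for s, _ in predictions)
    emotions = Counter(e for _, e in predictions)
    return {
        "responses": len(predictions),
        "sentiment_counts": dict(sentiments),
        "emotion_counts": dict(emotions),
        # most_common(1) breaks ties by first occurrence in the input.
        "dominant_sentiment": sentiments.most_common(1)[0][0] if predictions else None,
        "dominant_emotion": emotions.most_common(1)[0][0] if predictions else None,
    }


# The seven (sentiment, emotion) pairs fully visible in the output above.
visible = [("positive", "joy")] * 4 + [("negative", "joy")] + [("positive", "joy")] * 2
print(summarize_thread(visible))
```

Inside the loop, appending `(predicted_sentiment, predicted_emotion)` for each response would build the list this function expects.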