<a href="https://colab.research.google.com/github/edgardpitta/jopper/blob/main/Sentiment_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#@title 1. Libraries and Dependencies {display-mode: "form"}
 
# Do not change it.

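# Installs
# Assumption: package list inferred from the imports below. `%%capture` is dropped
# because a cell magic only works on the first line of a cell; -q keeps pip quiet.
!pip install -q textblob vaderSentiment neattext wordcloud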

# Imports
from textblob import TextBlob
import pandas as pd
import altair as alt
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from google.colab import data_table
from wordcloud import WordCloud, STOPWORDS
import spacy
nlp = spacy.load('en_core_web_sm')
import neattext as nfx
import matplotlib.pyplot as plt
data_table.enable_dataframe_formatter()
alt.renderers.enable('default')

# Functions
def convert_to_df(sentiment):
	sentiment_dict = {'polarity':sentiment.polarity,'subjectivity':sentiment.subjectivity}
	sentiment_df = pd.DataFrame(sentiment_dict.items(),columns=['metric','value'])
	return sentiment_df 

def analyze_token_sentiment(docx):
	"""Score each whitespace-separated token with VADER and bucket it by compound score."""
	analyzer = SentimentIntensityAnalyzer()
	pos_list = []
	neg_list = []
	neu_list = []
	for token in docx.split():
		# 'compound' is VADER's normalized overall score in [-1, 1]
		res = analyzer.polarity_scores(token)['compound']
		if res > 0.1:
			# keep the token and its score together as one pair
			pos_list.append((token, res))
		elif res <= -0.1:
			neg_list.append((token, res))
		else:
			neu_list.append(token)

	result = {'positives':pos_list,'negatives':neg_list,'neutral':neu_list}
	return result

def text_analyzer(my_text):
	docx = nlp(my_text)
	allData = [(token.text, token.shape_, token.pos_, token.tag_, token.lemma_,token.is_alpha, token.is_stop) for token in docx]
	df_tokens = pd.DataFrame(allData, columns=['Token', 'Shape', 'PoS', 'Tag', 'Lemma', 'IsAlpha', 'IsStopWords'])
	return df_tokens

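def text_cleaner(my_text):
	# neattext is a module and cannot be called directly, so wrap the text in a TextFrame
	# (assumes neattext's TextFrame API, whose cleaning methods can be chained)
	docx = nfx.TextFrame(text=my_text.lower())
	docx = docx.remove_stopwords().remove_numbers().remove_puncts().remove_special_characters()
	return docx.text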

def plot_wordcloud(docx):
	# Build the cloud from raw text and render it without axes
	my_wordcloud = WordCloud(background_color="white").generate(docx)
	plt.figure()
	plt.imshow(my_wordcloud, interpolation='bilinear')
	plt.axis('off')
	plt.show()

## 2. Paste your pitch in the field below and press Enter


In [None]:
#@title Pitch {display-mode: "form"}
raw_text = input() 



To change your pitch, select the 'Pitch' cell and press 'Ctrl+F10' (or use the Runtime menu) to re-run every cell from that point on.

In [None]:
#@title Code {display-mode: "form"}

# Do not change it.
sentiment = TextBlob(raw_text).sentiment
result_df = convert_to_df(sentiment)
token_sentiments = analyze_token_sentiment(raw_text)

## 3. Wordclouds
This wordcloud shows the main keywords of your pitch.

In [None]:
#@title This wordcloud shows the main keywords of your pitch. {display-mode: "form"}
plot_wordcloud(raw_text)

In [None]:
#@title This wordcloud shows the main keywords of your pitch, excluding stop-words. {display-mode: "form"}
# Create stopword list:
stopwords = set(STOPWORDS)
stopwords.update(["thanks", "little", "Hi", "Hello", "School", "city", "name", "time", "semester", "tell", "work", "lots", "glad", "meet", "morning", "afternoon", "field", "will", "looking", ])

# Generate a word cloud image
wordcloud = WordCloud(stopwords=stopwords, background_color="white").generate(raw_text)

# Display the generated image:
# the matplotlib way:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()


Stopwords are words that carry little useful information for deciding which category a text belongs to. This may be either because they have little meaning on their own (prepositions, conjunctions, etc.) or because they are too frequent in the classification context.
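
As a minimal sketch of the idea, the snippet below counts the words of a short text after dropping stopwords. `DEMO_STOPWORDS` and `keyword_counts` are hypothetical names introduced only for this illustration; the notebook itself relies on `wordcloud.STOPWORDS` plus the pitch-specific additions in the cell above.

```python
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
DEMO_STOPWORDS = {"hi", "my", "is", "and", "i", "a", "in", "the", "to"}

def keyword_counts(text, stopwords=DEMO_STOPWORDS):
    """Count words in `text`, ignoring case, edge punctuation and stopwords."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return Counter(w for w in words if w and w not in stopwords)

counts = keyword_counts("Hi, my name is Ana and I study data science. I love data!")
print(counts.most_common(3))
```

A word cloud sizes each word by counts like these, so removing stopwords lets the content words of the pitch stand out.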

## 4. Your pitch's Sentiment Analysis

---

Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. 



In [None]:
#@title Chart {display-mode: "form"}
chart = alt.Chart(result_df).mark_bar().encode(x='metric', y='value', color='metric')
chart

**Polarity** measures how positive or negative the emotion expressed in the analyzed text is. It ranges from -1 to 1, running from Very Negative through Negative, Neutral and Positive to Very Positive.

**Subjectivity** quantifies how much of the text is personal opinion rather than factual information. It ranges from 0 to 1: a value closer to 0 means the text is objective, and a value closer to 1 means it contains personal opinion rather than facts.
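
The two scales can be turned into labels with a few comparisons. The sketch below is only an illustration: the function names and the ±0.5 cut-offs between the "Very" and plain labels are assumptions made here, not values defined by TextBlob. The sign test for polarity and the 0.5 threshold for subjectivity match the cells that follow.

```python
def polarity_label(score):
    """Map a polarity score in [-1, 1] to a coarse label.

    The +/-0.5 cut-offs are illustrative assumptions.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if score <= -0.5:
        return "Very Negative"
    if score < 0:
        return "Negative"
    if score == 0:
        return "Neutral"
    if score < 0.5:
        return "Positive"
    return "Very Positive"

def subjectivity_label(score):
    """Map a subjectivity score in [0, 1] to 'High' or 'Low'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("subjectivity must lie in [0, 1]")
    return "High" if score > 0.5 else "Low"

print(polarity_label(0.3), subjectivity_label(0.7))
```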


In [None]:
#@title Your pitch's Polarity {display-mode: "form"}

# This code will be hidden when the notebook is loaded.

polarity = sentiment.polarity

if polarity > 0:
  print("Positive polarity 😀")
elif polarity < 0:
  print("Negative polarity 😟")
else:
  print("Neutral polarity 😴")


In [None]:
#@title Your pitch's Subjectivity {display-mode: "form"}

# This code will be hidden when the notebook is loaded.
if sentiment.subjectivity > 0.5:
  print("High subjectivity 😀")
else:
  print("Low subjectivity 😴")

## 5. Token Polarity Analysis
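
The word lists in this section come from scoring each word of the pitch on its own and sorting it into a bucket by its score. The sketch below shows that bucketing step with a hypothetical mini-lexicon (`TOY_SCORES`, with made-up scores) standing in for VADER. The ±0.1 thresholds mirror `analyze_token_sentiment`, while pairing each word with its score is a choice made for this example.

```python
# Hypothetical mini-lexicon with made-up scores; VADER ships a far larger one.
TOY_SCORES = {"great": 0.62, "love": 0.64, "problem": -0.40, "bad": -0.54}

def bucket_tokens(text, scores=TOY_SCORES, threshold=0.1):
    """Split `text` on whitespace and bucket each token by its lexicon score."""
    buckets = {"positives": [], "negatives": [], "neutral": []}
    for token in text.split():
        # Unknown words get a score of 0.0 and therefore land in 'neutral'
        score = scores.get(token.lower().strip(".,!?"), 0.0)
        if score > threshold:
            buckets["positives"].append((token, score))
        elif score <= -threshold:
            buckets["negatives"].append((token, score))
        else:
            buckets["neutral"].append(token)
    return buckets

result = bucket_tokens("I love solving a bad problem")
print(result)
```

Scoring words in isolation ignores context such as negation ("not bad"), so these lists are best read as a rough guide to word choice rather than a verdict on the pitch.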

In [None]:
#@title Positive Words {display-mode: "form"}

token_sentiments['positives']


In [None]:
#@title Negative Words {display-mode: "form"}

token_sentiments['negatives']

In [None]:
#@title Neutral Words {display-mode: "form"}

token_sentiments['neutral']