### Sentiment Analysis

This program uses the `SentimentIntensityAnalyzer` from the `nltk` library to calculate the sentiment of each sentence. The `polarity_scores` method returns a dictionary with various scores, including a `compound` score that represents the overall sentiment of the sentence. If the `compound` score is positive, the sentiment is considered positive; if it is negative, the sentiment is considered negative; otherwise, it is considered neutral.

The results are stored in a list of lists, where each inner list contains the sentence number (starting from 1), the sentence itself, the sentiment label, and the `compound` score. This list of lists is then converted into a pandas DataFrame with matching column names.

In [1]:
!python --version

Python 3.10.12


In [2]:
import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer  # VADER-based, English only

nltk.download('vader_lexicon')  # lexicon required by the analyzer

# Label each sentence as positive, negative, or neutral based on its compound score
def sentiment_analysis(sentences):
    sia = SentimentIntensityAnalyzer()
    results = []
    for number, sentence in enumerate(sentences, start=1):
        sentiment = sia.polarity_scores(sentence)
        compound = sentiment['compound']
        if compound > 0:
            sentiment_label = 'positive'
        elif compound < 0:
            sentiment_label = 'negative'
        else:
            sentiment_label = 'neutral'
        results.append([number, sentence, sentiment_label, compound])
    df = pd.DataFrame(results, columns=['sentence number', 'sentence', 'sentiment', 'compound'])
    return df

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


In [3]:
sentences = ["I love the new car from Tesla but the lead time sucks", 
             "There is a war going on between Ukrain and Russia but hopefully it will soon be over", 
             "How about North Korea?", 
             "I love my new bike"]

df = sentiment_analysis(sentences)
df.head()

| | sentence number | sentence | sentiment | compound |
|---|---|---|---|---|
| 0 | 1 | I love the new car from Tesla but the lead tim... | negative | -0.1655 |
| 1 | 2 | There is a war going on between Ukrain and Rus... | positive | 0.2732 |
| 2 | 3 | How about North Korea? | neutral | 0.0 |
| 3 | 4 | I love my new bike | positive | 0.6369 |


The sentence “I love the new car from Tesla but the lead time sucks” expresses both positive and negative sentiment, so a reader might expect it to be labeled neutral overall. The analyzer labels it negative instead, which can be confusing.

Sentiment analysis is a complex task, and the results can vary depending on the specific algorithm and model used. The `SentimentIntensityAnalyzer` from the `nltk` library, which is used in the program above, takes a rule-based approach to calculating the sentiment of a given text. While this approach gives accurate results in many cases, it does not always identify the overall sentiment correctly, especially when the text contains mixed sentiments.

The analyzer assigns a `compound` score to each sentence based on the polarity and intensity of the words it contains. The program then maps that score to an overall sentiment label (positive, negative, or neutral).

In the case of the sentence “I love the new car from Tesla but the lead time sucks”, the `SentimentIntensityAnalyzer` assigns a negative compound score, which results in a negative sentiment label. This is because VADER treats “but” as a contrastive conjunction: sentiment expressed before it (“love”) is down-weighted and sentiment expressed after it (“sucks”) is up-weighted, so the negative clause dominates the compound score.
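The reported score can be reproduced by hand, which makes the effect of “but” concrete. The sketch below is a reconstruction under stated assumptions, not NLTK's code: the word valences (`love` = 3.2, `sucks` = -1.5), the 0.5 and 1.5 weights around “but”, and the normalization constant 15 are assumed from the VADER lexicon and reference implementation and should be checked against the installed version.

```python
import math

# Assumed VADER lexicon valences (scale roughly -4..+4); verify against your lexicon.
LOVE = 3.2
SUCKS = -1.5

# Assumed "but" rule: dampen sentiment before the conjunction, amplify it after.
BEFORE_BUT = 0.5
AFTER_BUT = 1.5

ALPHA = 15  # assumed normalization constant

# Weighted sum of the two sentiment-bearing words; all other words are assumed to score 0.
weighted_sum = LOVE * BEFORE_BUT + SUCKS * AFTER_BUT

# Squash the unbounded sum into the range (-1, 1).
compound = weighted_sum / math.sqrt(weighted_sum ** 2 + ALPHA)

print(round(weighted_sum, 2), round(compound, 4))
```

Under these assumptions the result agrees with the -0.1655 reported for this sentence in the output above.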

While it is possible to manually adjust the sentiment label for this specific sentence to “neutral”, it is not recommended to do so as it would not address the underlying issue with the sentiment analysis algorithm. Instead, you may want to consider using a different sentiment analysis algorithm or model that is better suited to handle sentences with mixed sentiments.
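A lighter option than relabeling by hand is to flag mixed sentences explicitly. Besides `compound`, `polarity_scores` returns `neg`, `neu`, and `pos` proportions; the sketch below marks a sentence as mixed when both the positive and the negative share reach a cutoff. The helper name `is_mixed`, the `min_share` cutoff of 0.1, and the example dictionary are hypothetical choices for illustration; they are not part of NLTK and the values are not real analyzer output.

```python
def is_mixed(scores: dict, min_share: float = 0.1) -> bool:
    """Return True when both the positive and negative shares reach min_share.

    `scores` is expected to have the shape of the dictionary returned by
    SentimentIntensityAnalyzer.polarity_scores (keys 'neg', 'neu', 'pos', 'compound').
    """
    return scores['pos'] >= min_share and scores['neg'] >= min_share


# Illustrative values only, not real analyzer output.
example_scores = {'neg': 0.2, 'neu': 0.6, 'pos': 0.2, 'compound': -0.1}
print(is_mixed(example_scores))
```

In the program above, this could be called as `is_mixed(sia.polarity_scores(sentence))` and stored in an additional DataFrame column next to the sentiment label.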

The compound score is a normalized value that ranges from -1 (most negative) to +1 (most positive).

In the sentiment analysis program above, the following thresholds were used to determine the overall sentiment of a sentence based on its compound score:

* If the `compound` score is greater than 0, the sentiment is considered **positive**.
* If the `compound` score is less than 0, the sentiment is considered **negative**.
* If the `compound` score is equal to 0, the sentiment is considered **neutral**.  
  
Splitting at exactly 0 is the simplest rule, but it means a sentence is labeled neutral only when its `compound` score is exactly 0, and even very weak scores are labeled positive or negative. The VADER documentation suggests a small neutral band instead, typically `compound >= 0.05` for positive and `compound <= -0.05` for negative. The choice of thresholds can have a significant impact on the results and may need to be adjusted for the specific use case and data.
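One way to make the thresholds adjustable is to move the labeling into a small function with a configurable neutral band. This is a minimal sketch; the function name `label_sentiment` and the parameter `neutral_band` are hypothetical, and the default of 0.05 is only a starting point. The comparisons are strict so that `neutral_band=0.0` reproduces the rule used in the program above.

```python
def label_sentiment(compound: float, neutral_band: float = 0.05) -> str:
    """Map a compound score to a label; scores within +/- neutral_band are neutral."""
    if compound > neutral_band:
        return 'positive'
    if compound < -neutral_band:
        return 'negative'
    return 'neutral'


# Compound scores taken from the output above.
for score in (-0.1655, 0.2732, 0.0, 0.6369):
    print(score, label_sentiment(score))
```

With the default band, these four scores keep the labels they received above; a weak score such as 0.03 would change from positive to neutral.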

The `compound` score is calculated by the `SentimentIntensityAnalyzer` using a rule-based approach. The analyzer relies on a predefined lexicon in which each word carries a valence score whose sign gives its polarity (positive or negative) and whose magnitude gives its intensity. It computes the sentiment of a text by adjusting these word scores with a small set of heuristics (for example for negation, intensifiers, and contrastive conjunctions such as “but”), summing them, and normalizing the sum to a `compound` score that ranges from -1 (most negative) to +1 (most positive).
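The normalization step can be sketched in a few lines. This is a simplified reconstruction rather than NLTK's implementation: it assumes the form x / sqrt(x² + α) with α = 15 from the VADER reference implementation, a valence of 3.2 for “love”, and a score of 0 for every other word in the sentence.

```python
import math


def normalize(valence_sum: float, alpha: float = 15) -> float:
    """Squash an unbounded valence sum into the range [-1, 1]."""
    score = valence_sum / math.sqrt(valence_sum ** 2 + alpha)
    return max(-1.0, min(1.0, score))


# "I love my new bike": only "love" is assumed to carry sentiment.
word_valences = {'I': 0.0, 'love': 3.2, 'my': 0.0, 'new': 0.0, 'bike': 0.0}
compound = normalize(sum(word_valences.values()))
print(round(compound, 4))
```

Under these assumptions the result agrees with the 0.6369 reported for this sentence in the output above. Larger sums move the score closer to ±1 without ever exceeding it.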

The specific details of how the `compound` score is calculated can be found in the paper “*VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text*” by Hutto and Gilbert (2014), which describes the algorithm used by the `SentimentIntensityAnalyzer`.

In [26]:
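# Text analytics function using the walrus operator (:=), available from Python 3.8

def analyze_text(text: str) -> dict:
    # Assigning `words` inside the dict literal with := removes the need for a
    # separate `words = text.split()` statement beforehand.
    details: dict = {'words': (words := text.split()),
                     'amount': len(words),          # number of words
                     'chars': len(''.join(words)),  # characters, excluding whitespace
                     'reversed': words[::-1]}       # words in reverse order
    return details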

print(analyze_text("I love my new bike"))

{'words': ['I', 'love', 'my', 'new', 'bike'], 'amount': 5, 'chars': 14, 'reversed': ['bike', 'new', 'my', 'love', 'I']}


In [40]:
# Analyze each sentence
for sentence in sentences:
    print(sentence)
    print(analyze_text(sentence), "\n")

I love the new car from Tesla but the lead time sucks
{'words': ['I', 'love', 'the', 'new', 'car', 'from', 'Tesla', 'but', 'the', 'lead', 'time', 'sucks'], 'amount': 12, 'chars': 42, 'reversed': ['sucks', 'time', 'lead', 'the', 'but', 'Tesla', 'from', 'car', 'new', 'the', 'love', 'I']} 

There is a war going on between Ukrain and Russia but hopefully it will soon be over
{'words': ['There', 'is', 'a', 'war', 'going', 'on', 'between', 'Ukrain', 'and', 'Russia', 'but', 'hopefully', 'it', 'will', 'soon', 'be', 'over'], 'amount': 17, 'chars': 68, 'reversed': ['over', 'be', 'soon', 'will', 'it', 'hopefully', 'but', 'Russia', 'and', 'Ukrain', 'between', 'on', 'going', 'war', 'a', 'is', 'There']} 

How about North Korea?
{'words': ['How', 'about', 'North', 'Korea?'], 'amount': 4, 'chars': 19, 'reversed': ['Korea?', 'North', 'about', 'How']} 

I love my new bike
{'words': ['I', 'love', 'my', 'new', 'bike'], 'amount': 5, 'chars': 14, 'reversed': ['bike', 'new', 'my', 'love', 'I']} 



In [4]:
# models available for different languages, including: 
# rule-based approaches
# machine learning-based approaches 
# and deep learning-based approaches