In [1]:
import pandas as pd
import flair

`Flair` provides many built-in, state-of-the-art language models.
- To install Flair, run:
```
pip install flair
```

We will be using the English sentiment classifier:
- character-level embedding (no OOV tokens)
- Transformer-based language model (DistilBERT)

In [2]:
sentiment_model = flair.models.TextClassifier.load('en-sentiment')

2023-04-10 23:01:44,629 loading file /Users/nikhilsaireddychoppa/.flair/models/sentiment-en-mix-distillbert_4.pt


- Load our tweets

In [3]:
tweets = pd.read_csv('tesla_tweets.csv', sep='|')
len(tweets)

16125

To perform sentiment analysis, we create a `Sentence` object from a string, then call the `predict` method of our Flair model on it.

In [4]:
sentence = flair.data.Sentence(tweets['text'].iloc[0])
sentence

Sentence: "RT @ nytimes : Elon Musk 's many tweaks to Twitter have resulted in a clunkier and less predictable experience , some users say . Even Twitter 's …"

In [5]:
sentiment_model.predict(sentence)

The `predict` method modifies the `Sentence` object in place: the predicted label is attached to it.

In [6]:
sentence

Sentence: "RT @ nytimes : Elon Musk 's many tweaks to Twitter have resulted in a clunkier and less predictable experience , some users say . Even Twitter 's …" → NEGATIVE (0.6028)

We are given two new values:
1. Sentiment - this is either 'POSITIVE' or 'NEGATIVE'
2. Confidence/probability - how likely this sentiment is to be correct, from 0 to 1
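
For downstream analysis it can be convenient to fold these two values into one signed number. The following is a minimal sketch and not part of Flair: `signed_score` is a hypothetical helper, and mapping NEGATIVE to a negative sign is an assumption made for illustration.

```python
def signed_score(label: str, confidence: float) -> float:
    """Fold a sentiment label and its confidence into one value in [-1, 1].

    Hypothetical helper (not a Flair API): POSITIVE keeps the confidence,
    NEGATIVE flips its sign.
    """
    if label == "POSITIVE":
        return confidence
    if label == "NEGATIVE":
        return -confidence
    raise ValueError(f"unexpected sentiment label: {label!r}")


# The prediction shown above: NEGATIVE with confidence 0.6028.
print(signed_score("NEGATIVE", 0.6028))  # -0.6028
```

Values near zero then mark predictions the model is unsure about, whichever label it chose.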

- We can call `help(sentence)` to see which methods are available to us, such as `get_labels()`, or use the `labels` attribute.

In [7]:
help(sentence)

Help on Sentence in module flair.data object:

class Sentence(DataPoint)
 |  Sentence(text: Union[str, List[str]], use_tokenizer: Union[bool, flair.data.Tokenizer] = True, language_code: str = None, start_position: int = 0)
 |  
 |  A Sentence is a list of tokens and is used to represent a sentence or text fragment.
 |  
 |  Method resolution order:
 |      Sentence
 |      DataPoint
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __copy__(self)
 |  
 |  __getitem__(self, subscript)
 |  
 |  __init__(self, text: Union[str, List[str]], use_tokenizer: Union[bool, flair.data.Tokenizer] = True, language_code: str = None, start_position: int = 0)
 |      Class to hold all meta related to a text (tokens, predictions, language code, ...)
 |      :param text: original string (sentence), or a list of string tokens (words)
 |      :param use_tokenizer: a custom tokenizer (default is :class:`SpaceTokenizer`)
 |          more advanced options are :class:`SegTokTokenizer` to use seg

In [8]:
help(sentence.get_labels())

Help on list object:

class list(object)
 |  list(iterable=(), /)
 |  
 |  Built-in mutable sequence.
 |  
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate sign

In [9]:
sentence.get_labels()[0].value

'NEGATIVE'

In [10]:
sentence.labels[0].value, sentence.labels[0].score

('NEGATIVE', 0.602799654006958)

# Results

In [11]:
sentiment = []
confidence = []

# Take an explicit copy so that adding columns below does not trigger
# pandas' SettingWithCopyWarning.
sample = tweets.iloc[:1000].copy()
for tweet in sample['text'].to_list():
    # Wrap the raw text, then let the model attach its predicted label.
    sentence = flair.data.Sentence(tweet)
    sentiment_model.predict(sentence)

    sentiment.append(sentence.labels[0].value)
    confidence.append(sentence.labels[0].score)

sample['sentiment'] = sentiment
sample['confidence'] = confidence

sample.head()


Unnamed: 0,created_at,edit_history_tweet_ids,text,id,lang,withheld,sentiment,confidence
0,2023-04-09T21:53:12.000Z,['1645183104896876545'],RT @nytimes: Elon Musk's many tweaks to Twitte...,1645183104896876545,en,,NEGATIVE,0.6028
1,2023-04-09T21:53:12.000Z,['1645183104343048197'],RT @elonmusk: Tesla opening Megapack factory i...,1645183104343048197,en,,POSITIVE,0.983427
2,2023-04-09T21:53:12.000Z,['1645183103550496771'],RT @OmarRiverosays: BREAKING NEWS: Yahoo News ...,1645183103550496771,en,,NEGATIVE,0.837053
3,2023-04-09T21:53:12.000Z,['1645183103332384769'],RT @blockkbusiness: Is it true Tesla is buildi...,1645183103332384769,en,,NEGATIVE,0.958561
4,2023-04-09T21:53:12.000Z,['1645183102195494913'],RT @blockkbusiness: Is it true Tesla is buildi...,1645183102195494913,en,,NEGATIVE,0.958561


In [12]:
pd.set_option('display.max_colwidth', None)

In [13]:
sample.head()

Unnamed: 0,created_at,edit_history_tweet_ids,text,id,lang,withheld,sentiment,confidence
0,2023-04-09T21:53:12.000Z,['1645183104896876545'],"RT @nytimes: Elon Musk's many tweaks to Twitter have resulted in a clunkier and less predictable experience, some users say. Even Twitter's…",1645183104896876545,en,,NEGATIVE,0.6028
1,2023-04-09T21:53:12.000Z,['1645183104343048197'],RT @elonmusk: Tesla opening Megapack factory in Shanghai to supplement output of Megapack factory in California,1645183104343048197,en,,POSITIVE,0.983427
2,2023-04-09T21:53:12.000Z,['1645183103550496771'],"RT @OmarRiverosays: BREAKING NEWS: Yahoo News drops bombshell, reveals that Elon Musk's Twitter is now “amplifying” the Twitter accounts of…",1645183103550496771,en,,NEGATIVE,0.837053
3,2023-04-09T21:53:12.000Z,['1645183103332384769'],RT @blockkbusiness: Is it true Tesla is building a factory in China? https://t.co/EfVGqWOBBP,1645183103332384769,en,,NEGATIVE,0.958561
4,2023-04-09T21:53:12.000Z,['1645183102195494913'],RT @blockkbusiness: Is it true Tesla is building a factory in China? https://t.co/EfVGqWOBBP,1645183102195494913,en,,NEGATIVE,0.958561
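
Rows 3 and 4 above are the same retweet with the same prediction, so raw counts over the sample can be dominated by popular retweets. As a rough sketch, assuming that counting each distinct text once is the desired behaviour, the hypothetical helper `count_unique_sentiment` below tallies labels over unique texts using only the standard library. The three example rows are copied from the table above.

```python
from collections import Counter


def count_unique_sentiment(texts, labels):
    """Count sentiment labels, counting each distinct text only once."""
    seen = set()
    counts = Counter()
    for text, label in zip(texts, labels):
        if text in seen:
            continue  # a repeat of a text we already counted
        seen.add(text)
        counts[label] += 1
    return counts


texts = [
    "RT @elonmusk: Tesla opening Megapack factory in Shanghai to supplement output of Megapack factory in California",
    "RT @blockkbusiness: Is it true Tesla is building a factory in China? https://t.co/EfVGqWOBBP",
    "RT @blockkbusiness: Is it true Tesla is building a factory in China? https://t.co/EfVGqWOBBP",
]
labels = ["POSITIVE", "NEGATIVE", "NEGATIVE"]

counts = count_unique_sentiment(texts, labels)
print(counts["POSITIVE"], counts["NEGATIVE"])  # 1 1
```

On the DataFrame itself, `sample.drop_duplicates(subset='text')['sentiment'].value_counts()` should give the same kind of tally.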
