
### Requirements

In [1]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.33.3-py3-none-any.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers)
  Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

### Sentiment Model

In [2]:
model_name = 'ProsusAI/finbert'

In [3]:
from transformers import BertForSequenceClassification

In [7]:
model = BertForSequenceClassification.from_pretrained(model_name)


In [8]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained(model_name)

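1. Tokenize the input text into token IDs.

2. Pass the token IDs to the model.

3. Convert the model's output logits into probabilities with softmax.

4. Take the argmax of those probabilities to get the predicted class.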

In [10]:
txt = ("Given the recent downturn in stocks especially in tech which is likely to persist as yields keep going up, "
       "I thought it would be prudent to share the risks of investing in ARK ETFs, written up very nicely by "
       "[The Bear Cave](https://thebearcave.substack.com/p/special-edition-will-ark-invest-blow). The risks comes "
       "primarily from ARK's illiquid and very large holdings in small cap companies. ARK is forced to sell its "
       "holdings whenever its liquid ETF gets hit with outflows as is especially the case in market downturns. "
       "This could force very painful liquidations at unfavorable prices and the ensuing crash goes into a "
       "positive feedback loop leading into a death spiral enticing even more outflows and predatory shorts.")

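# Calling the tokenizer directly replaces the deprecated encode_plus.
tokens = tokenizer(txt, max_length=512,       # BERT's maximum sequence length
                   truncation=True,           # cut inputs longer than max_length
                   padding='max_length',      # pad shorter inputs up to max_length
                   add_special_tokens=True,   # add [CLS] and [SEP]
                   return_tensors='pt')       # return PyTorch tensors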

[CLS] = 101

[SEP] = 102

[MASK] = 103

[UNK] = 100

[PAD] = 0
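
To make the role of these IDs concrete, here is a minimal pure-Python sketch of how a sequence might be laid out when it is truncated and padded to a fixed length. It is a simplified illustration, not the tokenizer's actual implementation; the helper `layout_with_special_tokens` and the example word-piece IDs are hypothetical.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # IDs listed above


def layout_with_special_tokens(piece_ids, max_length):
    """Return (input_ids, attention_mask), both of length max_length."""
    # Reserve two slots for [CLS] and [SEP], truncating the content if needed.
    content = list(piece_ids)[: max_length - 2]
    input_ids = [CLS_ID] + content + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Right-pad with [PAD]; the mask marks padding positions with 0.
    n_pad = max_length - len(input_ids)
    input_ids += [PAD_ID] * n_pad
    attention_mask += [0] * n_pad
    return input_ids, attention_mask


# Hypothetical word-piece IDs, for illustration only.
ids, mask = layout_with_special_tokens([2445, 1996, 3522], max_length=8)
print(ids)   # [101, 2445, 1996, 3522, 102, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

With `max_length=512` and `padding='max_length'`, every input is laid out as a length-512 sequence in the same way, and the attention mask tells the model which positions are padding.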

In [None]:
tokens

In [12]:
output = model(**tokens)

In [13]:
output

SequenceClassifierOutput(loss=None, logits=tensor([[-1.8200,  2.4484,  0.0216]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [14]:
output[0]

tensor([[-1.8200,  2.4484,  0.0216]], grad_fn=<AddmmBackward0>)

In [17]:
import torch.nn.functional as F

probs = F.softmax(output[0], dim=-1)

In [18]:
probs

tensor([[0.0127, 0.9072, 0.0801]], grad_fn=<SoftmaxBackward0>)
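
As a sanity check, the same probabilities can be reproduced without PyTorch. The sketch below applies softmax, $p_i = e^{z_i} / \sum_j e^{z_j}$, to the logits printed earlier; the function name `softmax` and the variable `probs_manual` are illustrative choices, and the logits are copied by hand at four-decimal precision.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.8200, 2.4484, 0.0216]  # values printed by the model above
probs_manual = softmax(logits)
print([round(p, 4) for p in probs_manual])  # approximately [0.0127, 0.9072, 0.0801]
```

The three values sum to 1, and the largest one sits at index 1, which is what the argmax step picks up next.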

In [19]:
import torch

pred = torch.argmax(probs)

In [20]:
pred.item()

1
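
The integer `1` is an index into the model's label set, so it still needs to be mapped to a readable label. In the notebook that mapping can be read from `model.config.id2label`. The sketch below hard-codes a dictionary instead so that it runs on its own; that label order is an assumption for illustration and should be checked against `model.config.id2label`.

```python
# Assumed label order, for illustration only. In the notebook, prefer
# id2label = model.config.id2label, which is loaded with the checkpoint.
id2label = {0: "positive", 1: "negative", 2: "neutral"}

probs_list = [0.0127, 0.9072, 0.0801]  # probabilities printed above

# argmax: index of the largest probability
pred_id = max(range(len(probs_list)), key=probs_list.__getitem__)
label = id2label[pred_id]
print(pred_id, label, probs_list[pred_id])
```

Under the assumed mapping, index 1 corresponds to a negative sentiment, which would be consistent with the bearish tone of the input text.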