# Sentiment analysis (using IPUs)

Adapted from the Hugging Face blog post at https://huggingface.co/blog/sentiment-analysis-python

In [1]:
!gc-monitor --no-card-info

+--------------------------------------------------------------+-----------------------+
|          IPUs in p64 attached from other namespaces          |         Board         |
+----+------------------------------+--------------+-----------+-----------+-----------+
| ID |       Application host       |    Clock     |   Temp    |   Temp    |   Power   |
+----+------------------------------+--------------+-----------+-----------+-----------+
| 4  |        gbnwp-pod012-3        |   1330MHz    |  32.0 C   |  25.2 C   |  158.3 W  |
| 5  |        gbnwp-pod012-3        |   1330MHz    |  29.3 C   |           |           |
| 6  |        gbnwp-pod012-3        |   1330MHz    |  31.5 C   |           |           |
| 7  |        gbnwp-pod012-3        |   1330MHz    |  30.4 C   |           |           |
+----+------------------------------+--------------+-----------+-----------+-----------+
| 8  |        gbnwp-pod012-3        |   1330MHz    |  39.9 C   |  30.5 C   |  171.1 W  |
| 9  |        gbnwp-p

In [2]:
# %pip install -e ../
# %pip install emoji==0.6.0

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
import os
# Limit PopTorch logging to errors; set before the imports below.
os.environ["POPTORCH_LOG_LEVEL"] = "ERR"
import transformers
from optimum.graphcore import pipelines

In [5]:
inference_config = dict(layers_per_ipu=[20], ipus_per_replica=1)

In [6]:
sentiment_pipeline = pipelines.pipeline("sentiment-analysis", ipu_config_kwargs=inference_config)
data = ["I love you", "I hate you"]
sentiment_pipeline(data)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
No padding arguments specified, so pad to 128 by default. Inputs longer than 128 will be truncated.
Graph compilation: 100%|██████████| 100/100 [00:22<00:00]


[{'label': 'POSITIVE', 'score': 0.9998660087585449},
 {'label': 'NEGATIVE', 'score': 0.9990818500518799}]

In [7]:
sentiment_pipeline(["How are you today?", "I'm a little tired, I didn't sleep well, but I hope it gets better"])


[{'label': 'POSITIVE', 'score': 0.9356999397277832},
 {'label': 'POSITIVE', 'score': 0.9859092831611633}]

In [8]:
specific_model = pipelines.pipeline(
    model="finiteautomata/bertweet-base-sentiment-analysis",
    ipu_config_kwargs=inference_config,
)
specific_model(data)

No padding arguments specified, so pad to 128 by default. Inputs longer than 128 will be truncated.
Graph compilation: 100%|██████████| 100/100 [00:38<00:00]


[{'label': 'POS', 'score': 0.9902849793434143},
 {'label': 'NEG', 'score': 0.979720413684845}]

In [9]:
specific_model(["How are you today?", "I'm a little tired, I didn't sleep well, but I hope it gets better"])


[{'label': 'NEU', 'score': 0.7505843043327332},
 {'label': 'NEG', 'score': 0.8974782228469849}]

In [10]:
specific_model = pipelines.pipeline(
    model="cardiffnlp/twitter-roberta-base-sentiment",
    ipu_config_kwargs=inference_config,
)
specific_model(data)

No padding arguments specified, so pad to 128 by default. Inputs longer than 128 will be truncated.
Graph compilation: 100%|██████████| 100/100 [00:37<00:00]


[{'label': 'LABEL_2', 'score': 0.9551451206207275},
 {'label': 'LABEL_0', 'score': 0.96500563621521}]
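
The `LABEL_0` / `LABEL_2` names in this output are generic placeholders rather than readable sentiment names. A minimal sketch of relabelling the result, assuming the mapping given on the model card for `cardiffnlp/twitter-roberta-base-sentiment` (0 = negative, 1 = neutral, 2 = positive). The `CARDIFF_LABELS` dict and the `relabel` helper are illustrative names, not part of `optimum.graphcore` or `transformers`:

```python
# Assumed mapping, taken from the cardiffnlp/twitter-roberta-base-sentiment model card.
CARDIFF_LABELS = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def relabel(predictions, mapping=CARDIFF_LABELS):
    """Return a copy of the pipeline output with readable label names.

    Labels missing from the mapping are left unchanged.
    """
    return [{**p, "label": mapping.get(p["label"], p["label"])} for p in predictions]


# Output copied from the cell above ("I love you", "I hate you").
predictions = [
    {"label": "LABEL_2", "score": 0.9551451206207275},
    {"label": "LABEL_0", "score": 0.96500563621521},
]
print(relabel(predictions))
```

With that assumed mapping, the two inputs come out as positive and negative respectively, which agrees with the other models in this notebook.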

In [11]:
specific_model(["How are you today?", "I'm a little tired, I didn't sleep well, but I hope it gets better"])


[{'label': 'LABEL_1', 'score': 0.8688815236091614},
 {'label': 'LABEL_1', 'score': 0.47830677032470703}]
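
The second prediction here has a top score below 0.5, so the model is not confident in its label for that sentence. A minimal sketch of flagging such predictions for manual review. The `flag_uncertain` helper and the 0.6 threshold are arbitrary illustrative choices, not something the pipeline provides:

```python
def flag_uncertain(predictions, threshold=0.6):
    """Return the indices of predictions whose top score is below the threshold."""
    return [i for i, p in enumerate(predictions) if p["score"] < threshold]


# Output copied from the cell above.
predictions = [
    {"label": "LABEL_1", "score": 0.8688815236091614},
    {"label": "LABEL_1", "score": 0.47830677032470703},
]
print(flag_uncertain(predictions))
```

A suitable threshold depends on the model and the application, so it is best chosen by inspecting scores on representative data.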

In [12]:
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
multilingual_model = pipelines.pipeline(
    model=model_name, ipu_config_kwargs=inference_config
)
multilingual_model(data)

No padding arguments specified, so pad to 128 by default. Inputs longer than 128 will be truncated. To change this behaviour, pass the `padding='max_length'` and`max_length=<your desired input length>` arguments to the pipeline function
Graph compilation: 100%|██████████| 100/100 [00:36<00:00]


[{'label': '5 stars', 'score': 0.853151261806488},
 {'label': '1 star', 'score': 0.6354780197143555}]

In [13]:
multilingual_model(["How are you today?", "I'm a little tired, I didn't sleep well, but I hope it gets better"])


[{'label': '5 stars', 'score': 0.5348031520843506},
 {'label': '3 stars', 'score': 0.7582475543022156}]

In [14]:
multilingual_model(["How are you today?", "Je suis un peu fatigue, je n'ai pas bien dormi mais j'espere que la journee s'ameliore"])


[{'label': '5 stars', 'score': 0.5348031520843506},
 {'label': '3 stars', 'score': 0.7263907790184021}]
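
The multilingual model reports its predictions as strings such as `'1 star'` or `'5 stars'`. A minimal sketch of turning those labels into integers so they can be averaged or compared. The `stars_to_int` helper is a hypothetical name and assumes every label starts with the numeric rating, as in the outputs above:

```python
def stars_to_int(label):
    """Parse a label such as '1 star' or '5 stars' into an integer rating."""
    return int(label.split()[0])


# Output copied from the cell above (English and French inputs).
predictions = [
    {"label": "5 stars", "score": 0.5348031520843506},
    {"label": "3 stars", "score": 0.7263907790184021},
]
ratings = [stars_to_int(p["label"]) for p in predictions]
print(ratings)
```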

In [15]:
sentiment_pipeline(["How are you today?", "Je suis un peu fatigue, je n'ai pas bien dormi mais j'espere que la journee s'ameliore"])


[{'label': 'POSITIVE', 'score': 0.9356999397277832},
 {'label': 'NEGATIVE', 'score': 0.9287972450256348}]

In [16]:
model_name = "bhadresh-savani/distilbert-base-uncased-emotion"
emotion_model = pipelines.pipeline(model=model_name, ipu_config_kwargs=inference_config)
emotion_model(data)

No padding arguments specified, so pad to 128 by default. Inputs longer than 128 will be truncated. To change this behaviour, pass the `padding='max_length'` and`max_length=<your desired input length>` arguments to the pipeline function
Graph compilation: 100%|██████████| 100/100 [00:20<00:00]


[{'label': 'love', 'score': 0.9584758281707764},
 {'label': 'anger', 'score': 0.8243763446807861}]

In [17]:
emotion_model(["How are you today?", "I'm a little tired, I didn't sleep well, but I hope it gets better"])


[{'label': 'joy', 'score': 0.7177484035491943},
 {'label': 'joy', 'score': 0.9376221299171448}]

In [18]:
sentiment_pipeline

TextClassificationPipeline(
    task=text-classification,
    modelcard=None,
    feature_extractor=None,
    framework=pt,
    device=cpu,
    call_count=3,
    tokenizer=PreTrainedTokenizerFast(name_or_path='distilbert-base-uncased-finetuned-sst-2-english', vocab_size=30522, model_max_len=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}),
    model.config=DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased-finetuned-sst-2-english",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "finetuning_task": "sst-2",
  "hidden_dim": 3072,
  "id2label": {
    "0": "NEGATIVE",
    "1": "POSITIVE"
  },
  "initializer_range": 0.02,
  "label2id": {
    "NEGATIVE": 0,
    "POSITIVE": 1
  },
  "max_position_embeddings": 512,
  "model_type": "di

In [19]:
emotion_model

TextClassificationPipeline(
    task=text-classification,
    modelcard=None,
    feature_extractor=None,
    framework=pt,
    device=cpu,
    call_count=2,
    tokenizer=PreTrainedTokenizerFast(name_or_path='bhadresh-savani/distilbert-base-uncased-emotion', vocab_size=30522, model_max_len=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}),
    model.config=DistilBertConfig {
  "_name_or_path": "bhadresh-savani/distilbert-base-uncased-emotion",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "sadness",
    "1": "joy",
    "2": "love",
    "3": "anger",
    "4": "fear",
    "5": "surprise"
  },
  "initializer_range": 0.02,
  "label2id": {
    "anger": 3,
    "fear": 4,
    "joy": 1,
    "love": 2

In [20]:
multilingual_model

TextClassificationPipeline(
    task=text-classification,
    modelcard=None,
    feature_extractor=None,
    framework=pt,
    device=cpu,
    call_count=3,
    tokenizer=PreTrainedTokenizerFast(name_or_path='nlptown/bert-base-multilingual-uncased-sentiment', vocab_size=105879, model_max_len=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}),
    model.config=BertConfig {
  "_name_or_path": "nlptown/bert-base-multilingual-uncased-sentiment",
  "_num_labels": 5,
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "finetuning_task": "sentiment-analysis",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "1 star",
    "1": "2 stars",
    "2": "3 stars",
    "3": "4 stars",
    "4": "5 star

In [21]:
!gc-monitor --no-card-info

+-----------------------------------+------------------------+-----------------+
|Attached processes in partition p64|          IPU           |      Board      |
+--------+----+--------+------------+----+----------+--------+--------+--------+
|  PID   |...d|  Time  |    User    | ID |  Clock   |  Temp  |  Temp  | Power  |
+--------+----+--------+------------+----+----------+--------+--------+--------+
|3462939 |...t| 4m20s  | alexandrep | 0  | 1330MHz  | 43.3 C | 33.8 C |161.2 W |
|3462939 |...t| 4m20s  | alexandrep | 1  | 1330MHz  | 39.6 C |        |        |
|3462939 |...t| 4m20s  | alexandrep | 2  | 1330MHz  | 44.7 C |        |        |
|3462939 |...t| 4m20s  | alexandrep | 3  | 1330MHz  | 41.3 C |        |        |
+--------+----+--------+------------+----+----------+--------+--------+--------+
+--------------------------------------------------------------+-----------------------+
|          IPUs in p64 attached from other namespaces          |         Board         |
+----+------