<a href="https://colab.research.google.com/github/lov435/SOEmotions/blob/main/huffing_goemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Install the basic module dependencies

In [1]:
!pip install transformers
!pip install torch
!pip install attrdict

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


### Clone the GoEmotions-pytorch GitHub repo, if not already cloned

In [2]:
![ ! -d 'GoEmotions-pytorch' ] && git clone https://github.com/monologg/GoEmotions-pytorch.git

### Add the cloned repo directory to the import path

In [3]:
import sys
sys.path.insert(0,'/content/GoEmotions-pytorch')

### Import the required classes

In [4]:
from transformers import BertTokenizer, AutoTokenizer
from model import BertForMultiLabelClassification
from multilabel_pipeline import MultiLabelPipeline
from pprint import pprint
import torch
import pandas as pd

### Initialize the tokenizer and model

In [5]:
tokenizer = AutoTokenizer.from_pretrained("monologg/bert-base-cased-goemotions-original")
model = BertForMultiLabelClassification.from_pretrained("monologg/bert-base-cased-goemotions-original")

### Define the prediction function that uses the pretrained GoEmotions model

In [6]:
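def predict(texts):
    """Return one {"labels": [...], "scores": [...]} dict per input text."""
    results = []
    threshold = 0.3  # minimum sigmoid score for a label to be reported
    for txt in texts:
        if pd.isna(txt):
            # Missing values (NaN/None) cannot be tokenized; report them as neutral.
            # Lists keep the result shape consistent with real predictions.
            results.append({"labels": ['neutral'], "scores": [1.0]})
            continue
        # Truncate to BERT's 512-token limit so very long comments do not fail.
        inputs = tokenizer(txt, return_tensors="pt", truncation=True, max_length=512)
        outputs = model(**inputs)
        probs = torch.sigmoid(outputs[0])  # per-label probabilities from the logits
        for item in probs:
            labels = []
            scores = []
            for idx, s in enumerate(item):
                if s > threshold:
                    labels.append(model.config.id2label[idx])
                    scores.append(s)
            results.append({"labels": labels, "scores": scores})
    return results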
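
The decision rule inside `predict` is independent of the model: each logit goes through a sigmoid, and every label whose score exceeds the threshold is kept, so one text can receive several labels or none. Below is a minimal pure-Python sketch of that rule. The helper names `sigmoid` and `select_labels`, the three-label mapping, and the logits are all illustrative assumptions, not part of the GoEmotions-pytorch repo.

```python
import math

def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def select_labels(logits, id2label, threshold=0.3):
    """Keep every label whose sigmoid score exceeds the threshold."""
    labels, scores = [], []
    for idx, logit in enumerate(logits):
        score = sigmoid(logit)
        if score > threshold:
            labels.append(id2label[idx])
            scores.append(round(score, 4))
    return {"labels": labels, "scores": scores}

# Hypothetical three-label example; the real model ships its own id2label mapping.
toy_id2label = {0: "anger", 1: "love", 2: "neutral"}
toy_logits = [-3.0, 2.0, 0.0]
print(select_labels(toy_logits, toy_id2label))
# {'labels': ['love', 'neutral'], 'scores': [0.8808, 0.5]}
```

Because the scores are independent per label rather than a softmax over all labels, they need not sum to 1.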

### Toy test set of sentences

In [7]:
texts = [
    "Hey that's a thought! Maybe we need [NAME] to be the celebrity vaccine endorsement!",
    "it’s happened before?! love my hometown of beautiful new ken 😂😂",
    "I love you, brother.",
    "Troll, bro. They know they're saying stupid shit. The motherfucker does nothing but stink up libertarian subs talking shit",
]

### Predict the emotions on the test texts

In [8]:
results = predict(texts)
#pprint(results)
for res in results:
  print('%s  %s' % (res.get('labels'), res.get('scores')))

['realization', 'neutral']  [tensor(0.7098, grad_fn=<UnbindBackward0>), tensor(0.9381, grad_fn=<UnbindBackward0>)]
['curiosity', 'love']  [tensor(0.9586, grad_fn=<UnbindBackward0>), tensor(0.9358, grad_fn=<UnbindBackward0>)]
['love']  [tensor(0.9945, grad_fn=<UnbindBackward0>)]
['anger']  [tensor(0.9937, grad_fn=<UnbindBackward0>)]
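
Once there are more than a handful of texts, a per-label tally is easier to read than the raw per-text printout. The sketch below counts labels across a list of prediction dicts. `count_labels` is a hypothetical helper, it assumes `labels` is always a list, and the sample data reuses only the labels printed above.

```python
from collections import Counter

def count_labels(results):
    """Count how often each label appears across a list of prediction dicts."""
    counts = Counter()
    for res in results:
        counts.update(res["labels"])
    return counts

# Labels copied from the four predictions printed above; scores omitted for brevity.
sample_results = [
    {"labels": ["realization", "neutral"]},
    {"labels": ["curiosity", "love"]},
    {"labels": ["love"]},
    {"labels": ["anger"]},
]
print(count_labels(sample_results).most_common(1))  # [('love', 2)]
```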


### Haoxiang's dataset of Stack Overflow comments

In [9]:
url='https://drive.google.com/file/d/1KyyF_f_Cf_NkKpuZSqGDgNi8soxX6LNY/view?usp=sharing'
file_id=url.split('/')[-2]
dwn_url='https://drive.google.com/uc?id=' + file_id
df = pd.read_csv(dwn_url)
comments = df['CommentTextProc'].tolist()
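
The cell above relies on the file ID being the second-to-last path segment of the sharing link. A slightly more defensive version is sketched here; `drive_download_url` is a hypothetical helper name, and the only link shape it accepts is the `/file/d/<id>/view` form used above.

```python
from urllib.parse import urlparse

def drive_download_url(share_url):
    """Turn a Google Drive '/file/d/<id>/view' sharing link into a 'uc?id=<id>' link."""
    parts = urlparse(share_url).path.strip("/").split("/")
    # Expect a path of the form file/d/<id>/view
    if len(parts) < 3 or parts[:2] != ["file", "d"]:
        raise ValueError(f"Unrecognized Google Drive sharing URL: {share_url}")
    return "https://drive.google.com/uc?id=" + parts[2]

url = "https://drive.google.com/file/d/1KyyF_f_Cf_NkKpuZSqGDgNi8soxX6LNY/view?usp=sharing"
print(drive_download_url(url))
# https://drive.google.com/uc?id=1KyyF_f_Cf_NkKpuZSqGDgNi8soxX6LNY
```

Raising on an unexpected link shape surfaces a bad URL immediately instead of as a confusing `pd.read_csv` failure later.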

### Predict the emotions on Haoxiang's SO comment set
Only the first 500 comments are processed.

In [10]:
results = predict(comments[0:500])
#pprint(results)
for res in results:
  print('%s  %s' % (res.get('labels'), res.get('scores')))

['neutral']  [tensor(0.9985, grad_fn=<UnbindBackward0>)]
['neutral']  [tensor(0.9989, grad_fn=<UnbindBackward0>)]
['neutral']  [tensor(0.9990, grad_fn=<UnbindBackward0>)]
['confusion', 'neutral']  [tensor(0.3435, grad_fn=<UnbindBackward0>), tensor(0.4133, grad_fn=<UnbindBackward0>)]
['neutral']  [tensor(0.9988, grad_fn=<UnbindBackward0>)]
['neutral']  [tensor(0.9286, grad_fn=<UnbindBackward0>)]
['disapproval', 'neutral']  [tensor(0.4143, grad_fn=<UnbindBackward0>), tensor(0.9678, grad_fn=<UnbindBackward0>)]
['neutral']  [tensor(0.9985, grad_fn=<UnbindBackward0>)]
['neutral']  [tensor(0.9990, grad_fn=<UnbindBackward0>)]
['neutral']  [tensor(0.9969, grad_fn=<UnbindBackward0>)]
['neutral']  [tensor(0.9986, grad_fn=<UnbindBackward0>)]
['neutral']  [tensor(0.9990, grad_fn=<UnbindBackward0>)]
['curiosity', 'neutral']  [tensor(0.9295, grad_fn=<UnbindBackward0>), tensor(0.4496, grad_fn=<UnbindBackward0>)]
['neutral']  [tensor(0.9990, grad_fn=<UnbindBackward0>)]
['admiration']  [tensor(0.9937, 