<a href="https://colab.research.google.com/github/felipefe20/GettingAndCleaningDataProject/blob/master/Transformers_sentiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
#Example of classification
from transformers import AutoModelForSequenceClassification
from transformers import TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax
import csv
import urllib.request

# Preprocess text: replace user mentions with "@user" and links with "http"
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

# Tasks:
# emoji, emotion, hate, irony, offensive, sentiment
# stance/abortion, stance/atheism, stance/climate, stance/feminist, stance/hillary

task='sentiment'
MODEL = f"cardiffnlp/twitter-roberta-base-{task}"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

# Download the label mapping (tab-separated: label id, label name)
mapping_link = f"https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/{task}/mapping.txt"
with urllib.request.urlopen(mapping_link) as f:
    lines = f.read().decode('utf-8').split("\n")
labels = [row[1] for row in csv.reader(lines, delimiter='\t') if len(row) > 1]

# PT
#model = AutoModelForSequenceClassification.from_pretrained(MODEL)
#model.save_pretrained(MODEL)

#text = "Good night 😊"
#text = preprocess(text)
#encoded_input = tokenizer(text, return_tensors='pt')
#output = model(**encoded_input)
#scores = output[0][0].detach().numpy()
#scores = softmax(scores)

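# TF
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
# Save the model and tokenizer locally, in a directory named after the checkpoint
model.save_pretrained(MODEL)
tokenizer.save_pretrained(MODEL)
text = "Good night 😊"
text = preprocess(text)  # no-op for this text; mirrors the PT example above
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
scores = output[0][0].numpy()  # logits for the single input text
scores = softmax(scores)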

# Print the labels from highest to lowest score
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    label = labels[ranking[i]]
    score = scores[ranking[i]]
    print(f"{i+1}) {label} {np.round(float(score), 4)}")


All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

All the layers of TFRobertaForSequenceClassification were initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


1) positive 0.8466
2) neutral 0.1458
3) negative 0.0076
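
The softmax-and-rank steps in the cell above are repeated for every new text, so they could be folded into one helper. The following is a minimal sketch that uses only NumPy; `rank_labels` is a hypothetical name that does not appear in the notebook, and the logits are made-up values, not real model output.

```python
import numpy as np

def rank_labels(logits, labels):
    """Return (label, probability) pairs sorted from most to least likely."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax: subtract the max before exponentiating.
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # argsort is ascending, so reverse it to get the highest score first.
    order = np.argsort(probs)[::-1]
    return [(labels[i], float(probs[i])) for i in order]

# Hypothetical logits for illustration only.
example = rank_labels([-1.0, 0.5, 2.0], ["negative", "neutral", "positive"])
for rank, (label, prob) in enumerate(example, start=1):
    print(f"{rank}) {label} {prob:.4f}")
```

In the notebook this would be called as `rank_labels(output[0][0].numpy(), labels)`, assuming `output` and `labels` are defined as in the cell above.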


# New Section

In [6]:
text = "Call with @SecBlinken on further steps to strengthen Ukraine’s defense capabilities. Grateful to the U.S. for the new package of tough sanctions on Russia. Pressure must be elevating until Russia stops its brutal aggression and barbaric war crimes against Ukrainians."
# Note: preprocess() is not applied here, so "@SecBlinken" is tokenized as-is
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
scores = output[0][0].numpy()
scores = softmax(scores)

# Print the labels from highest to lowest score
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    label = labels[ranking[i]]
    score = scores[ranking[i]]
    print(f"{i+1}) {label} {np.round(float(score), 4)}")

1) neutral 0.5717
2) positive 0.3135
3) negative 0.1147
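
The tweet above contains a user mention, which the `preprocess` function from the first cell is meant to mask. The sketch below repeats that function so it runs on its own and shows what it does to a shortened version of the tweet. The second example uses a made-up URL (`https://example.com/x`) purely for illustration. No model is run here, so nothing is claimed about how masking would change the scores.

```python
def preprocess(text):
    """Replace user mentions with '@user' and links with 'http'."""
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

tweet = "Call with @SecBlinken on further steps to strengthen Ukraine’s defense capabilities."
masked = preprocess(tweet)
print(masked)
# → Call with @user on further steps to strengthen Ukraine’s defense capabilities.

# Hypothetical link, not part of the original tweet.
with_link = preprocess("Details at https://example.com/x")
print(with_link)
# → Details at http
```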


In [9]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

ARTICLE="""News of the resignations comes as the US warned Chinese companies not to breach restrictions on technology exports to Russia.

China abstained on a United Nations resolution condemning Russia's invasion but its government has also recently expressed "regret" about the military action, saying it was extremely concerned about the harm to civilians.

Commerce Secretary Gina Raimondo told the New York Times Washington could take "devastating" action against Chinese companies that defied Russian sanctions, prohibiting the use of US equipment and software needed to make their products.

Russia "is certainly going to be courting other countries to do an end run around our sanctions and export controls", Ms Raimondo told the newspaper.

The threats echo measures taken against Huawei in 2020, when Donald Trump's administration added the company to its "entity list", which bans it from acquiring technology from US companies without government approval.

The US government said at the time it believed Huawei posed a national security threat, something the company strongly denied.

But the restrictions hit the company's earnings hard and deprived it of access to key technologies."""

# max_length and min_length are measured in tokens, so the summary can end mid-sentence
print(summarizer(ARTICLE, max_length=40, min_length=10, do_sample=False))



[{'summary_text': 'China abstained on a United Nations resolution condemning Russia\'s invasion. But its government has expressed "regret" about the military action. News of the resignations comes as the US warned'}]
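
The summary above stops mid-sentence ("as the US warned") because generation hits the token limit. One possible post-processing step, sketched here as an assumption rather than anything the notebook does, is to drop the trailing fragment after the last sentence-ending punctuation mark. `trim_to_last_sentence` is a hypothetical helper; it is deliberately naive and would also treat abbreviations such as "U.S." as sentence ends.

```python
import re

def trim_to_last_sentence(summary):
    """Drop a trailing fragment that does not end in '.', '!' or '?'."""
    # Sentence ends: punctuation followed by whitespace or end of string.
    matches = list(re.finditer(r"[.!?](?=\s|$)", summary))
    if not matches:
        return summary  # no complete sentence found; leave the text unchanged
    return summary[:matches[-1].end()]

# The summary text printed by the cell above.
raw_summary = (
    "China abstained on a United Nations resolution condemning Russia's invasion. "
    'But its government has expressed "regret" about the military action. '
    "News of the resignations comes as the US warned"
)
trimmed = trim_to_last_sentence(raw_summary)
print(trimmed)
```

Raising `max_length` is the more direct fix when a longer summary is acceptable; trimming only tidies the output at the current length.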
