### Natural Language Processing with Hugging Face Transformers

In [1]:
!pip install torch



In [2]:
!pip install --upgrade torch



In [3]:
!pip install -q transformers

In [4]:
!pip install datasets evaluate transformers[sentencepiece]



In [5]:
!pip install sacremoses



In [6]:
import warnings
warnings.filterwarnings('ignore')

In [7]:
from transformers import pipeline
from transformers import AutoTokenizer
from transformers import AutoModel




### Let's Practice

### Exercise 1 - Sentiment Analysis

For sentiment analysis, we can also use a specific model that is better suited to our use case by providing the name of the model. For example, if we want a sentiment analysis model for tweets, we can specify the following model id: "cardiffnlp/twitter-roberta-base-sentiment". This model has been trained on ~58M tweets and fine-tuned for sentiment analysis with the "TweetEval" benchmark. The output labels for this model are: 0 -> Negative; 1 -> Neutral; 2 -> Positive.

In this exercise, use the "cardiffnlp/twitter-roberta-base-sentiment" model, pre-trained on tweets, to analyze a tweet of your choice. Optionally, run the default model (used in Example 1) on the same tweet to see whether the result changes.

In [8]:
data = "Artificial intelligence and automation are already causing friction in the workforce. Should schools revamp existing programs for topics like #AI, or are new research areas required?"

# Twitter-specific RoBERTa checkpoint; it returns raw label ids
# (LABEL_0 = Negative, LABEL_1 = Neutral, LABEL_2 = Positive)
specific_model = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment")
results_specific = specific_model(data)

print("Cardiff NLP Twitter Roberta:")
print(f"Sentiment: {results_specific[0]['label']}")
print(f"Score: {results_specific[0]['score']}")



Cardiff NLP Twitter Roberta:
Sentiment: LABEL_1
Score: 0.5272255539894104
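The Cardiff checkpoint reports raw label ids such as `LABEL_1`. A minimal sketch for turning them into readable names, assuming the 0 -> Negative, 1 -> Neutral, 2 -> Positive mapping quoted above; `CARDIFF_LABELS` and `readable_results` are hypothetical helper names, not part of the transformers API:

```python
# Hypothetical helper: mapping taken from the label list quoted for
# cardiffnlp/twitter-roberta-base-sentiment (0 -> Negative, 1 -> Neutral, 2 -> Positive)
CARDIFF_LABELS = {"LABEL_0": "Negative", "LABEL_1": "Neutral", "LABEL_2": "Positive"}


def readable_results(results, mapping=CARDIFF_LABELS):
    """Replace raw label ids in pipeline output with readable names.

    Labels missing from the mapping are passed through unchanged.
    """
    return [{**r, "label": mapping.get(r["label"], r["label"])} for r in results]


# Example using the output printed above
print(readable_results([{"label": "LABEL_1", "score": 0.5272255539894104}]))
```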


In [9]:
data = "Artificial intelligence and automation are already causing friction in the workforce. Should schools revamp existing programs for topics like #AI, or are new research areas required?"

# No model specified: the pipeline falls back to its default checkpoint (see the message below)
original_model = pipeline("sentiment-analysis")
results_original = original_model(data)

print("Original Model:")
print(f"Sentiment: {results_original[0]['label']}")  
print(f"Score: {results_original[0]['score']}")


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Original Model:
Sentiment: NEGATIVE
Score: 0.9989722967147827
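The two models disagree partly because they choose from different label sets: the default SST-2 model only knows POSITIVE and NEGATIVE, so it has no way to answer "neutral". For these single-label classifiers the reported score is the softmax probability of the top label over the model's own labels. The sketch below uses made-up logits (hypothetical, not taken from either model) to show that the top score of a two-label softmax is always at least 0.5, while with three labels it can be as low as 1/3:

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits, for illustration only
binary = softmax([2.0, -1.0])          # [NEGATIVE, POSITIVE]
three_way = softmax([1.0, 1.5, -1.0])  # [Negative, Neutral, Positive]

print("binary:", [round(p, 3) for p in binary])
print("three-way:", [round(p, 3) for p in three_way])
```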


### Exercise 2 - Topic Classification

In this exercise, classify a sentence of your choice into classes/topics of your choice. Use the "zero-shot-classification" task with model="facebook/bart-large-mnli".

In [10]:
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
classifier(
    "I like eating pie",
    candidate_labels=["Vegetables", "education", "cake"],
)

{'sequence': 'I like eating pie',
 'labels': ['cake', 'Vegetables', 'education'],
 'scores': [0.4752267599105835, 0.27782317996025085, 0.24695000052452087]}
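Zero-shot classification with an NLI model such as facebook/bart-large-mnli pairs the input (as the premise) with one hypothesis per candidate label, built from a template; the pipeline's default template is "This example is {}.". In single-label mode the entailment logits of all hypotheses are then normalized with a softmax, which is why the three scores above sum to 1. A simplified sketch of those two steps, with made-up logits standing in for the model (the function names are hypothetical):

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """One NLI hypothesis per candidate label."""
    return [template.format(label) for label in labels]


def rank_labels(entailment_logits):
    """Softmax over each label's entailment logit, sorted from most to least likely.

    entailment_logits: dict mapping label -> entailment logit (hypothetical values here).
    """
    m = max(entailment_logits.values())
    exps = {label: math.exp(logit - m) for label, logit in entailment_logits.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)
    return {"labels": [label for label, _ in ranked],
            "scores": [e / total for _, e in ranked]}


labels = ["Vegetables", "education", "cake"]
print(build_hypotheses(labels))
# Made-up logits, not real model outputs
print(rank_labels({"Vegetables": 0.2, "education": 0.0, "cake": 1.0}))
```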

### Exercise 3 - Text Generation Models
In this exercise, use the 'text-generation' task with the 'gpt2' model to complete a sentence of your choice. Set `num_return_sequences` to the number of completions you want returned.

In [11]:
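# Outputs are sampled, so they change from run to run.
# max_length counts the prompt tokens as well as the generated ones.
generator = pipeline("text-generation", model="gpt2")
generator("I'm in Jambi now", max_length=20, num_return_sequences=5)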

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "I'm in Jambi now—I'm sure you've been there. You've got your"},
 {'generated_text': "I'm in Jambi now and I will be going by my usual rules, such as eating"},
 {'generated_text': 'I\'m in Jambi now!" The two men had come to the site of Jambi'},
 {'generated_text': 'I\'m in Jambi now. I\'m not ready to die, I\'m dead." ('},
 {'generated_text': 'I\'m in Jambi now," says the boy. "So I\'m really hoping some of'}]
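The completions differ because GPT-2 generates one token at a time and samples each next token, and each of the `num_return_sequences` outputs is sampled independently. `max_length` counts the prompt tokens as well as the new ones (`max_new_tokens` limits only the continuation). The toy sketch below mimics that loop with a hand-written, hypothetical next-word table instead of GPT-2, so it illustrates only the control flow, not the model:

```python
import random

# Hypothetical toy "model": possible next words for a given word.
# Words with no entry end the sequence, like an end-of-sequence token.
TOY_NEXT_WORDS = {
    "now": ["and", "I", "."],
    "and": ["I", "the"],
    "I": ["am", "will"],
    "am": ["happy", "here"],
    "will": ["stay", "eat"],
    "the": ["river", "city"],
}


def toy_generate(prompt, max_length=20, num_return_sequences=1, seed=None):
    """Sample continuations word by word; max_length includes the prompt words."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_return_sequences):
        words = prompt.split()
        while len(words) < max_length:
            candidates = TOY_NEXT_WORDS.get(words[-1])
            if not candidates:
                break
            words.append(rng.choice(candidates))  # sample the next word
        outputs.append({"generated_text": " ".join(words)})
    return outputs


for out in toy_generate("I'm in Jambi now", max_length=8, num_return_sequences=3, seed=0):
    print(out)
```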

### Exercise 4 - Named Entity Recognition
In this exercise, use the Named Entity Recognition (NER) task to extract person, location and organization entities from a sentence of your choice. Specify the model as "Jean-Baptiste/camembert-ner".


In [12]:
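# aggregation_strategy="simple" merges sub-word tokens into whole entities
# (it replaces the deprecated grouped_entities=True)
nlp = pipeline("ner", model="Jean-Baptiste/camembert-ner", aggregation_strategy="simple")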
example = "My name is Rizki and I live in Jambi city"

ner_results = nlp(example)
print(ner_results)

[{'entity_group': 'PER', 'score': 0.98433656, 'word': 'Rizki', 'start': 10, 'end': 16}, {'entity_group': 'LOC', 'score': 0.9981303, 'word': 'Jambi', 'start': 30, 'end': 36}]
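The model itself labels individual sub-word tokens; simple aggregation (`aggregation_strategy="simple"`, which `grouped_entities=True` maps to) merges consecutive tokens of the same entity type into one group and averages their scores. A simplified sketch of that grouping step (not the library's exact code), using hypothetical token predictions with character offsets into the example sentence; `group_entities` is a hypothetical name:

```python
def group_entities(text, tokens):
    """Merge consecutive same-type token predictions into entity groups.

    tokens: dicts with 'entity' (e.g. 'I-PER', 'B-LOC' or 'O'), 'score', 'start', 'end'.
    A 'B-' tag always starts a new group; 'O' tokens close the current group.
    """
    groups, current = [], None
    for tok in tokens:
        if tok["entity"] == "O":
            current = None
            continue
        prefix, _, etype = tok["entity"].partition("-")
        if current is not None and current["entity_group"] == etype and prefix != "B":
            current["scores"].append(tok["score"])
            current["end"] = tok["end"]
        else:
            current = {"entity_group": etype, "scores": [tok["score"]],
                       "start": tok["start"], "end": tok["end"]}
            groups.append(current)
    return [
        {
            "entity_group": g["entity_group"],
            "score": sum(g["scores"]) / len(g["scores"]),  # mean of token scores
            "word": text[g["start"]:g["end"]].strip(),
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]


example = "My name is Rizki and I live in Jambi city"
# Hypothetical token-level predictions, for illustration only
tokens = [
    {"entity": "O", "score": 0.99, "start": 0, "end": 2},
    {"entity": "I-PER", "score": 0.98, "start": 11, "end": 14},  # "Riz"
    {"entity": "I-PER", "score": 0.96, "start": 14, "end": 16},  # "ki"
    {"entity": "O", "score": 0.99, "start": 17, "end": 20},
    {"entity": "I-LOC", "score": 0.99, "start": 31, "end": 36},  # "Jambi"
]
print(group_entities(example, tokens))
```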


### Exercise 5 - Question Answering
In this exercise, use the "distilbert-base-cased-distilled-squad" model to extract information from a context passage of your choice by asking a question about it.


In [13]:
question_answerer = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
question_answerer(
    question="Where is Jambi located??",
    context="Jambi is located on the island of Sumatra, Indonesia. To be precise, Jambi is in the central part of the island of Sumatra, on the east coast.",
)

{'score': 0.6591984629631042,
 'start': 34,
 'end': 52,
 'answer': 'Sumatra, Indonesia'}
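The pipeline is extractive: `start` and `end` are character offsets into `context`, so the answer is always a substring of the passage. Internally the model scores every token as a possible start and end of the answer, and the pipeline picks the highest-scoring valid span (end not before start, length capped by `max_answer_len`, 15 by default). A simplified sketch, with hypothetical logits and a hypothetical `best_span` helper:

```python
context = (
    "Jambi is located on the island of Sumatra, Indonesia. To be precise, "
    "Jambi is in the central part of the island of Sumatra, on the east coast."
)
result = {"score": 0.6591984629631042, "start": 34, "end": 52, "answer": "Sumatra, Indonesia"}

# The answer is the slice of the context between the returned offsets
print(repr(context[result["start"]:result["end"]]))


def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end) token indices maximizing start_logit + end_logit.

    Ranking by the logit sum gives the same order as multiplying the
    softmax start and end probabilities.
    """
    best_score, best = float("-inf"), (0, 0)
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            if s + end_logits[j] > best_score:
                best_score, best = s + end_logits[j], (i, j)
    return best


# Hypothetical token logits, for illustration only
print(best_span([0.1, 2.0, 0.3], [0.2, 0.5, 3.0]))
```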

### Exercise 6 - Text Summarization
In this exercise, summarize a document or paragraph of your choice using the "sshleifer/distilbart-cnn-12-6" model.


In [1]:
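from transformers import pipeline

# max_length caps the summary length in tokens; the output below is cut off because it hit this limit
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6", max_length=56)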
summarizer(
    """The origin of the name "Jambi" has several versions. The first version, the name Jambi appeared when this area was ruled by Putri Selaras Pinang Masak, where the word "pinang" was called "jambe" in Old Javanese. The second version, the name Jambi comes from the many areca palm trees along the Batanghari River. The third version, the name Jambi is associated with the word "jambi" in Malay which means "low land".
Regardless of which version is correct, Jambi has a long and rich history engraved in various cultural relics, such as the Muaro Jambi Temple and the Jambi Malay Kingdom..
"""
)



[{'summary_text': ' The origin of the name "Jambi" has several versions . The name Jambi appeared when this area was ruled by Putri Selaras Pinang Masak, where the word "pinang" was called "jambe" in Old Javan'}]
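The summary above stops mid-word ("Old Javan") because generation hit the `max_length=56` token limit; raising `max_length` is the direct fix. The output also has a space before periods ("versions ."). If you keep a short limit, a small post-processing step can tidy the text. A minimal sketch with a hypothetical `tidy_summary` helper that removes spaces before punctuation and drops an unfinished trailing sentence:

```python
import re


def tidy_summary(text):
    """Clean up a generated summary (hypothetical post-processing helper)."""
    # Remove whitespace before punctuation, e.g. "versions ." -> "versions."
    text = re.sub(r"\s+([.,;:!?])", r"\1", text.strip())
    # Drop a trailing fragment that was cut off by max_length
    last_stop = max(text.rfind("."), text.rfind("!"), text.rfind("?"))
    if last_stop != -1 and last_stop < len(text) - 1:
        text = text[: last_stop + 1]
    return text


raw = (' The origin of the name "Jambi" has several versions . The name Jambi appeared when this '
       'area was ruled by Putri Selaras Pinang Masak, where the word "pinang" was called "jambe" in Old Javan')
print(tidy_summary(raw))
```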

### Exercise 7 - Translation
In this exercise, translate a sentence of your choice from English to German. "translation_en_to_de" is the task name passed to `pipeline`; the model used here is "t5-small".


In [1]:
from transformers import pipeline
translator = pipeline("translation_en_to_de", model="t5-small")
print(translator("Meatballs are my favorite food", max_length=50))



[{'translation_text': 'Fleischballs sind mein Lieblingsgericht'}]
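T5 is a text-to-text model: a single checkpoint handles several tasks, and the task is selected by a text prefix on the input. For t5-small the `translation_en_to_de` task uses the prefix "translate English to German: ", which the pipeline adds for you; if you call the model directly (for example through `AutoTokenizer` and `AutoModelForSeq2SeqLM`), you add it yourself. A minimal sketch of that input construction; the prefix table is written out by hand from t5-small's task-specific settings and `t5_input` is a hypothetical helper:

```python
# Task prefixes for T5, copied by hand from t5-small's task-specific settings
T5_TASK_PREFIXES = {
    "summarization": "summarize: ",
    "translation_en_to_de": "translate English to German: ",
    "translation_en_to_fr": "translate English to French: ",
    "translation_en_to_ro": "translate English to Romanian: ",
}


def t5_input(task, text):
    """Prepend the T5 task prefix so the model knows which task to perform."""
    return T5_TASK_PREFIXES[task] + text


print(t5_input("translation_en_to_de", "Meatballs are my favorite food"))
```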
