## SBrT 2024 - **An Introduction to Generative Artificial Intelligence with Applications in Telecommunications**

In [20]:
!pip install transformers



In [21]:
!pip install datasets



## SBrT'24 - Pipeline: Text generation

In [22]:
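from transformers import pipeline

# Load a text-generation pipeline backed by the small distilgpt2 model
generator = pipeline("text-generation", model="distilgpt2")

# max_length caps the total length (prompt plus generated tokens)
res = generator("I am a good person and I will",
                max_length=300, num_return_sequences=1)

print(res)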

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'I am a good person and I will be pleased as the opportunity comes," she says.'}]


## SBrT'24 - Pipeline: Zero-shot classification

In [23]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

res = classifier("This is my candidate: I support her because she is left-wing!",
                 candidate_labels=["education", "politics", "economy"])

print(res)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is my candidate: I support her because she is left-wing!', 'labels': ['politics', 'economy', 'education'], 'scores': [0.9677773118019104, 0.021402737125754356, 0.010819985531270504]}


## SBrT'24 - Pipeline: Sentiment analysis

In [24]:
from transformers import pipeline

# High-level usage: the pipeline selects a default model and tokenizer for the task
classifier = pipeline("sentiment-analysis")

res = classifier("I am a good person!")

print(res)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.999876856803894}]


In [25]:
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Lower-level usage: load the model and tokenizer explicitly, then pass them to the pipeline
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

res = classifier("I am a good person!")

print(res)

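# Full tokenizer call: returns the input ids, including the special
# tokens [CLS] (101) and [SEP] (102), plus the attention mask
sequence = "I am a good person!"
res = tokenizer(sequence)
print(res)

# Step 1: split the text into tokens (lowercased, because the model is uncased)
tokens = tokenizer.tokenize(sequence)
print(tokens)

# Step 2: map each token to its vocabulary id (no special tokens are added here)
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

# Step 3: map the ids back to text
decoded_string = tokenizer.decode(ids)
print(decoded_string)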


[{'label': 'POSITIVE', 'score': 0.999876856803894}]
{'input_ids': [101, 1045, 2572, 1037, 2204, 2711, 999, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1]}
['i', 'am', 'a', 'good', 'person', '!']
[1045, 2572, 1037, 2204, 2711, 999]
i am a good person!


## SBrT'24 - Pipeline: Summarization

About the "do_sample" parameter:

do_sample=True: The model samples the next token from the probability distribution over all possible tokens. This introduces randomness, so repeated runs can produce different, more diverse outputs.

do_sample=False (default): The model uses greedy decoding or beam search. In greedy decoding, the model selects the highest-probability token at each step; beam search instead keeps several candidate sequences and returns the most probable one. Both are deterministic, so the output is repeatable but less varied.
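
The difference between the two settings can be illustrated without loading a model. The sketch below is only a toy: the token names and probabilities are made up, and `greedy_pick` and `sample_pick` are hypothetical helpers, not part of the transformers API. It compares always taking the most probable token with drawing a token at random in proportion to its probability.

```python
import random

# Toy next-token distribution (made-up values, for illustration only)
next_token_probs = {"win": 0.6, "lose": 0.3, "draw": 0.1}

def greedy_pick(probs):
    """Analogue of do_sample=False with greedy decoding: take the most probable token."""
    return max(probs, key=probs.get)

def sample_pick(probs, rng):
    """Analogue of do_sample=True: draw one token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the sampled draws are repeatable

greedy_choices = [greedy_pick(next_token_probs) for _ in range(5)]
sampled_choices = [sample_pick(next_token_probs, rng) for _ in range(5)]

print(greedy_choices)   # the same token every time
print(sampled_choices)  # tokens drawn at random, so they may differ between draws
```

A real model repeats this choice once per generated token, with a new distribution at each step, so small differences in sampling can lead to very different texts.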

In [26]:
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

text = """Remo visited Paysandu this Saturday and won the Para classic game 3-2, in
an electrifying duel for the 9th round of the Brazilian Championship Series C.
In a lively second half, Leao opened the score 2-0 with Helio and Marlon, but
saw Papao react and tie in the final minutes, with Wesley Matos and Nicolas.
At 43, Wallace scored the third and sealed the victory. Remo won the classic game
against Paysandu in Series C. With the result, Remo took the provisional lead
of group A with 16 points, against 15 for Santa Cruz, who still plays in the
round and can retake the lead. Paysandu, on the other hand, remained at 11 points,
in 5th place, outside the zone of qualifying for the quarterfinals.
"""

# do_sample=False keeps decoding deterministic (no sampling)
summary = summarizer(text, max_length=80, min_length=5, do_sample=False)

print(summary)

[{'summary_text': ' Remo beat Paysandu 3-2 in the 9th round of the Brazilian Championship Series C . Remo took the provisional lead of group A with 16 points, against 15 for Santa Cruz .'}]
