## SBRT 2024 - **An Introduction to Generative Artificial Intelligence with Applications in Telecommunications**

In [None]:
!pip install transformers

In [4]:
!pip install datasets

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

In [2]:
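from transformers import pipeline

# No model is given, so the pipeline falls back to a default checkpoint (see the warning below)
classifier = pipeline("sentiment-analysis")

res = classifier("I will not say anything")

print(res)  # one {'label': ..., 'score': ...} dict per input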

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'label': 'NEGATIVE', 'score': 0.9992280006408691}]
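The `score` above is a probability, not a raw model output. For a single-label classifier the pipeline applies a softmax to the model's logits and reports the most likely label. The sketch below reproduces only that postprocessing step in plain Python; the logits are hypothetical, and the `{0: "NEGATIVE", 1: "POSITIVE"}` mapping is an assumption (check `model.config.id2label` for the real one).

```python
import math

# Assumed label mapping; inspect model.config.id2label for the actual checkpoint.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, id2label=ID2LABEL):
    """Turn raw classifier logits into a pipeline-style [{'label', 'score'}] result."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
    return [{"label": id2label[best], "score": probs[best]}]


# Hypothetical logits, not produced by the real model.
print(postprocess([3.9, -3.2]))
```

A large gap between the two logits is what pushes the reported score close to 1.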


In [3]:
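from transformers import pipeline

# distilgpt2 is a small, distilled version of GPT-2
generator = pipeline("text-generation", model="distilgpt2")

# max_length caps the total length (prompt + generated tokens);
# num_return_sequences sets how many continuations are returned
res = generator("I am a bad person and I will", max_length=300, num_return_sequences=1)

print(res)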

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


[{'generated_text': 'I am a bad person and I will just let her know that she needs to know that she is scared, upset and overwhelmed by the problems that come with her," she said, according to the report."Her behavior is a little like \'I want to come back, I want to see this happen, and I\'ve been through a couple of days right now with such a difficult time."'}]
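Text generation is a loop that appends one token at a time until an end-of-sequence token or a length limit is reached. In `transformers`, `max_length` counts the prompt tokens too, while `max_new_tokens` counts only the generated ones. The sketch below illustrates that difference with a hypothetical next-token table and whitespace "tokens" (GPT-2 actually uses byte-level BPE). It picks the most likely token greedily for determinism; the real pipeline may sample instead, depending on the model's generation settings.

```python
# Toy next-token rule (hypothetical): maps the last token to the most likely next one.
NEXT = {"will": "just", "just": "let", "let": "her", "her": "know", "know": "<eos>"}
EOS = "<eos>"


def greedy_generate(prompt_tokens, max_length=None, max_new_tokens=None):
    """Greedy decoding loop.

    max_length counts prompt + generated tokens; max_new_tokens counts only the
    generated ones (and wins if both are given)."""
    if max_new_tokens is not None:
        limit = len(prompt_tokens) + max_new_tokens
    elif max_length is not None:
        limit = max_length
    else:
        raise ValueError("set max_length or max_new_tokens")
    tokens = list(prompt_tokens)
    while len(tokens) < limit:
        nxt = NEXT.get(tokens[-1], EOS)  # unknown context: end the sequence
        tokens.append(nxt)
        if nxt == EOS:
            break
    return tokens


PROMPT = "I am a bad person and I will".split()  # 8 whitespace "tokens"
print(greedy_generate(PROMPT, max_length=10))      # prompt (8 tokens) + 2 new tokens
print(greedy_generate(PROMPT, max_new_tokens=10))  # stops early at <eos>
```

With `max_length=300` the 8 prompt tokens already use part of the budget, which is why `max_new_tokens` is usually the less surprising parameter.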


In [None]:
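from transformers import pipeline

# Zero-shot classification scores labels the model was never explicitly trained on
classifier = pipeline("zero-shot-classification")

res = classifier("This is my candidate: I want my university degree",
                 candidate_labels=["education", "politics", "nerd"])

print(res)  # {'sequence': ..., 'labels': [...], 'scores': [...]}, labels sorted by score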
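Under the hood, the zero-shot pipeline relies on a natural-language-inference model: each candidate label is turned into a hypothesis (by default `"This example is {}."`), paired with the input as the premise, and the entailment logits are compared across labels. The sketch below shows that scoring step for the default single-label mode, where the entailment logits are softmaxed across the candidate labels. The entailment logits are hypothetical stand-ins for what the NLI model would output.

```python
import math

HYPOTHESIS_TEMPLATE = "This example is {}."  # the pipeline's default template


def build_pairs(sequence, labels, template=HYPOTHESIS_TEMPLATE):
    """Pair the input (premise) with one hypothesis per candidate label."""
    return [(sequence, template.format(label)) for label in labels]


def rank_labels(labels, entailment_logits):
    """Single-label mode: softmax the entailment logits across labels, sort descending."""
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    return {"labels": [labels[i] for i in order], "scores": [scores[i] for i in order]}


labels = ["education", "politics", "nerd"]
pairs = build_pairs("This is my candidate: I want my university degree", labels)
# Hypothetical entailment logits; a real NLI model would produce one per pair.
result = rank_labels(labels, [2.5, 0.3, -1.0])
print(pairs[0])
print(result)
```

Because the scores are normalized across labels, they always sum to 1; with `multi_label=True` the pipeline instead scores each label independently.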

In [None]:
import numpy as np
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("glue", "cola")
dataset = dataset["train"]  # Just take the training split for now

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
# padding=True pads every sentence to the longest one in the batch
tokenized_data = tokenizer(dataset["sentence"], return_tensors="np", padding=True)
# Tokenizer returns a BatchEncoding, but we convert that to a dict for Keras
tokenized_data = dict(tokenized_data)

print(tokenized_data)

labels = np.array(dataset["label"])  # CoLA labels are already 0 (unacceptable) or 1 (acceptable)

print(labels)
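With `padding=True`, every sentence is padded to the length of the longest one, and the `attention_mask` marks which positions hold real tokens (1) and which hold padding (0). The sketch below reproduces that idea in plain Python on hypothetical token ids; it assumes the padding id is 0 (as for `[PAD]` in the BERT vocabularies) and right-side padding, and it ignores the `token_type_ids` that BERT tokenizers also return.

```python
PAD_ID = 0  # assumption: [PAD] has id 0, as in the BERT vocabularies


def pad_batch(batch_ids, pad_id=PAD_ID):
    """Right-pad every sequence to the longest one and build the matching attention mask."""
    longest = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = longest - len(ids)
        input_ids.append(list(ids) + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids for two sentences of different length.
batch = pad_batch([[101, 1109, 1992, 102], [101, 1109, 102]])
print(batch)
```

Padding to the longest sentence keeps the batch rectangular (as NumPy arrays require) while the mask lets the model ignore the filler positions.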

In [None]:
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSequenceClassification

classifier = pipeline("sentiment-analysis")

res = classifier("I am a bad person!")

print(res)

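# Load the same default checkpoint explicitly (its Hub id is namespaced under distilbert/)
model_name = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

res = classifier("I am a bad person!")

print(res)

sequence = "I am a bad person!"
# Full encoding: input_ids (with [CLS]/[SEP] added) plus attention_mask
res = tokenizer(sequence)
print(res)

# Split the text into subword tokens (no special tokens added)
tokens = tokenizer.tokenize(sequence)
print(tokens)

# Map each token to its vocabulary id
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

# Map the ids back to text (lowercased, since the model is uncased)
decoded_string = tokenizer.decode(ids)
print(decoded_string)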
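The `tokenize` → `convert_tokens_to_ids` → `decode` round trip above relies on WordPiece: words are lowercased (for uncased models), split on whitespace and punctuation, and each word is broken greedily into the longest pieces found in the vocabulary, with continuation pieces prefixed by `##`. The sketch below implements that greedy longest-match-first idea with a hypothetical toy vocabulary and toy ids; the real tokenizer has a ~30k-entry vocabulary, more normalization rules, and also cleans up spaces before punctuation when decoding, which this sketch does not.

```python
# Toy vocabulary (hypothetical); list position is the token id.
VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "i", "am", "a", "bad", "person", "!",
         "un", "##hap", "##py"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def wordpiece(word, vocab=TOKEN_TO_ID, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:  # try the longest remaining substring first
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text):
    """Lowercase, split on whitespace and punctuation, then apply WordPiece."""
    words = []
    for chunk in text.lower().split():
        word = ""
        for ch in chunk:
            if ch.isalnum():
                word += ch
            else:  # punctuation becomes its own word
                if word:
                    words.append(word)
                    word = ""
                words.append(ch)
        if word:
            words.append(word)
    return [p for w in words for p in wordpiece(w)]


def encode(text):
    """Mimic tokenizer(text)['input_ids']: wrap the piece ids in [CLS] ... [SEP]."""
    ids = [TOKEN_TO_ID[t] for t in tokenize(text)]
    return [TOKEN_TO_ID["[CLS]"]] + ids + [TOKEN_TO_ID["[SEP]"]]


def decode(ids):
    """Map ids back to pieces and glue '##' continuations onto the previous piece."""
    text = ""
    for tok in (VOCAB[i] for i in ids):
        if tok.startswith("##"):
            text += tok[2:]
        else:
            text += (" " if text else "") + tok
    return text


print(tokenize("I am a bad person!"), encode("I am a bad person!"))
print(tokenize("unhappy"))
```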


In [None]:
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

text = """Remo visited Paysandu this Saturday and won the Para derby 3-2 in
an electrifying duel for the 9th round of the Brazilian Championship Series C.
In a lively second half, Leao went 2-0 up with goals from Helio and Marlon, but
saw Papao react and equalize in the final minutes through Wesley Matos and Nicolas.
In the 43rd minute, Wallace scored the third goal and sealed the victory. With the
result, Remo took the provisional lead of Group A with 16 points, ahead of Santa Cruz
(15 points), who have yet to play this round and can retake the lead. Paysandu, on
the other hand, remained on 11 points in 5th place, outside the quarterfinal
qualification zone.
"""

# max_length/min_length bound the summary length in tokens;
# do_sample=False makes decoding deterministic (no random sampling)
summary = summarizer(text, max_length=20, min_length=5, do_sample=False)

print(summary)
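With `do_sample=False`, the summary is decoded deterministically: greedy search if one beam is used, or beam search if the checkpoint's generation config sets `num_beams` above 1. Beam search keeps the few best partial sequences by cumulative log-probability instead of committing to the single best next token. The toy sketch below uses hypothetical next-token probabilities chosen so that greedy decoding (`num_beams=1`) and a 2-beam search disagree; it omits the length penalty and other details of the `transformers` implementation.

```python
import math

EOS = "<eos>"

# Hypothetical next-token probabilities keyed by the prefix generated so far;
# any prefix not listed ends the sequence with probability 1.
TABLE = {
    (): {"Remo": 0.5, "Leao": 0.4, EOS: 0.1},
    ("Remo",): {"won": 0.4, "beat": 0.3, EOS: 0.3},
    ("Leao",): {"won": 0.9, EOS: 0.1},
}


def next_logprobs(prefix):
    """Toy model: log-probability of each possible next token given the prefix."""
    probs = TABLE.get(tuple(prefix), {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


def beam_search(num_beams, max_steps=10):
    """Keep the num_beams best partial sequences; num_beams=1 is greedy decoding."""
    beams = [([], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_steps):
        candidates = [
            (tokens + [tok], score + lp)
            for tokens, score in beams
            for tok, lp in next_logprobs(tokens).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            if tokens[-1] == EOS:
                finished.append((tokens, score))  # complete hypothesis
            else:
                beams.append((tokens, score))  # still growing
        if not beams:
            break
    finished.extend(beams)  # sequences still open after max_steps also compete
    return max(finished, key=lambda c: c[1])


for k in (1, 2):
    tokens, score = beam_search(k)
    print(k, tokens, round(math.exp(score), 2))
```

Greedy decoding commits to "Remo" (0.5) and ends with probability 0.2, while two beams also explore "Leao" and find a sequence with probability 0.36.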