<a href="https://colab.research.google.com/github/luisgdelafuente/gnai/blob/main/Transformers_tests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generating text with GPT2
This code installs the Transformers library, then uses it to load the pre-trained "gpt2" model and create a text-generation pipeline. The pipeline uses GPT-2 to generate text that continues a prompt (an initial phrase).


In [33]:
!pip install transformers

from transformers import pipeline
model = pipeline('text-generation', model='gpt2')

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [34]:
prompt = "I wonder if there´s a higher dimensional reality"

generated_text = model(prompt, max_length=250, num_return_sequences=1)

print(generated_text[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


I wonder if there´s a higher dimensional reality. Or a higher consciousness... If that´s the case..." "No, your question is nonsense!" ―Sif and Alderaan, talking to Captain Alderaan on the bridge of the BFG-2A-R2-C, and Hoth.[src]

The BFG-2A-R2-C first appeared in the Buffalo Shipyard in the year 2000 and first appeared in The Essential Guide Trilogy. According to Alderaan, it was first seen in the Trench and First Battle Lines, and was first seen during a battle between a group of the Old Republic soldiers called the Battlecreekers.[8] On June 10, 2001, the BFG-2A-R2 crew reported to Dr. Zulka, a scientist with the BFG-2A-R2 Advanced Space System, stating that these two were not related, and that the three were speaking out of turn.

The mission was originally to investigate another anomaly, the Hoth anomaly, which had not been discovered by the BFG-2A-R2,[9] but, to date, been confirmed with the assistance of Dio, a
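
To make the roles of the prompt and `max_length` more concrete, here is a toy sketch of a generation loop: start from the prompt, repeatedly pick a next word, and stop once the total length reaches the limit. The `NEXT_WORDS` table and the `generate` helper are hypothetical stand-ins invented for illustration. GPT-2 itself predicts sub-word tokens with a neural network, not a lookup table.

```python
import random

# Hypothetical next-word table standing in for a language model.
NEXT_WORDS = {
    "i": ["wonder", "think"],
    "wonder": ["if", "why"],
    "think": ["that"],
    "if": ["there"],
    "why": ["there"],
    "that": ["there"],
    "there": ["is"],
    "is": ["more"],
}


def generate(prompt, max_length, seed=0):
    """Extend the prompt one word at a time until max_length words or a dead end."""
    rng = random.Random(seed)  # seeded so repeated runs give the same text
    words = prompt.lower().split()
    while len(words) < max_length:
        options = NEXT_WORDS.get(words[-1])
        if not options:  # no known continuation for the last word
            break
        words.append(rng.choice(options))
    return " ".join(words)


text = generate("I wonder", max_length=6)
print(text)
```

In this sketch the limit counts the prompt as well as the generated words, so a `max_length` of 6 adds only four words to a two-word prompt. The `max_length` argument in the pipeline call above behaves similarly, except that it counts tokens rather than words.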


# Sentiment analysis, default model 

When no `model` argument is given, `pipeline("sentiment-analysis")` falls back to a default checkpoint. As the log output below shows, that default is `distilbert-base-uncased-finetuned-sst-2-english`, a DistilBERT model fine-tuned on SST-2, not DistilGPT-2.

In this example, we import `pipeline`, create a sentiment analysis pipeline with `pipeline('sentiment-analysis')`, and pass it the text we want to analyze, "My new shoes are ok". The result is a list with one dictionary per input. Each dictionary contains a label (`POSITIVE` or `NEGATIVE` for this model, which has no neutral class) and a score indicating how confident the model is in that label.

In [31]:
# Most basic approach: 
from transformers import pipeline

nlp = pipeline('sentiment-analysis')
result = nlp("My new shoes are ok")
print(result)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9998170733451843}]
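
The `score` printed above is a probability. Below is a minimal sketch of how a two-class classifier's raw outputs (logits) could be turned into such a label and score, assuming a softmax over the classes. The logit values and the `softmax` helper are made up for illustration and are not taken from the model.

```python
import math


def softmax(logits):
    """Convert raw class logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one sentence, ordered as (NEGATIVE, POSITIVE).
labels = ["NEGATIVE", "POSITIVE"]
logits = [-4.2, 4.4]

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])  # index of the top class
prediction = {"label": labels[best], "score": probs[best]}
print(prediction)
```

Because the two probabilities sum to 1, a score close to 1 for one label leaves almost nothing for the other. This is why a lukewarm sentence can still receive a very confident `POSITIVE` when the model has no neutral class to choose.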


In [29]:
# Another approach 

# Install transformers library
# !pip install transformers

# Import necessary modules
from transformers import pipeline

# Create sentiment analysis pipeline with the default model (a DistilBERT SST-2 checkpoint)
sentiment_analysis = pipeline("sentiment-analysis")

# Analyze sentiment of text
text = "Im not sure if I like my new shoes!"
result = sentiment_analysis(text)

# Print sentiment label and score
print(result[0]["label"])
print(result[0]["score"])


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


NEGATIVE
0.9995133876800537


# Sentiment analysis, specific model 

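In this modified code, we load the `distilgpt2` checkpoint and its tokenizer with `AutoModelForSequenceClassification` and `AutoTokenizer`, then pass both to the sentiment-analysis pipeline through the `model` and `tokenizer` arguments. Note that `distilgpt2` is a language model that has not been fine-tuned for sentiment analysis: its classification head (`score.weight`) is newly initialized, as the warning in the output states. The label (`LABEL_0`) and score printed for "I love my new shoes!" are therefore not meaningful until the model is trained on a downstream task. For usable predictions, pass a checkpoint that was fine-tuned for classification, such as the `distilbert-base-uncased-finetuned-sst-2-english` model used in the previous section.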

In [22]:

# Install transformers library
!pip install transformers

# Import necessary modules
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

# Load the DistilGPT-2 checkpoint (with a newly initialized classification head) and its tokenizer
model_name = "distilgpt2"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create sentiment analysis pipeline
sentiment_analysis = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# Analyze sentiment of text
text = "I love my new shoes!"
result = sentiment_analysis(text)

# Print sentiment label and score
print(result[0]["label"])
print(result[0]["score"])



Some weights of the model checkpoint at distilgpt2 were not used when initializing GPT2ForSequenceClassification: ['lm_head.weight']
- This IS expected if you are initializing GPT2ForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPT2ForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at distilgpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


LABEL_0
0.02248859405517578


# Generating missing words

Here's an example of using the fill-mask pipeline in Transformers to predict a missing word in a sentence. This code uses the bert-base-uncased model to fill the `[MASK]` token in the sentence "I want to eat a [MASK] for breakfast." The pipeline returns several candidate words ranked by score (five by default), and the code prints the completed sentence for each of the top 5.

In [36]:
from transformers import pipeline

# Create the pipeline
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Sentence with a missing word, marked by BERT's [MASK] token
text = "I want to eat a [MASK] for breakfast."

# Use the pipeline to fill the missing word
results = fill_mask(text)

# Print the top 5 predictions
for result in results[:5]:
    print(result["sequence"])


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


i want to eat a sandwich for breakfast.
i want to eat a little for breakfast.
i want to eat a lot for breakfast.
i want to eat a burger for breakfast.
i want to eat a fish for breakfast.
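
Each element of `results` is a dictionary. Besides `sequence`, the fill-mask pipeline also reports a `score` and the predicted word as `token_str` for each candidate. The sketch below works on hand-written placeholder data shaped like that output, so it runs without downloading a model. The scores are invented and the `summarize` helper is hypothetical.

```python
# Placeholder results shaped like fill-mask output; the scores are invented.
sample_results = [
    {"score": 0.10, "token_str": "little",
     "sequence": "i want to eat a little for breakfast."},
    {"score": 0.25, "token_str": "sandwich",
     "sequence": "i want to eat a sandwich for breakfast."},
    {"score": 0.04, "token_str": "fish",
     "sequence": "i want to eat a fish for breakfast."},
]


def summarize(results, min_score=0.0):
    """Return (word, score) pairs at or above min_score, best first."""
    kept = [(r["token_str"], r["score"]) for r in results if r["score"] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


for word, score in summarize(sample_results, min_score=0.05):
    print(f"{word}: {score:.2f}")
```

Filtering on a minimum score is one simple way to drop weak candidates. The threshold of 0.05 here is arbitrary and would need tuning for real model output.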


# Entity recognition in a sentence

Here's sample code for named entity recognition (NER) using the pipeline() function. We specify the bert-base-cased model and its tokenizer, pass a text containing named entities to the pipeline, and print the raw result. Note that bert-base-cased is a pre-trained base model that has not been fine-tuned for NER: its token-classification head is newly initialized, so every token receives the generic label LABEL_0 with an arbitrary score, as the output shows. Meaningful entity labels require a checkpoint that was fine-tuned for token classification.

In [39]:
# Import necessary modules
from transformers import pipeline

# Create named entity recognition pipeline
ner = pipeline("ner", model="bert-base-cased", tokenizer="bert-base-cased")

# Analyze named entities in text
text = "Paris is the capital of France"
result = ner(text)

# Print the raw list of per-token predictions
print(result)


Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForTokenClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-cas

[{'entity': 'LABEL_0', 'score': 0.6753529, 'index': 1, 'word': 'Paris', 'start': 0, 'end': 5}, {'entity': 'LABEL_0', 'score': 0.7050678, 'index': 2, 'word': 'is', 'start': 6, 'end': 8}, {'entity': 'LABEL_0', 'score': 0.7615427, 'index': 3, 'word': 'the', 'start': 9, 'end': 12}, {'entity': 'LABEL_0', 'score': 0.6836092, 'index': 4, 'word': 'capital', 'start': 13, 'end': 20}, {'entity': 'LABEL_0', 'score': 0.60161495, 'index': 5, 'word': 'of', 'start': 21, 'end': 23}, {'entity': 'LABEL_0', 'score': 0.5857065, 'index': 6, 'word': 'France', 'start': 24, 'end': 30}]
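
With a checkpoint that has been fine-tuned for NER, the `entity` field typically carries BIO-style tags such as `B-LOC` and `I-LOC` rather than `LABEL_0`. The following sketch shows one way to merge such token-level tags into whole entities. The `group_entities` helper and the sample predictions are hypothetical and simplified (real pipeline output also includes scores and character offsets). The pipeline's own `aggregation_strategy` argument offers built-in grouping and is usually the better choice in practice.

```python
def group_entities(tokens):
    """Merge consecutive B-/I- tagged tokens into (text, type) spans."""
    spans = []
    current_words, current_type = [], None
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":  # token outside any entity
            prefix, ent_type = "O", None
        else:
            prefix, _, ent_type = tag.partition("-")
        # A B- tag, or an I- tag of a different type, starts a new span.
        starts_new = prefix == "B" or (prefix == "I" and ent_type != current_type)
        if prefix == "O" or starts_new:
            if current_words:  # close the span collected so far
                spans.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
        if prefix in ("B", "I"):
            current_words.append(tok["word"])
            current_type = ent_type
    if current_words:  # close a span that runs to the end of the input
        spans.append((" ".join(current_words), current_type))
    return spans


# Hypothetical token-level predictions in BIO format.
sample = [
    {"entity": "B-LOC", "word": "New"},
    {"entity": "I-LOC", "word": "York"},
    {"entity": "O", "word": "is"},
    {"entity": "O", "word": "in"},
    {"entity": "B-LOC", "word": "America"},
]

entities = group_entities(sample)
print(entities)
```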
