# NLP Pipeline Usage with Hugging Face Transformers

This notebook demonstrates several uses of Hugging Face's `transformers` library, applying the pre-trained models `t5-small`, `t5-base`, and `gpt2` to different NLP tasks. It covers the following tasks:

1. **Summarization**: Using `t5-small` (and `t5-base` with the `summarize:` prefix) to generate concise summaries of text.
2. **Text Generation**: Using `t5-base` for text-to-text generation and `gpt2` for open-ended continuation of an input prompt.
3. **Sentiment Analysis**: Classifying the sentiment of a sentence with the `sst2 sentence:` prefix.
4. **Question Answering**: Sending a `question:` prompt to `t5-base`; without a context passage, the model returns an entailment label rather than an answer.
5. **Translation**: Translating English text to French.

Each task is implemented with the `pipeline` API, which wraps model loading, tokenization, and inference behind a single call. The notebook is a starting point for exploring what the `transformers` library offers for common NLP use cases.


In [None]:
!pip install tensorflow
!pip install transformers



In [None]:
from transformers import pipeline

In [None]:
import transformers
print(transformers.__version__)

4.44.2


# t5-small model usage

In [None]:
summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small", truncation=True, framework="tf")

All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [None]:
# Upload mlflow.txt before execution
with open("mlflow.txt", "r") as _f:
    print(summarizer(_f.read()))

[{'summary_text': 'MLflow is basically a tool/platform that manages the ML life-cycle . a cookiecutter template helps streamline creating, testing, running and deploying projects . this allows the data team to focus more on the implementation of the model .'}]
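
The summarizer above is created with `truncation=True`, so text beyond the model's maximum input length is presumably cut off before it is summarized. The following is a minimal sketch of one workaround, assuming a plain word-count split is acceptable: break the text into chunks and summarize each one. The helper names `chunk_words` and `summarize_long_text` and the 300-word chunk size are illustrative assumptions, not part of the original notebook.

```python
def chunk_words(text, max_words=300):
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(summarizer, text, max_words=300):
    """Summarize each chunk with the given summarization pipeline and join the results.

    `summarizer` is expected to behave like the pipeline above: called with a
    string, it returns a list whose first item has a 'summary_text' key.
    """
    parts = [summarizer(chunk)[0]["summary_text"] for chunk in chunk_words(text, max_words)]
    return " ".join(parts)
```

With the notebook's objects this could be called as `summarize_long_text(summarizer, open("mlflow.txt").read())`. Splitting on word count ignores sentence boundaries, so a sentence-aware splitter may give cleaner summaries.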


# t5-base model usage

In [None]:
generator = pipeline("text2text-generation", model="t5-base")


In [None]:
# Summarize
generator("summarize: Machine Learning in production environments is largely seen as the ultimate goal. Sometimes, deploying models can be difficult when automation is not part of the workflow. Creating a foundational process that is reliable and automated is complex and requires commitment from the team and the organization as a whole")



[{'generated_text': 'machine learning is a key to a successful production environment . a foundational process'}]

In [None]:
# Sentiment
out_put = generator("sst2 sentence: Automation takes hard work but allows you to have a solid deployment")



In [None]:
out_put[0]['generated_text']

'positive'
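
The sentiment cell relies on the `sst2 sentence:` task prefix. As a rough sketch, the same pattern can be wrapped in a helper that labels several sentences at once. The names `sst2_prompts` and `classify_sentiment` are hypothetical, and the code assumes the pipeline accepts a list of strings and returns one result per input.

```python
def sst2_prompts(sentences):
    """Prefix each sentence with the T5 task tag used in the cell above."""
    return [f"sst2 sentence: {sentence}" for sentence in sentences]


def classify_sentiment(generator, sentences):
    """Return the generated label for each sentence, in input order."""
    outputs = generator(sst2_prompts(sentences))
    labels = []
    for out in outputs:
        # Depending on the pipeline version, each result may be a dict
        # or a single-item list wrapping a dict; handle both shapes.
        item = out[0] if isinstance(out, list) else out
        labels.append(item["generated_text"])
    return labels
```

With the pipeline defined earlier, a call such as `classify_sentiment(generator, ["Automation takes hard work but allows you to have a solid deployment"])` would be expected to return a one-element list of labels.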

In [None]:
# Question (no "context:" passage is supplied; the output below is an entailment label, not an answer)
generator("question: Is deploying models into production hard?")

[{'generated_text': 'not_entailment'}]
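
The prompt above contains only a question, and the model answers with a classification-style label. T5's question-answering format pairs the question with a passage, as `question: ... context: ...`. The sketch below builds such a prompt; the helper name `build_qa_prompt` is hypothetical, and the context sentence is borrowed from the summarization prompt earlier in this notebook. No output is shown because the cell was not run.

```python
def build_qa_prompt(question, context):
    """Format a question and its supporting passage in T5's question/context style."""
    return f"question: {question} context: {context}"


prompt = build_qa_prompt(
    "Is deploying models into production hard?",
    "Sometimes, deploying models can be difficult when automation is not part of the workflow.",
)
print(prompt)
# With the `generator` pipeline defined earlier, the call would be: generator(prompt)
```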

In [None]:
# Translation
generator("translate English to French: Automation takes hard work but allows you to have a solid deployment")

[{'generated_text': "L'automatisation exige beaucoup de travail, mais vous permet d'avoir un dé"}]
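
The translation above stops mid-word, which is consistent with generation hitting a default length limit. A minimal sketch, assuming the pipeline forwards `max_new_tokens` to generation: wrap the call so the cap is explicit. The helper name `generate_text` and the default of 64 tokens are illustrative assumptions.

```python
def generate_text(generator, prompt, max_new_tokens=64):
    """Run a text2text pipeline with an explicit cap on newly generated tokens."""
    result = generator(prompt, max_new_tokens=max_new_tokens)
    return result[0]["generated_text"]
```

With the pipeline defined earlier, `generate_text(generator, "translate English to French: ...")` would give the model room to finish the sentence. The appropriate cap depends on the expected output length.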

# gpt2 model usage

In [None]:
gpt2_generator = pipeline("text-generation", model="gpt2")


In [None]:
gpt2_text = gpt2_generator("some phrase here was thought to be", max_new_tokens=512)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
gpt2_text[0]['generated_text']

'some phrase here was thought to be more accurate. When the story first began, his son had been named Jack and was referred to as the man he was supposed to be, as well as his mother, mother\'s maiden name - and a picture of her life, but there was no mention of anything else. His mother died as a result, and the story was kept secret from him for over forty years. On his birthday, Mr. Cottrell was a student at the City College of Staten Island. He said he saw a picture of his mother in a museum there, but didn\'t go to see her because he thought she looked like a girl. He decided to play baseball at the time, and when he played he was in his first game, and he called his friend, who was a little more mature and had more than him, on a birthday call. At that time, his father didn\'t think he was doing well and was afraid of letting his family down. But in the summer of 1956, when he was nineteen-one, his aunt was called on a call from his old friend, who mentioned that he was one of th
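
The generated text begins with the original prompt and ends wherever the token budget runs out, which can be mid-word. The following sketch post-processes the string by dropping the prompt and cutting at the last sentence-ending punctuation mark. The helper name `trim_generation` is hypothetical, and treating `.`, `!`, and `?` as sentence ends is a simplifying assumption.

```python
def trim_generation(generated, prompt):
    """Drop the leading prompt and cut the continuation at the last sentence-ending mark."""
    continuation = generated[len(prompt):] if generated.startswith(prompt) else generated
    # Position of the last '.', '!' or '?', or -1 if none occurs.
    last_end = max(continuation.rfind(mark) for mark in ".!?")
    if last_end == -1:
        return continuation.strip()
    return continuation[:last_end + 1].strip()
```

With the notebook's objects this could be called as `trim_generation(gpt2_text[0]['generated_text'], "some phrase here was thought to be")`. The text-generation pipeline also has a `return_full_text` argument that, when set to `False`, should omit the prompt from the result.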