<a href="https://colab.research.google.com/github/harnalashok/LLMs/blob/main/Quickstart%20pipelines%20in%20huggingface%20simple.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Last amended: 22nd April, 2024

In [None]:
# It is assumed that your Hugging Face access token is saved in Colab
#  as the secret HF_TOKEN
# https://www.kdnuggets.com/2023/02/simple-nlp-pipelines-huggingface-transformers.html

# KDnuggets [reference](https://www.kdnuggets.com/2023/02/simple-nlp-pipelines-huggingface-transformers.html)

In [None]:
!pip install transformers

In [None]:
from transformers import pipeline

# What are `pipelines`

`pipelines` are objects that abstract (or hide) the complex code and operations needed to perform one stated NLP task. A `pipeline` takes some text as input and returns the output. Depending on the specific task to be performed, the `pipeline` assembles the components it needs (typically a tokenizer, a model, and a post-processing step). Hence, parameters for those components can also be passed on while constructing the `pipeline`. Most of the time, a `pipeline` works with its default parameters.
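
To make the idea concrete, here is a toy sketch of what a pipeline hides: a pre-processing step, a model, and a post-processing step chained behind a single call. This is not how `transformers` implements it; `ToyPipeline`, `whitespace_tokenize`, `count_negative_words`, and `to_label` are hypothetical names invented for illustration, and the "model" is only a word lookup.

```python
from typing import Any, Callable

# Hypothetical vocabulary used by the toy "model" below.
NEGATIVE_WORDS = {"horrible", "bad", "awful"}


class ToyPipeline:
    """Toy stand-in that chains preprocess -> model -> postprocess."""

    def __init__(
        self,
        preprocess: Callable[[str], Any],
        model: Callable[[Any], Any],
        postprocess: Callable[[Any], Any],
    ) -> None:
        self.preprocess = preprocess
        self.model = model
        self.postprocess = postprocess

    def __call__(self, text: str) -> Any:
        tokens = self.preprocess(text)   # text -> model inputs
        raw = self.model(tokens)         # model inputs -> raw outputs
        return self.postprocess(raw)     # raw outputs -> readable result


def whitespace_tokenize(text: str) -> list:
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def count_negative_words(tokens: list) -> int:
    """Count tokens that appear in NEGATIVE_WORDS, ignoring punctuation."""
    return sum(token.strip("!.,") in NEGATIVE_WORDS for token in tokens)


def to_label(negative_count: int) -> dict:
    """Turn the raw count into a label dictionary."""
    return {"label": "NEGATIVE" if negative_count > 0 else "POSITIVE"}


toy = ToyPipeline(whitespace_tokenize, count_negative_words, to_label)
result = toy("This movie is horrible!")
print(result)
```

The caller only sees `toy(text)`; the three components stay hidden, which is the same convenience the real `pipeline` object offers.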

# 1. Text Summarization



The input to this task is a passage of text, and the model outputs a summary whose length is bounded by the parameters passed at call time. Here, we set `min_length=5` and `max_length=30`; both are measured in tokens, not words.   
The code in this section uses `t5-base`. Other models that can be used for the summarization task include:
- `bart-large-cnn`,
- `t5-small`,
- `t5-large`,
- `t5-3b`,
- `t5-11b`.

You can check out the complete list of available text summarization models [here](https://huggingface.co/models?pipeline_tag=summarization&sort=downloads).

In [None]:
summarizer = pipeline(
                      "summarization",
                       model="t5-base",
                       tokenizer="t5-base",
                       framework="tf"
                     )

# Named `text` rather than `input`, so that Python's built-in input() is not shadowed
text = "Parents need to know that Top Gun is a blockbuster 1980s action thriller starring Tom Cruise that's chock full of narrow escapes, chases, and battles. But there are also violent and upsetting scenes, particularly the death of a main character, which make it too intense for younger kids. There's also one graphic-for-its-time sex scene (though no explicit nudity) and quite a few shirtless men in locker rooms and, in one iconic sequence, on a beach volleyball court. Winning is the most important thing to all the pilots, who try to intimidate one another with plenty of posturing and banter -- though when push comes to shove, loyalty and friendship have important roles to play, too. While sexism is noticeable and almost all characters are men, two strong women help keep some of the objectification in check."

summarizer(text, min_length=5, max_length=30)

All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


[{'summary_text': '1980s action thriller starring Tom Cruise is chock full of chases and battles . there are also violent and upsetting scenes,'}]
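
The summary above comes back with a space before the full stop and a lower-case sentence start. Below is a minimal clean-up sketch, assuming you only want to tidy spacing and capitalization; `tidy_summary` is a hypothetical helper written for this notebook, not part of `transformers`.

```python
import re


def tidy_summary(summary: str) -> str:
    """Remove spaces before punctuation and capitalize sentence starts."""
    # "battles . there" -> "battles. there"
    summary = re.sub(r"\s+([.,;:!?])", r"\1", summary.strip())
    # Split after sentence-ending punctuation, then capitalize each piece.
    sentences = re.split(r"(?<=[.!?])\s+", summary)
    return " ".join(s[:1].upper() + s[1:] for s in sentences if s)


# Output copied from the summarizer call above.
raw_output = [{'summary_text': '1980s action thriller starring Tom Cruise is chock full of chases and battles . there are also violent and upsetting scenes,'}]

cleaned = tidy_summary(raw_output[0]['summary_text'])
print(cleaned)
```

The trailing comma is left in place: the summary was cut off at `max_length`, and the helper does not try to guess the missing words.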

# 2. Question Answering


In this task, we provide a question and a context. The model extracts the answer as a span of the context, choosing the span with the highest probability score. It also returns the start and end character offsets of the answer within the context.   

The full list of models available for the Question-Answering task can be found [here](https://huggingface.co/models?pipeline_tag=question-answering).

In [None]:
qa_pipeline = pipeline(model="deepset/roberta-base-squad2")

qa_pipeline(
            question="Where do I work?",
            context="I work as a Data Scientist at a lab in University of Montreal. I like to develop my own algorithms.",
            )

{'score': 0.6422624588012695,
 'start': 39,
 'end': 61,
 'answer': 'University of Montreal'}
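
The `start` and `end` values in the output above are character offsets into the context, so the answer can be recovered by slicing. A short sketch using the result printed above; `highlight_answer` is a hypothetical helper added for illustration.

```python
context = "I work as a Data Scientist at a lab in University of Montreal. I like to develop my own algorithms."

# Result copied from the qa_pipeline call above.
result = {'score': 0.6422624588012695,
          'start': 39,
          'end': 61,
          'answer': 'University of Montreal'}

# Slicing the context with the offsets gives back the answer text.
span = context[result['start']:result['end']]
print(span)


def highlight_answer(context: str, start: int, end: int) -> str:
    """Wrap the answer span in square brackets inside the context."""
    return context[:start] + "[" + context[start:end] + "]" + context[end:]


highlighted = highlight_answer(context, result['start'], result['end'])
print(highlighted)
```

Working from the offsets rather than searching for the answer string avoids ambiguity when the same phrase occurs more than once in the context.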

# 3. Named Entity Recognition


Named Entity Recognition identifies the words that name persons, organizations, locations and so on, and classifies them accordingly. The input is a sentence; the model returns each named entity along with its category, a confidence score, and its character offsets in the text.   

Other available models are listed [here](https://huggingface.co/models?pipeline_tag=token-classification&sort=downloads).

In [None]:
ner_classifier = pipeline(
                          model="dslim/bert-base-NER-uncased",
                          aggregation_strategy="simple"
                          )

sentence = "I like to travel in Montreal."
entity = ner_classifier(sentence)
print(entity)

Some weights of the model checkpoint at dslim/bert-base-NER-uncased were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'LOC', 'score': 0.9976744, 'word': 'montreal', 'start': 20, 'end': 28}]
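
Because the model is uncased, the `word` field above is lower-cased (`'montreal'`). The `start` and `end` offsets still point into the original sentence, so the original casing can be recovered by slicing. A small sketch; `original_spans` is a hypothetical helper, not a library function.

```python
sentence = "I like to travel in Montreal."

# Output copied from the ner_classifier call above.
entities = [{'entity_group': 'LOC', 'score': 0.9976744, 'word': 'montreal', 'start': 20, 'end': 28}]


def original_spans(text: str, entities: list) -> list:
    """Return (surface text, category) pairs using the character offsets."""
    return [(text[e['start']:e['end']], e['entity_group']) for e in entities]


spans = original_spans(sentence, entities)
print(spans)
```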


# 4. Part-of-Speech Tagging


PoS tagging labels each word in the text with its part of speech: noun, pronoun, verb and so on. The model returns the tagged words along with their probability scores and character offsets.

In [None]:
pos_tagger = pipeline(
                      model="vblagoje/bert-english-uncased-finetuned-pos",
                      aggregation_strategy="simple",
                     )

pos_tagger("I am an artist and I live in Dublin")

Some weights of the model checkpoint at vblagoje/bert-english-uncased-finetuned-pos were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PRON',
  'score': 0.9995395,
  'word': 'i',
  'start': 0,
  'end': 1},
 {'entity_group': 'AUX',
  'score': 0.99849665,
  'word': 'am',
  'start': 2,
  'end': 4},
 {'entity_group': 'DET',
  'score': 0.9992281,
  'word': 'an',
  'start': 5,
  'end': 7},
 {'entity_group': 'NOUN',
  'score': 0.9985084,
  'word': 'artist',
  'start': 8,
  'end': 14},
 {'entity_group': 'CCONJ',
  'score': 0.99920964,
  'word': 'and',
  'start': 15,
  'end': 18},
 {'entity_group': 'PRON',
  'score': 0.99952817,
  'word': 'i',
  'start': 19,
  'end': 20},
 {'entity_group': 'VERB',
  'score': 0.9986745,
  'word': 'live',
  'start': 21,
  'end': 25},
 {'entity_group': 'ADP',
  'score': 0.99944466,
  'word': 'in',
  'start': 26,
  'end': 28},
 {'entity_group': 'PROPN',
  'score': 0.99815035,
  'word': 'dublin',
  'start': 29,
  'end': 35}]
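
A list of per-word dictionaries is often easier to use once it is grouped by tag. The sketch below rebuilds the tags and offsets from the output above (scores and the lower-cased `word` field are omitted for brevity) and groups the original words by part of speech; `group_by_tag` is a hypothetical helper.

```python
from collections import defaultdict

sentence = "I am an artist and I live in Dublin"

# (tag, start, end) triples copied from the pos_tagger output above.
triples = [
    ("PRON", 0, 1), ("AUX", 2, 4), ("DET", 5, 7),
    ("NOUN", 8, 14), ("CCONJ", 15, 18), ("PRON", 19, 20),
    ("VERB", 21, 25), ("ADP", 26, 28), ("PROPN", 29, 35),
]
tagged = [{"entity_group": tag, "start": start, "end": end}
          for tag, start, end in triples]


def group_by_tag(text: str, records: list) -> dict:
    """Map each tag to the list of original words that carry it."""
    groups = defaultdict(list)
    for record in records:
        word = text[record["start"]:record["end"]]
        groups[record["entity_group"]].append(word)
    return dict(groups)


words_by_tag = group_by_tag(sentence, tagged)
print(words_by_tag)
```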

# 5. Text Classification


We will perform sentiment analysis, classifying the text as positive or negative in tone.   
The full list of models for text classification can be found [here](https://huggingface.co/models?pipeline_tag=text-classification).

In [None]:
text_classifier = pipeline(
                           model="distilbert-base-uncased-finetuned-sst-2-english"
                          )

text_classifier("This movie is horrible!")

[{'label': 'NEGATIVE', 'score': 0.9997865557670593}]
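
The classifier returns a label and a score separately. When a single number is more convenient (for sorting or averaging, say), the two can be folded into one signed value. This is a sketch under the assumption that the only labels are `POSITIVE` and `NEGATIVE`, as for this model; `signed_sentiment` is a hypothetical helper.

```python
def signed_sentiment(prediction: dict) -> float:
    """Return +score for POSITIVE and -score for NEGATIVE predictions."""
    sign = 1.0 if prediction["label"] == "POSITIVE" else -1.0
    return sign * prediction["score"]


# Output copied from the text_classifier call above.
predictions = [{'label': 'NEGATIVE', 'score': 0.9997865557670593}]

value = signed_sentiment(predictions[0])
print(value)
```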

# 6. Text Generation

Access the full list of models for text generation [here](https://huggingface.co/models?pipeline_tag=text-generation).

In [None]:
text_generator = pipeline(model="gpt2")

text_generator("If it is sunny today then ", do_sample=False)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'If it is sunny today then \xa0it will be cloudy tomorrow.\nI have been using this for a while now and I am very happy with it. I have been using it for a while now and I am very happy with it. I'}]
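
The `generated_text` above begins with the prompt itself. If only the continuation is wanted, the prompt can be stripped by length, as in this sketch built on the output above. (The text-generation pipeline also accepts `return_full_text=False` to do this at call time; the manual version here just shows what that amounts to.)

```python
prompt = "If it is sunny today then "

# Output copied from the text_generator call above.
outputs = [{'generated_text': 'If it is sunny today then \xa0it will be cloudy tomorrow.\nI have been using this for a while now and I am very happy with it. I have been using it for a while now and I am very happy with it. I'}]

full_text = outputs[0]['generated_text']

# Drop the prompt only if the text really starts with it.
continuation = full_text[len(prompt):] if full_text.startswith(prompt) else full_text

# Replace the non-breaking space (\xa0) and keep the first generated line.
first_line = continuation.replace("\xa0", " ").strip().splitlines()[0]
print(first_line)
```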

# 7. Text Translation



Here, we translate text from one language to another; for this example, from English to French. We use the basic `t5-small` model, but you can find more advanced models [here](https://huggingface.co/models?pipeline_tag=translation).

In [None]:
en_fr_translator = pipeline(
                            "translation_en_to_fr",
                            model='t5-small'
                            )

en_fr_translator("Hi, How are you?")

[{'translation_text': 'Bonjour, Comment êtes-vous ?'}]

In [None]:
############## DONE ###############