## Pipeline

* The most basic object in the 🤗 Transformers library is the pipeline() function. It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer:

In [1]:
!pip install transformers
from transformers import pipeline

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.11.1 tokenizers-0.13.2 transformers-4.25.1


## Sentiment Analysis


In [2]:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [8]:
classifier("I've been waiting for a HuggingFace course my whole life.")

[{'label': 'POSITIVE', 'score': 0.9598049521446228}]

In [None]:
classifier("routine story poor execution few laughs between the scenes")

[{'label': 'NEGATIVE', 'score': 0.9997280240058899}]

### Passing several sentences at once

In [None]:
classifier(["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"])

[{'label': 'POSITIVE', 'score': 0.9598049521446228},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

* By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the classifier object. If you rerun the command, the cached model will be used instead and there is no need to download the model again.
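The warning printed when the classifier was created recommends naming the model and revision explicitly. A minimal sketch of that, reusing the model name and revision reported in that warning; `build_classifier` is a hypothetical helper name, and the import is placed inside it only so that defining the helper does not itself trigger a download:

```python
# Model name and revision copied from the warning printed by pipeline("sentiment-analysis").
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


def build_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported here so the model is only fetched (or read from the cache)
    # when the helper is actually called.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_NAME, revision=MODEL_REVISION)
```

Calling `build_classifier()` should behave like the default classifier in this notebook, while making the choice of model visible in the code.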

* There are three main steps involved when you pass some text to a pipeline:

1. The text is preprocessed into a format the model can understand.
2. The preprocessed inputs are passed to the model.
3. The predictions of the model are post-processed, so you can make sense of them.
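The three steps above can be sketched with a toy example in plain Python. Everything here is an assumption made for illustration: the word weights, the label order, and the function names are invented, and a real pipeline uses a learned tokenizer and a neural network rather than a lookup table.

```python
import math

# Hypothetical word weights, invented purely for illustration.
TOY_WEIGHTS = {"love": 2.0, "great": 1.5, "hate": -2.0, "poor": -1.5}
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into tokens the toy model understands."""
    return [word.strip(".,!?").lower() for word in text.split()]


def toy_model(tokens):
    """Step 2: produce one raw score (logit) per label."""
    positive_logit = sum(TOY_WEIGHTS.get(token, 0.0) for token in tokens)
    return [-positive_logit, positive_logit]


def postprocess(logits):
    """Step 3: convert logits to probabilities (softmax) and pick the best label."""
    shifted = [math.exp(x - max(logits)) for x in logits]
    probs = [value / sum(shifted) for value in shifted]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three steps, mirroring what a pipeline call does."""
    return postprocess(toy_model(preprocess(text)))


print(toy_pipeline("I love this course!"))
print(toy_pipeline("I hate this so much!"))
```

The returned dictionary has the same `label` and `score` keys as the real classifier output, which is the point of the post-processing step: raw model scores become something readable.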

## Zero-shot classification

* Next, we’ll tackle a more challenging task: classifying texts that haven’t been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise. For this use case, the zero-shot-classification pipeline is very powerful: it lets you specify which labels to use for the classification, so you don’t have to rely on the labels of the pretrained model. You’ve already seen how a model can classify a sentence as positive or negative using those two labels; the zero-shot pipeline can classify text using any other set of labels you like.

In [None]:
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.



{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445988297462463, 0.11197440326213837, 0.04342682659626007]}

In [None]:
classifier(
    ["This is a course about the Transformers library", "now a days education is business thanks to hugging face"],
    candidate_labels=["education", "politics", "business"],
)

[{'sequence': 'This is a course about the Transformers library',
  'labels': ['education', 'business', 'politics'],
  'scores': [0.8445988297462463, 0.11197440326213837, 0.04342682659626007]},
 {'sequence': 'now a days education is business thanks to hugging face',
  'labels': ['education', 'business', 'politics'],
  'scores': [0.5859395265579224, 0.4079386293888092, 0.006121867336332798]}]

This pipeline is called zero-shot because you don’t need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!
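A short sketch of how the returned dictionary can be read, using the result of the single-sentence call above copied in as plain data (so no model is needed to run it). The variable names are illustrative.

```python
# Result copied from the single-sentence zero-shot call shown earlier.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445988297462463, 0.11197440326213837, 0.04342682659626007],
}

# In this output the labels are ordered by descending score,
# so the first entry is the top prediction.
top_label, top_score = result["labels"][0], result["scores"][0]

# Pair each label with its score for easier lookup.
score_by_label = dict(zip(result["labels"], result["scores"]))

# The scores in this output add up to roughly 1, like a probability distribution.
total = sum(result["scores"])

print(top_label, round(top_score, 3))
print(score_by_label["politics"])
```

Note that the order of `labels` in the result differs from the order of `candidate_labels` in the call, so looking scores up by label is safer than relying on position.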

## Text generation

* Now let’s see how to use a pipeline to generate some text. The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text. This is similar to the predictive text feature that is found on many phones. Text generation involves randomness, so it’s normal if you don’t get the same results as shown below.
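To see why generated text varies between runs, here is a toy sketch of sampling from a next-token distribution. The vocabulary and probabilities are invented for illustration and have nothing to do with GPT-2's real outputs; the only point is that sampling depends on the random seed.

```python
import random

# Hypothetical next-token probabilities, invented for illustration only.
NEXT_TOKEN_PROBS = {"name": 0.5, "job": 0.3, "dog": 0.2}


def sample_continuation(seed, length=3):
    """Sample `length` tokens from the toy distribution using a seeded generator."""
    rng = random.Random(seed)
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return rng.choices(tokens, weights=weights, k=length)


print(sample_continuation(seed=0))
print(sample_continuation(seed=0))  # same seed, same tokens
print(sample_continuation(seed=1))  # a different seed may give different tokens
```

With the real library, fixing the seed before generating (for example with `transformers.set_seed`) plays the same role when reproducible output is needed.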

In [None]:
generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [None]:
generator("hi, let me tell you about my self ", max_new_tokens=100, pad_token_id=50256)

[{'generated_text': 'hi, let me tell you about my self " The crowd roared. Suddenly, a huge roar sounded in all directions from the top of the tallest tower. The sound was deafening...\n\nFeng Yunsheng slowly took out his long gun.\n\nBut the bullet hit his face. The area below the hole was almost completely sealed under cultivation level one.\n\nA cold chill seeped down at him: "It\'s cold outside, it\'s a pain, what, what could happen?"\n\nHe didn\'t want'}]

## Using other models

In [None]:
generator_distilgpt2 = pipeline("text-generation", model="distilgpt2")


In [None]:
generator_distilgpt2("hi, let me tell you about my self ", max_length=1000, num_return_sequences=2, pad_token_id=50256)

[{'generated_text': 'hi, let me tell you about my self ʿadın, a young man who was a teacher, a physician, a doctor and a musician once. I heard about this great story of this young boy, who would play a violin, that was on his way up to the highest stage in the history of music, in his university career when he saw a violin playing on the stage and went into the room to open it and hear the whole performance. So, even though he was an amateur teacher, he was able to play violin and that is it. Then he would take the violin with him and write that great story of a father playing the violin and learning about violin and about his father, and as a teacher, when this young man sang along with him for the moment. And this good story, that he played it, he wrote a great song in the studio for him. He always wanted to be a part of this concert, he wanted to be able to sing. It\'s great.\n\nIn my view, this kind of teacher is a lot like a musician. As a teacher you have to be able to create a 

## Mask filling

* The next pipeline you’ll try is fill-mask. The idea of this task is to fill in the blanks in a given text:

In [None]:
unmasker = pipeline("fill-mask")

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
unmasker("This course will teach you all about <mask> models.", top_k=2)


[{'score': 0.19619810581207275,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052736610174179,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]

> The top_k argument controls how many possibilities are displayed. Note that here the model fills in the special <mask> word, which is often referred to as a mask token. Other mask-filling models might have different mask tokens, so it’s always good to verify the proper mask word when exploring other models. One way to check it is to look at the mask word used in the inference widget on the model’s page on the Hub.
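The mask token can also be read from the pipeline's tokenizer through its `mask_token` attribute. The sketch below wraps that in a hypothetical helper, `make_masked_prompt`, and uses small stand-in objects instead of real pipelines so it runs without downloading anything; the stand-ins only mimic the `.tokenizer.mask_token` attribute.

```python
from types import SimpleNamespace


def make_masked_prompt(fill_mask_pipeline, template):
    """Fill the {mask} placeholder with the pipeline's own mask token."""
    mask_token = fill_mask_pipeline.tokenizer.mask_token
    return template.format(mask=mask_token)


# Stand-ins that mimic only the attribute used above (not real pipelines).
# RoBERTa-style models use "<mask>", BERT-style models use "[MASK]".
roberta_like = SimpleNamespace(tokenizer=SimpleNamespace(mask_token="<mask>"))
bert_like = SimpleNamespace(tokenizer=SimpleNamespace(mask_token="[MASK]"))

template = "This course will teach you all about {mask} models."
print(make_masked_prompt(roberta_like, template))
print(make_masked_prompt(bert_like, template))
```

Passing the real `unmasker` object instead of a stand-in should work the same way, since the helper only reads `unmasker.tokenizer.mask_token`.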

## Named entity recognition

> Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations. Let’s look at an example:

In [None]:
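# grouped_entities=True is deprecated in recent releases; aggregation_strategy="simple" is the equivalent setting.
ner = pipeline("ner", grouped_entities=True)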

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.





In [None]:
ner("Hi i am santhosh kurnapally i work at infosys in hyderabad location i love to eat biryani.")

[{'entity_group': 'PER',
  'score': 0.8463554,
  'word': 'santhos',
  'start': 8,
  'end': 15},
 {'entity_group': 'PER',
  'score': 0.5987679,
  'word': 'k',
  'start': 17,
  'end': 18},
 {'entity_group': 'LOC',
  'score': 0.9607418,
  'word': '##yderabad',
  'start': 50,
  'end': 58}]
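The `start` and `end` values in this output are character offsets into the input string, which makes it possible to see exactly which part of the text each entity covers. The sketch below copies the input and the entities from the output above as plain data (scores omitted) and slices the text with those offsets.

```python
text = "Hi i am santhosh kurnapally i work at infosys in hyderabad location i love to eat biryani."

# Entities copied from the NER output above (scores omitted for brevity).
entities = [
    {"entity_group": "PER", "word": "santhos", "start": 8, "end": 15},
    {"entity_group": "PER", "word": "k", "start": 17, "end": 18},
    {"entity_group": "LOC", "word": "##yderabad", "start": 50, "end": 58},
]

# Slice the original text with each entity's character offsets.
spans = [text[entity["start"]:entity["end"]] for entity in entities]

for entity, span in zip(entities, spans):
    print(entity["entity_group"], repr(span))
```

The spans cover only fragments of "santhosh", "kurnapally", and "hyderabad", and "infosys" is missed entirely. A likely reason, stated here as an assumption rather than something verified in this notebook, is that the default model is a cased model while the input is written entirely in lowercase.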

## Translation

> For translation, you can use a default model if you provide a language pair in the task name (such as "translation_en_to_fr"), but the easiest way is to pick the model you want to use on the Model Hub. Here we’ll try translating from English to Hindi:

In [None]:
!pip install "transformers[sentencepiece]"

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentencepiece!=0.1.92,>=0.1.91
  Downloading sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.97


In [None]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")





In [None]:
translator("hi how are you ?")

[{'translation_text': 'हाय तुम कैसे हो?'}]

In [None]:
translator(["hi how are you ?","i want to talk to nellore pedda reddy right now"])

[{'translation_text': 'हाय तुम कैसे हो?'},
 {'translation_text': 'मैं अभी नेथ-हीन लाल करने के लिए बात करना चाहते हैं'}]