<a href="https://colab.research.google.com/github/iamhasanhumane/Hugging_Face/blob/main/Chapter_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install huggingface_hub
!pip install transformers



## Pipeline Function

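The `pipeline()` function returns an end-to-end object that performs an NLP task on one or several texts. Under the hood it chains three stages: pre-processing (turning the raw text into numbers the model can read), the model itself, and post-processing (turning the model's outputs into readable predictions).

Pre-processing → Model → Post-processing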
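To make the three stages concrete, here is a sketch that runs them by hand with `AutoTokenizer` and `AutoModelForSequenceClassification`. It assumes PyTorch is installed and uses the same checkpoint the sentiment-analysis pipeline falls back to in the next section. The `classify` helper is hypothetical and approximates what the pipeline does rather than reproducing its exact internals.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Checkpoint the sentiment-analysis pipeline defaults to (see its log message).
checkpoint = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)


def classify(texts):
    # 1. Pre-processing: raw text -> padded tensors of token ids.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    # 2. Model: token ids -> raw logits, one row per input text.
    with torch.no_grad():
        logits = model(**inputs).logits
    # 3. Post-processing: logits -> probabilities -> {"label", "score"} dicts.
    probs = torch.softmax(logits, dim=-1)
    results = []
    for row in probs:
        idx = int(row.argmax())
        results.append({"label": model.config.id2label[idx], "score": float(row[idx])})
    return results


print(classify(["I've been waiting for a HuggingFace course my whole life",
                "I hate this so much!"]))
```

The pipeline hides exactly these steps; writing them out is useful when you need the raw logits or want to change how the scores are computed.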

In [3]:
from transformers import pipeline

### Sentiment Analysis

The first task we will try the pipeline API on is sentiment analysis, which classifies each text as positive or negative and returns a confidence score for that label.

In [4]:
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9516071081161499}]
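As the warning above points out, relying on the default model is fragile: if the library's default changes, so do your results. A minimal sketch that pins the checkpoint and revision explicitly, reusing the model name and short revision hash printed in that log message:

```python
from transformers import pipeline

# Model name and revision copied from the log message above; pinning them
# keeps the results reproducible even if the library's default changes.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    revision="714eb0f",
)
print(classifier("I've been waiting for a HuggingFace course my whole life"))
```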

We can also pass several texts at once, as a list, to the object returned by a pipeline. It processes them together and returns one result per text, in the same order.

In [8]:
classifier([
    "I've been waiting for a AI Engineer Job my whole life",
    "I hate this so much!"
])

[{'label': 'NEGATIVE', 'score': 0.9938697814941406},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

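### Zero-Shot Classification

The `zero-shot-classification` pipeline lets you choose the candidate labels yourself at inference time, so the model does not need to have been fine-tuned on those particular labels. It returns the labels sorted from most to least likely, together with their scores.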

In [9]:
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education","politics","business"]
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445994257926941, 0.11197380721569061, 0.04342673346400261]}
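Why do the scores sum to 1? The default checkpoint, `facebook/bart-large-mnli`, is a natural-language-inference model: the pipeline pairs the input text with one hypothesis per candidate label (by default of the form "This example is {label}.") and reads off how strongly the text entails each hypothesis. In the default single-label mode, those entailment logits are softmaxed across the labels. The pure-Python sketch below mimics only that post-processing step; the helper name `zero_shot_postprocess` and the logit values are hypothetical, made up for illustration.

```python
import math


def zero_shot_postprocess(sequence, candidate_labels, entailment_logits):
    """Turn per-label entailment logits into a pipeline-style output dict.

    Assumes entailment_logits[i] is the NLI model's entailment logit for the
    hypothesis built from candidate_labels[i].
    """
    # Softmax across the candidate labels (single-label mode): scores sum to 1.
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort labels from most to least likely, like the pipeline output above.
    ranked = sorted(zip(candidate_labels, probs), key=lambda p: p[1], reverse=True)
    return {
        "sequence": sequence,
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


labels = ["education", "politics", "business"]
# One hypothesis per label, following the default template.
hypotheses = [f"This example is {label}." for label in labels]
# Hypothetical entailment logits, invented for this example only.
result = zero_shot_postprocess(
    "This is a course about the Transformers library", labels, [3.0, 0.0, 1.0]
)
print(hypotheses)
print(result["labels"])  # ['education', 'business', 'politics']
```

Passing `multi_label=True` to the pipeline instead scores each label independently (entailment versus contradiction for each hypothesis), so the scores no longer have to sum to 1.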

In [10]:
classifier(
    "Patience is the key to succeed in business",
    candidate_labels=["education","politics","business"]
)

{'sequence': 'Patience is the key to succeed in business',
 'labels': ['business', 'education', 'politics'],
 'scores': [0.994697093963623, 0.002830005483701825, 0.0024728807620704174]}

In [12]:
classifier(
    "Democracy is thr rule by the people , for the people and to the people",
    candidate_labels=["education","politics","business"]
)

{'sequence': 'Democracy is thr rule by the people , for the people and to the people',
 'labels': ['politics', 'business', 'education'],
 'scores': [0.784670352935791, 0.11012627184391022, 0.10520333051681519]}

### Text Generation

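The `text-generation` pipeline takes an input prompt and generates a continuation of it. The default model samples its output, so you will likely get a different completion each time you run the cell.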

In [14]:
generator = pipeline("text-generation")
generator("In this course , we will teach you how to ")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course , we will teach you how to \xa0settle during the training process along with \xa0a basic level of self-control.\nWhat is Inadequate?\nIn each of the two basic courses described in this article,'}]
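The pipeline builds that continuation one token at a time: the model predicts a distribution over the next token, one token is chosen, appended to the text, and fed back in for the next step. The toy sketch below shows that autoregressive loop with a hypothetical hand-written next-word table standing in for GPT-2, and with greedy selection (always take the most likely word) instead of sampling, so the output is deterministic.

```python
# Hypothetical next-word probabilities standing in for a real language model.
BIGRAMS = {
    "how": {"to": 0.9, "many": 0.1},
    "to": {"train": 0.6, "use": 0.4},
    "train": {"a": 0.7, "models": 0.3},
    "use": {"transformers": 1.0},
    "a": {"model": 1.0},
}


def generate(prompt, max_new_tokens=5):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        next_probs = BIGRAMS.get(tokens[-1])
        if not next_probs:
            # No known continuation: stop early, much like an end-of-sequence token.
            break
        # Greedy decoding: pick the most likely next word, then feed it back in.
        tokens.append(max(next_probs, key=next_probs.get))
    return " ".join(tokens)


print(generate("we will teach you how"))  # we will teach you how to train a model
```

Replacing the `max(...)` choice with a random draw weighted by the probabilities would turn this into sampling, which is why real generations vary from run to run.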

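### Using the DistilGPT2 Model

Here is another text-generation pipeline, this time loading the `distilgpt2` model explicitly. `max_length` caps the total length of each result in tokens (prompt included), and `num_return_sequences` sets how many different completions are returned.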

In [16]:
distill_generator = pipeline("text-generation", model="distilgpt2")
distill_generator(
    "In this course , we will teach you how to ",
    max_length=30,
    num_return_sequences=2
)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "In this course , we will teach you how to \xa0 write a great blog. If you'd like to continue reading or listen to our podcast,"},
 {'generated_text': 'In this course , we will teach you how to 𝒥𝒰𝒰𝒰𝒰𝒰�'}]
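Two tweaks make this call easier to reason about, sketched below assuming PyTorch is installed: `max_new_tokens` limits only the newly generated tokens instead of the total length, and `set_seed` from `transformers` fixes the random sampling so reruns return the same completions (the seed value 42 is arbitrary). The prompt here also drops the trailing space and the space before the comma; GPT-2-style tokenizers encode a trailing space as a token of its own, which is uncommon in ordinary text and may contribute to odd output like the characters above.

```python
from transformers import pipeline, set_seed

distill_generator = pipeline("text-generation", model="distilgpt2")
prompt = "In this course, we will teach you how to"

set_seed(42)  # arbitrary seed: makes the random sampling repeatable
outputs = distill_generator(
    prompt,
    max_new_tokens=20,       # counts only generated tokens, not the prompt
    num_return_sequences=2,  # two independent completions
    do_sample=True,          # sample instead of always taking the top token
)
for out in outputs:
    print(out["generated_text"])
```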