In [None]:
# Install the Transformers, Datasets, and Evaluate libraries to run this notebook.
!pip install Datasets Evaluate Transformers[sentencepiece]

Collecting Datasets
  Downloading datasets-2.18.0-py3-none-any.whl (510 kB)
Collecting Evaluate
  Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)
Collecting dill<0.3.9,>=0.3.0 (from Datasets)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Collecting xxhash (from Datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
Collecting multiprocess (from Datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)

In [None]:
from transformers import pipeline

**Sentiment analysis:-** Sentiment analysis is a natural language processing (NLP) task that classifies the opinion or emotion expressed in a text, for example as positive or negative. The pipeline runs in several stages before returning a result: the text is tokenized, the tokens are passed through the model, and the model's raw scores are converted into a label with a confidence score.

In [None]:
classifier = pipeline("sentiment-analysis")
classifier(["I love dogs","I hate cats","She is not liking me"])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.999713122844696},
 {'label': 'NEGATIVE', 'score': 0.9943310022354126},
 {'label': 'NEGATIVE', 'score': 0.9996800422668457}]
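
The final stage of the sentiment pipeline turns the model's raw scores (logits) into the label and score fields shown above. The following is a minimal sketch of that step, assuming a two-class model. The logit values, the label order, and the `postprocess` helper are made up for illustration and are not the library's internals.

```python
import math

# Hypothetical label order, assumed for illustration only.
LABELS = ["NEGATIVE", "POSITIVE"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


# Made-up logits for one sentence: the second class clearly wins.
result = postprocess([-4.0, 4.2])
print(result)
```

A score close to 1 therefore only means that one logit is much larger than the other; it is a model confidence, not a guarantee of correctness.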

Some of the available pipelines:
*   feature-extraction (get the vector representation of a text)
*   fill-mask
*   ner (named entity recognition)
*   question-answering
*   sentiment-analysis
*   summarization
*   text-generation
*   translation
*   zero-shot-classification

**Zero-shot classification:-** Next, we tackle a more challenging task: classifying texts that haven’t been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise. For this use case, the zero-shot-classification pipeline is very powerful: it lets you specify which labels to use for the classification, so you don’t have to rely on the labels of the pretrained model. You’ve already seen how the model can classify a sentence as positive or negative using those two labels, but it can also classify text using any other set of labels you like.

In [None]:
from transformers import pipeline
classifier = pipeline("zero-shot-classification")
classifier(
    [
        "This is course about Transformers library.",
        "Australia won the world cup",
        "Current prime minister is Narendra Modi",
    ],
    candidate_labels=["education", "sports", "politics"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'sequence': 'This is course about Transformers library.',
  'labels': ['education', 'sports', 'politics'],
  'scores': [0.933477520942688, 0.039382971823215485, 0.027139505371451378]},
 {'sequence': 'Australia won the world cup',
  'labels': ['sports', 'politics', 'education'],
  'scores': [0.9810143113136292, 0.01218472234904766, 0.006801076233386993]},
 {'sequence': 'Current prime minister is Narendra Modi',
  'labels': ['politics', 'education', 'sports'],
  'scores': [0.9814908504486084, 0.009600541554391384, 0.008908591233193874]}]
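
Zero-shot classification with a natural language inference model such as the default `facebook/bart-large-mnli` works by pairing the input with one hypothesis per candidate label and comparing how strongly each hypothesis is entailed. The sketch below illustrates the idea under stated assumptions: the template string is, to my knowledge, the pipeline's default, the entailment logits are made up, and `build_hypotheses` and `rank_labels` are hypothetical helpers rather than library functions.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Create one NLI hypothesis per candidate label."""
    return [template.format(label) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score."""
    peak = max(entailment_logits)
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scored = sorted(
        zip(labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return {
        "labels": [label for label, _ in scored],
        "scores": [score for _, score in scored],
    }


labels = ["education", "sports", "politics"]
hypotheses = build_hypotheses(labels)
# Made-up entailment logits, one per hypothesis.
ranked = rank_labels(labels, [0.3, 4.1, -0.5])
print(hypotheses)
print(ranked)
```

This also explains why the scores in each output entry sum to roughly 1 and why the labels come back sorted from most to least likely.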

**Text Generation:-** Now let’s see how to use a pipeline to generate some text. The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text. This is similar to the predictive text feature that is found on many phones. Text generation involves randomness, so it’s normal if you don’t get the same results as shown below.

In [None]:
from transformers import pipeline
generator = pipeline("text-generation")
generator("Elon musk is ")
#response = generator(["Elon Musk","Information about Cane Corso dog."],num_return_sequences=1,max_length=200)

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Elon musk is \xa0the largest of the three, having two massive horns and a broad neck with a long front, and an open, open neck and wide, broad back.\nIf I was to look at our list and pick off'}]
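
The randomness comes from sampling: at each step the model assigns a score to every possible next token, and one token is drawn at random in proportion to those scores. Here is a minimal sketch of top-k sampling with a temperature, using a toy three-token vocabulary. The scores and the `sample_next_token` helper are invented for illustration and do not reproduce the library's generation code.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=2, rng=random):
    """Sample one token from the top_k highest-scoring candidates."""
    # Keep only the top_k candidates.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Temperature-scaled softmax weights over the remaining candidates.
    scaled = [score / temperature for _, score in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    tokens = [token for token, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up scores for the token that follows some prompt.
toy_logits = {" car": 2.0, " company": 1.5, " banana": -3.0}

# A seeded generator makes the draws repeatable.
samples = [sample_next_token(toy_logits, rng=random.Random(i)) for i in range(20)]
repeat = [sample_next_token(toy_logits, rng=random.Random(i)) for i in range(20)]
print(samples)
```

Fixing the seed makes a run repeatable, while a lower temperature or smaller `top_k` makes the output less varied.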

In [None]:
# Use any model from the Hub in the pipeline
generator = pipeline("text-generation", model="distilgpt2")
generator("Tesla car is", max_length=40, num_return_sequences=2)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Tesla car is made into some very cool, well made stuff. I can’t wait to test and see what it sounds like!\nI am very, very happy with this car\nSo'},
 {'generated_text': 'Tesla car is expected to hit road next year.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'}]

Fill-mask:- The idea of this task is to fill in the blanks in a given text. The top_k argument controls how many candidate completions are returned.

In [None]:
unmasker = pipeline("fill-mask")
unmasker("Labrador is a <mask> dog", top_k=2)
# Alternative using the BERT base model, whose mask token is [MASK]:
# unmasker = pipeline("fill-mask", model="bert-base-cased")
# unmasker("Labrador is a [MASK] dog", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.07461053133010864,
  'token': 3906,
  'token_str': ' rescue',
  'sequence': 'Labrador is a rescue dog'},
 {'score': 0.02195952646434307,
  'token': 4716,
  'token_str': ' pet',
  'sequence': 'Labrador is a pet dog'}]
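
The mask token depends on the checkpoint: the default model above expects `<mask>`, while the commented-out BERT variant expects `[MASK]`. One way to avoid hard-coding it is to keep a neutral placeholder in the prompt and substitute the token per model. The `with_mask` helper and the `{mask}` placeholder below are hypothetical. With a real pipeline, the token could be read from `unmasker.tokenizer.mask_token`.

```python
def with_mask(template, mask_token):
    """Insert a checkpoint-specific mask token into a prompt template."""
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one {mask} placeholder")
    return template.replace("{mask}", mask_token)


roberta_prompt = with_mask("Labrador is a {mask} dog", "<mask>")
bert_prompt = with_mask("Labrador is a {mask} dog", "[MASK]")
print(roberta_prompt)
print(bert_prompt)
```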

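Named entity recognition:- Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations. The grouping option passed to the pipeline below merges the tokens that belong to the same entity, so a multi-word name such as Bharatiya Janata Party is returned as a single ORG entry.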

In [None]:
from transformers import pipeline

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple")
ner("Narendra Damodardas Modi born 17 September 1950) is an Indian politician who has served as the 14th prime minister of India since May 2014. Modi was the chief minister of Gujarat from 2001 to 2014 and is the Member of Parliament (MP) for Varanasi. He is a member of the Bharatiya Janata Party (BJP) and of the Rashtriya Swayamsevak Sangh (RSS), a right wing Hindu nationalist paramilitary volunteer organisation. He is the longest-serving prime minister from outside the Indian National Congress.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.9958556,
  'word': 'Narendra Damodardas Modi',
  'start': 0,
  'end': 24},
 {'entity_group': 'MISC',
  'score': 0.99216545,
  'word': 'Indian',
  'start': 55,
  'end': 61},
 {'entity_group': 'LOC',
  'score': 0.99635786,
  'word': 'India',
  'start': 118,
  'end': 123},
 {'entity_group': 'PER',
  'score': 0.99704623,
  'word': 'Modi',
  'start': 140,
  'end': 144},
 {'entity_group': 'LOC',
  'score': 0.99801093,
  'word': 'Gujarat',
  'start': 171,
  'end': 178},
 {'entity_group': 'LOC',
  'score': 0.9948859,
  'word': 'Varanasi',
  'start': 238,
  'end': 246},
 {'entity_group': 'ORG',
  'score': 0.9995317,
  'word': 'Bharatiya Janata Party',
  'start': 270,
  'end': 292},
 {'entity_group': 'ORG',
  'score': 0.9995197,
  'word': 'BJP',
  'start': 294,
  'end': 297},
 {'entity_group': 'ORG',
  'score': 0.9966948,
  'word': 'Rashtriya Swayamsevak Sangh',
  'start': 310,
  'end': 337},
 {'entity_group': 'ORG',
  'score': 0.9990833,
  'word': 'RSS',
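
Grouping can be pictured as merging consecutive token-level predictions that share an entity type. The sketch below assumes BIO-style tags (`B-ORG`, `I-ORG`, `O`), whole-word tokens, and averaged scores. The tags, the scores, and the `group_entities` helper are made up for illustration, and the library's own aggregation also handles sub-word pieces, which this sketch ignores.

```python
def group_entities(tokens):
    """Merge consecutive tokens of the same entity type into one span.

    `tokens` is a list of (word, tag, score) tuples with BIO tags such as
    "B-ORG", "I-ORG", or "O".
    """
    groups = []
    current = None
    for word, tag, score in tokens:
        if tag == "O":
            current = None  # an outside token closes any open entity
            continue
        prefix, entity = tag.split("-", 1)
        if current is not None and prefix == "I" and current["entity_group"] == entity:
            current["words"].append(word)
            current["scores"].append(score)
        else:
            current = {"entity_group": entity, "words": [word], "scores": [score]}
            groups.append(current)
    return [
        {
            "entity_group": g["entity_group"],
            "word": " ".join(g["words"]),
            "score": sum(g["scores"]) / len(g["scores"]),
        }
        for g in groups
    ]


# Made-up token-level predictions for part of the example sentence.
tagged = [
    ("Bharatiya", "B-ORG", 0.99),
    ("Janata", "I-ORG", 0.99),
    ("Party", "I-ORG", 0.99),
    ("(", "O", 0.99),
    ("BJP", "B-ORG", 0.99),
    (")", "O", 0.99),
]
entities = group_entities(tagged)
print(entities)
```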
  

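Question answering:- The question-answering pipeline answers questions using information from a given context. It extracts the answer span from the context rather than generating new text, which is why the result includes start and end character offsets.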

In [None]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Who is narendra modi?",
    context = "Narendra Damodardas Modi born 17 September 1950) is an Indian politician who has served as the 14th prime minister of India since May 2014. Modi was the chief minister of Gujarat from 2001 to 2014 and is the Member of Parliament (MP) for Varanasi. He is a member of the Bharatiya Janata Party (BJP) and of the Rashtriya Swayamsevak Sangh (RSS), a right wing Hindu nationalist paramilitary volunteer organisation. He is the longest-serving prime minister from outside the Indian National Congress."
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.3242959976196289,
 'start': 52,
 'end': 72,
 'answer': 'an Indian politician'}
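
Extractive question answering can be sketched as a search for the best-scoring span: the model scores every token as a possible answer start and as a possible answer end, and the answer is the valid pair with the highest combined score. The example below uses whitespace tokens and made-up scores. `best_span` is a hypothetical helper, and the real pipeline works on sub-word tokens and reports a probability rather than a raw sum.

```python
def best_span(start_scores, end_scores, max_answer_len=5):
    """Return (start, end, score) maximising start_scores[s] + end_scores[e].

    Only spans with start <= end and at most max_answer_len tokens are considered.
    """
    best = None
    for s, s_score in enumerate(start_scores):
        last = min(len(end_scores), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_scores[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


tokens = ["Modi", "is", "an", "Indian", "politician", "."]
# Made-up per-token scores for "answer starts here" and "answer ends here".
start_scores = [0.1, 0.0, 2.5, 1.0, 0.2, -1.0]
end_scores = [0.0, -0.5, 0.1, 0.8, 3.0, -1.0]

s, e, _ = best_span(start_scores, end_scores)
answer = " ".join(tokens[s : e + 1])
print(answer)
```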

Summarization:- Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text.

In [None]:
from transformers import pipeline

summary = pipeline("summarization")
summary("""
  Narendra Damodardas Modi born 17 September 1950) is an Indian politician who has served as the 14th prime minister of India since May 2014. Modi was the chief minister of Gujarat from 2001 to 2014 and is the Member of Parliament (MP) for Varanasi. He is a member of the Bharatiya Janata Party (BJP) and of the Rashtriya Swayamsevak Sangh (RSS), a right wing Hindu nationalist paramilitary volunteer organisation. He is the longest-serving prime minister from outside the Indian National Congress.

Modi was born and raised in Vadnagar in northeastern Gujarat, where he completed his secondary education. He was introduced to the RSS at the age of eight. His account of helping his father sell tea at the Vadnagar railway station has not been reliably corroborated. At age 18, he was married to Jashodaben Modi, whom he abandoned soon after, only publicly acknowledging her four decades later when legally required to do so. Modi became a full-time worker for the RSS in Gujarat in 1971. The RSS assigned him to the BJP in 1985 and he held several positions within the party hierarchy until 2001, rising to the rank of general secretary.

In 2001, Modi was appointed Chief Minister of Gujarat and elected to the legislative assembly soon after. His administration is considered complicit in the 2002 Gujarat riots, and has been criticised for its management of the crisis. A little over 1,000 people were killed, according to official records, three-quarters of whom were Muslim; independent sources estimated 2,000 deaths, mostly Muslim. A Special Investigation Team appointed by the Supreme Court of India in 2012 found no evidence to initiate prosecution proceedings against him.[e] While his policies as chief minister, which were credited for encouraging economic growth, were praised, Modi's administration was criticised for failing to significantly improve health, poverty and education indices in the state.[f] In the 2014 Indian general election, Modi led the BJP to a parliamentary majority, the first for a party since 1984. His administration increased direct foreign investment, and it reduced spending on healthcare, education, and social-welfare programmes. Modi began a high-profile sanitation campaign, controversially initiated the 2016 demonetisation of high-denomination banknotes and introduced the Goods and Services Tax, and weakened or abolished environmental and labour laws.
""")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' Narendra Damodardas Modi is the 14th prime minister of India since May 2014 . He is a member of the Bharatiya Janata Party (BJP) and of the Rashtriya Swayamsevak Sangh (RSS) Modi was the chief minister of Gujarat from 2001 to 2014 . His administration is considered complicit in the 2002 Gujarat riots, and has been criticised for its management of the crisis .'}]

Translation:- The translation pipeline converts text from one language to another. The example below explicitly names a French-to-English model from the Hub.

In [None]:
from transformers import pipeline

translate = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translate("Je m'appelle Kaushal et je travaille chez Vaayushop.")

[{'translation_text': 'My name is Kaushal and I work at Vaayushop.'}]

In [None]:
# Feature extraction: returns one vector per input token.
from transformers import pipeline

extraction = pipeline("feature-extraction")
extraction("My")
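
The feature-extraction task listed earlier returns one vector per token. A common way to get a single vector for the whole text is to average the token vectors (mean pooling), after which two texts can be compared with cosine similarity. This is a minimal sketch with made-up three-dimensional vectors. Real model outputs are much larger, and `mean_pool` and `cosine_similarity` are illustrative helpers, not part of the library.

```python
import math


def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size text vector."""
    count = len(token_vectors)
    return [sum(dim) / count for dim in zip(*token_vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up token vectors: two tokens for the first text, one for the second.
text_a = mean_pool([[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]])
text_b = mean_pool([[2.0, 0.0, 1.0]])
similarity = cosine_similarity(text_a, text_b)
print(text_a, similarity)
```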