# Transformers `pipeline` object

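This notebook uses the `pipeline` function from the Hugging Face Transformers library to demonstrate a range of Natural Language Processing (NLP) tasks. A pipeline bundles a pretrained model with its tokenizer and the task-specific pre- and post-processing, so each task below takes only a couple of lines of code.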

1. **Sentiment Analysis:**
   Determine the sentiment expressed in a piece of text. The default English model used below is a binary classifier, so it returns POSITIVE or NEGATIVE together with a confidence score. This is useful for gauging opinions in reviews, social media posts, or customer feedback.

2. **Zero-Shot Classification:**
   Classify text using labels you supply at inference time, without training the model on those labels. The default model is trained on natural language inference (NLI) and scores how well each candidate label fits the text, so the set of labels can change from one call to the next.

3. **Text Generation:**
   Generate coherent and contextually relevant text based on a given prompt. This can be applied to creative writing, content creation, or even automated responses.

4. **Fill Mask:**
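   Predict the most likely tokens for a masked position in a sentence. The mask token depends on the model (the default model below uses `<mask>`; BERT-style models use `[MASK]`). Masked-token prediction is the objective that models such as BERT and RoBERTa are pretrained on, and it is useful for completing sentences or probing what a model has learned.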

5. **Named Entity Recognition (NER):**
   Identify and classify named entities, such as people, organizations, and locations, within a text. This task is essential for information extraction and for turning unstructured text into structured data.

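6. **Question Answering:**
   Answer questions about a given context passage. The default model is extractive: it returns a span copied from the context, with its character offsets, rather than free-form text. This is commonly used in chatbots, virtual assistants, and information retrieval systems.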

7. **Summarization:**
   Generate concise and informative summaries of longer pieces of text. This is valuable for quickly understanding the key points in articles, documents, or other lengthy content.

8. **Translation:**
   Translate text from one language to another. Translation models are usually trained for a specific language pair, so the model is chosen to match the source and target languages (the example below uses a French-to-English model).

9. **Feature Extraction:**
   Convert text into numeric vectors: the model's hidden states, one vector per token. These vectors can serve as input features for downstream tasks such as clustering, similarity search, or training a separate classifier.

Experiment with each task to get a hands-on feel for what the pipeline and its underlying models can do.


In [8]:
!pip install datasets evaluate transformers[sentencepiece]


# 1. Sentiment Analysis

In [11]:
# Import the pipeline function from the transformers library
from transformers import pipeline

# Initialize a pipeline for sentiment analysis
classifier = pipeline("sentiment-analysis")

# Use the classifier to analyze the sentiment of a text
classifier("I love AI and want to know more about the current geneative AI.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9966887831687927}]

In [15]:
# Pass a list of texts to classify several inputs in one call
classifier(
    ["cricekt is a very fun game, i love it.", "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9998769760131836},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]
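
For a two-label model like this one, the pipeline's final step converts the model's raw scores (logits, one per label) into probabilities with a softmax and reports the most probable label. The plain-Python sketch below reproduces only that step, assuming made-up logits and an `id2label` mapping shaped like the SST-2 model's; it does not load or run the model, and `classify` is an illustrative name, not a library function.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, id2label):
    """Return the top label and its probability, shaped like one pipeline result."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical logits for one input; these are NOT produced by the model above.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
result = classify([-2.9, 2.8], id2label)
print(result)
```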

# 2. Zero-Shot Classification

In [16]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course on generative AI",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course on generative AI',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.5734741687774658, 0.31685498356819153, 0.10967091470956802]}
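
The zero-shot pipeline reuses an NLI model: each candidate label is turned into a hypothesis sentence, the model scores whether the input text entails it, and, when only one label is expected, the entailment scores are softmaxed across labels and sorted. The sketch below shows that bookkeeping, assuming the pipeline's default `hypothesis_template` of `"This example is {}."` and hypothetical entailment logits in place of real model output. `build_pairs` and `rank_labels` are illustrative names, not library functions.

```python
import math

# Assumed default hypothesis template of the zero-shot pipeline.
HYPOTHESIS_TEMPLATE = "This example is {}."

def build_pairs(sequence, candidate_labels, template=HYPOTHESIS_TEMPLATE):
    """Pair the input text (premise) with one hypothesis per candidate label."""
    return [(sequence, template.format(label)) for label in candidate_labels]

def rank_labels(candidate_labels, entailment_logits):
    """Softmax the entailment logits across labels and sort, highest score first."""
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    ranked = sorted(zip(candidate_labels, scores), key=lambda pair: pair[1], reverse=True)
    return {"labels": [label for label, _ in ranked], "scores": [s for _, s in ranked]}

labels = ["education", "politics", "business"]
pairs = build_pairs("This is a course on generative AI", labels)
# Hypothetical entailment logits, one per (premise, hypothesis) pair.
result = rank_labels(labels, [2.0, 0.3, 1.4])
print(pairs[0])
print(result)
```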

# 3. Text Generation

In [17]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("In Generative AI course, we will teach you how to")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In Generative AI course, we will teach you how to use neural networks to generate a realistic, real-time algorithm.\n\nThe course begins with the introduction to Deep Learning in software and then moves to practical systems for training neural networks that can'}]

In [19]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=4,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to practice the ability to learn in real time. But if an instructor has a lot of trouble with how'},
 {'generated_text': 'In this course, we will teach you how to create a new, dynamic way that can be a useful part of a healthy, well-being and'},
 {'generated_text': 'In this course, we will teach you how to apply various tools and technologies to improve your life, and how to practice those techniques, so you can'},
 {'generated_text': 'In this course, we will teach you how to navigate the world of chess. In the course we will present you how to explore the way chess is'}]
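
Both cells generate text autoregressively: the model predicts the next token, appends it, and repeats until it reaches `max_length` (which counts the prompt's tokens as well) or an end-of-sequence token. The toy sketch below replaces the model with a hypothetical next-word table and uses greedy decoding, always taking the most probable word. The four different continuations above suggest that the pipeline samples instead, which is why its outputs vary between calls.

```python
# Hypothetical next-word table standing in for a language model.
# A real model scores every token in its vocabulary; here each word maps
# to a few candidate next words with made-up probabilities.
NEXT_WORD = {
    "to": {"build": 0.5, "train": 0.3, "use": 0.2},
    "build": {"models": 0.6, "apps": 0.4},
    "models": {"that": 0.7, ".": 0.3},
    "that": {"generate": 0.9, "work": 0.1},
    "generate": {"text": 0.8, "images": 0.2},
    "text": {".": 1.0},
}

def greedy_generate(prompt, max_length):
    """Append the most probable next word until max_length words or no continuation."""
    words = prompt.split()
    while len(words) < max_length:  # max_length includes the prompt words
        candidates = NEXT_WORD.get(words[-1])
        if not candidates:
            break  # no known continuation: stop, as an end-of-sequence token would
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)

short = greedy_generate("we will teach you how to", max_length=10)
longer = greedy_generate("we will teach you how to", max_length=20)
print(short)
print(longer)
```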

# 4. Fill Mask

In [21]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This AI course will teach you all about <mask> models.", top_k=3)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.15313869714736938,
  'token': 27930,
  'token_str': ' predictive',
  'sequence': 'This AI course will teach you all about predictive models.'},
 {'score': 0.07508847117424011,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This AI course will teach you all about mathematical models.'},
 {'score': 0.05678444355726242,
  'token': 26739,
  'token_str': ' neural',
  'sequence': 'This AI course will teach you all about neural models.'}]
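
The fill-mask pipeline reads the model's logits at the `<mask>` position, applies a softmax over the whole vocabulary, and returns the `top_k` most probable tokens substituted back into the sentence. The sketch below mimics that ranking with a hypothetical four-word vocabulary and made-up logits. Because a real vocabulary has tens of thousands of tokens, real scores are much smaller, as in the output above; `fill_mask` here is an illustrative helper, not the library's implementation.

```python
import math

def fill_mask(template, mask_token, vocab, logits, top_k=3):
    """Rank vocabulary words for the masked slot and fill each into the template."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(vocab)), key=lambda i: probs[i], reverse=True)[:top_k]
    return [
        {"score": probs[i], "token_str": vocab[i],
         "sequence": template.replace(mask_token, vocab[i])}
        for i in order
    ]

# Hypothetical tiny vocabulary and logits for the <mask> position.
vocab = ["predictive", "mathematical", "neural", "banana"]
logits = [3.0, 2.3, 2.0, -1.0]
results = fill_mask("This AI course will teach you all about <mask> models.",
                    "<mask>", vocab, logits, top_k=3)
for candidate in results:
    print(candidate)
```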

# 5. Named Entity Recognition

In [22]:
from transformers import pipeline

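# Merge sub-word tokens into whole entities; aggregation_strategy="simple"
# replaces the deprecated grouped_entities=True and produces the same grouping
ner = pipeline("ner", aggregation_strategy="simple")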
ner("My name is abdul samad and I teach NL course at Habib university Karachi.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.97433543,
  'word': 'abdul samad',
  'start': 11,
  'end': 22},
 {'entity_group': 'ORG',
  'score': 0.73682445,
  'word': 'Habib',
  'start': 48,
  'end': 53},
 {'entity_group': 'LOC',
  'score': 0.9917813,
  'word': 'Karachi',
  'start': 65,
  'end': 72}]
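
With entity grouping enabled, as in the cell above, the pipeline merges per-token predictions into entity groups. The sketch below shows a simplified version of that merge: consecutive tokens tagged with the same entity type are joined, the group's score is the average of its token scores, and its text is sliced from the character offsets. The per-token tags, scores, and offsets are hypothetical; the real pipeline works on sub-word tokens and decodes the group text from them.

```python
text = "My name is abdul samad and I teach NL course at Habib university Karachi."

def group_entities(text, tokens):
    """Merge consecutive tokens of the same entity type into entity groups.

    tokens: list of (tag, score, start, end), where tag is "O" or "B-TYPE"/"I-TYPE"
    and start/end are character offsets into text.
    """
    groups = []
    current = None
    for tag, score, start, end in tokens:
        if tag == "O":
            current = None  # an outside token closes any open group
            continue
        prefix, entity_type = tag.split("-", 1)
        # Extend the open group only for an I- tag of the same type;
        # a B- tag or a type change starts a new group.
        if current is not None and prefix == "I" and current["entity_group"] == entity_type:
            current["scores"].append(score)
            current["end"] = end
        else:
            current = {"entity_group": entity_type, "scores": [score], "start": start, "end": end}
            groups.append(current)
    return [
        {"entity_group": g["entity_group"],
         "score": sum(g["scores"]) / len(g["scores"]),  # average of token scores
         "word": text[g["start"]:g["end"]],
         "start": g["start"], "end": g["end"]}
        for g in groups
    ]

# Hypothetical per-token predictions: (tag, score, start, end).
tokens = [
    ("O", 0.99, 0, 2),         # "My"
    ("I-PER", 0.98, 11, 16),   # "abdul"
    ("I-PER", 0.96, 17, 22),   # "samad"
    ("O", 0.99, 23, 26),       # "and"
    ("I-ORG", 0.74, 48, 53),   # "Habib"
    ("O", 0.60, 54, 64),       # "university"
    ("I-LOC", 0.99, 65, 72),   # "Karachi"
]
groups = group_entities(text, tokens)
for group in groups:
    print(group)
```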

# 6. Question Answering

In [25]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="What kind of place is Habib university?",
    context="My Name is abdul Samad, I work at Habib University, It is Liberal arts school",
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.6395870447158813, 'start': 58, 'end': 70, 'answer': 'Liberal arts'}
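
Extractive question answering models output two scores per token: how likely it is to start the answer and how likely it is to end it. The pipeline picks the span whose start and end probabilities have the largest product, where the start comes no later than the end and the span stays under a maximum answer length, then maps it back to character offsets (the `start` and `end` fields above). The sketch below shows that span search over a hypothetical list of context words with made-up logits; the real pipeline operates on sub-word tokens and excludes the question tokens, and `best_span` is an illustrative name.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(tokens, start_logits, end_logits, max_answer_len=15):
    """Pick the (start, end) pair maximizing p_start * p_end with start <= end."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0.0, 0, 0)
    for i in range(len(tokens)):
        # Only consider ends at or after the start, within the length limit.
        for j in range(i, min(i + max_answer_len, len(tokens))):
            score = p_start[i] * p_end[j]
            if score > best[0]:
                best = (score, i, j)
    score, i, j = best
    return {"score": score, "answer": " ".join(tokens[i:j + 1])}

# Hypothetical context words and logits.
tokens = ["It", "is", "Liberal", "arts", "school"]
start_logits = [0.1, 0.2, 4.0, 1.0, 0.5]
end_logits = [0.0, 0.1, 1.0, 3.5, 2.5]
answer = best_span(tokens, start_logits, end_logits)
print(answer)
```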

# 7. Summarization

In [26]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    In the country of 241 million people, two-thirds are below the age of 30. A citizen becomes eligible to vote at the age of 18.

It is also a vast country, spanning mountainous terrain in its north, multiple deserts and a 990km (615 miles) coastline. On February 8, 90,582 polling stations will service voters who want to cast their ballots.

In the contest are 5,121 candidates. They belong either to Pakistan’s 167 registered political parties or are independents. The Pakistan Tehreek-e-Insaf (PTI) party of former Prime Minister Imran Khan has been barred from using its election symbol, the cricket bat, so its candidates will also be contesting as independents this time.

Only a little more than half of Pakistan’s electorate voted in the 2018 elections.

With a crackdown against Khan’s party ongoing, it is unclear whether the February 8 elections will see a lower turnout or a surge in the form of a silent protest vote in favour of PTI-aligned candidates..
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' On February 8, 90,582 polling stations will service voters who want to cast their ballots . In the country of 241 million people, two-thirds are below the age of 30 . Pakistan Tehreek-e-Insaf (PTI) party of former Prime Minister Imran Khan has been barred from using its election symbol, the cricket bat .'}]
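
The summary contains stray spaces before periods (`ballots .`), which is likely an artifact of how the model's training summaries were tokenized. If that matters for display, a small post-processing step can remove them. The helper below is a hypothetical convenience function, not part of Transformers; it would be applied to the `summary_text` field of the pipeline's output.

```python
import re

def tidy_summary(text):
    """Remove whitespace before punctuation and trim surrounding whitespace."""
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    return text.strip()

# First two sentences of the summary printed above.
raw = (" On February 8, 90,582 polling stations will service voters who want to cast"
       " their ballots . In the country of 241 million people, two-thirds are below the age of 30 .")
clean = tidy_summary(raw)
print(clean)
```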

# 8. Translation

In [27]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")


[{'translation_text': 'This course is produced by Hugging Face.'}]
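
Many of Helsinki-NLP's OPUS-MT models follow the naming pattern `Helsinki-NLP/opus-mt-{source}-{target}`, so switching language pairs usually means changing only the model ID. The helper below is a hypothetical convenience that builds such an ID; not every pair is published, so check the Hugging Face Hub before passing the result to `pipeline("translation", model=...)`.

```python
def opus_mt_model_id(source_lang, target_lang):
    """Build a model ID following the Helsinki-NLP/opus-mt-{src}-{tgt} naming pattern."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"

fr_en = opus_mt_model_id("fr", "en")
print(fr_en)
```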

# 9. Feature Extraction

In [None]:
from transformers import pipeline

# Instantiate the pipeline
pipe = pipeline("feature-extraction")

# Use the pipeline
features = pipe("This restaurant is awesome")

# features is nested as [batch][token][hidden_dim]; print the size of the first token's vector
print(len(features[0][0]))


No model was supplied, defaulted to distilbert-base-cased and revision 935ac13 (https://huggingface.co/distilbert-base-cased).
Using a pipeline without specifying a model name and revision in production is not recommended.


768
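
The feature-extraction pipeline returns one vector per token, and 768 is DistilBERT's hidden size, which is what the cell prints. A common way to get a single vector per sentence is to average the token vectors (mean pooling) and compare sentences with cosine similarity. The sketch below shows both steps on hypothetical 3-dimensional vectors nested the same way as `features`. Mean pooling is one choice among several, and a base model such as `distilbert-base-cased` is not specifically trained to produce sentence embeddings, so similarity scores from it may be of limited quality.

```python
import math

def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size sentence vector."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[d] for vec in token_vectors) / n for d in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical token vectors with the same nesting as `features`:
# [batch][token][hidden_dim]. Real DistilBERT vectors have 768 dimensions.
features_a = [[[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]]
features_b = [[[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]]]

sentence_a = mean_pool(features_a[0])
sentence_b = mean_pool(features_b[0])
similarity = cosine_similarity(sentence_a, sentence_b)
print(sentence_a, sentence_b, similarity)
```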
