<a href="https://colab.research.google.com/github/iamsachinbagale/NLP/blob/main/Huggingface_Transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### How to use Hugging Face Transformers?

**Agenda:** Using pre-trained models with Hugging Face Transformers

**Resources:** https://huggingface.co/course/chapter2/2?fw=tf

## Pipeline

In [3]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis", device=0)
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598046541213989}]
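
The warning above appears because no checkpoint was named. Below is a minimal sketch that names the same default checkpoint explicitly. It assumes the PyTorch backend, which the `pytorch_model.bin` downloads in this notebook suggest. Unlike the hard-coded `device=0`, it uses the first GPU only when one is available and otherwise falls back to the CPU with `device=-1`.

```python
import torch
from transformers import pipeline

# Name the checkpoint explicitly so results don't change if the library's default changes.
checkpoint = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

# Use the first GPU if PyTorch can see one, otherwise run on the CPU (device=-1).
device = 0 if torch.cuda.is_available() else -1

classifier = pipeline("sentiment-analysis", model=checkpoint, device=device)
result = classifier("I've been waiting for a HuggingFace course my whole life.")
print(result)
```

The `revision` argument of `pipeline()` can additionally pin a specific branch, tag, or commit of the checkpoint, which is what the warning recommends for production.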

**Behind the pipeline.**

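The `pipeline()` function groups together three steps: preprocessing (tokenizing the raw text into model inputs), passing the inputs through the model, and postprocessing (for sentiment analysis, converting the model's raw logits into labels and probabilities).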
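
These three steps can be reproduced by hand. The sketch below is not what `pipeline()` does internally line for line. It assumes the PyTorch backend (the linked course page shows TensorFlow via `?fw=tf`) and reuses the default checkpoint the pipeline reported above.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "Movie was waste of time",
]

# 1. Preprocessing: split text into tokens, map them to IDs, pad to a common length.
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

# 2. Model: a forward pass returns one row of raw logits per sentence.
with torch.no_grad():
    logits = model(**inputs).logits

# 3. Postprocessing: softmax turns logits into probabilities; id2label names each class.
probs = torch.softmax(logits, dim=-1)
predictions = [
    {"label": model.config.id2label[int(row.argmax())], "score": float(row.max())}
    for row in probs
]
print(predictions)
```

The attention mask produced in step 1 keeps the padding from affecting the shorter sentence, so the batched scores should match the single-sentence pipeline calls up to small floating-point differences.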

In [5]:
classifier("Movie was waste of time")

[{'label': 'NEGATIVE', 'score': 0.9997877478599548}]

In [6]:
# https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment
classifier = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment", device=0)
classifier("Food was amazing")

[{'label': 'LABEL_2', 'score': 0.9718621969223022}]

In [7]:
classifier(
    ["Food was amazing", "Moview was waste of time"]
)

[{'label': 'LABEL_2', 'score': 0.9718621969223022},
 {'label': 'LABEL_0', 'score': 0.9271150231361389}]
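
This checkpoint returns generic class names. According to its model card, `LABEL_0`, `LABEL_1`, and `LABEL_2` stand for negative, neutral, and positive. Below is a small post-processing sketch that renames them. The `LABEL_NAMES` dict and the `readable` helper are hypothetical, not part of the library.

```python
# Label meanings taken from the cardiffnlp/twitter-roberta-base-sentiment model card.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def readable(predictions):
    """Return copies of pipeline predictions with human-readable labels (hypothetical helper)."""
    return [{**p, "label": LABEL_NAMES.get(p["label"], p["label"])} for p in predictions]


# Output copied from the batch call above.
predictions = [
    {"label": "LABEL_2", "score": 0.9718621969223022},
    {"label": "LABEL_0", "score": 0.9271150231361389},
]
print(readable(predictions))
# [{'label': 'positive', 'score': 0.9718621969223022}, {'label': 'negative', 'score': 0.9271150231361389}]
```

Unknown labels pass through unchanged, so the helper is harmless if applied to a checkpoint that already uses readable names.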

## Question Answering

In [8]:
from transformers import pipeline

question_answerer = pipeline("question-answering", device=0)
question_answerer(
    question="Who is Sachin?",
    context="My name is Sachin and I work as AI Engineer",
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.5901213884353638, 'start': 32, 'end': 43, 'answer': 'AI Engineer'}
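
The default question-answering model is extractive. `answer` is always a slice of `context`, and `start`/`end` are character offsets into it. Below is a small sketch that checks this and marks the span, using the result above. The `highlight_answer` helper is hypothetical.

```python
def highlight_answer(context, result):
    """Wrap the predicted answer span in brackets, checking that the offsets match."""
    start, end = result["start"], result["end"]
    if context[start:end] != result["answer"]:
        raise ValueError("offsets do not match the answer text")
    return f"{context[:start]}[{context[start:end]}]{context[end:]}"


context = "My name is Sachin and I work as AI Engineer"
result = {"score": 0.5901213884353638, "start": 32, "end": 43, "answer": "AI Engineer"}
print(highlight_answer(context, result))  # My name is Sachin and I work as [AI Engineer]
```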

## Named-Entity Recognition

In [10]:
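from transformers import pipeline

# Without aggregation_strategy, the pipeline returns a separate prediction for every sub-word token tagged as an entity:
# ner = pipeline("ner", device=0)
# aggregation_strategy="simple" merges tokens that belong to the same entity into one group
ner = pipeline("ner", aggregation_strategy="simple", device=0)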
ner("Sundar Pichai is CEO of Google")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.99562234,
  'word': 'Sundar Pichai',
  'start': 0,
  'end': 13},
 {'entity_group': 'ORG',
  'score': 0.9987669,
  'word': 'Google',
  'start': 24,
  'end': 30}]
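
With `aggregation_strategy="simple"`, each dictionary describes one whole entity, and `start`/`end` index into the input string. Below is a short sketch that groups the results above by entity type. The `entities_by_type` helper is hypothetical. It slices the original text instead of reading `word`, because the reconstructed `word` can differ from the input, for example with uncased tokenizers.

```python
from collections import defaultdict


def entities_by_type(text, entities):
    """Group aggregated NER results by entity type, slicing the text via character offsets."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


text = "Sundar Pichai is CEO of Google"
entities = [
    {"entity_group": "PER", "score": 0.99562234, "word": "Sundar Pichai", "start": 0, "end": 13},
    {"entity_group": "ORG", "score": 0.9987669, "word": "Google", "start": 24, "end": 30},
]
print(entities_by_type(text, entities))  # {'PER': ['Sundar Pichai'], 'ORG': ['Google']}
```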

## Summarization

In [11]:
from transformers import pipeline

summarizer = pipeline("summarization", device=0)
text = """
    America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.
"""
summarizer(text, max_length=30, min_length=10)


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' America has changed dramatically during recent years . There are declining offerings in engineering subjects dealing with infrastructure,  the environment, and related issues .'}]
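
`max_length` and `min_length` are counted in tokens, not words or characters. The summary also keeps the spaces this checkpoint puts before punctuation (` .`) and a doubled space carried over from the indented input. Below is a minimal cleanup sketch using regular expressions. `tidy_summary` is a hypothetical helper, not a library function.

```python
import re


def tidy_summary(summary):
    """Collapse repeated whitespace and remove spaces before punctuation."""
    summary = re.sub(r"\s+", " ", summary).strip()
    return re.sub(r" ([.,;:!?])", r"\1", summary)


# summary_text copied from the output above.
raw = (" America has changed dramatically during recent years . There are declining "
       "offerings in engineering subjects dealing with infrastructure,  the environment, "
       "and related issues .")
print(tidy_summary(raw))
```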

## Text Generation

In [12]:
from transformers import pipeline

generator = pipeline("text-generation", device=0)
generator("In this course, we will teach you how to")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to achieve success in the real world. In all three chapters, you will experience:\n\nThe joy of achieving success;\n\nThe challenges of working hard and making it happen,\n\nRecogn'}]
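
GPT-2 samples its continuation, so rerunning the cell gives different text. The sketch below assumes the same default `openai-community/gpt2` checkpoint. On the same software and hardware setup, `set_seed` makes sampling repeatable. `max_new_tokens` bounds the length of the continuation, and `num_return_sequences` asks for several alternatives. Passing `pad_token_id` explicitly should also suppress the `Setting pad_token_id` message seen above.

```python
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="openai-community/gpt2")
set_seed(42)  # seed the Python, NumPy and PyTorch RNGs so sampling is repeatable

prompt = "In this course, we will teach you how to"
outputs = generator(
    prompt,
    max_new_tokens=30,        # length of the continuation, in tokens
    num_return_sequences=2,   # number of alternative continuations
    do_sample=True,           # sample instead of greedy decoding
    pad_token_id=generator.tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
for out in outputs:
    print(out["generated_text"])
```

By default the pipeline returns the prompt followed by the continuation in `generated_text`.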