<a href="https://colab.research.google.com/github/sachinkun21/HuggingFace_lessons/blob/main/IntroToHuggingFace.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers


In [None]:
# !pip install transformers

from transformers import pipeline
from pprint import pprint
import pandas as pd

sample_text = """Apple Pencil (2nd generation) brings your work to life.
 With imperceptible lag, pixel-perfect precision, and tilt and pressure sensitivity, 
 it transforms into your favorite creative instrument, your paint brush, your charcoal, or your pencil."""

### Text Classification:
By default, the text-classification pipeline uses a model that’s designed for sentiment analysis, but it also supports multiclass and multilabel classification.

In [None]:
classifier = pipeline("text-classification")
output = classifier(sample_text)
pprint(output)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)



[{'label': 'POSITIVE', 'score': 0.9995018243789673}]


### Batching inference:

In [None]:
classifier = pipeline("text-classification")

batch_text = ["Apple Pencil (2nd generation) brings your work to life", 
              "Apple pencil is not good for drawings.",
              "It's decent while working with illustrations but not very good.",
              "I have to goto work today"]

output = classifier(batch_text)

pd.DataFrame(output)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


|   | label | score |
|---|-------|-------|
| 0 | POSITIVE | 0.999691 |
| 1 | NEGATIVE | 0.999791 |
| 2 | NEGATIVE | 0.984072 |
| 3 | POSITIVE | 0.921294 |
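
The scores above can be used to separate confident predictions from ones that deserve a manual look. The following is a minimal sketch of such a post-processing step, not part of the transformers API: the helper name `split_by_confidence` and the 0.95 threshold are assumptions chosen for illustration. To keep it self-contained, the pipeline result is reproduced as a plain list of dicts with the values from the table above.

```python
# Stand-in for the classifier's return value on batch_text;
# labels and scores are copied from the table above.
batch_text = [
    "Apple Pencil (2nd generation) brings your work to life",
    "Apple pencil is not good for drawings.",
    "It's decent while working with illustrations but not very good.",
    "I have to goto work today",
]
output = [
    {"label": "POSITIVE", "score": 0.999691},
    {"label": "NEGATIVE", "score": 0.999791},
    {"label": "NEGATIVE", "score": 0.984072},
    {"label": "POSITIVE", "score": 0.921294},
]


def split_by_confidence(texts, predictions, threshold=0.95):
    """Pair each text with its prediction and split on the score threshold.

    `threshold` is a hypothetical cut-off; tune it for your own data.
    """
    confident, uncertain = [], []
    for text, pred in zip(texts, predictions):
        row = {"text": text, **pred}
        if pred["score"] >= threshold:
            confident.append(row)
        else:
            uncertain.append(row)
    return confident, uncertain


confident, uncertain = split_by_confidence(batch_text, output)
print(len(confident), "confident,", len(uncertain), "to review")
for row in uncertain:
    print(row["label"], round(row["score"], 3), "|", row["text"])
```

The default sentiment model only knows the labels POSITIVE and NEGATIVE, so a neutral sentence such as the last one is still forced into one of them. A comparatively low score is one hint that the label should not be taken at face value.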


### NER: Named Entity Recognition:


Predicting the sentiment of customer feedback is a good first step, but you often want to know if the feedback was about a particular item or service. In NLP, real-world objects like products, places, and people are called named entities, and extracting them from text is called named entity recognition (NER). We can apply NER by loading the corresponding pipeline and feeding our sample_text to it:

In [None]:
# A model can be passed explicitly, e.g. pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")
ner_tagger = pipeline("ner")

sample_text = "Ben joined Apple conference last year in WWDC, California."
output = ner_tagger(sample_text)
pd.DataFrame(output)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)


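|   | entity | score | index | word | start | end |
|---|--------|-------|-------|------|-------|-----|
| 0 | I-PER | 0.997855 | 1 | Ben | 0 | 3 |
| 1 | I-ORG | 0.992856 | 3 | Apple | 11 | 16 |
| 2 | I-LOC | 0.833059 | 8 | W | 41 | 42 |
| 3 | I-LOC | 0.949345 | 9 | ##WD | 42 | 44 |
| 4 | I-LOC | 0.944252 | 10 | ##C | 44 | 45 |
| 5 | I-LOC | 0.999409 | 12 | California | 47 | 57 |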
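
In the output above, "WWDC" is split into the word pieces `W`, `##WD`, and `##C`, each reported as its own row. The sketch below shows one way to stitch such pieces back into whole words using the `start` and `end` character offsets. The helper `merge_wordpieces` is hypothetical, and taking the minimum score of the pieces is an assumption made for illustration; the token records are copied from the table above.

```python
text = "Ben joined Apple conference last year in WWDC, California."

# Stand-in for the NER pipeline's return value; values copied from the table above.
tokens = [
    {"entity": "I-PER", "score": 0.997855, "word": "Ben", "start": 0, "end": 3},
    {"entity": "I-ORG", "score": 0.992856, "word": "Apple", "start": 11, "end": 16},
    {"entity": "I-LOC", "score": 0.833059, "word": "W", "start": 41, "end": 42},
    {"entity": "I-LOC", "score": 0.949345, "word": "##WD", "start": 42, "end": 44},
    {"entity": "I-LOC", "score": 0.944252, "word": "##C", "start": 44, "end": 45},
    {"entity": "I-LOC", "score": 0.999409, "word": "California", "start": 47, "end": 57},
]


def merge_wordpieces(tokens, text):
    """Fold '##' continuation pieces into the token that precedes them."""
    merged = []
    for tok in tokens:
        if tok["word"].startswith("##") and merged:
            prev = merged[-1]
            prev["end"] = tok["end"]
            # Recover the surface form from the original text via offsets.
            prev["word"] = text[prev["start"]:prev["end"]]
            # Assumption: keep the least confident piece's score.
            prev["score"] = min(prev["score"], tok["score"])
        else:
            merged.append(dict(tok))  # copy so the input is left untouched
    return merged


merged = merge_wordpieces(tokens, text)
for tok in merged:
    print(tok["entity"], tok["word"], tok["start"], tok["end"])
```

The `ner` pipeline also accepts an `aggregation_strategy` argument that groups tokens for you, which is usually preferable to hand-rolled merging when your installed version supports it.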


### Question-Answering:
In question answering, we provide the model with a passage of text called the context, along with a question whose answer we’d like to extract. The model then returns the span of text corresponding to the answer. Let’s see what we get when we ask a specific question about our customer feedback:

In [None]:
qa = pipeline("question-answering")

sample_text = """Hi, I bought 1st gen apple pencil for my ipad 2nd gen but it doesn't work. I want to exchange it with 2nd apple pencil that works with 2nd gen ipad"""

question = "What does the customer want?"
output = qa(question=question, context=sample_text)
pprint(output)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


{'answer': 'exchange it with 2nd apple pencil',
 'end': 118,
 'score': 0.3618432283401489,
 'start': 85}
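
The `start` and `end` values are character offsets into the context, so the answer can be recovered by slicing, and the `score` can be used to decide whether to trust it. The sketch below illustrates both on the result shown above. The helper `answer_or_fallback`, the 0.5 threshold, and the fallback string are assumptions for illustration only.

```python
context = (
    "Hi, I bought 1st gen apple pencil for my ipad 2nd gen but it doesn't work. "
    "I want to exchange it with 2nd apple pencil that works with 2nd gen ipad"
)

# Stand-in for the question-answering pipeline's return value; copied from the output above.
output = {
    "answer": "exchange it with 2nd apple pencil",
    "end": 118,
    "score": 0.3618432283401489,
    "start": 85,
}

# The answer is an extracted span: slicing the context reproduces it.
span = context[output["start"]:output["end"]]
print(repr(span))


def answer_or_fallback(result, min_score=0.5, fallback="(no confident answer)"):
    """Return the answer only if its score clears a (hypothetical) threshold."""
    return result["answer"] if result["score"] >= min_score else fallback


print(answer_or_fallback(output))                  # score is below 0.5
print(answer_or_fallback(output, min_score=0.3))   # looser threshold
```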


### Summarization:
The goal of text summarization is to take a long text as input and generate a short version with all the relevant facts. This is a much more complicated task than the previous ones since it requires the model to generate coherent text. In what should be a familiar pattern by now, we can instantiate a summarization pipeline as follows:

In [None]:
summarizer = pipeline("summarization")  # A model can be passed explicitly, e.g. pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

sample_text = """Apple Pencil (2nd generation) brings your work to life.
 With imperceptible lag, pixel-perfect precision, and tilt and pressure sensitivity, 
 it transforms into your favorite creative instrument, your paint brush, your charcoal, or your pencil."""

output = summarizer(sample_text, max_length = 10 )
pprint(output)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)



Your min_length=56 must be inferior than your max_length=10.


[{'summary_text': ' Apple Pencil (2nd generation'}]
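
The warning above appears because `max_length=10` is smaller than the `min_length=56` that was in effect for this call, and the resulting summary is cut off mid-phrase. One way to avoid this is to pass both bounds together. The sketch below builds a consistent pair of keyword arguments; the helper `length_bounds` and its `min_fraction` parameter are hypothetical, and the lengths are counted in tokens, not words.

```python
def length_bounds(max_length, min_fraction=0.5):
    """Return min_length/max_length kwargs with min_length strictly below max_length.

    `min_fraction` is an illustrative assumption: the minimum is set to
    roughly that share of the maximum.
    """
    if max_length < 2:
        raise ValueError("max_length must be at least 2")
    min_length = max(1, int(max_length * min_fraction))
    min_length = min(min_length, max_length - 1)
    return {"min_length": min_length, "max_length": max_length}


kwargs = length_bounds(45)
print(kwargs)

# With the summarizer from the cell above, the call would then look like:
# output = summarizer(sample_text, **kwargs)
```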


### Translation:
Like summarization, translation is a task where the output consists of generated text. Let’s use a translation pipeline to translate an English text to German:

In [None]:
translator = pipeline("translation_en_to_de")                   # model args can be provided. For example: pipeline("translation_en_to_de", model = "Helsinki-NLP/opus-mt-en-de")     

sample_text = """Apple Pencil (2nd generation) brings your work to life.
 With imperceptible lag, pixel-perfect precision, and tilt and pressure sensitivity, 
 it transforms into your favorite creative instrument, your paint brush, your charcoal, or your pencil."""

output = translator(sample_text )
pprint(output)

No model was supplied, defaulted to t5-base (https://huggingface.co/t5-base)



[{'translation_text': 'Apple Pencil (2. Generation) bringt Ihre Arbeit zum '
                      'Leben: Mit unerkennbarer Verzögerung, pixelperfekter '
                      'Präzision und Neigungs- und Druckempfindlichkeit '
                      'verwandelt es sich in Ihr Lieblingswerkzeug, Ihren '
                      'Malpinsel, Ihre Kohle oder Ihren Bleistift.'}]


### Text-Generation:
Let’s say you would like to be able to provide faster replies to customer feedback by having access to an autocomplete function. With a text generation model you can do this as follows:

In [None]:
generator = pipeline("text-generation")  # A model can be passed explicitly, e.g. pipeline("text-generation", model="gpt2")

sample_text = """Hi, I bought apple pencil for my ipad 2nd gen but it doesn't work. It would be really helpful i could exchange it with new gen pencil that works with 2nd gen ipad."""
sample_output = """Dear Customer, I am sorry to hear that your apple pencil wasn't working"""

prompt = sample_text+ " Automated Response from bot: " +sample_output

output = generator(prompt , max_length = 80)
pprint(output[0]['generated_text'])

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)



Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


("Hi, I bought apple pencil for my ipad 2nd gen but it doesn't work. It would "
 'be really helpful i could exchange it with new gen pencil that works with '
 '2nd gen ipad. Automated Response from bot: Dear Customer, I am sorry to hear '
 "that your apple pencil wasn't working correctly with the other ipad. Please "
 'confirm with the IPAD that the device supports IP')
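
As the output shows, the generated text repeats the whole prompt before continuing it. For an autocomplete feature you typically want only the new part. The sketch below strips the prompt from the result with plain string handling. The helper `strip_prompt` is hypothetical, and the `generated_text` value is a stand-in assembled from the output above instead of a live model call.

```python
sample_text = (
    "Hi, I bought apple pencil for my ipad 2nd gen but it doesn't work. "
    "It would be really helpful i could exchange it with new gen pencil "
    "that works with 2nd gen ipad."
)
sample_output = "Dear Customer, I am sorry to hear that your apple pencil wasn't working"
prompt = sample_text + " Automated Response from bot: " + sample_output

# Stand-in for output[0]['generated_text']: the prompt followed by the
# continuation shown in the output above.
generated_text = prompt + (
    " correctly with the other ipad. Please confirm with the IPAD "
    "that the device supports IP"
)


def strip_prompt(prompt, generated_text):
    """Return only the text the model added after the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text  # prompt not echoed; nothing to strip


continuation = strip_prompt(prompt, generated_text)
print(sample_output + " " + continuation)
```

The text-generation pipeline also accepts `return_full_text=False`, which returns only the continuation directly; the manual approach is mainly useful when you already have the full text in hand.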
