* Hugging Face (huggingface.co) hosts a repository of state-of-the-art models available for transfer learning on:
  * Natural Language Processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
  * Computer Vision: image classification, object detection, and segmentation.
  * Audio: automatic speech recognition and audio classification.
  * Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
* See the Transformers documentation for the supported models and tasks: https://huggingface.co/docs/transformers/index

* Hugging Face Transformers supports interoperability between the PyTorch, TensorFlow, and JAX frameworks.

In [1]:
# Black Panther: Wakanda Forever Review

review_wakanda = "So much action, emotion and loss. Prepare your tissues because Black Panther Wakanda Forever will have you crying from the losses and defeats in characters. Major character returns and big moments which set up major movie plans. Get ready for the most action packed marvel film of 2022. The best way to end marvel’s 2022 from seeing the poor production and selection of characters in previous films and shows. Black Panther Wakanda Forever makes up for all of the mistakes Marvel Studios made this year."

In [2]:
# install Hugging Face Transformers with the CPU-only TensorFlow backend
!pip install transformers[tf-cpu]

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers[tf-cpu]
  Downloading transformers-4.24.0-py3-none-any.whl (5.5 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.10.1-py3-none-any.whl (163 kB)
Collecting tensorflow-cpu>=2.3
  Downloading tensorflow_cpu-2.10.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (214.4 MB)
Collecting tensorflow-text
  Downloading tensorflow_text-2.10.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.9 MB)

## Sentiment Analysis
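* A supervised classification task: the default model labels each text as POSITIVE or NEGATIVE.
* The training has already been done: the pipeline downloads a model that was fine-tuned for sentiment analysis.
* Load the model and run it on your own data.
* Use "sentiment-analysis" in the pipeline.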

In [3]:
# import pipeline from transformers
from transformers import pipeline

# create a classifier from a model that has already been fine-tuned for sentiment analysis
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

In [4]:
# Classify the movie review
classifier(review_wakanda)

[{'label': 'POSITIVE', 'score': 0.9965795874595642}]
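The log above warns against relying on the default checkpoint. A minimal sketch of one way to pin it, assuming you want the same models this notebook used: keep a table of model IDs and revisions (copied from the defaults that transformers 4.24.0 reported in this notebook's own logs) and pass them explicitly through the `model` and `revision` keyword arguments of `pipeline()`. `PINNED_MODELS` and `pipeline_kwargs` are hypothetical names for illustration.

```python
# Hypothetical table of pinned checkpoints. Model IDs and short revision hashes
# are the defaults reported by transformers 4.24.0 in this notebook's logs.
PINNED_MODELS = {
    "sentiment-analysis": ("distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
    "zero-shot-classification": ("facebook/bart-large-mnli", "c626438"),
    "text-generation": ("gpt2", "6c0e608"),
    "fill-mask": ("distilroberta-base", "ec58a5b"),
    "ner": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
    "question-answering": ("distilbert-base-cased-distilled-squad", "626af31"),
    "summarization": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"),
}


def pipeline_kwargs(task: str) -> dict:
    """Return keyword arguments that pin both the model and its revision for a task."""
    if task not in PINNED_MODELS:
        raise KeyError(f"No pinned model for task {task!r}")
    model, revision = PINNED_MODELS[task]
    return {"task": task, "model": model, "revision": revision}


print(pipeline_kwargs("sentiment-analysis"))
```

Usage: `classifier = pipeline(**pipeline_kwargs("sentiment-analysis"))` loads the same checkpoint as above without triggering the "No model was supplied" warning.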

In [5]:
# prepare text data for movie review "Monica O My Darling"
# implicit string concatenation keeps this long review valid across two lines
review_momd = (
    "Unpredictability is one of the key ingredients to make a binge-worthy thriller. The audience is always on the edge trying to guess and figure what next? Director Vasan Bala (of 'Mard ko Dard Nahi Hota' fame) and his writer Yogesh Chandekar get that aspect right for the most part, but they miss out on creating an equally engaging plot and its execution. The story focuses largely on the actions of Jayant (Rajkummar Rao), a talented robotics expert, who has shrewdly scaled the corporate ladder at his organisation. Winning the trust of the company CEO and the love of his daughter has got him a plum position and a fat salary, but his success has also earned him an equal number of enemies and frenemies. Amidst all this, he gets sexually involved with the hottest girl in the office Monica (Huma Qureshi), who is notorious for her flings with multiple men at her workplace. But her greed gets the better of her and a diabolical plan is set in motion that soon gets out of control and dead bodies start piling up. From here on, this whodunit starts getting messier and more convoluted. At first, the sudden twists are enjoyable but after a point, as more characters are thrown into the mix, the screenplay starts losing pace. The plot twists become more bizarre and unconvincing and as you wait for the big reveal, even the unpredictability begins to fade. Before the last act, one can quite easily comprehend the finale and now it’s just about getting it over with. Rajkummar Rao has done so many such films and identical roles that it seems, the actor is now not even trying to be different. Huma Qureshi shines the brightest and seems to be having the maximum fun playing a well-defined and powerful role of Monica. Radhika Apte’s character of a cheeky cop is borderline annoying while Sikander Kher does a fine job of a disgruntled heir of a company, whose owner and his father clearly don’t value his presence. "
    "A string of cameos by actors like Radhika Mandan and Gulshan Deviah don’t add much value. The background score and the title track are perhaps the most enjoyable aspects of this clunky Netlflix thriller. OTT platforms are full of riveting and dark murder mysteries of all kinds."
)

In [6]:
classifier(review_momd)

[{'label': 'NEGATIVE', 'score': 0.969738245010376}]
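The sentiment pipeline also accepts a list of texts and returns one `{'label', 'score'}` dict per text, e.g. `classifier([review_wakanda, review_momd])`. A small sketch, assuming you want a single number per review: map the two SST-2 labels to a signed score in [-1, 1]. The helper name `signed_sentiment` is hypothetical; the sample outputs are the ones printed above.

```python
def signed_sentiment(result: dict) -> float:
    """Map {'label': 'POSITIVE'|'NEGATIVE', 'score': p} to +p or -p."""
    sign = {"POSITIVE": 1.0, "NEGATIVE": -1.0}[result["label"]]
    return sign * result["score"]


# Outputs copied from the two classifier calls above.
outputs = [
    {"label": "POSITIVE", "score": 0.9965795874595642},  # Black Panther: Wakanda Forever
    {"label": "NEGATIVE", "score": 0.969738245010376},   # Monica, O My Darling
]
print([round(signed_sentiment(r), 3) for r in outputs])  # [0.997, -0.97]
```

A signed score makes it easy to average or sort many reviews at once; an unknown label raises `KeyError` rather than being silently misread.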

## Zero-shot classification
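* Zero-shot classification assigns texts to labels that you choose at inference time, without needing labelled training examples for those labels.
* This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise.
* Use "zero-shot-classification" in the pipeline and pass the labels through `candidate_labels`.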

In [7]:
classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.

In [8]:
classifier( #random quote
    "I'm unpredictable, I never know where I'm going until I get there, I'm so random, I'm always growing, learning, changing, I'm never the same person twice. But one thing you can be sure of about me; is I will always do exactly what I want to do.",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': "I'm unpredictable, I never know where I'm going until I get there, I'm so random, I'm always growing, learning, changing, I'm never the same person twice. But one thing you can be sure of about me; is I will always do exactly what I want to do.",
 'labels': ['education', 'business', 'politics'],
 'scores': [0.507673442363739, 0.29703909158706665, 0.19528742134571075]}

In [9]:
classifier( #quote about course
    "I hate the course on Deep Learning for Business Applications",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'I hate the course on Deep Learning for Business Applications',
 'labels': ['business', 'education', 'politics'],
 'scores': [0.9643696546554565, 0.022010477259755135, 0.013619836419820786]}

In [10]:
classifier( #quote about course
    "I hate the course on Deep Learning",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'I hate the course on Deep Learning',
 'labels': ['business', 'education', 'politics'],
 'scores': [0.4819011986255646, 0.35492682456970215, 0.1631719321012497]}
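In each output above the three scores add up to (almost exactly) 1: by default the zero-shot pipeline treats the candidate labels as mutually exclusive and normalizes the scores across them, so a label's score depends on which other labels you offer. Passing `multi_label=True` scores each label independently instead. The sketch below, built around a hypothetical `top_label` helper, checks that property on the printed results and returns the winning label only when it beats the runner-up by a margin; the 0.2 default margin is an arbitrary assumption.

```python
import math

# Results copied from In [8] and In [10] above (the 'sequence' field is omitted).
random_quote = {
    "labels": ["education", "business", "politics"],
    "scores": [0.507673442363739, 0.29703909158706665, 0.19528742134571075],
}
short_course_quote = {
    "labels": ["business", "education", "politics"],
    "scores": [0.4819011986255646, 0.35492682456970215, 0.1631719321012497],
}


def top_label(result: dict, min_margin: float = 0.2):
    """Return the best label, or None if it does not beat the runner-up by min_margin.

    Assumes labels are sorted by descending score, as in the outputs above.
    """
    labels, scores = result["labels"], result["scores"]
    # In single-label mode the scores are normalized across the candidate labels.
    if not math.isclose(sum(scores), 1.0, abs_tol=1e-4):
        raise ValueError("scores do not sum to 1; was multi_label=True used?")
    if len(scores) > 1 and scores[0] - scores[1] < min_margin:
        return None
    return labels[0]


print(top_label(random_quote))        # education (margin of about 0.21)
print(top_label(short_course_quote))  # None (margin of about 0.13)
```

The margin check reflects the outputs above: the short "Deep Learning" sentence splits its probability between "business" and "education", so a confident single answer is not really available there.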

## Text Generation
* Provide a prompt and the model will auto-complete it by generating the remaining text. 
* This is similar to the predictive text feature that is found on many phones. 
* Text generation involves randomness, so it’s normal if you don’t get the same results.
* Use "text-generation" in the pipeline.

In [11]:
generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.

In [12]:
generator("The deep learning course, we will teach you how to")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'The deep learning course, we will teach you how to use your machine learning expertise in real-world tasks that are difficult for most people to get to.\n\nWe are very excited about the opportunity to help you get to grips with Python and Ruby'}]

Reference: https://huggingface.co/course/chapter1/3?fw=pt
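By default the text-generation pipeline returns the prompt followed by the continuation in `generated_text` (its `return_full_text` argument controls this). A minimal sketch for separating the two, reusing the output printed above; `split_completion` is a hypothetical helper. Because generation is random, a real run produces different text; calling `transformers.set_seed(...)` before generating makes runs repeatable.

```python
def split_completion(prompt: str, output: dict) -> tuple:
    """Split one text-generation result into (prompt, newly generated continuation)."""
    text = output["generated_text"]
    if not text.startswith(prompt):
        # e.g. the pipeline was called with return_full_text=False
        return prompt, text
    return prompt, text[len(prompt):]


prompt = "The deep learning course, we will teach you how to"
# Output copied from In [12] above.
result = [{"generated_text": prompt + " use your machine learning expertise in real-world tasks"
           " that are difficult for most people to get to.\n\nWe are very excited about the"
           " opportunity to help you get to grips with Python and Ruby"}]

_, continuation = split_completion(prompt, result[0])
print(continuation.strip().split("\n")[0])
```

The same helper works for each dict returned when asking for several samples, e.g. `generator(prompt, max_length=30, num_return_sequences=2)`.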

## Mask Filling
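* The idea of this task is to fill in the blanks in a given text: the model predicts the word hidden behind a special mask token.
* The mask token depends on the model; `<mask>` is the one used by the default distilroberta-base model.
* `top_k` sets how many candidate fills are returned.
* Use "fill-mask" in the pipeline.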

In [13]:
unmasker = pipeline("fill-mask")

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.

In [14]:
unmasker("If you do well in this exam next year, then my efforts <mask> successful.", top_k=1)

[{'score': 0.5706638097763062,
  'token': 58,
  'token_str': ' were',
  'sequence': 'If you do well in this exam next year, then my efforts were successful.'}]

In [15]:
unmasker("Students are not prepared <mask> that kind of question.", top_k=1)

[{'score': 0.9746550917625427,
  'token': 13,
  'token_str': ' for',
  'sequence': 'Students are not prepared for that kind of question.'}]
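Each fill-mask prediction carries both the predicted token and the completed sentence. The sketch below rebuilds `sequence` by substituting `token_str` into the template, which makes explicit what the pipeline does for a single mask; `fill_template` is a hypothetical helper. Because the mask string is model-specific (`<mask>` here, `[MASK]` for BERT-style models), it is a parameter; `unmasker.tokenizer.mask_token` reports it for whichever model is loaded.

```python
def fill_template(template: str, prediction: dict, mask_token: str = "<mask>") -> str:
    """Substitute a fill-mask prediction into a template containing exactly one mask."""
    if template.count(mask_token) != 1:
        raise ValueError("template must contain exactly one mask token")
    # token_str carries a leading space (e.g. ' were'); strip it because the
    # template already has a space before the mask.
    return template.replace(mask_token, prediction["token_str"].strip())


# Predictions copied from In [14] and In [15] above.
examples = [
    ("If you do well in this exam next year, then my efforts <mask> successful.",
     {"token_str": " were",
      "sequence": "If you do well in this exam next year, then my efforts were successful."}),
    ("Students are not prepared <mask> that kind of question.",
     {"token_str": " for",
      "sequence": "Students are not prepared for that kind of question."}),
]
for template, prediction in examples:
    print(fill_template(template, prediction) == prediction["sequence"])  # True
```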

## Named Entity Recognition
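* A task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.
* `grouped_entities=True` regroups the parts of the sentence (words or sub-word tokens) that belong to the same entity; as the warning below shows, it is deprecated, and newer versions use `aggregation_strategy="simple"` instead.
* Use "ner" in the pipeline.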

In [16]:
ner = pipeline("ner", grouped_entities=True)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
  "`grouped_entities` is deprecated and will be removed in version v5.0.0, defaulted to"


In [17]:
ner("My name is Bidla and I run a business in Pilani.")

[{'entity_group': 'PER',
  'score': 0.99769133,
  'word': 'Bidla',
  'start': 11,
  'end': 16},
 {'entity_group': 'LOC',
  'score': 0.99510026,
  'word': 'Pilani',
  'start': 41,
  'end': 47}]
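The `start` and `end` fields are character offsets into the input string, so each entity can be recovered by slicing, which is handy for highlighting or indexing. A minimal sketch using the result printed above; `group_entities` is a hypothetical helper and its 0.9 confidence cut-off is an arbitrary assumption. `pipeline("ner", aggregation_strategy="simple")` produces results in the same `entity_group` format.

```python
from collections import defaultdict

text = "My name is Bidla and I run a business in Pilani."
# Result copied from In [17] above.
entities = [
    {"entity_group": "PER", "score": 0.99769133, "word": "Bidla", "start": 11, "end": 16},
    {"entity_group": "LOC", "score": 0.99510026, "word": "Pilani", "start": 41, "end": 47},
]


def group_entities(text: str, entities: list, min_score: float = 0.9) -> dict:
    """Map each entity type to the spans of `text` it covers, skipping low-confidence hits."""
    groups = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            # Slice the original text with the character offsets.
            groups[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(groups)


print(group_entities(text, entities))  # {'PER': ['Bidla'], 'LOC': ['Pilani']}
```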

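## Question Answering
* The question-answering pipeline answers questions using information from a given context.
* It extracts the answer as a span of the context (with `start`/`end` character offsets) rather than generating new text.
* Use "question-answering" in the pipeline.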

In [18]:
question_answerer = pipeline("question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.

In [19]:
question_answerer(
    question="Where do I work?",
    context="My name is Bidla and I run a business in Pilani.",
)

{'score': 0.9543034434318542, 'start': 41, 'end': 47, 'answer': 'Pilani'}

In [20]:
question_answerer(
    question="What do I do?",
    context="My name is Bidla and I run a business in Pilani.",
)

{'score': 0.38804349303245544,
 'start': 23,
 'end': 37,
 'answer': 'run a business'}
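Because the pipeline extracts a span from the context, by default it returns some answer even when the context does not really contain one, and the `score` is the main signal of how sure it is (the pipeline's `handle_impossible_answer=True` option lets it return an empty answer instead). A sketch, assuming a hypothetical `confident_answer` helper and an arbitrary 0.5 threshold, that treats low-scoring spans as "no answer", reusing the two results above.

```python
from typing import Optional


def confident_answer(result: dict, threshold: float = 0.5) -> Optional[str]:
    """Return the extracted answer if its score reaches `threshold`, otherwise None."""
    return result["answer"] if result["score"] >= threshold else None


# Results copied from In [19] and In [20] above.
where = {"score": 0.9543034434318542, "start": 41, "end": 47, "answer": "Pilani"}
what = {"score": 0.38804349303245544, "start": 23, "end": 37, "answer": "run a business"}

print(confident_answer(where))  # Pilani
print(confident_answer(what))   # None
```

The right threshold depends on the model and the data; 0.5 only illustrates that the second answer, although correct here, was given with much lower confidence.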

## Summarization
* Summarization is the task of reducing a text to a shorter text while keeping all (or most) of the important aspects referenced in the text.
* Use "summarization" in the pipeline.

In [21]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.

In [22]:
summarizer(review_momd)

[{'summary_text': ' The story focuses largely on the actions of Jayant (Rajkummar Rao), a talented robotics expert, who has shrewdly scaled the corporate ladder at his organisation . Huma Qureshi shines the brightest and seems to be having the maximum fun playing a well-defined and powerful role of Monica . Sikander Kher does a fine job of a disgruntled heir of a company, whose owner and his father clearly don’t value his presence .'}]
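The summary above starts with a space and leaves a space before each full stop. A small cosmetic post-processing sketch (the `tidy_summary` helper is hypothetical). Separately, the summarization pipeline accepts `min_length` and `max_length` (in tokens) to control summary length, e.g. `summarizer(review_momd, min_length=30, max_length=80)`.

```python
import re


def tidy_summary(text: str) -> str:
    """Remove whitespace before punctuation and trim surrounding whitespace."""
    return re.sub(r"\s+([.,;:!?])", r"\1", text).strip()


# First two sentences of the summary printed above, copied verbatim.
summary = (" The story focuses largely on the actions of Jayant (Rajkummar Rao), a talented"
           " robotics expert, who has shrewdly scaled the corporate ladder at his organisation ."
           " Huma Qureshi shines the brightest and seems to be having the maximum fun playing"
           " a well-defined and powerful role of Monica .")
print(tidy_summary(summary))
```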

## Translation
* If you only give a language pair in the task name (such as "translation_en_to_fr"), the pipeline can fall back to a default model for some pairs.
* The easiest way, however, is to pick the model you want to use on the Model Hub.
* Use "translation" in the pipeline and pass the chosen model with `model=`.
* Illustration: translating from French to English.
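The Helsinki-NLP checkpoints used in this notebook follow the naming pattern `Helsinki-NLP/opus-mt-{source}-{target}` with language codes (`opus-mt-fr-en`, `opus-mt-hi-en`). A sketch of hypothetical helpers that build such a model ID and the matching `translation_{src}_to_{tgt}` task name. They only format strings: whether a given pair exists must still be checked on the Model Hub, and accepting only plain 2-3 letter lowercase codes is a simplifying assumption.

```python
import re

_LANG_CODE = re.compile(r"^[a-z]{2,3}$")


def opus_mt_model_id(src: str, tgt: str) -> str:
    """Build an ID like 'Helsinki-NLP/opus-mt-fr-en' (does not check the Hub)."""
    for code in (src, tgt):
        if not _LANG_CODE.match(code):
            raise ValueError(f"expected a 2-3 letter lowercase language code, got {code!r}")
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def translation_task(src: str, tgt: str) -> str:
    """Build a task name like 'translation_fr_to_en' for pipeline()."""
    return f"translation_{src}_to_{tgt}"


print(opus_mt_model_id("fr", "en"))  # Helsinki-NLP/opus-mt-fr-en
print(translation_task("fr", "en"))  # translation_fr_to_en
```

Usage: `pipeline(translation_task("hi", "en"), model=opus_mt_model_id("hi", "en"))`.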

In [23]:
# sentencepiece is required by the Marian tokenizers of the Helsinki-NLP/opus-mt-* translation models
!pip install transformers[sentencepiece] dataset

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting dataset
  Downloading dataset-1.5.2-py2.py3-none-any.whl (18 kB)
Collecting banal>=1.0.1
  Downloading banal-1.0.6-py2.py3-none-any.whl (6.1 kB)
Collecting alembic>=0.6.2
  Downloading alembic-1.8.1-py3-none-any.whl (209 kB)
Collecting Mako
  Downloading Mako-1.2.3-py3-none-any.whl (78 kB)
Collecting sentencepiece!=0.1.92,>=0.1.91
  Downloading sentencepiece-0.1.97-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: Mako, sentencepiece, banal, alembic, dataset
Successfully installed Mako-1.2.3 alembic-1.8.1 banal-1.0.6 dataset-1.5.2 sentencepiece-0.1.97


In [24]:
# French-to-English translation model picked from the Model Hub
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

ValueError: ignored
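The export hides the actual error message. One plausible cause, stated here as an assumption: the Marian tokenizer behind `Helsinki-NLP/opus-mt-*` needs `sentencepiece`, and transformers checks whether it is available when transformers is first imported, so installing it mid-session (In [23]) may only take effect after the runtime is restarted. A stdlib-only sketch for checking whether the package is visible to the current interpreter; the helper name is hypothetical.

```python
import importlib.util


def sentencepiece_installed() -> bool:
    """True if the sentencepiece package can be found by this Python interpreter."""
    return importlib.util.find_spec("sentencepiece") is not None


if sentencepiece_installed():
    print("sentencepiece is installed; if the translator still fails, restart the "
          "runtime so transformers re-checks for it on import.")
else:
    print("sentencepiece is missing: install it, then restart the runtime.")
```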

In [None]:
translator("Je voudrais un café.")  # "I would like a coffee."

## Custom translation: Hindi to English
* Use the model: Helsinki-NLP/opus-mt-hi-en
* Hugging Face Link: https://huggingface.co/Helsinki-NLP/opus-mt-hi-en

In [None]:
hindi_translator = pipeline("translation", model="Helsinki-NLP/opus-mt-hi-en")

In [None]:
hindi_translator("क्या मैं आपकी मदद कर सकता /सकती हुँ?")  # "Can I help you?"