# Deep Learning with RNNs (Sequence Models)

## Abstract

This Jupyter notebook applies the Hugging Face Transformers library to a range of transformer models. The library downloads pretrained models for Natural Language Understanding (NLU) tasks, such as analyzing the sentiment of a text, and for Natural Language Generation (NLG) tasks, such as completing a prompt with new text or translating it into another language.

Transformers provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet…) for NLU and NLG, with more than 32 pretrained models in over 100 languages and deep interoperability between Jax, PyTorch and TensorFlow.

The notebook runs one pipeline per task: fill-mask, which predicts masked words in a text; question answering, which extracts the answer to a question from a given context; summarization, which condenses a document or an article into a shorter text; text generation, which continues a given context with a coherent portion of new text; and classification models for texts and tokens. Sentence similarity and translation are covered below as well.

For each transformer model, at least one example is run on the loaded model to show whether its results meet the task requirements.

Adapted from the Transformers models on [Hugging Face](https://huggingface.co/) and the [HuggingFace Crash Course](https://www.youtube.com/watch?v=GSt00_-0ncQ) on YouTube.

## Table of contents

* [Installation](#installation)
* [Implementation](#implementation)
   * [Fill-Mask](#fill-mask)
       * [Explanation](#fm_explanation)
   * [Question Answering](#question_answering)
       * [Explanation](#qa_explanation)
   * [Summarization](#summarization)
       * [Explanation](#sm_explanation)
   * [Text Classification](#text_classification)
       * [Explanation](#tc_explanation)
   * [Text Generation](#text_generation)
       * [Explanation](#tg_explanation)
   * [Text2Text Generation](#text2text_generation)
       * [Explanation](#t2t_explanation)
   * [Token Classification](#token_classification)
       * [Explanation](#tkc_explanation)
   * [Translation](#translation)
       * [Explanation](#trans_explanation)
   * [Zero-Shot Classification](#zero-shot_classification)
       * [Explanation](#zs_explanation)
   * [Sentence Similarity](#sentence_similarity)
       * [Explanation](#ss_explanation)


## 0. Installing Transformers and Importing Dependencies <a class="anchor" id="installation"></a>

In [1]:
pip install transformers

Note: you may need to restart the kernel to use updated packages.


In [2]:
from transformers import pipeline

In [3]:
conda install pytorch torchvision -c pytorch

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


## 1. Loading Hugging Face Models  <a class="anchor" id="implementation"></a>

### 1-1. Load Fill-Mask Pipeline  <a class="anchor" id="fill-mask"></a>

Fill-Mask (10 Points)

Run a [Fill-Mask](https://huggingface.co/models?pipeline_tag=fill-mask&sort=downloads) language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [4]:
unmasker = pipeline('fill-mask', model='bert-base-uncased')

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [15]:
s = "Excuse me sir, do you speak [MASK]?"

In [16]:
unmasker(s)

[{'sequence': 'excuse me sir, do you speak english?',
  'score': 0.7569448947906494,
  'token': 2394,
  'token_str': 'english'},
 {'sequence': 'excuse me sir, do you speak french?',
  'score': 0.09214285016059875,
  'token': 2413,
  'token_str': 'french'},
 {'sequence': 'excuse me sir, do you speak italian?',
  'score': 0.02898499369621277,
  'token': 3059,
  'token_str': 'italian'},
 {'sequence': 'excuse me sir, do you speak spanish?',
  'score': 0.026109404861927032,
  'token': 3009,
  'token_str': 'spanish'},
 {'sequence': 'excuse me sir, do you speak german?',
  'score': 0.02030790224671364,
  'token': 2446,
  'token_str': 'german'}]

In [19]:
t = "I'm a student from [MASK] University."

In [20]:
unmasker(t)

[{'sequence': "i'm a student from the university.",
  'score': 0.22754523158073425,
  'token': 1996,
  'token_str': 'the'},
 {'sequence': "i'm a student from northwestern university.",
  'score': 0.0344078503549099,
  'token': 7855,
  'token_str': 'northwestern'},
 {'sequence': "i'm a student from stanford university.",
  'score': 0.032529089599847794,
  'token': 8422,
  'token_str': 'stanford'},
 {'sequence': "i'm a student from duke university.",
  'score': 0.03200804069638252,
  'token': 3804,
  'token_str': 'duke'},
 {'sequence': "i'm a student from columbia university.",
  'score': 0.031160930171608925,
  'token': 3996,
  'token_str': 'columbia'}]

### Explanation - Fill-Mask <a class="anchor" id="fm_explanation"></a>

The model is BERT (`bert-base-uncased`), pretrained on English text with a masked language modeling (MLM) objective.

Masked language modeling (MLM): given a sentence, 15% of the words in the input are randomly masked, the entire masked sentence is run through the model, and the model has to predict the masked words. This differs from traditional recurrent neural networks (RNNs), which usually see the words one after the other, and from autoregressive models like GPT, which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence.

The model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks; for example, a standard classifier can be trained using the features produced by the BERT model as inputs.

For each input, the pipeline returns the five most likely tokens for the masked position, each with a probability score. In the first sentence the masked token is obviously a language, and the model returns only languages, with "english" far ahead at about 0.76. In the second sentence the masked token is the name of a university. The top result is the definite article "the" (about 0.23), which is not the answer I expected but still yields a grammatical sentence; the remaining results (northwestern, stanford, duke, columbia) are real university names, each scoring about 0.03. Overall, the predicted words fit the words before and after the mask. However, the model cannot produce a complete text; it can only rank candidate words for the blank.
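
The masking step described above can be sketched in plain Python. This is a simplified illustration, not BERT's actual preprocessing: the function name `mask_tokens`, the whitespace tokenization and the fixed seed are assumptions made for the example, and BERT's real procedure also replaces some selected tokens with random or unchanged tokens rather than always using `[MASK]`.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly mask tokens (hypothetical helper, simplified MLM masking).

    Returns (masked_tokens, labels). labels holds the original token at
    masked positions and None everywhere else, so a model would only be
    scored on the positions it cannot see.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Whitespace tokenization is an assumption; BERT uses WordPiece tokens.
tokens = "excuse me sir , do you speak english ?".split()
masked, labels = mask_tokens(tokens, mask_prob=0.15, seed=1)
print(masked)
print(labels)
```

Because the model sees the whole masked sentence at once, it can use words on both sides of each `[MASK]`, which is what makes the learned representation bidirectional.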

### 1-2. Load Question Answering Pipeline  <a class="anchor" id="question_answering"></a>


Question Answering (10 Points)

Run a [Question Answering](https://huggingface.co/models?pipeline_tag=question-answering&sort=downloads) language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [26]:
from transformers import pipeline

In [27]:
question_answerer = pipeline("question-answering")
context = """
🤗 Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch and TensorFlow — with a seamless integration
between them. It's straightforward to train your models with one before loading them for inference with the other.
"""
question = "Which deep learning libraries back 🤗 Transformers?"
question_answerer(question=question, context=context)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


{'score': 0.9777289032936096,
 'start': 78,
 'end': 105,
 'answer': 'Jax, PyTorch and TensorFlow'}

In [28]:
long_context = """
🤗 Transformers: State of the Art NLP

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction,
question answering, summarization, translation, text generation and more in over 100 languages.
Its aim is to make cutting-edge NLP easier to use for everyone.

🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and
then share them with the community on our model hub. At the same time, each python module defining an architecture is fully standalone and
can be modified to enable quick research experiments.

Why should I use transformers?

1. Easy-to-use state-of-the-art models:
  - High performance on NLU and NLG tasks.
  - Low barrier to entry for educators and practitioners.
  - Few user-facing abstractions with just three classes to learn.
  - A unified API for using all our pretrained models.
  - Lower compute costs, smaller carbon footprint:

2. Researchers can share trained models instead of always retraining.
  - Practitioners can reduce compute time and production costs.
  - Dozens of architectures with over 10,000 pretrained models, some in more than 100 languages.

3. Choose the right framework for every part of a model's lifetime:
  - Train state-of-the-art models in 3 lines of code.
  - Move a single model between TF2.0/PyTorch frameworks at will.
  - Seamlessly pick the right framework for training, evaluation and production.

4. Easily customize a model or an example to your needs:
  - We provide examples for each architecture to reproduce the results published by its original authors.
  - Model internals are exposed as consistently as possible.
  - Model files can be used independently of the library for quick experiments.

🤗 Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch and TensorFlow — with a seamless integration
between them. It's straightforward to train your models with one before loading them for inference with the other.
"""

In [29]:
question_answerer(
    question=question,
    context=long_context
)

{'score': 0.9714912176132202,
 'start': 1892,
 'end': 1919,
 'answer': 'Jax, PyTorch and TensorFlow'}

### Explanation - Question Answering <a class="anchor" id="qa_explanation"></a>

This model finds the answer to a question within a given context. Because no model name is passed, the question-answering pipeline falls back to its default checkpoint, a DistilBERT model fine-tuned on SQuAD. After the answer span with the best score is found, the offset mappings are used to locate the corresponding text in the context. When the context is very long, it might get truncated by the tokenizer and handled as several features; the most likely answer is then selected for each feature, and the final answer is the one with the best score.

The implementation uses two examples. First, a short context is provided together with a question about it, and the model returns the answer with its score. Then the same question is asked against a much longer context. The model returns the same answer with a similarly high score (about 0.97 versus 0.98); only the start and end positions differ, because the answer sits at a different place in the longer text.
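
To make the span selection and offset mapping concrete, here is a minimal sketch in plain Python. The start and end scores are made-up numbers, the regex tokenizer stands in for the real tokenizer, and `best_span` is a hypothetical helper; the pipeline's actual implementation differs in detail.

```python
import re


def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end, score) for the highest-scoring token span.

    A span's score is start_scores[s] + end_scores[e], with the
    constraint s <= e < s + max_answer_len.
    """
    best = None
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


context = "Transformers is backed by Jax, PyTorch and TensorFlow."

# Stand-in tokenizer: words and punctuation, each with (start, end)
# character offsets into the context.
offsets = [m.span() for m in re.finditer(r"\w+|[^\w\s]", context)]

# Hypothetical per-token scores a QA model might produce (10 tokens).
start_scores = [0.1, 0.0, 0.2, 0.1, 5.0, 0.0, 1.0, 0.0, 0.5, 0.0]
end_scores = [0.0, 0.1, 0.0, 0.2, 1.0, 0.0, 0.8, 0.0, 6.0, 0.3]

start_tok, end_tok, score = best_span(start_scores, end_scores)

# Map token indices back to character positions, then slice the context.
char_start = offsets[start_tok][0]
char_end = offsets[end_tok][1]
answer = context[char_start:char_end]
print({"start": char_start, "end": char_end, "answer": answer})
```

The `start` and `end` values in the pipeline output play the same role as `char_start` and `char_end` here: slicing the context with them recovers the answer text.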

### 1-3. Load Summarization Pipeline  <a class="anchor" id="summarization"></a>

Summarization (10 Points)

Run a [Summarization](https://huggingface.co/models?pipeline_tag=summarization&sort=downloads) language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [38]:
from transformers import pipeline

In [39]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


#### Summarize Text

In [40]:
ARTICLE = """
You don’t always have to give your boss the finger
Maybe it’s your first day on the job. Perhaps your manager just made an announcement. You’ve been asked to scan your fingerprint every time you clock in and out. Is that even allowed?
From Hooters to Hyatt Hotels, employers tantalized by the promise of a futuristic, streamlined way to track workers’ attendance are starting to use time clock machines that fingerprint employees.
Vendors like Kronos and Allied Time say that because the machines are tied to your biometric information — unique characteristics such as your face, fingerprints, how you talk, and even how you walk — they provide a higher level of workplace security and limit employees’ ability to commit “time theft” by punching in for one another.
But the benefits for your boss may come at a cost to you — both your privacy and possibly your health.
With the global outbreak of COVID-19, your personal health could be at risk when using frequently touched screens and fingerprint scanners. The Centers for Disease Control says that coronavirus can remain on surfaces for hours, so screens and scanners should be regularly disinfected with cleaning spray or wipes. And you should wash your hands for 20 seconds or use alcohol-based hand sanitizer immediately after using one.
In addition to these health concerns, critics argue that biometric devices pose massive personal security issues, exposing workers to potential identity theft and subjecting them to possible surveillance from corporations and law enforcement.
In an amicus brief in a case before a federal court of appeals, a group of privacy advocates, including the ACLU and the EFF, wrote that “the immutability of biometric information” puts people “at risk of irreparable harm in the form of identity theft and/or tracking.”
“You can get a new phone, you can change your password, you can even change your Social Security number; you can’t change your face,” said Kade Crockford, the Technology for Liberty program director at ACLU of Massachusetts.
Companies facing legal action over their use of the machines range from fast food joints like McDonald’s and Wendy’s, to hotel chains like Marriott and Hyatt, to airlines like United and Southwest.
In some cases, the companies have countered in the lawsuits that their employees’ union agreement allows the use of the machines: “Southwest and United contend that the plaintiffs’ unions have consented — either expressly or through the collective bargaining agreements’ management-rights clauses — and that any required notice has been provided to the unions,” the court’s opinion states.
Other companies have not responded to requests for comment or have said they cannot comment on active litigation.
Privacy and labor laws have lagged behind the shifts in the American workplace. But in some places, you have the right to refuse and even sue.

Biometric Privacy Laws
As the collection and use of biometrics has exploded, lawmakers in three states have responded by passing laws restricting its deployment.
"""

In [43]:
summary = summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)

In [44]:
summary[0]['summary_text']

' Employers are starting to use time clock machines that fingerprint employees . The machines are tied to your unique characteristics such as your face, fingerprints, how you talk, and even how you walk . The Centers for Disease Control says that coronavirus can remain on surfaces for hours .'

### Explanation - Summarization <a class="anchor" id="sm_explanation"></a>

Summarization is the task of condensing a document or an article into a shorter text. The default checkpoint, `sshleifer/distilbart-cnn-12-6`, is a distilled BART model. BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (e.g. text classification, question answering). This particular checkpoint has been fine-tuned on CNN Daily Mail, a large collection of text-summary pairs.

There are two types of text summarization: extractive and abstractive. Extractive summarization selects sentences from the original text and reuses them verbatim: rather than writing a new summary based on the full content, it rates each sentence in the document against all the others by how well it represents the text, and keeps the top-ranked ones. Abstractive summarization generates new text that paraphrases the content.

The Hugging Face summarization model used here falls under the abstractive type.

The article was summarized within the requested limits (`max_length=130`, `min_length=30`). The summary covers the main points of the opening paragraphs, although in this example its sentences stay very close to the wording of the source, and it does not mention the privacy concerns or the lawsuits discussed later in the article.
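
For contrast with the abstractive model, the extractive approach can be sketched in a few lines of plain Python. This is a toy baseline under stated assumptions (word-frequency scoring, a regex sentence splitter, a shortened sample text); it is not what BART does and `extractive_summary` is a hypothetical helper.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=2):
    """Pick the highest-scoring sentences and return them verbatim.

    Each sentence is scored by the average document frequency of its
    words, so sentences that share vocabulary with the rest of the text
    rank higher. Selected sentences keep their original order.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]


# Shortened sample text, loosely based on the article above.
text = (
    "Employers are starting to use time clock machines that fingerprint employees. "
    "Vendors say the machines limit time theft by employees. "
    "Critics argue the machines expose employees to identity theft. "
    "Privacy laws have lagged behind."
)
summary = extractive_summary(text, num_sentences=2)
print(summary)
```

Every sentence returned by this baseline appears word for word in the input, whereas an abstractive model is free to generate wording that does not occur in the source.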

### 1-4. Load Text Classification Pipeline   <a class="anchor" id="text_classification"></a>

Text Classification (10 Points)

Run a [Text Classification](https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads) language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [72]:
from transformers import pipeline

In [73]:
classifier = pipeline("text-classification", model='bhadresh-savani/distilbert-base-uncased-emotion', return_all_scores=True)

In [74]:
prediction = classifier("I love using transformers. The best part is wide range of support and its easy to use")

In [75]:
print(prediction)

[[{'label': 'sadness', 'score': 0.000679271062836051}, {'label': 'joy', 'score': 0.9959298968315125}, {'label': 'love', 'score': 0.0009452462545596063}, {'label': 'anger', 'score': 0.001805522944778204}, {'label': 'fear', 'score': 0.0004111044108867645}, {'label': 'surprise', 'score': 0.00022885717044118792}]]


### Explanation - Text Classification <a class="anchor" id="tc_explanation"></a>

The model is a DistilBERT checkpoint fine-tuned for emotion classification. DistilBERT is created with knowledge distillation during the pre-training phase, which reduces the size of a BERT model by 40% while retaining 97% of its language understanding. The checkpoint used here classifies text into the six emotion labels shown in the output above: sadness, joy, love, anger, fear and surprise.

Here I test the classifier on one short text; with `return_all_scores=True` it returns a score for each of the six emotion labels. The example expresses a clearly positive emotion, and the model assigns joy a score of about 0.996 while every other label stays below 0.002, so the prediction matches the text well.
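
The nested list printed above can be reduced to a single predicted label with a few lines of Python. The sketch below copies the scores from the output above; the variable names `top` and `total` are only for illustration.

```python
# Scores copied from the classifier output shown above.
prediction = [[
    {'label': 'sadness', 'score': 0.000679271062836051},
    {'label': 'joy', 'score': 0.9959298968315125},
    {'label': 'love', 'score': 0.0009452462545596063},
    {'label': 'anger', 'score': 0.001805522944778204},
    {'label': 'fear', 'score': 0.0004111044108867645},
    {'label': 'surprise', 'score': 0.00022885717044118792},
]]

scores = prediction[0]  # one inner list per input text

# The predicted emotion is the label with the highest score.
top = max(scores, key=lambda item: item['score'])

# The scores add up to roughly 1, which is consistent with a softmax
# over the six labels.
total = sum(item['score'] for item in scores)

print(top['label'], top['score'])
print(total)
```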

### 1-5. Load Text Generation Pipeline   <a class="anchor" id="text_generation"></a>

Text Generation (10 Points)

Run a [Text Generation](https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads) language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [77]:
from transformers import pipeline, set_seed

In [78]:
generator = pipeline('text-generation', model='gpt2')

In [79]:
set_seed(42)

In [82]:
generator("Hello, I'm a graduate student in Northeastern University,", max_length=30, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Hello, I'm a graduate student in Northeastern University, having moved to Boston to spend summer after being admitted to Harvard. My thesis is my passion"},
 {'generated_text': "Hello, I'm a graduate student in Northeastern University, so I need a full-time internship with an existing college so that my students may not"},
 {'generated_text': "Hello, I'm a graduate student in Northeastern University, and had a few years as a professor there, but I'm not doing anything in graduate"},
 {'generated_text': "Hello, I'm a graduate student in Northeastern University, and I'm looking for a job. Would an applicant have a high-interest, high"},
 {'generated_text': "Hello, I'm a graduate student in Northeastern University, a Ph.D. Candidate in Economics and a J.D. candidate in Economics and"}]

### Explanation - Text Generation <a class="anchor" id="tg_explanation"></a>

GPT-2 is a model pretrained on English text using a causal language modeling (CLM) objective. It is a transformer model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data), with an automatic process to generate inputs and labels from those texts. More precisely, it was trained to guess the next word in sentences. The inputs are sequences of continuous text of a certain length and the targets are the same sequence, shifted one token (word or piece of word) to the right. Internally, the model uses a masking mechanism to make sure the prediction for token i only uses the inputs from 1 to i and not the future tokens.

This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks. However, the model is best at what it was pretrained for, which is generating texts from a prompt.

The generator returns five continuations of the prompt, each cut off at `max_length=30` tokens. The generated content is on topic and mostly fluent, but the logic within some continuations is wrong or contradictory: for example, one has the graduate student also having spent "a few years as a professor there". Therefore, this model would need more training or tuning to achieve better performance.
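
The shifted targets and the causal mask described above can be illustrated without any model. The sketch below uses a hand-split token list as an assumption (GPT-2 actually uses byte-pair encoding) and builds the mask as a nested list of booleans.

```python
# Hand-split tokens for illustration; GPT-2 uses byte-pair encoding.
tokens = ["Hello", ",", "I", "'m", "a", "student"]

# Targets are the inputs shifted by one position: the model at
# position i is trained to predict the token at position i + 1.
inputs = tokens[:-1]
targets = tokens[1:]

# Causal mask: position i may attend to position j only when j <= i,
# so no prediction can look at future tokens.
n = len(inputs)
causal_mask = [[j <= i for j in range(n)] for i in range(n)]

for i, target in enumerate(targets):
    visible = [inputs[j] for j in range(n) if causal_mask[i][j]]
    print(visible, "->", target)
```

Each printed line shows the context the model is allowed to see for one prediction, which grows by one token per position.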

### 1-6. Load Text2Text Generation Pipeline  <a class="anchor" id="text2text_generation"></a>

Text2Text Generation (10 Points)

Run a [Text2Text](https://huggingface.co/models?pipeline_tag=text2text-generation&sort=downloads) language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [2]:
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

In [3]:
hi_text = "मुझे बोस्टन में रहना पसंद है।"
chinese_text = "我喜歡住在波士頓。"

In [4]:
pip install sentencepiece

Note: you may need to restart the kernel to use updated packages.


In [5]:
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

In [6]:
# translate Hindi to French
tokenizer.src_lang = "hi"
encoded_hi = tokenizer(hi_text, return_tensors="pt")
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("fr"))
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)

['J’aime rester à Boston.']

In [7]:
# translate Chinese to English
tokenizer.src_lang = "zh"
encoded_zh = tokenizer(chinese_text, return_tensors="pt")
generated_tokens = model.generate(**encoded_zh, forced_bos_token_id=tokenizer.get_lang_id("en"))
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)

['I like to live in Boston.']

### Explanation - Text2Text Generation <a class="anchor" id="t2t_explanation"></a>

M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for many-to-many multilingual translation. It can translate directly between the 9,900 directions of 100 languages. To translate into a target language, the target language id is forced as the first generated token by passing the `forced_bos_token_id` parameter to the `generate` method. `M2M100Tokenizer` depends on `sentencepiece`, so we install it before running the example.

First, I define one sentence in two languages, Hindi and Traditional Chinese; it means 'I like to live in Boston.' The argument to `tokenizer.get_lang_id(...)`, such as `"fr"`, is the target language, and it can be any language the model supports. We translate the Hindi sentence into French and the Traditional Chinese sentence into English. After checking the outputs, both translations convey the intended meaning.
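
The figure of 9,900 directions follows from counting ordered pairs of distinct languages: each of the 100 languages can be a source for each of the other 99. A short check in Python, with a four-language subset chosen only for illustration:

```python
from itertools import permutations

# Every ordered (source, target) pair of distinct languages is one direction.
num_languages = 100
directions = num_languages * (num_languages - 1)
print(directions)

# The same count on a small subset: the language codes used in this
# notebook's examples.
langs = ["hi", "fr", "zh", "en"]
pairs = list(permutations(langs, 2))
print(len(pairs), pairs[:3])
```

The two translations run above, Hindi to French and Chinese to English, are two of these ordered pairs.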

### 1-7. Load Token Classification Pipeline  <a class="anchor" id="token_classification"></a>

Token Classification (10 Points)

Run a [Token Classification](https://huggingface.co/models?pipeline_tag=token-classification&sort=downloads) language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [8]:
from transformers import AutoTokenizer, AutoModelForTokenClassification

In [9]:
from transformers import pipeline

In [10]:
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

In [11]:
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is Julia and I live in Boston"

In [12]:
ner_results = nlp(example)

In [13]:
print(ner_results)

[{'entity': 'B-PER', 'score': 0.99860495, 'index': 4, 'word': 'Julia', 'start': 11, 'end': 16}, {'entity': 'B-LOC', 'score': 0.99624014, 'index': 9, 'word': 'Boston', 'start': 31, 'end': 37}]


### Explanation - Token Classification <a class="anchor" id="tkc_explanation"></a>

bert-base-NER is a fine-tuned BERT model that is ready to use for Named Entity Recognition and achieves state-of-the-art performance for the NER task. It has been trained to recognize four types of entities: location (LOC), organization (ORG), person (PER) and miscellaneous (MISC). Specifically, this model is a bert-base-cased model that was fine-tuned on the English version of the standard CoNLL-2003 Named Entity Recognition dataset.

In the example, a simple sentence is defined and the pipeline returns one result per entity token. The sentence contains two entities, Julia (a person) and Boston (a location). The result lists both tokens with their tags (B-PER and B-LOC), their confidence scores (both above 0.99) and their character offsets in the sentence.
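
Tags such as B-PER and B-LOC label individual tokens, so multi-token entities have to be stitched back together. The sketch below shows one way to do that, assuming the common convention that a `B-` tag begins an entity and an `I-` tag continues it; `merge_bio` and the tagged sentence are hypothetical and are not output from the model above.

```python
def merge_bio(tagged_tokens):
    """Group (word, tag) pairs into (entity_type, text) tuples.

    A B- tag always starts a new entity. An I- tag extends the current
    entity when the type matches, otherwise it starts a new one. Any
    other tag (such as "O") closes the current entity.
    """
    entities = []
    current = None  # (entity_type, [words]) for the entity being built
    for word, tag in tagged_tokens:
        prefix, _, etype = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == etype:
            current[1].append(word)
            continue
        if current is not None:
            entities.append((current[0], " ".join(current[1])))
            current = None
        if prefix in ("B", "I") and etype:
            current = (etype, [word])
    if current is not None:
        entities.append((current[0], " ".join(current[1])))
    return entities


# Hypothetical tagged sentence with two multi-token entities.
tagged = [
    ("My", "O"), ("name", "O"), ("is", "O"),
    ("Julia", "B-PER"), ("Smith", "I-PER"),
    ("and", "O"), ("I", "O"), ("live", "O"), ("in", "O"),
    ("New", "B-LOC"), ("York", "I-LOC"),
]
entities = merge_bio(tagged)
print(entities)
```

The `ner` pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs this kind of grouping, which may be more convenient than merging by hand.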

### 1-8. Load Translation Pipeline  <a class="anchor" id="translation"></a>

Translation (10 Points)

Run a [Translation](https://huggingface.co/models?pipeline_tag=translation&sort=downloads) language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [39]:
from transformers import pipeline

In [41]:
translator = pipeline("translation_en_to_de")

No model was supplied, defaulted to t5-base (https://huggingface.co/t5-base)


In [42]:
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))

[{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]


### Explanation - Translation <a class="anchor" id="trans_explanation"></a>

Translation is the task of converting a text from one language to another. An example of a translation dataset is the WMT English to German dataset, which has sentences in English as the input data and the corresponding sentences in German as the target data.

The translation pipeline is set up differently from the Text2Text example above, but both can translate between languages.

In the translation pipeline, the language pair is specified directly in the task name (`translation_en_to_de`); since no model is supplied, the pipeline defaults to `t5-base`. The German output is a correct translation of the English sentence.

### 1-9. Load Zero-Shot Classification Pipeline  <a class="anchor" id="zero-shot_classification"></a>

Zero-Shot Classification (10 Points)

Run a [Zero-Shot](https://huggingface.co/models?pipeline_tag=zero-shot-classification&sort=downloads) language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [17]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


In [18]:
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
classifier(sequence_to_classify, candidate_labels, multi_label=True)

{'sequence': 'one day I will see the world',
 'labels': ['travel', 'exploration', 'dancing', 'cooking'],
 'scores': [0.994511067867279,
  0.9383883476257324,
  0.00570622319355607,
  0.0018192946445196867]}

### Explanation - Zero-Shot Classification <a class="anchor" id="zs_explanation"></a>

This is a method for using pre-trained NLI models as ready-made zero-shot sequence classifiers. The method works by posing the sequence to be classified as the NLI premise and constructing a hypothesis from each candidate label. The probabilities for entailment and contradiction are then converted to label probabilities.

For sequence classification, we set the candidate labels and test which categories the sequence belongs to. Because the labels are scored independently here, the scores do not sum to 1: both travel (0.99) and exploration (0.94) score high, while dancing and cooking score below 0.01. The higher the score, the more closely the sequence matches that category; the lower the score, the less relevant the category is.
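
The conversion from NLI outputs to label scores can be sketched in plain Python. The logits below are made-up numbers, the template `"This example is {}."` is assumed to be the pipeline's default hypothesis template, and the (contradiction, neutral, entailment) ordering is an assumption about the NLI head; treat this as an illustration of the idea rather than the pipeline's exact code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


sequence = "one day I will see the world"
candidate_labels = ["travel", "cooking"]

# Each candidate label becomes one hypothesis; the sequence is the premise.
template = "This example is {}."  # assumed default template
pairs = [(sequence, template.format(label)) for label in candidate_labels]

# Hypothetical NLI logits per pair: (contradiction, neutral, entailment).
logits = {"travel": (-2.0, 0.1, 3.5), "cooking": (3.0, 0.2, -2.5)}

# Independent scoring: for each label, softmax over contradiction vs
# entailment only, and keep the entailment probability.
multi_label_scores = {
    label: softmax([contra, entail])[1]
    for label, (contra, _neutral, entail) in logits.items()
}

# Exclusive scoring: softmax of the entailment logits across labels,
# so the scores sum to 1.
entail_logits = [logits[label][2] for label in candidate_labels]
single_label_scores = dict(zip(candidate_labels, softmax(entail_logits)))

print(pairs)
print(multi_label_scores)
print(single_label_scores)
```

With independent scoring, several labels can score close to 1 at once, which matches the travel and exploration scores in the output above.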

### 1-10. Load Sentence Similarity Pipeline  <a class="anchor" id="sentence_similarity"></a>

Sentence Similarity (10 Points)

Run a [Sentence Similarity](https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=downloads) language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.


In [26]:
pip install -q sentence_transformers

Note: you may need to restart the kernel to use updated packages.


In [27]:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from pprint import pprint

In [28]:
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

In [29]:
sentences = [
    'Nothing much to say as it is a macbook. the M1 processor works like a charm',
    'Amazing laptop, super performance with M1, its blazing fast',
    'Working very slow and takes 15-20 minutes to start thus not worth for money',
    'This is not a good laptop. It is very slow. It is taking 20 minutes to start',
]

In [30]:
sentence_embeddings = model.encode(sentences)

In [31]:
for sentence, embedding in zip(sentences, sentence_embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")

Sentence: Nothing much to say as it is a macbook. the M1 processor works like a charm
Embedding: [-2.46488184e-01 -2.08190918e-01 -1.11375190e-01 -1.02354307e-02
 -1.57813579e-01 -3.62280786e-01 -4.32110488e-01  9.86033026e-03
 -1.07506171e-01 -2.81675994e-01  1.49234027e-01  5.82048893e-02
 -4.16482836e-01 -3.29557598e-01  1.99979141e-01 -2.30781794e-01
  4.69206721e-01 -3.71777952e-01  1.01849034e-01  3.07776809e-01
 -1.57086685e-01 -2.26009130e-01  1.97037272e-02  5.11454344e-02
 -4.27461788e-02  5.73057175e-01  6.20062649e-01  6.87860847e-01
 -2.13596582e-01 -2.89904028e-01 -1.45346090e-01  3.32339764e-01
  1.22200176e-01  9.80937034e-02 -1.61603801e-02  3.91541533e-02
  3.21472287e-01 -3.11650872e-01 -4.86470520e-01 -6.87191308e-01
 -2.24827200e-01  6.89012110e-01  5.74313819e-01  5.49121380e-01
  7.78109014e-01  5.74669018e-02 -2.31808618e-01 -2.95517266e-01
  6.50321543e-02 -1.25565469e-01 -2.27927536e-01 -1.22406743e-01
  1.90444648e-01 -8.23577195e-02 -9.82053056e-02  6.479011

In [32]:
len(sentence_embeddings)

4

In [33]:
len(sentence_embeddings[0])

384

In [34]:
print('Similarity between {} and {} is {}'.format(sentences[0],
       sentences[1],
       cosine_similarity(sentence_embeddings[0].reshape(1, -1),
       sentence_embeddings[1].reshape(1, -1))[0][0]))

Similarity between Nothing much to say as it is a macbook. the M1 processor works like a charm and Amazing laptop, super performance with M1, its blazing fast is 0.6136189103126526


In [35]:
print('Similarity between {} and {} is {}'.format(sentences[0],
       sentences[2],
       cosine_similarity(sentence_embeddings[0].reshape(1, -1),
       sentence_embeddings[2].reshape(1, -1))[0][0]))

Similarity between Nothing much to say as it is a macbook. the M1 processor works like a charm and Working very slow and takes 15-20 minutes to start thus not worth for money is 0.20321230590343475


In [36]:
print('Similarity between {} and {} is {}'.format(sentences[2],
       sentences[3],
       cosine_similarity(sentence_embeddings[2].reshape(1, -1),
       sentence_embeddings[3].reshape(1, -1))[0][0]))

Similarity between Working very slow and takes 15-20 minutes to start thus not worth for money and This is not a good laptop. It is very slow. It is taking 20 minutes to start is 0.5791779160499573


### Explanation - Sentence Similarity <a class="anchor" id="ss_explanation"></a>

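We can use the sentence_transformers framework to compute sentence and text embeddings for more than 100 languages. These embeddings can then be compared, e.g. with cosine similarity, to find sentences with a similar meaning.

In our examples, we use the `paraphrase-MiniLM-L6-v2` model to encode four laptop reviews into 384-dimensional embeddings and then compare pairs of embeddings with cosine similarity. The more alike two sentences are in meaning, the higher their similarity: the two positive reviews score about 0.61 and the two negative reviews about 0.58, while a positive review paired with a negative one scores only about 0.20. The model therefore separates the reviews by sentiment and topic reasonably well, even though the wording differs within each pair.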
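For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it in plain Python on small made-up vectors. The helper name `cosine` and the toy vectors are illustrative assumptions, not the notebook's 384-dimensional embeddings.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (hypothetical values).
toy_embeddings = {
    "a": [1.0, 2.0, 3.0],
    "same_direction": [2.0, 4.0, 6.0],
    "opposite": [-1.0, -2.0, -3.0],
    "orthogonal": [3.0, 0.0, -1.0],
}

# Compare every pair once instead of writing one cell per pair.
pair_scores = {
    (left, right): cosine(toy_embeddings[left], toy_embeddings[right])
    for left, right in combinations(toy_embeddings, 2)
}

for pair, score in pair_scores.items():
    print(pair, round(score, 4))
```

Values near 1 mean the vectors point in nearly the same direction, values near 0 mean they are unrelated, and negative values mean they point in opposing directions. The same loop over pairs could be applied to `sentence_embeddings` to compare all four reviews at once.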

# References

# Copyright and Licensing

BSD 3-Clause License

Copyright (c) 2021, Shu-Ya Hsu
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

In [43]:
# Copyright (c) Shu-Ya Hsu.
# Distributed under the terms of the 3-Clause BSD License.

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Modified BSD License](https://jupyter.org/governance/projectlicense.html) that governs its use.