## Transformers NLP Models

This notebook uses the **Transformers** (v2.3.0) APIs: state-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

**Transformers** provides general-purpose architectures, including BERT, GPT-2, RoBERTa, XLM, DistilBERT, and XLNet, for natural language tasks such as:

- ***Sentence Classification / Sentiment Analysis***
- ***Token Classification***
- ***Question-Answering***
- ***Summarization***
- ***Translation***
- ***Feature Extraction***

**Using the Pipelines**

```python
from transformers import pipeline

# using default model and tokenizer for the task
pipeline("<task-name>")

# using a user-specified model
pipeline("<task-name>", model="<model_name>")

# using a custom model and tokenizer, both given as strings
pipeline('<task-name>', model='<model_name>', tokenizer='<tokenizer_name>')
```

**Steps of an NLP task**
 
- **Tokenization**: Splits the initial input into multiple sub-entities (tokens).
- **Inference**: Maps tokens into a more meaningful representation. 
- **Decoding**: Uses the above representation to generate and/or extract the final output for the underlying task.
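The three steps above can be sketched with a toy sentiment classifier. This is only an illustration of the control flow, not how the library implements it: the whitespace tokenizer, the `TOY_LEXICON` weights, and the function names are all hypothetical stand-ins for a real tokenizer and a trained model.

```python
import math

# Hypothetical toy lexicon standing in for a trained model's weights.
TOY_LEXICON = {"sunny": 2.0, "great": 1.5, "rainy": -1.0, "awful": -2.0}

def tokenize(text):
    """Tokenization: split the raw input into lowercase tokens."""
    return text.lower().split()

def infer(tokens):
    """Inference: map the tokens to two logits, [negative, positive]."""
    total = sum(TOY_LEXICON.get(tok, 0.0) for tok in tokens)
    return [-total, total]

def decode(logits):
    """Decoding: softmax over the logits, then report the top label."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ["NEGATIVE", "POSITIVE"][best], "score": probs[best]}

result = decode(infer(tokenize("It is a sunny day today !")))
print(result)
```

A real pipeline swaps each function for a learned component, but the shape of the result (a label plus a probability-like score) is the same as in the sentiment-analysis output later in this notebook.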

In [1]:
# checking python version
! python -V

Python 3.7.3


In [2]:
# installing transformers
!pip install -q transformers

In [3]:
# importing packages
from __future__ import print_function
from transformers import pipeline

import ipywidgets as widgets
import numpy as np
import warnings

warnings.filterwarnings('ignore')

I1218 17:37:06.823713 4436350400 file_utils.py:39] PyTorch version 1.5.0 available.
I1218 17:37:09.961281 4436350400 file_utils.py:55] TensorFlow version 2.1.0 available.


## 1. Sentence Classification - Sentiment Analysis
Classifies the overall sentence as either *positive* or *negative*.

In [4]:
nlp_sent_classification = pipeline('sentiment-analysis')
nlp_sent_classification('It is a sunny day today !')

[{'label': 'POSITIVE', 'score': 0.9998055100440979}]

## 2. Token Classification - Named Entity Recognition

Assigns a label to each token in the input.

In [5]:
nlp_token_class = pipeline('ner')
nlp_token_class('Hugging Face is a French company based in New-York.')

[{'word': 'Hu', 'score': 0.9970937967300415, 'entity': 'I-ORG', 'index': 1},
 {'word': '##gging',
  'score': 0.9345752000808716,
  'entity': 'I-ORG',
  'index': 2},
 {'word': 'Face', 'score': 0.9787060618400574, 'entity': 'I-ORG', 'index': 3},
 {'word': 'French',
  'score': 0.9981995820999146,
  'entity': 'I-MISC',
  'index': 6},
 {'word': 'New', 'score': 0.9983047246932983, 'entity': 'I-LOC', 'index': 10},
 {'word': '-', 'score': 0.891345739364624, 'entity': 'I-LOC', 'index': 11},
 {'word': 'York', 'score': 0.9979523420333862, 'entity': 'I-LOC', 'index': 12}]
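The output above is one entry per word piece, so `Hugging` arrives as `Hu` plus `##gging`. A minimal sketch of merging adjacent pieces into whole entities follows; `group_entities` is a hypothetical helper written for this notebook, not a library function, and the scores are rounded copies of the values shown above.

```python
# Output copied from the cell above (scores rounded for brevity).
ner_output = [
    {"word": "Hu", "score": 0.9971, "entity": "I-ORG", "index": 1},
    {"word": "##gging", "score": 0.9346, "entity": "I-ORG", "index": 2},
    {"word": "Face", "score": 0.9787, "entity": "I-ORG", "index": 3},
    {"word": "French", "score": 0.9982, "entity": "I-MISC", "index": 6},
    {"word": "New", "score": 0.9983, "entity": "I-LOC", "index": 10},
    {"word": "-", "score": 0.8913, "entity": "I-LOC", "index": 11},
    {"word": "York", "score": 0.9980, "entity": "I-LOC", "index": 12},
]

def group_entities(tokens):
    """Merge runs of consecutive tokens that share the same entity label."""
    groups = []
    for tok in tokens:
        last = groups[-1] if groups else None
        adjacent = (
            last is not None
            and tok["entity"] == last["entity"]
            and tok["index"] == last["last_index"] + 1
        )
        if adjacent:
            piece = tok["word"]
            if piece.startswith("##"):
                # "##" marks a continuation of the previous word piece
                last["text"] += piece[2:]
            else:
                last["text"] += " " + piece
            last["last_index"] = tok["index"]
        else:
            groups.append(
                {"entity": tok["entity"], "text": tok["word"], "last_index": tok["index"]}
            )
    return [(g["entity"], g["text"]) for g in groups]

entities = group_entities(ner_output)
print(entities)
```

Because non-continuation pieces are joined with a space, the location comes out as `New - York` rather than `New-York`; recovering the exact surface form would need character offsets, which this output does not include.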

## 3. Question Answering
Given a (***question***, ***context***) pair, the model finds the span of text in the **context** that answers the **question**.

In [6]:
nlp_q_a = pipeline('question-answering')
nlp_q_a(context='Hugging Face is a French company based in New-York.', question='Where is based Hugging Face ?')

{'score': 0.9632966867654424, 'start': 42, 'end': 50, 'answer': 'New-York.'}
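The `start` and `end` fields appear to be character offsets into the context. A quick check on the result above, using only plain string operations, shows how they relate to the `answer` string:

```python
context = "Hugging Face is a French company based in New-York."
# Result copied from the cell above.
result = {"score": 0.9632966867654424, "start": 42, "end": 50, "answer": "New-York."}

# Locate the answer string directly in the context.
start = context.find(result["answer"])
end = start + len(result["answer"])  # exclusive slice end

print(start, end)
print(repr(context[start:end]))
# Slice using the offsets reported by the pipeline.
print(repr(context[result["start"]:result["end"]]))
```

In this particular output the reported `end` stops one character before the end of the `answer` string, so slicing with the reported offsets drops the trailing period. Treating `answer` as the authoritative text, and the offsets as approximate for this library version, seems the safer assumption.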

## 4. Text Generation - Mask Filling

Suggests one or more words to fill a masked position in the input, given the surrounding **context**.

In [7]:
nlp_mask_fill = pipeline('fill-mask')
nlp_mask_fill('Hugging Face is a French company based in ' + nlp_mask_fill.tokenizer.mask_token)

[{'sequence': '<s> Hugging Face is a French company based in Paris</s>',
  'score': 0.23106759786605835,
  'token': 2201},
 {'sequence': '<s> Hugging Face is a French company based in Lyon</s>',
  'score': 0.08198274672031403,
  'token': 12790},
 {'sequence': '<s> Hugging Face is a French company based in Geneva</s>',
  'score': 0.047694865614175797,
  'token': 11559},
 {'sequence': '<s> Hugging Face is a French company based in Brussels</s>',
  'score': 0.04762241616845131,
  'token': 6497},
 {'sequence': '<s> Hugging Face is a French company based in France</s>',
  'score': 0.041305918246507645,
  'token': 1470}]
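Each candidate above repeats the whole sentence wrapped in `<s>` and `</s>` markers. A small sketch of pulling out only the predicted word and its score is shown below; `filled_word` is a hypothetical helper, and it assumes the mask sits at the end of the prompt, as it does in this cell.

```python
# Output copied from the cell above.
fill_output = [
    {"sequence": "<s> Hugging Face is a French company based in Paris</s>",
     "score": 0.23106759786605835, "token": 2201},
    {"sequence": "<s> Hugging Face is a French company based in Lyon</s>",
     "score": 0.08198274672031403, "token": 12790},
    {"sequence": "<s> Hugging Face is a French company based in Geneva</s>",
     "score": 0.047694865614175797, "token": 11559},
    {"sequence": "<s> Hugging Face is a French company based in Brussels</s>",
     "score": 0.04762241616845131, "token": 6497},
    {"sequence": "<s> Hugging Face is a French company based in France</s>",
     "score": 0.041305918246507645, "token": 1470},
]
prompt = "Hugging Face is a French company based in"

def filled_word(sequence, prompt):
    """Strip the sentence markers and the prompt, leaving the predicted word."""
    text = sequence.replace("<s>", "").replace("</s>", "").strip()
    return text[len(prompt):].strip()

candidates = [(filled_word(item["sequence"], prompt), round(item["score"], 3))
              for item in fill_output]
covered = sum(item["score"] for item in fill_output)

print(candidates)
print(round(covered, 3))
```

The five candidates together cover a little under half of the probability mass, which suggests the model is far from certain about the masked word.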

## 5. Summarization
Condenses a text into a shorter one. Summarization is supported by **BART** and **T5**.

In [8]:
text = """ 
New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York. 
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband. 
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other. 
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage. 
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the 
2010 marriage license application, according to court documents. 
Prosecutors said the marriages were part of an immigration scam. 
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further. 
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective 
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002. 
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say. 
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages. 
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted. 
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s 
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali. 
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force. 
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""

nlp_summarizer = pipeline('summarization')
nlp_summarizer(text)

[{'summary_text': 'Liana Barrientos has been married 10 times, sometimes within two weeks of each other. Prosecutors say the marriages were part of an immigration scam. She is believed to still be married to four men, and at one time, she was married to eight. Her eighth husband, Rashid Rajput, was deported in 2006.'}]

## 6. Translation
Translates input text from one language to another.
Translation is supported by **T5**.

In [9]:
# translating from English to French
nlp_translator = pipeline('translation_en_to_fr')
nlp_translator("HuggingFace is a French company that is based in New York City. HuggingFace's mission is to solve NLP one commit at a time")

[{'translation_text': 'HuggingFace est une entreprise française basée à New York.'}]

In [10]:
# translating from English to German
nlp_translator = pipeline('translation_en_to_de')
nlp_translator("The history of natural language processing (NLP) generally started in the 1950s, although work can be found from earlier periods.")

[{'translation_text': 'Die Geschichte der natürlichen Sprachenverarbeitung (NLP) begann im Allgemeinen in den 1950er Jahren, obwohl Arbeit aus früheren Zeiten gefunden werden kann.'}]

## 7. Text Generation
Generates text that continues a given input. Text generation is currently supported by GPT-2, OpenAI GPT, Transformer-XL, XLNet, CTRL and Reformer.

In [11]:
nlp_text_generation = pipeline("text-generation")
nlp_text_generation("Today is a beautiful day and I will")

[{'generated_text': 'Today is a beautiful day and I will certainly pass on the words of all those whose courage and selfless actions inspired the day. It will also be remembered that those we care about for their long-term well-being. Our heroes are our closest'}]

## 8. Feature Extraction
Maps the input to a high-dimensional vector space learned from the data.

In [12]:
nlp_features = pipeline('feature-extraction')
output = nlp_features('Hugging Face is a French company based in Paris')
np.array(output).shape

(1, 12, 768)
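The shape `(1, 12, 768)` reads as one input sentence, 12 tokens (presumably including special tokens), and a 768-dimensional vector per token. One common way to turn this into a single sentence vector is mean pooling over the token axis. The sketch below uses a random array as a stand-in for `np.array(output)`, so the values are meaningless; only the shapes mirror the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for np.array(output): (sentences, tokens, hidden) = (1, 12, 768).
features = rng.normal(size=(1, 12, 768))

# Mean pooling: average the 12 token vectors of the first (only) sentence.
sentence_vector = features[0].mean(axis=0)
print(sentence_vector.shape)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A vector compared with itself should give a similarity of about 1.
self_similarity = cosine_similarity(sentence_vector, sentence_vector)
print(self_similarity)
```

With real features, `cosine_similarity` between two pooled sentence vectors gives a rough measure of how close the sentences are in the learned space. Mean pooling is an assumption here; other choices, such as taking the first token's vector, are also used.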

**Pipelines**

In [13]:
task = widgets.Dropdown(
    options=['sentiment-analysis', 'ner', 'fill-mask'],
    value='sentiment-analysis',
    description='Task:',
    disabled=False
)

# named text_input to avoid shadowing the built-in input()
text_input = widgets.Text(
    value='',
    placeholder='Enter something',
    description='Your input:',
    disabled=False
)

def forward(_):
    # run the pipeline selected in the dropdown on the submitted text
    if len(text_input.value) > 0:
        if task.value == 'ner':
            output = nlp_token_class(text_input.value)
        elif task.value == 'sentiment-analysis':
            output = nlp_sent_classification(text_input.value)
        else:
            # fill-mask: append the mask token if the user did not supply one
            mask = nlp_mask_fill.tokenizer.mask_token
            if mask not in text_input.value:
                output = nlp_mask_fill(text_input.value + ' ' + mask)
            else:
                output = nlp_mask_fill(text_input.value)
        print(output)

text_input.on_submit(forward)
display(task, text_input)

Dropdown(description='Task:', options=('sentiment-analysis', 'ner', 'fill-mask'), value='sentiment-analysis')

Text(value='', description='Your input:', placeholder='Enter something')

In [14]:
context = widgets.Textarea(
    value='Einstein is famous for the general theory of relativity',
    placeholder='Enter something',
    description='Context:',
    disabled=False
)

query = widgets.Text(
    value='Why is Einstein famous for ?',
    placeholder='Enter something',
    description='Question:',
    disabled=False
)

def forward(_):
    # answer the question against the context using the pipeline from section 3
    if len(context.value) > 0 and len(query.value) > 0:
        output = nlp_q_a(question=query.value, context=context.value)
        print(output)

query.on_submit(forward)
display(context, query)

Textarea(value='Einstein is famous for the general theory of relativity', description='Context:', placeholder=…

Text(value='Why is Einstein famous for ?', description='Question:', placeholder='Enter something')