In [1]:
from transformers import pipeline


In [2]:
# Install the packages below if they are not already in this venv, then restart the kernel and comment the lines out again.
# %pip install transformers sentencepiece
# %pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

In [3]:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
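
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the checkpoint name and revision hash printed in the log. The names `PINNED_SENTIMENT_MODEL` and `build_pinned_classifier` are made up for illustration and are not part of transformers:

```python
# Checkpoint name and revision copied from the log message above.
PINNED_SENTIMENT_MODEL = {
    "task": "sentiment-analysis",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "714eb0f",
}


def build_pinned_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily so that defining this helper does not load transformers.
    from transformers import pipeline

    return pipeline(**PINNED_SENTIMENT_MODEL)
```

Calling `build_pinned_classifier()` should give a classifier equivalent to the default one, without the "No model was supplied" message.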


In [4]:
classifier("this course has outdated setup information")

[{'label': 'NEGATIVE', 'score': 0.9997847676277161}]

In [5]:
classifier([
    "I'd continue learning the rest of the modules",
    "this will help me learn concepts of transformers",
])

[{'label': 'NEGATIVE', 'score': 0.6077219843864441},
 {'label': 'POSITIVE', 'score': 0.9988834261894226}]
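
The first sentence is labelled NEGATIVE with a score of only about 0.61, so the label is not very reliable. One possible way to handle such cases is to treat low-scoring predictions as undecided. The sketch below works on the output copied from above; the helper `confident_label` and the 0.9 threshold are assumptions chosen for illustration:

```python
# Results copied from the output above.
results = [
    {"label": "NEGATIVE", "score": 0.6077219843864441},
    {"label": "POSITIVE", "score": 0.9988834261894226},
]


def confident_label(result, threshold=0.9):
    """Return the label, or "UNCERTAIN" when the score is below the threshold."""
    return result["label"] if result["score"] >= threshold else "UNCERTAIN"


labels = [confident_label(r) for r in results]
print(labels)  # ['UNCERTAIN', 'POSITIVE']
```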

In [6]:
# Note: this rebinds `classifier` from the sentiment pipeline to a zero-shot pipeline.
classifier = pipeline("zero-shot-classification")
classifier("this is the transformer training to make you understand more about NLP.",
           candidate_labels=["robotics", "education", "music"])

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'sequence': 'this is the transformer training to make you understand more about NLP.',
 'labels': ['education', 'robotics', 'music'],
 'scores': [0.9831095933914185, 0.011466334573924541, 0.005424054339528084]}

In [7]:
classifier("this is the transformer training to make you understand more about drums.",
           candidate_labels=['robotics','education','music'])

{'sequence': 'this is the transformer training to make you understand more about drums.',
 'labels': ['education', 'music', 'robotics'],
 'scores': [0.6152130365371704, 0.3754134774208069, 0.009373457171022892]}
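
In both outputs the labels come back sorted by score, and the scores add up to roughly 1, which is consistent with the scores being normalized across the candidate labels. A small sketch of reading such a result, using the values copied from the output above (the variable names are illustrative):

```python
# Result copied from the output above.
result = {
    "sequence": "this is the transformer training to make you understand more about drums.",
    "labels": ["education", "music", "robotics"],
    "scores": [0.6152130365371704, 0.3754134774208069, 0.009373457171022892],
}

# Pair each label with its score; the first pair is the top prediction.
ranked = list(zip(result["labels"], result["scores"]))
top_label, top_score = ranked[0]

# Gap between the best and second-best label: a small gap means a close call.
margin = top_score - ranked[1][1]
total = sum(result["scores"])

print(top_label)        # education
print(f"{margin:.3f}")  # 0.240
print(f"{total:.3f}")   # 1.000
```

Swapping "NLP" for "drums" shrinks the lead of `education` considerably, which the margin makes easy to see.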

In [8]:
generator = pipeline("text-generation")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


In [9]:
generator("I am about to go", num_return_sequences=2, max_length=20)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'I am about to go live and do all my duties. That makes me such a real pleasure to'},
 {'generated_text': "I am about to go home and play video games, so I will play videogames whenever I'm"}]

In [10]:
question_answerer = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

Device set to use cpu


In [11]:
question_answerer(question="where is bob seated?",
                  context="Alice is sitting on the bench. Bob is sitting next to her")

{'score': 0.21272516250610352, 'start': 24, 'end': 29, 'answer': 'bench'}
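
The `start` and `end` values are character offsets into the context, so the answer can be recovered by slicing. A minimal check using the values copied from the output above:

```python
context = "Alice is sitting on the bench. Bob is sitting next to her"
# Result copied from the output above.
result = {"score": 0.21272516250610352, "start": 24, "end": 29, "answer": "bench"}

# Slice the context with the reported offsets (end is exclusive).
span = context[result["start"]:result["end"]]
print(span)  # bench
```

The model extracts a span from the context rather than generating text, and the score of about 0.21 suggests it is not very sure of this span.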

In [12]:
unmasker = pipeline("fill-mask", model="bert-base-cased")

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


In [13]:
unmasker("what the [MASK] ?", top_k=3)

[{'score': 0.745980978012085,
  'token': 2630,
  'token_str': 'hell',
  'sequence': 'what the hell?'},
 {'score': 0.19725626707077026,
  'token': 9367,
  'token_str': 'fuck',
  'sequence': 'what the fuck?'},
 {'score': 0.0323406457901001,
  'token': 26913,
  'token_str': 'heck',
  'sequence': 'what the heck?'}]

In [14]:
unmasker = pipeline("fill-mask")

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


In [15]:
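# Note the differences from bert-base-cased: the mask token is <mask> instead of [MASK],
# and the token ids, token strings (leading space) and scores are different.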
unmasker("what the <mask> ?", top_k=3)

[{'score': 0.43072301149368286,
  'token': 7105,
  'token_str': ' hell',
  'sequence': 'what the hell ?'},
 {'score': 0.29795730113983154,
  'token': 26536,
  'token_str': ' fuck',
  'sequence': 'what the fuck ?'},
 {'score': 0.22147269546985626,
  'token': 17835,
  'token_str': ' heck',
  'sequence': 'what the heck ?'}]

In [16]:
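# aggregation_strategy="simple" replaces the deprecated grouped_entities=True;
# without grouping, names are split into separate sub-word tokens.
ner = pipeline("ner", aggregation_strategy="simple")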

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


In [17]:
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

[{'entity_group': 'PER',
  'score': np.float32(0.9981694),
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': np.float32(0.9796019),
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': np.float32(0.9932106),
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
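
As with question answering, `start` and `end` are character offsets into the input. A short sketch that collects the entity text per type from the output above; the scores are written as plain floats here so the snippet does not need NumPy, and `by_type` is an illustrative name:

```python
text = "My name is Sylvain and I work at Hugging Face in Brooklyn."
# Entities copied from the output above (scores as plain floats).
entities = [
    {"entity_group": "PER", "score": 0.9981694, "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "score": 0.9796019, "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "score": 0.9932106, "word": "Brooklyn", "start": 49, "end": 57},
]

# Group the matched text spans by entity type.
by_type = {}
for ent in entities:
    span = text[ent["start"]:ent["end"]]
    by_type.setdefault(ent["entity_group"], []).append(span)

print(by_type)  # {'PER': ['Sylvain'], 'ORG': ['Hugging Face'], 'LOC': ['Brooklyn']}
```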

In [18]:
%pip install sentencepiece

Note: you may need to restart the kernel to use updated packages.


In [19]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

Device set to use cpu


In [20]:
translator("C'est la vie")

[{'translation_text': "It's life."}]