In [1]:
import pickle
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

from transformers import pipeline
import torch

When no model is supplied, the pipeline falls back to the default model associated with each task. All available models are listed on the Hugging Face model hub: https://huggingface.co/models. The hub also hosts models for languages other than English.

In [2]:
classifier_s = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
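The warning above can be addressed by passing the model name and revision explicitly. The snippet below is a minimal sketch: the names and short revision hashes are copied from the "No model was supplied" messages printed in this notebook, while `PINNED_MODELS` and `load_pinned` are hypothetical helper names introduced only for illustration.

```python
# Model names and short revision hashes copied from the
# "No model was supplied" messages printed in this notebook.
PINNED_MODELS = {
    "sentiment-analysis": ("distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
    "zero-shot-classification": ("facebook/bart-large-mnli", "c626438"),
    "text-generation": ("gpt2", "6c0e608"),
}


def load_pinned(task):
    """Build a pipeline for `task` with an explicit model and revision.

    Hypothetical helper: it only forwards the pinned values to `pipeline`.
    """
    # Imported lazily so the mapping can be inspected without loading transformers.
    from transformers import pipeline

    model_name, revision = PINNED_MODELS[task]
    return pipeline(task, model=model_name, revision=revision)
```

Calling `load_pinned("sentiment-analysis")` should then load the same checkpoint on every run, assuming that revision remains available on the hub.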


In [3]:
sentences_s = [
    "This is a testing sentence",
    "This is another testing sentence, which is literally testing the pipeline function",
    "I am not quite sure how to use this transformer libary, but I do like its high level functionalities",
    "Hello my friend",
    "Hello, Bart?"
]

In [5]:
classifier_s(sentences_s)

[{'label': 'NEGATIVE', 'score': 0.913568913936615},
 {'label': 'NEGATIVE', 'score': 0.9940857887268066},
 {'label': 'POSITIVE', 'score': 0.9987115859985352},
 {'label': 'POSITIVE', 'score': 0.9990572333335876},
 {'label': 'POSITIVE', 'score': 0.9946510195732117}]
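The default model is fine-tuned on SST-2, which has only two classes, so a neutral sentence such as "This is a testing sentence" is still forced into NEGATIVE or POSITIVE. One way to read the results side by side is to fold label and score into a single signed value. The sketch below uses the predictions printed above (scores rounded); `signed_score` is a hypothetical helper, not part of the library.

```python
# Inputs and predictions copied from the cells above (scores rounded to 4 decimals).
sentences = [
    "This is a testing sentence",
    "This is another testing sentence, which is literally testing the pipeline function",
    "I am not quite sure how to use this transformer libary, but I do like its high level functionalities",
    "Hello my friend",
    "Hello, Bart?",
]
predictions = [
    {"label": "NEGATIVE", "score": 0.9136},
    {"label": "NEGATIVE", "score": 0.9941},
    {"label": "POSITIVE", "score": 0.9987},
    {"label": "POSITIVE", "score": 0.9991},
    {"label": "POSITIVE", "score": 0.9947},
]


def signed_score(pred):
    """Map a prediction to [-1, 1]: POSITIVE keeps its score, NEGATIVE flips the sign."""
    return pred["score"] if pred["label"] == "POSITIVE" else -pred["score"]


# Print one line per sentence: signed score first, then the text.
for text, pred in zip(sentences, predictions):
    print(f"{signed_score(pred):+.4f}  {text}")
```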

In [6]:
classifier_z = pipeline("zero-shot-classification")
sentences_z = [
    "This is a course about the Transformers library",
    "I have used my keyboard for more than 10 years",
    "I like to study Artificial Intelligence"
]
labels_z = ["Education", "Politics", "Business", "University"]

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [7]:
classifier_z(sentences_z, labels_z)

[{'sequence': 'This is a course about the Transformers library',
  'labels': ['University', 'Education', 'Business', 'Politics'],
  'scores': [0.32160961627960205,
   0.3018229603767395,
   0.262361079454422,
   0.11420632153749466]},
 {'sequence': 'I have used my keyboard for more than 10 years',
  'labels': ['Business', 'University', 'Education', 'Politics'],
  'scores': [0.3691573143005371,
   0.26054561138153076,
   0.2151063233613968,
   0.15519072115421295]},
 {'sequence': 'I like to study Artificial Intelligence',
  'labels': ['University', 'Business', 'Education', 'Politics'],
  'scores': [0.6254051327705383,
   0.16458508372306824,
   0.12717516720294952,
   0.08283459395170212]}]
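In the output above, each result lists its labels in descending score order, and with the default settings the scores of one sequence sum to about 1. The sketch below works on the printed results (scores rounded) and reports the top label together with its margin over the runner-up; `top_label_with_margin` is a hypothetical helper name. A small margin, as for the first sentence, suggests the prediction should be treated with caution.

```python
# Results copied from the output above (scores rounded to 4 decimals).
results = [
    {"sequence": "This is a course about the Transformers library",
     "labels": ["University", "Education", "Business", "Politics"],
     "scores": [0.3216, 0.3018, 0.2624, 0.1142]},
    {"sequence": "I have used my keyboard for more than 10 years",
     "labels": ["Business", "University", "Education", "Politics"],
     "scores": [0.3692, 0.2605, 0.2151, 0.1552]},
    {"sequence": "I like to study Artificial Intelligence",
     "labels": ["University", "Business", "Education", "Politics"],
     "scores": [0.6254, 0.1646, 0.1272, 0.0828]},
]


def top_label_with_margin(result):
    """Return the best label and how far its score is ahead of the second best."""
    # Labels and scores are already sorted from highest to lowest score.
    margin = result["scores"][0] - result["scores"][1]
    return result["labels"][0], margin


for result in results:
    label, margin = top_label_with_margin(result)
    print(f"{label:<10} margin={margin:.4f} total={sum(result['scores']):.4f}")
```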

In [8]:
generator_g = pipeline("text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [9]:
generator_g("In this course, we will teach you how to")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to create and manage accounts and how to change and manage email from a mobile device.\n\nThe Course Overview in this course is aimed at students looking for a start in using Google Gmail with Google account management'}]

In [10]:
generator_g("In this course, we will teach you how to")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to develop and utilize more of your personal data. This course will be the first step in the transformation of your lives — in your own data, in your personal conversations…\n\n*This course is for'}]
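The two calls above return different text for the same prompt because the pipeline samples each next token from a probability distribution rather than always taking the most likely one. The toy sketch below illustrates the idea with the standard library only. The distribution `NEXT_TOKEN_PROBS` is made up for illustration and is not taken from GPT-2. In the real pipeline, calling `transformers.set_seed` before generating, or passing `do_sample=False` for greedy decoding, should give repeatable output.

```python
import random

# Hypothetical next-token distribution, for illustration only (not GPT-2's).
NEXT_TOKEN_PROBS = {"create": 0.4, "develop": 0.3, "construct": 0.2, "use": 0.1}


def sample_tokens(n, seed=None):
    """Draw n tokens at random, weighted by probability; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return rng.choices(tokens, weights=weights, k=n)


def greedy_token():
    """Always pick the single most likely token, so there is no randomness."""
    return max(NEXT_TOKEN_PROBS, key=NEXT_TOKEN_PROBS.get)


print(sample_tokens(5))           # may differ from run to run
print(sample_tokens(5, seed=42))  # same result every time for the same seed
print(greedy_token())
```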

In [11]:
generator_g("In this course, we will teach you how to", max_length=100)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to construct a simple Python command-line program for you to use with your Python-enabled devices.\n\nTo make the command line program easy to create, use python and make.\n\nIt will compile the current directory, then execute the python-config command.\n\nThen it will run all the commands you may already have. It is recommended to ensure that the directories in your current directory are identical.\n\nTo make sure that your Python'}]

In [12]:
# Use distilgpt2, the lighter version of GPT-2 created by the Hugging Face team
generator_gl = pipeline("text-generation", model="distilgpt2")

In [13]:
generator_gl("In the year 2023, it is expected to", max_length=100, num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In the year 2023, it is expected to add at least five to five more years, with the average annual value of about $10,000 per year.'},
 {'generated_text': 'In the year 2023, it is expected to see more women entering the workforce. If so, the United States currently stands at 25,000 women, the U.S. would be the 12th most active state in terms of workforce participation since 1972.'}]

In [14]:
# Unlike text generation, fill-mask involves no sampling: the same input returns the same top-k tokens.
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.1961977779865265,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052717983722687,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]

In [15]:
# Named Entity Recognition
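# grouped_entities=True merges the sub-tokens that belong to the same entity into one result.
# Newer transformers releases deprecate this flag in favour of aggregation_strategy="simple".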
ner = pipeline("ner", grouped_entities=True)
ner("My name is Jinyoung and I study Artificial Intelligence in the University of Edinburgh.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'entity_group': 'PER',
  'score': 0.9965486,
  'word': 'Jinyoung',
  'start': 11,
  'end': 19},
 {'entity_group': 'MISC',
  'score': 0.73366773,
  'word': '##ial Intelligence',
  'start': 39,
  'end': 55},
 {'entity_group': 'ORG',
  'score': 0.9847757,
  'word': 'University of Edinburgh',
  'start': 63,
  'end': 86}]
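The `##` prefix in `'##ial Intelligence'` is the WordPiece marker for a sub-token that continues a word, so here the model tagged only the tail of "Artificial". The `start` and `end` offsets index into the original string, which makes it possible to recover the exact surface text. The sketch below uses the entities printed above; `surface_spans` is a hypothetical helper name.

```python
text = "My name is Jinyoung and I study Artificial Intelligence in the University of Edinburgh."

# Entities copied from the output above (scores omitted).
entities = [
    {"entity_group": "PER", "word": "Jinyoung", "start": 11, "end": 19},
    {"entity_group": "MISC", "word": "##ial Intelligence", "start": 39, "end": 55},
    {"entity_group": "ORG", "word": "University of Edinburgh", "start": 63, "end": 86},
]


def surface_spans(text, entities):
    """Slice the original text with each entity's character offsets."""
    return [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]


for group, span in surface_spans(text, entities):
    print(group, repr(span))
```

The MISC span starts in the middle of "Artificial", which shows that the grouping did not cover the whole word in this case.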

In [16]:
ner("Jinyoung, who is a student of UOE, is studying AI. He is living in Edinburgh.")

[{'entity_group': 'PER',
  'score': 0.98528486,
  'word': 'Jinyoung',
  'start': 0,
  'end': 8},
 {'entity_group': 'ORG',
  'score': 0.99805605,
  'word': 'UOE',
  'start': 30,
  'end': 33},
 {'entity_group': 'MISC',
  'score': 0.93869734,
  'word': 'AI',
  'start': 47,
  'end': 49},
 {'entity_group': 'LOC',
  'score': 0.9982577,
  'word': 'Edinburgh',
  'start': 67,
  'end': 76}]

In [17]:
question_answerer = pipeline("question-answering")

question = "Where do I study?"
question_wrong = "Wher I stdy do?"

context = "My name is Jinyoung, and I study AI at the University of Edinburgh"
context_wrong = "my nam is Jinyoung, and study I ai at univ of edinburgh"

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [18]:
question_answerer(question=question, context=context)

{'score': 0.5356670618057251,
 'start': 43,
 'end': 66,
 'answer': 'University of Edinburgh'}

In [19]:
question_answerer(question=question_wrong, context=context_wrong)

{'score': 0.25569677352905273,
 'start': 24,
 'end': 55,
 'answer': 'study I ai at univ of edinburgh'}
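The extracted answer is always a span of the context, located by `start` and `end`, and the score drops noticeably for the misspelled question and context. A simple way to act on this is to reject answers below a confidence threshold. The sketch below uses the two results printed above (scores rounded); `accept_answer` and the threshold of 0.5 are assumptions chosen for illustration, not library defaults.

```python
context = "My name is Jinyoung, and I study AI at the University of Edinburgh"
context_wrong = "my nam is Jinyoung, and study I ai at univ of edinburgh"

# Results copied from the outputs above (scores rounded to 4 decimals).
result = {"score": 0.5357, "start": 43, "end": 66, "answer": "University of Edinburgh"}
result_wrong = {"score": 0.2557, "start": 24, "end": 55, "answer": "study I ai at univ of edinburgh"}

MIN_SCORE = 0.5  # hypothetical threshold, to be tuned per application


def accept_answer(context, result, min_score=MIN_SCORE):
    """Return the answer span taken from the context, or None if the score is too low."""
    span = context[result["start"]:result["end"]]
    return span if result["score"] >= min_score else None


print(accept_answer(context, result))
print(accept_answer(context_wrong, result_wrong))
```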

In [20]:
summariser = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [21]:
full_text = """The UK's largest supermarket, Tesco, and discounter Aldi have said they are putting limits of three per customer on sales of tomatoes, peppers, and cucumbers. 
Asda has capped sales of lettuce, salad bags, broccoli, cauliflowers and raspberry punnets to three per customer, along with tomatoes, peppers and cucumbers. 
And Morrisons has set limits of two on cucumbers, tomatoes, lettuce and peppers. 
Tomatoes and peppers seem to be the worst affected at both retailers but its unclear whether this is because they are popular.
Other major UK supermarkets have also been hit by the shortages but have not yet introduced limits for customers."""

In [22]:
summariser(full_text)

Your max_length is set to 142, but you input_length is only 141. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=70)


[{'summary_text': ' Tesco and Aldi have said they are putting limits of three per customer on sales of tomatoes, peppers, and cucumbers . Asda has capped sales of lettuce, salad bags, broccoli, cauliflowers and raspberry punnets . Morrisons has set limits of two on cucumbers, tomatoes, lettuce and peppers .'}]
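The warning above appears because the default `max_length` (142 tokens) is longer than the input (141 tokens), and the message itself suggests passing a smaller `max_length`. One possible rule of thumb is to cap the summary at a fraction of the input length. The helper below is a hypothetical sketch of that rule; the ratio of 0.5 is an assumption, chosen because it reproduces the value of 70 suggested in the warning.

```python
def suggested_max_length(input_length, ratio=0.5, floor=1):
    """Return a summary length cap as a fraction of the input length, both counted in tokens."""
    return max(floor, int(input_length * ratio))


# Input length taken from the warning message above.
print(suggested_max_length(141))
```

The result could then be passed as `summariser(full_text, max_length=suggested_max_length(141))`. If the cap falls below the minimum summary length configured for the model, `min_length` would presumably need lowering as well.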

In [23]:
translator_fe = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")


In [24]:
translator_fe("Ce cours est produit par Hugging Face.")

[{'translation_text': 'This course is produced by Hugging Face.'}]