Learning from:
https://huggingface.co/course/chapter1/3?fw=pt

In [2]:
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


# Working with pipelines

In [1]:
from transformers import pipeline

In [4]:
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

[{'label': 'POSITIVE', 'score': 0.9598048329353333}]

In [5]:
classifier([
    "I've been waiting for a HuggingFace course my whole life.", 
    "I hate this so much!"
])

[{'label': 'POSITIVE', 'score': 0.9598048329353333},
 {'label': 'NEGATIVE', 'score': 0.9994558095932007}]
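
When given a list, the pipeline returns one dict per input, in the same order. A minimal sketch of pairing each sentence with its prediction follows. The `results` list is copied from the output above rather than recomputed, and `threshold` is a hypothetical cutoff chosen for illustration, not a pipeline parameter.

```python
# Inputs and outputs copied from the cell above (no model is loaded here).
sentences = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
results = [
    {"label": "POSITIVE", "score": 0.9598048329353333},
    {"label": "NEGATIVE", "score": 0.9994558095932007},
]

threshold = 0.9  # hypothetical confidence cutoff, for illustration only

# Pair each sentence with its prediction; results follow the input order.
paired = [
    (text, pred["label"], pred["score"])
    for text, pred in zip(sentences, results)
]

# Keep only predictions at or above the cutoff.
confident = [p for p in paired if p[2] >= threshold]

for text, label, score in confident:
    print(f"{label:8s} {score:.3f}  {text}")
```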

# Zero-shot classification

In [6]:
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.844597339630127, 0.11197531968355179, 0.04342734441161156]}

In [7]:
classifier(
    "I should learn something",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'I should learn something',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8801374435424805, 0.08418574184179306, 0.035676758736371994]}

In [15]:
classifier(
    "I should work hard",
    candidate_labels=['motivation', 'dedication', 'neutral', 'yoyo', 'no'],
)

{'sequence': 'I should work hard',
 'labels': ['motivation', 'dedication', 'yoyo', 'no', 'neutral'],
 'scores': [0.5192444324493408,
  0.38820451498031616,
  0.03955971822142601,
  0.039405904710292816,
  0.013585428707301617]}
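
In the outputs above, `labels` comes back sorted by descending score, and the scores add up to roughly 1. A small sketch checking this on the first zero-shot result (values copied from the output, nothing is recomputed):

```python
# First zero-shot output, copied from above.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.844597339630127, 0.11197531968355179, 0.04342734441161156],
}

# Map each label to its score for easy lookup.
label_to_score = dict(zip(result["labels"], result["scores"]))

# The first entry is the top prediction because scores are in descending order.
top_label = result["labels"][0]
total = sum(result["scores"])
is_sorted = result["scores"] == sorted(result["scores"], reverse=True)

print(top_label, label_to_score[top_label])
print(f"sum of scores: {total:.6f}, sorted descending: {is_sorted}")
```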

# Text generation

In [6]:
generator = pipeline("text-generation")

In [7]:
generator(
    "Let us start",
    max_length=70,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Let us start with some facts:\n\nNo. 1. The average level of disability for women is 14%, compared to 15% for men (4). Since women are still underrepresented in tech, the median IQ, which is the measure for average IQ in the workforce, is about 10. This means that women can read, write and be'},
 {'generated_text': 'Let us start with the idea that it is highly unlikely that the people that carry this message will have any other ideas. Their idea is that they will be killed by the machine, they may need money to survive in the future, or they may just be scared of having their lives stolen out again. That the people who carry it are not likely to'}]

# Using any model from the Hub in a pipeline

In [24]:
generator2 = pipeline("text-generation", model="distilgpt2")
generator2(
    "Let us start",
    max_length=70,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Let us start to understand some of their actions and actions.\n\nHere is a list of the triggers for each of our actions and actions, click here to read how to get started on your website.\nClick here to find out what we're doing using this tool\nFor those who haven't already, click here to browse through our plugins:"},
 {'generated_text': 'Let us start the week, then it\'s best not to do it until you come home. As the deadline grows longer, and you find yourself in a situation where you have to stop doing so and start doing what you did a couple of months ago after you left a long long-running fight with the UFC."'}]

# Mask filling

In [25]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

[{'sequence': 'This course will teach you all about mathematical models.',
  'score': 0.19619838893413544,
  'token': 30412,
  'token_str': ' mathematical'},
 {'sequence': 'This course will teach you all about computational models.',
  'score': 0.040527306497097015,
  'token': 38163,
  'token_str': ' computational'}]
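
Each candidate's `sequence` looks like the prompt with `<mask>` replaced by `token_str`, which carries a leading space. A sketch that rebuilds the sequences from the output above; the whitespace handling in the hypothetical `fill` helper is an assumption that fits this example, not a description of how the pipeline decodes tokens.

```python
prompt = "This course will teach you all about <mask> models."

# Scores and token strings copied from the output above.
candidates = [
    {"score": 0.19619838893413544, "token_str": " mathematical"},
    {"score": 0.040527306497097015, "token_str": " computational"},
]

def fill(prompt: str, token_str: str) -> str:
    """Hypothetical helper: substitute the predicted token for <mask>."""
    # Strip the token's leading space; the prompt already has one before <mask>.
    return prompt.replace("<mask>", token_str.strip())

rebuilt = [fill(prompt, c["token_str"]) for c in candidates]
for text in rebuilt:
    print(text)
```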

# Named entity recognition


In [26]:
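# Note: newer transformers releases deprecate grouped_entities=True
# in favor of aggregation_strategy="simple".
ner = pipeline("ner", grouped_entities=True)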
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

[{'entity_group': 'PER',
  'score': 0.9981693774461746,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.9796020189921061,
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': 0.9932105541229248,
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
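
The `start` and `end` values are character offsets into the input string, with `end` exclusive, so slicing the text recovers each `word`. A quick check using the entities copied from the output above (scores omitted for brevity):

```python
text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# Entities copied from the output above, without the scores.
entities = [
    {"entity_group": "PER", "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "word": "Brooklyn", "start": 49, "end": 57},
]

# Slice the original text with each entity's offsets.
spans = [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]

for group, span in spans:
    print(group, repr(span))
```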

# Question answering

In [27]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn"
)

{'score': 0.6949757933616638, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
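
The answer is extracted from the context rather than generated: `start` and `end` point at the answer span inside `context`. A small sketch that marks the span in place, using the output above. The bracket notation is only an illustrative choice.

```python
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"

# Output copied from the cell above.
result = {"score": 0.6949757933616638, "start": 33, "end": 45, "answer": "Hugging Face"}

start, end = result["start"], result["end"]

# The slice of the context at [start:end] is the answer itself.
extracted = context[start:end]

# Wrap the answer span in brackets to show where it sits in the context.
highlighted = context[:start] + "[" + extracted + "]" + context[end:]
print(highlighted)
```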

# Summarization

In [2]:
summarizer = pipeline("summarization")

In [28]:
summarizer("""
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
""")

[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]

In [4]:
summarizer("""I am on Oxford Academy’s Speech and Debate Team, in both the Parliamentary Debate division and the Lincoln-Douglass debate division. I write screenplays, short stories, and opinionated blogs and am a regular contributor to my school literary magazine, The Gluestick. I have accumulated over 300 community service hours that includes work at homeless shelters, libraries, and special education youth camps. I have been evaluated by the College Board and have placed within the top percentile.

But I am not any of these things. I am not a test score, nor a debater, nor a writer. I am an anti-nihilist punk rockphilosopher. And I became so when I realized three things:

1) That the world is ruled by underwear. There is a variety of underwear for a variety of people. You have your ironed briefs for your businessmen, your soft cottons for the average, and hemp-based underwear for your environmental romantics. But underwear do not only tell us about who we are, they also influence our daily interactions in ways most of us don't even understand. For example, I have a specific pair of underwear that is holey, worn out but surprisingly comfortable. And despite how trivial underwear might be, when I am wearing my favorite pair, I feel as if I am on top of the world. In any case, these articles of clothing affect our being and are the unsung heroes of comfort.

2) When I realized I cannot understand the world. I recently debated at the Orange County Speech League Tournament, within the Parliamentary Division. This specific branch of debate is an hour long, and consists of two parties debating either side of a current political issue. In one particular debate, I was assigned the topic: “Should Nation States eliminate nuclear arms?” It so happened that I was on the negative side and it was my job to convince the judges that countries should continue manufacturing nuclear weapons. During the debate, something strange happened: I realized that we are a special breed of species, that so much effort and resources are invested to ensure mutual destruction. And I felt that this debate in a small college classroom had elucidated something much more profound about the scale of human existence. In any case, I won 1st place at the tournament, but as the crowd cheered when my name was called to stand before an audience of hundreds of other debaters, and I flashed a victorious smile at the cameras, I couldn’t help but imagine that somewhere at that moment a nuclear bomb was being manufactured, adding to an ever-growing stockpile of doom. And that's when I realized that the world was something I will never understand.

3) When I realized I was a punk rocker philosopher. One summer night, my friend took me to an underground hardcore punk rock show. It was inside a small abandoned church. After the show, I met and became a part of this small community. Many were lost and on a constant soul-search, and to my surprise, many, like myself, did not have a blue Mohawk or a nose piercing. Many were just ordinary people discussing Nietzsche, string theory, and governmental ideologies. Many were also artists creating promotional posters and inventive slogans for stickers. They were all people my age who could not afford to be part of a record label and did something extraordinary by playing in these abandoned churches, making their own CDs and making thousands of promotional buttons by hand. I realized then that punk rock is not about music nor is it a guy with a blue Mohawk screaming protests. Punk rock is an attitude, a mindset, and very much a culture. It is an antagonist to the conventional. It means making the best with what you have to contribute to a community. This was when I realized that I was a punk rock philosopher.

The world I come from consists of underwear, nuclear bombs, and punk rockers. And I love this world. My world is inherently complex, mysterious, and anti-nihilist. I am David Phan, somebody who spends his weekends debating in a three piece suit, other days immersed within the punk rock culture, and some days writing opinionated blogs about underwear.

But why college? I want a higher education. I want more than just the textbook fed classrooms in high school. A community which prizes revolutionary ideals, a sharing of multi-dynamical perspectives, an environment that ultimately acts as a medium for movement, similar to the punk rock community. I do not see college as a mere stepping stone for a stable career or a prosperous life, but as a supplement for knowledge and self-empowerment; it is a social engine that will jettison us to our next paradigm shift.
""")

[{'summary_text': ' The world is ruled by underwear, says John Defterios . He says he is not a test score, nor a writer, but a punk rocker . He is a member of the Oxford Academy’s Speech and Debate Team . He has accumulated over 300 community service hours that includes work at homeless shelters, libraries and special schools .'}]

# Translation

In [1]:
from transformers import pipeline
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

[{'translation_text': 'This course is produced by Hugging Face.'}]

In [30]:
!pip install sentencepiece

Collecting sentencepiece
  Downloading sentencepiece-0.1.95-cp38-cp38-manylinux2014_x86_64.whl (1.2 MB)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.95
