# Hugging Face Transformers

Hugging Face offers many pipelines to explore. I am following the <a href="https://huggingface.co/course/chapter1/3?fw=pt">course</a> from Hugging Face and trying out my own examples.

In [1]:
# For google colab, run this.
# !pip install "transformers[sentencepiece]"

## Sentiment-analysis

In [1]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9598048329353333}]

Once the model is downloaded, it is ready to use with no further setup. Let's try something else.

In [2]:
classifier("I want a hotdog.")

[{'label': 'NEGATIVE', 'score': 0.994057297706604}]

In [3]:
classifier("Oh, I really can't wait to waste my time at work today. :)")

[{'label': 'NEGATIVE', 'score': 0.9996824264526367}]

In [4]:
classifier("I am so full.")

[{'label': 'POSITIVE', 'score': 0.9997857213020325}]

In [5]:
classifier("I am so stuffed")

[{'label': 'NEGATIVE', 'score': 0.9731490612030029}]

In [6]:
classifier("I am so stuffed!")

[{'label': 'POSITIVE', 'score': 0.795582115650177}]

I am very impressed with the results, but there are some mysterious outcomes as well. How can adding an exclamation mark to the same sentence change the sentiment from negative to positive? And why is "I want a hotdog." 99% negative?

It is also nice that the classifier accepts a list of sentences in a single call.

In [8]:
classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598048329353333},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]
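
The flip from "I am so stuffed" to "I am so stuffed!" comes with a noticeably lower score (about 0.80), so one rough way to handle such cases is to distrust predictions below a cutoff. Here is a minimal sketch that works on the result dictionaries printed above. The helper name `label_or_uncertain` and the 0.9 threshold are my own assumptions for illustration, not part of the transformers library.

```python
# Results copied from the cells above; no model is needed to run this.
results = {
    "I want a hotdog.": {"label": "NEGATIVE", "score": 0.994057297706604},
    "I am so stuffed": {"label": "NEGATIVE", "score": 0.9731490612030029},
    "I am so stuffed!": {"label": "POSITIVE", "score": 0.795582115650177},
}


def label_or_uncertain(result, threshold=0.9):
    """Return the predicted label, or "UNCERTAIN" if the score is below the threshold.

    `threshold` is an arbitrary illustrative cutoff, not a recommended value.
    """
    return result["label"] if result["score"] >= threshold else "UNCERTAIN"


for text, result in results.items():
    print(f"{text!r}: {label_or_uncertain(result)}")
```

The hotdog sentence still passes the cutoff at 0.99, so a confidence threshold only catches hesitant predictions, not confidently odd ones.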

## Zero Shot Classification

In Binder, running zero-shot-classification fails and kills the kernel. :(

In [5]:
# This cell may fail in binder.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445993661880493, 0.11197400093078613, 0.04342660307884216]}

In [10]:
classifier(
    "I must have a hotdog for the baseball game",
    candidate_labels=["game", "sports", "politics", "business", "food"],
)

{'sequence': 'I must have a hotdog for the baseball game',
 'labels': ['food', 'sports', 'game', 'business', 'politics'],
 'scores': [0.413107693195343,
  0.4104064106941223,
  0.17006781697273254,
  0.005597190000116825,
  0.0008209337247535586]}

These are very interesting results.
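
In the hotdog example, "food" and "sports" are almost tied, which is easy to miss when reading the raw lists. The sketch below pairs each label with its score and measures the gap between the top two. It runs on the output copied from the cell above, and `top_margin` is a hypothetical helper name of mine.

```python
# Output copied from the zero-shot cell above.
result = {
    "sequence": "I must have a hotdog for the baseball game",
    "labels": ["food", "sports", "game", "business", "politics"],
    "scores": [
        0.413107693195343,
        0.4104064106941223,
        0.17006781697273254,
        0.005597190000116825,
        0.0008209337247535586,
    ],
}


def top_margin(result):
    """Return (best_label, runner_up_label, score gap between them)."""
    ranked = sorted(
        zip(result["labels"], result["scores"]),
        key=lambda pair: pair[1],
        reverse=True,
    )
    (best, best_score), (second, second_score) = ranked[0], ranked[1]
    return best, second, best_score - second_score


best, second, gap = top_margin(result)
print(f"{best} beats {second} by {gap:.4f}")
print(f"scores sum to {sum(result['scores']):.4f}")
```

The gap is under 0.003, and the scores add up to roughly 1, so the labels are competing for one shared budget of probability rather than being judged independently.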

## Text generation

Generating text is very fun. I had a lot of fun making things up.

In [6]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "In this course, we will teach you how to create and use a web service in Python.\n\nThis class is intended for seasoned engineers who know Python, who haven't done anything in the past when writing web applications. We'll help you build"}]

This seems fun. I can generate a story to read or start writing.

In [12]:
generator("When I go to grocery shopping, ")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'When I go to grocery shopping, \xa0the last thing I want is for people in my family or for someone in my church or any other gathering place to be wearing a hoodie and carrying any other sort of makeup, but for me that always'}]

When I don't know what I want to eat for dinner, I can generate an answer.

In [13]:
generator("For dinner, I want to eat ")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'For dinner, I want to eat \xa0a huge bowl of stew \xa0with \xa0a\xa0greater volume. Once this is done, I pour \xa0salsa of chiles in the bottom of the bowl with my white wine or'}]

I can even figure out the meaning of life.

In [14]:
generator("Meaning of life is ")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Meaning of life is \xa0the value to me of living in my own way. And as I see things at work, so too with life, I see opportunities that bring me to them in the way that I really love our lives. How'}]

With <code>num_return_sequences</code>, I can get more options to choose from.

In [15]:
generator("For the vacation, I want to ", num_return_sequences=2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "For the vacation, I want to \xa0explain what I like, why I like it, and what I wouldn't be doing if I only had $1,000,000.00. I don't have to explain something that I'm"},
 {'generated_text': "For the vacation, I want to iced tea. There is a table to sit down to discuss my recent experience, but I'm sure anyone can make it. A quick note on how tea is different from a traditional beverage: most of the tea"}]

When I want shorter options, I can limit it with <code>max_length</code>.

In [17]:
generator("When I visited Uzbekistan, ", max_length=30, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'When I visited Uzbekistan, \xa0at least some aspects of my life were very familiar to me and I was still somewhat shaken up but my mind'},
 {'generated_text': 'When I visited Uzbekistan, \xa0I felt very different. I knew that all in all the people were quite generous with their taxes. It took'},
 {'generated_text': "When I visited Uzbekistan, \xa0it became clear how hard it must be to stay, let alone manage one's daily life. I was introduced"}]
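
Each `generated_text` repeats the prompt, and several outputs contain `\xa0` (a non-breaking space). A small sketch for keeping only the new text follows. It uses the first result above as sample data, and `continuation` is a hypothetical helper of mine, not a transformers function.

```python
def continuation(prompt, generated_text):
    """Return only the newly generated part, with non-breaking spaces normalized."""
    text = generated_text.replace("\xa0", " ")
    if text.startswith(prompt):
        text = text[len(prompt):]
    return " ".join(text.split())  # collapse repeated whitespace


prompt = "When I visited Uzbekistan, "
# First result from the cell above.
generated = (
    "When I visited Uzbekistan, \xa0at least some aspects of my life were very "
    "familiar to me and I was still somewhat shaken up but my mind"
)
print(continuation(prompt, generated))
```

As far as I understand, <code>max_length</code> is measured in tokens and includes the prompt, so the continuation itself is shorter than 30 tokens.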

Pipelines can also be used with other models by passing <code>model</code>. This example uses distilgpt2.

In [18]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to make your own choices and will teach you how to make your own choices. While we will be teaching'},
 {'generated_text': 'In this course, we will teach you how to apply and change to the game as you prepare. You may notice that certain types of characters are used'}]

## Fill Mask

With fill-mask, it is possible to solve fill-in-the-blank questions.

In [19]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilroberta-base (https://huggingface.co/distilroberta-base)


[{'score': 0.19619765877723694,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052719473838806,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]

Although the unmasker is not very accurate here, it knows the blank has to be an adjective, and most of its guesses are colors.

In [20]:
unmasker("The strawberry is a juicy, edible fruit which has a <mask> color when it is ripe.")

[{'score': 0.1129533052444458,
  'token': 14327,
  'token_str': ' purple',
  'sequence': 'The strawberry is a juicy, edible fruit which has a purple color when it is ripe.'},
 {'score': 0.09302297234535217,
  'token': 5718,
  'token_str': ' yellow',
  'sequence': 'The strawberry is a juicy, edible fruit which has a yellow color when it is ripe.'},
 {'score': 0.09059570729732513,
  'token': 9030,
  'token_str': ' golden',
  'sequence': 'The strawberry is a juicy, edible fruit which has a golden color when it is ripe.'},
 {'score': 0.0816061869263649,
  'token': 6907,
  'token_str': ' pink',
  'sequence': 'The strawberry is a juicy, edible fruit which has a pink color when it is ripe.'},
 {'score': 0.05910441651940346,
  'token': 11577,
  'token_str': ' vibrant',
  'sequence': 'The strawberry is a juicy, edible fruit which has a vibrant color when it is ripe.'}]

It is kind of weird to find out that "Humans have four legs." has a higher score than "Humans have two legs."

In [21]:
unmasker("Humans have <mask> legs.")

[{'score': 0.049486029893159866,
  'token': 10941,
  'token_str': ' shorter',
  'sequence': 'Humans have shorter legs.'},
 {'score': 0.04859047755599022,
  'token': 251,
  'token_str': ' long',
  'sequence': 'Humans have long legs.'},
 {'score': 0.04257293418049812,
  'token': 237,
  'token_str': ' four',
  'sequence': 'Humans have four legs.'},
 {'score': 0.033590491861104965,
  'token': 80,
  'token_str': ' two',
  'sequence': 'Humans have two legs.'},
 {'score': 0.029865659773349762,
  'token': 765,
  'token_str': ' short',
  'sequence': 'Humans have short legs.'}]
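
To compare two specific candidates such as "four" and "two", it helps to look them up by `token_str` instead of scanning the list by eye. This is a minimal sketch on the output above, trimmed to the fields it needs. `score_of` is a hypothetical helper name of mine.

```python
# Trimmed output from the cell above (only the fields used here).
predictions = [
    {"score": 0.049486029893159866, "token_str": " shorter"},
    {"score": 0.04859047755599022, "token_str": " long"},
    {"score": 0.04257293418049812, "token_str": " four"},
    {"score": 0.033590491861104965, "token_str": " two"},
    {"score": 0.029865659773349762, "token_str": " short"},
]


def score_of(predictions, word):
    """Return the score for `word`, or None if it is not among the predictions."""
    for prediction in predictions:
        # The token strings above carry a leading space, so strip before comparing.
        if prediction["token_str"].strip() == word:
            return prediction["score"]
    return None


four = score_of(predictions, "four")
two = score_of(predictions, "two")
print(f"four / two = {four / two:.2f}")
```

All five scores are below 0.05, so the model is not confident about any of these candidates. The ordering of "four" and "two" is a small difference between two unlikely guesses.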

## Named Entity Recognition

Running the unmasker with the bert-base-cased model on my computer did not work, and I had to restart my machine. I then tried it out on the Inference API widget on the Hugging Face website.

In [22]:
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)


[{'entity_group': 'PER',
  'score': 0.9981694,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.9796019,
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': 0.9932106,
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]

In [25]:
ner = pipeline("ner", grouped_entities=True)
ner(" On her thirteenth birthday, Anne Frank began keeping " + 
    "a diary during the Nazi occupation of the Netherlands in World War II.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)


[{'entity_group': 'PER',
  'score': 0.9991078,
  'word': 'Anne Frank',
  'start': 29,
  'end': 39},
 {'entity_group': 'MISC',
  'score': 0.99665976,
  'word': 'Nazi',
  'start': 73,
  'end': 77},
 {'entity_group': 'LOC',
  'score': 0.99951077,
  'word': 'Netherlands',
  'start': 96,
  'end': 107},
 {'entity_group': 'MISC',
  'score': 0.97047836,
  'word': 'World War II',
  'start': 111,
  'end': 123}]
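
The grouped output is a flat list, so a common next step is to collect the words by entity type. Here is a small sketch on the Anne Frank result above, trimmed to the two fields it uses. `group_by_type` is a hypothetical helper of mine.

```python
from collections import defaultdict

# Output from the Anne Frank cell above, trimmed to the fields used here.
entities = [
    {"entity_group": "PER", "word": "Anne Frank"},
    {"entity_group": "MISC", "word": "Nazi"},
    {"entity_group": "LOC", "word": "Netherlands"},
    {"entity_group": "MISC", "word": "World War II"},
]


def group_by_type(entities):
    """Map each entity type (PER, LOC, ...) to the list of words tagged with it."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


print(group_by_type(entities))
```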

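Without <code>grouped_entities=True</code>, the pipeline returns one entry per token instead of merging word pieces into whole entities, so the output becomes much longer and harder to read.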

## Question Answering

The question-answering pipeline answers a question using information from a given context.

In [26]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


{'score': 0.694976270198822, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}

In [27]:
question_answerer(
    question="What was damaged from Notre-Dame fire?",
    context="The Notre-Dame fire broke out in the cathedral of Notre-Dame de Paris on 15 April 2019," + 
    " causing severe damage to the building's spire, roof, and upper walls. "
)

{'score': 0.377182275056839,
 'start': 113,
 'end': 156,
 'answer': "the building's spire, roof, and upper walls"}

I think it is pretty good. 
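
Both answers above are taken word for word from the context, and `start` and `end` are character offsets into it. The sketch below checks this on the first example and marks the answer inside the context. The helper names `answer_span` and `highlight` are my own.

```python
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"
# Output from the first question-answering cell above.
result = {"score": 0.694976270198822, "start": 33, "end": 45, "answer": "Hugging Face"}


def answer_span(context, result):
    """Slice the answer out of the context using the character offsets."""
    return context[result["start"]:result["end"]]


def highlight(context, result):
    """Return the context with the answer wrapped in square brackets."""
    start, end = result["start"], result["end"]
    return context[:start] + "[" + context[start:end] + "]" + context[end:]


print(answer_span(context, result))
print(highlight(context, result))
```

Since the answer is a slice of the context, the pipeline cannot answer a question whose answer is not written in the context.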

## Summarization

The summarization pipeline condenses a long text into a shorter one.

In [3]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]

I got this text about swords from <a href="https://en.wikipedia.org/wiki/Sword">Wikipedia</a>.

In [6]:
summarizer("""
A sword is an edged, bladed weapon intended for manual cutting or 
thrusting. Its blade, longer than a knife or dagger, is attached 
to a hilt and can be straight or curved. A thrusting sword tends to
have a straighter blade with a pointed tip. A slashing sword is more
likely to be curved and to have a sharpened cutting edge on one or both
sides of the blade. Many swords are designed for both thrusting and 
slashing. The precise definition of a sword varies by historical epoch
and geographic region.

Historically, the sword developed in the Bronze Age, evolving from the dagger;
the earliest specimens date to about 1600 BC. The later Iron Age sword remained
fairly short and without a crossguard. The spatha, as it developed in the Late
Roman army, became the predecessor of the European sword of the Middle Ages, at
first adopted as the Migration Period sword, and only in the High Middle Ages,
developed into the classical arming sword with crossguard. The word sword 
continues the Old English, sweord.[1]

The use of a sword is known as swordsmanship or, in a modern context, as fencing.
In the Early Modern period, western sword design diverged into two forms, the 
thrusting swords and the sabers.

Thrusting swords such as the rapier and eventually the smallsword were designed to
impale their targets quickly and inflict deep stab wounds. Their long and straight
yet light and well balanced design made them highly maneuverable and deadly in a
duel but fairly ineffective when used in a slashing or chopping motion. A well
aimed lunge and thrust could end a fight in seconds with just the sword's point,
leading to the development of a fighting style which closely resembles modern fencing.

The sabre and similar blades such as the cutlass were built more heavily and were
more typically used in warfare. Built for slashing and chopping at multiple enemies,
often from horseback, the saber's long curved blade and slightly forward weight 
balance gave it a deadly character all its own on the battlefield. Most sabers also
had sharp points and double-edged blades, making them capable of piercing soldier
after soldier in a cavalry charge. Sabers continued to see battlefield use until
the early 20th century. The US Navy kept tens of thousands of sturdy cutlasses in
their armory well into World War II and many were issued to Marines in the Pacific
as jungle machetes.

Non-European weapons classified as swords include single-edged weapons such as the
Middle Eastern scimitar, the Chinese Dao and the related Japanese katana. The
Chinese jiàn 剑 is an example of a non-European double-edged sword, like the European
models derived from the double-edged Iron Age sword.
""")

[{'summary_text': ' A sword is an edged, bladed weapon intended for manual cutting or thrusting . Its blade, longer than a knife or dagger, is attached  to a hilt and can be straight or curved . The precise definition of a sword varies by historical epoch and geographic region .'}]

It is pretty much the first two sentences and the last one from the first paragraph, which is not that impressive. Maybe the model is trying way too hard to squeeze six paragraphs into three sentences. I will increase <code>min_length</code> to see how it performs.

In [12]:
summarizer("""
A sword is an edged, bladed weapon intended for manual cutting or 
thrusting. Its blade, longer than a knife or dagger, is attached 
to a hilt and can be straight or curved. A thrusting sword tends to
have a straighter blade with a pointed tip. A slashing sword is more
likely to be curved and to have a sharpened cutting edge on one or both
sides of the blade. Many swords are designed for both thrusting and 
slashing. The precise definition of a sword varies by historical epoch
and geographic region.

Historically, the sword developed in the Bronze Age, evolving from the dagger;
the earliest specimens date to about 1600 BC. The later Iron Age sword remained
fairly short and without a crossguard. The spatha, as it developed in the Late
Roman army, became the predecessor of the European sword of the Middle Ages, at
first adopted as the Migration Period sword, and only in the High Middle Ages,
developed into the classical arming sword with crossguard. The word sword 
continues the Old English, sweord.[1]

The use of a sword is known as swordsmanship or, in a modern context, as fencing.
In the Early Modern period, western sword design diverged into two forms, the 
thrusting swords and the sabers.

Thrusting swords such as the rapier and eventually the smallsword were designed to
impale their targets quickly and inflict deep stab wounds. Their long and straight
yet light and well balanced design made them highly maneuverable and deadly in a
duel but fairly ineffective when used in a slashing or chopping motion. A well
aimed lunge and thrust could end a fight in seconds with just the sword's point,
leading to the development of a fighting style which closely resembles modern fencing.

The sabre and similar blades such as the cutlass were built more heavily and were
more typically used in warfare. Built for slashing and chopping at multiple enemies,
often from horseback, the saber's long curved blade and slightly forward weight 
balance gave it a deadly character all its own on the battlefield. Most sabers also
had sharp points and double-edged blades, making them capable of piercing soldier
after soldier in a cavalry charge. Sabers continued to see battlefield use until
the early 20th century. The US Navy kept tens of thousands of sturdy cutlasses in
their armory well into World War II and many were issued to Marines in the Pacific
as jungle machetes.

Non-European weapons classified as swords include single-edged weapons such as the
Middle Eastern scimitar, the Chinese Dao and the related Japanese katana. The
Chinese jiàn 剑 is an example of a non-European double-edged sword, like the European
models derived from the double-edged Iron Age sword.
""", min_length=120)

[{'summary_text': ' A sword is an edged, bladed weapon intended for manual cutting or thrusting . Its blade, longer than a knife or dagger, is attached  to a hilt and can be straight or curved . The precise definition of a sword varies by historical epoch and geographic region . The word sword is known as swordsmanship or, in a modern context, as fencing . In the Early Modern period, western sword design diverged into two forms, the \xa0thrusting swords and the sabers . The sabre and similar blades such as the cutlass were built more heavily and were typically used in warfare .'}]

With more room to work with, the summarizer works better.
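
One way to see how much of a summary is copied is to test whether each summary sentence appears verbatim in the source. The sketch below does that on a short excerpt of the sword text and three sentences from the longer summary. The helper names are my own, and splitting on " ." is an assumption based on how the summaries above are punctuated.

```python
def normalize(text):
    """Collapse all whitespace (including non-breaking spaces) to single spaces."""
    return " ".join(text.replace("\xa0", " ").split())


def copied_sentences(source, summary):
    """Return (sentence, was_copied) pairs for each sentence in the summary.

    The summaries above separate sentences with " ." so that is used as the delimiter.
    """
    source = normalize(source)
    sentences = [s.strip() for s in normalize(summary).split(" .") if s.strip()]
    return [(sentence, sentence in source) for sentence in sentences]


# Excerpt of the Wikipedia text used above.
source = (
    "A sword is an edged, bladed weapon intended for manual cutting or thrusting. "
    "The precise definition of a sword varies by historical epoch and geographic region. "
    "The word sword continues the Old English, sweord.[1] "
    "The use of a sword is known as swordsmanship or, in a modern context, as fencing."
)
# Three sentences taken from the min_length=120 summary above.
summary = (
    " A sword is an edged, bladed weapon intended for manual cutting or thrusting . "
    "The precise definition of a sword varies by historical epoch and geographic region . "
    "The word sword is known as swordsmanship or, in a modern context, as fencing ."
)

for sentence, copied in copied_sentences(source, summary):
    print("copied  " if copied else "rewritten", "|", sentence)
```

The third sentence is flagged as rewritten: the model merged "The word sword continues..." with "The use of a sword is known as...", which changes the meaning. A longer summary covers more ground, but it is still worth reading against the source.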

## Translation

Lastly, the translation pipeline can translate from one language into another.

In [1]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

[{'translation_text': 'This course is produced by Hugging Face.'}]

Cool I guess, but I don't speak French, so I have no idea what the input is saying. Let me use an English-to-French model and do a reverse translation.

In [14]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
translator("This course is produced by Hugging Face.")

[{'translation_text': 'Ce cours est produit par Hugging Face.'}]
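
The reverse translation check can be wrapped in a small function that takes the two translators as arguments. This is a sketch under assumptions: `round_trip` is a hypothetical helper of mine, and the two stand-in translators only replay the outputs shown above so that the code runs without downloading any model. With transformers installed, the two pipelines could be passed in their place.

```python
def round_trip(text, forward, backward):
    """Translate `text` with `forward`, translate the result with `backward`, and compare.

    Both arguments are callables that return [{'translation_text': ...}],
    which is the shape the translation pipelines above return.
    """
    there = forward(text)[0]["translation_text"]
    back = backward(there)[0]["translation_text"]
    return there, back, back == text


EN = "This course is produced by Hugging Face."
FR = "Ce cours est produit par Hugging Face."


# Stand-ins that replay the outputs shown above; they are not real translators.
def fake_en_fr(text):
    return [{"translation_text": FR if text == EN else text}]


def fake_fr_en(text):
    return [{"translation_text": EN if text == FR else text}]


print(round_trip(EN, fake_en_fr, fake_fr_en))
```

An exact string match is a strict test. A good translation can come back with different wording, so a mismatch is a hint to look closer, not proof of an error.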

Let's try Spanish this time.

In [5]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translator("Sorry, I don't speak Spanish.")

[{'translation_text': 'Lo siento, no hablo español.'}]

The English-to-French output matches the earlier French input. Okay, let me try Korean to English because I can speak Korean. (I used Google Translate to generate the Korean text. I wanted to do English to Korean, but that model does not exist yet :( And I do not have a Korean keyboard enabled, so it was faster to use Google Translate :P)

In [4]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ko-en")
translator("허깅페이스에서 제작한 코스입니다.")

[{'translation_text': 'This is a course from Huggingspace.'}]

That is almost correct. Huggingface is a name, and names can be tricky to translate. I am surprised to see that it is translated to Huggingspace instead of Huggingpace. A literal translation would be Huggingpace because 'p' is always substituted for 'f', since there is no 'f' in Korean. Maybe the model is a work in progress and made a mistake.

## Bias and limitations

Models have bias even if they were trained with unbiased data. BERT has a gender bias.

In [5]:
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
result = unmasker("Brian works as a [MASK].")
print([r["token_str"] for r in result])

result = unmasker("Amanda works as a [MASK].")
print([r["token_str"] for r in result])

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


['lawyer', 'farmer', 'teacher', 'bartender', 'mechanic']
['waitress', 'nurse', 'teacher', 'model', 'bartender']
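
To make the difference between the two lists explicit, the sketch below splits them into shared and name-specific predictions. It uses the two printed lists as data, and `compare` is a hypothetical helper of mine.

```python
# Predictions copied from the output above.
brian = ["lawyer", "farmer", "teacher", "bartender", "mechanic"]
amanda = ["waitress", "nurse", "teacher", "model", "bartender"]


def compare(a, b):
    """Return (shared, only_a, only_b), each keeping the original prediction order."""
    shared = [x for x in a if x in b]
    only_a = [x for x in a if x not in b]
    only_b = [x for x in b if x not in a]
    return shared, only_a, only_b


shared, only_brian, only_amanda = compare(brian, amanda)
print("shared:     ", shared)
print("only Brian: ", only_brian)
print("only Amanda:", only_amanda)
```

Only two of the five occupations overlap, even though the name is the only thing that changed in the prompt.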


The model also has a bias regarding race.

In [15]:
# The model associates an Orc with a slave.
result = unmasker("An Orc works as a [MASK].")
print([r["token_str"] for r in result])

['priest', 'shaman', 'slave', 'soldier', 'warrior']


In [8]:
# The model associates an Ogre with a slave/thief.
result = unmasker("An ogre works as a [MASK].")
print([r["token_str"] for r in result])

['blacksmith', 'priest', 'thief', 'slave', 'hunter']


In [14]:
# The model associates an Elf with a witch.
result = unmasker("An elf works as a [MASK].")
print([r["token_str"] for r in result])

['witch', 'priest', 'wizard', 'sorcerer', 'blacksmith']


In [16]:
# The model associates a Dwarf with a blacksmith.
result = unmasker("A dwarf works as a [MASK].")
print([r["token_str"] for r in result])

['cook', 'blacksmith', 'carpenter', 'priest', 'maid']


In [18]:
# The model associates a Hobbit with a messenger/thief.
result = unmasker("A hobbit works as a [MASK].")
print([r["token_str"] for r in result])

['messenger', 'magician', 'thief', 'carpenter', 'dog']


In [10]:
# Not sure what's going on here.
result = unmasker("A human works as a [MASK].")
print([r["token_str"] for r in result])

['machine', 'robot', 'priest', 'slave', 'hunter']


It is a good thing we do not have robots walking around with bias and limitations yet. A police robot with a strong racial bias would be a disaster.