# First code
We use the [pipeline](https://huggingface.co/docs/transformers/v4.30.0/en/main_classes/pipelines#transformers.pipeline) function to classify a text as positive or negative.
Because no model is specified, a warning notes that it falls back to a default model: [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)

In [9]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598048329353333}]
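
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, assuming the model name and revision hash printed in the warning. `PINNED` and `build_classifier` are hypothetical names introduced here, and the import is deferred so that nothing is downloaded until the function is called:

```python
# Model name and revision copied from the warning message above.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily so this module loads before transformers is needed.
    from transformers import pipeline

    return pipeline(**PINNED)


# Usage (downloads the model on the first call):
# classifier = build_classifier()
# classifier("I've been waiting for a HuggingFace course my whole life.")
```

Pinning both values should silence the warning and keep results reproducible if the default model ever changes.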

## Different, more difficult text
This one proves too difficult for the default model. The sentence is not negative, although it is somewhat ambiguous, yet the model labels it NEGATIVE with very high confidence.

In [10]:
classifier = pipeline("sentiment-analysis")
classifier("I would say this is bad, but I would be lying")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.9988917708396912}]

## Different model
A different model does slightly better, but not enough: it still answers negative, only with a lower score. By the way, each model has to be downloaded locally, and some take up a few GB.

In [11]:
classifier = pipeline(model="ProsusAI/finbert")
classifier("I would say this is bad, but I would be lying")

Downloading: 100%|██████████| 2.55k/2.55k [00:00<00:00, 679kB/s]


[{'label': 'negative', 'score': 0.7322509288787842}]

## Test with multiple entries
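FinBERT does not work very well here: it labels both sentences as neutral, while the default model classifies both correctly. This is not too surprising, since FinBERT was fine-tuned on financial text.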

In [12]:
classifier1 = pipeline("sentiment-analysis")
classifier1(["This give me the creeps", "This is cool"])

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.6937301158905029},
 {'label': 'POSITIVE', 'score': 0.9998584985733032}]

In [13]:
classifier2 = pipeline(model="ProsusAI/finbert")
classifier2(["This give me the creeps", "This is cool"])

Downloading: 100%|██████████| 2.55k/2.55k [00:00<00:00, 1.94MB/s]


[{'label': 'neutral', 'score': 0.7400815486907959},
 {'label': 'neutral', 'score': 0.8911979794502258}]
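
The two models use different label sets (uppercase `POSITIVE`/`NEGATIVE` for the default model, lowercase labels including `neutral` for FinBERT), so comparing them needs a small normalization step. A sketch using the outputs recorded above; the helper name `compare` is hypothetical:

```python
# Inputs and outputs copied from the two cells above.
texts = ["This give me the creeps", "This is cool"]
distilbert_out = [
    {"label": "NEGATIVE", "score": 0.6937301158905029},
    {"label": "POSITIVE", "score": 0.9998584985733032},
]
finbert_out = [
    {"label": "neutral", "score": 0.7400815486907959},
    {"label": "neutral", "score": 0.8911979794502258},
]


def compare(texts, first, second):
    """Return (text, first_label, second_label, agree) with lowercase labels."""
    rows = []
    for text, a, b in zip(texts, first, second):
        label_a, label_b = a["label"].lower(), b["label"].lower()
        rows.append((text, label_a, label_b, label_a == label_b))
    return rows


rows = compare(texts, distilbert_out, finbert_out)
for row in rows:
    print(row)
```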

# Zero-shot classification
In zero-shot classification, we try to assign a sample to classes that were absent from the training phase. This is possible when we have semantic descriptions of those classes, or when similar classes lie close together in some embedding space and we predict the position of the sample in that space.

In [14]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445973992347717, 0.11197542399168015, 0.04342719167470932]}
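
The result is a dictionary whose `labels` and `scores` lists are aligned and sorted from most to least likely. With the default single-label setting, the scores are normalized across the candidate labels, so they add up to roughly 1. A small sketch that reads the output recorded above; the helper `top_label` is hypothetical:

```python
# Output copied from the cell above.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445973992347717, 0.11197542399168015, 0.04342719167470932],
}


def top_label(result):
    """Return the (label, score) pair with the highest score."""
    pairs = zip(result["labels"], result["scores"])
    return max(pairs, key=lambda pair: pair[1])


best = top_label(result)
total = sum(result["scores"])
print(best)
print(round(total, 6))
```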

A different example, not from the course:

In [15]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "Real Madrid looks for a new striker while refinancing its debt",
    candidate_labels=["sports", "new", "economics", "politics", "science"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'Real Madrid looks for a new striker while refinancing its debt',
 'labels': ['sports', 'new', 'economics', 'politics', 'science'],
 'scores': [0.4511820673942566,
  0.3597985506057739,
  0.12867510318756104,
  0.04000959172844887,
  0.020334627479314804]}

As always, we can switch models by picking one tagged for the relevant pipeline on the Models page. For zero-shot classification, for example, the available models are listed here: https://huggingface.co/models?pipeline_tag=zero-shot-classification&sort=trending.

You can also create a Space to run all these examples (a little slow, but it works).

# Mask filling
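Fills in a blank in a text. Pretty straightforward: the model proposes the most likely tokens for the `<mask>` position, and `top_k` sets how many candidates are returned.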

In [16]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.1961982399225235,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052727669477463,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]
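
The `<mask>` placeholder depends on the tokenizer: the default distilroberta-base model uses `<mask>`, while BERT-style checkpoints use `[MASK]`. A sketch of building the prompt from a template so that the placeholder can be swapped; `build_prompt` and the `{mask}` template field are hypothetical, and in a live session the token could be read from `unmasker.tokenizer.mask_token` instead of being hard-coded:

```python
TEMPLATE = "This course will teach you all about {mask} models."


def build_prompt(template, mask_token):
    """Insert the model-specific mask token into the template."""
    return template.format(mask=mask_token)


roberta_prompt = build_prompt(TEMPLATE, "<mask>")  # distilroberta-base
bert_prompt = build_prompt(TEMPLATE, "[MASK]")     # BERT-style models
print(roberta_prompt)
print(bert_prompt)
```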

# Named entity recognition
Here, we look for the named entities in a text, such as persons, organizations, and locations.

In [17]:
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # replaces the deprecated grouped_entities=True
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading: 100%|██████████| 998/998 [00:00<00:00, 754kB/s]
Downloading: 100%|██████████| 1.33G/1.33G [00:13<00:00, 99.7MB/s]
Downloading: 100%|██████████| 60.0/60.0 [00:00<00:00, 81.0kB/s]
Downloading: 100%|██████████| 213k/213k [00:00<00:00, 3.73MB/s]


[{'entity_group': 'PER',
  'score': 0.9981694,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.97960186,
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': 0.9932106,
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
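
The `start` and `end` fields are character offsets into the input string, with `end` exclusive, so each entity can be recovered by slicing the original text. A sketch using the output recorded above (scores omitted for brevity):

```python
text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# Entities copied from the output above, without the scores.
entities = [
    {"entity_group": "PER", "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "word": "Brooklyn", "start": 49, "end": 57},
]

# Slice the original text with each entity's offsets.
spans = [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]
for group, span in spans:
    print(group, span)
```

Slicing by offsets is handy when the original casing or spacing matters, since `word` is rebuilt from tokens and may not always match the input exactly.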

In [18]:
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # replaces the deprecated grouped_entities=True
ner("My new MacBook Pro has been delivered by Amazon to A Coruña, so I will have a good July to work at Joor")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'entity_group': 'MISC',
  'score': 0.9950081,
  'word': 'MacBook Pro',
  'start': 7,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.997255,
  'word': 'Amazon',
  'start': 41,
  'end': 47},
 {'entity_group': 'LOC',
  'score': 0.951234,
  'word': 'A Coruña',
  'start': 51,
  'end': 59},
 {'entity_group': 'ORG',
  'score': 0.74338853,
  'word': 'Joor',
  'start': 99,
  'end': 103}]

This one is rather good. It correctly identifies A Coruña as a place and Joor as an organization, and assigns the MISC category to the MacBook Pro (there are only four categories in the default bert-large-cased-finetuned-conll03-english model: ORG, LOC, PER, and MISC).