Let's have a quick look at the features of the 🤗 Transformers library. The library downloads pretrained models for Natural Language Understanding (NLU) tasks, such as analyzing the sentiment of a text, and for Natural Language Generation (NLG) tasks, such as completing a prompt with new text or translating into another language.

First, we will see how to use the pipeline API to run those pretrained models for inference with very little code. Then, we will dig a little deeper and see how the library gives you access to the models themselves and helps you preprocess your data.

In [1]:
%pip install transformers

Note: you may need to restart the kernel to use updated packages.


# Getting started on a task with a pipeline

The easiest way to use a pretrained model on a given task is the pipeline function. 🤗 Transformers provides the following tasks out of the box:

- Sentiment analysis: is a text positive or negative?
- Text generation (in English): provide a prompt, and the model will generate what follows.
- Named entity recognition (NER): in an input sentence, label each word with the entity it represents (person, place, etc.).
- Question answering: provide the model with some context and a question, and it extracts the answer from the context.
- Filling masked text: given a text with masked words (e.g., replaced by [MASK]), fill in the blanks.
- Summarization: generate a summary of a long text.
- Translation: translate a text into another language.
- Feature extraction: return a tensor representation of the text.
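Each task above is selected by passing a task string to pipeline, as in pipeline('sentiment-analysis') below. The dictionary here collects the task strings we believe correspond to the list; the mapping and the task_id helper are our own illustration, not a library API. Translation is the one case that needs a language pair in the task string (for example translation_en_to_fr) when no model is given.

```python
# Hypothetical lookup table: human-readable task names -> task strings
# accepted by transformers.pipeline().
PIPELINE_TASKS = {
    "sentiment analysis": "sentiment-analysis",  # alias of "text-classification"
    "text generation": "text-generation",
    "named entity recognition": "ner",  # alias of "token-classification"
    "question answering": "question-answering",
    # The mask token depends on the tokenizer: [MASK] for BERT, <mask> for RoBERTa.
    "filling masked text": "fill-mask",
    "summarization": "summarization",
    # Without an explicit model, translation needs a language pair in the task name.
    "translation": "translation_en_to_fr",
    "feature extraction": "feature-extraction",
}


def task_id(name: str) -> str:
    """Return the pipeline task string for a human-readable task name."""
    try:
        return PIPELINE_TASKS[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown task: {name!r}") from None


print(task_id("Named entity recognition"))
```

The string returned by task_id is what you would pass as the first argument of pipeline.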

In [2]:
from transformers import pipeline
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
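The log mentions TFDistilBertForSequenceClassification, the TensorFlow class, presumably because TensorFlow rather than PyTorch was installed in this environment. According to the pipeline documentation, when no framework argument is given the pipeline uses whichever framework is installed, and defaults to PyTorch when both are installed and no model is given; passing framework="pt" or framework="tf" forces a choice. A minimal, standard-library-only sketch for checking which frameworks are available (the function name is our own):

```python
import importlib.util
from typing import Dict


def installed_backends() -> Dict[str, bool]:
    """Report whether PyTorch and TensorFlow can be imported, without importing them."""
    # find_spec only locates a top-level package; it does not execute its __init__.
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("torch", "tensorflow")
    }


print(installed_backends())
```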


By default, the model downloaded for this pipeline is "distilbert/distilbert-base-uncased-finetuned-sst-2-english", as the log above shows. Its model page (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) gives more information: it uses the DistilBERT architecture and has been fine-tuned for sentiment analysis on SST-2, the binary version of the Stanford Sentiment Treebank.
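The warning printed when the classifier was created recommends naming the model and revision explicitly in production. A minimal sketch, assuming a 🤗 Transformers release where pipeline accepts the model and revision keyword arguments: the model id and the revision af0f99b are copied from the log above, while the constant names and make_sentiment_classifier are our own. The import happens inside the function, so nothing is downloaded until it is called.

```python
# Model id and revision copied from the "No model was supplied" log above.
SENTIMENT_MODEL = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
SENTIMENT_REVISION = "af0f99b"


def make_sentiment_classifier():
    """Build a sentiment-analysis pipeline pinned to a fixed model revision.

    Hypothetical helper; calling it downloads the model on first use.
    """
    from transformers import pipeline  # imported lazily: loading this module stays cheap

    return pipeline(
        "sentiment-analysis",
        model=SENTIMENT_MODEL,
        revision=SENTIMENT_REVISION,
    )
```

Calling classifier = make_sentiment_classifier() should give the same model as In [2] without the warning, and keeps returning that exact snapshot even if the library's default model changes later.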

In [3]:
classifier('We are very happy to show you the Transformers library.')

[{'label': 'POSITIVE', 'score': 0.9997994303703308}]

In [4]:
classifier('The pizza is not that great but the crust is awesome.')

[{'label': 'POSITIVE', 'score': 0.9998461008071899}]

In [5]:
classifier('The pizza is very bad')

[{'label': 'NEGATIVE', 'score': 0.9998202919960022}]

In [6]:
classifier('The pizza may be good or bad')

[{'label': 'NEGATIVE', 'score': 0.9897701740264893}]

In [7]:
results = classifier(["We are very happy to show you the Transformers library.",
           "We hope you don't hate it.", "I hope you will love to join NLP class" ])
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'],4)}")

label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
label: POSITIVE, with score: 0.9996
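The second sentence, "We hope you don't hate it.", is labelled NEGATIVE with a score of only 0.5309, barely above 0.5, the lowest top score a two-label classifier can output. Such predictions are worth flagging rather than trusting. A minimal sketch that pairs each input with its result and marks low-confidence predictions; the 0.7 threshold and the name flag_uncertain are arbitrary assumptions, and the data are the rounded outputs printed above.

```python
from typing import Any, Dict, List, Sequence, Tuple


def flag_uncertain(
    texts: Sequence[str],
    results: Sequence[Dict[str, Any]],
    threshold: float = 0.7,  # arbitrary cut-off, tune it for your data
) -> List[Tuple[str, str, float, bool]]:
    """Pair each text with its label and score; the last field is True if score < threshold."""
    if len(texts) != len(results):
        raise ValueError("texts and results must have the same length")
    rows = []
    for text, result in zip(texts, results):
        score = float(result["score"])
        rows.append((text, str(result["label"]), score, score < threshold))
    return rows


# Outputs printed for In [7] above, rounded to 4 decimals.
texts = [
    "We are very happy to show you the Transformers library.",
    "We hope you don't hate it.",
    "I hope you will love to join NLP class",
]
results = [
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "NEGATIVE", "score": 0.5309},
    {"label": "POSITIVE", "score": 0.9996},
]
for text, label, score, uncertain in flag_uncertain(texts, results):
    marker = "  <- low confidence" if uncertain else ""
    print(f"{label:<8} {score:.4f}  {text}{marker}")
```

In practice you would pass the list returned by classifier(texts) directly as results.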


# Finetuned model


On the Hugging Face model hub, filtering models by the "French" language tag and the "text-classification" task suggests "nlptown/bert-base-multilingual-uncased-sentiment". Let's see how we can use it.

In [7]:
classifier = pipeline('sentiment-analysis', model="nlptown/bert-base-multilingual-uncased-sentiment")  # multilingual model: rates a text from 1 to 5 stars


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Some layers from the model checkpoint at nlptown/bert-base-multilingual-uncased-sentiment were not used when initializing TFBertForSequenceClassification: ['dropout_37']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized f

In [8]:
classifier('The pizza is very bad')

[{'label': '1 star', 'score': 0.6116584539413452}]

In [9]:
classifier('The pizza is great')

[{'label': '5 stars', 'score': 0.6953846216201782}]

In [10]:
classifier("खाना खराब है")  # Hindi: "The food is bad"

[{'label': '3 stars', 'score': 0.40696874260902405}]

In [12]:
classifier("ये कपड़े अच्छे हैं")  # Hindi: "These clothes are good"

[{'label': '3 stars', 'score': 0.31204473972320557}]

In [16]:
classifier('Here chats are good')

[{'label': '4 stars', 'score': 0.4548669457435608}]
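Unlike the two-label SST-2 model, this classifier returns one of five labels, from "1 star" to "5 stars". To compare or average ratings, the labels can be turned into integers. A minimal sketch, assuming every label has the form "<digit> star" or "<digit> stars" as in the outputs above; star_rating is a hypothetical helper name. Note that both Hindi sentences land on "3 stars" with low scores (0.41 and 0.31), although one is negative and the other positive. Hindi is not among the six languages listed on the model card (English, Dutch, German, French, Spanish and Italian), which may explain this.

```python
import re
from typing import Any, Dict, List

# Matches labels such as "1 star" or "5 stars" and captures the digit.
_STAR_LABEL = re.compile(r"^\s*([1-5])\s+stars?\s*$")


def star_rating(label: str) -> int:
    """Convert a label such as '1 star' or '5 stars' to an integer from 1 to 5."""
    match = _STAR_LABEL.match(label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    return int(match.group(1))


# Predictions printed for In [8] to In [16] above.
predictions: List[Dict[str, Any]] = [
    {"label": "1 star", "score": 0.6116584539413452},
    {"label": "5 stars", "score": 0.6953846216201782},
    {"label": "3 stars", "score": 0.40696874260902405},
    {"label": "3 stars", "score": 0.31204473972320557},
    {"label": "4 stars", "score": 0.4548669457435608},
]
ratings = [star_rating(str(p["label"])) for p in predictions]
print(ratings)  # [1, 5, 3, 3, 4]
```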