<a href="https://colab.research.google.com/github/hariomvyas/AIhub/blob/main/AIHub_Transformers_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AIHub Transformers Template by Hariom Vyas

Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities, such as:

* 📝 Natural Language Processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
* 🖼️ Computer Vision: image classification, object detection, and segmentation.
* 🗣️ Audio: automatic speech recognition and audio classification.
* 🐙 Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.

Reference: https://huggingface.co/docs/transformers/index

In [None]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.17.0-py3-none-any.whl (3.8 MB)
Collecting tokenizers!=0.11.3,>=0.11.1
  Downloading tokenizers-0.11.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.5 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.49-py3-none-any.whl (895 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
Installing collected packages: pyyaml, tokenizers, sacremoses, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Fo

**Sentiment Analysis**

In [None]:
from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
classifier = pipeline('sentiment-analysis')
classifier('We are very happy to introduce pipeline to the transformers repository.')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9996980428695679}]

**Text Summarization**

In [None]:
from transformers import pipeline
summarizer = pipeline("summarization")
summarizer("""
America has changed dramatically during recent years. Not only has the number of 
graduates in traditional engineering disciplines such as mechanical, civil,
electrical, chemical, and aeronautical engineering declined, but in most of the 
premier American universities engineering curricula now concentrate on and 
encourage largely the study of engineering science. As a result, there are 
declining offerings in engineering subjects dealing with infrastructure, the 
environment, and related issues, and greater concentration on high technology 
subjects, largely supporting increasingly complex scientific developments. While 
the latter is important, it should not be at the expense of more traditional 
engineering. Rapidly developing economies such as China and India, as well as 
other industrial countries in Europe and Asia, continue to encourage and advance 
the teaching of engineering. Both China and India, respectively, graduate six 
and eight times as many traditional engineers as does the United States. Other 
industrial countries at minimum maintain their output, while America suffers an 
increasingly serious decline in the number of engineering graduates and a lack 
of well-educated engineers. 
""")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


[{'summary_text': ' America suffers an increasing decline in the number of engineering graduates and a lack of well-educated engineers . Rapidly developing economies such as China and India, as well as other industrial countries in Europe and Asia, continue to encourage and advance the teaching of engineering . The U.S. should not be at the expense of more traditional engineering .'}]

**Automatic Speech Recognition**

In [None]:
!pip install datasets

Collecting datasets
  Downloading datasets-2.0.0-py3-none-any.whl (325 kB)

In [None]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.17.0-py3-none-any.whl (3.8 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.49-py3-none-any.whl (895 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting tokenizers!=0.11.3,>=0.11.1
  Downloading tokenizers-0.11.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.5 MB)
Installing collected packages: pyyaml, tokenizers, sacremoses, transformers
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 3.13
    Uninstalling PyYAML-3.13:
      Successfully uninstalled PyYAML-3.13
Successfully installed pyyaml-6.0 sacremoses-0.0.49 tokenizers-0.11.6 

In [None]:
import datasets
import torch
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

# Use the first GPU when one is available; otherwise fall back to the CPU (-1)
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=device)
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset (PyTorch only) returns a single field of each dataset item, here "file",
# because we are not interested in the *target* part of the dataset.
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)

Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Downloading and preparing dataset superb/asr (download: 6.59 GiB, generated: 12.99 MiB, post-processed: Unknown size, total: 6.60 GiB) to /root/.cache/huggingface/datasets/superb/asr/1.9.0/fc1f59e1fa54262dfb42de99c326a806ef7de1263ece177b59359a1a3354a9c9...


Dataset superb downloaded and prepared to /root/.cache/huggingface/datasets/superb/asr/1.9.0/fc1f59e1fa54262dfb42de99c326a806ef7de1263ece177b59359a1a3354a9c9. Subsequent calls will reuse this data.


{'text': 'HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOWER FAT AND SAUCE'}
{'text': 'STUFFERED INTO YOU HIS BELLY COUNSELLED HIM'}
{'text': 'AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS'}
{'text': 'HO BERTIE ANY GOOD IN YOUR MIND'}
{'text': 'NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND'}
{'text': "THE MUSIC CAME NEARER AND HE RECALLED THE WORDS THE WORDS OF SHELLY'S FRAGMENT UPON THE MOON WANDERING COMPANIONLESS PALE FOR WEARINESS"}
{'text': 'THE DULL LIGHT FELL MORE FAINTLY UPON THE PAGE WHEREON ANOTHER EQUATION BEGAN TO UNFOLD ITSELF SLOWLY AND TO SPREAD ABROAD ITS WIDENING TAIL'}
{'text': 'A COLD LUCID INDIFFERENCE REIGNED IN HIS SOUL'}
{'text': 'THE CHAOS IN WHICH HIS ARDOUR EXTINGUISHED ITSELF WAS A COLD INDIFFERENT KNOWLEDGE OF HIMSELF'}
{'text': 'AT MOST BY AN ALMS GIVEN TO A BEGGAR WHOSE BLESSING HE FLED FROM 

**Text Classification**

In [None]:
from transformers import pipeline

classifier = pipeline("text-classification", model="roberta-large-mnli")


# Call the pipeline on the text (the original assignment replaced the pipeline with a string)
text = """The Mars Orbiter Mission (MOM), also called Mangalyaan 
("Mars-craft", from Mangala, "Mars" and yāna, "craft, vehicle"),
is a space probe orbiting Mars since 24 September 2014. It was launched on 5 
November 2013 by the Indian Space Research Organisation (ISRO). It is India's 
first interplanetary mission, and it made India the fourth country to achieve Mars 
orbit, after Roscosmos, NASA, and the European Space Agency. It also made India 
the first country to achieve this on its first attempt. The Mars Orbiter took off 
from the First Launch Pad at Satish Dhawan Space Centre (Sriharikota Range 
SHAR), Andhra Pradesh, using a Polar Satellite Launch Vehicle (PSLV) rocket C25 
at 09:08 UTC on 5 November 2013.
The launch window was approximately 20 days long and started on 28 October 2013.
The MOM probe spent about 36 days in Earth orbit, where it made a series of 
seven apogee-raising orbital maneuvers before trans-Mars injection on 30 November 
2013 (UTC). After a 298-day long journey to Mars orbit, it was put into Mars
orbit on 24 September 2014."""
classifier(text)

Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:
classifier=pipeline("sentiment-analysis")
classifier("The launch window was approximately 20 days long and started on 28 October 2013")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'NEGATIVE', 'score': 0.7651864886283875}]
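
The default sentiment model only knows two labels, so a neutral, factual sentence still gets pushed to one side, here `NEGATIVE` with a fairly low score. One possible way to handle this is to treat low-confidence predictions as uncertain. The sketch below works on the output format shown above; the helper name `label_with_threshold` and the 0.9 cut-off are assumptions made for illustration, not part of the library.

```python
def label_with_threshold(predictions, threshold=0.9):
    """Map pipeline predictions to labels, using 'UNCERTAIN' below the threshold.

    `predictions` is a list of {'label': str, 'score': float} dicts, the shape
    returned by the sentiment cells above. The 0.9 default is an arbitrary assumption.
    """
    labels = []
    for pred in predictions:
        if pred["score"] >= threshold:
            labels.append(pred["label"])
        else:
            labels.append("UNCERTAIN")
    return labels


# Outputs copied from the two sentiment cells in this notebook.
happy = [{"label": "POSITIVE", "score": 0.9996980428695679}]
factual = [{"label": "NEGATIVE", "score": 0.7651864886283875}]

print(label_with_threshold(happy))    # ['POSITIVE']
print(label_with_threshold(factual))  # ['UNCERTAIN']
```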

**Question Answering**

In [None]:
context=r"""For the fifth year in a row, Finland is the world's happiest country, 
according to World Happiness Report rankings based largely on life evaluations 
from the Gallup World Poll.The Nordic country and its neighbors Denmark, Norway, 
Sweden and Iceland all score very well on the measures the report uses to 
explain its findings: healthy life expectancy, GDP per capita, social support in 
times of trouble, low corruption and high social trust, generosity in a 
community where people look after each other and freedom to make key life 
decisions.Denmark comes in at No. 2 in this year's rankings, followed by Iceland 
at No. 3. Sweden and Norway are seventh and eighth, respectively.
Switzerland, the Netherlands and Luxembourg take places 4 through 6, with Israel 
coming in at No. 9 and New Zealand rounding out the top 10."""
nlp = pipeline("question-answering")
result = nlp(question="Which country is the happiest?", context=context)
print(result['answer'])

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


Finland
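
Besides `answer`, the question-answering result also carries `score`, `start` and `end`, where `start` and `end` are character offsets into the context. A small sketch of how those offsets could be used to show the answer in its surrounding text follows; the helper `show_answer_in_context` and the hard-coded result dict (including its score) are hypothetical stand-ins so the example runs without downloading a model.

```python
def show_answer_in_context(context, start, end, window=30):
    """Return the answer span in brackets with `window` characters on each side."""
    left = max(0, start - window)
    right = min(len(context), end + window)
    return f"{context[left:start]}[{context[start:end]}]{context[end:right]}"


context = "For the fifth year in a row, Finland is the world's happiest country."

# Hypothetical result dict; a real one comes from nlp(question=..., context=...).
start = context.index("Finland")
result = {"answer": "Finland", "start": start, "end": start + len("Finland"), "score": 0.99}

print(show_answer_in_context(context, result["start"], result["end"]))
```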


**Named-Entity Recognition**

In [None]:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

In [None]:
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

In [None]:
NER = pipeline("ner", model=model, tokenizer=tokenizer)
example = """For the fifth year in a row, Finland is the world's happiest country, 
according to World Happiness Report rankings based largely on life evaluations 
from the Gallup World Poll.The Nordic country and its neighbors Denmark, Norway, 
Sweden and Iceland all score very well on the measures the report uses to 
explain its findings: healthy life expectancy, GDP per capita, social support in 
times of trouble, low corruption and high social trust, generosity in a 
community where people look after each other and freedom to make key life 
decisions."""
results = NER(example)
results 

[{'end': 36,
  'entity': 'B-LOC',
  'index': 9,
  'score': 0.9998419,
  'start': 29,
  'word': 'Finland'},
 {'end': 89,
  'entity': 'B-MISC',
  'index': 22,
  'score': 0.99170893,
  'start': 84,
  'word': 'World'},
 {'end': 99,
  'entity': 'I-MISC',
  'index': 23,
  'score': 0.9969279,
  'start': 90,
  'word': 'Happiness'},
 {'end': 106,
  'entity': 'I-MISC',
  'index': 24,
  'score': 0.99759847,
  'start': 100,
  'word': 'Report'},
 {'end': 161,
  'entity': 'B-MISC',
  'index': 34,
  'score': 0.9940889,
  'start': 160,
  'word': 'G'},
 {'end': 164,
  'entity': 'B-MISC',
  'index': 35,
  'score': 0.986925,
  'start': 161,
  'word': '##all'},
 {'end': 166,
  'entity': 'I-MISC',
  'index': 36,
  'score': 0.99436355,
  'start': 164,
  'word': '##up'},
 {'end': 172,
  'entity': 'I-MISC',
  'index': 37,
  'score': 0.9967397,
  'start': 167,
  'word': 'World'},
 {'end': 177,
  'entity': 'I-MISC',
  'index': 38,
  'score': 0.9972402,
  'start': 173,
  'word': 'Poll'},
 {'end': 188,
  'entity'
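
Notice that "Gallup" comes back as three word pieces (`G`, `##all`, `##up`). Passing `aggregation_strategy="simple"` to `pipeline("ner", ...)` should let the library group pieces for you and is probably the more robust route. To show what the grouping does, here is a minimal sketch that merges `##` continuation pieces by hand; `merge_wordpieces` is a hypothetical helper, and keeping the first piece's tag and the lowest score is an arbitrary choice.

```python
def merge_wordpieces(tokens):
    """Merge '##' continuation pieces into whole words.

    `tokens` is a list of dicts with 'word', 'entity', 'score', 'start' and 'end'
    keys, as in the NER output above. A merged word keeps the entity tag of its
    first piece and the lowest score among its pieces (an arbitrary choice).
    """
    merged = []
    for tok in tokens:
        if tok["word"].startswith("##") and merged:
            prev = merged[-1]
            prev["word"] += tok["word"][2:]  # drop the '##' marker
            prev["end"] = tok["end"]
            prev["score"] = min(prev["score"], tok["score"])
        else:
            merged.append({k: tok[k] for k in ("word", "entity", "score", "start", "end")})
    return merged


# Entries copied from the NER output above.
tokens = [
    {"word": "G", "entity": "B-MISC", "score": 0.9940889, "start": 160, "end": 161},
    {"word": "##all", "entity": "B-MISC", "score": 0.986925, "start": 161, "end": 164},
    {"word": "##up", "entity": "I-MISC", "score": 0.99436355, "start": 164, "end": 166},
    {"word": "World", "entity": "I-MISC", "score": 0.9967397, "start": 167, "end": 172},
]

merged = merge_wordpieces(tokens)
for item in merged:
    print(item["word"], item["entity"], item["start"], item["end"])
```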

**Zero-Shot Classification**

In [None]:
from transformers import pipeline
classifier=pipeline("zero-shot-classification")
sequence="""For the fifth year in a row, Finland is the world's happiest country, 
according to World Happiness Report rankings based largely on life evaluations 
from the Gallup World Poll.The Nordic country and its neighbors Denmark, Norway, 
Sweden and Iceland all score very well on the measures the report uses to 
explain its findings: healthy life expectancy, GDP per capita, social support in 
times of trouble, low corruption and high social trust, generosity in a 
community where people look after each other and freedom to make key life 
decisions."""
labels=['politics','social']
classifier(sequence, labels)

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


{'labels': ['social', 'politics'],
 'scores': [0.982653021812439, 0.017347019165754318],
 'sequence': "For the fifth year in a row, Finland is the world's happiest country, \naccording to World Happiness Report rankings based largely on life evaluations \nfrom the Gallup World Poll.The Nordic country and its neighbors Denmark, Norway, \nSweden and Iceland all score very well on the measures the report uses to \nexplain its findings: healthy life expectancy, GDP per capita, social support in \ntimes of trouble, low corruption and high social trust, generosity in a \ncommunity where people look after each other and freedom to make key life \ndecisions."}
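
The two scores above sum to 1. As far as I understand the default single-label mode, the pipeline scores each candidate label with an NLI model and then applies a softmax over the per-label entailment scores. The sketch below reproduces only that last normalisation step in plain Python; the logits are made-up numbers and `rank_labels` is a hypothetical helper, not the pipeline's own code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(labels, entailment_logits):
    """Turn one entailment logit per label into a ranked labels/scores dict."""
    probs = softmax(entailment_logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return {"labels": [label for label, _ in ranked],
            "scores": [prob for _, prob in ranked]}


# Hypothetical entailment logits for the two candidate labels (assumed values).
result = rank_labels(["politics", "social"], [-1.5, 2.5])
print(result)
```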

In [None]:
!pip install transformers==4.12.4 sentencepiece

Collecting transformers==4.12.4
  Downloading transformers-4.12.4-py3-none-any.whl (3.1 MB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
Installing collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.11.6
    Uninstalling tokenizers-0.11.6:
      Successfully uninstalled tokenizers-0.11.6
  Attempting uninstall: transformers
    Found existing installation: transformers 4.17.0
    Uninstalling transformers-4.17.0:
      Successfully uninstalled transformers-4.17.0
Successfully installed tokenizers-0.10.3 transformers-4.12.4


**Translation**

In [None]:
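# import only the names used below instead of a wildcard import
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM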

In [None]:
# source & destination languages
src = "en"
dst = "de"

task_name = f"translation_{src}_to_{dst}"
model_name = f"Helsinki-NLP/opus-mt-{src}-{dst}"

translator = pipeline(task_name, model=model_name, tokenizer=model_name)

In [None]:
translator("I love the color of your new car.")[0]["translation_text"]

'Ich liebe die Farbe deines neuen Autos.'

In [None]:
article = """
Albert Einstein ( 14 March 1879 – 18 April 1955) was a German-born theoretical physicist, widely acknowledged to be one of the greatest physicists of all time. 
Einstein is best known for developing the theory of relativity, but he also made important contributions to the development of the theory of quantum mechanics. 
Relativity and quantum mechanics are together the two pillars of modern physics. 
His mass–energy equivalence formula E = mc2, which arises from relativity theory, has been dubbed "the world's most famous equation". 
His work is also known for its influence on the philosophy of science.
He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect", a pivotal step in the development of quantum theory. 
His intellectual achievements and originality resulted in "Einstein" becoming synonymous with "genius"
"""

In [None]:
translator(article)[0]["translation_text"]

'Albert Einstein (* 14. März 1879 – 18. April 1955) war ein deutscher theoretischer Physiker, der allgemein als einer der größten Physiker aller Zeiten anerkannt wurde. Einstein ist am besten für die Entwicklung der Relativitätstheorie bekannt, aber er leistete auch wichtige Beiträge zur Entwicklung der Quantenmechaniktheorie. Relativität und Quantenmechanik sind zusammen die beiden Säulen der modernen Physik. Seine Massenenergieäquivalenzformel E = mc2, die aus der Relativitätstheorie hervorgeht, wurde als „die berühmteste Gleichung der Welt" bezeichnet. Seine Arbeit ist auch für ihren Einfluss auf die Philosophie der Wissenschaft bekannt. Er erhielt 1921 den Nobelpreis für Physik „für seine Verdienste um die theoretische Physik und vor allem für seine Entdeckung des Gesetzes über den photoelektrischen Effekt", einen entscheidenden Schritt in der Entwicklung der Quantentheorie. Seine intellektuellen Leistungen und Originalität führten dazu, dass „Einstein" zum Synonym für „Genius" wur

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

In [None]:
def get_translation_model_and_tokenizer(src_lang, dst_lang):
  """
  Given the source and destination language codes, return the matching
  Helsinki-NLP translation model and its tokenizer.
  See the language codes here: https://developers.google.com/admin-sdk/directory/v1/languages
  Some models use 3-character language codes; look those up separately.
  """
  # construct the model name from the function arguments (not the globals)
  model_name = f"Helsinki-NLP/opus-mt-{src_lang}-{dst_lang}"
  # initialize the tokenizer & model
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
  # return them for use
  return model, tokenizer

In [None]:
# let's change the target language
src = "en"
dst = "ar"

# get en-ar model & tokenizer
model, tokenizer = get_translation_model_and_tokenizer(src, dst)

In [None]:
# yet another example
text = """Albert Einstein ( 14 March 1879 – 18 April 1955) was a German-born theoretical physicist, widely acknowledged to be one of the greatest physicists of all time. 
Einstein is best known for developing the theory of relativity, but he also made important contributions to the development of the theory of quantum mechanics. 
Relativity and quantum mechanics are together the two pillars of modern physics."""
# tokenize the text
inputs = tokenizer.encode(text, return_tensors="pt", max_length=512, truncation=True)
# this time we use 5 beams and return 5 sequences and we can compare!
beam_outputs = model.generate(
    inputs, 
    num_beams=5, 
    num_return_sequences=5,
    early_stopping=True,
)
for beam_output in beam_outputs:
  print(tokenizer.decode(beam_output, skip_special_tokens=True))
  print("="*50)

كان ألبرت آينشتاين (14 مارس 1879 ــ 18 أبريل 1955) فيزيائياً نظرياً من أصل ألماني، معترفاً به على نطاق واسع باعتباره واحداً من أعظم علماء الفيزياء على الإطلاق. ويعرف آينشتاين على أفضل وجه بتطوير نظرية النسبية، ولكنه قدم أيضاً إسهامات مهمة في تطوير نظرية ميكانيكا الكم. والواقع أن النسبية وميكانيكا الكم تشكلان معاً دعامتي الفيزياء الحديثة.
كان ألبرت آينشتاين (14 مارس 1879 ــ 18 أبريل 1955) فيزيائياً نظرياً من أصل ألماني، معترفاً به على نطاق واسع باعتباره واحداً من أعظم علماء الفيزياء على الإطلاق. ويعرف آينشتاين على أفضل وجه بتطوير نظرية النسبية، ولكنه قدم أيضاً إسهامات مهمة في تطوير نظرية ميكانيكا الكم. والواقع أن النسبية وميكانيكا الكم تشكلان معاً دعامتين للفيزياء الحديثة.
كان ألبرت آينشتاين (14 مارس 1879 ــ 18 أبريل 1955) فيزيائياً نظرياً من أصل ألماني، معترفاً به على نطاق واسع باعتباره واحداً من أعظم علماء الفيزياء على الإطلاق. ويعرف آينشتاين على أفضل وجه بتطوير نظرية النسبية، ولكنه قدم أيضاً مساهمات مهمة في تطوير نظرية ميكانيكا الكم. والواقع أن النسبية وميكانيكا الكم تشكلان معاً دعام
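
The five outputs differ only slightly because beam search keeps the `num_beams` highest-scoring partial translations at every step and returns the best `num_return_sequences` of them. Here is a toy sketch of that idea in plain Python. It is not the library's implementation (which also handles end-of-sequence tokens, length penalties and batching), and `beam_search`, `toy_model` and the probability table are all hypothetical names and values made up for this example.

```python
import math


def beam_search(next_log_probs, num_beams, num_return_sequences, length):
    """Toy beam search over fixed-length sequences.

    `next_log_probs(prefix)` is a caller-supplied function returning a dict
    {token: log-probability} for the token that follows `prefix` (a tuple).
    """
    beams = [((), 0.0)]  # (tokens so far, cumulative log-probability)
    for _ in range(length):
        candidates = []
        for tokens, score in beams:
            for token, log_p in next_log_probs(tokens).items():
                candidates.append((tokens + (token,), score + log_p))
        # Keep only the `num_beams` highest-scoring partial sequences.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[:num_return_sequences]


# Hypothetical two-token "language model", defined only for this example.
TABLE = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.7, "b": 0.3},
    ("b",): {"a": 0.9, "b": 0.1},
}


def toy_model(prefix):
    return {tok: math.log(p) for tok, p in TABLE[prefix].items()}


results = beam_search(toy_model, num_beams=3, num_return_sequences=3, length=2)
for tokens, score in results:
    print("".join(tokens), round(math.exp(score), 2))
```

As with `model.generate`, asking for more returned sequences than beams makes no sense here, since only `num_beams` candidates survive each step.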

**Text Generation**

In [None]:
from transformers import pipeline

In [None]:
# download & load GPT-2 model
gpt2_generator = pipeline('text-generation', model='gpt2')

In [None]:
# generate 3 different sentences
# results are sampled from the top 50 candidates
sentences = gpt2_generator("Machine learning is revolutionary", do_sample=True, top_k=50, temperature=0.6, max_length=128, num_return_sequences=3)
for sentence in sentences:
  print(sentence["generated_text"])
  print("="*50)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Machine learning is revolutionary and we are excited to present it to you!"

The new project, called "Data-Driven Learning," is being developed by a team of researchers from the University of Michigan and the University of Michigan in collaboration with the University's Center for the Study of Learning. It uses neural networks to learn to read a single text and generate a series of images. The images are then used to predict the future of a given word, and then their predictions are combined with predictions of the actual words.

In the future, researchers hope to use this approach to develop new tools for learning, such as automatic algorithms that
Machine learning is revolutionary, and Google has been working on it for some time.

Google recently unveiled a new version of Google Assistant that will let you control your Google Assistant's tasks in real-time.

It's not the first time Google has embraced the new Google Assistant. Earlier this year, Google announced that it was creating 
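
The `top_k=50` and `temperature=0.6` arguments control how each next token is drawn: only the 50 highest-scoring tokens stay in play, and dividing the scores by a temperature below 1 makes the likeliest tokens even more likely. The sketch below illustrates that single sampling step in plain Python under those assumptions. `sample_top_k` and the score table are hypothetical and are not how the library implements generation.

```python
import math
import random


def sample_top_k(logits, top_k=50, temperature=0.6, rng=random):
    """Sample one token from `logits`, a dict mapping token -> raw score."""
    # Keep only the top_k highest-scoring tokens.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Divide by the temperature: values below 1 sharpen the distribution.
    scaled = [score / temperature for _, score in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalised softmax weights
    tokens = [tok for tok, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token scores (assumed values, for illustration only).
logits = {"and": 2.0, ",": 1.5, "because": 0.5, "banana": -3.0}
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_top_k(logits, top_k=3, temperature=0.6, rng=rng) for _ in range(1000)]
print({tok: samples.count(tok) for tok in logits})
```

With `top_k=3`, the lowest-scoring token can never be sampled, which is the point of the cut-off.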

**Sentence Embedding**

In [None]:
!pip install sentence-transformers

Collecting sentence-transformers
  Downloading sentence-transformers-2.2.0.tar.gz (79 kB)
Building wheels for collected packages: sentence-transformers
  Building wheel for sentence-transformers (setup.py) ... done
  Created wheel for sentence-transformers: filename=sentence_transformers-2.2.0-py3-none-any.whl size=120747 sha256=c381d96c97506820577451ae5a9407e5222ccfd47a97c51c4583c43222c422a2
  Stored in directory: /root/.ca

In [None]:
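from sentence_transformers import SentenceTransformer
# 'distilroberta-base' is a plain language-model checkpoint rather than a model
# trained for sentence embeddings, so the library adds a pooling layer on top of it
model_st = SentenceTransformer('distilroberta-base')
# encode() turns the sentence into a fixed-size vector
embeddings = model_st.encode("The price of gas has increased tremendously")
print(embeddings)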

Some weights of the model checkpoint at /root/.cache/torch/sentence_transformers/distilroberta-base were not used when initializing RobertaModel: ['lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[-3.68155837e-02  7.17203915e-02 -3.20145674e-02 -9.08042490e-02
  3.46757919e-02 -6.59967586e-02 -6.04955629e-02 -5.56772016e-02
 -9.17435884e-02  2.19683591e-02  4.65163104e-02 -7.55525902e-02
  2.75363307e-02 -4.68752235e-02  1.50491931e-02 -3.63460518e-02
  7.36383572e-02 -8.22472945e-02  1.24368165e-02 -8.28229338e-02
  5.52632567e-03  7.52749443e-02  1.61769390e-02 -2.45521637e-03
 -4.48122621e-02 -5.78281470e-02 -9.33944713e-03  6.13934062e-02
  8.09941664e-02  1.97537038e-02 -5.26257716e-02 -3.95917520e-02
  9.21046734e-02  8.45851656e-03 -1.77083880e-01  5.46662658e-02
  1.21665254e-01  6.29225150e-02  7.14835376e-02  7.00728670e-02
  7.21897185e-02  1.18014731e-01  2.80837547e-02 -4.56988905e-03
 -5.62901469e-03 -2.17476133e-02 -1.69781893e-01 -5.36071733e-02
  9.22003463e-02  2.43459586e-02  6.82989927e-03  4.56155352e-02
 -2.38097869e-02  4.28325608e-02 -6.87821135e-02 -7.48025030e-02
 -2.34069526e-02  1.74484342e-01  8.74559656e-02  3.46980840e-02
 -1.23350872e-02 -3.67255
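
A raw embedding vector is hard to read on its own; it becomes useful when compared with other embeddings, most commonly via cosine similarity. The sketch below shows the calculation in plain Python with tiny made-up vectors (the three-dimensional values and variable names are assumptions, real embeddings are much longer). The `sentence_transformers.util` module offers a `cos_sim` helper that should do the same for real model outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Tiny hypothetical "embeddings"; real ones come from model_st.encode(...).
v_gas = [0.2, 0.7, 0.1]
v_fuel = [0.25, 0.65, 0.05]
v_cat = [0.9, -0.1, 0.4]

print(cosine_similarity(v_gas, v_fuel))  # close to 1: similar direction
print(cosine_similarity(v_gas, v_cat))   # much lower: different direction
```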