## Experiment with Hugging Face Transformers

In [36]:
text = """Having served on the COVID Vaccine Development committee at Moderna, USA, \
    Dr. Nader was involved in the fight against the pandemic of the century. As \
    the race was on to develop a vaccine – the ultimate defense against a virus \
    of which little was known – what helped to expedite the process at the \
    pharmaceutical and biotechnology company was the availability of the \
    technology – messenger RNA – which had been 10 years in the making.\
    The development of vaccines in record time encapsulates the prerequisites \
    for discovery: research, technology, anticipation and inquiring minds, skills \
    that should be fostered in education."""

### Text Completion

Once you execute the below code, notice in the score in the output.  The highest the score, the higher the probability of that output being selected!

In [37]:
from transformers import pipeline 

# specifying the pipeline
bert_unmasker = pipeline('fill-mask', model="bert-base-uncased")
text = "I have to wake up in the morning and [MASK] a doctor"
result = bert_unmasker(text)
for r in result:
    print(r)

All PyTorch model weights were used when initializing TFBertForMaskedLM.

All the weights of TFBertForMaskedLM were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForMaskedLM for predictions without further training.
Device set to use 0


{'score': 0.6457411050796509, 'token': 2156, 'token_str': 'see', 'sequence': 'i have to wake up in the morning and see a doctor'}
{'score': 0.178335040807724, 'token': 2655, 'token_str': 'call', 'sequence': 'i have to wake up in the morning and call a doctor'}
{'score': 0.07508038729429245, 'token': 2424, 'token_str': 'find', 'sequence': 'i have to wake up in the morning and find a doctor'}
{'score': 0.05682653933763504, 'token': 2131, 'token_str': 'get', 'sequence': 'i have to wake up in the morning and get a doctor'}
{'score': 0.0068956902250647545, 'token': 2022, 'token_str': 'be', 'sequence': 'i have to wake up in the morning and be a doctor'}


### Text Classification

The below will be classified the above text as positive.  Can you change that?

In [38]:
#hide_output
from transformers import pipeline

classifier = pipeline("text-classification")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
Device set to use 0


In [39]:
import pandas as pd

outputs = classifier(text)
pd.DataFrame(outputs)    

Unnamed: 0,label,score
0,NEGATIVE,0.989723


In [40]:
negative_result = classifier("The vaccine development process was a complete failure and caused more harm than good.")
print("Negative Classification:", negative_result)

Negative Classification: [{'label': 'NEGATIVE', 'score': 0.9997884631156921}]


### Named Entity Recognition

NER involves detecting and categorizing information in text known as named entities. Named entities refer to the key subjects of a piece of text, such as names, locations, companies, events and products, as well as themes, topics, times, monetary values and percentages.

In [41]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
pd.DataFrame(outputs)    

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFBertForTokenClassification.

All the weights of TFBertForTokenClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForTokenClassification for predictions without further training.
Device set to use 0


In [None]:
from transformers import pipeline

ner_pipeline = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english", aggregation_strategy="simple")

entities = ner_tagger(text)

import pandas as pd
pd.DataFrame(outputs)

All PyTorch model weights were used when initializing TFBertForTokenClassification.

All the weights of TFBertForTokenClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForTokenClassification for predictions without further training.
Device set to use 0


Unnamed: 0,generated_text
0,I have to wake up in the morning and [MASK] a ...


### Question Answering 

In [43]:
reader = pipeline("question-answering")
question = "What was Dr. Nader involved in?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])    

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFDistilBertForQuestionAnswering.

All the weights of TFDistilBertForQuestionAnswering were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForQuestionAnswering for predictions without further training.
Device set to use 0


Unnamed: 0,score,start,end,answer
0,0.576397,44,52,a doctor


### Summarization

In [44]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

No model was supplied, defaulted to google-t5/t5-small and revision df1b051 (https://huggingface.co/google-t5/t5-small).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.
Device set to use 0
Your max_length is set to 45, but your input_length is only 19. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=9)


i wake up in the morning and a doctor. i have to wake up and wake up a day before. the doctor is in the hospital.


well, the summary doesn't look good, it seem to copy some text from the original paragraph. I don't know if the issue is from my computer as it is taking a lot of time to compile and giving "No model was supplied..." warrning 

### Translation

The below will use a German translation model.  Can you change this to French?  Google will be your best friend in this task :-)

In [45]:
translator = pipeline("translation_en_to_de", 
                      model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])

All model checkpoint layers were used when initializing TFMarianMTModel.

All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-en-de.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.
Device set to use 0


Ich muss morgen früh aufwachen und [MASK] einen Arzt haben, um zu sehen, was ich tun kann, um zu sehen, was ich tun kann, um zu sehen, was ich tun kann, und um zu sehen, was ich tun kann, wenn ich weiß, was ich tun kann, was ich tun kann, wenn ich weiß, was ich tun kann, wenn ich weiß, was ich tun kann, wenn ich weiß, was ich tue, was ich tun kann, wenn ich weiß, was ich tue.


Fernch trans.

In [46]:
translator = pipeline("translation_en_to_fr", 
                      model="Helsinki-NLP/opus-mt-en-fr")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])

All model checkpoint layers were used when initializing TFMarianMTModel.

All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-en-fr.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.
Device set to use 0


Je dois me réveiller le matin et [MASK] un médecin, je dois me réveiller le matin, et je dois me réveiller le matin, et [MASK] un médecin. Je dois me réveiller le matin et je dois me réveiller le matin, je dois me réveiller le matin, et je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveiller le matin, je dois me réveille

### Text Generation

In [47]:
#hide
from transformers import set_seed
set_seed(42) # Set the seed to get reproducible results

In [48]:
generator = pipeline("text-generation")
response = "Dear Dr. Nader, Thank you for working on the vaccine."
prompt = text + "\n\nResponse to the story:\n" + response
outputs = generator(prompt, max_length=500)
print(outputs[0]['generated_text'])

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
Device set to use 0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


I have to wake up in the morning and [MASK] a doctor

Response to the story:
Dear Dr. Nader, Thank you for working on the vaccine. I recently received the results of the study, and there have been several important developments. One is [MIND]. In the early stages, most people had only mild side effects and no one had seen any signs. Now the majority, if they had, has seen a significant number of side effects. [MIND] is very much the end of that side effects study. So [MIND] will not be in the vaccine. We want people to have it before the vaccine is in the body. And, because the studies have been very close, I did not expect to see any of that [MIND] coming. It did in these last one. But the next one will be a bit of a test before we start taking it. It is going to take a good amount of time, in retrospect. I think it is important. I think it will need to be tested. So in fact, I hope [MIND] will come back in the vaccine when we are ready to go.

Source: Reuters

Response to news storie