# the first model: zero-shot classification

### prepare the model

In [84]:
# https://huggingface.co/facebook/bart-large-mnli

from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
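The zero-shot pipeline recasts classification as natural language inference (NLI). The sequence becomes the premise, each candidate label is slotted into a hypothesis sentence, and the MNLI model judges whether the premise entails that hypothesis. The sketch below builds those premise/hypothesis pairs in plain Python so it runs without downloading the model. `build_nli_pairs` is a hypothetical helper. The template `"This example is {}."` is meant to mirror what I understand to be the default of the pipeline's `hypothesis_template` argument.

```python
# Minimal sketch (not the pipeline's own code): turn one sequence and a list of
# labels into NLI (premise, hypothesis) pairs, one pair per candidate label.
DEFAULT_TEMPLATE = "This example is {}."  # assumed to match the pipeline default

def build_nli_pairs(sequence, candidate_labels, hypothesis_template=DEFAULT_TEMPLATE):
    """Return one (premise, hypothesis) pair per candidate label (hypothetical helper)."""
    return [(sequence, hypothesis_template.format(label)) for label in candidate_labels]

pairs = build_nli_pairs("one day I will see the world", ["politics", "life style"])
for premise, hypothesis in pairs:
    print(f"premise={premise!r}  hypothesis={hypothesis!r}")
```

Passing a different `hypothesis_template` to the classifier changes the wording of every hypothesis, which can shift the scores.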

### prepare the sequence

In [85]:
sequence_to_classify = "one day I will see the world"

### applying the classifier 1: single-label

In [86]:
candidate_labels = ['politics', 'science', 'life style', 'revolution', 'reality']
classifier(sequence_to_classify, candidate_labels)

{'sequence': 'one day I will see the world',
 'labels': ['life style', 'reality', 'revolution', 'science', 'politics'],
 'scores': [0.45578545331954956,
  0.27890530228614807,
  0.1296178549528122,
  0.08806750923395157,
  0.04762383922934532]}
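The five scores above add up to 1 (within rounding). In single-label mode the pipeline appears to keep only the entailment logit for each label and apply one softmax across all labels, so the labels compete for a single unit of probability. The sketch below shows that normalization step. The logits are made up for illustration and do not come from the run above.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtracting the max keeps math.exp from overflowing
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def single_label_scores(entailment_logits):
    """Normalize entailment logits across all labels so the scores sum to 1."""
    return softmax(entailment_logits)

# Hypothetical entailment logits, one per label (not taken from the run above).
labels = ["politics", "science", "life style"]
entailment_logits = [-1.0, 0.0, 2.0]
scores = single_label_scores(entailment_logits)
ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
print(ranked)
print(round(sum(scores), 6))  # → 1.0
```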

### applying the classifier 2: multi-label

In [88]:
candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
classifier(sequence_to_classify, candidate_labels, multi_label=True)

{'sequence': 'one day I will see the world',
 'labels': ['travel', 'exploration', 'dancing', 'cooking'],
 'scores': [0.994511067867279,
  0.9383882284164429,
  0.0057061817497015,
  0.00181929103564471]}
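With `multi_label=True` the scores no longer sum to 1 ('travel' and 'exploration' both exceed 0.9). As far as I understand the pipeline's implementation, each label is scored independently. The neutral logit is dropped, and a two-way softmax over (contradiction, entailment) gives that label's entailment probability. The sketch below reproduces that per-label step with made-up logits. Note that for two classes the softmax is just a sigmoid of the logit difference.

```python
import math

def entailment_probability(contradiction_logit, entailment_logit):
    """Two-way softmax over (contradiction, entailment); returns P(entailment)."""
    # For two classes the softmax equals a sigmoid of the logit difference;
    # the two branches keep math.exp from overflowing on large differences.
    d = entailment_logit - contradiction_logit
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)
    return z / (1.0 + z)

def multi_label_scores(logit_pairs):
    """Score every label independently; the results need not sum to 1."""
    return [entailment_probability(c, e) for c, e in logit_pairs]

# Hypothetical (contradiction, entailment) logits, not taken from the run above.
labels = ["travel", "exploration", "cooking"]
logit_pairs = [(-2.0, 3.0), (-1.0, 2.0), (2.5, -3.0)]
scores = multi_label_scores(logit_pairs)
for label, score in zip(labels, scores):
    print(f"{label}: {score:.3f}")
```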

# the second model: a conversational model

In [89]:
# https://huggingface.co/microsoft/DialoGPT-medium
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Let's chat for 5 turns
for step in range(5):
    # encode the new user input, append the eos_token, and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response; max_length=1000 caps the total sequence length (history plus reply).
    # The all-ones attention_mask states explicitly that no input token is padding,
    # which matters because pad_token_id is set to the EOS id here.
    chat_history_ids = model.generate(
        bot_input_ids,
        attention_mask=torch.ones_like(bot_input_ids),
        max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
    )

    # pretty-print the bot's reply: every token after the prompt
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

>> User:hi


A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: Hey! :D
>> User:how are you


DialoGPT: I'm good, how are you?
>> User:i'm good too. it is a good day, yeah?


DialoGPT: It is indeed.
>> User:glad to talk with you.


DialoGPT: I'm glad to talk with you too.
>> User:bye


DialoGPT: Bye! :D
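The loop grows `chat_history_ids` on every turn. `max_length=1000` in `generate()` bounds the whole sequence (prompt plus reply) rather than trimming the history, so a long enough conversation eventually leaves little or no room for a reply. The model-free sketch below mimics the bookkeeping with plain integer lists and adds a trim step that keeps only the most recent tokens. `EOS_ID`, `fake_generate`, and `trim_history` are hypothetical stand-ins, not Transformers APIs. The sketch also hints at why the right-padding warning is probably a false positive here. Every prompt ends with the EOS id, which the loop also passes as `pad_token_id`, and as far as I know the warning is triggered by inspecting the last input token.

```python
# Model-free sketch of the chat loop's history bookkeeping.
# EOS_ID, fake_generate and trim_history are hypothetical, not Transformers APIs.
EOS_ID = 0  # hypothetical end-of-turn token id

def fake_generate(prompt_ids, reply_ids):
    """Mimic model.generate: return the prompt followed by the new tokens and EOS."""
    return prompt_ids + reply_ids + [EOS_ID]

def trim_history(ids, max_tokens):
    """Keep only the most recent max_tokens ids (max_tokens must be > 0)."""
    return ids[-max_tokens:]

history = []
log = []
# (user token ids, bot reply token ids) for two made-up turns
turns = [([11, 12], [21]), ([13], [22, 23])]
for user_ids, reply_ids in turns:
    # append the user turn plus EOS, then cap the prompt length before generating
    prompt = trim_history(history + user_ids + [EOS_ID], max_tokens=6)
    output = fake_generate(prompt, reply_ids)
    # same idea as chat_history_ids[:, bot_input_ids.shape[-1]:] in the notebook
    new_tokens = output[len(prompt):]
    history = output
    log.append((prompt, new_tokens))
    print("prompt:", prompt, "reply:", new_tokens, "ends with EOS:", prompt[-1] == EOS_ID)
```

Cutting at an arbitrary token can split a turn (the second prompt above loses token 11). A fuller version would probably trim at EOS boundaries instead.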


# the third model: a Text2Text Generation model

In [90]:
# https://huggingface.co/mrm8488/t5-base-finetuned-question-generation-ap
# Tip: for now, install transformers from source

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-question-generation-ap")
# AutoModelWithLMHead is deprecated; AutoModelForSeq2SeqLM loads encoder-decoder models such as T5
model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-question-generation-ap")

def get_question(answer, context, max_length=64):
    # build the prompt in the "answer: ... context: ..." format this model expects
    input_text = "answer: %s  context: %s </s>" % (answer, context)
    features = tokenizer([input_text], return_tensors='pt')

    # generate the question tokens, passing the attention mask alongside the input ids
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=max_length)

    return tokenizer.decode(output[0])


The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.


In [91]:
context = "\"This could really change how we think about the environments in which life first originated,\" \
said Professor Nick Tosca from the University of Cambridge, who was one of the authors of the study. \
The research, which was headed by University of Cambridge Ph.D. student Matthew Brady, \
reveals that early seawater may have carried 1,000–10,000 times more phosphate than previously thought, \
provided the water contained a lot of iron. Phosphate is a crucial component of DNA and RNA, \
which are the building blocks of life, \
although it is one of the least common elements in the universe relative to its biological significance. \
Phosphate is also relatively inaccessible in its mineral form – \
it can be difficult to dissolve in water so that life can utilize it."

answer = "iron"

get_question(answer, context)

'<pad> question: What element did the study reveal that seawater contained more phosphate than previously thought?</s>'
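The decoded string still carries T5's special tokens (`<pad>`, `</s>`) and the `question:` prefix the model emits. Passing `skip_special_tokens=True` to `tokenizer.decode` should drop the special tokens. The hypothetical helper `clean_question` below also strips the prefix, using only string operations so it runs without the model.

```python
def clean_question(decoded):
    """Strip T5 special tokens and a leading 'question:' prefix (hypothetical helper)."""
    for token in ("<pad>", "</s>"):
        decoded = decoded.replace(token, "")
    decoded = decoded.strip()
    prefix = "question:"
    if decoded.startswith(prefix):
        decoded = decoded[len(prefix):].strip()
    return decoded

raw = '<pad> question: What element did the study reveal that seawater contained more phosphate than previously thought?</s>'
print(clean_question(raw))
# → What element did the study reveal that seawater contained more phosphate than previously thought?
```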