In [4]:
import transformers

# Show only errors, suppressing transformers warning messages.
transformers.logging.set_verbosity_error()

## 03.02. Content Creation

In [5]:
from transformers import pipeline

text_generator = pipeline("text-generation", 
                          model="gpt2")
transformers.set_seed(1)

input_text = ("Natural Language Processing is a "
              "growing domain in machine learning")

synthetic_text=text_generator(input_text,
                              num_return_sequences=3,
                              max_new_tokens=50)

for text in synthetic_text:
    print(text["generated_text"], "\n-----------------")

Natural Language Processing is a growing domain in machine learning applications. The purpose of this project is to provide a scalable way to develop a variety of languages on top of neural net technologies. Some applications are being developed that use the underlying neural network and can easily integrate with existing applications. These are shown below demonstrating 
-----------------
Natural Language Processing is a growing domain in machine learning and there are numerous applications available in the field of AI in general. With the current state of AI research in AI, we are exploring these topics, including:

Using AI for Machine Learning Science/Sustainability

Using Machine Learning in 
-----------------
Natural Language Processing is a growing domain in machine learning. This means that every new task, even one you've never thought to create, may be used as a foundation for generating a more comprehensive set of AI-based machine learning algorithms. But the problem of human
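
Because `max_new_tokens=50` caps the output length, each sequence above stops mid-sentence. One possible post-processing step, sketched here with a hypothetical helper `trim_to_last_sentence` that is not part of transformers, is to cut each result back to its last complete sentence. The sample string is a shortened stand-in for a generated sequence.

```python
import re


def trim_to_last_sentence(text: str) -> str:
    """Return text up to and including its last sentence-ending mark.

    If no '.', '!' or '?' is followed by whitespace or end of string,
    the stripped text is returned unchanged.
    """
    endings = list(re.finditer(r"[.!?](?=\s|$)", text))
    if not endings:
        return text.strip()
    return text[: endings[-1].end()].strip()


# Shortened stand-in for a generated sequence that stops mid-sentence.
sample = (
    "Natural Language Processing is a growing domain in machine learning. "
    "But the problem of human"
)
print(trim_to_last_sentence(sample))
```

This is a simple heuristic: abbreviations such as "e.g." are treated as sentence endings, so it may trim in the wrong place on some texts.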

## 03.04. Chatbot Conversation Example

In [6]:
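# Note: `Conversation` and the "conversational" pipeline were removed in
# later transformers releases; this cell assumes an older version that
# still ships them.
from transformers import Conversation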

conversational_pipeline = pipeline("conversational", 
                                   model="facebook/blenderbot_small-90M")

print(conversational_pipeline.model.config)

BlenderbotSmallConfig {
  "_name_or_path": "facebook/blenderbot_small-90M",
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_bias_logits": false,
  "add_final_layer_norm": false,
  "architectures": [
    "BlenderbotSmallForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "classif_dropout": 0.0,
  "classifier_dropout": 0.0,
  "d_model": 512,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 2048,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 8,
  "decoder_start_token_id": 1,
  "do_blenderbot_90_layernorm": true,
  "dropout": 0.1,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 2048,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 8,
  "eos_token_id": 2,
  "extra_pos_embeddings": 0,
  "force_bos_token_to_be_generated": false,
  "forced_eos_token_id": 2,
  "gradient_checkpointing": false,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id




In [7]:
# Sample inputs
first_input = "Do you have any hobbies?"
second_input = "I like to watch movies"
third_input = "action movies"

# Create a conversation context seeded with the first user input
bot_conversation = Conversation(first_input)

print("\nFirst Exchange: \n--------------------")

conversational_pipeline(bot_conversation)
print(" User Input:", bot_conversation.past_user_inputs[0])
print(" Bot Output:", bot_conversation.generated_responses[0])

print("\nSecond Exchange: \n--------------------")
bot_conversation.add_user_input(second_input)
conversational_pipeline(bot_conversation)

print(" User Input:", bot_conversation.past_user_inputs[1])
print(" Bot Output:", bot_conversation.generated_responses[1])

print("\nThird Exchange: \n--------------------")
bot_conversation.add_user_input(third_input)
conversational_pipeline(bot_conversation)

print(" User Input:", bot_conversation.past_user_inputs[2])
print(" Bot Output:", bot_conversation.generated_responses[2])

print("\nAccessing All Responses: ")
print(bot_conversation)



First Exchange: 
--------------------
 User Input: Do you have any hobbies?
 Bot Output: yes, i love going to the beach. what about you? do you have any hobbies?

Second Exchange: 
--------------------
 User Input: I like to watch movies
 Bot Output: i love going to the beach. i also like to watch movies. what kind of movies do you like?

Third Exchange: 
--------------------
 User Input: action movies
 Bot Output: i love going to the beach as well. i like action movies as well, but i don't get to see them often. what's your favorite action movie?

Accessing All Responses: 
Conversation id: c6a0d092-edb9-4a30-8587-9592cc56005f 
user >> Do you have any hobbies? 
bot >> yes, i love going to the beach. what about you? do you have any hobbies? 
user >> I like to watch movies 
bot >> i love going to the beach. i also like to watch movies. what kind of movies do you like? 
user >> action movies 
bot >> i love going to the beach as well. i like action movies as well, but i don't get to see them often. what's your favorite action movie? 
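
The conversation object keeps user inputs and bot responses in two parallel lists, so exchange number `n` (counting from zero) sits at index `n` in both. The sketch below illustrates that bookkeeping with a hypothetical `MiniConversation` class; it is not the transformers `Conversation` class and calls no model, and the responses are copied from the transcript above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MiniConversation:
    """Hypothetical stand-in that only tracks history; it calls no model."""

    past_user_inputs: List[str] = field(default_factory=list)
    generated_responses: List[str] = field(default_factory=list)

    def add_exchange(self, user_input: str, response: str) -> None:
        # Append to both lists together so their indices stay aligned.
        self.past_user_inputs.append(user_input)
        self.generated_responses.append(response)

    def exchange(self, turn: int) -> Tuple[str, str]:
        # The same index selects the user input and the matching response.
        return self.past_user_inputs[turn], self.generated_responses[turn]


chat = MiniConversation()
chat.add_exchange(
    "Do you have any hobbies?",
    "yes, i love going to the beach. what about you? do you have any hobbies?",
)
chat.add_exchange(
    "I like to watch movies",
    "i love going to the beach. i also like to watch movies. "
    "what kind of movies do you like?",
)
chat.add_exchange(
    "action movies",
    "i love going to the beach as well. i like action movies as well, "
    "but i don't get to see them often. what's your favorite action movie?",
)

for turn in range(len(chat.past_user_inputs)):
    user, bot = chat.exchange(turn)
    print(f"user >> {user}\nbot >> {bot}")
```

Reading both lists through one accessor like `exchange(turn)` avoids mismatched indices when printing individual turns.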

## 03.06. Translating with Hugging Face

In [8]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")

source_english = "Acme is a technology company based in New York and Paris"

inputs_german = tokenizer(
    "translate English to German: " + source_english,
    return_tensors="pt",
)
outputs_german = model.generate(
    inputs_german["input_ids"], 
    max_length=40)

print("German Translation: ",
      tokenizer.decode(outputs_german[0], 
                       skip_special_tokens=True))

inputs_french = tokenizer(
    "translate English to French: " + source_english, 
    return_tensors="pt",
)
outputs_french = model.generate(
    inputs_french["input_ids"], 
    max_length=40)

print("French Translation: ", 
      tokenizer.decode(outputs_french[0], 
                       skip_special_tokens=True))

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


German Translation:  Acme ist ein Technologieunternehmen mit Sitz in New York und Paris.
French Translation:  Acme est une société technologique basée à New York et à Paris.
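
The two translations differ only in the task prefix placed in front of the source sentence. As a minimal sketch, the hypothetical helper `build_translation_prompt` below builds that prefix in one place. The list of target languages is an assumption based on the translation tasks the original T5 checkpoints were trained on (German, French, Romanian); the helper only builds strings and does not call the model.

```python
# Assumed target languages: the translation tasks T5 was trained on.
SUPPORTED_TARGETS = ("German", "French", "Romanian")


def build_translation_prompt(text: str, target_language: str) -> str:
    """Prepend the T5 task prefix for an English-to-target translation."""
    if target_language not in SUPPORTED_TARGETS:
        raise ValueError(
            f"Unsupported target language: {target_language!r}; "
            f"expected one of {SUPPORTED_TARGETS}"
        )
    return f"translate English to {target_language}: {text}"


source_english = "Acme is a technology company based in New York and Paris"
for language in ("German", "French"):
    print(build_translation_prompt(source_english, language))
```

Each resulting string can be passed to the tokenizer in place of the hand-built `"translate English to ...: " + source_english` expressions used in the cell above.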
