## What is generative AI?
Imagine presenting a computer with a vast array of paintings. After analyzing them, it tries to craft a unique painting of its own. This capability is termed generative AI. Essentially, the computer derives inspiration from the provided content and uses it to create something new.

## Real-world impact of generative AI
Generative AI is transforming multiple industries. Its applications span:

### 1. Art and creativity
- Generative art: Artists employing generative AI algorithms can create stunning artworks by learning from existing masterpieces and producing unique pieces inspired by them. These AI-generated artworks have gained recognition in the art world.
- Music composition: Generative AI projects have been used to compose music. They learn from a large dataset of musical compositions and can generate original pieces in various styles, from classical to jazz.

### 2. Natural language processing (NLP)
- Content generation: Tools built on generative pre-trained transformer (GPT) models have demonstrated their ability to generate coherent and context-aware text. They can assist content creators by generating articles, stories, or marketing copy.
- Chatbots and virtual assistants: Generative AI powers many of today's chatbots and virtual assistants. These AI-driven conversational agents understand and generate human-like responses, enhancing user experiences.
- Code writing: Generative AI models can also produce code snippets based on descriptions or requirements, streamlining software development.

### 3. Computer vision
- Image synthesis: Models like DALL-E can generate images from textual descriptions. This technology finds applications in graphic design, advertising, and creating visual content for marketing.
- Deepfake detection: As generative AI techniques advance, the generation of deepfake content is also on the rise. Consequently, generative AI now plays a role in developing tools and techniques to detect manipulated videos and combat the misinformation they spread.

### 4. Virtual avatars
- Entertainment: Generative AI is utilized to craft virtual avatars for gaming and entertainment. These avatars mimic human expressions and emotions, bolstering user engagement in virtual environments.
- Marketing: Virtual influencers, propelled by generative AI, are on the rise in digital marketing. Brands are harnessing these virtual personas to endorse their products and services.

## Neural structures behind generative AI
Before we had the powerful transformers, which are like super-fast readers and understand lots of words at once, there were other methods used for making computers generate text. These methods were like the building blocks that led to the amazing capabilities we have today.

## Large language models (LLMs)
Large language models are like supercharged brains. They are massive computer programs with lots of "neurons" that learn from huge amounts of text. These models are trained to do tasks like understanding and generating text, and they're used in many applications. Earlier language models, covered next, had a limitation: they were not very good at understanding the bigger context or the meaning of words. They worked well for simple predictions but struggled with more complex text.

## Text generation before transformers

### 1. N-gram language models
N-gram models are like language detectives. They predict what words come next in a sentence based on the words that came before. For example, if you say "The sky is," these models guess that the next word might be "blue."
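
The idea can be sketched with a bigram model (an n-gram model with n = 2), which predicts the next word from the single word before it. This is a minimal sketch using only the Python standard library; the toy corpus and the function names `train_bigram` and `predict_next` are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy corpus, invented for illustration
corpus = [
    "the sky is blue",
    "the sky is blue today",
    "the sea is deep",
]
bigram_counts = train_bigram(corpus)
print(predict_next(bigram_counts, "is"))  # "blue" follows "is" twice, "deep" once
```

Because the model only looks at a fixed, short window of previous words, it cannot use context from earlier in the sentence.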

### 2. Recurrent neural networks (RNN)
Recurrent neural networks (RNNs) are specially designed to handle sequential data, making them a powerful tool for applications like language modeling and time series forecasting. The essence of their design lies in maintaining a 'memory' or 'hidden state' throughout the sequence by employing loops. This enables RNNs to recognize and capture the temporal dependencies inherent in sequential data.
- Hidden state: Often referred to as the network's 'memory', the hidden state is a dynamic storage of information about previous sequence inputs. With each new input, this hidden state is updated, factoring in both the new input and its previous value.
- Temporal dependency: Loops in RNNs enable information transfer across sequence steps.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0J87EN/%E9%80%92%E5%BD%92%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%9B%BE.png" width="60%" height="60%"> 

<div style="text-align:center"><a href="https://commons.wikimedia.org/wiki/File:%E9%80%92%E5%BD%92%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%9B%BE.png">Image Source</a></div>

Illustration of an RNN's operation: Consider a simple sequence, such as the sentence "I love RNNs". The RNN interprets this sentence word by word. Beginning with the word "I", the RNN ingests it, generates an output, and updates its hidden state. Moving on to "love", the RNN processes it alongside the updated hidden state, which already holds information about the word "I". The hidden state is then updated again. This pattern of processing and updating continues until the last word. By the end of the sequence, the hidden state ideally encapsulates information from the entire sentence.
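
The word-by-word update can be sketched in a few lines of plain Python. This is a simplified illustration, not a trained network: the one-hot word encodings, the weight values, and the hidden size of 2 are all arbitrary assumptions, and the update rule shown is the common `tanh` form of a basic RNN cell.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One RNN update: h_t = tanh(W_xh * x + W_hh * h_prev + b_h)."""
    h_new = []
    for i in range(len(h_prev)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

# Hypothetical one-hot encodings for the three words
one_hot = {"I": [1.0, 0.0, 0.0], "love": [0.0, 1.0, 0.0], "RNNs": [0.0, 0.0, 1.0]}

# Arbitrary fixed weights; a trained network would learn these values
W_xh = [[0.5, -0.3, 0.8], [0.1, 0.7, -0.2]]
W_hh = [[0.4, 0.2], [-0.1, 0.6]]
b_h = [0.0, 0.0]

h = [0.0, 0.0]  # the hidden state starts empty
states = []
for word in ["I", "love", "RNNs"]:
    # Each step combines the new word with the state carried over from earlier words
    h = rnn_step(one_hot[word], h, W_xh, W_hh, b_h)
    states.append(h)
    print(word, [round(v, 3) for v in h])
```

Feeding the same words in a different order produces a different final hidden state, which is how the network captures sequence order.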
                                                                                                       
### 3. Long short-term memory (LSTM) and gated recurrent units (GRUs)
Long short-term memory (LSTM) networks and gated recurrent units (GRUs) are advanced variations of recurrent neural networks (RNNs), designed to address the limitations of traditional RNNs and model sequential data more effectively. Like RNNs, they process sequences one element at a time and maintain an internal state to remember past elements; gating mechanisms control what that state keeps and what it discards. While effective for a variety of tasks, they still struggle with very long sequences and long-term dependencies.

### 4. Seq2seq models with attention
- Sequence-to-sequence (seq2seq) models, often built with RNNs or LSTMs, were designed to handle tasks like translation where an input sequence is transformed into an output sequence.
- The attention mechanism was introduced to allow the model to "focus" on relevant parts of the input sequence when generating the output, significantly improving performance on tasks like machine translation.
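
To make "focusing" concrete, the sketch below scores each encoder state against the current decoder state, turns the scores into weights with a softmax, and builds a weighted average (the context vector). It assumes dot-product scoring, which is only one of several scoring functions used in seq2seq models, and the vectors are hand-picked toy values.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, encoder_states):
    """Weight each encoder state by its dot-product similarity to the query."""
    scores = [sum(q * k for q, k in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context

# Hypothetical encoder states for a three-word input sentence
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical decoder state while producing one output word
decoder_state = [-1.0, 2.0]

weights, context = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

Here the second input position receives the largest weight, so it contributes most to the context vector used for the next output word.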

While these methods provided significant advancements in text generation tasks, the introduction of transformers led to a paradigm shift. Transformers, with their self-attention mechanism, proved to be highly efficient at capturing contextual information across long sequences, setting new benchmarks in various NLP tasks.

## Transformers
Proposed in a paper titled "Attention Is All You Need" by Vaswani et al. in 2017, the transformer architecture replaced sequential processing with parallel processing. The key component behind its success? The attention mechanism, more precisely, self-attention.

Key steps include:
- Tokenization: The first step is breaking down a sentence into tokens (words or subwords).
- Embedding: Each token is represented as a vector, capturing its meaning.
- Self-attention: The model computes scores determining the importance of every other word for a particular word in the sequence. These scores are used to weight the input tokens and produce a new representation of the sequence. For instance, in the sentence "He gave her a gift because she'd helped him", understanding who "her" refers to requires the model to pay attention to other words in the sentence. The transformer does this for every word, considering the entire context, which is particularly powerful for understanding meaning.
- Feed-forward neural networks: After attention, each position is passed through a feed-forward network separately.
- Output sequence: The model produces an output sequence, which can be used for various tasks, like classification, translation, or text generation.
- Layering: Importantly, transformers are deep models with multiple layers of attention and feed-forward networks, allowing them to learn complex patterns.
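
The first three steps can be sketched with the standard library alone. This is a deliberately reduced illustration: a whitespace split stands in for a real subword tokenizer, the 2-dimensional embeddings are chosen by hand, and queries, keys, and values are all the raw embeddings, whereas a real transformer applies learned projection matrices and multiple attention heads.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Scaled dot-product self-attention with queries = keys = values = embeddings."""
    dim = len(embeddings[0])
    outputs, all_weights = [], []
    for query in embeddings:
        # Score this token against every token in the sequence, itself included
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)
        # The new representation is a weighted mix of all token vectors
        mixed = [
            sum(w * value[d] for w, value in zip(weights, embeddings))
            for d in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Tokenization: a plain whitespace split stands in for a subword tokenizer
tokens = "she helped him".split()

# Embedding: hypothetical vectors, not taken from any trained model
embedding_table = {"she": [1.0, 0.2], "helped": [0.1, 1.0], "him": [0.9, 0.3]}
embeddings = [embedding_table[t] for t in tokens]

outputs, attn = self_attention(embeddings)
for token, weights in zip(tokens, attn):
    print(token, [round(w, 2) for w in weights])
```

Every token is processed against the whole sequence in one pass, with no hidden state carried from step to step, which is what allows transformers to process positions in parallel.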

The architecture's flexibility has allowed transformers to be used beyond NLP, finding applications in image and video processing too. In NLP, transformer-based models like BERT, GPT, and their variants have set state-of-the-art results in various tasks, from text classification to translation.

### Implementation: Building a simple chatbot with transformers
Now, you will build a simple chatbot using the `transformers` library from Hugging Face, an open-source natural language processing (NLP) toolkit with many useful features.
#### Step 1: Installing libraries


In [1]:
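# PyTorch is needed because the code below requests PyTorch tensors (return_tensors="pt")
# !pip install -qq torch
# !pip install -qq tensorflow
# !pip install -qq transformers
# !pip install sentencepiece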

#### Step 2: Importing the required tools from the transformers library
In the upcoming script, you initialize two variables using classes from the transformers library:
- `model` is an instance of the class `AutoModelForSeq2SeqLM`. This class lets you interact with your chosen language model.
- `tokenizer` is an instance of the class `AutoTokenizer`. This class prepares your input for the language model by converting your text into "tokens", the units in which the model reads text.

This example uses "facebook/blenderbot-400M-distill" because it is freely available under an open-source license and runs relatively quickly. For a diverse range of models and their capabilities, you can explore the Hugging Face website: [Hugging Face Models](https://huggingface.co/models).


In [2]:
# Press Tab for auto suggestion
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

print("Import Successful!")

Import Successful!


In [3]:
# Show the class documentation from its docstring
help(AutoTokenizer)

Help on class AutoTokenizer in module transformers.models.auto.tokenization_auto:

class AutoTokenizer(builtins.object)
 |  This is a generic tokenizer class that will be instantiated as one of the tokenizer classes of the library when
 |  created with the [`AutoTokenizer.from_pretrained`] class method.
 |  
 |  This class cannot be instantiated directly using `__init__()` (throws an error).
 |  
 |  Methods defined here:
 |  
 |  __init__(self)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  register(config_class, slow_tokenizer_class=None, fast_tokenizer_class=None, exist_ok=False)
 |      Register a new tokenizer in this mapping.
 |      
 |      
 |      Args:
 |          config_class ([`PretrainedConfig`]):
 |              The configuration corresponding to the model to register.
 |          slow_tokenizer_class ([`PretrainedTokenizer`], *optional*):
 |              The slow tokenizer to register.
 |          fast_tokenizer_class ([`PretrainedTokeni

In [4]:
# Selecting the model. You will be using "facebook/blenderbot-400M-distill" in this example.
# Link https://huggingface.co/facebook/blenderbot-400M-distill
model_name = "facebook/blenderbot-400M-distill"

# Load the model and tokenizer by model_name
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

config.json:   0%|          | 0.00/1.57k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/730M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/347 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.15k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/127k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/62.9k [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/16.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/772 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/310k [00:00<?, ?B/s]



Here’s an explanation of each file downloaded from Hugging Face when you retrieve a model:

1. **config.json**: This file contains the model's configuration parameters, including the architecture type, number of layers, hidden units, attention heads, etc. It’s used to instantiate the model without needing to re-specify its architecture.

2. **pytorch_model.bin**: This is the main file containing the pre-trained model weights, serialized in PyTorch format (hence the `pytorch_` prefix). It includes all the learned parameters of the model, such as the weights and biases. Some model repositories ship the weights as `model.safetensors` instead.

3. **generation_config.json**: If the model is used for text generation tasks, this file contains configuration details specific to generation settings, like maximum output length, top-k sampling, temperature, and other parameters influencing how the model generates text.

4. **tokenizer_config.json**: This file contains settings related to the tokenizer, such as the type of tokenizer, preprocessing options (like whether to convert text to lowercase), or handling of unknown tokens.

5. **vocab.json**: This file holds the mapping between tokens (typically subword units) and their corresponding IDs. It’s an essential component of the tokenizer, defining the vocabulary the model works with.

6. **merges.txt**: Used in conjunction with the `vocab.json` file for models using Byte Pair Encoding (BPE). This file lists the merging rules for combining tokens into subwords based on frequency in the training data.

7. **added_tokens.json**: If any additional tokens were added beyond the standard vocabulary (e.g., special tokens for certain tasks), they are defined here.

8. **special_tokens_map.json**: This file maps special-token roles (such as `pad_token`, `unk_token`, `bos_token`, and `eos_token`) to the token strings the tokenizer uses for them (for example `<pad>` or `</s>`). These tokens play special roles during the model's training and inference processes.

9. **tokenizer.json**: This is a consolidated file that includes information about the tokenizer, combining vocab, merges, and other tokenization rules into one JSON file. It is a more efficient format that Hugging Face’s `transformers` library can load directly.

Each of these files serves a specific role in ensuring that the model can be used effectively for inference, training, or fine-tuning with the proper configuration and tokenization settings.
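
To see how `merges.txt` and `vocab.json` work together, here is a toy sketch of Byte Pair Encoding applied to a single word. The merge rules and vocabulary below are invented for illustration and are not taken from the BlenderBot files; real tokenizers also handle bytes, word boundaries, and special tokens, which this sketch leaves out.

```python
def apply_merges(word, merges):
    """Apply BPE merge rules, in priority order, to the characters of one word."""
    symbols = list(word)
    for left, right in merges:
        merged = []
        i = 0
        while i < len(symbols):
            # Join the pair wherever the two symbols sit next to each other
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

# Hypothetical merge rules, playing the role of merges.txt (highest priority first)
toy_merges = [("l", "o"), ("lo", "w"), ("e", "r")]
# Hypothetical token-to-ID table, playing the role of vocab.json
toy_vocab = {"low": 0, "er": 1}

subwords = apply_merges("lower", toy_merges)
token_ids = [toy_vocab[s] for s in subwords]
print(subwords)
print(token_ids)
```

The merge rules decide how text is split into subwords, and the vocabulary then turns each subword into the integer ID the model receives.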

Following the initialization, let's set up the chat function to enable real-time interaction with the chatbot.


In [5]:
# Define the chat function
def chat_with_bot():
    while True:
        # Get user input
        input_text = input("You: ")

        # Exit conditions
        if input_text.lower() in ["quit", "exit", "bye"]:
            print("Chatbot: Goodbye!")
            break

        # Tokenize input and generate response
        inputs = tokenizer.encode(input_text, return_tensors="pt")
        outputs = model.generate(inputs, max_new_tokens=150)
        response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

        # Display bot's response
        print("Chatbot:", response)

In [6]:
# Start chatting
chat_with_bot()

You:  Hello


Chatbot: Hi, how are you? I just got back from walking my dog. Do you have any pets?


You:  No i don't


Chatbot: I'm sorry to hear that. Do you have any hobbies that you like to do?


You:  Yes i do


Chatbot: What do you like to do in your free time?  I like to play video games.


You:  I like coding


Chatbot: I love coding as well. It is one of my favorite hobbies. What do you like to do in your free time?


You:  quit


Chatbot: Goodbye!


Alright! You have successfully interacted with your chatbot. By providing it with a prompt, the chatbot used the power of the transformers library and the underlying model to generate a response. This exemplifies the prowess of transformer-based models in comprehending and generating human-like text based on a given context. As you continue to engage with it, you will observe its capacity to simulate a wide range of conversational topics and styles.


#### Step 3: Trying another language model and comparing the output


You can use a different language model, for example the "[flan-t5-base](https://huggingface.co/google/flan-t5-base)" model from Google, to create a similar chatbot. You can use a chat function similar to the one defined in Step 2 and compare the outputs of both models.


In [1]:
import sentencepiece
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

print("Import Successful!")

Import Successful!


In [2]:
# Download model https://huggingface.co/google/flan-t5-base
model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]



config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

The key difference between `flan-t5-base` and `t5-base` models on Hugging Face lies in the fine-tuning and pretraining strategies applied to them:

### 1. **T5 (Text-to-Text Transfer Transformer)**:
- **Base Model:** `t5-base` is the original T5 model developed by Google. It is pretrained on the "Colossal Clean Crawled Corpus" (C4) dataset using a **denoising** objective. The key idea behind T5 is that all NLP tasks can be reframed as a text-to-text problem, where the input and output are always text.
- **Capabilities:** T5 can handle a wide range of tasks such as translation, summarization, classification, and question answering by formulating these tasks in the text-to-text format.
- **Training:** The pretraining focuses on general language understanding rather than on following natural-language instructions.

### 2. **Flan-T5 (Instruction-tuned T5)**:
- **Base Model:** `flan-t5-base` is a variant of T5 that has undergone **instruction fine-tuning** on top of the T5 model. The "Flan" in the name refers to Google's "FLAN" (Fine-tuned Language Net), a framework for fine-tuning models on instruction-based tasks.
- **Capabilities:** Flan-T5 is designed to handle instruction-based tasks (e.g., "summarize this text" or "translate this sentence to French") better than T5 because it has been fine-tuned with a variety of instruction-following data.
- **Fine-tuning:** It has been trained on more task-specific datasets, including ones with instructions to make the model better at following natural language prompts or task instructions.

### Summary:
- `t5-base`: The original model trained with a general text-to-text framework but not specifically tuned for instruction-following tasks.
- `flan-t5-base`: A fine-tuned version of T5, optimized for instruction-following by training on specific tasks with instructions.

If you're working on instruction-following tasks, `flan-t5-base` is likely to perform better. For more general-purpose tasks, `t5-base` might still be a good option.
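
The text-to-text framing means that the task itself is written into the input string. The helper below is a hypothetical illustration of that idea, not part of the `transformers` library; the prefixes shown follow the style used with T5 (for example `summarize:` and `translate English to German:`), while Flan-T5 also accepts instructions phrased more freely.

```python
def to_text_to_text(task_prefix, text):
    """Frame a task as plain text by prepending a task description to the input."""
    return f"{task_prefix}: {text.strip()}"

# Example inputs invented for illustration
prompts = [
    to_text_to_text("summarize", "Transformers process all tokens of a sequence in parallel."),
    to_text_to_text("translate English to German", "The sky is blue."),
]
for prompt in prompts:
    print(prompt)
```

A string built this way can be passed to the chat function in place of a bare greeting, which gives an instruction-tuned model a clearer task to follow.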

In [3]:
### Let's chat with another bot
def chat_with_another_bot():
    while True:
        # Get user input
        input_text = input("You: ")

        # Exit conditions
        if input_text.lower() in ["quit", "exit", "bye"]:
            print("Chatbot: Goodbye!")
            break

        # Tokenize input and generate response
        inputs = tokenizer.encode(input_text, return_tensors="pt")
        outputs = model.generate(inputs, max_new_tokens=150) 
        response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
        
        # Display bot's response
        print("Chatbot:", response)

# Start chatting
chat_with_another_bot()

You:  Hello


Chatbot: Hello, I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the forum. I am a new user on the


You:  quit


Chatbot: Goodbye!


There are many language models available on Hugging Face. In the following exercise, you will compare the output for the same input using two different models.


# Exercise

### Create a chatbot using different models from Hugging Face

Create a simple chatbot using the transformers library from [Hugging Face](https://huggingface.co/models). Run the code using the following models and compare the output: "[google/flan-t5-small](https://huggingface.co/google/flan-t5-small)" and "[bert-base-uncased](https://huggingface.co/bert-base-uncased)".
(Note: Based on the selected model, you may notice differences in the chatbot output. Multiple factors, such as model training and fine-tuning, influence the output.)


In [4]:
# Download the model https://huggingface.co/google/flan-t5-small
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

### Let's chat with another bot
def chat_with_another_bot():
    while True:
        # Get user input
        input_text = input("You: ")

        # Exit conditions
        if input_text.lower() in ["quit", "exit", "bye"]:
            print("Chatbot: Goodbye!")
            break

        # Tokenize input and generate response
        inputs = tokenizer.encode(input_text, return_tensors="pt")
        outputs = model.generate(inputs, max_new_tokens=150) 
        response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
        
        # Display bot's response
        print("Chatbot:", response)

# Start chatting
chat_with_another_bot()

tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]



config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/308M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

You:  Hello


Chatbot: Hello, Hello. Hello, Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello.


You:  quit


Chatbot: Goodbye!


In [9]:
# Download the model https://huggingface.co/google-bert/bert-base-uncased
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained("bert-base-uncased")

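# BertModel is an encoder without a text-generation head, so it returns hidden-state tensors rather than a reply
text = "Replace me by any text you'd like."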
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

print(output)

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[ 0.1386,  0.1583, -0.2967,  ..., -0.2708, -0.2844,  0.4581],
         [ 0.5364, -0.2327,  0.1754,  ...,  0.5540,  0.4981, -0.0024],
         [ 0.3002, -0.3475,  0.1208,  ..., -0.4562,  0.3288,  0.8773],
         ...,
         [ 0.3799,  0.1203,  0.8283,  ..., -0.8624, -0.5957,  0.0471],
         [-0.0252, -0.7177, -0.6950,  ...,  0.0757, -0.6668, -0.3401],
         [ 0.7535,  0.2391,  0.0717,  ...,  0.2467, -0.6458, -0.3213]]],
       grad_fn=<NativeLayerNormBackward0>), pooler_output=tensor([[-0.9377, -0.5043, -0.9799,  0.9030,  0.9329, -0.2438,  0.8926,  0.2288,
         -0.9531, -1.0000, -0.8862,  0.9906,  0.9855,  0.7155,  0.9455, -0.8645,
         -0.6035, -0.6666,  0.3020, -0.1587,  0.7455,  1.0000, -0.4022,  0.4261,
          0.6151,  0.9996, -0.8773,  0.9594,  0.9585,  0.6950, -0.6718,  0.3325,
         -0.9954, -0.2268, -0.9658, -0.9951,  0.6127, -0.7670,  0.0873,  0.0824,
         -0.9518,  0.4713,  1.00

# Congratulations! You have completed the lab.

## Authors

[Vicky Kuo](https://author.skills.network/instructors/vicky_kuo) is completing her Master's degree in IT at York University with scholarships. Her master's thesis explores the optimization of deep learning algorithms, employing an innovative approach to scrutinize and enhance neural network structures and performance.

© Copyright IBM Corporation. All rights reserved.