<a href="https://colab.research.google.com/github/mtzeve/M3_assignment_2/blob/main/notebooks/M3_3_NLG_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LLMs in Harry Potter: Tom Riddle introduces himself to Harry Potter

In [2]:
%%html

<iframe width="966" height="543" src="https://www.youtube.com/embed/eh4b5zC0sB4" title="Tom Riddle introduces himself to Harry Potter | Harry Potter and the Chamber of Secrets" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

# The brief history of LLMs

![](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*RYNNKmmi1ShV7xx76qtXww.png)

### Summary of Transformer Architecture in Foundation Models

- **Introduction**: Introduced in 2017, transformers have largely superseded Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks for sequence modeling. They are now central to natural language processing and computer vision.

- **Features**:
  - **Self-Attention Mechanism**: Weighs significance of different input data parts.
  - **Parallel Processing**: Enhances performance and scalability.
  - **Bidirectionality**: Encoder models such as BERT attend to context on both sides of a token, which improves understanding of ambiguous words and coreferences.

- **Components**:
  - Original architecture: Encoder and Decoder.
  - Variations: BERT (encoders only), GPT (decoders only).

Transformers represent a significant leap in deep learning, enabling more efficient and effective data processing.
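
To make the self-attention bullet more concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration under simplifying assumptions, not a reference implementation: the function names are invented for this example, there are no learned projection matrices, and the queries, keys, and values are all the same toy vectors.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Toy "embeddings" for a two-token input; Q = K = V to keep the sketch small.
tokens = [[1.0, 0.0], [0.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
print(attended)
```

Each output row is a blend of all input rows, weighted by how similar the query is to each key. Because every row is computed independently, the loop over queries can run in parallel, which is where the parallel-processing benefit comes from.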


![](https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/TM_EvolutionaryTree.jpg)

### Roles of Encoders and Decoders in Foundation Models: BERT and GPT
Encoders and decoders play different roles in foundation models. BERT and GPT illustrate the contrast:
- **Encoders and Decoders**:
  - **BERT**: Utilizes only encoders.
  - **GPT**: Exclusively employs decoders.
  - Both models are proficient in understanding language, syntax, and semantics, with GPT's larger scale model (billions of parameters) excelling in these areas.

- **Applications**:
  - **BERT (Encoder)**: Ideal for classification (e.g., sentiment analysis), question-answering, summarization, and named entity recognition.
  - **GPT (Decoder)**: Specializes in translation and content generation (e.g., stories).

- **Outputs**:
  - **BERT**: Produces embeddings representing words with context-specific attention information.
  - **GPT**: Generates next-word predictions along with their probabilities.





### BERT - Encoders

- **Transformer Encoder Usage**:
  - BERT employs the encoder part of the transformer architecture.
  - It focuses on understanding semantic and syntactic language information.
  - BERT's output consists of embeddings, rather than predicted next words.

- **Application of Embeddings**:
  - To use these embeddings, additional layers (e.g., for text classification or questions and answers) need to be added on top of BERT.

- **Training Technique**:
  - BERT utilizes a unique training method to circumvent the need for expensive labeled data.
  - This involves using a technique called Self-Supervised Learning, particularly effective with large data volumes.

- **Masked Language Model**:
  - In BERT’s training, sentences are altered by masking words.
  - Example: **"Sarah went to a restaurant to meet her friend that night."** becomes **"Sarah went to a restaurant to meet her MASK that night."**
  - These masks help BERT generate its own labeled data from originally unlabeled data.
  - Each masked word prediction is informed by the other tokens in the sentence, which is processed by the encoder, eliminating the need for a decoder.
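
The way masking turns unlabeled text into a labeled training pair can be sketched in a few lines. This is a simplified illustration: the helper name `make_mlm_example` is made up for this example, words are split on whitespace (BERT actually uses WordPiece sub-word tokens and the special token `[MASK]`), and the masked position is chosen by hand rather than at random.

```python
def make_mlm_example(sentence, position, mask_token="MASK"):
    """Hide the word at `position` and return (masked sentence, label)."""
    words = sentence.split()
    label = words[position]  # the hidden word becomes the training label
    masked = words[:position] + [mask_token] + words[position + 1:]
    return " ".join(masked), label


sentence = "Sarah went to a restaurant to meet her friend that night."
masked_sentence, label = make_mlm_example(sentence, position=8)
print(masked_sentence)
print(label)
```

The model sees only the masked sentence and is trained to predict the label, so no human annotation is needed.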



### GPT - Decoders

- **Role in Language Processing**:
  - Decoders are primarily used in generating next words in language tasks such as text translation or story generation.
  - The outputs from decoders are words accompanied by their respective probabilities.

- **Use of Attention Mechanisms**:
  - Decoders implement the attention concept twice during the model training process.
  - First, they use Masked Multi-Head Attention. Despite the name, this differs from BERT's MASK token: each position may attend only to earlier positions in the target sentence, so the model cannot 'cheat' by looking at the words it is supposed to predict.
  - Subsequently, the decoder employs Multi-Head Attention, akin to what is used in encoders.

- **Interaction with Encoders in Transformer Models**:
  - In models combining encoders and decoders, there's a technique where the encoder's output (keys and values) is fed into the decoders.
  - Decoders use queries to find relevant keys, aiding in tasks like understanding and translating sentences, even with varying word counts and order.

- **GPT's Unique Approach**:
  - GPT deviates from this technique by using only a decoder.
  - Trained on massive datasets (Large Language Model), GPT's knowledge is embedded in billions of parameters.
  - This extensive training compensates for the absence of an encoder, embedding equivalent knowledge within the decoder.

- **Evolution in ChatGPT**:
  - ChatGPT has advanced these techniques, incorporating human-labeled data to mitigate issues like hate speech and abuse.
  - It also uses Reinforcement Learning to enhance model quality, as seen in "ChatGPT: Optimizing Language Models for Dialogue".
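
The masked attention used by decoders can be pictured as a lower-triangular visibility table. The sketch below is a plain-Python illustration with invented helper names; real implementations apply the same pattern to the attention scores as a tensor mask.

```python
def causal_mask(sequence_length):
    """mask[i][j] is True when position i is allowed to attend to position j."""
    return [[j <= i for j in range(sequence_length)] for i in range(sequence_length)]


def visible_context(tokens):
    """For each position, list the tokens that position is allowed to attend to."""
    mask = causal_mask(len(tokens))
    return [[tok for tok, allowed in zip(tokens, row) if allowed] for row in mask]


for context in visible_context(["Sarah", "went", "to", "a", "restaurant"]):
    print(context)
```

Each printed row grows by one token: a position sees itself and everything before it, but nothing after it, which is what keeps next-word prediction honest during training.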



# How Smart Are They? Understanding the Scale of GPT-3 and GPT-4


| Assumption                                  | Description                                                                                       |
|---------------------------------------------|---------------------------------------------------------------------------------------------------|
| **Average Tokens per Book**                 | Estimated at 135,000 tokens per book, based on an average book length of 80,000 to 100,000 words.  |
| **Average Reading Lifetime of an Individual** | Estimated at 510 books per lifetime, assuming a moderate reading habit of 5-12 books per year over 60 years. |
| **Tokens per Word**                         | Estimated at 1.5 tokens per word, accounting for spaces and punctuation.                          |



| Detail                             | GPT-3                                   | GPT-4                                   |
|------------------------------------|-----------------------------------------|-----------------------------------------|
| **Developed By**                   | OpenAI                                  | OpenAI                                  |
| **Approximate Training Data Size** | 45 terabytes of text data               | Larger than GPT-3 (exact size unknown)  |
| **Estimated Token Count**          | 300-400 billion tokens                  | Likely over 500 billion tokens          |
| **Equivalent Number of Books**     | 2,222,222 - 2,962,963 books             | >3,703,704 books                        |
| **Equivalent Knowledge of People** | 4,357 - 5,810 people                    | >7,262 people                           |
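
The figures in the table follow from the stated assumptions by simple division. The sketch below reproduces the arithmetic; the midpoint values (90,000 words per book, 8.5 books per year) are an assumption inferred from the ranges in the first table, and the function name is made up for this example.

```python
TOKENS_PER_WORD = 1.5
WORDS_PER_BOOK = 90_000      # assumed midpoint of the 80,000 to 100,000 word range
BOOKS_PER_YEAR = 8.5         # assumed midpoint of the 5 to 12 books-per-year range
READING_YEARS = 60

TOKENS_PER_BOOK = TOKENS_PER_WORD * WORDS_PER_BOOK      # 135,000 tokens per book
BOOKS_PER_LIFETIME = BOOKS_PER_YEAR * READING_YEARS     # 510 books per lifetime


def book_and_reader_equivalents(training_tokens):
    """Convert a token count into (books, lifetime readers), rounded to whole numbers."""
    books = training_tokens / TOKENS_PER_BOOK
    readers = books / BOOKS_PER_LIFETIME
    return round(books), round(readers)


for label, tokens in [
    ("GPT-3 (low estimate)", 300e9),
    ("GPT-3 (high estimate)", 400e9),
    ("GPT-4 (lower bound)", 500e9),
]:
    books, readers = book_and_reader_equivalents(tokens)
    print(f"{label}: {books:,} books, {readers:,} lifetime readers")
```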


# Why adapt the language model?

- LMs are trained in a task-agnostic way.
- Downstream tasks can be very different from language modeling on the Pile.
For example, consider the natural language inference (NLI) task (is the hypothesis entailed by the premise?):

      Premise: I have never seen an apple that is not red.
      Hypothesis: I have never seen an apple.
      Correct output: Not entailment (the reverse direction would be entailment)

- The format of such a task may not be very natural for the model.
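
One way to make a task like NLI look more like ordinary text continuation is to wrap it in a prompt, optionally with a solved example in front. The sketch below is only an illustration: the function name and the "Premise / Hypothesis / Answer" layout are assumptions, and the solved example is the reverse-direction case mentioned above.

```python
def build_nli_prompt(premise, hypothesis, examples=()):
    """Format an NLI question, optionally preceded by solved examples, as plain text."""
    blocks = []
    for ex_premise, ex_hypothesis, ex_label in examples:
        blocks.append(
            f"Premise: {ex_premise}\nHypothesis: {ex_hypothesis}\nAnswer: {ex_label}"
        )
    # The final block is left unanswered so the model's next tokens are the prediction.
    blocks.append(f"Premise: {premise}\nHypothesis: {hypothesis}\nAnswer:")
    return "\n\n".join(blocks)


# The reverse direction of the example above serves as a solved demonstration.
demo = [
    (
        "I have never seen an apple.",
        "I have never seen an apple that is not red.",
        "Entailment",
    )
]

prompt = build_nli_prompt(
    "I have never seen an apple that is not red.",
    "I have never seen an apple.",
    examples=demo,
)
print(prompt)
```

The prompt ends right after `Answer:`, so predicting the next tokens is the same as answering the NLI question.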

# Ways downstream tasks can be different

- **Formatting**: for example, NLI takes in two sentences and compares them to produce a single binary output. This is different from generating the next token or filling in MASKs. Another example is the presence of MASK tokens in BERT training vs. no MASKs in downstream tasks.
- **Topic shift**: the downstream task is focused on a new or very specific topic (e.g., medical records)
- **Temporal shift**: the downstream task requires knowledge that was unavailable during pre-training because 1) the knowledge is new (e.g., GPT-3 was trained before Biden became President), or 2) the knowledge for the downstream task is not publicly available.


# Optimizing Large Language Models

There are several options to optimize Large Language Models:

- Prompt engineering by providing samples (In-Context Learning)
- Prompt Tuning
- Fine-Tuning
  - Classic fine-tuning by changing all weights
  - Transfer Learning: PEFT fine-tuning by changing only a few weights
  - Reinforcement Learning from Human Feedback (RLHF)

An important question is which of these options is the most effective one and which one can overwrite previous optimizations.

### Understanding Prompt Engineering, Prompt Tuning, and PEFT
These techniques are essential for efficiently adapting large, pre-trained models like GPT or BERT to specialized tasks or domains, optimizing resource usage and reducing training time.


1. **Prompt Engineering (In-Context Learning)**:
   - **Definition**: Crafting input prompts to guide a Large Language Model (LLM) for desired outputs.
   - **Application**: Uses natural language prompts to "program" the LLM, leveraging its contextual understanding.
   - **Model Change**: No alteration to the model's parameters; relies on the model's existing knowledge and interpretive abilities.

2. **Prompt Tuning**:
   - **Difference from Prompt Engineering**: Involves prepending a trainable tensor (soft prompt tokens) to the LLM's input embeddings.
   - **Process**: Fine-tunes this tensor for a specific task and dataset, keeping other model parameters unchanged.
   - **Example**: Adapting a general LLM for specific tasks like sentiment classification by adjusting prompt tokens.

3. **Parameter-Efficient Fine-Tuning (PEFT)**:
   - **Overview**: A set of techniques to enhance model performance on specific tasks or datasets by tuning a small subset of parameters.
   - **Objective**: Targeted improvements without the need for full model retraining.
   - **Relation to Prompt Tuning**: Prompt tuning is a subset of PEFT, focusing on fine-tuning specific parts of the model for task/domain adaptation.
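
To get a feel for how few parameters prompt tuning touches, here is a back-of-the-envelope calculation. The numbers are illustrative assumptions, not measurements: 20 soft prompt tokens, an embedding width of 4096, and a base model of roughly 7 billion parameters.

```python
def prompt_tuning_trainable_params(num_prompt_tokens, embedding_dim):
    """A soft prompt is one trainable vector of size `embedding_dim` per prompt token."""
    return num_prompt_tokens * embedding_dim


total_params = 7_000_000_000  # assumed size of the frozen base model
trainable = prompt_tuning_trainable_params(num_prompt_tokens=20, embedding_dim=4096)
fraction = trainable / total_params

print(f"Trainable parameters: {trainable:,}")
print(f"Share of the full model: {fraction:.6%}")
```

Under these assumptions only a tiny fraction of the weights is trained, which is why prompt tuning and other PEFT methods are so much cheaper than classic fine-tuning.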



![](https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/PEFT_LLMs.png)

### Challenges

Fine-tuning models can certainly help to get models to do what you want them to do. However, there are some potential issues:

> - **Catastrophic forgetting**: Fine-tuning on new data can overwrite knowledge and capabilities that the model acquired during pre-training.
> - **Overfitting**: If only a certain AI task has been fine-tuned, other tasks can suffer in terms of performance.

In general, fine-tuning should be used wisely and best practices should be applied. For example, the quality of the data matters more than the quantity, and multiple tasks should be fine-tuned together rather than one after another.

# Applications

There are four main platforms that can be used for LLMs' applications:


### LangChain

- Overview: LangChain is a versatile framework designed to simplify the utilization of language models across various tasks. It serves as a seamless toolset for connecting different language capabilities.

- Basic Usage: After installing LangChain, import the components you need into your Python script, such as a model wrapper and a prompt template, and use them to interact with language models like GPT or Llama 2.

- Advanced Usage: For advanced users, LangChain offers extensive flexibility. You can customize the underlying language model, integrate external knowledge sources, and combine various language capabilities for tackling complex tasks. This advanced functionality empowers you to develop sophisticated applications harnessing the full potential of language models.

### LlamaIndex

- Overview: LlamaIndex is a tool for indexing and searching large datasets with language models. It simplifies the creation of indexes over your data and enables retrieval through natural language queries.

- Indexing Data: To get started with LlamaIndex, prepare your dataset and use the tool to build an index. This index is the foundation for searching your data with natural language queries.

- Usage: Once your data is indexed, you can run natural language queries against it to retrieve the relevant information from your dataset.

- Advanced Usage: LlamaIndex can be integrated with other applications to give them natural language search capabilities.

### Llama.cpp

- Overview: Llama.cpp is a high-performance C/C++ library for running language models, known for its efficiency and speed. It is an excellent choice for applications that need to run language models locally.

- Basic Usage: To start with Llama.cpp, you can create a straightforward C++ program that uses the library to load and query a LLaMA-family model. This simplicity allows developers to harness the power of language models without unnecessary complexity.

- Advanced Usage: For those seeking advanced capabilities, Llama.cpp provides features for optimizing performance in large-scale applications. Additionally, it offers the flexibility to integrate with other C++ projects, expanding the range of applications where language models can be employed effectively.

### Cohere

- Overview: Cohere is a comprehensive platform offering advanced natural language processing capabilities. It empowers users to leverage the potential of language models across a diverse range of applications. Cohere provides a set of tools and APIs for various language-related tasks, facilitating the development of intelligent and context-aware applications.

- Basic Usage: Getting started with Cohere is straightforward. You can seamlessly integrate Cohere's APIs into your applications to perform tasks such as text analysis, sentiment analysis, and language understanding. Cohere's pre-trained models are readily available, enabling you to extract valuable insights from text data effortlessly.

- Advanced Features: Cohere offers advanced features for users looking to customize and extend their natural language processing capabilities. You can fine-tune models to suit your specific tasks and integrate external data sources to enhance your applications' knowledge. Cohere's versatility makes it a valuable tool for both basic and complex language-related tasks, whether you are building chatbots, recommendation systems, or content analysis tools. Cohere empowers you to create intelligent and context-aware solutions that effectively understand and respond to human language.


# LangChain

    Build simple application with LangChain
    Trace your application with LangSmith
    Serve your application with LangServe

LangChain applications are built from the following components:

- **Model/Chat (LLM) Wrappers**: The language model is the core reasoning engine here. In order to work with LangChain, you need to understand the different types of language models and how to work with them.

- **Prompt Template**: This provides instructions to the language model. This controls what the language model outputs, so understanding how to construct prompts and different prompting strategies is crucial.

- **Memory**: Provides a construct for storing and retrieving messages during a conversation which can be either short term or long term.

- **Indexes**: Help LLMs interact with documents by providing a way to structure them. LangChain provides Document Loaders to load documents, Text Splitters to split documents into smaller chunks, Vector Stores to store documents as embeddings, and Retrievers to fetch relevant documents.

- **Chain**: Probably the most important component of LangChain is the Chain class. It's a wrapper around the LLM that allows you to create a chain of actions.

- **Agents**: Agents are the most powerful feature of LangChain. They allow you to combine LLMs with external data and tools.

- **Callbacks**: The callbacks mechanism lets you hook into different stages of your LLM application through the `callbacks` argument of the API. It is used for logging, monitoring, streaming, etc.



In this guide we'll cover several of these components individually, and then go over how to combine them. Understanding these concepts will set you up well for using and customizing LangChain applications. Most LangChain applications allow you to configure the model and/or the prompt, so knowing how to take advantage of this will be a big enabler.

## Setup

Installing LangChain is easy. You can install it with pip:

In [3]:
%time
!pip install -Uqqq pip --progress-bar off
# !pip install -qqq torch==2.0.1 --progress-bar off
!pip install -qqq transformers==4.33.2 --progress-bar off
!pip install -qqq langchain==0.0.299 --progress-bar off
!pip install -qqq xformers==0.0.21 --progress-bar off
!pip install -qqq sentence_transformers==2.2.2 --progress-bar off
!pip install -qqq tokenizers==0.14.0 --progress-bar off
!pip install -qqq optimum==1.13.1 --progress-bar off
!pip install -qqq auto-gptq==0.4.2 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ --progress-bar off
!pip install -qqq unstructured==0.10.16 --progress-bar off

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 6.2 µs
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for lit (pyproject.toml) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.1.0+cu121 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
torchdata 0.7.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
torchtext 0.16.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
torchvision 0.16.0+cu121 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
  Preparing metadata (setup.py) ... done
  Building wheel for sentence_tra

Note that we're also installing a few other libraries that we'll be using in this tutorial.

## Model (LLM) Wrappers

Using Llama 2 is as easy as using any other HuggingFace model. We'll be using the HuggingFacePipeline wrapper (from LangChain) to make it even easier to use. To load the 7B chat version of the model, we'll use a GPTQ-quantized checkpoint (GPTQ is a post-training quantization method for generative pre-trained transformers):

In [4]:
!pip install -q accelerate

In [5]:
import torch
from langchain import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline

MODEL_NAME = "TheBloke/Llama-2-7B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True, device_map="auto"
)

# Create a configuration for text generation based on the specified model name
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)

# Set the maximum number of new tokens in the generated text to 1024.
# This limits the length of the generated output to 1024 tokens.
generation_config.max_new_tokens = 1024

# Set the temperature for text generation. Lower values (e.g., 0.0001) make output more deterministic, following likely predictions.
# Higher values make the output more random.
generation_config.temperature = 0.0001

# Set the top-p sampling value. A value of 0.95 means focusing on the most likely words that make up 95% of the probability distribution.
generation_config.top_p = 0.95

# Enable text sampling. When set to True, the model randomly selects words based on their probabilities, introducing randomness.
generation_config.do_sample = True

# Set the repetition penalty. A value of 1.15 discourages the model from repeating the same words or phrases too frequently in the output.
generation_config.repetition_penalty = 1.15


# Create a text generation pipeline using the initialized model, tokenizer, and generation configuration
text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=generation_config,
)

# Create a LangChain pipeline that wraps the text generation pipeline and set a specific temperature for generation
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"temperature": 0})


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



GPTQ has been shown to quantize GPT-style models down to 4-bit weights with minimal loss of accuracy. This means that GPTQ-quantized models can run on much smaller and cheaper hardware, such as consumer GPUs and laptops.

By lowering the hardware requirements, GPTQ can make LLMs accessible to a wider range of users.

Here are some of the benefits of using GPTQ:

> - Smaller model size: Storing weights in 4 bits instead of 16 cuts weight memory by roughly 75%, usually without sacrificing much accuracy. This makes it possible to deploy models on smaller and cheaper hardware.
> - Faster inference: Smaller weights mean less memory traffic, which can speed up generation substantially (speedups of up to about 4x have been reported). This makes it possible to use models in more real-time applications.
> - Lower power consumption: Less memory traffic also tends to reduce power draw, which helps on battery-powered devices.
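
The size benefit can be checked with simple arithmetic. The sketch below estimates the memory needed for the weights alone; the function name is made up for this example, the 7 billion parameter count is a round-number assumption, and real checkpoints are somewhat larger because of quantization metadata and layers that stay in higher precision.

```python
def weight_memory_gb(num_parameters, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bits_per_weight / 8 / 1e9


params = 7_000_000_000  # assumed round number for a 7B model
fp16_gb = weight_memory_gb(params, bits_per_weight=16)
int4_gb = weight_memory_gb(params, bits_per_weight=4)
reduction = 1 - int4_gb / fp16_gb

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
print(f"Reduction:      {reduction:.0%}")
```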

Conveniently, the transformers library supports loading models in GPTQ format through the AutoGPTQ library. Let's try out our LLM:

In [6]:
result = llm(
    "Explain the difference between ChatGPT and open source LLMs in a couple of lines."
)
print(result)


ChatGPT is an AI-powered chatbot developed by Meta AI that can understand and respond to user input in a conversational manner. Open source LLMs, on the other hand, are language models that are available for anyone to use and modify, with some examples including BERT, RoBERTa, and XLNet. While both types of models have their own strengths and weaknesses, they differ in terms of their architecture, training data, and licensing restrictions.


## Exercise 1:

Check the results of different settings for a prompt. You can change temperature, top_p, do_sample, and repetition_penalty in the model configuration and compare the results.
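
Before experimenting, it can help to see what temperature and top_p do to a next-token distribution. The following is a toy sketch in plain Python, not the code path used by transformers: the function names and the example logits are invented for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize them into probabilities."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving tokens form a valid distribution again.
    return {i: probs[i] / total for i in kept}


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, temperature=1.0))
print(softmax_with_temperature(logits, temperature=0.0001))  # almost all mass on one token
print(top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.9))       # the least likely token is dropped
```

With a very low temperature the distribution collapses onto the most likely token, so sampling behaves almost like greedy decoding even though do_sample is enabled.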

## Prompts and Prompt Templates

One of the most useful features of LangChain is the ability to create prompt templates. A prompt template is a string that contains a placeholder for input variable(s). Let's see how we can use them:


In [8]:
from langchain.prompts import PromptTemplate

# Define the template for generating prompts
template = """
<s>[INST] <<SYS>>
Behave as a teacher and provide an explanation for the following query:
<</SYS>>

{text} [/INST]
"""

# Initialize the PromptTemplate with the specified variables and template
prompt = PromptTemplate(
    input_variables=["text"],  # Specify the variables to be included in the prompt
    template=template,  # Define the template structure
)

In [9]:
text = "How does attention mechanism work? Let's think step by step"

In [10]:
print(prompt.format(text=text))


<s>[INST] <<SYS>>
Behave as a teacher and provide an explanation for the following query:
<</SYS>>

How does attention mechanism work? Let's think step by step [/INST]



In [11]:
result = llm(prompt.format(text=text))
print(result)

Great, let me explain how Attention Mechanism works in a neural network! 🧠
Attention is a technique used in deep learning models to allow the model to focus on specific parts of the input data when making predictions or decisions. It helps the model to selectively concentrate on the most relevant pieces of information, instead of using a fixed context or considering the entire input equally important. 🔍
So, let's break it down step-by-step:
1️⃣ Step 1: The Attention Mechanism takes the input data (e.g., a sentence) and splits it into multiple segments called "keys," "values," and "queries." These keys, values, and queries are vectors of different lengths, typically with the same dimensionality.
2️⃣ Step 2: The queries are used to compute a weighted sum of the values, where the weights are learned during training and reflect the relative importance of each key for the given query. This process is repeated for multiple queries, resulting in a set of weighted sums.
3️⃣ Step 3: The final outp

## Exercise 2: Basic Prompt Formatting for Sum Calculation

Define a PromptTemplate acting as a calculator that takes two input values and formats a prompt to calculate their sum.

## Exercise 3:

Modify the prompt or question to explore how we can improve the model's performance.

In [12]:
from langchain import PromptTemplate

template = """
<s>[INST] <<SYS>>
Act as a Machine Learning engineer who is teaching high school students.
<</SYS>>

{text} [/INST]
"""

prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)

The variable must be surrounded by {}. The input_variables argument is a list of variable names that will be used to format the template. Let's see how we can use it:

In [13]:
text = "Explain what are Deep Neural Networks in 2-3 sentences"
print(prompt.format(text=text))


<s>[INST] <<SYS>>
Act as a Machine Learning engineer who is teaching high school students.
<</SYS>>

Explain what are Deep Neural Networks in 2-3 sentences [/INST]



The format method of the PromptTemplate instance returns a string that can be passed straight to the LLM:

In [None]:
result = llm(prompt.format(text=text))
print(result)

## Chain

Probably the most important component of LangChain is the Chain class. It's a wrapper around the LLM that allows you to create a chain of actions. Here's how you can use the simplest chain:

In [None]:
from langchain.chains import LLMChain

chain = LLMChain(llm=llm, prompt=prompt)
result = chain.run(text)
print(result)

The arguments to the LLMChain class are the LLM instance and the prompt template.

## Exercise 4: Use the LLMChain for Direct Response Generation
Task: Create a new PromptTemplate for a fitness coach explaining the benefits of regular exercise and use LLMChain to generate a response.

#### Chaining Chains

The LLMChain is not that different from using the LLM directly. Let's see how we can chain multiple chains together. We'll create a chain that first explains what Deep Neural Networks are and then gives a few examples of practical applications. Let's start by creating the second chain:

In [None]:
template = "<s>[INST] Use the summary {summary} and give 3 examples of practical applications with 1 sentence explaining each [/INST]"

examples_prompt = PromptTemplate(
    input_variables=["summary"],
    template=template,
)
examples_chain = LLMChain(llm=llm, prompt=examples_prompt)

Now we can reuse our first chain along with the examples_chain and combine them into a single chain using the SimpleSequentialChain class:

In [None]:
from langchain.chains import SimpleSequentialChain

# Create an instance of 'SimpleSequentialChain'. This chain will execute two other chains
# sequentially. The 'chains' parameter is a list of these chains - 'chain' and 'examples_chain'.
multi_chain = SimpleSequentialChain(chains=[chain, examples_chain], verbose=True)

# The 'run' method executes the chains in the order they are listed, passing the output
# of one chain as the input to the next. The final output is then stored in the variable 'result'.
result = multi_chain.run(text)

print(result.strip())
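
Conceptually, a sequential chain is function composition: each step receives the previous step's output. The sketch below mimics that idea in plain Python with stand-in functions instead of LLM calls. The names `run_sequential`, `explain`, and `give_examples` are hypothetical and are not part of LangChain.

```python
from functools import reduce


def run_sequential(steps, first_input):
    """Feed the output of each step into the next one and return the last output."""
    return reduce(lambda value, step: step(value), steps, first_input)


def explain(topic):
    # Stand-in for the first chain, which would ask the LLM for an explanation.
    return f"Summary of {topic}"


def give_examples(summary):
    # Stand-in for the second chain, which would ask the LLM for applications.
    return f"Examples based on: {summary}"


result = run_sequential([explain, give_examples], "Deep Neural Networks")
print(result)
```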

## Exercise 5: Chaining Multiple Chains Together
Task: Explain a scientific concept and then provide real-world applications.

## Chatbot

LangChain makes it easy to create chatbots. Let's see how we can create a simple chatbot that will answer questions about Deep Neural Networks. We'll use the ChatPromptTemplate class to create a template for the chatbot:

In [None]:
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain.schema import AIMessage, HumanMessage

template = "Act as an experienced high school teacher that teaches {subject}. Always give examples and analogies"
human_template = "{text}"

chat_prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate.from_template(template),
        HumanMessage(content="Hello teacher!"),
        AIMessage(content="Welcome everyone!"),
        HumanMessagePromptTemplate.from_template(human_template),
    ]
)

messages = chat_prompt.format_messages(
    subject="Artificial Intelligence", text="What is the most powerful AI model?"
)
messages

We start by creating a system message that will be used to initialize the chatbot. Then we create a human message that will be used to start the conversation. Next, we create an AI message that will be used to respond to the human message. Finally, we create a human message that will be used to ask the question. We can use the format_messages method to format the messages.

To use our LLM with the messages, we'll pass them to the predict_messages method:

In [None]:
result = llm.predict_messages(messages)
print(result.content)
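
The model wrapped here is a text-completion pipeline, so a list of chat messages has to be turned into a single string before it reaches the model. The sketch below shows one simple way this could be done. The role labels and the `flatten_messages` helper are assumptions for illustration and may not match the exact format LangChain uses internally.

```python
def flatten_messages(messages):
    """Join (role, content) pairs into one prompt string, one line per message."""
    return "\n".join(f"{role}: {content}" for role, content in messages)


conversation = [
    ("System", "Act as an experienced high school teacher that teaches Artificial Intelligence."),
    ("Human", "Hello teacher!"),
    ("AI", "Welcome everyone!"),
    ("Human", "What is the most powerful AI model?"),
]

flat_prompt = flatten_messages(conversation)
print(flat_prompt)
```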

In [None]:
# Assumes `llm` has been initialized in the cells above.

# Define the initial template for the AI acting as a high school teacher.
teacher_template = "Act as an experienced high school teacher specializing in {subject}. Respond to the student's questions with informative answers, examples, and analogies."

# Set the subject that the teacher specializes in.
subject = "Artificial Intelligence"

# The loop for the interactive conversation.
while True:
    # Get user input.
    user_input = input("You: ")

    # Check for a quit condition.
    if user_input.lower() in ["exit", "quit"]:
        break

    # Construct the complete prompt for the AI model.
    # This includes the role description (teacher_template) and the user's question.
    complete_prompt = teacher_template.format(subject=subject) + "\nStudent asks: " + user_input + "\nTeacher:"

    # Generate the teacher's reply from the assembled prompt.
    ai_response = llm.predict(complete_prompt)

    # Print the AI's response.
    print("Teacher:", ai_response)

# End the conversation loop.
print("Conversation ended.")

## Agents

Agents are the most powerful feature of LangChain. They allow you to combine LLMs with external data and tools. Let's see how we can create a simple agent that will use the Python REPL to calculate the square root of a number and divide it by 2:

In [None]:
from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool

agent = create_python_agent(llm=llm, tool=PythonREPLTool(), verbose=True)

result = agent.run("Calculate the square root of a number and divide it by 2")

Python REPL stands for "Read-Eval-Print Loop." It's an interactive environment where you can write Python code and execute it immediately.

Here's the final answer from our agent:

In [None]:
result

Let's run the code from the agent in a Python REPL:

In [None]:
from math import sqrt

x = 16
y = sqrt(x)
z = y / 2
z

So, our agent works but made a mistake in the calculations. This is important: you might hear great things about AI, but it's still not perfect. Maybe another, more powerful LLM will get this right. Try it out and let me know.

Here's the response to the same prompt but using ChatGPT:

     Enter a number: 16
     The square root of 16.0 divided by 2 is: 2.0

In [None]:
!pip install wikipedia

In [None]:
import wikipedia

class WikipediaAgent:
    def search(self, query):
        # Search Wikipedia and return the summary of the first result.
        try:
            # Get the page summary for the query
            summary = wikipedia.summary(query)
            return summary
        except wikipedia.exceptions.DisambiguationError as e:
            # If there's a disambiguation issue, return the options.
            return "Disambiguation Error: " + '; '.join(e.options)
        except wikipedia.exceptions.PageError:
            # If the page is not found, inform the user.
            return "Page not found for the query."

# Create an instance of the WikipediaAgent
wiki_agent = WikipediaAgent()

# Example use of the agent to search for a term
result = wiki_agent.search("Artificial Intelligence")
print(result)