# Exploring LLMs and ChatModels for LLM Input / Output with LangChain

## Install OpenAI, HuggingFace and LangChain dependencies

In [1]:
import langchain
langchain.__version__

'0.3.7'

In [40]:
# Don't run if you want to use only chatgpt
# This is for accessing open LLMs from huggingface
import transformers
transformers.__version__

'4.46.2'

## Enter API Tokens

#### Enter your OpenAI key here

You can get the key from [here](https://platform.openai.com/api-keys) after creating an account or signing in.

In [16]:
import os

In [17]:
from dotenv import load_dotenv

In [18]:
load_dotenv('/home/santhosh/Projects/courses/Pinnacle/.env')

True

In [19]:
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']

#### Enter your HuggingFace token here

You can get the key from [here](https://huggingface.co/settings/tokens) after creating an account or signing in. This is free.

In [9]:
# skip if only using chatgpt
from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass('Please enter your HuggingFace Token here: ')

Please enter your HuggingFace Token here: ··········


## Setup necessary system environment variables

In [10]:
import os

os.environ['HUGGINGFACEHUB_API_TOKEN'] = HUGGINGFACEHUB_API_TOKEN
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY  # variable defined earlier from the .env file
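A missing key usually surfaces later as an opaque authentication error, so it can help to fail early. The following is a minimal sketch using only the standard library; `require_env` and `DEMO_API_KEY` are hypothetical names introduced here for illustration, not part of LangChain or the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "load it from a .env file or export it before running the notebook."
        )
    return value


# Demo with a placeholder value (not a real key).
os.environ["DEMO_API_KEY"] = "sk-placeholder"
print(require_env("DEMO_API_KEY"))
```

In the notebook, the same helper could be called with `'OPENAI_API_KEY'` right after `load_dotenv(...)`.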

# Model I/O

In LangChain, the language model is the central part of any application. The Model I/O module provides the building blocks for interfacing with any language model, so that prompts go in and responses come out through a standard interface.

### Key Components of Model I/O

**LLMs and Chat Models (used interchangeably):**
- **LLMs:**
  - **Definition:** Pure text completion models.
  - **Input/Output:** Receives a text string and returns a text string.
- **Chat Models:**
  - **Definition:** Based on a language model but with different input and output types.
  - **Input/Output:** Takes a list of chat messages as input and produces a chat message as output.
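The difference between the two input/output shapes can be sketched without any library. This is only an illustration of the interfaces: `Message`, `toy_llm` and `toy_chat_model` are hypothetical stand-ins, not LangChain classes, and they return canned text instead of calling a model.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # e.g. "system", "user" or "assistant"
    content: str


def toy_llm(prompt: str) -> str:
    """Text-completion style: one string in, one string out."""
    return f"[completion for: {prompt}]"


def toy_chat_model(messages: list[Message]) -> Message:
    """Chat style: a list of messages in, a single assistant message out."""
    last_user = next(m for m in reversed(messages) if m.role == "user")
    return Message(role="assistant", content=f"[reply to: {last_user.content}]")


print(toy_llm("Explain Generative AI"))
reply = toy_chat_model([
    Message("system", "Be concise."),
    Message("user", "Explain Generative AI"),
])
print(reply)
```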


## Chat Models and LLMs

Large Language Models (LLMs) are a core component of LangChain. LangChain does not implement or build its own LLMs. It provides a standard API for interacting with almost every LLM out there.

There are lots of LLM providers (OpenAI, Hugging Face, etc) - the LLM class is designed to provide a standard interface for all of them.

## Accessing Commercial LLMs like ChatGPT



### Accessing ChatGPT as an LLM

Here we show how to access a basic ChatGPT Instruct LLM. However, the ChatModel interface, which we will see later, is preferable: the LLM API doesn't support chat models like `gpt-3.5-turbo` and only supports the `instruct` models, which can respond to instructions but can't hold a conversation with you.

In [8]:
from langchain_openai import OpenAI

chatgpt = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0)

In [9]:
prompt = """Explain what is Generative AI in 3 bullet points"""
print(prompt)

Explain what is Generative AI in 3 bullet points


In [10]:
response = chatgpt.invoke(prompt)
print(response)



1. Generative AI is a subset of artificial intelligence that focuses on creating new and original content, rather than just analyzing and processing existing data.

2. It uses algorithms and machine learning techniques to generate new ideas, designs, or solutions based on a set of input data or parameters.

3. Generative AI has a wide range of applications, including creating art, music, and text, as well as assisting in product design and optimization. It has the potential to revolutionize industries by automating creative tasks and providing innovative solutions.


### Accessing ChatGPT as a Chat Model LLM

Here we show how to access the more advanced ChatGPT Turbo chat-based LLM. The ChatModel interface is better because it supports chat models like `gpt-3.5-turbo`, which can respond to instructions as well as hold a conversation with you. We will look at the conversation aspect slightly later in the notebook.

In [11]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [12]:
prompt = """Explain what is Generative AI in 3 bullet points"""
print(prompt)

Explain what is Generative AI in 3 bullet points


In [13]:
response = chatgpt.invoke(prompt)
response

AIMessage(content='- Generative AI is a type of artificial intelligence that is capable of creating new content, such as images, text, or music, based on patterns and data it has been trained on.\n- It uses algorithms and neural networks to generate new content that is similar to the input data it has been trained on, but with variations and creativity.\n- Generative AI has applications in various fields, including art, design, music composition, and even creating realistic deepfake videos.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 94, 'prompt_tokens': 19, 'total_tokens': 113, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-45c79876-8ae7-4660-ab67-5786f1fd1985-0', 

In [14]:
print(response.content)

- Generative AI is a type of artificial intelligence that is capable of creating new content, such as images, text, or music, based on patterns and data it has been trained on.
- It uses algorithms and neural networks to generate new content that is similar to the input data it has been trained on, but with variations and creativity.
- Generative AI has applications in various fields, including art, design, music composition, and even creating realistic deepfake videos.
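The `AIMessage` shown above carries more than text: its `response_metadata` includes a `token_usage` dictionary. As a small sketch, the snippet below works on a plain dictionary copied from that output (trimmed to a few keys) rather than on a live response; `summarize_usage` is a hypothetical helper name.

```python
# Copied from the `response_metadata` printed above (trimmed to the relevant keys).
response_metadata = {
    "token_usage": {"completion_tokens": 94, "prompt_tokens": 19, "total_tokens": 113},
    "model_name": "gpt-3.5-turbo-0125",
    "finish_reason": "stop",
}


def summarize_usage(metadata: dict) -> str:
    """Build a one-line summary of prompt, completion and total token counts."""
    usage = metadata.get("token_usage", {})
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    total = usage.get("total_tokens", prompt + completion)
    model = metadata.get("model_name", "unknown")
    return f"{model}: {prompt} prompt + {completion} completion = {total} tokens"


print(summarize_usage(response_metadata))
```

With a real response, the same function could be given `response.response_metadata`, assuming the provider reports usage under the same keys.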


In [15]:
from langchain_google_genai import ChatGoogleGenerativeAI

gemini = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)

In [5]:
prompt = """Explain what is Generative AI in 3 bullet points"""
print(prompt)

Explain what is Generative AI in 3 bullet points


In [17]:
response = gemini.invoke(prompt)
response

AIMessage(content='Here are 3 bullet points explaining Generative AI:\n\n* **Creates new content:** Generative AI models learn patterns from existing data and use that knowledge to generate new, original content like text, images, music, code, and more.\n* **Powered by deep learning:** These models are trained on massive datasets using deep learning algorithms, allowing them to understand complex relationships and generate realistic outputs.\n* **Applications across industries:** Generative AI is transforming various fields, from creative writing and art to drug discovery and software development, by automating tasks and creating innovative solutions. \n', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': [{'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'

In [18]:
print(response.content)

Here are 3 bullet points explaining Generative AI:

* **Creates new content:** Generative AI models learn patterns from existing data and use that knowledge to generate new, original content like text, images, music, code, and more.
* **Powered by deep learning:** These models are trained on massive datasets using deep learning algorithms, allowing them to understand complex relationships and generate realistic outputs.
* **Applications across industries:** Generative AI is transforming various fields, from creative writing and art to drug discovery and software development, by automating tasks and creating innovative solutions. 



## Accessing Open Source LLMs with HuggingFace and LangChain

### Accessing Open LLMs with HuggingFace Serverless API

The free [serverless API](https://huggingface.co/inference-api/serverless) lets you implement solutions and iterate in no time, but it may be rate limited for heavy use cases, since the loads are shared with other requests.

For enterprise workloads, you can use dedicated Inference Endpoints, which are hosted on a cloud instance of your choice and come at a cost. Here we use the free serverless API, which works quite well in most cases.

The advantage is that you do not need to download the models or run them locally on GPU infrastructure, which takes time and can cost a fair amount.

#### Accessing Microsoft Phi-3 Mini Instruct

Phi-3-Mini-4K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters, trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. Check more details [here](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct). Note that the code below loads the newer `microsoft/Phi-3.5-mini-instruct` checkpoint.

In [11]:
from langchain_huggingface import HuggingFaceEndpoint

repo_id = "microsoft/Phi-3.5-mini-instruct"

phi3_params = {
                  "wait_for_model": True, # wait if the model is not yet available on the Hugging Face servers
                  "do_sample": False, # greedy decoding - temperature = 0
                  "return_full_text": False, # don't return input prompt
                  "max_new_tokens": 1000, # max tokens the answer can go up to
                }

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    # max_length=128,
    temperature=0.5,
    # huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN,
   **phi3_params
)

                    wait_for_model was transferred to model_kwargs.
                    Please make sure that wait_for_model is what you intended.


In [23]:
prompt

'Explain what is Generative AI in 3 bullet points'

In [24]:
# Phi3 expects input prompt to be formatted in a specific way
# check more details here: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
phi3_prompt = """<|user|>Explain what is Generative AI in 3 bullet points<|end|>
<|assistant|>"""
print(phi3_prompt)

<|user|>Explain what is Generative AI in 3 bullet points<|end|>
<|assistant|>
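Writing the special tokens by hand is easy to get wrong, so the wrapping can be moved into a small helper. This sketch mirrors exactly the layout used in the cell above; `format_phi3_prompt` is a hypothetical name, and the model card remains the authoritative reference for the template.

```python
def format_phi3_prompt(user_text: str) -> str:
    """Wrap a single user turn in the Phi-3 markers used in this notebook."""
    return f"<|user|>{user_text}<|end|>\n<|assistant|>"


phi3_prompt = format_phi3_prompt("Explain what is Generative AI in 3 bullet points")
print(phi3_prompt)
```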


In [25]:
response = llm.invoke(phi3_prompt)
print(response)

- **Definition**: Generative AI refers to a subset of artificial intelligence technologies that are capable of creating new, original content by learning from a vast dataset of existing examples. Unlike discriminative models that classify or predict outcomes, generative models can produce novel outputs that mimic the distribution of the training data.

- **Applications**: Generative AI is used in various fields such as:
  - **Content Creation**: Generating realistic images, music, writing, and artwork for entertainment, media, and creative industries.
  - **Design and Engineering**: Automating the design process for products, buildings, and systems by generating new design variations based on learned patterns.
  - **Data Augmentation**: Enhancing training datasets for machine learning models by creating additional synthetic data, which can improve model performance and robustness.

- **Technologies**: Several key technologies and approaches enable generative AI, including:
  - **Genera

#### Accessing Google Gemma 2B Instruct

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure. Check more details [here](https://huggingface.co/google/gemma-1.1-2b-it)

In [12]:
gemma_repo_id = "google/gemma-2b-it"

gemma_params = {
                  "wait_for_model": True, # wait if the model is not yet available on the Hugging Face servers
                  "do_sample": False, # greedy decoding - temperature = 0
                  "return_full_text": False, # don't return input prompt
                  "max_new_tokens": 1000, # max tokens the answer can go up to
                }

llm = HuggingFaceEndpoint(
    repo_id=gemma_repo_id,
    **gemma_params
)

                    wait_for_model was transferred to model_kwargs.
                    Please make sure that wait_for_model is what you intended.


In [35]:
prompt

'Explain what is Generative AI in 3 bullet points'

In [36]:
response = llm.invoke(prompt)
print(response)

InferenceTimeoutError: Model not loaded on the server: https://api-inference.huggingface.co/models/google/gemma-2b-it. Please retry with a higher timeout (current: 120).
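A timeout like the one above often just means the model is still being loaded on the shared server, so retrying after a pause may succeed. Below is a generic sketch of retry with exponential backoff using only the standard library. `retry_with_backoff` and `flaky_invoke` are hypothetical names; `flaky_invoke` stands in for `llm.invoke(prompt)`, and in real use you would catch whichever exception your client actually raises.

```python
import time


def retry_with_backoff(call, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call `call()` until it succeeds, sleeping base_delay * 2**attempt between failures."""
    for attempt in range(retries):
        try:
            return call()
        except TimeoutError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Stand-in for `llm.invoke(prompt)`: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_invoke():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("model not loaded yet")
    return "model response"


delays = []
# Record the delays instead of actually sleeping, to keep the demo instant.
result = retry_with_backoff(flaky_invoke, sleep=delays.append)
print(result, delays)
```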

### Accessing Local LLMs with HuggingFacePipeline API

Hugging Face models can be run locally through the `HuggingFacePipeline` class. However, remember that you need a good GPU for fast inference.

The Hugging Face Model Hub hosts over 500k models, including 90K+ open LLMs.

These can be called from LangChain either through this local pipeline wrapper or by calling their hosted inference endpoints through the `HuggingFaceEndpoint` API we saw earlier.

To use it, you should have the `transformers` Python package installed, as well as `pytorch`.

The advantages are that the model runs completely locally, which gives high privacy and security. The disadvantage is the need for good compute infrastructure, preferably with a GPU.

#### Accessing Google Gemma 2B and running it locally

In [1]:
from langchain_huggingface import HuggingFacePipeline

In [2]:
gemma_params = {
                  "do_sample": False, # greedy decoding - temperature = 0
                  "return_full_text": False, # don't return input prompt
                  "max_new_tokens": 1000, # max tokens the answer can go up to
                }

local_llm = HuggingFacePipeline.from_model_id(
    model_id="google/gemma-1.1-2b-it",
    task="text-generation",
    pipeline_kwargs=gemma_params,
    # device=0 # when running on Colab selects the GPU, you can change this if you run it on your own instance if needed
)

In [3]:
local_llm

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x73947ffe2090>, model_id='google/gemma-1.1-2b-it', model_kwargs={}, pipeline_kwargs={'do_sample': False, 'return_full_text': False, 'max_new_tokens': 1000})

In [6]:
prompt

'Explain what is Generative AI in 3 bullet points'

In [7]:
# Gemma2B when used locally expects input prompt to be formatted in a specific way
# check more details here: https://huggingface.co/google/gemma-1.1-2b-it#chat-template
gemma_prompt = """<bos><start_of_turn>user\n""" + prompt + """\n<end_of_turn>
<start_of_turn>model
"""
print(gemma_prompt)

<bos><start_of_turn>user
Explain what is Generative AI in 3 bullet points
<end_of_turn>
<start_of_turn>model



In [8]:
response = local_llm.invoke(gemma_prompt)
print(response)

* **Generative AI** is a type of artificial intelligence that focuses on creating new content, such as images, text, music, and videos, based on existing data.


* It utilizes machine learning algorithms to learn patterns and relationships from vast datasets and then use this knowledge to generate new outputs that are similar or inspired by the input data.


* Generative AI models can learn complex relationships and generate novel and unexpected results, pushing the boundaries of content creation and automation.


### Accessing Open LLMs in HuggingFace as a Chat Model LLM

Here we will show how to access open LLMs from HuggingFace like Google Gemma 2B and make them have a conversation with you. We will look at the conversation aspect slightly later in the notebook.

In [13]:
from langchain_huggingface import ChatHuggingFace

chat_gemma = ChatHuggingFace(llm=llm,
                             model_id='google/gemma-1.1-2b-it')

In [32]:
print(response.content)

* **Generative AI** focuses on developing models that can generate new content, such as images, text, music, or videos, based on existing data.


* **Data-driven approach:** AI algorithms are trained on vast datasets to learn patterns and relationships, enabling them to create novel and creative outputs.


* **Automated content creation:** Generative AI models automate the content creation process, reducing human effort and increasing efficiency in content generation tasks.


## Message Types for ChatModels and Conversational Prompting

Conversational prompting is basically you, the user, having a full conversation with the LLM. The conversation history is typically represented as a list of messages.

ChatModels process a list of messages, receiving them as input and responding with a message. Messages are characterized by a few distinct types and properties:

- **Role:** Indicates who is speaking in the message. LangChain offers different message classes for various roles.
- **Content:** The substance of the message, which can vary:
  - A string (commonly handled by most models)
  - A list of dictionaries (for multi-modal inputs, where each dictionary details the type and location of the input)
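The two content shapes can be illustrated with plain dictionaries. This is a sketch under assumptions: the block layout follows the OpenAI-style `text` / `image_url` convention, the URL is a placeholder, and `content_types` is a hypothetical helper. Other providers may expect different keys.

```python
# Plain-string content: the common case.
text_message = {"role": "user", "content": "Describe Generative AI in one sentence."}

# List-of-dicts content for multi-modal input (OpenAI-style blocks; URL is a placeholder).
multimodal_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}


def content_types(message: dict) -> list[str]:
    """Return the kinds of content a message carries."""
    content = message["content"]
    if isinstance(content, str):
        return ["text"]
    return [block["type"] for block in content]


print(content_types(text_message))
print(content_types(multimodal_message))
```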

Additionally, messages have an `additional_kwargs` property, used to pass extra, provider-specific information that is not part of the general message interface. A well-known example is `function_call` from OpenAI.

### Specific Message Types

- **HumanMessage:** A user-generated message, usually containing only content.
- **AIMessage:** A message from the model, potentially including `additional_kwargs`, like `tool_calls` for invoking OpenAI tools.
- **SystemMessage:** A message from the system instructing model behavior, typically containing only content. Not all models support this type.


## Conversational Prompting with ChatGPT

Here we use the `ChatModel` API in `ChatOpenAI` to have a full conversation with ChatGPT while maintaining the full conversation history.
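Each call to a chat model is independent, so the model only "remembers" what is resent in the message list. The library-free sketch below shows that loop: append the user turn, call the model on the whole history, append its reply. `stub_chat_model` and `chat_turn` are hypothetical names, and the stub returns canned text instead of calling ChatGPT.

```python
def stub_chat_model(messages: list[dict]) -> dict:
    """Stand-in for a chat model: it only knows what is in `messages`."""
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    return {"role": "assistant", "content": f"I have seen {len(user_turns)} user message(s)."}


def chat_turn(history: list[dict], user_text: str) -> str:
    """Append the user turn, call the model on the full history, append its reply."""
    history.append({"role": "user", "content": user_text})
    reply = stub_chat_model(history)
    history.append(reply)
    return reply["content"]


history = [{"role": "system", "content": "Act as a helpful assistant."}]
print(chat_turn(history, "Explain Generative AI in 3 bullet points"))
print(chat_turn(history, "What did we discuss so far?"))
print(len(history))  # system + 2 user turns + 2 assistant replies
```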

In [22]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

In [23]:
from langchain_core.messages import HumanMessage, SystemMessage

prompt = """Can you explain what is Generative AI in 3 bullet points?"""
sys_prompt = """Act as a helpful assistant and give meaningful examples in your responses."""
messages = [
    SystemMessage(content=sys_prompt),
    HumanMessage(content=prompt),
]

messages

[SystemMessage(content='Act as a helpful assistant and give meaningful examples in your responses.', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Can you explain what is Generative AI in 3 bullet points?', additional_kwargs={}, response_metadata={})]

In [24]:
response = chatgpt.invoke(messages)
response

AIMessage(content='Certainly! Here are three key points that explain Generative AI:\n\n1. **Definition and Functionality**: Generative AI refers to a class of artificial intelligence models that can create new content, such as text, images, music, or even videos, by learning patterns from existing data. For example, models like GPT-3 can generate human-like text based on prompts, while DALL-E can create images from textual descriptions.\n\n2. **Applications**: Generative AI has a wide range of applications across various fields. In creative industries, it can assist in generating artwork, writing scripts, or composing music. In business, it can be used for generating marketing content, product designs, or even simulating customer interactions in chatbots.\n\n3. **Ethical Considerations**: The use of Generative AI raises important ethical questions, such as issues of copyright, misinformation, and the potential for misuse. For instance, deepfake technology can create realistic but fake 

In [25]:
print(response.content)

Certainly! Here are three key points that explain Generative AI:

1. **Definition and Functionality**: Generative AI refers to a class of artificial intelligence models that can create new content, such as text, images, music, or even videos, by learning patterns from existing data. For example, models like GPT-3 can generate human-like text based on prompts, while DALL-E can create images from textual descriptions.

2. **Applications**: Generative AI has a wide range of applications across various fields. In creative industries, it can assist in generating artwork, writing scripts, or composing music. In business, it can be used for generating marketing content, product designs, or even simulating customer interactions in chatbots.

3. **Ethical Considerations**: The use of Generative AI raises important ethical questions, such as issues of copyright, misinformation, and the potential for misuse. For instance, deepfake technology can create realistic but fake videos, leading to concer

In [26]:
# add the past conversation history into messages
messages.append(response)
# add the new prompt to the conversation history list
prompt = """What did we discuss so far?"""
messages.append(HumanMessage(content=prompt))
messages

[SystemMessage(content='Act as a helpful assistant and give meaningful examples in your responses.', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Can you explain what is Generative AI in 3 bullet points?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Certainly! Here are three key points that explain Generative AI:\n\n1. **Definition and Functionality**: Generative AI refers to a class of artificial intelligence models that can create new content, such as text, images, music, or even videos, by learning patterns from existing data. For example, models like GPT-3 can generate human-like text based on prompts, while DALL-E can create images from textual descriptions.\n\n2. **Applications**: Generative AI has a wide range of applications across various fields. In creative industries, it can assist in generating artwork, writing scripts, or composing music. In business, it can be used for generating marketing content, product designs, or even simul

In [27]:
# send the conversation history along with the new prompt to chatgpt
response = chatgpt.invoke(messages)
response.content

"So far, we discussed the concept of Generative AI, highlighting three key points:\n\n1. **Definition and Functionality**: Generative AI creates new content by learning from existing data, with examples like GPT-3 for text generation and DALL-E for image creation.\n  \n2. **Applications**: We explored various applications of Generative AI in creative industries, business, and customer interactions.\n\n3. **Ethical Considerations**: We touched on the ethical implications of Generative AI, including issues related to copyright, misinformation, and the potential for misuse, such as deepfakes.\n\nIf you have any further questions or topics you'd like to explore, feel free to ask!"

## Conversational Prompting with Open LLMs via HuggingFace

Here we use the `ChatModel` API in `ChatHuggingFace` to have a full conversation with an open LLM while maintaining the full conversation history. We use the Google Gemma 2B LLM.

In [28]:
llm

HuggingFaceEndpoint(repo_id='google/gemma-2b-it', max_new_tokens=1000, stop_sequences=[], server_kwargs={}, model_kwargs={'wait_for_model': True}, model='google/gemma-2b-it', client=<InferenceClient(model='google/gemma-2b-it', timeout=120)>, async_client=<InferenceClient(model='google/gemma-2b-it', timeout=120)>)

In [30]:
# not needed if you are only running chatgpt
from langchain_huggingface import ChatHuggingFace

chat_gemma = ChatHuggingFace(llm=llm,
                             model_id='google/gemma-1.1-2b-it')

In [41]:
# this runs prompts using the open LLM - however, Gemma doesn't support a system prompt
prompt = """Explain Deep Learning in 3 bullet points"""

messages = [
    HumanMessage(content=prompt),
]

response = chat_gemma.invoke(messages) # doesn't support system prompts
messages.append(response)
print(response.content)

* **Automatic feature extraction:** Deep learning models learn patterns and features from vast datasets, automatically extracting the most relevant information for tasks like image recognition, natural language processing, and speech recognition.


* **Multi-layered learning:** Deep learning models are built with interconnected layers of artificial neurons, inspired by the structure of the human brain. This enables them to learn complex relationships and patterns within data.


* **Representation learning:** Deep learning models learn to represent data in a compressed and efficient way, allowing them to make predictions and decisions with high accuracy.


In [31]:
# this runs prompts using the open LLM - however, Gemma doesn't support a system prompt
prompt = """Explain Deep Learning in 3 bullet points"""

messages = [
    HumanMessage(content=prompt),
]

response = chat_gemma.invoke(messages) # doesn't support system prompts
messages.append(response)
print(response.content)

- **Artificial neural networks:** Cronly inspired by the human brain's structure and function.
- **Harnessing vast data:** Creates models that learn from large amounts of data, identifying patterns and relationships automatically.
- **Supervised and Unsupervised Learning:** Can be used for both, where data is divided into training (supervised), and validation (unsupervised) sets.


In [32]:
messages

[HumanMessage(content='Explain Deep Learning in 3 bullet points', additional_kwargs={}, response_metadata={}),
 AIMessage(content="- **Artificial neural networks:** Cronly inspired by the human brain's structure and function.\n- **Harnessing vast data:** Creates models that learn from large amounts of data, identifying patterns and relationships automatically.\n- **Supervised and Unsupervised Learning:** Can be used for both, where data is divided into training (supervised), and validation (unsupervised) sets.", additional_kwargs={}, response_metadata={'token_usage': ChatCompletionOutputUsage(completion_tokens=78, prompt_tokens=18, total_tokens=96), 'model': '', 'finish_reason': 'stop'}, id='run-2a80e934-1a44-43a8-a72d-068309430e36-0')]

In [33]:
# formatting prompt is automatically done inside the chatmodel
# formats in this syntax: https://huggingface.co/google/gemma-1.1-2b-it#chat-template
print(chat_gemma._to_chat_prompt([messages[0]]))

<bos><start_of_turn>user
Explain Deep Learning in 3 bullet points<end_of_turn>
<start_of_turn>model
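To make the formatting step less opaque, here is a hand-written sketch that reproduces the layout printed above for a list of turns. It is not the tokenizer's actual template code: `format_gemma_chat` is a hypothetical helper, and the mapping of assistant turns to the `model` role is an assumption based on the Gemma chat template linked in the cell above.

```python
def format_gemma_chat(messages: list[tuple[str, str]]) -> str:
    """Render (role, content) turns in the Gemma layout printed above."""
    parts = ["<bos>"]
    for role, content in messages:
        # Assumption: assistant turns are labelled "model" in the Gemma template.
        gemma_role = "model" if role == "assistant" else role
        parts.append(f"<start_of_turn>{gemma_role}\n{content.strip()}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model to produce the next turn
    return "".join(parts)


print(format_gemma_chat([("user", "Explain Deep Learning in 3 bullet points")]))
```

In practice the chat model wrapper does this for you; the sketch is only meant to show what ends up in the prompt string.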



In [44]:
prompt = """Now do the same for Machine learning"""
messages.append(HumanMessage(content=prompt))

response = chat_gemma.invoke(messages) # doesn't support system prompts
print(response.content)

**Machine Learning in 3 bullet points:**

* **Data-driven learning:** Machine learning algorithms are trained on large datasets to identify patterns and relationships within the data. This allows them to make predictions or decisions based on past data experiences.


* **Pattern recognition:** Machine learning algorithms are designed to classify and categorize data into predefined categories. This helps in tasks like image recognition, spam filtering, and medical diagnosis.


* **Automated decision making:** Machine learning models can learn complex decision-making rules from large datasets, enabling them to make predictions or decisions based on data-driven insights.


In [35]:
from huggingface_hub import InferenceClient

In [36]:
import huggingface_hub

In [37]:
huggingface_hub.__version__

'0.23.5'

In [38]:
client = InferenceClient(model='google/gemma-1.1-2b-it')

In [5]:
for message in client.chat_completion(messages=[{'role': 'user', 'content': 'what is the capital of france?'}], 
                                      max_tokens=200, stream=True):
    print(message.choices[0].delta.content)

The
 capital
 of
 France
 is
 Paris
.
 It
 is
 a
 major
 urban
 center
 and
 the
 political
,
 cultural
,
 and
 economic
 center
 of
 France
.
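Printing each chunk on its own line, as above, makes the stream visible but hard to read. A common alternative is to print the pieces without newlines and also collect them into one string. The sketch below uses a hard-coded list in place of the live stream; `collect_stream` and `fake_deltas` are hypothetical names, and the trailing `None` models a chunk that carries no text.

```python
from typing import Iterable, Optional


def collect_stream(deltas: Iterable[Optional[str]]) -> str:
    """Print each streamed piece as it arrives and return the full text."""
    pieces = []
    for delta in deltas:
        if not delta:  # skip chunks that carry no text (None or "")
            continue
        print(delta, end="", flush=True)
        pieces.append(delta)
    print()  # finish the line once the stream ends
    return "".join(pieces)


# Stand-in for the `message.choices[0].delta.content` values in the loop above.
fake_deltas = ["The", " capital", " of", " France", " is", " Paris", ".", None]
full_text = collect_stream(fake_deltas)
```

With the real client, the generator expression `(m.choices[0].delta.content for m in stream)` could be passed in place of `fake_deltas`.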

