# **🔗 Connect with an LLM**

## **🤖 Start Talking with ChatGPT**
- **Input**: The prompt we send to the LLM.  
- **Output**: The response from the LLM.  
- We can **switch LLMs** and use several different models.

---

## **💡 LangChain’s Two Main LLM Types**
1. **LLM Model (Text-Completion)**  
   - Produces a block of text based on a single prompt.  
2. **Chat Model**  
   - Converses with a sequence of messages, featuring roles like **system** and **human**.

**🔍 Note**:  
- While documentation sometimes blurs the line, **both** text-completion and chat models are LLMs.  
- Chat models typically handle **system** (context), **human** (user input), and **assistant** (AI response) messages.

---

## **🎉 Rise of the Chat Model**
- Since the launch of ChatGPT, **chat-based LLMs** have become the most popular format.  
- They are used in a wide range of applications, from **Q&A** to **multi-turn dialogues**.

---

## **📜 List of LLMs Compatible with LangChain**
- Refer to the **official documentation** for an updated list of supported LLMs.

---

## **⚙️ Setup**
1. **Clone or Download** the GitHub repository to your local machine.  
2. In **terminal**:
   ```
   cd project_name
   pyenv local 3.11.4
   poetry install
   poetry shell
   ```
3. To open the notebook in JupyterLab:
   ```
   jupyter lab
   ```
   - Navigate to the **notebooks folder** and open the `001-connect-llm.ipynb` file.

4. **View Code** in Visual Studio Code or another editor:
   - Open the project folder.
   - Locate and open `001-connect-llm.py`.

---

## **🔐 Create Your `.env` File**
- The GitHub repo includes a file named **`.env.example`**.
- Rename it to **`.env`** and add your confidential API keys:
  ```
  OPENAI_API_KEY=your_openai_api_key
  LANGCHAIN_TRACING_V2=true
  LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
  LANGCHAIN_API_KEY=your_langchain_api_key
  LANGCHAIN_PROJECT=your_project_name
  ```
- This project is labeled **`001-connect-llm`** in **LangSmith**.
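
After renaming the file, it is easy to leave a template value in place. The sketch below is one way to spot that, assuming the placeholders all start with `your_` as they do in the listing above. The helper name `find_placeholder_keys` is hypothetical and not part of the repository.

```python
# Hypothetical helper: report keys in .env text that still hold template values.
def find_placeholder_keys(env_text: str, marker: str = "your_") -> list[str]:
    """Return the names of keys whose value still starts with the placeholder marker."""
    placeholders = []
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip().strip("\"'").startswith(marker):
            placeholders.append(key.strip())
    return placeholders


# Sample text: two keys were never filled in, the project name was.
example = """OPENAI_API_KEY=your_openai_api_key
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
LANGCHAIN_API_KEY=your_langchain_api_key
LANGCHAIN_PROJECT=001-connect-llm
"""
print(find_placeholder_keys(example))
```

To check a real file, you could pass the result of `open(".env").read()` instead of the sample string.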

---

## **📊 Track Operations**
- From now on, you can **monitor usage and costs** for this project in **LangSmith**:
  ```
  smith.langchain.com
  ```



# **New Method**

## Connect with the .env file located in the same directory as this notebook

If you are using the Poetry shell from the setup above, this package is already installed and you can skip this step:

In [None]:
#!pip install python-dotenv

#### How to install with pip the same version listed in the pyproject.toml file

To install a specific version of a package with `pip`, specify the version in the install command. In pyproject.toml, the entry for python-dotenv is `python-dotenv = "^1.0.1"`. The caret means version `1.0.1` or any later release below the next major version (that is, `>=1.0.1,<2.0.0`).

For `pip`, which doesn't natively support the caret (`^`) version specifier directly in the command line, you'll want to specify the exact version you need or use comparison operators to define a range. If you only need exactly version `1.0.1`, you can install it like this:

```bash
pip install python-dotenv==1.0.1
```

If you want to emulate the behavior of `^1.0.1` using `pip` (i.e., any version starting from `1.0.1` up to but not including the next major version, `2.0.0`), you could use:

```bash
pip install "python-dotenv>=1.0.1,<2.0.0"
```

This command tells `pip` to install a version of `python-dotenv` that is at least `1.0.1` but less than `2.0.0`. This will allow minor updates that are presumably backward compatible without crossing into a new major version that might introduce breaking changes.
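
The translation from a caret constraint to a `pip` range can be sketched in a few lines. This is an illustration only, assuming a full three-part version and the usual caret rule (the upper bound bumps the left-most non-zero component). The function name `caret_to_pip_range` is hypothetical and is not part of Poetry or pip.

```python
def caret_to_pip_range(version: str) -> str:
    """Translate the version in a caret constraint (e.g. '1.0.1' from '^1.0.1') to a pip range."""
    # This sketch only handles full MAJOR.MINOR.PATCH versions.
    major, minor, patch = (int(part) for part in version.split("."))
    # The upper bound bumps the left-most non-zero component.
    if major > 0:
        upper = f"{major + 1}.0.0"
    elif minor > 0:
        upper = f"0.{minor + 1}.0"
    else:
        upper = f"0.0.{patch + 1}"
    return f">={version},<{upper}"


spec = caret_to_pip_range("1.0.1")
print(f'pip install "python-dotenv{spec}"')
```

For `1.0.1` this produces the same range used in the command above.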

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]
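
If the key is missing, `os.environ["OPENAI_API_KEY"]` raises a bare `KeyError`. A small wrapper can give a clearer message. The sketch below is a suggestion rather than part of the notebook; `require_env` is a hypothetical helper, and the demo uses a plain dict with a made-up key so it runs without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable or fail with a readable message."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set. Add it to your .env file and reload.")
    return value


# Demonstration with a plain dict instead of the real environment.
demo_env = {"OPENAI_API_KEY": "sk-demo-not-a-real-key"}
print(require_env("OPENAI_API_KEY", demo_env))
```

In the notebook you would call `require_env("OPENAI_API_KEY")` after `load_dotenv(...)`, with no second argument.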

#### Install LangChain

If you are using the Poetry shell from the setup above, this package is already installed and you can skip this step:

In [None]:
#!pip install langchain

## Connect with an LLM

If you are using the Poetry shell from the setup above, this package is already installed and you can skip this step:

In [None]:
#!pip install langchain-openai

* NOTE: We use OpenAI by default because its models were among the strongest available when this notebook was written. A later lesson shows how to connect with open-source LLMs such as Llama3 or Mistral.

## LLM Model
* The dominant approach before the launch of ChatGPT.
* See LangChain documentation about LLM Models [here](https://python.langchain.com/v0.1/docs/modules/model_io/llms/).

In [None]:
from langchain_openai import OpenAI

llmModel = OpenAI()

#### Invoke: all the text of the response is printed at once.

In [None]:
response = llmModel.invoke(
    "Tell me one fun fact about the Kennedy family."
)

In [None]:
response

"\n\nOne fun fact about the Kennedy family is that they have their own personalized crest, which includes symbols such as a lion, an eagle, and a ship's wheel, representing courage, strength, and leadership."

In [None]:
print(response)



One fun fact about the Kennedy family is that they have their own personalized crest, which includes symbols such as a lion, an eagle, and a ship's wheel, representing courage, strength, and leadership.


# **📌 Notebook "001-connect-llm": Introduction & First Code Example**

## **🎯 Overview**
- **Notebook Purpose**: Demonstrate how to connect with an LLM (large language model) using LangChain.
- **Primary Focus**: Show a simple code snippet for invoking a text-completion model (`OpenAI`), passing in a prompt, and receiving a response.

---

## **🔹 Key Points from the Summary**

1. **LLM vs. Chat Models**  
   - **Text-Completion Model**: Typically returns a single text block based on a prompt.  
   - **Chat Model**: Handles multi-turn conversations with system, user, and AI roles.

2. **Most Common Model**:  
   - Since ChatGPT, **chat-based models** dominate in usage.  
   - Nevertheless, text-completion models are still part of the LLM family.

3. **Project Setup**:  
   - **Environment**: Python 3.11.4, Poetry for dependencies.  
   - **.env File**: Stores API keys and LangSmith tracking info.  
   - **LangSmith Integration**: Monitors usage, logs, and costs at [smith.langchain.com](https://smith.langchain.com).

---

## **🔹 Code Snippet**

    from langchain_openai import OpenAI

    llmModel = OpenAI()

    response = llmModel.invoke(
        "Tell me one fun fact about the Kennedy family."
    )

### **1️⃣ `from langchain_openai import OpenAI`**
- **Imports** a LangChain integration class specifically for OpenAI’s text-completion models.
- **Allows** you to configure settings like temperature, max tokens, or model name if desired.

### **2️⃣ `llmModel = OpenAI()`**
- **Instantiates** the `OpenAI` model object with default parameters.
- By default, it expects an **`OPENAI_API_KEY`** in your environment variables (loaded via `.env`).

### **3️⃣ `response = llmModel.invoke(...)`**
- **Sends** the string prompt to the model: `"Tell me one fun fact about the Kennedy family."`
- **Returns** a single text response (e.g., some historical tidbit about the Kennedys).

---

## **📌 Why This Matters**

1. **First Contact with the LLM**  
   - Proves you can **successfully** connect to OpenAI’s text-completion endpoint.
2. **Modular Approach**  
   - `OpenAI()` can be swapped for **other LLM backends** if you choose.
3. **LangSmith Tracking**  
   - If properly configured, you can **monitor** your usage and costs, especially helpful for scaling apps.

---

## **💡 Example Usage**

- Prompt: `"Tell me one fun fact about the Kennedy family."`  
- **Potential Response**: “John F. Kennedy was the first U.S. president to... (interesting fact).”  
- The LLM might discuss trivia about JFK’s early life, the family’s political influence, or other unique tidbits.

---

## **📌 Key Takeaways**

1. **Simple Prompt**: Demonstrates how to send a plain English request and get a straightforward answer.
2. **Flexibility**: Parameters like `temperature=0.7` can be added to control creativity.
3. **Foundation**: Sets the stage for more advanced operations (e.g., chaining prompts, chat-based flows).

> *Think of this as your “Hello World” for LLM integration—a single prompt that confirms everything’s wired up and ready for deeper exploration!*


In [None]:
# Streaming: printing one chunk of text at a time
for chunk in llmModel.stream(
    "Tell me one fun fact about the Kennedy family."
):
    print(chunk, end="", flush=True)



One fun fact about the Kennedy family is that President John F. Kennedy's favorite meal was New England fish chowder. He often requested it for meals at the White House and even had a special recipe created just for him.

# **📌 Step 2: Streaming Model Responses**

## **🎯 Objective**
- **Showcase** how to retrieve an LLM’s response in **real-time** chunks (streaming).
- **Improve** user experience by displaying partial output as it’s generated rather than waiting for the entire answer.

---

## **🔹 Code Snippet**
    # Streaming: printing one chunk of text at a time
    for chunk in llmModel.stream(
        "Tell me one fun fact about the Kennedy family."
    ):
        print(chunk, end="", flush=True)

### **1️⃣ `.stream(...)` Method**
- **Initiates** a real-time connection to the model.
- **Returns** text in incremental chunks, which can be printed as they arrive.

### **2️⃣ `for chunk in llmModel.stream(...)` Loop**
- **Iterates** through each chunk of generated text.
- **Output** is shown piece by piece, making it appear more conversational or responsive.

---

## **📌 Why Stream Responses?**
1. **Faster User Feedback**: Users immediately see partial results.
2. **Interactive Feel**: Simulates a typing effect or “live conversation.”
3. **Long Outputs**: Especially helpful when generating **lengthy** content, so users don’t wait for the entire text to finish.

---

## **📌 Key Takeaways**
1. **Streaming** is an excellent technique for creating a **dynamic** user interface.  
2. The `.stream(...)` method ensures partial data is processed as it becomes available.  
3. This approach is particularly useful for **chat-like** or **real-time** applications.

> *Imagine `.stream()` as eavesdropping on the LLM’s thought process—text arrives bit by bit, giving your users a more immediate and engaging experience.*
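
The chunk-by-chunk pattern can be tried without an API key. The sketch below uses a hypothetical generator, `fake_stream`, as a stand-in for a model's stream. It is not LangChain's implementation. It only shows how to print pieces as they arrive and rebuild the full reply afterwards.

```python
from typing import Iterator


def fake_stream(text: str, chunk_size: int = 8) -> Iterator[str]:
    """Hypothetical stand-in for a model stream: yield the text in small pieces."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


answer = "JFK's favorite meal was New England fish chowder."
collected = []
for chunk in fake_stream(answer):
    print(chunk, end="", flush=True)  # show each piece as soon as it arrives
    collected.append(chunk)           # keep the pieces to rebuild the full reply
print()

full_text = "".join(collected)
```

Keeping the chunks in a list is useful when you need the complete text later, for example to log it or store it in a conversation history.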


In [None]:
# Temperature: more or less creativity
creativeLlmModel = OpenAI(temperature=0.9)
# Note: this call uses llmModel (default temperature), not creativeLlmModel
response = llmModel.invoke(
    "Write a short 5 line poem about JFK"
)

In [None]:
print(response)





A leader with grace and charm,
His words inspired and disarmed,
JFK, a symbol of hope,
With a vision beyond the scope,
Forever remembered, his legacy will never depart.


# **📌 Step 3: Controlling Model Creativity with `temperature`**

## **🎯 Objective**
- **Demonstrate** how to adjust the LLM’s **creativity** by setting a `temperature` parameter.
- **Show** how it affects the output by providing a prompt that encourages a more imaginative response.

---

## **🔹 Code Snippet**

    # Temperature: more or less creativity
    creativeLlmModel = OpenAI(temperature=0.9)
    response = llmModel.invoke(
        "Write a short 5 line poem about JFK"
    )

### **1️⃣ `OpenAI(temperature=0.9)`**
- **Creates** a new model instance (`creativeLlmModel`) with a higher temperature.
- **Higher temperature** → More randomness, novelty, and figurative language in the response.

### **2️⃣ `response = llmModel.invoke("...")`**
- **Generates** a text completion for the poem about JFK.
- Note that in the snippet, `llmModel` (not `creativeLlmModel`) is actually invoked. If you want to use the higher creativity model, replace `llmModel` with `creativeLlmModel`.

---

## **📌 Why Adjust Temperature?**
1. **Enhanced Creativity**  
   - Higher temperature can produce more **original** or **artistic** text.  
2. **Predictable Responses**  
   - Lower temperature yields more **consistent, deterministic** answers, better for straightforward tasks.  
3. **Fine-Tuning Output**  
   - Experimenting with temperature helps **balance** creativity vs. reliability, depending on your application.

---

## **📌 Key Takeaways**
1. **Temperature** is a key sampling setting: high for expressive tasks, low for tasks that need consistency.  
2. The snippet shows **two** LLM objects—`creativeLlmModel` and `llmModel`. If you want the creative output, **invoke** the `creativeLlmModel`.  
3. Poetry or narrative prompts benefit from increased temperature, while Q&A or data extraction usually lean lower.

> *Think of `temperature` like the LLM’s imagination dial—turn it up to let the model dream, dial it down to keep it on-track.*
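
To see why a higher temperature means more randomness, it helps to look at the textbook softmax-with-temperature formula. The sketch below is a generic illustration with made-up scores for three candidate tokens. It assumes the standard formulation and makes no claim about how OpenAI implements sampling internally.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for temperature in (0.2, 0.9):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At the lower temperature almost all of the probability goes to the top-scoring token. At the higher temperature the other candidates keep a meaningful share, so sampling produces more varied text.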


# Chat Model
* The general trend since the launch of ChatGPT.
    * Frequently known as "Chatbot".
    * Conversation between Human and AI.
    * Can have a system prompt defining the tone or the role of the AI.
* See LangChain documentation about Chat Models [here](https://python.langchain.com/v0.1/docs/modules/model_io/chat/).
* By default we will work with ChatOpenAI. See [here](https://python.langchain.com/v0.1/docs/integrations/chat/openai/) the LangChain documentation page about it.

In [None]:
from langchain_openai import ChatOpenAI

chatModel = ChatOpenAI(model="gpt-3.5-turbo-0125")

In [None]:
messages = [
    ("system", "You are an historian expert in the Kennedy family."),
    ("human", "Tell me one curious thing about JFK."),
]
response = chatModel.invoke(messages)

# **📌 Step 4: Using a Chat Model (`ChatOpenAI`) for Role-Based Conversations**

## **🎯 Objective**
- **Demonstrate** how to create a conversation using LangChain’s `ChatOpenAI`.
- **Highlight** the system prompt’s role in setting the AI’s persona or expertise.

---

## **🔹 Code Snippet**

    from langchain_openai import ChatOpenAI

    chatModel = ChatOpenAI(model="gpt-3.5-turbo-0125")

    messages = [
        ("system", "You are an historian expert in the Kennedy family."),
        ("human", "Tell me one curious thing about JFK."),
    ]

    response = chatModel.invoke(messages)

---

## **1️⃣ `ChatOpenAI(model="gpt-3.5-turbo-0125")`**
- **Initializes** a chat-based LLM tied to a specific GPT-3.5 variant (`0125` release).
- **Chat context**: Built to handle system, user, and AI roles, reflecting how ChatGPT and similar chat models operate.

### **System Role**
- **Defines** the overarching instructions: “You are an historian expert in the Kennedy family.”
- **Effect**: The model aims to provide historically grounded responses about JFK.

### **Human Role**
- **Represents** the user’s question: “Tell me one curious thing about JFK.”

---

## **2️⃣ `chatModel.invoke(messages)`**
- **Passes** the role-based message list to the chat model.
- **Returns** a single message from the AI role, typically containing the answer or continuation of the conversation.

---

## **📌 Key Differences from Text-Completion Models**
1. **Role-Based**: Chat models expect input broken down by **system**, **user**, and possibly **assistant** messages.
2. **Context Preservation**: The model keeps no memory between calls; the `messages` list you send is the entire context it sees for that reply.
3. **Flexible Interaction**: For multi-turn dialogues, append each AI reply and each new user message to the list, then invoke the model again.

---

## **📌 Key Takeaways**
1. **System Prompts** shape the AI’s persona and the domain it focuses on.  
2. **Chat Models** are suited for multi-turn conversations, often with a more **conversational** or **contextual** tone.  
3. **Model Version**: pinning a snapshot (`gpt-3.5-turbo-0125`) makes behavior more reproducible, since different releases may vary subtly.

> *View `ChatOpenAI` as a conversation manager—each role-based message adds context, guiding the model’s next reply.*
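
A multi-turn conversation is just a growing list of role-tagged messages. The sketch below builds such a list with plain tuples and makes no API call. The helper `add_turn` is hypothetical, the reply text is a placeholder, and the `"ai"` role label is assumed to be accepted alongside `"system"` and `"human"` in the tuple form.

```python
def add_turn(
    history: list[tuple[str, str]], human_text: str, ai_text: str
) -> list[tuple[str, str]]:
    """Return a new history with one human message and the model's reply appended."""
    return history + [("human", human_text), ("ai", ai_text)]


history = [("system", "You are an historian expert in the Kennedy family.")]
# In a real run, the last argument would be response.content from chatModel.invoke(...).
history = add_turn(history, "Tell me one curious thing about JFK.", "<model reply goes here>")

# The follow-up question is sent together with everything that came before it.
next_request = history + [("human", "Tell me another one.")]
for role, text in next_request:
    print(f"{role}: {text}")
```

Passing `next_request` to `chatModel.invoke(...)` would give the model the earlier exchange as context for the follow-up.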


In [None]:
response

AIMessage(content="One curious thing about JFK is that he was a collector of unique and eclectic items. He had a fascination with history and art, and his collection included everything from ship models and scrimshaw to original manuscripts and rare books. JFK's love of collecting even extended to quirky items like coconut shells carved with faces that he displayed in the Oval Office. This hobby provided a glimpse into his personal interests and served as a reflection of his intellectual curiosity.", response_metadata={'token_usage': {'completion_tokens': 87, 'prompt_tokens': 29, 'total_tokens': 116}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-b2f90721-52fe-4482-989a-bc75d6f3e7fd-0', usage_metadata={'input_tokens': 29, 'output_tokens': 87, 'total_tokens': 116})

In [None]:
response.content

"One curious thing about JFK is that he was a collector of unique and eclectic items. He had a fascination with history and art, and his collection included everything from ship models and scrimshaw to original manuscripts and rare books. JFK's love of collecting even extended to quirky items like coconut shells carved with faces that he displayed in the Oval Office. This hobby provided a glimpse into his personal interests and served as a reflection of his intellectual curiosity."

In [None]:
response.response_metadata

{'token_usage': {'completion_tokens': 87,
  'prompt_tokens': 29,
  'total_tokens': 116},
 'model_name': 'gpt-3.5-turbo-0125',
 'system_fingerprint': None,
 'finish_reason': 'stop',
 'logprobs': None}

# **📌 Step 5: Inspecting Response Metadata**

## **🎯 Objective**
- **Examine** additional information returned by the LLM after it processes a request.
- **Understand** how token usage and model details are encapsulated in `response.response_metadata`.

---

## **🔹 Example Metadata**

    {
      'token_usage': {
        'completion_tokens': 87,
        'prompt_tokens': 29,
        'total_tokens': 116
      },
      'model_name': 'gpt-3.5-turbo-0125',
      'system_fingerprint': None,
      'finish_reason': 'stop',
      'logprobs': None
    }

### **1️⃣ `token_usage`**
- **`completion_tokens`**: Number of tokens the model used to generate its response.  
- **`prompt_tokens`**: Number of tokens used in the input prompt (including system, user messages, etc.).  
- **`total_tokens`**: The sum of both prompt and completion tokens. Useful for **tracking usage** and potential costs.

### **2️⃣ `model_name`**
- Indicates the **specific model** that handled the request.  
- For example, `"gpt-3.5-turbo-0125"` confirms which snapshot of GPT-3.5 was used.

### **3️⃣ `system_fingerprint`**
- An identifier reported by the OpenAI API for the **backend configuration** that served the request; it can help explain output changes between otherwise identical calls.  
- In this example, it’s `None`.

### **4️⃣ `finish_reason`**
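- **Shows** why the model stopped its generation. Common reasons:
  - `stop`: The model reached a natural stopping point or a stop sequence.  
  - `length`: The response hit the token limit (for example, the `max_tokens` setting).  
  - `content_filter`: Content was omitted because a content filter flagged it.  
  - `tool_calls`: The model stopped in order to call a tool.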

### **5️⃣ `logprobs`**
- **Log probabilities** of the generated tokens—helpful for advanced analysis or debugging.  
- Often returned as `None` by default, unless specifically requested via certain parameters.

---

## **📌 Key Takeaways**
1. **Token Usage** is crucial for monitoring **API costs** and optimizing prompts to stay within usage limits.  
2. **Model Name** ensures you know exactly which version or snapshot of GPT processed your request.  
3. Additional fields like **`finish_reason`** provide insight into how and why the model concluded its response.

> *Think of `response_metadata` as the LLM’s receipt—everything you need to know about the transaction is right here, from the tokens consumed to the model’s finishing reasons.*
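
The metadata dictionary can be turned into a rough cost estimate and a truncation check. The sketch below works on the example metadata shown above. The two prices are placeholder assumptions (check the provider's current pricing), and the helper names are hypothetical.

```python
def estimate_cost(
    metadata: dict, input_price_per_million: float, output_price_per_million: float
) -> float:
    """Estimate the cost of one call from its token usage and per-million-token prices."""
    usage = metadata["token_usage"]
    input_cost = usage["prompt_tokens"] * input_price_per_million / 1_000_000
    output_cost = usage["completion_tokens"] * output_price_per_million / 1_000_000
    return input_cost + output_cost


def was_truncated(metadata: dict) -> bool:
    """True when generation stopped because it hit the token limit."""
    return metadata.get("finish_reason") == "length"


metadata = {
    "token_usage": {"completion_tokens": 87, "prompt_tokens": 29, "total_tokens": 116},
    "model_name": "gpt-3.5-turbo-0125",
    "system_fingerprint": None,
    "finish_reason": "stop",
    "logprobs": None,
}

# Placeholder prices in USD per million tokens; these are assumptions, not quoted rates.
cost = estimate_cost(metadata, input_price_per_million=0.50, output_price_per_million=1.50)
print(f"estimated cost: ${cost:.6f}, truncated: {was_truncated(metadata)}")
```

In the notebook, `metadata` would be `response.response_metadata`.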


In [None]:
response.schema()

{'title': 'AIMessage',
 'description': 'Message from an AI.\n\nAIMessage is returned from a chat model as a response to a prompt.\n\nThis message represents the output of the model and consists of both\nthe raw output as returned by the model together standardized fields\n(e.g., tool calls, usage metadata) added by the LangChain framework.',
 'type': 'object',
 'properties': {'content': {'title': 'Content',
   'anyOf': [{'type': 'string'},
    {'type': 'array',
     'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
  'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
  'response_metadata': {'title': 'Response Metadata', 'type': 'object'},
  'type': {'title': 'Type', 'default': 'ai', 'enum': ['ai'], 'type': 'string'},
  'name': {'title': 'Name', 'type': 'string'},
  'id': {'title': 'Id', 'type': 'string'},
  'example': {'title': 'Example', 'default': False, 'type': 'boolean'},
  'tool_calls': {'title': 'Tool Calls',
   'default': [],
   'type': 'array'

# **📌 Step 6: Exploring the Schema of an AIMessage**

## **🎯 Objective**
- **Understand** the JSON schema returned by the `response.schema()` call.
- **Identify** how LangChain structures AI messages, including content, metadata, and other optional fields.

---

## **1️⃣ Core Properties**

- **`content`**  
  - Main text from the AI. Can be a **string** or a **list** (e.g., chunks or structured data).
- **`additional_kwargs`**  
  - Freeform dictionary for extra parameters or context not explicitly captured in the main schema.
- **`response_metadata`**  
  - Model name, token usage, or other metadata related to the response.
- **`tool_calls` / `invalid_tool_calls`**  
  - Specifies any tools the AI invoked (or tried to invoke incorrectly).
- **`usage_metadata`**  
  - Contains input/output/total token counts.

---

## **2️⃣ Usage & Importance**

1. **Validation**  
   - Ensures any AI message meets the required structure (`content` is mandatory).
2. **Consistency**  
   - Standardizes how messages, metadata, and tool interactions are stored across LangChain.
3. **Debugging & Analysis**  
   - Fields like `usage_metadata` and `response_metadata` help **monitor** costs and understand why a response was generated (e.g., was a tool call attempted?).

---

## **3️⃣ Applications**

- **Tool Integration**  
  - AI messages that include `tool_calls` might pass commands to external tools (e.g., web searches, calculators).
- **Conversation Flows**  
  - Tracking `invalid_tool_calls` helps refine prompts if the AI repeatedly attempts calls that fail.
- **Usage Tracking**  
  - `usage_metadata` is critical for monitoring token consumption, especially for cost management in production environments.

---

## **📌 Key Takeaways**
1. **`AIMessage` Schema** outlines how LangChain **structures** AI responses, including content, metadata, and tool interactions.  
2. **Required Fields**: At minimum, every AI message must include `content`.  
3. **Debug Tools**: The schema reveals fields for usage stats (`usage_metadata`), which tie into cost and performance analytics.

> *Think of `response.schema()` as a blueprint, ensuring every AI-generated message has the same building blocks—making your app more robust and easier to maintain.*
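
A schema dictionary like the one above can be summarized programmatically. The sketch below uses a hand-trimmed copy of a few fields from that output, so it runs without LangChain. The helper `summarize_properties` is hypothetical.

```python
def summarize_properties(schema: dict) -> dict[str, str]:
    """Map each property name to its declared type, or 'anyOf' when several are allowed."""
    summary = {}
    for name, spec in schema.get("properties", {}).items():
        fallback = "anyOf" if "anyOf" in spec else "unknown"
        summary[name] = spec.get("type", fallback)
    return summary


# A trimmed copy of the schema output, limited to a few fields.
sample_schema = {
    "title": "AIMessage",
    "type": "object",
    "properties": {
        "content": {"title": "Content", "anyOf": [{"type": "string"}, {"type": "array"}]},
        "response_metadata": {"title": "Response Metadata", "type": "object"},
        "type": {"title": "Type", "default": "ai", "enum": ["ai"], "type": "string"},
        "example": {"title": "Example", "default": False, "type": "boolean"},
    },
}

for name, kind in summarize_properties(sample_schema).items():
    print(f"{name}: {kind}")
```

In the notebook you could pass `response.schema()` directly to get the full list of fields.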


# Classic Method (Connect LLM)


In [None]:
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate

In [None]:
messages = [
    SystemMessage(content="You are an historian expert on the Kennedy Family."),
    HumanMessage(content="How many children had Joseph P. Kennedy?"),
]

response = chatModel.invoke(messages)

In [None]:
response

AIMessage(content='Joseph P. Kennedy and his wife Rose Fitzgerald Kennedy had nine children. Their children were Joseph Jr., John F. (JFK), Rosemary, Kathleen, Eunice, Patricia, Robert F. (Bobby), Jean, and Edward M. (Ted).', response_metadata={'token_usage': {'completion_tokens': 54, 'prompt_tokens': 30, 'total_tokens': 84}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-63964416-759f-461a-aecb-789b336ce0c1-0', usage_metadata={'input_tokens': 30, 'output_tokens': 54, 'total_tokens': 84})

#### Streaming:

In [None]:
for chunk in chatModel.stream(messages):
    print(chunk.content, end="", flush=True)

Joseph P. Kennedy and his wife Rose Kennedy had nine children: Joseph Jr., John F. (JFK), Rosemary, Kathleen, Eunice, Patricia, Robert (Bobby), Jean, and Edward (Ted).

#### Another old way, similar results:

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are expert {profession} in {topic}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | chatModel

response = chain.invoke(
    {
        "profession": "Historian",
        "topic": "Kennedy Family",
        "input": "Tell me one fun fact about JFK.",
    }
)

In [None]:
response

AIMessage(content='One fun fact about JFK is that he was the first president to hold a press conference that was broadcast live on television. This event took place on January 25, 1961, and it allowed the American public to see and hear their president in real-time, marking a significant shift in how political communication was conducted.', response_metadata={'token_usage': {'completion_tokens': 64, 'prompt_tokens': 28, 'total_tokens': 92}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-3e6f4f2c-a7fe-4220-925e-1e3a6954cc57-0', usage_metadata={'input_tokens': 28, 'output_tokens': 64, 'total_tokens': 92})
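
Conceptually, the prompt template fills the `{placeholders}` in each message before the list reaches the model. The sketch below imitates that step with plain `str.format`. It is an illustration of the idea, not how `ChatPromptTemplate` is implemented, and `fill_messages` is a hypothetical helper.

```python
def fill_messages(
    template: list[tuple[str, str]], values: dict[str, str]
) -> list[tuple[str, str]]:
    """Substitute {placeholders} in every message of a (role, text) template."""
    return [(role, text.format(**values)) for role, text in template]


template = [
    ("system", "You are expert {profession} in {topic}."),
    ("human", "{input}"),
]

filled = fill_messages(
    template,
    {
        "profession": "Historian",
        "topic": "Kennedy Family",
        "input": "Tell me one fun fact about JFK.",
    },
)
for role, text in filled:
    print(f"{role}: {text}")
```

Seeing the filled-in messages makes it easier to debug a template before spending tokens on a real call.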

## How to execute the code from Visual Studio Code
* In Visual Studio Code, open the file `001-connect-llm.py`.
* In the terminal, make sure you are in the directory containing the file and run:
    * `python 001-connect-llm.py`