# Introduction to LLM APIs: OpenAI and Ollama

## Overview
In this notebook, we will explore how to interact with Large Language Models (LLMs) using Python.
We will cover two main approaches:

1.  **Proprietary Models (OpenAI)**: Using the OpenAI API to access powerful models like GPT-4o.
2.  **Open Source Models (Ollama)**: Running local LLMs (like Llama 3 or Phi-4) on your own machine (or in this case, the Google Colab environment).

By the end of this session, you will understand how to send prompts to these models and receive responses programmatically.


## 1. OpenAI Python API

First, we need to install the official OpenAI Python SDK. This library simplifies making requests to OpenAI's servers.


In [None]:
!pip install -Uq openai


### API Key Setup

To use OpenAI, you need an API key. In Google Colab, it is best practice to store your keys in the `Secrets` manager (the key icon on the left sidebar).

1.  Click the **key icon** on the left.
2.  Add a new secret named `OPENAI_API_KEY` with your actual key value.
3.  Toggle 'Notebook access' to on.

The code below retrieves this key securely.


**Run the following cell if you are running this from inside Google Colab**

In [None]:
import openai
from google.colab import userdata

try:
    api_key = userdata.get('OPENAI_API_KEY')
except Exception as e:
    print("Error retrieving API key. Make sure you set 'OPENAI_API_KEY' in Colab Secrets.")
    api_key = None


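**Uncomment and run the following cell instead if you are working in VS Code (with or without a Colab runtime).** It reads `OPENAI_API_KEY` from the environment, or from a `.env` file loaded by `python-dotenv`.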

In [None]:
# import os
# import openai
# from dotenv import load_dotenv
# load_dotenv()

# api_key = os.getenv("OPENAI_API_KEY")

# if api_key is None:
#     raise ValueError("OPENAI_API_KEY not found in environment variables")
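The two setups above can also be folded into one helper, so the same cell runs in Colab and elsewhere. This is a minimal sketch, not part of the original notebook: the name `load_api_key` is hypothetical, and it assumes the key lives either in Colab Secrets or in an environment variable.

```python
import os


def load_api_key(name="OPENAI_API_KEY"):
    """Return the key from Colab Secrets if available, else from the environment."""
    try:
        # google.colab only exists inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
            if value:
                return value
        except Exception:
            # Secret missing or notebook access not granted; fall back below.
            pass

    # Fallback: a plain environment variable (for example, loaded from a .env file).
    return os.environ.get(name)
```

Calling `api_key = load_api_key()` returns `None` when the key is not found in either place, so it is worth checking the result before creating a client.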


### Basic Completion

Let's make our first call to the API. We will use the `chat.completions.create` method.
This method requires:
-   `model`: The specific model ID (e.g., `gpt-4o-mini`).
-   `messages`: A list of message objects, where each object has a `role` (system, user, assistant) and `content`.


In [None]:
from openai import OpenAI

client = OpenAI(api_key=api_key)

completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain what a Large Language Model is in one sentence."}
        ]
    )

response = completion.choices[0].message.content
print(response)


A Large Language Model is an advanced artificial intelligence system designed to understand, generate, and manipulate human language by utilizing vast amounts of text data and sophisticated algorithms to predict and produce coherent and contextually relevant language outputs.
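The `assistant` role mentioned earlier is how a previous reply is fed back to the model in a multi-turn conversation: the API is stateless, so each request carries the whole history. The sketch below only builds the message list and makes no API call; `add_turn` is a hypothetical helper, and the placeholder reply text stands in for a real model response.

```python
def add_turn(history, role, content):
    """Return a new message list with one more message appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unknown role: {role}")
    return history + [{"role": role, "content": content}]


history = [{"role": "system", "content": "You are a helpful assistant."}]
history = add_turn(history, "user", "Explain what a Large Language Model is in one sentence.")
# After a real call you would append the model's reply, for example:
# reply = completion.choices[0].message.content
history = add_turn(history, "assistant", "(the model's previous reply goes here)")
history = add_turn(history, "user", "Now give one concrete example.")

print([m["role"] for m in history])
```

Passing `history` as `messages` in the next request lets the model see its own earlier answer.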


In [None]:
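# temperature controls sampling randomness (lower values give more deterministic output);
# max_tokens caps the number of tokens generated in the reply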

client = OpenAI(api_key=api_key)

completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain what a Large Language Model is in depth."}
        ],
        temperature=0.7,
        max_tokens=500,
        )

response = completion.choices[0].message.content
print(response)


A Large Language Model (LLM) is a type of artificial intelligence designed to understand, generate, and manipulate human language. These models are built using deep learning techniques, particularly neural networks, and trained on vast amounts of text data. Here's a deep dive into the components, workings, and implications of LLMs:

### 1. **Architecture**
LLMs are typically based on the transformer architecture, which was introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017. The key components of this architecture include:

- **Attention Mechanism**: This allows the model to weigh the importance of different words in a sentence when processing language. Instead of processing words sequentially, the attention mechanism enables the model to consider all words simultaneously, improving context understanding.

- **Self-Attention**: Within the transformer, self-attention computes representations of the input by relating different positions of the input sequence t

In [None]:
# improving output format using Markdown
from IPython.display import display, Markdown

Markdown(response)

A Large Language Model (LLM) is a type of artificial intelligence designed to understand, generate, and manipulate human language. These models are built using deep learning techniques, particularly neural networks, and trained on vast amounts of text data. Here's a deep dive into the components, workings, and implications of LLMs:

### 1. **Architecture**
LLMs are typically based on the transformer architecture, which was introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017. The key components of this architecture include:

- **Attention Mechanism**: This allows the model to weigh the importance of different words in a sentence when processing language. Instead of processing words sequentially, the attention mechanism enables the model to consider all words simultaneously, improving context understanding.

- **Self-Attention**: Within the transformer, self-attention computes representations of the input by relating different positions of the input sequence to each other. This is crucial for capturing context and relationships in language.

- **Multi-Head Attention**: The model uses multiple attention heads to capture different types of relationships and contexts simultaneously. Each head can focus on different parts of the input, enriching the model's understanding.

- **Feedforward Neural Networks**: After the attention layers, the output is passed through feedforward neural networks, which apply additional transformations to the data.

- **Positional Encoding**: Since transformers do not process data in sequence, positional encodings are added to the input embeddings to give the model information about the order of words.

### 2. **Training Process**
LLMs are trained using a method called unsupervised learning, where they learn from large datasets of text without explicit labels. The training process typically involves:

- **Data Collection**: LLMs are trained on diverse and extensive datasets, which can include books, articles, websites, and other text sources.

- **Tokenization**: The text is broken down into smaller units called tokens (words, subwords, or characters) which the model uses for processing.

- **Objective Function**: Most LLMs are trained to predict the next token in a sequence given the previous tokens. This is known as language modeling and is typically framed as minimizing the cross-entropy loss between the predicted and actual next tokens.

- **Fine-tuning**: After pre-training on a broad corpus, LLMs can be fine-tuned on specific tasks (like question answering or summarization) using
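The response above stops mid-sentence, most likely because the request set `max_tokens=500`. The Chat Completions response reports why generation ended in `finish_reason`, where the value `"length"` means the token limit was reached. The sketch below uses a stand-in object with the same attribute layout so it runs without an API call; `was_truncated` is a hypothetical helper name.

```python
from types import SimpleNamespace


def was_truncated(completion):
    """True if the first choice stopped because it hit the token limit."""
    return completion.choices[0].finish_reason == "length"


# Stand-in object with the same attribute layout as a chat completion,
# so the sketch runs without an API call.
fake = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
print(was_truncated(fake))
```

With a real response you would call `was_truncated(completion)` and, if it returns `True`, raise `max_tokens` or ask for a shorter answer.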

### Streaming Responses

LLMs generate text token by token. Instead of waiting for the full response, we can 'stream' the output so it appears as it is being written. This creates a better user experience.


In [None]:
stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": "Explain how a neural network works to a 5-year-old."}
        ],
        stream=True,  # Enable streaming
    )

print("Streaming response:")
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)



Streaming response:
Okay, imagine you have a toy robot that wants to learn how to recognize different animals. 

First, we give the robot lots and lots of pictures of animals, like cats, dogs, and rabbits. Each picture is like a puzzle piece. The robot has a special brain called a "neural network," which is made up of tiny helpers, kind of like a team of little friends.

When the robot looks at a picture, each little helper takes a tiny look at just a part of the picture. Some helpers look at colors, some look at shapes, and others look at where things are in the picture.

The helpers then talk to each other and pass their thoughts along, trying to guess what animal it is. If they get it right, they cheer! If they get it wrong, they remember what they learned and try to do better next time.

Over and over, the robot looks at new pictures, and with each picture, the little helpers get better at guessing! So, after a lot of practice, the robot becomes really good at recognizing animals, 

In [None]:
from IPython.display import display, Markdown

stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": "Explain what are large language models."}
        ],
        stream=True,  # Enable streaming
    )

print("Streaming markdown response:")
full_response_content = ""
display_handle = display(Markdown(""), display_id=True)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        full_response_content += content
        display_handle.update(Markdown(full_response_content))


Streaming markdown response:


Large language models (LLMs) are a type of artificial intelligence model designed to understand, generate, and manipulate human language. These models are built using deep learning techniques, particularly neural networks, and are trained on vast amounts of text data to learn the patterns, structure, and nuances of language. Here are some key characteristics and features of LLMs:

1. **Scale**: LLMs are typically characterized by their size, which is often measured in terms of the number of parameters they have. Parameters are the internal variables that the model learns during training. A larger number of parameters generally allows the model to capture more complex patterns and relationships in language.

2. **Training Data**: LLMs are trained on diverse datasets that include books, articles, websites, and other written content. The training process involves the model learning to predict the next word in a sentence given the preceding words, enabling it to pick up grammar, facts, and some contextual understanding.

3. **Versatility**: Once trained, LLMs can perform a wide range of natural language processing (NLP) tasks, including text generation, translation, summarization, question answering, and sentiment analysis, among others. Their generalization capabilities allow them to handle a variety of topics and writing styles.

4. **Contextual Understanding**: LLMs are capable of understanding context to a significant extent, which allows them to generate responses that are coherent and contextually appropriate. This is often achieved through mechanisms like attention, which helps the model focus on relevant parts of the input text while generating output.

5. **Zero-shot and Few-shot Learning**: Some LLMs can perform tasks with little or no task-specific training. This means they can generalize their knowledge to apply to new tasks, making them flexible and powerful for various applications without extensive retraining.

6. **Challenges**: Despite their capabilities, LLMs have limitations. They can produce biased or factually incorrect outputs, lack true understanding or common sense reasoning, and may generate nonsensical or irrelevant text. Additionally, issues related to ethical use, data privacy, and misinformation are ongoing concerns in the deployment of LLMs.

7. **Examples**: Some well-known large language models include OpenAI's GPT-3 and GPT-4, Google's BERT, and Facebook's LLaMA. Each of these models has its unique architecture and training framework but shares the common goal of advancing the understanding and generation of human language.

In summary, large language models represent a significant advancement in natural language processing and artificial intelligence, enabling more sophisticated interactions between humans and machines through language understanding and generation. They continue to evolve, driving research and development in both academia and industry.
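Both streaming loops above read `chunk.choices[0]` directly. Some chunks can arrive with an empty `choices` list (for example, a usage-only chunk when usage reporting is requested), so a slightly more defensive loop may be useful. The sketch below is one way to write it; `iter_text` and the fake chunks are illustrative names that mimic the chunk attribute layout, not real API objects.

```python
from types import SimpleNamespace


def iter_text(stream):
    """Yield non-empty text pieces from a chat-completions stream."""
    for chunk in stream:
        if not chunk.choices:
            continue  # e.g. a usage-only chunk with no choices
        piece = chunk.choices[0].delta.content
        if piece:
            yield piece


def _fake_chunk(text):
    # Mimics the attribute layout of a streamed chunk for demonstration only.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [_fake_chunk("Hel"), SimpleNamespace(choices=[]), _fake_chunk(None), _fake_chunk("lo")]
print("".join(iter_text(fake_stream)))
```

With a real stream, `for piece in iter_text(stream): print(piece, end="", flush=True)` behaves like the first streaming cell.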

## 2. Ollama (Local LLMs)

[Ollama](https://ollama.com/) is a tool that allows you to run open-source LLMs locally. It simplifies the process of downloading and managing models like Llama 3, Mistral, and Gemma.

### Setting up Ollama in Colab
Since Google Colab is a virtual environment, we need to:
1.  Install Ollama.
2.  Start the Ollama server in the background.
3.  Pull (download) the model we want to use.


In [None]:
# 1. Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh


In [None]:
# 2. Start Ollama server in the background using nohup
!nohup ollama serve > ollama.log 2>&1 &

# Wait a few seconds for the server to spin up
import time
time.sleep(5)
print("Ollama server started.")
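A fixed `time.sleep(5)` can be too short on a slow runtime and needlessly long on a fast one. An alternative is to poll the server until it answers. The sketch below assumes the Ollama server listens on `http://localhost:11434` (the address used later in this notebook) and answers a plain GET on its root URL with HTTP 200 once it is up; `http_ok` and `wait_until_ready` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def http_ok(url):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url="http://localhost:11434", timeout=30.0, interval=0.5, probe=http_ok):
    """Poll `probe(url)` until it succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval)
    return False
```

For example, `print("Ollama server started." if wait_until_ready() else "No response; check ollama.log")` reports the actual state instead of assuming the server came up.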


In [None]:
# 3. Pull a lightweight model (qwen3:4b is small enough to run in Colab)
!ollama pull qwen3:4b


### Using Ollama Python Library

Just like OpenAI, Ollama has a Python library to interact with the models running on the local server.


In [None]:
# Install the Ollama python client
!pip install -q ollama


In [None]:
import ollama

response = ollama.chat(model='qwen3:4b', messages=[
  {
    'role': 'user',
    'content': 'What are the main components of a RAG application?'
  },
])
print(response['message']['content'])


Here are the **core components of a RAG (Retrieval-Augmented Generation) application**, explained clearly and concisely for both technical and non-technical audiences. I'll focus on *what* each component does and *why* it matters, avoiding excessive jargon while ensuring practical relevance.

---

### 🧠 1. **Data Ingestion & Storage**
- **What it does**: Collects, stores, and organizes raw data (e.g., PDFs, websites, databases, text files) into a structured format.
- **Why it matters**: Without this, there's no data to retrieve. Real-world examples:  
  → *Customer support tickets* (for troubleshooting)  
  → *Internal knowledge bases* (for HR policies)  
  → *Research papers* (for academic answers)
- **Key tools**: S3 buckets, databases (e.g., PostgreSQL), or cloud storage.

---

### 🔍 2. **Document Preprocessing**
- **What it does**: Cleans, splits, and transforms raw documents into a format suitable for retrieval (e.g., removing noise, splitting into chunks, standardizi

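Notice that the output from both OpenAI and Ollama is usually formatted as Markdown. If you want plain text instead, ask for it explicitly in the prompt (for example, in a system message); the `response_format` parameter controls structured output such as JSON and does not turn off Markdown styling.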
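One approach that works with both APIs is to request plain text in a system message. The sketch below only builds the message list; `plain_text_messages` is a hypothetical helper, the instruction wording is just an example, and models may not always follow it.

```python
def plain_text_messages(question):
    """Build a message list that asks the model to avoid Markdown."""
    return [
        {
            "role": "system",
            "content": (
                "Answer in plain text only. Do not use Markdown headings, "
                "bullet markers, bold, tables, or emoji."
            ),
        },
        {"role": "user", "content": question},
    ]


messages = plain_text_messages("What are the main components of a RAG application?")
print(messages[0]["content"])
# With the Ollama server running, the list can be passed as-is:
# response = ollama.chat(model='qwen3:4b', messages=messages)
```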

In [None]:
# you can use Markdown to make it look nicer in the notebook
Markdown(response['message']['content'])

Here are the **core components of a RAG (Retrieval-Augmented Generation) application**, explained clearly and concisely for both technical and non-technical audiences. I'll focus on *what* each component does and *why* it matters, avoiding excessive jargon while ensuring practical relevance.

---

### 🧠 1. **Data Ingestion & Storage**
- **What it does**: Collects, stores, and organizes raw data (e.g., PDFs, websites, databases, text files) into a structured format.
- **Why it matters**: Without this, there's no data to retrieve. Real-world examples:  
  → *Customer support tickets* (for troubleshooting)  
  → *Internal knowledge bases* (for HR policies)  
  → *Research papers* (for academic answers)
- **Key tools**: S3 buckets, databases (e.g., PostgreSQL), or cloud storage.

---

### 🔍 2. **Document Preprocessing**
- **What it does**: Cleans, splits, and transforms raw documents into a format suitable for retrieval (e.g., removing noise, splitting into chunks, standardizing text).
- **Why it matters**: Prevents irrelevant results. Example:  
  → Splitting a 50-page PDF into 1000-character chunks so the model can find *exact* answers.  
  → Removing tables, images, or irrelevant sections from web pages.
- **Key tools**: Python libraries like `PyPDF2`, `BeautifulSoup`, or `LangChain`'s chunkers.

---

### 📦 3. **Indexing (Vector Database)**
- **What it does**: Converts processed documents into **vectors** (numerical representations) and stores them in a fast-searchable database.
- **Why it matters**: This is the *heart* of RAG. Without it, you can't quickly find relevant documents.  
  → *Example*: When a user asks "How do I reset my password?", the system converts this query into a vector and searches the vector database for documents with similar vectors (e.g., "password reset instructions").
- **Key tools**: FAISS, Pinecone, Weaviate, or ChromaDB.  
  → *Critical note*: This is where **embedding models** (e.g., `all-MiniLM-L6-v2`) live.

---

### 🕵️ 4. **Retrieval System**
- **What it does**: Takes a user query → converts it into a vector → searches the index → returns *top-k* relevant documents.
- **Why it matters**: This is where RAG *augments* the LLM. Without good retrieval, the LLM gets "hallucinated" answers (made up facts).  
  → *Example*: If the query is "What is the capital of France?", the system retrieves the document "France: Paris is the capital" (not a random document about "France" in a history book).
- **Key metrics**: Precision (how many relevant docs?), Recall (how many relevant docs are found?).

---

### 🧩 5. **Query Processing Pipeline**
- **What it does**: Prepares the user query for retrieval (e.g., adding context, handling typos, language translation).
- **Why it matters**: Ensures the query vector aligns with the index.  
  → *Example*: If a user types "reset pass", the system might correct it to "reset password" before retrieval to avoid mismatched results.

---

### 🤖 6. **Generation Module (LLM)**
- **What it does**: Takes the retrieved documents + user query → generates a **natural-language response**.
- **Why it matters**: This is where the *augmentation* happens. The LLM uses retrieved info to answer *accurately* (e.g., "Your password reset link is: `https://...`").  
  → *Without RAG*: The LLM might hallucinate ("Password reset links expire in 24 hours...") because it has no recent data.  
  → *With RAG*: The LLM uses the retrieved document to say "Your link expires in 1 hour" (if the doc states this).
- **Key tools**: LLMs like `GPT-4`, `Llama 3`, or `Claude`.

---

### 📊 7. **Evaluation & Feedback Loop**
- **What it does**: Tests the RAG system's performance (e.g., accuracy, latency) and uses user feedback to improve.
- **Why it matters**: RAG isn't a "one-off" solution. Real-world systems need constant tuning.  
  → *Example*: If users say "This answer is wrong", the system updates the index or retrieval model.
- **Key metrics**:  
  - **Precision**: % of retrieved docs that are relevant  
  - **Relevance score**: How well the response matches the query  
  - **Latency**: Time from query to response (critical for apps)

---

### 💡 Why This Matters in Practice
RAG solves a **real problem**: LLMs (like GPT) often hallucinate answers because they lack context. By *retrieving relevant documents first*, RAG:  
✅ **Reduces hallucinations** (uses actual data)  
✅ **Improves accuracy** (answers are grounded in your knowledge base)  
✅ **Scales better** (works with large datasets without retraining the LLM)

> 🌟 **Real-world example**: A bank's chatbot uses RAG to:  
> 1. Ingest customer agreements (PDFs) →  
> 2. Retrieve clauses when a user asks "Can I withdraw money?" →  
> 3. Generate a precise answer: *"Yes, but your account must be in good standing."* (from the retrieved document).

---

### ⚠️ Key Pitfalls to Avoid
| Component          | Common Mistake                          | Fix                                  |
|---------------------|------------------------------------------|---------------------------------------|
| **Indexing**        | Using too large chunks → slow retrieval  | Split docs into 500-1000 chars       |
| **Retrieval**       | Returning irrelevant docs (low precision)| Tune `k` (top results) and embedding model |
| **Generation**      | Over-reliance on retrieved docs         | Add fallback: "I don't have this info" |

---

### Summary Table
| **Component**              | **Purpose**                                  | **Real-World Analogy**              |
|----------------------------|----------------------------------------------|-------------------------------------|
| Data Ingestion              | Collect raw data                             | Library shelves                    |
| Preprocessing               | Clean & split documents                      | Sorting books into chapters        |
| Indexing (Vector DB)        | Store vectors for fast search                | Index in a physical library        |
| Retrieval System            | Find relevant docs for the query             | Librarian finding the right book   |
| Generation Module (LLM)     | Create human-like answers with context        | Author writing a story using notes |
| Evaluation Loop             | Measure accuracy & improve                   | Teacher grading & revising essays  |

---

### Final Thought
**RAG isn't just "an LLM + a database"**: it's a *pipeline* where **retrieval** and **generation** work together to make answers **accurate, contextual, and trustworthy**. Start with **data ingestion → preprocessing → indexing** (the "foundation"), then build retrieval and generation on top. 

For beginners: **Use LangChain** (open-source framework) to implement RAG quickly. It handles most components out-of-the-box.

Let me know if you'd like a **step-by-step tutorial** or **code example** for one of these components! 😊

## 3. OpenAI Compatibility

One of the coolest features of Ollama is that it exposes an **OpenAI-compatible** API.
This means you can use the `openai` Python client to talk to your local Ollama models! You just need to change the `base_url` to point to your local server.

**Why is this useful?**
It allows you to switch between expensive proprietary models (OpenAI) and free local models (Ollama) without rewriting your entire application logic.
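One way to make that switch explicit is to keep the backend-specific values in a single place and leave the rest of the code untouched. The sketch below is illustrative only: `backend_settings` is a hypothetical helper, and the URL and model names are the ones used elsewhere in this notebook.

```python
def backend_settings(backend, openai_api_key=None):
    """Return client settings and a default model for the chosen backend."""
    if backend == "ollama":
        return {
            "base_url": "http://localhost:11434/v1",
            "api_key": "ollama",  # placeholder; Ollama ignores it
            "model": "qwen3:4b",
        }
    if backend == "openai":
        return {
            "base_url": None,  # None lets the client use its default endpoint
            "api_key": openai_api_key,
            "model": "gpt-4o-mini",
        }
    raise ValueError(f"unknown backend: {backend}")


settings = backend_settings("ollama")
print(settings["base_url"], settings["model"])
# With the openai package installed:
# client = OpenAI(base_url=settings["base_url"], api_key=settings["api_key"])
# client.chat.completions.create(model=settings["model"], messages=[...])
```

Changing the single argument from `"ollama"` to `"openai"` is then the only edit needed to move between the local and hosted model.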


In [None]:
from openai import OpenAI

# Point the OpenAI client to the local Ollama server
client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama', # Required, but ignored by Ollama
)

response = client.chat.completions.create(
    model="qwen3:4b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the main components of a RAG application?"}
    ]
)

response = response.choices[0].message.content
print(response)


The **main components of a RAG (Retrieval-Augmented Generation) application** work together to enable AI models to generate **accurate, context-aware responses** by first retrieving relevant information from a *knowledge base* before generating text. Below is a clear breakdown of the core components, their roles, and how they interconnect:

---

### 🔑 1. **Knowledge Base (KB)**
   - **What it is**: A structured repository of source documents (e.g., PDFs, articles, databases, web pages).
   - **Purpose**: Stores factual information that the RAG system uses for retrieval.
   - **Examples**: 
     - Enterprise documents (internal reports, contracts)
     - External knowledge (scientific papers, news articles)
     - Structured datasets (databases, CSV files).

---

### 🧠 2. **Embedding Model**
   - **What it is**: A model that converts text (documents, queries) into **numerical vectors** (embeddings).
   - **Purpose**: Enables semantic similarity comparisons between queries and docu

In [None]:
# Streaming works through the OpenAI-compatible endpoint as well

from IPython.display import display, Markdown
client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama', # Required, but ignored by Ollama
)
stream = client.chat.completions.create(
        model="qwen3:4b",
        messages=[
            {"role": "user", "content": "Explain what are large language models."}
        ],
        stream=True,  # Enable streaming
    )

print("Streaming markdown response:")
full_response_content = ""
display_handle = display(Markdown(""), display_id=True)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        full_response_content += content
        display_handle.update(Markdown(full_response_content))


Streaming markdown response:


Large language models (LLMs) are **a type of artificial intelligence (AI) system designed to understand, generate, and work with human language**. They're trained on massive amounts of text data (like books, articles, code, and web content) to predict patterns in language and produce coherent, contextually relevant responses. Here's a breakdown in simple terms:

---

### 🔑 **Core Idea: What They Do**
Think of LLMs as **extremely powerful "language pattern detectives."** They:
1. **Learn language patterns**: By analyzing vast text data, they understand how words relate to each other (e.g., "cat" → "meow," "run" → "fast").
2. **Predict the next word**: When given a partial sentence (e.g., "The sky is..."), they guess the most likely next word(s).
3. **Generate new text**: Using this pattern-matching, they create new sentences, stories, code, emails, or answers to questions, all while maintaining natural flow.

> ✅ **Real-world example**:  
> *Input*: "Explain quantum computing in simple terms."  
> *LLM response*: "Quantum computing uses the principles of quantum mechanics... [and so on]"

---

### 📏 **Why "Large"? The Key Size Matters**
- **Scale = Power**: LLMs are called "large" because they have **billions to trillions of parameters** (values the model adjusts during training).  
  - *Why this matters*: More parameters = better pattern recognition = more accurate language generation.  
  - *Example*: GPT-3.5 has ~175 billion parameters; ChatGPT-4 has ~1 trillion+.
- **Training data**: They're trained on **huge datasets** (e.g., trillions of words from the internet, books, code repositories).  
  - *Important*: The model **doesn't "understand" language**; it statistically predicts patterns based on what it learned.

---

### 🧠 **How They Work (Simplified)**
1. **Input**: A user asks a question or provides text (e.g., "How do I bake a cake?").  
2. **Processing**: The model scans the input, identifies context, and predicts the most probable next words.  
3. **Output**: Generates a response (e.g., step-by-step cake instructions).  
*Under the hood*: They use a **neural network architecture** (specifically, **transformers**, which excel at handling sequence data like text).

---

### 💡 **What LLMs Can Do (Practical Uses)**
| **Task**                     | **Example**                                      |
|------------------------------|--------------------------------------------------|
| Answer questions              | "What's the capital of France?" → "Paris"         |
| Write stories, emails, scripts | Draft a sci-fi story or professional email       |
| Code generation               | Write Python code from a description             |
| Translate languages           | Translate Spanish to English                     |
| Summarize text                | Turn a 10-page report into a 1-page summary      |
| Debugging/fixing errors       | Help spot mistakes in code or logic              |

---

### ⚠️ **Key Limitations (What They *Can't* Do)**
- **Don't "understand" meaning**: They mimic language patterns but don't grasp concepts (e.g., they won't explain *why* gravity works).  
- **Not creative**: They generate based on patterns, not true innovation (e.g., they won't invent new ideas).  
- **Bias risks**: If training data has biases (e.g., gender, race), LLMs can amplify them.  
- **Hallucinations**: They may invent facts not in their training data (e.g., "In 2023, the moon landed on Earth").  
- **Security**: They can be tricked with prompts ("What's the best way to hack?") → **Never use LLMs for malicious purposes**.

---

### 🌟 **Why Are LLMs So Important?**
- **Democratize AI**: Tools like ChatGPT make powerful AI accessible to non-experts.  
- **Transform industries**: Used for coding (GitHub Copilot), healthcare (medical reports), education, and more.  
- **Pushing AI boundaries**: LLMs are the most advanced "language-focused" AI systems built so far, but they're still evolving.

---

### 💎 **In a Nutshell**
> **Large language models (LLMs) are hyper-advanced AI systems trained on massive text data to predict and generate human-like language. They're not "conscious" but excel at tasks like answering questions, writing, coding, and translating by spotting patterns in language, not understanding meaning. Their size (billions of parameters) enables high performance but also comes with limitations like bias and hallucinations.**

They're a powerful tool for real-world applications today, **but they're not sentient, and they shouldn't replace human judgment or creativity**.

If you'd like to dive deeper into *how* they work, *specific examples* (like ChatGPT vs. Gemini), or *how to use them safely*, just say the word! 😊

## 4. Activity: Build an AI Email Assistant

**Objective**: Expand the functionality of a basic AI email generator.

**The Scenario**: You have a simple script that drafts professional emails. Your task is to customize it to handle more specific details like tone and dates.


In [None]:
from openai import OpenAI

# 1. Setup the Client
# Ensure you have the model pulled: !ollama pull qwen3:4b
client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',
)

def generate_email(subject, recipient_name, additional_info):
    """
    Generates an email based on the provided inputs.
    """
    # 1. Construct the prompt
    prompt = f"Write a professional email to {recipient_name} with the subject '{subject}'. Include the following information: {additional_info}"

    # 2. Call the API
    response = client.chat.completions.create(
        model="qwen3:4b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ]
    )

    return response.choices[0].message.content

# 3. Test the Function with User Input
print("--- Email Assistant ---")
entry_recipient = input("Enter recipient name: ")
entry_subject = input("Enter email subject: ")
entry_info = input("Enter key points/details: ")

print("\nGenerating email...\n")
print(generate_email(entry_subject, entry_recipient, entry_info))

--- Email Assistant ---

Generating email...

Subject: I need a vacation  

Dear Sara Smith,  

I hope this message finds you well.  

I am writing to formally request a short vacation period as I have been feeling increasingly tired and need time to unwind. After several months of consistent work demands, I believe a brief break will help me recharge and return to our projects with greater focus and energy.  

I am flexible with the timing and would appreciate your guidance on the best window to accommodate this request without disrupting team workflows. Please let me know what dates would work for you, and I'm happy to align my schedule accordingly.  

Thank you for your understanding and support. I truly appreciate your assistance in helping me find a solution that benefits both my well-being and our team's success.  

Best regards,  
[Your Name]


### 🟢 Challenges

Now that you have a functioning template, modify the code above to solve the following challenges:

**1. Add a `tone` argument**
Customize the style of the email.
*   Modify the testing block to ask the user for a `tone` (e.g., "urgent", "enthusiastic").
*   Pass this argument to `generate_email` and update the prompt.

**2. Add `start_date` and `duration`**
Imagine this is a "Request for Leave" email generator.
*   Add inputs for `start_date` and `duration`.
*   Update the prompt to ensure these details are included clearly in the email.

**3. Dynamic System Prompt**
Currently, the system prompt is static ("You are a helpful assistant.").
Change the system prompt to: *"You are a professional executive assistant who negotiates schedules effectively."* or allow it to be passed as an argument.
