<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/016_quasi_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


### 🤖 Why This Is a **Quasi-Agent**

A **quasi-agent** behaves *like* an agent in some ways — it performs a task through a sequence of LLM calls — but lacks the **core traits of a true autonomous agent**:

| Trait                          | Quasi-Agent (Your Project)                            | Full Agent                                                     |
| ------------------------------ | ----------------------------------------------------- | -------------------------------------------------------------- |
| ✅ Multi-step logic             | Yes, follows a 3-step code generation process         | Yes                                                            |
| ❌ Decision-making              | No — fixed prompts and steps                          | Yes — can decide what to do next                               |
| ❌ Memory or context management | Minimal — just passes previous output                 | Yes — tracks goals, tasks, and failures                        |
| ❌ Reactivity                   | No — doesn’t adapt if LLM output is bad or incomplete | Yes — can retry, validate, branch                              |
| ❌ Tools / environment actions  | No — doesn’t use APIs, run code, or call functions    | Yes — can take actions (e.g., call APIs, run code, browse web) |

---

### 🔍 Concrete Example

Let’s say the LLM generates bad test code. A **quasi-agent** will still save it and move on. A **full agent** might:

* Notice the code has syntax errors
* Rerun the generation step with a clarification
* Try an alternate method or model
* Ask the user to clarify
* Retry until success

Your current project doesn’t have that loop — it just chains static prompts.
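
The missing loop can be sketched in a few lines. This is a minimal illustration, not part of the project: `generate_valid_code` and its `generate` argument are hypothetical names, where `generate` stands in for any function that sends a prompt to an LLM and returns text. The check uses the standard-library `ast` module, which only verifies syntax and never runs the code.

```python
import ast
from typing import Callable, Optional


def generate_valid_code(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Optional[str]:
    """Call `generate` until its output parses as Python, or give up."""
    current_prompt = prompt
    for _ in range(max_attempts):
        code = generate(current_prompt)
        try:
            ast.parse(code)  # syntax check only; the code is not executed
            return code
        except SyntaxError as err:
            # Feed the error back so the next attempt can correct it.
            current_prompt = (
                f"{prompt}\n\nThe previous attempt had a syntax error "
                f"({err.msg}, line {err.lineno}). Return only valid Python code."
            )
    return None  # every attempt failed; the caller decides what to do next
```

A quasi-agent skips this step entirely; a fuller agent would also decide what to do when `None` comes back, for example switching models or asking the user.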

---

### 🧠 So What Makes a **Full Agent**?

A real agent often includes:

1. **A planning loop** (e.g., ReAct, CoT, or Task Decomposition)
2. **State management** (memory, goals, scratchpad)
3. **Tool use** (calls to Python, web search, APIs)
4. **Autonomy** (runs until task is done or fails)
5. **Error handling and retries**

Popular agent frameworks like:

* **AutoGPT**, **BabyAGI**
* **LangChain Agents**
* **OpenAI Function Calling + Planning**

…combine LLM reasoning with memory, tools, and feedback loops.
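
To make the five ingredients concrete, here is a toy loop under stated assumptions: the `TOOLS` registry, `run_agent`, and `scripted_decide` are all hypothetical, and `scripted_decide` is a hard-coded stand-in for the LLM call that a real agent would use to pick its next action.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; a real agent would wrap APIs, code runners, search, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "upper": lambda text: text.upper(),
    "reverse": lambda text: text[::-1],
}


def run_agent(
    decide: Callable[[str, List[str]], Tuple[str, str]],
    goal: str,
    max_steps: int = 5,
) -> List[str]:
    """Loop: decide on an action, run the tool, record the observation."""
    history: List[str] = []  # state: everything that has happened so far
    for _ in range(max_steps):  # autonomy, bounded by a step budget
        action, argument = decide(goal, history)  # planning: pick the next step
        if action == "finish":
            history.append(f"finish: {argument}")
            break
        tool = TOOLS.get(action)
        if tool is None:  # error handling: record the problem and keep going
            history.append(f"error: unknown tool {action!r}")
            continue
        history.append(f"{action}: {tool(argument)}")  # tool use
    return history


def scripted_decide(goal: str, history: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM call that chooses the next action from the history."""
    if not history:
        return "upper", goal
    if len(history) == 1:
        return "reverse", goal
    return "finish", "done"


print(run_agent(scripted_decide, "abc"))
# ['upper: ABC', 'reverse: cba', 'finish: done']
```

The key difference from the quasi-agent is that the sequence of steps is chosen at run time from the history, not fixed in advance.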



In [6]:
!pip install -q transformers accelerate huggingface_hub litellm python-dotenv



## 🧠 Using Code LLaMA for Python Code Generation with Transformers

This notebook builds a quasi-agent that generates Python functions, adds documentation, and creates unit tests based on user input. It uses the Hugging Face `transformers` library and the `CodeLlama-7b-Instruct` model — a powerful, open-source large language model optimized for code generation.

---

### 📦 Key Libraries and Their Roles


| Component                     | Description                                                                                                                                                                                      |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `AutoModelForCausalLM`        | Loads a **causal (left-to-right) language model** for text/code generation. Perfect for tasks where the model generates tokens one after another, like writing code or autocompleting text.      |
| `AutoTokenizer`               | Converts text into tokens (IDs) the model can understand, and converts output tokens back into readable text. Ensures compatibility with the specific model's vocabulary and tokenization rules. |
| `pipeline("text-generation")` | A high-level abstraction that combines tokenization, model inference, and decoding into one simple call — allowing us to generate code with minimal overhead.                                    |

---

### 🔍 Why Causal Language Modeling?

**Causal Language Modeling** (CLM) trains a model to predict the next token based only on previous tokens — like writing left to right. This makes it ideal for tasks like:

* Generating code from prompts
* Writing documentation progressively
* Autocompleting functions

In contrast to models like BERT (which look in both directions), CLMs are autoregressive and **better suited for generative tasks**, especially where **sequence order matters** (like in Python syntax).
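
The autoregressive loop itself is simple, as this toy sketch shows. The lookup table is made-up data and a big simplification: a real causal model scores every vocabulary token given *all* previous tokens, while this stand-in looks only at the last one. The loop structure (predict, append, repeat) is the same.

```python
from typing import Dict, List

# Toy "model": for each token, the single most likely next token (made-up data).
NEXT_TOKEN: Dict[str, str] = {
    "def": "add",
    "add": "(",
    "(": "a",
    "a": ",",
    ",": "b",
    "b": ")",
    ")": ":",
}


def generate(prompt: List[str], max_new_tokens: int = 10) -> List[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1])  # uses only what came before
        if next_token is None:  # nothing left to predict: stop
            break
        tokens.append(next_token)  # each output becomes context for the next step
    return tokens


print(" ".join(generate(["def"])))
# def add ( a , b ) :
```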

---

### ✅ Why This Setup?

This setup is chosen because it meets several key goals for the assignment:

* ✅ **Open-source**: No API keys or external service needed; runs locally in Colab using Hugging Face models.
* ✅ **Instruction-tuned model**: `CodeLlama-7b-Instruct` understands user instructions well and produces high-quality, structured Python code.
* ✅ **Aligned with real-world tools**: The `transformers` library is industry-standard for LLMs, and learning it provides long-term value.
* ✅ **Simplified workflow**: Using the `pipeline` interface streamlines tokenization, inference, and decoding, making the code more readable and maintainable.

---

By combining a causal language model with this toolkit, we're able to build a responsive, context-aware code generation system — perfect for automating multi-step Python development tasks like function creation, documentation, and testing.


In [9]:
import os
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load CodeLlama model
model_name = "codellama/CodeLlama-7b-Instruct-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

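def ask_llama(prompt, max_tokens=512):
    # return_full_text=False drops the prompt from the output, so later steps
    # receive only the newly generated text instead of prompt + completion.
    result = pipe(prompt, max_new_tokens=max_tokens, do_sample=True, temperature=0.7,
                  return_full_text=False)[0]['generated_text']
    return result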

# Step 1: Ask the user and generate a basic function
user_input = input("What function do you want to create? ")
step1_prompt = f"""### Instruction:
Write a Python function that does the following:
{user_input}
Only provide code output.

### Response:\n"""
basic_function = ask_llama(step1_prompt)
print("Generated Code:\n", basic_function)

# Step 2: Add documentation
step2_prompt = f"""### Instruction:
Add full documentation to the following Python function. Include:
- Function description
- Parameter descriptions
- Return value description
- Example usage
- Edge cases

### Code:
{basic_function}

### Response:\n"""
documented_function = ask_llama(step2_prompt)
print("Documented Code:\n", documented_function)

# Step 3: Generate unit tests
step3_prompt = f"""### Instruction:
Write Python `unittest` test cases for the following function. Cover:
- Basic functionality
- Edge cases
- Error cases
- Various input scenarios

### Code:
{documented_function}

### Response:\n"""
test_code = ask_llama(step3_prompt)
print("Test Cases:\n", test_code)

# Step 4: Save to a file
with open("final_function.py", "w") as f:
    f.write(documented_function)
    f.write("\n\n")
    f.write(test_code)


In [8]:
import torch
print(torch.cuda.is_available())  # True only when a GPU runtime is attached; False means CPU-only


False




### ❌ Why the First Approach (Code LLaMA on CPU) Failed

The first attempt used the **CodeLLaMA-7B-Instruct** model from Hugging Face, which is a powerful but **large open-source model** (\~7 billion parameters, or roughly 13 GB of weights at 16-bit precision). This approach failed primarily due to:

* **No GPU support**: Running large models like CodeLLaMA on CPU is **extremely slow and memory-intensive**, often leading to crashes or indefinite execution times.
* **Colab Free tier limitations**: Without access to GPU, even downloading and loading the model into memory can take 10–20+ minutes or fail outright.
* **High RAM & VRAM requirements**: CodeLLaMA needs \~12–16 GB of VRAM to run smoothly, which a CPU-only environment can’t provide.

> ⚠️ Large transformer models are designed with parallel GPU computation in mind. CPUs are simply too slow and memory-limited for efficient inference.
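
The memory figures follow from simple arithmetic: parameter count times bytes per parameter. The helper below is an illustrative sketch (`weight_memory_gib` is not a library function) and counts only the weights; activations and the KV cache need additional memory on top.

```python
def weight_memory_gib(num_parameters: float, bytes_per_parameter: int) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_parameters * bytes_per_parameter / 1024**3


# A 7-billion-parameter model at three common precisions.
for label, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{label}: {weight_memory_gib(7e9, nbytes):.1f} GiB")
# float32: 26.1 GiB
# float16: 13.0 GiB
# int8: 6.5 GiB
```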

---

### ✅ Why Using OpenAI Worked

Switching to **OpenAI's GPT models** (via their API) resolved these issues:

* **Model inference runs on OpenAI's high-performance GPUs**, not your local machine — removing hardware limitations.
* **Fast and reliable**: GPT-3.5 and GPT-4 are highly optimized, returning responses within seconds.
* **No model loading required**: You don’t need to download, cache, or manage large models.
* **Production-grade quality**: The outputs are consistently clean, formatted, and instruction-following.

---

### 🎯 Conclusion

| Approach               | Outcome                             | Notes                           |
| ---------------------- | ----------------------------------- | ------------------------------- |
| CodeLLaMA (local, CPU) | ❌ Failed (too slow, resource-heavy) | Needs GPU for practical use     |
| OpenAI (via API)       | ✅ Success (fast, reliable)          | Ideal for CPU-only environments |

**Using OpenAI is the better choice when:**

* You're on a CPU-only setup
* You want quick results without infrastructure setup
* You need production-quality output



In [10]:
from dotenv import load_dotenv
import litellm

# Load the OpenAI API key from the .env file.
load_dotenv("/content/API_KEYS.env", override=True)

def call_openai(prompt):
    response = litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return response['choices'][0]['message']['content']


# Step 1: Ask the user and generate a basic function ==========
user_input = input("What function do you want to create? ")
prompt1 = f"Write a Python function for: {user_input}"
basic_function = call_openai(prompt1)
print("Generated Code:\n", basic_function)


# Step 2: Ask for documentation =================
prompt2 = f"""Add full documentation to this Python function including:
- Description
- Parameters
- Return value
- Example usage
- Edge cases

Here is the code:
{basic_function}
"""
documented_function = call_openai(prompt2)
print("Documented Code:\n", documented_function)


# Step 3: Generate unit tests ===============
prompt3 = f"""Write unittests for the following Python function. Cover:
- Basic functionality
- Edge cases
- Error handling

Here is the code:
{documented_function}
"""
test_code = call_openai(prompt3)
print("Test Cases:\n", test_code)


# Step 4: Save everything ==============
with open("final_function.py", "w") as f:
    f.write(documented_function)
    f.write("\n\n")
    f.write(test_code)



What function do you want to create? can you write a function to calculate the square root of a number?
Generated Code:
 Certainly! Here is a Python function to calculate the square root of a number:

```python
def square_root(num):
    return num ** 0.5

# Test the function
number = 25
result = square_root(number)
print(f"The square root of {number} is {result}")
```

You can use this `square_root` function by passing a number as an argument to get its square root.


  PydanticSerializationUnexpectedValue(Expected 9 fields but got 6: Expected `Message` - serialized value may not be as expected [input_value=Message(content='Certainl...: None}, annotations=[]), input_type=Message])
  PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])
  return self.__pydantic_serializer__.to_python(


Documented Code:
 """
Description:
This function takes a number as input and calculates its square root using the exponentiation operator (**).

Parameters:
- num: A numerical value for which the square root needs to be calculated.

Return value:
The function returns the square root of the input number as a float.

Example usage:
number = 25
result = square_root(number)
print(f"The square root of {number} is {result}")

This will output:
The square root of 25 is 5.0

Edge cases:
- Passing a negative number will result in a ValueError as the square root of a negative number is not a real number.
- If the input is a decimal number, the function will still calculate the square root, but the result may not be exact due to floating-point precision.
"""




Test Cases:
 import unittest

def square_root(num):
    if num < 0:
        raise ValueError("Cannot calculate square root of a negative number")
    return num ** 0.5

class TestSquareRootFunction(unittest.TestCase):

    def test_basic_functionality(self):
        self.assertEqual(square_root(25), 5.0)
        self.assertEqual(square_root(16), 4.0)
        self.assertEqual(square_root(0), 0.0)

    def test_edge_cases(self):
        with self.assertRaises(ValueError):
            square_root(-1)
        self.assertAlmostEqual(square_root(2), 1.41421356237, places=10)
        self.assertAlmostEqual(square_root(10.5), 3.2403703492, places=10)

    def test_error_handling(self):
        with self.assertRaises(TypeError):
            square_root("abc")
        with self.assertRaises(TypeError):
            square_root([25])
        with self.assertRaises(TypeError):
            square_root(None)

if __name__ == '__main__':
    unittest.main()




In [11]:

def ask_openai(prompt, model="gpt-3.5-turbo"):
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )
    return response['choices'][0]['message']['content']

# Step 1: Ask the user what function to create
user_input = "Create a function to sort a list of dictionaries by a specific key."

print(f"\n🧑 User request: {user_input}")
prompt1 = f"Write a Python function for this task: {user_input}"
basic_function = ask_openai(prompt1)
print("\n🛠️ Generated Function:\n", basic_function)

# Step 2: Add documentation
prompt2 = f"""Add full documentation to the following Python function. Include:
- Description
- Parameters
- Return value
- Example usage
- Edge cases

Function:
{basic_function}
"""
documented_function = ask_openai(prompt2)
print("\n📘 Documented Code:\n", documented_function)

# Step 3: Add unit tests
prompt3 = f"""Write unit tests for this function using Python's unittest framework.
Cover:
- Sorting by string keys
- Sorting by numeric keys
- Missing key edge cases
- Error handling

Function:
{documented_function}
"""
test_code = ask_openai(prompt3)
print("\n✅ Unit Tests:\n", test_code)

# Step 4: Save to file
with open("final_function.py", "w") as f:
    f.write(documented_function)
    f.write("\n\n")
    f.write(test_code)

print("\n💾 Saved to final_function.py")



🧑 User request: Create a function to sort a list of dictionaries by a specific key.

🛠️ Generated Function:
 Here is a Python function that sorts a list of dictionaries by a specific key:

```python
def sort_dict_list(dict_list, key):
    return sorted(dict_list, key=lambda x: x[key])

# Example usage
list_of_dicts = [{'name': 'Alice', 'age': 25},
                 {'name': 'Bob', 'age': 30},
                 {'name': 'Charlie', 'age': 20}]

sorted_list = sort_dict_list(list_of_dicts, 'age')
print(sorted_list)
```

This function takes a list of dictionaries `dict_list` and a key `key` to sort the list of dictionaries by. The `sorted()` function is then used with a lambda function as the `key` to extract the specific key from each dictionary for sorting.





📘 Documented Code:
 Description:
This function sorts a list of dictionaries by a specific key.

Parameters:
- dict_list: a list of dictionaries to be sorted
- key: the key by which the list of dictionaries should be sorted

Return value:
The function returns the sorted list of dictionaries.

Example usage:
```
list_of_dicts = [{'name': 'Alice', 'age': 25},
                 {'name': 'Bob', 'age': 30},
                 {'name': 'Charlie', 'age': 20}]

sorted_list = sort_dict_list(list_of_dicts, 'age')
print(sorted_list)
```

Edge cases:
- If `dict_list` is empty, the function will return an empty list since there is nothing to sort.
- If `key` does not exist in any of the dictionaries in `dict_list`, the function may raise a KeyError.





✅ Unit Tests:
 Here are the unit tests for the provided function using Python's unittest framework:

```python
import unittest
from sort_dict_list import sort_dict_list

class TestSortDictList(unittest.TestCase):

    def test_sort_by_string_keys(self):
        dict_list = [{'name': 'Alice', 'age': 25},
                     {'name': 'Bob', 'age': 30},
                     {'name': 'Charlie', 'age': 20}]
        sorted_list = sort_dict_list(dict_list, 'name')
        self.assertEqual(sorted_list, [{'name': 'Alice', 'age': 25},
                                       {'name': 'Bob', 'age': 30},
                                       {'name': 'Charlie', 'age': 20}])

    def test_sort_by_numeric_keys(self):
        dict_list = [{'name': 'Alice', 'age': 25},
                     {'name': 'Bob', 'age': 30},
                     {'name': 'Charlie', 'age': 20}]
        sorted_list = sort_dict_list(dict_list, 'age')
        self.assertEqual(sorted_list, [{'name': 'Charlie', 'age': 20},
       





## ✅ What This Output Demonstrates

### 🧠 Quasi-Agent Functionality in Action

| Step                | What Happened                                   | Success                         |
| ------------------- | ----------------------------------------------- | ------------------------------- |
| **User Prompt**     | You asked for a sorting function                | ✅ Interpreted correctly         |
| **Code Generation** | The agent created a clean, Pythonic function    | ✅ Uses `sorted()` with a lambda |
| **Documentation**   | Added real doc-style formatting with edge cases | ✅ Includes examples & caveats   |
| **Test Suite**      | Generated `unittest` tests for 4 cases          | ✅ Covers normal + edge cases    |
| **Output Handling** | Saved to file, clear terminal output            | ✅ ✅ ✅                           |

---

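## ⚠️ About the Pydantic Warnings

The `PydanticSerializationUnexpectedValue` messages in the output are LiteLLM + Pydantic warnings and are **cosmetic**: they don’t affect the generated text. You can safely ignore or suppress them. Note that the filter below hides every `UserWarning`, not only these: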

```python
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
```
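
If hiding every `UserWarning` feels too broad, the filter can be narrowed by message. This sketch assumes the noisy warnings are `UserWarning`s whose text starts with "Pydantic serializer warnings"; check the first line of the warning in your own output and adjust the pattern if it differs. `quiet_pydantic_warnings` is just a hypothetical helper name.

```python
import warnings


def quiet_pydantic_warnings() -> None:
    """Ignore only UserWarnings whose message starts with the given text."""
    warnings.filterwarnings(
        "ignore",
        message="Pydantic serializer warnings",  # assumed prefix; regex matched at the start
        category=UserWarning,
    )


quiet_pydantic_warnings()
```

Other `UserWarning`s, which may point at real problems, still show up.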

---

## 💡 What You’ve Learned (and Practiced)

* How to run a **multi-step LLM workflow**
* How to **preserve context** between steps
* How to **manage output** from generation to testing
* Why it’s a **quasi-agent** (not adaptive, but sequential and helpful)
* How to **use OpenAI's hosted power** to bypass local resource limits

---

## 🛠️ Where You Can Go From Here

### 🧭 Extend This Quasi-Agent

* Add **error detection** (e.g., parse code with `ast`, retry on syntax errors)
* Add **user refinement loop**: "Do you want to change the function?"
* Add support for **multiple function types** in a menu

### 🤖 Turn Into a Real Agent

* Add tool use: execute the generated code and return the result
* Track memory, goals, and retry logic (e.g., with LangChain Agents or ReAct loop)
* Dynamically modify its own behavior based on output or errors
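
As a first step toward tool use, the generated code can be executed in a separate Python process and its output (or traceback) captured. This is a minimal sketch with a hypothetical helper name, `run_generated_code`; a subprocess isolates crashes and enforces a timeout, but it is **not** a security sandbox, so only run code you are willing to execute on your machine.

```python
import subprocess
import sys
from typing import Tuple


def run_generated_code(code: str, timeout_seconds: float = 10.0) -> Tuple[bool, str]:
    """Run `code` in a separate Python process; return (succeeded, output)."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],  # same interpreter as the notebook
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_seconds} seconds"
    if completed.returncode != 0:
        # The traceback is what an agent would feed back to the LLM for a retry.
        return False, completed.stderr.strip()
    return True, completed.stdout.strip()


print(run_generated_code("print(2 + 3)"))
# (True, '5')
```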






## ✅ What Are Unit Tests?

**Unit tests** are small, automated tests that verify that a **single "unit" of code** (usually a function or method) works as expected.

> A unit test answers: *“Given this input, does the function return the correct output?”*

They're like a safety net — they make sure your function still works even if other parts of your code change later.

---

## 🧪 Example

Say you have this function:

```python
def add(a, b):
    return a + b
```

A unit test for it might look like:

```python
import unittest

class TestAddFunction(unittest.TestCase):
    def test_add_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_add_negative_numbers(self):
        self.assertEqual(add(-1, -1), -2)
```

These tests run automatically and alert you if something breaks — much better than manually checking results.

---

## 🧠 How to Design Good Unit Tests

Designing tests *is* a bit of an art: it requires thinking like both a coder and a skeptic. Here's a solid checklist:

### 1. **Test Expected Behavior (Happy Path)**

Test that the function works when given normal, valid input.

> ✅ *Does it work for common use cases?*

### 2. **Test Edge Cases**

Test extreme or unusual values.

> ✅ *What happens with 0, empty lists, or very large numbers?*

### 3. **Test Error Handling**

Deliberately feed it bad inputs — wrong types, missing keys, etc.

> ✅ *Does it raise the right exceptions?*

### 4. **Test Performance Assumptions (Optional)**

Test with large inputs or time-sensitive code.

---

## 🔍 For Your Example (Sort Dictionary List)

You were testing:

| Case            | Example                     | Why it matters                              |
| --------------- | --------------------------- | ------------------------------------------- |
| ✅ Basic sorting | Sort by `'age'` or `'name'` | Proves the core logic works                 |
| ✅ Edge case     | Empty list                  | Should return empty list, not crash         |
| ✅ Missing key   | One dict has no `'age'`     | Should raise a `KeyError`                   |
| ✅ Wrong type    | Input isn't a list          | Should raise `TypeError` or fail gracefully |

This shows you understand what could go wrong — and **build confidence** that your function behaves well.
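
The four cases in the table can be written out as follows. This sketch copies the `sort_dict_list` function the model generated earlier in the notebook; the test names and sample data are illustrative. The suite is run through a `TextTestRunner` so it also works inside a notebook cell.

```python
import unittest


def sort_dict_list(dict_list, key):
    # Same function the model generated earlier in this notebook.
    return sorted(dict_list, key=lambda x: x[key])


class TestSortDictList(unittest.TestCase):
    def setUp(self):
        # Deliberately unsorted, so a no-op implementation would fail.
        self.people = [
            {"name": "Bob", "age": 30},
            {"name": "Alice", "age": 25},
            {"name": "Charlie", "age": 20},
        ]

    def test_basic_sorting(self):
        ages = [p["age"] for p in sort_dict_list(self.people, "age")]
        self.assertEqual(ages, [20, 25, 30])
        names = [p["name"] for p in sort_dict_list(self.people, "name")]
        self.assertEqual(names, ["Alice", "Bob", "Charlie"])

    def test_empty_list(self):
        self.assertEqual(sort_dict_list([], "age"), [])

    def test_missing_key(self):
        with self.assertRaises(KeyError):
            sort_dict_list(self.people + [{"name": "Dana"}], "age")

    def test_wrong_type(self):
        with self.assertRaises(TypeError):
            sort_dict_list(42, "age")  # an int is not iterable


# Load and run the suite explicitly instead of calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSortDictList)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```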

---

## 🛠️ Tools Used

In Python, the standard tool is:

```python
import unittest
```

But many developers also use:

* `pytest` (more flexible, readable syntax)
* `doctest` (tests in docstrings)
* Custom test runners (for CI/CD)
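
For comparison, here is the same `add` function tested with `doctest`: the examples live in the docstring and double as documentation. The call at the bottom checks only this one function and prints nothing when every example passes.

```python
import doctest


def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add(-1, -1)
    -2
    """
    return a + b


# Run the examples in add's docstring; silent on success, prints a report on failure.
doctest.run_docstring_examples(add, {"add": add}, name="add")
```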

---

## 🔁 Why Unit Testing Is Powerful

* Catches bugs early
* Documents what your function *should* do
* Makes refactoring safe
* Builds trust in your code





## 🧠 What is `LiteLLM`?

`LiteLLM` is a lightweight Python wrapper that gives you a **unified interface** to talk to different large language models (LLMs), including:

* ✅ OpenAI (`gpt-3.5-turbo`, `gpt-4`)
* ✅ Anthropic (`Claude`)
* ✅ Hugging Face Inference models
* ✅ Azure OpenAI
* ✅ Google Gemini (Google AI Studio or Vertex AI)
* ✅ Local models (optionally)

> Basically, it lets you **write one prompt call**, and switch LLMs behind the scenes without rewriting your whole app.

---

### 📞 What is `completion`?

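The function `completion(...)` is **LiteLLM's version of a chat-style API call**. It’s similar to OpenAI's chat completions call (`openai.ChatCompletion.create(...)` in the pre-1.0 SDK, `client.chat.completions.create(...)` in later versions) but: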

* Works across providers
* Has a more consistent response structure
* Is easier to swap models or tune behaviors

---

### 🧪 Example

```python
from litellm import completion

response = completion(
    model="openai/gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
# Output: "The capital of France is Paris."
```

It returns an object with `.choices[0].message.content`, just like OpenAI's API — so it’s mostly a drop-in replacement.

---

### ✅ Why Use LiteLLM?

| Benefit           | Description                                                      |
| ----------------- | ---------------------------------------------------------------- |
| 🌐 Multi-provider | Talk to OpenAI, Anthropic, Hugging Face, etc. with the same code |
| 🔁 Model swapping | Change from GPT-4 to Claude or Mistral in one line               |
| 🧪 Simplified API | Easier to test, consistent structure                             |
| 💰 Cost tracking  | Optional built-in logging + pricing                              |


---

### ✅ LiteLLM is perfect for **prototyping** when:

* You know **what kind of output** you want (e.g., generate code, summarize text, write SQL).
* You want to **experiment quickly** across different LLMs (OpenAI, Claude, Mistral, Hugging Face).
* You care about **performance, cost, or latency** and want to compare models easily.
* You’re building a **lean script or tool** and don’t want to pull in a full agent framework like LangChain.

---

### 🧪 Example Use Case

Let’s say you want to see which model writes the better function for the same prompt:

```python
from litellm import completion

messages = [{"role": "user", "content": "Write a Python function to reverse a string."}]

# Try GPT-4
response = completion(model="openai/gpt-4", messages=messages)
print("GPT-4:", response.choices[0].message.content)

# Try Claude 3
response = completion(model="anthropic/claude-3-opus-20240229", messages=messages)
print("Claude 3:", response.choices[0].message.content)

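# Try Mistral (LiteLLM model names need a provider prefix; "huggingface/" is an assumption here)
response = completion(model="huggingface/mistralai/Mistral-7B-Instruct-v0.2", messages=messages)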
print("Mistral:", response.choices[0].message.content)
```

> With just one line (`model="..."`), you swap providers — **no need to rewrite your whole prompt structure**.
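
When comparing more than two or three models, a small loop keeps the script tidy and stops one failing provider from aborting the whole run. This is a sketch under stated assumptions: `compare_models` is a hypothetical helper, and `fake_completion` is a stand-in that mimics the `.choices[0].message.content` shape so the example runs without API keys.

```python
from types import SimpleNamespace
from typing import Callable, Dict, List


def compare_models(
    completion_fn: Callable, models: List[str], messages: List[Dict[str, str]]
) -> Dict[str, str]:
    """Send the same messages to each model; collect replies or error text."""
    replies: Dict[str, str] = {}
    for model in models:
        try:
            response = completion_fn(model=model, messages=messages)
            replies[model] = response.choices[0].message.content
        except Exception as err:  # e.g. a missing API key for one provider
            replies[model] = f"ERROR: {err}"
    return replies


def fake_completion(model: str, messages: List[Dict[str, str]]):
    """Stand-in for a real completion call; returns a canned reply."""
    if model == "broken/model":
        raise RuntimeError("no API key configured")
    text = f"[{model}] reply to: {messages[-1]['content']}"
    message = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


results = compare_models(
    fake_completion,
    ["model-a", "broken/model"],
    [{"role": "user", "content": "Reverse a string."}],
)
for model, reply in results.items():
    print(f"{model}: {reply}")
```

With real credentials in place, passing `litellm.completion` as `completion_fn` and a list of real model names should work the same way.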

---

### 🔁 Once You Decide…

Once you find the best model for your use case:

* You can hardcode it
* Or migrate to a larger framework (like LangChain) if you need memory, agents, retrieval, etc.

---

### 🧠 TL;DR

> **LiteLLM = Rapid A/B testing + model swapping with minimal setup.**

Perfect for:

* Hackathons
* Prototyping
* Internal tools
* Developer testing
* Cost comparisons





In [12]:

from litellm import completion
from typing import List, Dict

def generate_response(messages: List[Dict]) -> str:
   """Call LLM to get response"""
   response = completion(
      model="openai/gpt-4",
      messages=messages, # Pass the full conversation (acts as context)
      max_tokens=1024    # Limit the response length to avoid overflow
   )
   return response.choices[0].message.content  # Return only the generated text (not metadata or token counts)

def extract_code_block(response: str) -> str:
    """
    Extracts the first code block from an LLM response.

    This function assumes the LLM output is in markdown-style format using triple backticks (```).
    It pulls out the code portion — and removes any optional language tag like 'python'.

    Parameters:
    - response (str): The raw text output from the LLM.

    Returns:
    - str: The extracted code block, or the original string if no code block is found.
    """

    # If there's no triple backtick in the response, just return the full response
    if '```' not in response:
        return response

    # Split the response by backticks and grab the first code block after the opening ```
    code_block = response.split('```')[1].strip()

    # If the code block starts with "python", remove it (it's a formatting tag, not actual code)
    if code_block.startswith("python"):
        code_block = code_block[6:].lstrip()  # Drop the "python" tag and the newline that follows it

    return code_block


def develop_custom_function():
   # Get user input for function description
   print("\nWhat kind of function would you like to create?")
   print("Example: 'A function that calculates the factorial of a number'")
   print("Your description: ", end='')
   function_description = input().strip()

   # Initialize conversation with system prompt
   messages = [
      {"role": "system", "content": "You are a Python expert helping to develop a function."}
   ]

   # First prompt - Basic function
   messages.append({
      "role": "user",
      "content": f"Write a Python function that {function_description}. Output the function in a ```python code block```."
   })
   initial_function = generate_response(messages)

   # Parse the response to get the function code
   initial_function = extract_code_block(initial_function)

   print("\n=== Initial Function ===")
   print(initial_function)

   # Add assistant's response to conversation
   # Note: the model's commentary is deliberately dropped and only the code is kept, so the
   # conversation history looks as if the model always outputs just code.
   messages.append({"role": "assistant", "content": "```python\n\n"+initial_function+"\n\n```"})

   # Second prompt - Add documentation
   messages.append({
      "role": "user",
      "content": "Add comprehensive documentation to this function, including description, parameters, "
                 "return value, examples, and edge cases. Output the function in a ```python code block```."
   })
   documented_function = generate_response(messages)
   documented_function = extract_code_block(documented_function)
   print("\n=== Documented Function ===")
   print(documented_function)

   # Add documentation response to conversation
   messages.append({"role": "assistant", "content": "```python\n\n"+documented_function+"\n\n```"})

   # Third prompt - Add test cases
   messages.append({
      "role": "user",
      "content": "Add unittest test cases for this function, including tests for basic functionality, "
                 "edge cases, error cases, and various input scenarios. Output the code in a ```python code block```."
   })
   test_cases = generate_response(messages)
   # This step can fail unpredictably, depending on whether the model outputs JUST the test cases
   # or the test cases AND the code. This is the type of issue we will learn to work through with agents in the course.
   test_cases = extract_code_block(test_cases)
   print("\n=== Test Cases ===")
   print(test_cases)

   # Generate filename from function description
   filename = function_description.lower()
   filename = ''.join(c for c in filename if c.isalnum() or c.isspace())
   filename = filename.replace(' ', '_')[:30] + '.py'

   # Save final version
   with open(filename, 'w') as f:
      f.write(documented_function + '\n\n' + test_cases)

   return documented_function, test_cases, filename

if __name__ == "__main__":
   function_code, tests, filename = develop_custom_function()
   print(f"\nFinal code has been saved to {filename}")


What kind of function would you like to create?
Example: 'A function that calculates the factorial of a number'
Your description: Create a function to return the fibonnaci sequence

=== Initial Function ===

def fibonacci(n):
    fib_seq = []
    a, b = 0, 1
    while a < n:
        fib_seq.append(a)
        a, b = b, a+b
    return fib_seq





=== Documented Function ===

def fibonacci(n):
    """
    Function to generate a sequential list of Fibonacci numbers up to a specified value.

    Parameters:
    n (int): A positive integer. The function generates the Fibonacci sequence up to this number.

    Returns:
    fib_seq (list): A list of integers where each item is a Fibonacci number, generated up to (but not exceeding) 
    the input parameter n.

    Example:
      - If the function is called with n=10, the output will be [0, 1, 1, 2, 3, 5, 8], 
      as these are the Fibonacci numbers up to 10.
      - Calling the function with n=1 will output [0, 1, 1].

    Edge Cases:
      - Calling the function with n less than or equal to 0 will result in an empty list 
      because the Fibonacci sequence starts with 0 and thus cannot generate numbers up to 
      a non-positive number.
    """
    fib_seq = []
    a, b = 0, 1
    while a < n:
        fib_seq.append(a)
        a, b = b, a+b
    return fib_seq





=== Test Cases ===

import unittest

class TestFibonacci(unittest.TestCase):
    def test_functionality(self):
        self.assertEqual(fibonacci(10), [0, 1, 1, 2, 3, 5, 8])
        self.assertEqual(fibonacci(6), [0, 1, 1, 2, 3, 5])
        
    def test_edge_cases(self):
        self.assertEqual(fibonacci(0), [])
        self.assertEqual(fibonacci(1), [0,1,1])
        
    def test_error_cases(self):
        with self.assertRaises(TypeError):
            fibonacci("10")
            
    def test_various_inputs(self):
        self.assertEqual(fibonacci(50), [0, 1, 1, 2, 3, 5, 8, 13, 21, 34])
        self.assertEqual(fibonacci(100), [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89])

if __name__ == '__main__':
    unittest.main()

Final code has been saved to create_a_function_to_return_th.py
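
The run above shows the quasi-agent's blind spot: it saves whatever the model returns without checking it. The generated docstring and tests both claim `fibonacci(1)` returns `[0, 1, 1]`, but the loop condition `a < n` stops after the first element. Below is a minimal sketch of the kind of check a fuller agent could run before saving. The helper names `is_valid_python` and `find_mismatches` are hypothetical and not part of the project; the function source is copied from the output above.

```python
import ast

# Source of the generated function, copied from the run above
GENERATED_SOURCE = '''
def fibonacci(n):
    fib_seq = []
    a, b = 0, 1
    while a < n:
        fib_seq.append(a)
        a, b = b, a + b
    return fib_seq
'''

# Input/output pairs claimed by the generated docstring and tests
DOCUMENTED_EXAMPLES = {10: [0, 1, 1, 2, 3, 5, 8], 1: [0, 1, 1]}


def is_valid_python(source: str) -> bool:
    """Return True if the source parses as Python (hypothetical helper)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def find_mismatches(source: str, func_name: str, examples: dict) -> dict:
    """Run the generated function and collect inputs whose output differs from the claim."""
    namespace = {}
    # exec runs arbitrary code: only use it on trusted code or inside a sandbox
    exec(source, namespace)
    func = namespace[func_name]
    return {arg: func(arg) for arg, expected in examples.items() if func(arg) != expected}


print(is_valid_python(GENERATED_SOURCE))  # True
print(find_mismatches(GENERATED_SOURCE, "fibonacci", DOCUMENTED_EXAMPLES))  # {1: [0]}
```

A non-empty mismatch dictionary is exactly the signal an agent could feed back into another generation step instead of writing the file.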


**ReAct** is one of the most influential patterns in modern agent design. Despite the name, it has nothing to do with React, the JavaScript library.

---

## 🧠 What is ReAct?

**ReAct** stands for:

> **Reasoning + Acting**

It's a prompting and architecture strategy that enables language models to:

1. **Think step-by-step** (Reason)
2. **Take actions** using tools (Act)
3. **Observe and continue** based on the tool's results (the Observation step)

It was introduced in a 2022 research paper from Google and Princeton:
📄 *ReAct: Synergizing Reasoning and Acting in Language Models*

---

### 🧩 Why Is It Important?

On its own, an LLM can only generate *text*. But agents often need to:

* Query a database
* Call an API
* Run a calculator
* Read a file
* Retry something if it fails

**ReAct gives the LLM a structure to "decide" when to take actions**, and how to reason before or after.
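
In practice, that structure is usually just a prompt that lists the available tools and spells out the expected output format. Here is a minimal sketch, assuming a hypothetical `build_react_prompt` helper and made-up tool descriptions; real frameworks use their own wording.

```python
def build_react_prompt(tools: dict[str, str]) -> str:
    """Build a system prompt describing the tools and the Thought/Action/Observation format.

    `tools` maps a tool name to a one-line description (both are illustrative).
    """
    tool_lines = "\n".join(f"- {name}: {description}" for name, description in tools.items())
    return (
        "Answer the question using the tools below.\n"
        f"Tools:\n{tool_lines}\n\n"
        "Use this format:\n"
        "Thought: your reasoning about what to do next\n"
        'Action: tool_name("input")\n'
        "Observation: the tool result (this will be provided to you)\n"
        "Repeat Thought/Action/Observation as needed, then finish with:\n"
        "Answer: the final answer"
    )


prompt = build_react_prompt({
    "search": "Look up facts on the web",
    "calculator": "Evaluate arithmetic expressions",
})
print(prompt)
```

The model never calls a tool itself. It only emits an `Action:` line, and the surrounding program decides whether and how to execute it.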

---

### 🧪 ReAct Workflow (Simplified)

Here’s what it looks like under the hood:

```
User: What's the population of France + Germany combined?

Agent:
Thought: I need to look up the populations of both countries.
Action: search("population of France")

Observation: France has 67 million people.

Thought: Now I’ll get Germany's population.
Action: search("population of Germany")

Observation: Germany has 83 million people.

Thought: 67 + 83 = 150
Answer: The combined population is 150 million.
```

Here, the LLM isn’t just answering directly — it’s using **"Thought → Action → Observation → Thought → Answer"** as a loop.
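
The trace above can be reproduced with a small hand-written loop. This is a sketch under stated assumptions, not a production agent: `fake_llm` replays scripted replies instead of calling a real model, `search` reads from a hard-coded table instead of the web, and all names are hypothetical.

```python
import re

# Hard-coded lookup table standing in for a real search tool (assumption)
FAKE_SEARCH_RESULTS = {
    "population of France": "France has 67 million people.",
    "population of Germany": "Germany has 83 million people.",
}


def search(query: str) -> str:
    return FAKE_SEARCH_RESULTS.get(query, "No result found.")


TOOLS = {"search": search}

# Scripted replies standing in for real LLM output (assumption)
SCRIPTED_REPLIES = [
    'Thought: I need to look up the populations of both countries.\n'
    'Action: search("population of France")',
    'Thought: Now I will get the population of Germany.\n'
    'Action: search("population of Germany")',
    "Thought: 67 + 83 = 150\nAnswer: The combined population is 150 million.",
]


def fake_llm(transcript: str, step: int) -> str:
    """Return the scripted reply for this step; a real agent would send `transcript` to a model."""
    return SCRIPTED_REPLIES[min(step, len(SCRIPTED_REPLIES) - 1)]


ACTION_RE = re.compile(r'Action:\s*(\w+)\("([^"]*)"\)')
ANSWER_RE = re.compile(r"Answer:\s*(.+)")


def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"User: {question}\n"
    for step in range(max_steps):
        reply = fake_llm(transcript, step)
        transcript += reply + "\n"

        # Stop as soon as the model commits to a final answer
        answer = ANSWER_RE.search(reply)
        if answer:
            return answer.group(1).strip()

        # Otherwise parse the requested action and run the matching tool
        action = ACTION_RE.search(reply)
        if action is None:
            transcript += "Observation: Could not parse an action.\n"
            continue
        tool_name, tool_input = action.groups()
        tool = TOOLS.get(tool_name)
        observation = tool(tool_input) if tool else f"Unknown tool: {tool_name}"

        # Feed the result back so the next step can reason about it
        transcript += f"Observation: {observation}\n"
    return "No answer within the step limit."


print(react_loop("What's the population of France + Germany combined?"))
```

The `max_steps` cap matters with a real model: without it, a model that never emits `Answer:` would loop forever.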

---

### 🤖 Where is ReAct used?

| Platform            | ReAct Support                                |
| ------------------- | -------------------------------------------- |
| 🔗 LangChain        | ✅ Yes (built-in agent framework)             |
| 🔁 AutoGPT          | ✅ Sort of — more goal/task-based             |
| 🧠 OpenAI Functions | ✅ Possible, with structured prompts          |
| 🧪 Custom agents    | ✅ You can build your own ReAct loop manually |

---

### 🔧 How Do You Use It?

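In LangChain's legacy agent API, you can build a ReAct agent like this (`initialize_agent` is deprecated in newer releases, and import paths vary by version):

```python
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI  # legacy import path; requires an OpenAI API key

# Placeholder tools (hypothetical); swap in real implementations
def my_calculator(expression: str) -> str:
    raise NotImplementedError("plug in a real calculator")

def google_search(query: str) -> str:
    raise NotImplementedError("plug in a real search client")

# Define tools the agent can use
tools = [
    Tool(name="Calculator", func=my_calculator, description="Useful for math"),
    Tool(name="Search", func=google_search, description="Useful for finding info"),
]

agent = initialize_agent(tools, OpenAI(), agent="zero-shot-react-description")

agent.run("How many more people live in Japan than in Canada?")
```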

The model will:

* Decide when to use tools
* Reflect on tool outputs
* Continue reasoning until it’s ready to answer

---

### 🧠 TL;DR

> **ReAct** is a powerful prompting pattern that helps LLMs reason and act in cycles — enabling true multi-step agents.

