# Converting a Jupyter Notebook to a Python Project

## Project Setup

Start a new project:

In [None]:
!uv init
!uv add openai requests pydantic-ai minsearch python-frontmatter

Get the docs.py file from week 1:

In [1]:
!wget https://github.com/alexeygrigorev/ai-bootcamp-codespace/raw/refs/heads/main/week1/docs.py

--2025-10-21 11:55:43--  https://github.com/alexeygrigorev/ai-bootcamp-codespace/raw/refs/heads/main/week1/docs.py
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/alexeygrigorev/ai-bootcamp-codespace/refs/heads/main/week1/docs.py [following]
--2025-10-21 11:55:44--  https://raw.githubusercontent.com/alexeygrigorev/ai-bootcamp-codespace/refs/heads/main/week1/docs.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8000::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8583 (8.4K) [text/plain]
Saving to: ‘docs.py’


2025-10-21 11:55:44 (652 KB/s) - ‘docs.py’ saved [8583/8583]



## Search Tools Implementation

Now create the search tools. This is our search_tools.py:

In [None]:
from minsearch import Index
from typing import Any, Dict, List

import docs


class SearchTools:

    def __init__(self, index: Index):
        self.index = index

    def search(self, query: str) -> List[Dict[str, Any]]:
        """
        Search the index for documents matching the given query.

        Args:
            query (str): The search query string.

        Returns:
            A list of search results
        """
        return self.index.search(
            query=query,
            num_results=5,
        )
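
Because `SearchTools` only calls `index.search(query=..., num_results=...)`, it can be tried out without building the real index. The sketch below is a minimal illustration under that assumption: `FakeIndex` and `sample_docs` are hypothetical stand-ins (not part of minsearch), and the wrapper is repeated so the snippet runs on its own.

```python
from typing import Any, Dict, List


class FakeIndex:
    """Hypothetical stand-in for minsearch.Index; it only mimics search()."""

    def __init__(self, documents: List[Dict[str, Any]]):
        self.documents = documents

    def search(self, query: str, num_results: int = 10) -> List[Dict[str, Any]]:
        # Naive substring match; the real index ranks results by relevance.
        words = query.lower().split()
        hits = [
            doc for doc in self.documents
            if any(word in doc["content"].lower() for word in words)
        ]
        return hits[:num_results]


class SearchTools:
    """Same wrapper as above, repeated so the sketch is self-contained."""

    def __init__(self, index):
        self.index = index

    def search(self, query: str) -> List[Dict[str, Any]]:
        return self.index.search(query=query, num_results=5)


# Made-up documents for illustration only
sample_docs = [
    {"title": "Evaluation", "content": "How to evaluate LLM outputs"},
    {"title": "Monitoring", "content": "Dashboards and alerts"},
]

tools = SearchTools(index=FakeIndex(sample_docs))
results = tools.search("evaluate")
print([doc["title"] for doc in results])
```

Keeping the wrapper this thin means the same stand-in approach can be reused later when testing the agent without touching the real data.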

Put the index preparation code in a separate function:

In [None]:
def prepare_index():
    github_data = docs.read_github_data()
    parsed_data = docs.parse_data(github_data)
    chunks = docs.chunk_documents(parsed_data)

    index = Index(
        text_fields=["title", "description", "content"]
    )

    index.fit(chunks)
    return index

Add caching to avoid reprocessing data every time:

In [None]:
import pickle
from pathlib import Path


def prepare_index_cached():
    cache_dir = Path(".cache")
    cache_dir.mkdir(exist_ok=True)

    index_path = cache_dir / "search_index.bin"

    if index_path.exists():
        with open(index_path, "rb") as f:
            index = pickle.load(f)
            return index

    index = prepare_index()

    with open(index_path, "wb") as f:
        pickle.dump(index, f)

    return index


Put everything together:

In [None]:
def prepare_search_tools():
    index = prepare_index_cached()
    return SearchTools(index=index)

Here's a summary of what each class/function is doing:

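- **SearchTools class**: A thin wrapper around the index. Its search method returns the top 5 results for a query, and it is the method we later pass to the agent as a tool.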

- **prepare_index()**: Downloads GitHub documentation data, parses it, chunks it into smaller pieces, and creates the index.

- **prepare_index_cached()**: A wrapper around prepare_index() that saves the index to disk. If a cached version exists, it loads that instead of reprocessing everything.

- **prepare_search_tools()**: The main entry point that returns a ready-to-use SearchTools instance.

With **prepare_index_cached**, we don't need to redo the indexing on every run.
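
To see the load-or-build pattern in isolation, here is a small sketch that uses only the standard library. The helper name `load_or_build` and the `expensive_build` stand-in are assumptions made for illustration; they are not part of the project code.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(cache_path: Path, build):
    """Return the cached object if present; otherwise build it and cache it."""
    if cache_path.exists():
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    obj = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(obj, f)
    return obj


build_calls = []


def expensive_build():
    # Hypothetical stand-in for prepare_index(); records each call.
    build_calls.append(1)
    return {"documents": ["a", "b", "c"]}


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / ".cache" / "search_index.bin"
    first = load_or_build(cache_path, expensive_build)   # builds and writes
    second = load_or_build(cache_path, expensive_build)  # reads from disk

print(len(build_calls), first == second)
```

The cache is never invalidated automatically, so after the source documents change, the cached file has to be deleted to force a rebuild. Pickle files should also only be loaded from a location you trust.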

Add .cache to .gitignore, along with other generated directories, so we don't accidentally commit them to git:

```
.cache
.pytest_cache
__pycache__
.venv
```

## Search Agent Implementation

Now implement the search agent (search_agent.py):

In [None]:
from pydantic_ai import Agent
from pydantic_ai.messages import FunctionToolCallEvent
import search_tools


class NamedCallback:

    def __init__(self, agent):
        self.agent_name = agent.name

    async def print_function_calls(self, ctx, event):
        # Detect nested streams
        if hasattr(event, "__aiter__"):
            async for sub in event:
                await self.print_function_calls(ctx, sub)
            return

        if isinstance(event, FunctionToolCallEvent):
            tool_name = event.part.tool_name
            args = event.part.args
            print(f"TOOL CALL ({self.agent_name}): {tool_name}({args})")

    async def __call__(self, ctx, event):
        return await self.print_function_calls(ctx, event)


search_instructions = """

"""

def create_agent():
    tools = search_tools.prepare_search_tools()

    return Agent(
        name="search",
        instructions=search_instructions,
        tools=[tools.search],
        model="gpt-4o-mini",
    )


Here's what each part does:

- **NamedCallback class**: This helps us monitor which function calls the agent makes. We implemented it when creating the deep research agent, so I just took it from there.

- **create_agent()**: Prepares the search tools and returns an agent configured with the search tool, the instructions, and the gpt-4o-mini model.
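
The nested-stream check in the callback can be illustrated without the agent. In the sketch below, `FakeToolCall`, `outer_stream`, `nested_stream`, and `collect_tool_calls` are hypothetical names used only for this example; the handler collects tool calls into a list instead of printing them, but it recurses into nested async iterators in the same way.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for a tool-call event."""
    tool_name: str
    args: dict


async def nested_stream():
    # An async generator has __aiter__, so the handler treats it as a stream.
    yield FakeToolCall("search", {"query": "evaluation"})
    yield "some other event"


async def outer_stream():
    yield FakeToolCall("search", {"query": "llm"})
    yield nested_stream()


async def collect_tool_calls(event, found):
    # Recurse into anything that can be iterated asynchronously.
    if hasattr(event, "__aiter__"):
        async for sub in event:
            await collect_tool_calls(sub, found)
        return

    # Plain events: record tool calls, ignore everything else.
    if isinstance(event, FakeToolCall):
        found.append(f"{event.tool_name}({event.args})")


found = []
asyncio.run(collect_tool_calls(outer_stream(), found))
print(found)
```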

## Main Application

Finally, put everything together in main.py:

In [None]:
import search_agent
import asyncio

agent = search_agent.create_agent()
agent_callback = search_agent.NamedCallback(agent)


async def run_agent(user_prompt: str):

    results = await agent.run(
        user_prompt=user_prompt,
        event_stream_handler=agent_callback
    )

    return results



result = asyncio.run(run_agent("What is LLM evaluation?"))

print(result.output)


This main file brings everything together.

It creates the agent, sets up monitoring with the callback, and provides a simple function to run queries.

The example query asks about LLM evaluation, which should trigger the agent to search the documentation and provide a comprehensive answer based on what it finds.
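
One thing to keep in mind before adding tests: as written, importing main.py runs the query immediately. A possible variation, sketched here under assumptions, moves the call under a `__main__` guard. `StubAgent` and `StubResult` are hypothetical stand-ins so the snippet runs without an API key; in the real file the agent would still come from search_agent.create_agent().

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical result object exposing only the output attribute."""
    output: str


class StubAgent:
    """Hypothetical stand-in for the agent; makes no network calls."""

    async def run(self, user_prompt: str, event_stream_handler=None):
        return StubResult(output=f"stub answer to: {user_prompt}")


agent = StubAgent()


async def run_agent(user_prompt: str):
    return await agent.run(user_prompt=user_prompt)


def main():
    result = asyncio.run(run_agent("What is LLM evaluation?"))
    print(result.output)
    return result


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main()
```

With this layout, a test can import run_agent and call it with its own prompt without triggering the example query.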

Now let's add tests!