# Project Planning

When building full-stack applications, I begin by mapping out a comprehensive overview of each feature to be developed. This holistic, forward-thinking approach helps me anticipate challenges and avoid costly refactoring or rewrites that can result from incomplete planning—a lesson learned from past experience.

For this project, I closely followed the provided outline and specifications, shaping my plan around the following core goals:
- use the right tool for the job
- keep solutions as simple as possible
- prioritize developer experience, maintainability, and scalability

### Main Phases
1. LLM & Ollama Setup
2. Database Setup
3. Backend Setup
4. Frontend Setup
5. Testing and Deployment

### Integration with Cursor

To accelerate development and streamline problem-solving, I leveraged various AI agents through [Cursor](https://www.cursor.com/) and [ChatGPT/Claude](https://t3.chat/). Whenever I encountered challenges—such as refactoring, setting up boilerplate, or making architectural decisions—I consulted these tools. This approach not only helped me overcome obstacles efficiently but also enhanced my skills as a developer, as I used AI as a collaborative assistant rather than a crutch.

# Deployment via Docker

Right from the start, I planned to use Docker because it makes running and sharing apps much easier. I already had some experience with Docker, but I knew that working with several services at once and setting up their network connections could be tricky. Before this project, I hadn’t set up a Docker network in so much detail.

To make things easier, I gave each service a fixed IP address and wrote these into the `docker-compose.yml` file. Doing this early on made it much simpler to connect everything together later. When it was time to link the services, it all worked smoothly.

I also chose to use `docker compose` instead of running each container with `docker run` commands. I find `docker compose` much easier to work with, especially when you need to set up different settings or restart services often.

Giving each service a clear, semantic name also made it easier to reference the services in Docker commands.

Overall, planning ahead with Docker saved me time and made the whole process go more smoothly.

# Base LLM: Gemma 3

At first, it wasn’t clear what the main use of the chat app would be, so I didn’t have a specific goal when picking the language model. Because of this, I chose a model that is good at general knowledge, reasoning, and coding. I also wanted a model that is fast and doesn’t take up too much space, since this makes development and testing easier. I avoided models that focus only on reasoning: they generate a chain of reasoning before answering, which makes them slower to respond, and speed was more important for this project.

To help me decide, I checked benchmark leaderboards like [WebDevArena](https://web.lmarena.ai/leaderboard) and the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#). After looking at the options, I picked `gemma3:1b`, a 1-billion-parameter model. It’s new, performs well, and is small enough to run easily on most computers. I also like Google’s `Gemini 2.5 Pro`, so I thought its open-weights counterpart would be a good fit too.

You can read more about Gemma 3 in this [blog post](https://blog.google/technology/developers/gemma-3/).

I also added a second model, `qwen3`, to compare against `gemma3`. I chose it because it is one of the smallest of the new reasoning models. Since I already like other Qwen models such as `qwen2.5-coder`, I was curious how the two would compare, especially whether qwen3's reasoning ability would make up for its lower parameter count. This also gave me a reason to add model switching to the app, so that comparing models is easy.

# LLM Provider: Ollama

For running the models, I chose Ollama right away. I already use Ollama for my own projects, so I know how it works and how to set it up. It’s also popular in the developer community, so there are lots of guides and help available on places like Stack Overflow and GitHub. Ollama works on many operating systems and is easy to run with Docker.

I followed this guide to set up Ollama with Docker: [Ollama Official Docker Image](https://ollama.com/blog/ollama-is-now-available-as-an-official-docker-image).

Since I already had Ollama running on my computer, I made sure to save the `.ollama` data in a separate volume and used a different port for this project. This way, it wouldn’t interfere with my other work.

Additionally, I set the `OLLAMA_KEEP_ALIVE=24h` environment variable in the compose setup so that models stay loaded in memory instead of being unloaded after Ollama's default five minutes of inactivity. This helped when testing different models at different times; fortunately, my GPU had enough VRAM to hold all of the tested models at once.

Furthermore, I configured the `ollama/entrypoint.sh` script to download all of the models at startup, so the setup is hands-free.

# Postgres Database

The project instructions said to use Postgres, so I used this SQL database to handle the app’s data. Even if it wasn’t required, I would have picked Postgres anyway because I’m familiar with it. Postgres is popular, has lots of resources and plugins, and is widely used in real-world projects.

To connect my app to Postgres, I used the [langchain-postgres](https://python.langchain.com/docs/integrations/memory/postgres_chat_message_history/) library. This library made it easy to set up the database tables for storing chat message history.

However, `langchain-postgres` only saves the chat messages. I wanted more control, so I added my own table to keep track of different chat sessions. This lets me store things like session titles and link them to each user by their username. Later, I also added some sample data using a `database/seed.sql` file, so the table wouldn’t be empty during initial testing.

Here’s the code I used to create the sessions table. It simply uses `psycopg` to execute raw SQL:

```python
from psycopg import Connection


def create_db_sessions_table(conn: Connection):
    with conn.cursor() as cur:
        cur.execute(
            """
            CREATE TABLE IF NOT EXISTS db_sessions (
                id UUID PRIMARY KEY,
                username VARCHAR(255) NOT NULL,
                title VARCHAR(255) NOT NULL,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
            """
        )
        conn.commit()
```
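To show how such a table is typically used, here is a minimal sketch that runs the same kind of queries against an in-memory SQLite database, used purely as a stand-in for Postgres so the example runs without a server. The helper names `ensure_session` and `list_sessions` are hypothetical, and the default limit of 15 mirrors the frontend's `.slice(0, 15)`. With `psycopg` the placeholders would be `%s` instead of `?`, and `id` would stay a native `UUID`.

```python
import sqlite3
import uuid

# Same columns as db_sessions; SQLite has no native UUID type, so ids are TEXT here
SCHEMA = """
CREATE TABLE IF NOT EXISTS db_sessions (
    id TEXT PRIMARY KEY,
    username VARCHAR(255) NOT NULL,
    title VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def ensure_session(conn: sqlite3.Connection, session_id: str, username: str, title: str) -> None:
    """Hypothetical helper: create the session row once; repeated calls are no-ops."""
    conn.execute(
        "INSERT INTO db_sessions (id, username, title) VALUES (?, ?, ?) "
        "ON CONFLICT (id) DO NOTHING",
        (session_id, username, title),
    )
    conn.commit()


def list_sessions(conn: sqlite3.Connection, username: str, limit: int = 15) -> list[tuple[str, str]]:
    """Hypothetical helper: one user's sessions, newest first, via a parameterized query."""
    rows = conn.execute(
        # rowid breaks ties because SQLite's CURRENT_TIMESTAMP has one-second resolution
        "SELECT id, title FROM db_sessions WHERE username = ? "
        "ORDER BY created_at DESC, rowid DESC LIMIT ?",
        (username, limit),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first_id, second_id = str(uuid.uuid4()), str(uuid.uuid4())
ensure_session(conn, first_id, "Red", "First thread")
ensure_session(conn, second_id, "Red", "Second thread")
ensure_session(conn, first_id, "Red", "Ignored duplicate")  # same id: nothing happens
ensure_session(conn, str(uuid.uuid4()), "Blue", "Someone else's thread")
```

`ON CONFLICT (id) DO NOTHING` is valid in both Postgres and recent SQLite versions, which makes repeated inserts for the same session harmless, and binding `username` as a parameter keeps user input out of the SQL string.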

### Postgres Deployment via Docker

Setting up Postgres with Docker can be a bit tricky because you need to be clear about things like the database username and password. Just like with Ollama, I made sure to use a unique port and a separate data volume for this project. This way, it wouldn’t interfere with any other Postgres instances I have running on my computer.

I followed this guide for setting up Postgres with Docker: [How to use the Postgres Docker Official Image](https://www.docker.com/blog/how-to-use-the-postgres-docker-official-image/).

Here’s the part of my `docker-compose.yml` file that sets up the database:

```yaml
database:
  image: postgres:16
  restart: always
  container_name: bd_database
  ports:
    - "${POSTGRES_PORT:-5432}:5432"
  environment:
    POSTGRES_USER: ${POSTGRES_USER}
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    POSTGRES_DB: ${POSTGRES_DB}
  networks:
    bd_network:
      ipv4_address: "172.28.0.30"
  volumes:
    - bd_pgdata:/var/lib/postgresql/data
    - ./database/seed.sql:/docker-entrypoint-initdb.d/seed.sql
```

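> Note that I mounted the seeding script directly into the container's `/docker-entrypoint-initdb.d/` directory so that Postgres runs it automatically on startup. The official image only runs these scripts when the data directory is empty, so the seed is applied on the first start only; to re-seed, the `bd_pgdata` volume has to be removed.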

# Backend Setup

For the backend of this app, Python was the obvious choice because it works well with the tools and libraries needed for the project. I use Python a lot, so I felt comfortable building both small and large parts of the backend with it. For this project, I kept things simple and only used the modules and functions I really needed, while still following good coding practices.

Instead of using pip to manage Python packages, I chose [uv](https://docs.astral.sh/uv/). I like uv because it’s fast (it’s built with Rust) and makes it easy to manage the Python environment and keep all the dependencies in sync. It feels a lot like using `pnpm` for JavaScript, but for Python projects.

# Langchain Backend

When working with LLMs, I have a lot of experience with services that follow the [OpenAI API Spec](https://openai.com/index/openai-api/), since there are many SDKs and resources on how to use them. Honestly, I hadn't used LangChain before this project, but after some research and hands-on experience, I appreciate the hands-free orchestration and simplified setup it offers, as well as the community libraries that make bootstrapping applications much quicker and easier.

For this, I had three main problems to solve.

### Langchain Integration with Ollama
Fortunately, there is already an integration for this, [langchain-ollama](https://python.langchain.com/docs/integrations/llms/ollama/), so I started from the boilerplate in its documentation when connecting LangChain to Ollama:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

template = """Question: {question}

Answer: Let's think step by step."""

prompt = ChatPromptTemplate.from_template(template)

model = OllamaLLM(model="llama3.1")

chain = prompt | model

chain.invoke({"question": "What is LangChain?"})
```

From there, setting up the chat workflow became easy: I added the system prompts and used `langchain-postgres` to store the conversation, distinguishing the user's `HumanMessage` entries from the model's `AIMessage` entries.

### Streaming Chat Completions

Streaming responses from Ollama via LangChain was one of my first hurdles in the project, since I hadn't done it before and didn't know how to set it up. I understood streaming from a networking point of view, but not how to implement it in Python. Unfortunately, this was also an area where ChatGPT faltered: it did not set it up correctly, and there were many conflicting approaches across LangChain libraries that have evolved over the years. Fortunately, I found [this GitHub issue](https://github.com/langchain-ai/langchain/issues/13333), opened back in 2023, which showed me how to implement streaming.

```python
    model_with_streaming = OllamaLLM(
        model=request.model,
        base_url=os.getenv("OLLAMA_BASE_URL"),
        streaming=True,
    )

    full_response = ""

    async def stream_response():
        nonlocal full_response
        async for token in model_with_streaming.astream(messages):
            full_response += token
            yield token

```
It turned out that I needed to enable streaming explicitly when creating the `OllamaLLM` instance and use the `astream` method. For a while, I was stuck trying to implement older approaches such as `AsyncIteratorCallbackHandler` or `StreamingStdOutCallbackHandler`.
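The accumulate-while-yielding pattern can be tried without Ollama. In this sketch, `fake_astream` is a hypothetical stand-in for `model_with_streaming.astream(messages)` that yields string tokens; the wrapper forwards each token immediately while keeping the concatenated text for later. `make_streamer` and `consume` are illustrative names, not part of LangChain.

```python
import asyncio
from collections.abc import AsyncIterator, Callable


async def fake_astream(tokens: list[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for `model_with_streaming.astream(messages)`."""
    for token in tokens:
        await asyncio.sleep(0)  # hand control back to the event loop, as real network I/O would
        yield token


def make_streamer(source: AsyncIterator[str]) -> tuple[AsyncIterator[str], Callable[[], str]]:
    """Return a token generator plus a getter for the text accumulated so far."""
    full_response = ""

    async def stream_response() -> AsyncIterator[str]:
        nonlocal full_response  # rebinding an enclosing variable requires `nonlocal`
        async for token in source:
            full_response += token  # keep a copy for saving later
            yield token  # forward immediately to the client

    def get_full_response() -> str:
        return full_response

    return stream_response(), get_full_response


async def consume(tokens: list[str]) -> tuple[list[str], str]:
    """Drain the stream like a client would, then read the accumulated text."""
    stream, get_full = make_streamer(fake_astream(tokens))
    received = [token async for token in stream]
    return received, get_full()
```

For example, `asyncio.run(consume(["Hello", ", ", "world", "!"]))` yields the four tokens one by one and `"Hello, world!"` as the accumulated text.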

# FastAPI Backend

To expose all of these features, I built a REST HTTP server with [FastAPI](https://fastapi.tiangolo.com/). I like it because setting up a web server with it is very easy, it is well documented, and it comes with batteries included.

I only had to define the request format that the frontend would later use to talk to the backend for the different app operations:

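```python
import os

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_core.messages import HumanMessage
from langchain_ollama.llms import OllamaLLM
from langchain_postgres import PostgresChatMessageHistory
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    name: str = "User"
    session_id: str
    content: str
    model: str = "gemma3:1b"


@app.post("/stream")
async def chat(request: ChatRequest):
    ## input validation & session handling

    ## chat completion via langchain
    # table_name, sync_connection and generate_message_id are defined elsewhere in the backend
    chat_history = PostgresChatMessageHistory(
        table_name, request.session_id, sync_connection=sync_connection
    )

    new_usr_msg = HumanMessage(
        content=request.content, id=generate_message_id(), name=request.name
    )
    # the conversation so far, plus the new user message
    messages = [*chat_history.messages, new_usr_msg]

    model_with_streaming = OllamaLLM(
        model=request.model,
        base_url=os.getenv("OLLAMA_BASE_URL"),
        streaming=True,
    )

    # RESPONSE STREAMING
    full_response = ""

    async def stream_response():
        nonlocal full_response
        async for token in model_with_streaming.astream(messages):
            full_response += token
            yield token

    ## streaming response
    return StreamingResponse(
        stream_response(), media_type="text/plain; charset=utf-8"
    )
```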
Fortunately, FastAPI has a built-in helper for streamed responses, `StreamingResponse`, which made the setup much easier to work with.

### Background Task for Database Operation post-Streaming
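To keep the message state in sync with the database while still returning the streamed chat completion to the client as soon as possible, I had to store the full response only after it had been streamed. I did this with a background task. `background_tasks` is FastAPI's `BackgroundTasks` object, injected by adding a `background_tasks: BackgroundTasks` parameter to the handler; FastAPI attaches these tasks to the returned `StreamingResponse`, which runs them only after the last chunk has been sent, so `full_response` is complete by then.

```python
# Inside the /stream handler (its signature also takes `background_tasks: BackgroundTasks`)
def store_messages():
    # A plain def: Starlette runs sync background tasks in a threadpool,
    # so the blocking Postgres write does not stall the event loop
    new_ai_msg = AIMessage(
        content=full_response, id=generate_message_id(), name="Assistant"
    )
    chat_history.add_messages([new_usr_msg, new_ai_msg])

background_tasks.add_task(store_messages)
```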
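Why is `full_response` complete when `store_messages` runs? A streaming response sends every chunk first and only then runs its background task. The stdlib-only sketch below models that ordering; `serve` and `handle_request` are hypothetical simplifications, not Starlette's or FastAPI's actual code.

```python
import asyncio
from collections.abc import AsyncIterator, Callable


async def serve(
    body: AsyncIterator[str],
    send: Callable[[str], None],
    background: Callable[[], None],
) -> None:
    """Simplified model of a streaming response: send every chunk, then run the task."""
    async for chunk in body:
        send(chunk)  # each chunk reaches the client as soon as it is produced
    background()  # only reached once the body iterator is exhausted


async def handle_request(tokens: list[str], sent: list[str], stored: list[str]) -> None:
    """Mimics the /stream handler: stream tokens, then 'store' the full reply."""
    full_response = ""

    async def stream_response() -> AsyncIterator[str]:
        nonlocal full_response
        for token in tokens:
            await asyncio.sleep(0)
            full_response += token
            yield token

    def store_messages() -> None:
        # Reads the closed-over variable when called, so it sees the final text
        stored.append(full_response)

    await serve(stream_response(), sent.append, store_messages)
```

Because the closure reads `full_response` at call time rather than at definition time, the stored value is the whole reply even though `store_messages` was defined before any token arrived.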


# Testing the API via Insomnia

Before building the frontend, I used [Insomnia](https://insomnia.rest/) to manually test the FastAPI endpoints and the Ollama integration. This allowed me to quickly verify that the backend was working as expected and that the chat completions were being streamed correctly.

Here are some examples which I exported as `curl` commands:

Getting the available models from Ollama, to make sure they were loaded correctly:
```bash
curl --request GET \
  --url 'http://localhost:11434/api/tags?name=John%20Doe' \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: insomnia/11.2.0'
```

Sending a chat message to the streaming endpoint:
```bash
curl --request POST \
  --url http://localhost:8000/stream \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: insomnia/11.2.0' \
  --data '{
	"name": "Red",
	"session_id": "b95eecaa-f30f-45b1-bcc2-acb984dce9a5",
	"content": "Who are you?"
}'
```
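The POST request above can also be scripted for quick smoke tests. A minimal sketch using only the standard library, assuming the backend is reachable at `http://localhost:8000` as in the `curl` export; `build_stream_request` and `send_chat` are hypothetical helper names. Leaving out `model` lets the backend fall back to the `ChatRequest` default.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumed host address of the backend, as in the curl export


def build_stream_request(
    name: str, session_id: str, content: str, model: Optional[str] = None
) -> urllib.request.Request:
    """Build the same POST /stream request that the curl export sends."""
    payload = {"name": name, "session_id": session_id, "content": content}
    if model is not None:
        payload["model"] = model  # when omitted, ChatRequest's default model is used
    return urllib.request.Request(
        f"{BASE_URL}/stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request) -> str:
    """Send the request and return the full streamed reply (needs a running backend)."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the stack running, `print(send_chat(build_stream_request("Red", "b95eecaa-f30f-45b1-bcc2-acb984dce9a5", "Who are you?")))` prints the reply. Since `read()` waits for the whole body, this checks the final text rather than the token-by-token stream.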




# React Frontend

Instead of using the suggested Python UI libraries, I chose to build the frontend with React and TypeScript. I made this choice because I’m already comfortable with React, so it was much faster for me to get started and connect it to the Python backend. Using React also gave me more control over how the app looks and works, letting me create a user interface that is both good-looking and easy to use.

Here are the main libraries and tools I used:
- [React](https://react.dev/) – for building the user interface
- [Vite](https://vite.dev/) – for fast bundling and serving of React code (in my experience, faster than Next.js or Create React App)
- [TanStack Start](https://tanstack.com/start/latest) – for easy file-based routing and server functions using RPCs
- [TailwindCSS](https://tailwindcss.com/) – for quick and simple styling with utility classes
- [shadcn/ui](https://ui.shadcn.com/) – for ready-to-use, reusable UI components

## User Handling
I used [react-hook-form](https://www.react-hook-form.com/) with [zod](https://zod.dev/) to validate the username input. This was easy since the boilerplate was already set up, and I simply passed the username to the chat page as a URL search parameter:

```tsx
const form = useForm<z.infer<typeof usernameFormSchema>>({
  resolver: zodResolver(usernameFormSchema),
  defaultValues: {
    username: "",
  },
})

function onSubmit(values: z.infer<typeof usernameFormSchema>) {
  // generate uuid v4 
  const { username } = values
  const session_id = crypto.randomUUID()
  navigate({ to: '/chat/$session_id', params: { session_id }, search: { username } })
}
```
## Threads Sidebar
First, there is a navigation component that retrieves the user's past threads, if any, so that they can reread or continue any of them. The tricky parts were rendering titles with emoji correctly, truncating long titles, and giving snappy feedback whenever the user "creates" a new thread.

Here is the server function that fetches the sessions from the backend API's `/sessions` endpoint:

```ts
export const getSessions = createServerFn({
  method: 'GET',
  response: 'data',
}).validator(({ name, session_id }: { name: string, session_id: string }) => {
  return {
    name: name,
    session_id: session_id,
  }
}).handler(async ({ data }) => {
  try {
    const url = `${import.meta.env.VITE_BACKEND_BASE_URL}/sessions?name=${encodeURIComponent(data.name)}`
  
    const response: AxiosResponse<SessionData[]> = await axios.get(url)

    const sessions = response.data.slice(0, 15)

    // check if session_id is in the sessions, if not, add it at the top of the list with the title "New Thread"
    if (!sessions.some((session) => session.id === data.session_id)) {
      sessions.unshift({
        id: data.session_id,
        title: "🧵 New Thread",
        username: data.name,
        isNew: true,
      })
    }

    return sessions
  } catch (err) {
    // console.error(err)
    return []
  }
})
```
I found that setting up the validation and the static types for the returned JSON was worthwhile: the code is easier to read and debug, AI agents can understand it better, and the autocomplete is a quality-of-life improvement that is leagues ahead of other languages and frameworks I have used.

## Thread/Session Handling
To keep things simple, and following how other chat apps do it, each "thread" or "conversation" is identified by a unique UUID, and the website renders it at the `/chat/<uuid>` route.


Thus, the UI just needs to do three things:
- retrieve old messages (for old threads)
- provide a way to input a user's message
- provide a way to get the assistant's response

## Message Input
This part took me a while to set up. I had to build a `<Select/>` component that lets the user choose which chat model to use (from the models available via the backend API), and a `<Textarea/>` component for multi-line message input.

I also spent some time making it look good and pinning it to the bottom of the chat window. I like the glassmorphism effect produced by Tailwind's background opacity modifier (`bg-neutral-800/20`) combined with `backdrop-blur-sm`.

```tsx
<div className="isolate backdrop-blur-sm flex flex-row items-center gap-2 p-2 bg-neutral-800/20 border-[1px] border-b-0 border-neutral-800 rounded-xl rounded-b-none pb-0">
  <div className="isolate rounded-xl bg-neutral-800/20 border-[1px] border-b-0 border-neutral-800 w-full flex flex-col rounded-b-none pb-0">
    <div className="p-2">
      <Textarea />
    </div>
    <div className="p-2 pt-0 flex justify-between items-center">
      <Select/>
      <Button size="icon" id='send-message-button' type="submit" className='cursor-pointer'>
        <ArrowUp />
      </Button>
    </div>
  </div>
</div>
```

## Chat Streaming
Once the user submits a message, the frontend sends an HTTP request to the backend API and reads the streamed response. Each chunk is appended to the client state as it arrives, so the UI renders the reply smoothly and produces the "typing" animation.

```tsx
const response = await fetch(STREAMING_URL, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    name: username,
    session_id: session_id,
    content: content,
    model: values.model
  })
})

if (!response.ok) {
  const errorData = await response.json()
  throw new Error(errorData.detail || 'Failed to send message')
}

const reader = response.body?.getReader()
if (!reader) {
  throw new Error('No reader available')
}

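let accumulatedContent = ''
// Reuse one decoder so multi-byte UTF-8 characters (e.g. emoji) split across chunks decode correctly
const decoder = new TextDecoder()

while (true) {
  const { done, value } = await reader.read()
  if (done) break

  // Convert the Uint8Array to text; { stream: true } holds back incomplete byte sequences until the next chunk
  const chunk = decoder.decode(value, { stream: true })
  accumulatedContent += chunk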

  // Update the message with accumulated content
  setMessages(prevMessages =>
    prevMessages.map(msg =>
      msg.id === tempMessageId
        ? { ...msg, content: accumulatedContent }
        : msg
    )
  )
}
```
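Decoding each chunk with a fresh decoder can corrupt characters: UTF-8 encodes many characters, including emoji like the 🧵 used in thread titles, as several bytes, and a network chunk can end partway through one. A stateful decoder that carries incomplete bytes over to the next chunk avoids this; in the browser that is a single `TextDecoder` used with `decode(value, { stream: true })`. The effect is easy to reproduce in Python with `codecs.getincrementaldecoder`; the two-chunk split below is contrived for illustration.

```python
import codecs

text = "Thread 🧵 ready"
data = text.encode("utf-8")
split = data.index("🧵".encode("utf-8")) + 2  # cut in the middle of the 4-byte emoji
chunks = [data[:split], data[split:]]


def decode_each_chunk(chunks: list[bytes]) -> str:
    """Naive: a fresh decode per chunk, like `new TextDecoder().decode(value)` in a loop."""
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


def decode_incrementally(chunks: list[bytes]) -> str:
    """Stateful: incomplete byte sequences are buffered until the next chunk arrives."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush; raises if the stream ended mid-character
    return "".join(parts)
```

The naive version turns the split emoji into U+FFFD replacement characters (the same substitution a non-fatal `TextDecoder` makes), while the incremental version reproduces the original text exactly.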

## Conversation Display
I created a `<MessagesContainer/>` component that renders the messages (from both the user and the assistant) in the UI:
```tsx
function MessagesContainer({ messages }: { messages: MessageData[] }) {
  return (
    <div className='flex flex-col grow pb-40'>
      {messages.map((message) => (
        <MessageBox key={message.id} message={message} />
      ))}

    </div>
  )
}
```

The component itself holds no streaming logic: it simply re-renders whenever the `messages` state changes, whether a new message is added or the assistant's reply grows during streaming, so updates appear as soon and as smoothly as possible.

## QoL: Markdown Rendering and Syntax Highlighting
Since the chat UI would probably be used for coding tasks, one feature I wanted was syntax highlighting. This was harder than expected in React, since I found few resources on setting it up simply with support for multiple languages.

Fortunately, after a lot of trial and error, I got [highlight.js](https://highlightjs.org/) working, which makes code snippets in the UI much easier on the eyes. I also found that instructing the LLM in the system prompt to follow Markdown best practices, in particular to always state the language of each code block, helped as well.

```tsx
<ReactMarkdown
  remarkPlugins={[remarkGfm]}
  rehypePlugins={[rehypeHighlight, rehypeRaw]}
  components={{
    // Custom styling for code blocks
    pre: ({ children, ...props }) => (
      <pre
        {...props}
        className="bg-gray-100 dark:bg-gray-800 rounded p-2 overflow-x-auto text-xs"
      >
        {children}
      </pre>
    ),
    // Custom styling for code: fenced blocks with a language tag carry a language-* class and get highlighted
    code: ({ children, className, ...props }: any) => {
      const match = /language-(\w+)/.exec(className || '')
      return match ? (
        <code
          {...props}
          className={cn(
            'hljs',
            className,
            'bg-gray-100 dark:bg-gray-800 px-1 rounded text-xs  overflow-x-scroll scrollbar-thin'
          )}
        >
          {children}
        </code>
      ) : (
        <code
          {...props}
          className="bg-gray-100 dark:bg-gray-800 px-1 rounded text-xs overflow-x-scroll scrollbar-thin"
        >
          {children}
        </code>
      )
    },
  }}
>
  {message.content}
</ReactMarkdown>
```
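Highlighting only kicks in when a fenced block names its language, since the `code` renderer above checks for a `language-*` class. To spot-check whether replies follow the system-prompt instruction, a small helper like the hypothetical `fence_languages` below can list the info string of each fenced block, with an empty string marking an unlabeled one. It is a rough sketch: it only recognizes backtick fences at the start of a line and ignores CommonMark details such as tilde fences and indented fences.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly so no line here starts with a literal fence

# An opening or closing fence: three backticks at the start of a line, then an optional info string
_FENCE_LINE = re.compile(r"^`{3}\s*(\S*)")


def fence_languages(markdown: str) -> list[str]:
    """Return the language tag of each fenced code block ('' when none was given)."""
    languages: list[str] = []
    inside_block = False
    for line in markdown.splitlines():
        match = _FENCE_LINE.match(line)
        if match is None:
            continue
        if inside_block:
            inside_block = False  # this line closes the current block
        else:
            languages.append(match.group(1))  # this line opens a block
            inside_block = True
    return languages


# A made-up model reply with one labeled and one unlabeled block
SAMPLE_REPLY = f"Here you go:\n{FENCE}python\nprint('hi')\n{FENCE}\nAnd plain text:\n{FENCE}\nno tag\n{FENCE}\n"
```

Here `fence_languages(SAMPLE_REPLY)` returns `["python", ""]`; the empty entry is a block that would fall through to the unhighlighted branch.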

## QoL: Auto-scroll
Another much-needed feature was auto-scrolling, which makes the typing animation of the streamed response look much smoother. I honestly didn't know how to do it, but AI assistance and testing helped me figure it out:
```tsx
  useEffect(() => {
    if (mainRef.current) {
      mainRef.current.scrollTo({
        top: mainRef.current.scrollHeight,
        behavior: 'smooth'
      })
    }
  }, [messages])
```

Note that the effect depends on the `messages` state, so it scrolls on every update from the streamed response. This made the UI feel much more natural.

# Testing
This phase of the project was admittedly not very hard, but it was one of the parts I enjoyed the most. Honestly, I hadn't done large-scale testing in my apps before, since it wasn't needed, so having the time to practice it was both a good learning experience and a fun task.

From research, I found that using [pytest](https://docs.pytest.org/en/stable/) was one of the industry standards. 

Test-driven development (TDD) suggests writing tests before implementing the features. I believe a mix of both approaches works well: now that the tests exist, maintaining the project and adding features in the future will be much easier.

Funnily enough, writing the tests helped me find a bug in the backend API: the `/session` endpoint was not validating its input properly and was not returning the correct HTTP error code.
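A minimal sketch of the kind of check that was missing, assuming `session_id` must be a UUID (as the per-thread UUIDs suggest) and that the name must be non-empty and fit the `VARCHAR(255)` username column. `validate_session_params` is a hypothetical helper that returns an HTTP status; 422 Unprocessable Entity is what FastAPI itself returns when request validation fails. In the app, the same rules could instead be declared on the Pydantic request model so FastAPI rejects bad input automatically.

```python
import uuid
from http import HTTPStatus

MAX_NAME_LENGTH = 255  # matches the VARCHAR(255) username column in db_sessions


def validate_session_params(name: str, session_id: str) -> HTTPStatus:
    """Return 200 OK for valid input, or 422 for a bad name or malformed session id."""
    if not name.strip() or len(name) > MAX_NAME_LENGTH:
        return HTTPStatus.UNPROCESSABLE_ENTITY
    try:
        uuid.UUID(session_id)  # raises ValueError for anything that is not a UUID
    except ValueError:
        return HTTPStatus.UNPROCESSABLE_ENTITY
    return HTTPStatus.OK
```

A test for this is then a one-liner per case, e.g. asserting that `validate_session_params("Red", "not-a-uuid")` is 422.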

Here are some examples of the tests I wrote:

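```python
from fastapi.testclient import TestClient
from main import app  # assumption: the FastAPI app lives in main.py

client = TestClient(app)


def test_unknown_method():
    response = client.put("/health")  # /health has no PUT route
    assert response.status_code == 405  # 405 Method Not Allowed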
This test was useful: even though FastAPI already handles this case for me, writing it gave me a wider view of how the project behaves.

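```python
import pytest
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


@pytest.mark.selenium  # custom marker; register it in the pytest config to avoid warnings
def test_continue_with_long_username(driver):  # `driver` is a fixture defined in conftest.py
    """Test that submitting the form with a long username shows the correct error message."""
    test_username = "testuser123" * 10
    expected_message = "Username is too long"

    # Wait for the username input to render instead of sleeping for a fixed time
    username_input = WebDriverWait(driver, 5).until(
        EC.presence_of_element_located((By.ID, "username-input"))
    )
    username_input.send_keys(test_username)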

    # Click the continue button
    button = driver.find_element("id", "continue-button")
    button.click()

    # Wait for the form message to appear (max 5 seconds)
    message_element = WebDriverWait(driver, 5).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, '[data-slot="form-message"]'))
    )

    # Assert the message matches the expected text
    assert message_element.text == expected_message, (
        f"Expected message '{expected_message}' but got '{message_element.text}'"
    )
```

It was also fun writing end-to-end (E2E) tests against the React frontend with [Selenium](https://www.selenium.dev/), one of the most widely used browser automation tools for Python, to make sure all of the form validation works correctly.

# Docker Deployment

Since I had already planned for deploying via Docker from the start, deploying each service became an easy task afterward. Thus, I only needed to double check all the config and connections if they were correct, since each service was already containerized and decoupled from one another.


### .env handling in Frontend
Unfortunately, I had some problems managing the .env in the frontend, since Vite handles it differently from what I was used to. Vite exposes environment variables through its own `import.meta.env` syntax, and these are statically replaced at build time, a feature I honestly didn't understand at first. I found that the environment variables were undefined in different places (client and server), seemingly at random depending on the method (`process.env` or `import.meta.env`). There was also the `VITE_` prefix, which is required for a variable to be readable on the client.

- .env handling in Vite: https://vite.dev/guide/env-and-mode
- the Stack Overflow post that helped me realize it was a Docker build issue: https://stackoverflow.com/questions/77486735/docker-with-vite-env-variables-are-undefined-inside-the-docker-container

Eventually, I realized that because Vite inlines these variables at build time, they must be available while the Docker image is being built, but my .env was not available to the builder. I only had to pass the values in as build arguments in the Dockerfile, and it finally worked in the container.


### Container-to-container and Host-Container Networking in Docker
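After deploying, I ran into issues when running the entire stack on different environments and operating systems. The client-side (browser) code of the React app connected to the backend using its address on the Docker network, and depending on how the OS and Docker handle networking and DNS, that address was not reachable, partly because I had also set static IPs in the `docker-compose.yml` file. This is likely because the browser runs on the host, outside the Docker network: on Linux the host can usually reach container IPs on a bridge network directly, but Docker Desktop on macOS and Windows runs containers inside a VM, so those addresses are not routable from the host.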

After rigorous testing across different devices, I refactored the frontend so that all communication with the backend, database, and Ollama happens on the server side, and the browser never talks to those services directly. I also simplified the port and IP assignment in the compose file by setting the correct dependencies and letting Docker handle IP assignment and name resolution, e.g. `http://bd_backend:8000` instead of an explicitly assigned `http://172.28.0.20:8000` (which, in hindsight, the client side could never rely on reaching).

Finally, I found that exposing each service on the same port on the host as inside the container reduced confusion during testing and development. For example, the backend was running on port `8002` on my host machine, forwarded to port `8000` in the container, and keeping all of these mappings in my head became a headache. So I simplified it.

# Conclusions

Working on this project was a great learning experience for me. I got to use a mix of technologies—like Docker, Postgres, Langchain, Ollama, and React—and learned how they can all work together to build a seamless containerized full-stack app. Setting up things like Docker networks and managing dependencies with new tools like `uv` helped me understand more about modern development workflows.

I also had a lot of fun exploring new libraries and frameworks, and I enjoyed the challenge of connecting everything smoothly. Building both the backend and frontend gave me a better idea of how to design apps that are both powerful and easy to use.

Overall, this project expanded my vision and skillset. I feel more confident now in using these tools for future projects, and I’m excited to keep learning and building even more complex applications.