The aim of this repository is to learn more about the fundamentals of modern, scalable web applications by designing, building and deploying an AI-powered chat app from scratch. We will have full control over the language model, data, infrastructure and costs.
This will hopefully provide an glimpse of how real-world scalable AI systems work under the hood. The focus will be on engineering, backend and cloud deployment rather than the language model output or a fancy frontend. The app is built with FastAPI, PostgreSQL, llama.cpp, nginx, and Docker.
There are two blogposts that accompany this repository. Please read them for a detailed walkthrough of the code.
- Designing, Building & Deploying an AI Chat App from Scratch (Part 1): Microservices Architecture and Local Development
- Designing, Building & Deploying an AI Chat App from Scratch (Part 2): Cloud Deployment and Scaling
Here is a quick demo of the app when deployed to Azure. It shows starting a new chat, coming back to that same chat, or starting a second chat.
The local app is built with microservices. Each service has its own directory. They communicate over a private network through HTTP requests. For a detailed walkthrough, check the blogposts.

- Language model API, CPU-based llama.cpp language model inference server with Alibaba Cloud's Qwen2.5-0.5B-Instruct model.
- Database, a PostgreSQL database server that stores chats and messages.
- Database API, a FastAPI+Uvicorn Python server that queries the PostgreSQL database.
- User interface, a FastAPI+Uvicorn Python server that serves HTML and supports session-based authentication.
- Nginx reverse proxy, a gateway between the outside world and network-isolated services.
To run the app with all its services locally, we use Docker Compose. Download Docker & Docker Compose, navigate to the root directory of this project, download the model into the lm-api/models directory, and start the app with Docker Compose:
cd $PROJECT_PATH
wget https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q5_k_m.gguf -P lm-api/models
docker compose up --buildTo run an individual service manually, look at the README.md in the chat-ui, db-api, lm-api, nginx and db directories.
To deploy the app to Microsoft Azure, create a (free) Azure account and install Docker. Then run the deploy_app.sh bash script. This installs the Azure CLI and creates all the infrastructure and resources we need to run our app on Azure.
cd $PROJECT_PATH
bash azure-deployment/deploy_app.shHere is a diagram of the cloud architecture.

To run the containerized app locally:
- Docker 26.6.1
- Docker compose 2.27.0
To run the database API and UI Python servers locally:
- Python 3.12.0
- packages in
chat-ui/requirements.txtanddb-api/requirements.txt
To deploy to Azure
- Azure account (free tier is enough), I'm using a free education account.
- Docker 26.6.1
- Azure CLI 2.28.0
