Python Codebase Teacher: Local AI RAG Agent

Codebase Teacher is a fully local RAG (Retrieval-Augmented Generation) application designed to help you understand, refactor, and document your own code.

Unlike cloud tools, this runs 100% offline using Ollama and FAISS. You simply paste the path to your local project, and the AI scans your files, allowing you to ask questions like "How does the authentication logic work?" or "Where is the user data stored?" without your proprietary code ever leaving your machine.

Demo

Tech Stack

App Framework: Streamlit

LLM Orchestration: LangChain (Classic & Community)

Local Inference: Ollama (llama3.2)

Vector Store: FAISS

Embeddings: HuggingFace (all-MiniLM-L6-v2)

Data Processing: Custom os.walk loader + TextLoader

Installation & Setup

Prerequisites

Python 3.10+ installed.
Ollama installed and running.

Pull the Model

Open terminal and pull the Llama 3.2 model:
ollama pull llama3.2

Clone & Install

git clone [https://github.com/YOUR_USERNAME/codebase-teacher.git](https://github.com/YOUR_USERNAME/codebase-teacher.git)
cd codebase-teacher

# Create Virtual Environment
- python -m venv venv

# Activate Environment
- venv\Scripts\activate (Windows) or source venv/bin/activate (Mac/Linux)

# Install Dependencies
- pip install -r requirements.txt

Run the App

streamlit run app.py

How It Works

The user pastes a local codebase directory and the app walks through each file looking for only certain files (.py, .md, .json, .txt). LangChain provides the RecursiveCharacterTextSplitter that can be specifically configured to process a certain programming language, we choose Python here. The splitter tries to group functions and classes together to prevent cut-offs. The all-MiniLM-L6-v2 model is leveraged to convert the chunks of code into vectors for processing, this process happens completely offline. FAISS is then used to store the vectors in RAM as Windows likes to block multiple programs from touching the same temp file on hard disk. When a user asks a question relevant code chunks are sent to Llama 3.2 via Ollama and is prompted to answer the question based solely on the context given.

Troubleshooting

requirements.txt not installing:
If you get install errors follow these steps:

Paste this in you virtual environment console:
pip install streamlit langchain langchain-community langchain-text-splitters langchain-huggingface python-dotenv sentence-transformers
Now install an earlier version of pytorch:
pip install torch==2.8.0
If you run into anymore loading errors just install as they pop up, there's a lot in this project

"Load a directory first" Warning:
If you see this warning, make sure you have entered a valid absolute path in the sidebar and clicked the "Load Directory" button.

Memory Usage:
If the app crashes during "Reading Directory...", try loading a smaller specific subfolder rather than the entire project root, or increase the chunk_size in the code to reduce the total number of vectors.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Python Codebase Teacher: Local AI RAG Agent

Tech Stack

Installation & Setup

How It Works

Troubleshooting

Contributing

License

About

Uh oh!

Releases

Packages

Languages

joshGilstrap/Python-Codebase-Teacher

Folders and files

Latest commit

History

Repository files navigation

Python Codebase Teacher: Local AI RAG Agent

Tech Stack

Installation & Setup

How It Works

Troubleshooting

Contributing

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages