A Streamlit application that lets you chat with the contents of any GitHub repository. It uses FAISS for vector storage, Ollama for embeddings and language model, and PydanticAI as the agent framework to handle retrieval and question-answering.
## Table of Contents

- Features
- Demo
- Prerequisites
- Installation
- Configuration
- Usage
- Project Structure
- How It Works
- Customization
- Contributing
- License
## Features

- **GitHub Repo Cloning**: Clone any public GitHub repository and extract text from code and documentation files.
- **Text Chunking**: Split large text into manageable chunks for efficient vector storage.
- **Vector Search**: Build a FAISS index of embeddings to enable semantic similarity search.
- **PydanticAI Agent**: Leverage PydanticAI's agent-and-tool architecture to handle retrieval and LLM calls.
- **Ollama Integration**: Use Ollama's local models for both embeddings and chat completions.
- **Streamlit UI**: Interactive web interface for loading repos and chatting with their contents.
## Prerequisites

- Python 3.10+
- Ollama installed and running locally (default HTTP endpoint: `http://localhost:11434/v1`)
- Git installed on your system
## Installation

- Clone this repository:

  ```bash
  git clone https://github.com/YourUsername/github-repo-chatbot-pydanticai.git
  cd github-repo-chatbot-pydanticai
  ```

- Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  `requirements.txt` should include:

  ```text
  streamlit
  pydantic-ai
  langchain-community
  langchain-ollama
  faiss-cpu  # or faiss-gpu if you have GPU support
  ```
## Configuration

- **Run the Ollama daemon**

  Ensure Ollama is up and running. By default it listens on `http://localhost:11434/v1`.

- **Model names**

  - Embedding model: `all-minilm:33m`
  - LLM model: `llama3.2`

  You can change these in `app.py` when initializing `OllamaEmbeddings` and `OpenAIModel`.
## Usage

- Launch the Streamlit app:

  ```bash
  streamlit run app.py
  ```

- Open the URL shown in the console (usually `http://localhost:8501`).
- Enter a GitHub repository URL in the sidebar (e.g., `https://github.com/JustCodeIt7/GitHub_Repo_Chat`).
- Select the file extensions to include (default: `.py`, `.md`, `.txt`, `.js`, `.html`, `.css`, `.json`).
- Click **Load Repository** to clone, process, and index the repo.
- Once loaded, ask questions in the chat interface about the repository's contents.
## Project Structure

```text
├── app.py             # Main Streamlit application
├── requirements.txt   # Python dependencies
├── README.md          # This documentation
├── docs/
│   └── chat-demo.gif  # Demo GIF (optional)
└── .gitignore         # Excludes venv, __pycache__, etc.
```
## How It Works

- **Clone & Extract**

  `get_repo_text` clones the repository into a temporary directory, then walks through files with allowed extensions and concatenates their contents.
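After the clone, the extraction part amounts to a directory walk. A minimal sketch of that step, assuming a checkout already on disk (the helper name `collect_repo_text` and the per-file header format are illustrative, not taken from `app.py`):

```python
import os


def collect_repo_text(root: str, extensions: set[str]) -> str:
    """Walk `root` and concatenate the contents of files whose extension
    is in `extensions`, labeling each file so answers can cite sources."""
    parts = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1] in extensions:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8") as f:
                        # Prefix each file's text with its repo-relative path.
                        parts.append(f"--- {os.path.relpath(path, root)} ---\n{f.read()}")
                except (UnicodeDecodeError, OSError):
                    continue  # skip binary or unreadable files
    return "\n\n".join(parts)
```

Skipping files that fail UTF-8 decoding is a cheap way to avoid pulling binary blobs into the index.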
- **Chunking**

  `split_text` splits the combined text into overlapping chunks of ~1000 characters.
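A minimal character-based sketch of such a splitter (the real `split_text` in `app.py` may differ, for instance by delegating to a LangChain text splitter):

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split `text` into chunks of at most `chunk_size` characters, where each
    chunk repeats the last `overlap` characters of the previous one so that
    sentences straddling a boundary still appear intact in some chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```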
- **Vector Store**

  `create_vectorstore` builds a FAISS index using `OllamaEmbeddings` on each chunk.
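The real index relies on FAISS and Ollama embeddings. To make the mechanics concrete without either dependency, here is a toy version in which a bag-of-words counter stands in for the embedding model; the principle (embed every chunk once, then rank chunks by similarity to the embedded query) is the same:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_search(chunks: list[str], query: str, k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query; FAISS does the same
    thing, at scale, over dense embedding vectors instead of word counts."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]
```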
- **Agent & Retrieval**

  `initialize_agent` creates a PydanticAI `Agent` with a `retrieve` tool that returns the top-4 similar chunks. When a user query comes in, PydanticAI handles tool invocation (retrieval) and crafts the prompt for Ollama to answer.
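The agent wiring itself is PydanticAI's, but the retrieval-augmented prompting it enables boils down to splicing the top-k chunks into the question before the LLM sees it. A hypothetical, dependency-free sketch of that step (`build_prompt` is illustrative, not a function from `app.py`):

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved repository excerpts and the user's question into a
    single grounded prompt, roughly what happens after the `retrieve` tool
    returns its top-k chunks."""
    context = "\n\n".join(
        f"[chunk {i + 1}]\n{chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using only the repository excerpts below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
```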
- **Chat UI**

  Streamlit displays previous messages and handles user input via `st.chat_input`. Responses from the agent are streamed back into the chat interface.
## Customization

- **Chunk Size & Overlap**: Adjust `chunk_size` and `overlap` in `split_text`.
- **Number of Docs**: Change the `k` parameter in `vectorstore.similarity_search` within the `retrieve` tool.
- **Models**: Swap out the `OllamaEmbeddings` or `OpenAIModel` parameters for different model sizes or temperatures.
- **Additional Tools**: Add more `@agent.tool` functions to perform extra tasks (e.g., code search, summarization, translation).
## Contributing

Contributions are welcome! Please open an issue or submit a pull request with improvements, bug fixes, or new features.
## License

This project is licensed under the MIT License. See LICENSE for details.
