Problem Statement:
Given the URL of a GitHub repository, the proposed solution lets the user gain a preliminary understanding of the repository by asking the system questions in a conversational setup.
The solution consists of 3 main modules, each of which functions as a standalone dockerized microservice:
- API: Captures the main tasks of downloading the repository, processing it, embedding the content, and so on. More details can be found here.
- UI: A simple interface to see the solution in action.
- VecDB: A vector database that supports CRUD operations on vector embeddings along with some metadata.
See demo here
- Include a `.env` file with the key `OPENAI_API_KEY=""` in the `API` module (i.e., in this path).
- Download the embedding model from here and place it in this location.
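As a minimal sketch of what loading that `.env` file might look like inside the API module (the parser below is a hypothetical stand-in for a library such as python-dotenv, not the repository's actual loader):

```python
import os
from pathlib import Path


def load_dotenv_minimal(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    A minimal stand-in for python-dotenv's load_dotenv(): skips blank
    lines and comments, strips surrounding quotes, and never overrides
    variables that are already set in the environment.
    """
    loaded = {}
    env_file = Path(path)
    if not env_file.exists():
        return loaded
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```

With `OPENAI_API_KEY="sk-..."` written to the API module's `.env`, a call like `load_dotenv_minimal("API/.env")` would make the key available via `os.environ`.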
- The microservices are encapsulated as composable Docker services. Hence, run `docker-compose up --build` at the root location.
- You can find each of the modules at the following URLs:
- UI: localhost:8000
- API: localhost:8001/docs
- Qdrant Dashboard: localhost:6333/dashboard
- UI: Streamlit is used for building a basic interactive app.
Screenshot:
- API: FastAPI is used for implementing the RESTful services. More details about the supported APIs can be found here.
- VectorDB: A self-hosted Qdrant instance is used as the vector database.
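To illustrate the role the vector database plays, here is a toy in-memory store with the same CRUD-plus-search shape. This is a pure-Python sketch of the concept, not the Qdrant client API:

```python
import math
from typing import Dict, List, Tuple


class TinyVectorStore:
    """A toy in-memory vector store: CRUD on (vector, metadata) points
    plus cosine-similarity search, mirroring what Qdrant provides."""

    def __init__(self) -> None:
        self._points: Dict[str, Tuple[List[float], dict]] = {}

    def upsert(self, point_id: str, vector: List[float], payload: dict) -> None:
        self._points[point_id] = (vector, payload)   # create or update

    def get(self, point_id: str) -> Tuple[List[float], dict]:
        return self._points[point_id]                # read

    def delete(self, point_id: str) -> None:
        self._points.pop(point_id, None)             # delete

    def search(self, query: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k point ids ranked by cosine similarity."""
        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(pid, cosine(query, vec)) for pid, (vec, _) in self._points.items()]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
```

In the real system, the embedded repository chunks would be upserted with payloads (file path, chunk text, etc.), and the user's question embedding would be the search query.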
- Improvements are possible in several areas, including (but not limited to):
- Filtering strategies while downloading the repo
- Detecting programming language of the scripts and performing appropriate cleaning strategies
- Creating richer metadata for the scripts at both document and chunk level such as summaries of functions, comments, function names etc.
- More advanced strategies while embedding the scripts, to bring about "Contextual RAG".
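The "richer metadata" idea above can be sketched for Python scripts using the standard-library `ast` module (the function below is a hypothetical illustration, not code from the repository; other languages would need their own parsers):

```python
import ast


def extract_python_metadata(source: str) -> dict:
    """Collect document-level metadata from a Python script: function
    and class names plus docstrings and line numbers. Chunk-level
    metadata could additionally track line ranges and comments."""
    tree = ast.parse(source)
    functions, classes = [], []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append({
                "name": node.name,
                "docstring": ast.get_docstring(node),
                "lineno": node.lineno,
            })
        elif isinstance(node, ast.ClassDef):
            classes.append({"name": node.name, "lineno": node.lineno})
    return {"functions": functions, "classes": classes}
```

Such metadata could be stored alongside each embedding in the vector database and used for filtering or for building more informative prompts.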
- I have tried to add comments (`@TODO`) in appropriate places in the scripts.