DataChad V3🤖

This is an app that lets you ask questions about any data source by leveraging embeddings, vector databases, large language models and, last but not least, LangChain.

How does it work?

Upload any file(s) or enter any path or URL to create Knowledge Bases, which can contain multiple files of any type, format and content, and create Smart FAQs, which are lists of curated, numbered Q&As.
The data source or files are loaded and split into text document chunks.
The text document chunks are embedded using OpenAI or Hugging Face embeddings.
The embeddings are stored as a vector dataset in Activeloop's database hub.
A LangChain chain is created from a custom selection of an LLM (gpt-3.5-turbo by default), multiple vector stores as knowledge bases and a single special Smart FAQ vector store.
When you ask the app a question, the chain embeds the input prompt, runs a similarity search in the provided vector stores and uses the best results as context for the LLM to generate an appropriate response.
Finally, the chat history is cached locally to enable a ChatGPT-like Q&A conversation.
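
The steps above can be illustrated with a small, dependency-free sketch. It is not DataChad's implementation: the bag-of-words `embed` function is a toy stand-in for OpenAI or Hugging Face embeddings, the in-memory list stands in for the hosted vector store, and all function names, chunk sizes and the prompt layout are hypothetical choices made for illustration only.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=8, overlap=2):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def embed(text):
    """Toy embedding: a sparse bag-of-words vector (NOT a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_store(documents, chunk_size=8, overlap=2):
    """In-memory stand-in for a vector store: a list of (chunk, vector)."""
    return [
        (chunk, embed(chunk))
        for document in documents
        for chunk in split_into_chunks(document, chunk_size, overlap)
    ]


def similarity_search(store, query, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks, history):
    """Combine retrieved context and cached chat history into one prompt."""
    lines = ["Context:"]
    lines += ["- " + chunk for chunk in context_chunks]
    for past_question, past_answer in history:
        lines += ["User: " + past_question, "Assistant: " + past_answer]
    lines.append("User: " + question)
    return "\n".join(lines)
```

In a real deployment the prompt built at the end would be sent to the selected LLM, and each question and answer pair would be appended to the cached history before the next turn.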

Good to know

The app only runs on Python >= 3.10!
To run locally or deploy somewhere, execute cp .env.template .env and set credentials in the newly created .env file. Other options are setting system environment variables manually, or storing the credentials in .streamlit/secrets.toml when hosted via Streamlit.
If you have set credentials as explained above, you can just hit submit in the authentication form without re-entering your credentials in the app.
If you run the app, consider modifying the configuration in datachad/backend/constants.py, e.g. enabling advanced options.
Your data won't load? Feel free to open an Issue or PR and contribute!
Use previous releases like V1 or V2 for the original functionality and UI.
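
As a rough illustration of the version requirement and the credential options above, the following sketch checks the interpreter version and resolves a credential from either the process environment or a .env file. It is an assumption-laden example, not the app's own loading code: the helper names, the rule that environment variables take precedence over the .env file, and the variable name OPENAI_API_KEY used in the usage note are all hypothetical.

```python
import os
import sys
from pathlib import Path

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def check_python_version(version_info=sys.version_info):
    """Return True if the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def parse_env_file(path=".env"):
    """Read a .env file if it exists; return an empty dict otherwise."""
    path = Path(path)
    if not path.exists():
        return {}
    return parse_env_text(path.read_text())


def resolve_credential(name, env_file=".env", environ=None):
    """Look up a credential: environment first (assumed), then .env."""
    environ = os.environ if environ is None else environ
    if environ.get(name):
        return environ[name]
    return parse_env_file(env_file).get(name)
```

For example, `resolve_credential("OPENAI_API_KEY")` would return the value exported in the shell if there is one, fall back to the entry in .env otherwise, and return `None` when neither is set, in which case the credentials would have to be entered in the app.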

What does it look like?

TODO LIST

If you would like to contribute, feel free to grab any task.


gustavz/DataChad
