
gjreda/scratch-pdf-bot


GPT PDF Chatbot

This is a prototype of a chatbot that can answer questions about PDFs. It uses OpenAI's API for language modeling and LanceDB for vector storage and retrieval.
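To make the retrieval idea concrete, here is a minimal sketch of the "find relevant chunks, then build a prompt" step. It is illustrative, not this repo's code. It assumes the PDF chunks are already embedded as vectors. The toy vectors and the helper names `cosine_similarity`, `top_k` and `build_prompt` are hypothetical. In the actual project, embeddings and answers come from OpenAI's API, and LanceDB performs the similarity search.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], chunks: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the ids of the k chunks most similar to the query vector.

    Hypothetical stand-in for a vector-database search: a brute-force scan.
    """
    ranked = sorted(
        chunks,
        key=lambda cid: cosine_similarity(query, chunks[cid]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Place the retrieved chunk text ahead of the question in a single prompt."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

For example, with `chunks = {"intro": [1.0, 0.0, 0.0], "method": [0.0, 1.0, 0.0], "results": [0.7, 0.7, 0.0]}`, the call `top_k([1.0, 0.1, 0.0], chunks)` ranks `intro` first and `results` second. A real vector store replaces the brute-force scan with an index, but the ranking criterion is the same.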

Setup

This project uses Poetry for dependency management. To install dependencies:

$ poetry install

You'll also need to create a .env file and add your OPENAI_API_KEY to it (see .env.example).
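The `.env` file holds plain `KEY=value` lines, for example `OPENAI_API_KEY=...`. The sketch below shows how such a file ends up in the process environment. It uses only the standard library. The project itself may rely on a package such as python-dotenv. `load_env_file` is a hypothetical helper, and it handles only the simple format described in its docstring.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines and export them to os.environ.

    Hypothetical helper: skips blank lines and # comments, strips
    surrounding quotes, and never overwrites variables that are already set.
    """
    values: Dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        values[key] = value
        os.environ.setdefault(key, value)
    return values
```

After calling `load_env_file()`, you can fail early with a clear message if `"OPENAI_API_KEY"` is missing from `os.environ`. This is better than waiting for the first API call to fail.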

Usage

The command below runs the pipeline on the papers directory, which contains a few PDFs, and then starts a REPL where you can ask questions about them. To exit the Q&A loop, type "exit" or press Ctrl+C.

$ poetry run python main.py --pdf_directory=papers
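The general shape of such an entry point is sketched below. It reads `--pdf_directory` with `argparse`, then loops until the user types "exit" or presses Ctrl+C, which Python raises as `KeyboardInterrupt`. This is an assumption about the structure, not a copy of `main.py`. The `answer` callable is a hypothetical stand-in for the real retrieval-plus-LLM step. The injectable `read_line` parameter exists only to make the loop testable.

```python
import argparse
from typing import Callable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; --pdf_directory points at the PDFs to ingest."""
    parser = argparse.ArgumentParser(description="Ask questions about PDFs.")
    parser.add_argument("--pdf_directory", required=True, help="Directory of PDFs to ingest")
    return parser.parse_args(argv)


def repl(answer: Callable[[str], str], read_line: Callable[[str], str] = input) -> int:
    """Answer questions until the user types 'exit' or presses Ctrl+C.

    Returns the number of questions answered.
    """
    answered = 0
    while True:
        try:
            question = read_line("Question: ").strip()
        except (KeyboardInterrupt, EOFError):
            print()  # leave the terminal on a fresh line after Ctrl+C / Ctrl+D
            break
        if question.lower() == "exit":
            break
        if not question:
            continue  # ignore empty input rather than querying the model
        print(answer(question))
        answered += 1
    return answered
```

In a script, this would be wired together as `args = parse_args()`, followed by `repl(...)` with a function that answers questions about the PDFs in `args.pdf_directory`. Catching `KeyboardInterrupt` inside the loop lets Ctrl+C end the session cleanly instead of printing a traceback.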

Note that the LanceDB database (the .lancedb directory) is checked into this repo, so the ingestion code skips creating and storing embeddings unless you delete that directory. It is included so you can run the code without waiting for the embeddings to be generated.
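The skip logic can be pictured as a simple check before ingestion. The sketch below assumes the rule is "rebuild only if `.lancedb` is missing or empty". The real ingestion code may instead check for a specific table. `needs_ingestion` is a hypothetical name.

```python
from pathlib import Path


def needs_ingestion(db_dir: str = ".lancedb") -> bool:
    """Return True when embeddings must be (re)built.

    Hypothetical rule: a missing or empty database directory means the PDFs
    have not been embedded yet; anything else is treated as a usable cache.
    """
    path = Path(db_dir)
    return not path.is_dir() or not any(path.iterdir())
```

With this rule, deleting the directory (for example `rm -rf .lancedb`) is enough to force the embeddings to be regenerated on the next run.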

Demo

I wrote a brief blog post and recorded a demo video of this project here.

Demo video

About

Prototyping a question and answer bot over PDFs
