mohanbing/st_doc_ext

st_doc_ext

This repository contains the code for an information extraction app that uses LangChain to extract structured output, conforming to a given schema, from unstructured data.
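
To make "structured output for a given schema" concrete, here is a minimal sketch that uses only the Python standard library. It does not use LangChain and is not the app's code: the `Invoice` schema, its fields, and the `parse_structured_output` helper are all hypothetical, and the JSON string stands in for whatever a language model might return.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Invoice:
    """Hypothetical target schema; the app's real schema may differ."""
    vendor: str
    invoice_number: str
    total: float


def parse_structured_output(raw_json: str) -> Invoice:
    """Validate a JSON reply against the Invoice schema (illustrative helper)."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(Invoice)}
    missing = expected - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Extra keys are ignored; the total is coerced to a float.
    return Invoice(
        vendor=str(data["vendor"]),
        invoice_number=str(data["invoice_number"]),
        total=float(data["total"]),
    )


# Stand-in for a model reply; no model is called here.
reply = '{"vendor": "Acme Corp", "invoice_number": "INV-001", "total": "129.50"}'
invoice = parse_structured_output(reply)
print(invoice)
```

The point of the sketch is the shape of the problem: free-form text goes in, and a typed record with known fields comes out, with a clear error when a field is missing.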

Create and activate a venv

python -m venv <name_of_the_env>
source <name_of_the_env>/bin/activate

Install all requirements with pip

pip install -r requirements.txt

Set Up Streamlit Secrets File

This application calls an OCR API service to generate the OCR output. Start the OCR service, then create a secrets.toml file in the .streamlit directory at the repository root and add the following fields to it.

Learn more about Secrets management in Streamlit at: https://docs.streamlit.io/streamlit-community-cloud/deploy-your-app/secrets-management

HOST_URL = ""
OCR_SERVICE_PORT = ""
OCR_PDF_RESP_ENDPOINT = "ocr_pdf"
OCR_IMG_RESP_ENDPOINT = "ocr_image"
OPENAI_API_KEY = ""
ALLOW_FREE = false
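
As a rough illustration of how these fields might fit together, the sketch below checks that the required values are filled in and assembles an OCR endpoint URL from them. The `build_ocr_url` helper and the `HOST_URL:PORT/endpoint` layout are assumptions made for this example, not something confirmed by the repository; it also assumes `HOST_URL` includes the scheme (for example `http://localhost`).

```python
from typing import Mapping

# Keys from secrets.toml that must be non-empty (ALLOW_FREE is a boolean flag).
REQUIRED_KEYS = (
    "HOST_URL",
    "OCR_SERVICE_PORT",
    "OCR_PDF_RESP_ENDPOINT",
    "OCR_IMG_RESP_ENDPOINT",
    "OPENAI_API_KEY",
)


def build_ocr_url(secrets: Mapping[str, object], kind: str) -> str:
    """Hypothetical helper: build the OCR URL for kind 'pdf' or 'image'."""
    empty = [key for key in REQUIRED_KEYS if not str(secrets.get(key, "")).strip()]
    if empty:
        raise ValueError(f"secrets.toml is missing values for: {empty}")
    endpoint_keys = {"pdf": "OCR_PDF_RESP_ENDPOINT", "image": "OCR_IMG_RESP_ENDPOINT"}
    if kind not in endpoint_keys:
        raise ValueError("kind must be 'pdf' or 'image'")
    host = str(secrets["HOST_URL"]).rstrip("/")
    port = secrets["OCR_SERVICE_PORT"]
    endpoint = secrets[endpoint_keys[kind]]
    # Assumed layout: <host>:<port>/<endpoint>
    return f"{host}:{port}/{endpoint}"


# Example values only; replace them with your own OCR host, port, and API key.
example_secrets = {
    "HOST_URL": "http://localhost",
    "OCR_SERVICE_PORT": "8000",
    "OCR_PDF_RESP_ENDPOINT": "ocr_pdf",
    "OCR_IMG_RESP_ENDPOINT": "ocr_image",
    "OPENAI_API_KEY": "sk-placeholder",
    "ALLOW_FREE": False,
}
print(build_ocr_url(example_secrets, "pdf"))
```

Inside a Streamlit app, the same values are available through `st.secrets`, which supports key lookup like a dictionary, so a call such as `build_ocr_url(st.secrets, "pdf")` would be the natural way to use a helper of this kind.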

Run Streamlit App

Finally, run the app:

streamlit run states.py

Experience the app!

Hosted with the help of Streamlit Cloud!

https://extractinfo.streamlit.app/
