
nickschu/doc-visualizer

[WIP] Doc Visualizer

Doc Visualizer is a full-stack application that takes a 10-K financial document and produces dynamic visualizations for the most important insights in the document.

Table of Contents

  1. Overview
  2. Tech Stack
  3. Project Structure
  4. Setup & Usage

Overview

Doc Visualizer is designed to help users quickly analyze large financial documents (like SEC 10-K filings). It does the following:

  1. Parses the document into text chunks.
  2. Embeds the chunks using OpenAI’s embeddings API.
  3. Stores embeddings in Pinecone for quick retrieval.
  4. Uses GPT to identify key insights from the document.
  5. Generates chart or module specifications that can be rendered dynamically in the front-end.

The end goal is to turn dense text content into meaningful data visualizations so users can spot important trends, performance metrics, and risk factors more quickly.
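As an illustration of step 1, here is a minimal sketch of splitting document text into overlapping chunks before embedding. The function name `chunk_text`, the character-based windows, and the default sizes are assumptions made for this example; the project's actual chunking strategy may differ.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: the overlap keeps sentences that straddle a
    boundary visible in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each returned chunk would then be sent to the embeddings API and stored alongside its vector. Larger overlaps cost more embedding calls but reduce the chance of cutting a key figure off from its context.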


Tech Stack

Backend

  • FastAPI: For the REST API, background tasks, and overall application logic.
  • Python: Core language for the backend.
  • OpenAI API: The embeddings API for embedding text chunks, and the GPT API for summarizing documents, generating insights, and producing chart specs.
  • Pinecone: Vector database that stores the embeddings for retrieval-augmented generation (RAG).
  • Pydantic: For data validation and modeling (e.g., chart specs, insights).
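To make the retrieval step concrete, the sketch below shows the kind of similarity search a vector database performs for RAG. It is a plain-Python stand-in, not Pinecone's client API: the in-memory `index` dictionary, the function names, and the toy two-dimensional vectors are all assumptions for illustration. Real embeddings have many more dimensions and would be queried through Pinecone.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k chunk ids whose vectors are most similar to the query."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: chunk id -> embedding (hypothetical values).
index = {
    "revenue": [1.0, 0.0],
    "risk": [0.0, 1.0],
    "mixed": [1.0, 1.0],
}
results = top_k([1.0, 0.1], index, k=2)
```

The chunks returned by such a query are what would be passed to GPT as context when generating insights.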

Frontend


Project Structure

[WIP] A simplified view:

├── backend
│   └── app
│       ├── routers
│       ├── services
│       └── main.py
└── frontend
    ├── app
    │   ├── api
    │   │   ├── generate-visualization
    │   │   └── upload
    │   ├── upload
    │   └── visualize
    └── components
        ├── charts
        └── visualization

Setup & Usage

  1. Clone the Repository

    git clone https://github.com/your-username/doc-visualizer.git
    cd doc-visualizer
  2. Set Up Backend

    1. Create a virtual environment:
      cd backend
      python -m venv venv
      source venv/bin/activate
    2. Install dependencies:
      pip install -r requirements.txt
    3. Configure environment variables (see .env.example)
    4. Start the FastAPI server:
      uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
  3. Set Up Frontend

    1. In a new terminal, navigate to the frontend directory from the repository root:
      cd frontend
    2. Install dependencies:
      npm install
    3. Configure environment variables (see .env.example)
    4. Run dev server:
      npm run dev
    5. Visit http://localhost:3000.
  4. Upload a PDF

    • Go to [frontend_base_url]/upload (e.g., http://localhost:3000/upload).
    • Select a PDF (10-K).
    • The system will generate the visualization at [frontend_base_url]/visualize/[doc-id].

Thank you for checking out Doc Visualizer!
For questions, issues, or contributions, please open an issue or a pull request on GitHub. Happy visualizing!
