AI-powered knowledge graph construction from unstructured data
A document processing backend that pairs AI-driven document analysis with flexible, modular preprocessing. The system combines FastAPI, an LLM, and Neo4j into an intelligent document processing pipeline.
- AI-driven document processing pipeline with robust error handling
- Multi-format document support and intelligent workflow management
- Scalable microservice architecture for document intelligence
- Real-time status tracking for preprocessing steps
- Temporary subgraph creation for review before final integration
- Human-in-the-loop feedback system
- Python 3.11 or higher
- LLM API key
- Neo4j database access
- Install uv:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Create a virtual environment:

  ```bash
  uv venv
  ```

- Install libmagic:

  ```bash
  sudo apt-get install libmagic1
  ```

- Sync dependencies:

  ```bash
  uv sync
  ```

- Set up BAML
- `make install` – sync Python dependencies via uv
- `uv sync --group dev` – install optional dev tools (e.g. Vulture for dead-code checks)
- `make compose-up` – start local Neo4j and Redis containers (see Local Development Guide)
- `make lint`, `make test`, `make typecheck` – run quality gates before committing
- `make test-unit`, `make test-integration` – run just unit or integration slices as needed
- `make test` now emits coverage stats to the terminal and writes `coverage.xml` for CI tooling
- `make deadcode` – run Vulture against the codebase to surface unused definitions
- `make openapi-snapshot` – regenerate `tests/snapshots/openapi.json` after intentional API changes so contract tests stay green
The project will automatically start when you run it on Replit. The FastAPI server will be available at port 8000.
To manually start the server:
```bash
python -m app.main
```

The API will be available at:

- API Documentation: `/api/v1/docs`
- OpenAPI Specification: `/api/v1/openapi.json`
- All API requests must include a Clerk-issued bearer token: `Authorization: Bearer <token>`.
- Configure the backend with Clerk credentials via `.env`: `CLERK_JWKS_URL`, `CLERK_ISSUER`, `CLERK_AUDIENCE`, and `CLERK_API_KEY` if server-to-server calls are required.
- Clients no longer send the legacy `user-id` header; the backend derives the user from the JWT subject claim.
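An authenticated request can be sketched with the standard library; this is a minimal example, assuming the server runs at the default local port from this README (the base URL and token are placeholders, not documented values):

```python
import json
import urllib.request


def auth_headers(token: str) -> dict:
    """Build the Authorization header the backend expects."""
    return {"Authorization": f"Bearer {token}"}


def fetch_openapi_spec(base_url: str, token: str) -> dict:
    """GET /api/v1/openapi.json using a Clerk-issued bearer token."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/openapi.json",
        headers=auth_headers(token),
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Call `fetch_openapi_spec("http://localhost:8000", "<clerk-jwt>")` with a valid token to retrieve the spec.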
You can call the API from CI jobs or data pipelines without any extra backend code. Mint short-lived Clerk JWTs on demand and feed them to the Graphora client:
- Create (or reuse) a Clerk user that represents the pipeline and add a JWT template (e.g. `graphora_pipeline`) whose `aud` value matches `CLERK_AUDIENCE`.
- When the pipeline starts, create a token via Clerk's backend API using your Clerk API key:

  ```bash
  curl -X POST "https://api.clerk.com/v1/users/<USER_ID>/tokens/graphora_pipeline" \
    -H "Authorization: Bearer $CLERK_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"expires_in_seconds": 3600}'
  ```

- Export the returned `token` right before invoking the client:

  ```bash
  export GRAPHORA_AUTH_TOKEN="<clerk-jwt-from-step-2>"
  python pipeline.py
  ```

- Repeat the minting step whenever the token expires (keep TTLs short and rotate the Clerk API key like any other secret).
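The same mint request can be built in Python; this sketch mirrors the `curl` command above (the user ID and API key are placeholders, and no network call is made until you open the request):

```python
import json
import urllib.request

CLERK_API_BASE = "https://api.clerk.com/v1"


def build_mint_request(user_id: str, template: str, api_key: str,
                       ttl_seconds: int = 3600) -> urllib.request.Request:
    """Construct the POST that asks Clerk for a short-lived pipeline JWT."""
    body = json.dumps({"expires_in_seconds": ttl_seconds}).encode()
    return urllib.request.Request(
        f"{CLERK_API_BASE}/users/{user_id}/tokens/{template}",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send the request with `urllib.request.urlopen(...)` and read the `token` field from the JSON response.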
The `graphora` Python package automatically reads `GRAPHORA_AUTH_TOKEN`, so no application changes are required as long as the bearer token is valid.
app/
├── agents/ # AI agents for workflow and feedback
├── api/ # API endpoints (REST and GraphQL)
├── services/ # Core business logic services
├── schemas/ # Pydantic models and schemas
├── utils/ # Utility functions and helpers
└── main.py # Application entry point
- Preprocessing Service
  - Handles multi-step document preprocessing
  - Provides real-time status updates
  - Implements robust error handling
- Extraction Service
  - Manages entity and relationship extraction
  - Integrates with OpenAI for intelligent processing
  - Handles temporary graph creation
- Graph Service
  - Manages Neo4j database operations
  - Handles subgraph creation and updates
  - Processes user feedback
- Document Upload

  ```
  POST /api/v1/documents/upload
  Content-Type: multipart/form-data
  ```

- Submit Feedback

  ```
  POST /api/v1/feedback/{document_id}
  Content-Type: application/json
  ```

The system implements comprehensive error handling:
- Status tracking for each preprocessing step
- Detailed error messages and logging
- Graceful failure recovery
- User-friendly error responses
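The feedback endpoint above can be exercised with a small stdlib helper; this is a hedged sketch — the JSON payload shape is illustrative, not documented in this README:

```python
import json
import urllib.request


def feedback_request(base_url: str, document_id: str, payload: dict,
                     token: str) -> urllib.request.Request:
    """Build the POST /api/v1/feedback/{document_id} request."""
    return urllib.request.Request(
        f"{base_url}/api/v1/feedback/{document_id}",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Example payload is hypothetical; no request is sent until urlopen is called.
req = feedback_request("http://localhost:8000", "doc-1",
                       {"approved": True}, "<clerk-jwt>")
```

Pass the request to `urllib.request.urlopen(req)` to submit the feedback once the server is running.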
The project is configured to run automatically with the following features:
- Auto-reload during development
- Production-ready ASGI server (uvicorn)
- Proper port configuration for Replit hosting
When deploying to production:
- Update CORS settings in `main.py`
- Configure proper logging levels
- Set up proper database credentials
- Enable rate limiting and security measures
We welcome contributions! Please see our Contributing Guide for details.
- Read the Code of Conduct
- Sign the Contributor License Agreement
- Check out good first issues
- Contributing Guide - How to contribute
- Repository Guidelines - Quick contributor reference
- Local Development Guide - Spin up dependencies and run the API locally
- Security Policy - How to report security issues
- Support - How to get help
- Trademark Policy - Trademark usage guidelines
- Frontend: graphora/graphora-fe
- Python Client: graphora/graphora-client
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
- ✅ Use for free under AGPL v3 terms
- ✅ Modify and distribute with source code
- ❌ Cannot use as closed-source SaaS without commercial license
For commercial licensing (closed-source SaaS, enterprise deployments, OEM), contact: sales@graphora.io
See LICENSE for full terms.
- Enterprise Support: SLA-backed support for production deployments
- Consulting: Custom integrations, training, architecture design
- Commercial Licensing: Closed-source and SaaS deployments
- Database Vendor Partnerships: OEM licensing for database companies
Contact: support@graphora.io
- GitHub Discussions: Ask questions, share ideas
- Discord: Coming soon
- Twitter: Coming soon
Please report security vulnerabilities to support@graphora.io
See SECURITY.md for details.
Made with ❤️ by Arivan Labs