This FastAPI-based API processes financial document images using Google's Gemini AI model to extract structured information. The API supports various financial document types and returns the extracted data in JSON format.
- Image upload and processing with Gemini AI
- Support for multiple document types:
  - PT Sheet (Old and New formats)
  - Daybook
  - One Time Info Sheet
- Robust JSON correction and validation
- CORS enabled for cross-origin requests
- Ready for Vercel deployment
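As an illustration of what the JSON correction step listed above could involve, here is a minimal sketch. It assumes the model sometimes wraps its answer in Markdown fences or leaves trailing commas; `parse_model_json` is a hypothetical helper name, not necessarily what this project uses.

```python
import json
import re


def parse_model_json(raw: str) -> dict:
    """Best-effort parse of a JSON object embedded in model output (hypothetical helper)."""
    # Keep only the text between the first "{" and the last "}", which drops
    # Markdown code fences or commentary wrapped around the JSON.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    candidate = raw[start : end + 1]

    # Remove trailing commas before a closing brace or bracket.
    # Caveat: this naive regex would also alter the same pattern inside a string value.
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)

    # json.loads raises json.JSONDecodeError (a ValueError) if the text is still invalid.
    return json.loads(candidate)


raw_output = '```json\n{"Date": "15/10/2025", "Cash Sale": 5000.0,}\n```'
print(parse_model_json(raw_output))
```

A single cleanup pass like this keeps failures visible: anything it cannot repair still raises a `ValueError` that the caller can report.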
- Python 3.8 or higher
- Google Cloud API key for Gemini AI
- Clone the repository:
git clone <your-repo-url>
cd <your-repo-name>
- Create and activate a virtual environment:
python -m venv venv
# On Windows:
.\venv\Scripts\activate
# On Unix or MacOS:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Create a .env file in the project root:
GOOGLE_API_KEY=your_gemini_api_key_here
- Run the API locally:
uvicorn main:app --reload
The API will be available at http://localhost:8000
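For readers wondering how the key in `.env` might reach the application, the sketch below reads it with the standard library only. It is an assumption about the mechanism: the project may well use a package such as python-dotenv instead, and `read_env_file` and `get_api_key` are hypothetical names.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer the real environment, then fall back to the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or read_env_file(path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing fast with a clear message at startup is usually easier to debug than an authentication error on the first request.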
- Welcome message and API information
- Returns: List of available endpoints and their descriptions
POST /process-image/: Process an uploaded image and extract information based on the document type.
Parameters:
- file: Image file (form-data)
- document_type: (Optional) Type of document to process
  - Options: "PT Sheet Old", "PT Sheet New", "Daybook", "One Time Info Sheet"
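A client or server could check `document_type` against the four options before doing any expensive work. The sketch below is one way to do that; the case-insensitive matching and the name `normalize_document_type` are assumptions, and the API itself may require an exact match.

```python
from typing import Optional

# The four options listed above.
DOCUMENT_TYPES = ("PT Sheet Old", "PT Sheet New", "Daybook", "One Time Info Sheet")


def normalize_document_type(value: Optional[str]) -> Optional[str]:
    """Return the canonical document type, or None when the field is omitted."""
    if value is None or not value.strip():
        return None  # optional parameter: the caller decides on a default
    wanted = value.strip().casefold()
    for doc_type in DOCUMENT_TYPES:
        if doc_type.casefold() == wanted:
            return doc_type
    raise ValueError(
        f"Unsupported document_type {value!r}; expected one of {DOCUMENT_TYPES}"
    )


print(normalize_document_type(" daybook "))
```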
Example using cURL:
curl -X POST "http://localhost:8000/process-image/" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@your_image.jpg" \
-F "document_type=Daybook"
Example using Python requests:
import requests

url = "http://localhost:8000/process-image/"
data = {"document_type": "Daybook"}

# Open the image in a context manager so the file handle is closed after the upload.
with open("your_image.jpg", "rb") as image:
    response = requests.post(url, files={"file": image}, data=data)

print(response.json())
Example response:
{
"Day 1": {
"Date": "15/10/2025",
"Cash Sale": 5000.0,
"Credit Sale": 2000.0,
"Cash Collection": 1500.0,
"Cash Purchase": 3000.0,
"Credit Purchase": 1000.0,
"Cash Paid to Suppliers": 2500.0,
"Transportation": 200.0,
"Owner's Wages": 500.0,
"Workers' Wages": 1000.0,
"Electricity": 300.0,
"Repairs": 150.0,
"Other Cost": 100.0,
"Other Income": 50.0,
"Loan": 0.0,
"Interest": 0.0,
"Amount": 750.0
}
}
- Install Vercel CLI:
npm i -g vercel
- Deploy:
vercel
- Set up environment variables in Vercel:
  - Go to your project settings
  - Add the GOOGLE_API_KEY environment variable
- Start the API locally:
uvicorn main:app --reload
- Open the interactive Swagger documentation:
  - Visit http://localhost:8000/docs in your browser
  - Test the endpoints using the interactive interface
- Try different document types:
  - Upload sample images for each document type
  - Verify the JSON output matches the expected structure
  - Check error handling by uploading invalid files
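The structure check in the steps above can be automated. The following sketch validates a Daybook response against the field names in the example response shown earlier; treating every field as required is an assumption, and `find_daybook_problems` is a hypothetical helper.

```python
# Field names copied from the example Daybook response earlier in this README.
NUMERIC_FIELDS = [
    "Cash Sale", "Credit Sale", "Cash Collection", "Cash Purchase",
    "Credit Purchase", "Cash Paid to Suppliers", "Transportation",
    "Owner's Wages", "Workers' Wages", "Electricity", "Repairs",
    "Other Cost", "Other Income", "Loan", "Interest", "Amount",
]


def find_daybook_problems(payload: dict) -> list:
    """Return human-readable problems; an empty list means the shape looks right."""
    problems = []
    for day, entry in payload.items():
        if not isinstance(entry, dict):
            problems.append(f"{day}: expected an object")
            continue
        if not isinstance(entry.get("Date"), str):
            problems.append(f"{day}: 'Date' should be a string")
        for field in NUMERIC_FIELDS:
            value = entry.get(field)
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{day}: '{field}' should be a number")
    return problems
```

Calling `find_daybook_problems(response.json())` after each upload gives a quick pass/fail signal without comparing values by eye.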
The API handles the following error cases:
- Invalid file types
- Malformed images
- Invalid document types
- AI processing errors
- JSON parsing errors
Error responses include detailed messages to help identify and resolve issues.
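To illustrate the first two cases, a server can reject non-image uploads before calling the model by inspecting the file's leading bytes. This is a sketch only: the README does not state which formats the API accepts, so the JPEG, PNG, and WebP list below is an assumption, as is the name `sniff_image_type`.

```python
from typing import Optional


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the image format from its magic bytes; None means unrecognized."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


print(sniff_image_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(sniff_image_type(b"%PDF-1.7"))
```

Checking content rather than the file extension or the client-supplied content type catches renamed files; a malformed image with a valid header would still need to be caught when the image is decoded.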
CORS is enabled for all origins by default. Modify the CORS settings in main.py if you need to restrict access.