A full-stack web application that generates captions for images using OpenAI's Vision API. It pairs a React frontend with a Python Flask backend for image analysis and caption generation.
- Drag & Drop Upload: Easy image upload with preview
- Batch Processing: Process multiple images at once
- Image Optimization: Automatic image optimization and EXIF data removal
- Multiple Formats: Support for PNG, JPG, JPEG, GIF, WebP
- GPT-4 Vision API: Using OpenAI's latest vision model
- Detailed Analysis: Get comprehensive image analysis
- Smart Descriptions: Context-aware caption generation
- Customizable Prompts: Tailor captions to your needs
- Dark/Light Theme: Toggle between themes
- Responsive Design: Works seamlessly on all devices
- Real-time Processing: Live caption generation
- Modern UI: Built with React, Tailwind CSS, and Framer Motion
- Caption History: Save and manage your caption history
- Export Options: Download captions and images
- Favorites: Mark and organize favorite captions
- Search & Filter: Easily find previous captions
- Multilingual Captions: Generate captions in different languages
- Translation API: Translate existing captions
- Language Detection: Automatic language detection
- Rate Limiting: API rate limiting for fair usage
- File Size Limits: Maximum 5MB file upload
- CORS Protection: Secure cross-origin requests
- Caching: Redis-based caching for performance
- React 18: Modern UI library
- Vite: Lightning-fast build tool
- Tailwind CSS: Utility-first CSS framework
- Framer Motion: Smooth animations
- React Dropzone: File upload handling
- React Hot Toast: Toast notifications
- Axios: HTTP client
- Python Flask: Web framework
- OpenAI Vision API: Image analysis
- Pillow: Image processing
- Flask-CORS: Cross-origin support
- Flask-Limiter: Rate limiting
- Redis: Caching (optional)
- Docker: Containerization
- Python 3.8+
- pip package manager
- OpenAI API key
- Redis (optional, for caching)
- Node.js 16.x or higher
- npm or yarn package manager
git clone git@github.com:itzmeahammed/Image-caption-generator-ai.git
cd image-caption

cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your OpenAI API key
python app.py

docker-compose up -d

cd frontend
npm install
npm run dev

- Frontend: http://localhost:5173
- Backend API: http://localhost:5000
Create a .env file in the backend directory:
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
# Flask Configuration
FLASK_ENV=development
FLASK_DEBUG=True
# Redis Configuration (optional)
REDIS_URL=redis://localhost:6379
# API Configuration
MAX_FILE_SIZE=5242880
ALLOWED_EXTENSIONS=png,jpg,jpeg,gif,webp
# CORS Configuration
CORS_ORIGINS=http://localhost:5173,http://localhost:3000

image-caption/
├── backend/
│   ├── app.py               # Flask application
│   ├── requirements.txt     # Python dependencies
│   ├── .env.example         # Environment variables template
│   └── Dockerfile           # Docker configuration
├── frontend/
│   ├── src/
│   │   ├── components/      # React components
│   │   ├── pages/           # Page components
│   │   ├── hooks/           # Custom hooks
│   │   ├── utils/           # Utility functions
│   │   ├── App.jsx          # Main App component
│   │   └── main.jsx         # Entry point
│   ├── public/              # Static assets
│   ├── package.json         # Node dependencies
│   ├── vite.config.js       # Vite configuration
│   ├── tailwind.config.js   # Tailwind CSS config
│   └── Dockerfile.dev       # Development Docker config
├── docker-compose.yml       # Docker Compose configuration
├── Dockerfile               # Production Docker config
├── setup.sh                 # Linux/Mac setup script
├── setup.bat                # Windows setup script
├── test_api.py              # API testing script
├── test_backend.py          # Backend testing script
├── QUICKSTART.md            # Quick start guide
├── DEPLOYMENT.md            # Deployment guide
└── README.md                # This file
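
The `.env` settings listed earlier are plain strings, so the backend has to convert them before use. How `app.py` actually does this is not shown here; the following is a minimal sketch under that caveat. The `load_settings` helper and the returned key names are hypothetical, and the fallback values simply repeat the ones from the sample `.env`.

```python
import os


def load_settings(env=None):
    """Parse the backend settings from an environment mapping.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    return {
        # Size limit in bytes (5242880 bytes = 5 MB).
        "max_file_size": int(env.get("MAX_FILE_SIZE", "5242880")),
        # Comma-separated list -> lowercase set, ignoring blank entries.
        "allowed_extensions": {
            ext.strip().lower()
            for ext in env.get("ALLOWED_EXTENSIONS", "png,jpg,jpeg,gif,webp").split(",")
            if ext.strip()
        },
        # Comma-separated list of origins allowed by CORS.
        "cors_origins": [
            origin.strip()
            for origin in env.get("CORS_ORIGINS", "http://localhost:5173").split(",")
            if origin.strip()
        ],
        # Redis is optional, so a missing or empty value becomes None.
        "redis_url": env.get("REDIS_URL") or None,
    }


settings = load_settings({
    "MAX_FILE_SIZE": "5242880",
    "CORS_ORIGINS": "http://localhost:5173,http://localhost:3000",
})
print(settings["max_file_size"] // (1024 * 1024), "MB")  # prints "5 MB"
```

Treating an unset `REDIS_URL` as `None` matches the README's description of Redis as optional: the caller can skip caching when the value is missing.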
- POST /api/caption - Generate basic caption for an image
  { "image": "base64_encoded_image", "style": "descriptive" }
- POST /api/analyze - Get detailed image analysis
  { "image": "base64_encoded_image", "detail_level": "high" }
- POST /api/translate - Translate caption to different language
  { "caption": "English caption", "target_language": "es" }
- GET /api/history - Get caption history
- DELETE /api/history/<id> - Delete history entry
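
As an illustration of the request bodies listed above, here is a minimal sketch that builds a `POST /api/caption` request using only the standard library. Several details are assumptions rather than documented behavior: that the `image` field takes plain base64 with no data-URL prefix, that the body is sent as JSON, and the hypothetical helper name `build_caption_request`. The response format is not described in this README, so the example does not show one.

```python
import base64
import json
import urllib.request

API_BASE = "http://localhost:5000"  # backend address from the setup notes


def build_caption_request(image_bytes, style="descriptive"):
    """Build (but do not send) a POST request for /api/caption."""
    payload = {
        # Assumption: the API expects the raw image as base64 text.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "style": style,
    }
    return urllib.request.Request(
        f"{API_BASE}/api/caption",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_caption_request(b"fake image bytes")
print(request.get_method(), request.full_url)

# To send it against a running backend:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The same pattern should carry over to `/api/analyze` and `/api/translate` by swapping the path and the payload keys shown in the list.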
- Upload an image using drag & drop or file picker
- Wait for AI analysis
- View generated caption
- Copy, download, or share the caption
- Upload multiple images
- Select batch processing mode
- Generate captions for all images
- Download results as CSV or JSON
- Upload an image
- Select target language
- Generate caption in chosen language
- Translate to other languages as needed
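
The batch step "Download results as CSV or JSON" can be pictured with a short sketch. The app's real export format is not documented here, so the `filename` and `caption` columns and the `export_captions` helper are assumptions made for illustration only.

```python
import csv
import io
import json


def export_captions(results, fmt="json"):
    """Serialize a list of {"filename", "caption"} dicts to a JSON or CSV string."""
    if fmt == "json":
        return json.dumps(results, indent=2, ensure_ascii=False)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=["filename", "caption"])
        writer.writeheader()
        # The csv module quotes captions that contain commas or quotes.
        writer.writerows(results)
        return buffer.getvalue()
    raise ValueError(f"unsupported export format: {fmt}")


results = [
    {"filename": "cat.jpg", "caption": "A cat sleeping on a windowsill"},
    {"filename": "beach.png", "caption": "Waves breaking on a sandy beach, at sunset"},
]
print(export_captions(results, "csv"))
```

Writing through `csv.DictWriter` rather than joining strings by hand keeps captions with commas intact when the file is opened in a spreadsheet.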
python test_api.py
python test_simple.py
python test_backend.py

docker-compose up -d
docker-compose down
docker-compose logs -f

cd frontend
npm run build
vercel --prod

- Connect GitHub repository
- Set environment variables
- Deploy with Docker
See DEPLOYMENT.md for detailed instructions.
- API Key Protection: Never commit .env files
- Rate Limiting: Implemented to prevent abuse
- File Validation: Strict file type and size checks
- CORS Protection: Configured for allowed origins
- Input Sanitization: All inputs are validated
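
The file type and size checks mentioned in the list above might look roughly like the sketch below. It is not taken from `app.py`: the `validate_upload` helper is hypothetical, and the limits repeat the `MAX_FILE_SIZE` and `ALLOWED_EXTENSIONS` values from the sample `.env`. It only inspects the file name and byte count; a stricter check would also confirm that the bytes decode as an image.

```python
import os

MAX_FILE_SIZE = 5 * 1024 * 1024  # 5242880 bytes, as in the sample .env
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "webp"}


def validate_upload(filename, data):
    """Return (ok, reason) for an uploaded file's name and raw bytes."""
    # Compare the extension case-insensitively, without the leading dot.
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: {extension or 'none'}"
    if len(data) == 0:
        return False, "file is empty"
    if len(data) > MAX_FILE_SIZE:
        return False, "file exceeds the 5 MB limit"
    return True, "ok"


print(validate_upload("photo.JPG", b"\xff\xd8\xff"))  # (True, 'ok')
print(validate_upload("notes.txt", b"hello"))         # (False, 'unsupported file type: txt')
```

Returning a reason alongside the boolean lets the frontend show a specific message for the "Image Upload Failed" case instead of a generic error.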
- OpenAI API Error: Verify API key is correct and has sufficient credits
- Port Already in Use: Change port in app.py or kill existing process
- CORS Error: Check CORS configuration in .env
- API Connection Failed: Ensure backend is running on correct port
- Image Upload Failed: Check file size and format
- Theme Not Persisting: Clear browser cache
- Container Won't Start: Check logs with docker-compose logs
- Port Conflicts: Change ports in docker-compose.yml
- Volume Issues: Ensure Docker has proper permissions
- QUICKSTART.md - Quick start guide
- DEPLOYMENT.md - Deployment instructions
- API Documentation - Backend API docs
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
MIT License - see LICENSE file for details
- OpenAI for the Vision API
- React and Vite communities
- Flask and Python communities
- All open-source contributors
Happy Captioning!