VisualAid.AI is an AI-driven image analysis tool built with Streamlit and Google's Gemini API. Users upload an image and chat with an AI assistant that describes the image, analyzes it, and answers questions about it.
- Image Upload & Analysis: Upload images for AI-powered descriptions and analysis.
- AI Chatbot: Ask questions related to the uploaded image.
- Text-to-Speech: Converts AI-generated descriptions to audio.
- Dynamic UI: Aesthetic and interactive interface using custom CSS.
- Seamless Navigation: Start from a landing page and move to the upload page with one click.
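
The analysis and text-to-speech features above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the function names `describe_image`, `clean_for_speech`, and `speak` are hypothetical, and the Markdown-stripping step is an assumption about what makes the audio sound cleaner. The third-party imports are kept inside the functions so the helpers load even when the packages are not installed.

```python
import io


def describe_image(image_path, api_key, prompt="Describe this image in detail."):
    """Send an image and a prompt to Gemini and return the text reply."""
    # Local imports: google-generativeai and Pillow are only needed here.
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")
    with Image.open(image_path) as image:
        response = model.generate_content([prompt, image])
    return response.text


def clean_for_speech(text):
    """Remove common Markdown markers and collapse whitespace."""
    for marker in ("**", "*", "`", "#"):
        text = text.replace(marker, "")
    return " ".join(text.split())


def speak(text, lang="en"):
    """Convert text to MP3 bytes with gTTS (requires network access)."""
    from gtts import gTTS

    buffer = io.BytesIO()
    gTTS(text=clean_for_speech(text), lang=lang).write_to_fp(buffer)
    return buffer.getvalue()
```

In a Streamlit page, the bytes returned by `speak` could be passed to `st.audio(audio_bytes, format="audio/mp3")`.
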
- Frontend: Streamlit
- AI Model: Google Gemini API (gemini-1.5-flash)
- Backend: Python
- Audio Processing: gTTS (Google Text-to-Speech)
- Image Handling: PIL (Pillow)
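
A quick way to confirm that the stack listed above is importable is sketched below. The mapping of import names to pip package names is an assumption based on the usual PyPI names; the authoritative list is the project's requirements.txt. The helper names `is_installed` and `missing_packages` are hypothetical.

```python
import importlib.util

# Import name -> pip package name (assumed; check requirements.txt).
REQUIRED = {
    "streamlit": "streamlit",
    "google.generativeai": "google-generativeai",
    "gtts": "gTTS",
    "PIL": "Pillow",
}


def is_installed(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package of a dotted name is missing.
        return False


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that are not importable."""
    return [pip_name for module, pip_name in required.items()
            if not is_installed(module)]
```

Calling `missing_packages()` before launching the app gives a list that can be passed straight to `pip install`.
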
- Clone the Repository

      git clone https://github.com/shreyazh/VisualAid.AI.git
      cd VisualAid.AI

- Create a Virtual Environment (Optional but Recommended)

      python -m venv venv
      source venv/bin/activate  # On Windows use: venv\Scripts\activate

- Install Dependencies

      pip install -r requirements.txt

- Set Up API Key: Replace YOUR_GEMINI_API_KEY in the script with your Google Gemini API key.
- Run the Application

      streamlit run app.py
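
As an alternative to pasting the key into the script, it can be read from an environment variable so it never ends up in version control. The sketch below is an assumption, not what app.py currently does; the variable name `GEMINI_API_KEY` and the function `load_api_key` are hypothetical.

```python
import os

# The placeholder that ships in the script; treated as "no key set".
PLACEHOLDER = "YOUR_GEMINI_API_KEY"


def load_api_key(env=None, var_name="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Gemini API key."
        )
    return key
```

The result can then be passed to the Gemini client, for example `genai.configure(api_key=load_api_key())`.
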
- Launch the app and start from the landing page.
- Click "Get Started" to navigate to the Upload Image page.
- Upload an image, and the AI will analyze it.
- View the AI-generated description and listen to the text-to-speech output.
- Chat with the AI about the image.
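
The flow above (landing page, "Get Started", upload, analysis, chat) could be wired together with Streamlit's session state along these lines. This is a sketch under assumptions: `next_page` and `render` are hypothetical names, `analyze(image)` and `answer(image, question)` stand in for whatever Gemini calls the app makes, and `st.rerun` and `st.chat_input` require a reasonably recent Streamlit release.

```python
def next_page(current, clicked_get_started):
    """Move from the landing page to the upload page when the button is clicked."""
    if current == "landing" and clicked_get_started:
        return "upload"
    return current


def render(analyze, answer):
    """Draw the current page. `analyze` and `answer` are caller-supplied callables."""
    import streamlit as st
    from PIL import Image

    if "page" not in st.session_state:
        st.session_state["page"] = "landing"

    if st.session_state["page"] == "landing":
        st.title("VisualAid.AI")
        clicked = st.button("Get Started")
        st.session_state["page"] = next_page("landing", clicked)
        if clicked:
            st.rerun()  # redraw immediately on the upload page
        return

    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        return
    image = Image.open(uploaded)
    st.image(image, caption="Uploaded image")
    # Note: this runs on every rerun; a real app would likely cache the result.
    st.write(analyze(image))

    question = st.chat_input("Ask about the image")
    if question:
        with st.chat_message("user"):
            st.write(question)
        with st.chat_message("assistant"):
            st.write(answer(image, question))
```

Keeping the page transition in a plain function such as `next_page` makes the navigation logic easy to check without starting Streamlit.
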
- Stylish navigation bar
- Animated buttons and inputs
- Responsive image display
- Custom Streamlit styling
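
Custom styling in Streamlit is usually done by injecting a `<style>` block through `st.markdown`. The sketch below shows the idea with animated buttons and a responsive image rule. The colors, the `build_css` and `apply_css` names, and the `.stButton > button` selector are assumptions; Streamlit's generated class names can change between versions.

```python
def build_css(accent="#4f8bf9", radius_px=12):
    """Return a <style> block for buttons and images (values are illustrative)."""
    return f"""
<style>
.stButton > button {{
    background-color: {accent};
    border-radius: {radius_px}px;
    transition: transform 0.2s ease;
}}
.stButton > button:hover {{
    transform: scale(1.05);
}}
img {{
    max-width: 100%;
    height: auto;
}}
</style>
"""


def apply_css(css):
    """Inject the CSS into the current Streamlit page."""
    import streamlit as st

    st.markdown(css, unsafe_allow_html=True)
```

Calling `apply_css(build_css())` once near the top of the script applies the styles to the whole page.
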
- Add support for real-time image processing.
- Implement multilingual support for text-to-speech.
- Enhance chatbot responses with contextual memory.
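
One possible shape for the contextual-memory enhancement is a rolling window of recent turns that is prepended to each new question. This is only a sketch of the idea, not a planned design; the `ChatMemory` class and its prompt format are hypothetical.

```python
from collections import deque


class ChatMemory:
    """Keep the last few question/answer pairs for use as prompt context."""

    def __init__(self, max_turns=5):
        # deque drops the oldest turn automatically once max_turns is reached.
        self._turns = deque(maxlen=max_turns)

    def add(self, question, answer):
        """Record one completed exchange."""
        self._turns.append((question, answer))

    def build_prompt(self, new_question):
        """Return the remembered turns followed by the new question."""
        lines = []
        for question, answer in self._turns:
            lines.append(f"User: {question}")
            lines.append(f"Assistant: {answer}")
        lines.append(f"User: {new_question}")
        return "\n".join(lines)
```

The string from `build_prompt` could be sent alongside the image in place of the bare question, so the model sees earlier exchanges. For the multilingual item, gTTS already accepts a `lang` argument (for example `gTTS(text=..., lang="fr")`), so the main work would be choosing the language in the UI.
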
We welcome contributions! Feel free to fork the repo and submit PRs.
This project is licensed under the MIT License.
Developed with ❤️ using Python