Skip to content

satvikndxd/skynet

Repository files navigation

Gemini Multimodal Playground

A web application that demonstrates the capabilities of Google's Gemini Pro Vision model for multimodal interactions. This playground allows users to experiment with image and text inputs to explore the model's understanding and response generation.

Features

  • Upload and process images with Gemini Pro Vision
  • Interactive chat interface with real-time responses
  • Support for both image and text inputs
  • Modern and intuitive user interface
  • Secure API authentication
  • Backend logging for debugging and analytics
  • Responsive design for desktop and mobile users
  • Easily extendable architecture for additional features

Architecture

The application follows a client-server architecture:

  • Frontend: Built with React.js (or any preferred frontend framework) for a seamless user experience
  • Backend: Uses Python (Flask/FastAPI/Django) to handle API requests and integrate with Gemini Pro Vision
  • Database: Optional database support (e.g., PostgreSQL, MongoDB) for storing user interactions and logs
  • Cloud Services: Google Cloud APIs for AI processing and authentication

Prerequisites

  • Python 3.8 or higher
  • Google Cloud API credentials (Gemini Pro Vision access required)
  • Node.js and npm (for frontend development)
  • Flask/FastAPI/Django (for backend API development)

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/gemini-multimodal-playground.git
cd gemini-multimodal-playground
  1. Install dependencies:
pip install -r requirements.txt
  1. Set up your environment variables:
export GOOGLE_API_KEY=your_api_key_here

Usage

  1. Start the backend server:
python app.py  # Flask example
  1. Start the frontend application:
cd frontend
npm install
npm start
  1. Open your browser and navigate to http://localhost:5000 (or specified frontend port)

  2. Upload an image and start interacting with the Gemini Pro Vision model

API Integration

Request Format

{
  "text": "Describe the object in the image",
  "image": "base64-encoded-image-string"
}

Response Format

{
  "response": "This is a cat sitting on a chair."
}

Deployment

Docker (Optional)

Build and run the application using Docker:

docker build -t gemini-playground .
docker run -p 5000:5000 gemini-playground

Cloud Deployment

You can deploy this application on platforms like:

  • Google Cloud Run
  • AWS Lambda with API Gateway
  • Heroku
  • Vercel (for frontend)

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Google's Gemini Pro Vision model
  • All contributors to this project
  • The open-source community

Contact

For any queries or suggestions, please open an issue in the GitHub repository or reach out via email.

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors