Skip to content

Real-Time Photo Describer and Translator with Speech Output

Notifications You must be signed in to change notification settings

savka777/Photo-Translator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Photo Translator

Real-Time Photo Describer and Translator with Speech Output

This project is designed to capture an image from a camera, translate the main object in the image into a specified language using GPT-4V, and then generate an audio output of the translated text using ElevenLabs' API.

Features

  • Image Capture: Uses OpenCV to capture an image from the webcam.
  • Image Processing: Converts the captured image to a base64 encoded string for processing.
  • Translation: Sends the encoded image to OpenAI's GPT-4V, requesting a description of the central object in the specified language.
  • Audio Output: Utilizes ElevenLabs API to convert the translated text into spoken audio.

You need to have API keys for OpenAI and ElevenLabs You need to install the required packages (pip install opencv-python-headless pillow requests elevenlabs)

About

Real-Time Photo Describer and Translator with Speech Output

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages