Convert text to speech using the ElevenLabs client. Ensure rawtext.md is updated, all environment keys are loaded, and outputs are available in .mp3, .txt, and .log formats.
You can see sample output here: https://github.com/smol-ai/temp
- Make sure ffmpeg is installed, and install ImageMagick (on macOS with Homebrew):
  brew install imagemagick
- Install Dependencies:
  pip install -r requirements.txt
- Set Environment Variables:
  export ELEVENLABS_API_KEY=your_api_key
  export OPENAI_API_KEY=your_api_key
  export CARTESIA_API_KEY=your_api_key
  Note that this generates a lot of tokens. We used up 52k characters just developing it.
- Update rawtext.md: add or modify the text you want to convert to speech.
- Run the Script:
  python main.py
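Before running main.py, it can help to confirm that the setup steps above are complete. The following is a minimal sketch, not part of the project: the key names come from the export step, while the helper names `missing_keys` and `input_ready` are hypothetical and assume rawtext.md sits in the current directory.

```python
import os
from pathlib import Path

# Key names are taken from the export step above; the helpers are hypothetical.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "OPENAI_API_KEY", "CARTESIA_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset or blank in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key, "").strip()]


def input_ready(path="rawtext.md"):
    """Return True if the input file exists and is not empty."""
    candidate = Path(path)
    return candidate.is_file() and candidate.stat().st_size > 0


if __name__ == "__main__":
    # Only reports status; it does not stop or start the real script.
    print("missing keys:", missing_keys())
    print("rawtext.md ready:", input_ready())
```

If the first line prints an empty list and the second prints True, the environment matches what the steps above describe.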
To generate a video from the audio and transcript, you can use the video.py script. This script uses the MoviePy library to combine the audio and image, and the OpenAI library to generate a default image using DALL·E.
- Make sure you have the moviepy and openai libraries installed. You can install them using pip:
  pip install moviepy openai
- Set Environment Variables:
  export OPENAI_API_KEY=your_api_key
- Run the Script:
  python video.py
- Video: final_video.mp4 file
- The video.py script assumes that the combined_dialogue.mp3 and dialogue_transcript.txt files are present in the same directory.
- The script generates a default image using DALL·E and resizes it to 1080x1080 pixels.
- The script combines the audio and image to create a video, and adds captions using the transcript.
- The final video is saved as final_video.mp4 in the same directory.
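Since video.py assumes its two input files already exist, a quick check before running it can save a wasted DALL·E call. This is a minimal sketch under that assumption: the file names come from the notes above, and the helper `missing_video_inputs` is hypothetical, not something video.py provides.

```python
from pathlib import Path

# File names are taken from the notes above; the helper is hypothetical.
VIDEO_INPUTS = ("combined_dialogue.mp3", "dialogue_transcript.txt")


def missing_video_inputs(directory=".", names=VIDEO_INPUTS):
    """Return the expected input files that are not present in the directory."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Only reports status; an empty list means both inputs were found.
    print("missing inputs:", missing_video_inputs())
```

Run it from the directory that contains video.py; any name it prints still needs to be produced, presumably by running main.py first.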
- Audio: .mp3 files
- Transcript: .txt files
- Logs: .log files
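To see which outputs a run produced, the files can be grouped by extension. The sketch below is illustrative only: the extension-to-kind mapping mirrors the lists above, the helper `classify_outputs` is hypothetical, and it matches on extension alone, so an unrelated .txt file such as requirements.txt would also be counted as a transcript.

```python
from pathlib import Path

# Mapping mirrors the output lists above; the helper itself is hypothetical.
OUTPUT_KINDS = {
    ".mp3": "Audio",
    ".txt": "Transcript",
    ".log": "Logs",
    ".mp4": "Video",
}


def classify_outputs(filenames):
    """Group file names by output kind, ignoring unknown extensions."""
    grouped = {kind: [] for kind in OUTPUT_KINDS.values()}
    for name in filenames:
        kind = OUTPUT_KINDS.get(Path(name).suffix.lower())
        if kind is not None:
            grouped[kind].append(name)
    return grouped


if __name__ == "__main__":
    # Lists whatever is in the current directory, grouped by kind.
    names = sorted(p.name for p in Path(".").iterdir() if p.is_file())
    for kind, files in classify_outputs(names).items():
        print(f"{kind}: {files}")
```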