# Overview

This notebook walks through the end-to-end workflow: setting up an ElevenLabs voice model, then using that voice to narrate descriptions of images captured with OpenCV.

#### Outline
1. Train a voice model
2. Process and store images via webcam
3. Evaluate images via OpenAI
4. Narrate images via ElevenLabs
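Steps 2 to 4 are not covered in the cells that follow. As a hedged illustration of the hand-off between step 2 and step 3, a frame saved from the webcam can be sent to a vision-capable OpenAI model as a base64 data URL of the form `data:image/png;base64,...`. The helper `image_to_data_url` below is hypothetical. It only covers the encoding step, not the OpenAI request itself.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Read an image file and return it as a base64 data URL (hypothetical helper).

    Assumption: the vision model accepts images as data:<mime>;base64,<payload>.
    """
    mime, _ = mimetypes.guess_type(path)  # e.g. "image/png" for frame.png
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path}")
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

The MIME type comes from the file extension, so saving frames with a `.png` or `.jpg` suffix keeps the data URL consistent with the file contents.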

### Precursor - Download and prepare audio from YouTube (example)

In [None]:
# yt-dlp needs ffmpeg (https://ffmpeg.org/) on the PATH to extract and convert audio
%pip install yt-dlp

In [None]:
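# download the audio track of a YouTube video and convert it to mp3 (192 kbps, a standard MP3 bitrate)
!yt-dlp -x --audio-format mp3 --audio-quality 192K -o "audio.%(ext)s" "https://www.youtube.com/watch?v=GGoCBAo9N_g"
# remove the original download if it was left behind (yt-dlp normally deletes it after conversion)
!rm -f audio.webm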
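The same download can be scripted from Python through yt-dlp's embedding API instead of the CLI. This is a minimal sketch, assuming the `yt_dlp` package and ffmpeg on the PATH. `build_ydl_opts` and `download_audio` are hypothetical helper names, and the options are only roughly equivalent to the command-line flags above.

```python
def build_ydl_opts(outtmpl: str = "audio.%(ext)s", bitrate_kbps: int = 192) -> dict:
    """Options roughly matching `yt-dlp -x --audio-format mp3 --audio-quality 192K`."""
    return {
        "format": "bestaudio/best",  # prefer an audio-only stream when one exists
        "outtmpl": outtmpl,          # same output template as the -o flag
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",  # runs ffmpeg after the download
                "preferredcodec": "mp3",
                "preferredquality": str(bitrate_kbps),
            }
        ],
    }


def download_audio(url: str) -> None:
    # imported lazily so the options can be built without yt-dlp installed
    import yt_dlp

    with yt_dlp.YoutubeDL(build_ydl_opts()) as ydl:
        ydl.download([url])
```

Usage: `download_audio("https://www.youtube.com/watch?v=GGoCBAo9N_g")` should produce `audio.mp3` in the working directory.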

In [None]:
# cut a 3-minute clip (-t 180) starting at the 5:00 mark (-ss 300) and save it as out.mp3
!ffmpeg -i audio.mp3 -ss 300 -t 180 out.mp3
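Both ffmpeg arguments are in seconds. `-ss 300` is 300 / 60 = 5 minutes in, and `-t 180` is 3 minutes long, so the clip covers 5:00 to 8:00 of the source. The sketch below builds the same command from Python and runs it with `subprocess`. The helper names are hypothetical. One deliberate difference: it places `-ss` before `-i`, which makes ffmpeg seek in the input instead of decoding and discarding the first five minutes. Since the clip is re-encoded anyway, this should land on the same cut point.

```python
import subprocess
from typing import List


def clip_command(src: str, dst: str, start_s: int, duration_s: int) -> List[str]:
    """Build an ffmpeg argv that writes `duration_s` seconds of `src`, starting at `start_s`."""
    return [
        "ffmpeg",
        "-y",                  # overwrite dst if it already exists
        "-ss", str(start_s),   # input seeking: placed before -i
        "-i", src,
        "-t", str(duration_s),
        dst,
    ]


def mmss(seconds: int) -> str:
    """Format a second count as M:SS, handy for sanity-checking offsets."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def make_clip(src: str, dst: str, start_s: int, duration_s: int) -> None:
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(clip_command(src, dst, start_s, duration_s), check=True)
```

For example, `mmss(300)` returns `"5:00"` and `mmss(300 + 180)` returns `"8:00"`, matching the clip boundaries above.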

### 0. Prepare dependencies

In [None]:
import os
import base64
import json
import errno
import time
import pandas as pd
from elevenlabs import Voice, VoiceSettings, play, stream
from elevenlabs.client import ElevenLabs
from dotenv import load_dotenv, find_dotenv

In [None]:
load_dotenv(find_dotenv()) # load .env file to be used with os.getenv()
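If the key is missing from `.env`, `os.getenv` returns `None`. The failure then typically surfaces later as an authentication error on the first API call. A small fail-fast check makes the cause obvious. `require_env` is a hypothetical helper, not part of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:  # treats both "unset" and "set to an empty string" as missing
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value
```

Usage: `ElevenLabs(api_key=require_env("ELEVEN_API_KEY"))`.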

### 1. Train a voice model
- Prepare files for training
- Conduct training
- Verify output
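For the "prepare files" step, it helps to check the samples before uploading them. The sketch below flags missing files, unexpected extensions and oversized files. The accepted extensions and the `max_mb` limit are assumptions for illustration, so check the current ElevenLabs documentation for the real limits. For scale, a 3-minute clip at about 192 kbps is roughly 180 s × 192 kbit/s ÷ 8 ≈ 4,320 kB, or about 4.3 MB.

```python
from pathlib import Path
from typing import List

# Assumption: formats treated as acceptable voice samples (not an official list)
ALLOWED_SUFFIXES = {".mp3", ".wav", ".m4a"}


def check_training_files(paths: List[str], max_mb: float = 10.0) -> List[str]:
    """Return a list of problems found; an empty list means the files look usable."""
    problems = []
    for p in map(Path, paths):
        if not p.is_file():
            problems.append(f"{p}: file not found")
            continue
        if p.suffix.lower() not in ALLOWED_SUFFIXES:
            problems.append(f"{p}: unexpected extension {p.suffix!r}")
        size_mb = p.stat().st_size / 1_000_000  # decimal megabytes
        if size_mb > max_mb:
            problems.append(f"{p}: {size_mb:.1f} MB exceeds {max_mb} MB")
    return problems
```

Usage: `assert not check_training_files(["out.mp3"])` before calling `el_client.clone(...)`.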

In [None]:
# create an ElevenLabs API client using the key loaded from .env
el_client = ElevenLabs(
	api_key=os.getenv("ELEVEN_API_KEY")
)

In [None]:
# create an instant voice clone from the prepared audio sample
voice = el_client.clone(
	name="SaDa",
	description="A voice purely for test purposes internally, not to be distributed",
	files=["out.mp3"]
)

In [None]:
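# verify the output: clone() should return a Voice object carrying the new voice_id
print(type(voice))
print(voice.voice_id)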

#### Listing and selecting available voices (ElevenLabs)

In [None]:
# fetch every voice available to this account and load it into a DataFrame
response = el_client.voices.get_all()
json_response = json.loads(response.json())
data_to_load = json_response['voices']
jdf = pd.DataFrame(data_to_load)
jdf.head()

In [None]:
# Expand the labels column into one column per label key, then drop it
labels_keys = set()
for labels in jdf['labels']:
	labels_keys.update((labels or {}).keys())  # some voices may have no labels
for key in labels_keys:
	# bind key as a default argument so each lambda reads its own label
	jdf[key] = jdf['labels'].apply(lambda x, k=key: (x or {}).get(k))
jdf.drop('labels', axis=1, inplace=True)
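One caveat with the cell above: if a label key has the same name as a top-level voice field, `jdf[key] = ...` overwrites that column. ElevenLabs voices may carry a `description` label, for instance. The sketch below flattens the voice dicts before they reach pandas and prefixes colliding label keys instead. `flatten_labels` and the `label_` prefix are hypothetical choices. With this approach, the label value ends up in `label_description` rather than `description`.

```python
from typing import Dict, List


def flatten_labels(voices: List[Dict], prefix: str = "label_") -> List[Dict]:
    """Promote each voice's `labels` dict to top-level keys.

    Label keys that collide with an existing field are prefixed rather than
    overwriting it, and a missing or None `labels` value is treated as empty.
    """
    rows = []
    for voice in voices:
        row = {k: v for k, v in voice.items() if k != "labels"}
        for key, value in (voice.get("labels") or {}).items():
            row[prefix + key if key in row else key] = value
        rows.append(row)
    return rows
```

Usage: `jdf = pd.DataFrame(flatten_labels(data_to_load))`. Pandas fills keys that some voices lack with `NaN`.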

In [None]:
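# create a simplified list of available voices
# reindex fills any label column that no voice on this account has with NaN instead of raising KeyError
df = jdf.reindex(columns=['voice_id', 'name', 'description', 'accent', 'age', 'use case', 'gender'])
df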

In [None]:
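# select a voice by row position; index 46 depends on this account's voice list and will differ elsewhere
select_voice = df.iloc[46]['voice_id']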
select_voice
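A positional index breaks as soon as voices are added or removed. Selecting by name is more stable. The sketch below works on the list of voice dicts (`data_to_load`) and insists on exactly one match, so a duplicate or misspelled name fails loudly. `find_voice_id` is a hypothetical helper.

```python
from typing import Dict, List


def find_voice_id(voices: List[Dict], name: str) -> str:
    """Return the voice_id of the single voice whose name matches exactly."""
    matches = [v["voice_id"] for v in voices if v.get("name") == name]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one voice named {name!r}, found {len(matches)}")
    return matches[0]
```

Usage: `select_voice = find_voice_id(data_to_load, "SaDa")` picks the clone created earlier. The pandas equivalent, `df.loc[df["name"] == "SaDa", "voice_id"].iloc[0]`, returns the first match and raises `IndexError` when there is none.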

#### Generating voice content

In [None]:
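# synthesize a short test phrase with the selected voice and play it while it streams
# (stream() plays audio through mpv, which must be installed separately)
audio = el_client.generate(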
	text="This is a test",
	voice=select_voice,
	stream=True
)
stream(audio)
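With `stream=True`, `generate` returns an iterator of audio byte chunks. That iterator can only be consumed once, so after `stream(audio)` it is exhausted. To keep a copy instead of playing it, you can write the chunks to disk. This is a minimal sketch. `save_audio` is a hypothetical helper, and the chunks are assumed to be raw MP3 bytes.

```python
from typing import Iterable


def save_audio(chunks: Iterable[bytes], path: str) -> int:
    """Write streamed audio chunks to `path` and return the number of bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks, if any arrive
                f.write(chunk)
                written += len(chunk)
    return written
```

Usage: `save_audio(el_client.generate(text="This is a test", voice=select_voice, stream=True), "test.mp3")`.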

In [None]:
# pass this voice ID as the command-line argument when running main.py
print(select_voice)
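`main.py` is not shown in this notebook, so its real interface may differ. As a hypothetical sketch, an entry point that accepts the voice ID as a single positional argument could parse it with `argparse`. The `parse_args` helper and its description string are illustrative only.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line of a hypothetical main.py."""
    parser = argparse.ArgumentParser(description="Narrate webcam images with an ElevenLabs voice")
    parser.add_argument("voice_id", help="ElevenLabs voice ID printed by the notebook")
    return parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
```

Usage: `python main.py <voice_id>`, where `<voice_id>` is the value printed by the cell above.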