<a href="https://colab.research.google.com/github/hoodini/cohere-with-tts-colab/blob/main/Cohere%2C_LangChain_%26_11Labs_TTS_by_Yuval_Avidani.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Smart TTS with Cohere, LangChain and ElevenLabs API!
This notebook uses LangChain to get a response from Cohere and then, depending on which API keys you have set up, automatically picks the TTS service that will speak the reply.
It supports both Hebrew and English.

## Developed by Yuval Avidani. Be a friend and cite it if you are using it!

@yuvalav on X

***Check out my AI Communities and Content:***
[https://linktr.ee/yuvai](https://linktr.ee/yuvai)

<img src="https://s3-prod-ue1-images.s3.amazonaws.com/image_studio/generated/bee1043327044956920a08330536a2fe.webp?AWSAccessKeyId=AKIAQDJRGGOPGCRKJ35P&Signature=z%2Byo%2BR8Rwo0L%2BHWi08wVeDk5etI%3D&Expires=1800919805" alt="Yuval's Image">



---------------------------------------
Instructions:
1. Run cell 1, then move to cell 2 and replace TYPE_YOUR_COHERE_API_KEY_HERE with your actual Cohere API key (you can get one for free on their website; this is mandatory!)
2. Enter your prompt in cell 3 and run it.
3. Run cell 4 and enjoy.

To re-run, just change the prompt in cell 3 and run cells 3-4 again.

To use the 11Labs (ElevenLabs) voices, create an account and get your API key, then select a voice ID or create a custom voice and note its voice ID. These values can be inserted via the key icon on the upper left-hand side of the sidebar. Add each name exactly as it appears in the cells, enter its value, and make sure to enable Notebook access.

# 1. Install Dependencies

In [None]:
!pip install langchain langchain-openai cohere langchain-cohere faiss-cpu beautifulsoup4 langchainhub gtts

Collecting langchain
  Downloading langchain-0.1.16-py3-none-any.whl (817 kB)
Collecting langchain-openai
  Downloading langchain_openai-0.1.4-py3-none-any.whl (33 kB)
Collecting cohere
  Downloading cohere-5.3.4-py3-none-any.whl (151 kB)
Collecting langchain-cohere
  Downloading langchain_cohere-0.1.4-py3-none-any.whl (30 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.0 MB)
Collecting langchainhub
  Downloading langchainhub-0.1.15-py3-none-any.whl (4.6 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0

## Setting up Keys

In [None]:
import os
from google.colab import userdata

# Names of the optional secrets (add them via the key icon in the Colab sidebar)
optional_names = [
    'LANGCHAIN_TRACING_V2',
    'LANGCHAIN_API_KEY',
    'COHERE_API_KEY',
    'TAVILY_API_KEY',
    'ELEVENLABS_API_KEY',
    'ELEVENLABS_VOICE_ID',
    'ELEVANLABS_VOICE_ID_SECOND',
]

def get_optional_secret(name):
    """Return a Colab secret, or None if it is missing or notebook access is off."""
    try:
        return userdata.get(name)
    except (userdata.SecretNotFoundError, userdata.NotebookAccessError):
        # userdata.get raises instead of returning None, so missing secrets are handled here
        return None

# Set environment variables only for the secrets that actually exist
for key in optional_names:
    value = get_optional_secret(key)
    if value is not None:
        os.environ[key] = value

# Retrieve the ElevenLabs settings; each is None when it has not been configured
api_key = os.getenv('ELEVENLABS_API_KEY')
voice_id = os.getenv('ELEVENLABS_VOICE_ID')
voice_id_2 = os.getenv('ELEVANLABS_VOICE_ID_SECOND')
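
To confirm which optional keys were picked up without printing their values, a small check along these lines may help. This is only a sketch: `report_optional_keys` is a hypothetical helper that is not part of the original notebook, and the `env` argument exists purely so the example can run with a sample mapping instead of the real environment.

```python
import os

def report_optional_keys(names, env=None):
    """Map each key name to True if it is set to a non-empty value."""
    # Assumption: keys live in os.environ, as in the cell above
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in names}

status = report_optional_keys(
    ["ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID"],
    env={"ELEVENLABS_API_KEY": "dummy-key"},  # sample mapping; omit to read os.environ
)
for name, is_set in status.items():
    # Only report presence, never the secret value itself
    print(f"{name}: {'set' if is_set else 'missing'}")
```

In the notebook itself you would call it without `env` so that it reads the real environment variables.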

## Defining the tools

In [None]:
import requests
from IPython.display import Audio, FileLink, display
from gtts import gTTS
import os
import json

# Function to generate text-to-speech
def hebrew_text_to_speech(text, lang='iw'):
    tts = gTTS(text, lang=lang)
    filename = "output.mp3"
    tts.save(filename)
    return filename

# Function to convert speech to speech using ElevenLabs API
def convert_speech_to_speech(api_key, voice_id, input_audio_path):
    """Converts speech using the ElevenLabs API and saves the converted speech."""
    url = f"https://api.elevenlabs.io/v1/speech-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key}
    # Open the input inside a context manager so the file handle is always closed
    with open(input_audio_path, 'rb') as audio_file:
        files = {
            'audio': audio_file,
            'model_id': (None, 'eleven_english_sts_v2'),  # Adjust the model_id as necessary
            'voice_settings': (None, '{"stability": 0.5, "similarity_boost": 0.5}')  # Modify voice settings as desired
        }
        response = requests.post(url, headers=headers, files=files)
    if response.status_code == 200:
        output_path = 'converted_output.mp3'
        with open(output_path, 'wb') as f:
            f.write(response.content)
        return output_path
    else:
        print("Failed to convert speech using ElevenLabs:", response.text)
        return None

def detect_language(text):
    # Simple heuristic: treat the text as Hebrew if it contains any Hebrew character
    return "he" if any("\u0590" <= c <= "\u05EA" for c in text) else "en"

def elevenlabs_tts(text, api_key, voice_id):
    if not api_key or not voice_id:
        print("API key or voice ID not provided for ElevenLabs. Falling back to gTTS.")
        return None
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    payload = {"text": text}
    headers = {
        "xi-api-key": api_key,
        "Content-Type": "application/json"
    }
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        audio_path = "elevenlabs_output.mp3"
        with open(audio_path, 'wb') as f:
            f.write(response.content)
        return audio_path
    else:
        print("Failed to generate TTS using ElevenLabs:", response.text)
        return None
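
The `detect_language` heuristic above returns "he" as soon as a single Hebrew character appears, so an English answer that quotes one Hebrew word is routed to the Hebrew path. If that matters for your prompts, a majority-based variant could look like the sketch below. Both `hebrew_ratio` and `detect_language_by_majority` are hypothetical names, and the 0.5 threshold is an assumption rather than a tuned value.

```python
def hebrew_ratio(text):
    """Fraction of alphabetic characters in the range U+0590..U+05EA."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    hebrew = sum(1 for c in letters if "\u0590" <= c <= "\u05EA")
    return hebrew / len(letters)

def detect_language_by_majority(text, threshold=0.5):
    """Return "he" only when more than `threshold` of the letters are Hebrew."""
    return "he" if hebrew_ratio(text) > threshold else "en"

print(detect_language_by_majority("שלום עולם"))
print(detect_language_by_majority("The word שלום means peace in Hebrew"))
```

Digits, punctuation and whitespace are ignored, so the ratio is computed over letters only.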

# 2. Set up the Cohere Chain and System Prompt (MUST TYPE COHERE API KEY!)

In [None]:
from langchain_cohere import ChatCohere

llm = ChatCohere(cohere_api_key="TYPE_YOUR_COHERE_API_KEY_HERE")

from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful Assistant. You respond in short and concise answers, and you always respond in the same language the user used. If English - respond in English, if Hebrew - respond in Hebrew. You were created by Yuval Avidani, founder of HACKIT and YUVAL AI Community."),
    ("user", "{input}")
])

chain = prompt | llm

# Define a function to extract the text from the AIMessage object returned by the chain
def get_content_from_aimessage(aimessage):
    return aimessage.content  # The response text is stored in the .content attribute

# Stand-in AIMessage class for trying the helper above without calling the API
# (the chain itself returns LangChain's own AIMessage, not this class)
class AIMessage:
    def __init__(self, content):
        self.content = content

# 3. Type in the Prompt and get the response in a clean output

In [None]:
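# Named user_input so that Python's built-in input() is not shadowed
user_input = """
Who was the Lubavitcher Rabbi? Respond in 1 sentence.
"""

# Invoke the chain and extract the content using the function defined above
content = get_content_from_aimessage(chain.invoke({"input": user_input}))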

# Remove new lines and other unwanted characters
clean_text = content.replace('\n', ' ').replace('\r', '')

# Print the cleaned text
print(clean_text)

Menachem Mendel Schneerson was the seventh Lubavitcher Rebbe and is considered one of the most influential Jewish leaders of the 20th century.
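
The cleanup in the cell above swaps each newline for a space, which can leave leading, trailing or doubled spaces in the text handed to the TTS step. A slightly stricter alternative is sketched below; `normalize_whitespace` is a hypothetical helper and not part of the original notebook.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace into one space and trim both ends."""
    # str.split() with no arguments splits on any whitespace, including \n and \r
    return " ".join(text.split())

sample = "\nMenachem Mendel Schneerson\r\n  was the seventh Lubavitcher Rebbe. \n"
print(normalize_whitespace(sample))
```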


# 4. The MAGIC: Smart TTS!

This cell auto-detects whether an 11Labs key and voice ID have been set in the environment variables. If so, the TTS uses 11Labs; otherwise it uses the default gTTS.

The fun thing is you can type in your prompt in English or Hebrew and it will AUTOMATICALLY do the magic and respond with the relevant TTS Audio file.

In [None]:
# Detect language
language = detect_language(clean_text)
print(f"Detected language: {language}")

# Process text based on detected language and API key availability
if language == "he" and api_key and voice_id:
    print("Processing Hebrew with TTS and STS")
    tts_file = hebrew_text_to_speech(clean_text)
    tts_file = convert_speech_to_speech(api_key, voice_id, tts_file)
elif language == "he":
    print("Fallback to gTTS for Hebrew")
    tts_file = hebrew_text_to_speech(clean_text)
elif language == "en" and api_key and voice_id:
    print("Using ElevenLabs for English TTS")
    tts_file = elevenlabs_tts(clean_text, api_key, voice_id)
else:
    print("Fallback to gTTS for English")
    tts_file = hebrew_text_to_speech(clean_text, lang='en')

# Play and offer download for the TTS audio file
if tts_file:
    audio = Audio(tts_file, autoplay=True)
    display(audio)
    display(FileLink(tts_file, result_html_prefix="Click here to download: "))
else:
    print("Failed to process text-to-speech.")

Detected language: en
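
The branching in the cell above can also be read as a small decision table. The sketch below restates it as a pure function so that the routing can be checked without any network calls. `choose_tts_route` and the route labels it returns are hypothetical names introduced only for illustration.

```python
def choose_tts_route(language, api_key=None, voice_id=None):
    """Return a label for the TTS path the cell above would take."""
    # ElevenLabs is used only when both the key and the voice ID are present
    has_elevenlabs = bool(api_key and voice_id)
    if language == "he":
        # Hebrew: gTTS produces the audio, ElevenLabs speech-to-speech restyles it
        return "gtts+elevenlabs_sts" if has_elevenlabs else "gtts"
    # English (or anything not detected as Hebrew)
    return "elevenlabs_tts" if has_elevenlabs else "gtts"

for lang in ("he", "en"):
    print(lang, "with keys:", choose_tts_route(lang, "key", "voice"))
    print(lang, "without keys:", choose_tts_route(lang))
```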
