# **Text-to-Speech Conversion using Sarvam AI API**

This notebook demonstrates how to convert text into speech using the Sarvam AI Text-to-Speech API. The resulting audio is saved as a `.wav` file.

## **Prerequisites**

Before running this notebook, ensure you have the following installed:

- Python 3.7 or higher
- Required Python packages: `sarvamai`

You can install the required packages using pip:

In [None]:
! pip install -Uqq sarvamai

## **Import Required Libraries**

First, let's import all the necessary libraries.

In [2]:
from sarvamai import SarvamAI
from sarvamai.play import play, save

### **2. Set Up the API Key and Call the API Through the SDK**

To use the TTS Bulbul API, you need an API subscription key. Follow these steps to set up your API key:

1. **Obtain your API key**: If you don’t have an API key, sign up on the [Sarvam AI Dashboard](https://dashboard.sarvam.ai/) to get one.
2. **Replace the placeholder key**: In the code below, replace `"YOUR_SARVAM_API_KEY"` with your actual API key.

In [3]:
SARVAM_API_KEY = "YOUR_SARVAM_API_KEY"

In [None]:
# Use the API key you set above
# SARVAM_API_KEY is already set in the previous cell
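If you would rather not paste the key into the notebook, one option is to read it from an environment variable. The sketch below is a minimal illustration, not part of the Sarvam SDK: the helper name `load_api_key` and the choice of `SARVAM_API_KEY` as the environment variable name are assumptions made for this example.

```python
import os
from typing import Optional

# Placeholder string used in this notebook; treated as "no key set".
PLACEHOLDER = "YOUR_SARVAM_API_KEY"


def load_api_key(env_var: str = "SARVAM_API_KEY", fallback: Optional[str] = None) -> str:
    """Return the key from the environment, or `fallback` if the variable is unset.

    `load_api_key` and the default variable name are hypothetical choices
    for this example, not something the SDK defines.
    """
    key = os.environ.get(env_var) or fallback
    if not key or key == PLACEHOLDER:
        raise ValueError(
            f"Set the {env_var} environment variable to your Sarvam AI API key."
        )
    return key


# Usage (uncomment once the environment variable is set):
# SARVAM_API_KEY = load_api_key()
```

Failing early with a clear message avoids sending a request that would only come back with `invalid_api_key_error`.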

In [6]:
client = SarvamAI(api_subscription_key=SARVAM_API_KEY)

In [7]:
text = "Netaji Subhash Marg से Dayanand Road की तरफ, south की तरफ़ जाने से शुरू करें। Dayanand Road पर पहुँचने के बाद, बाएँ मुड़ जाएँ। 350 meters तक सीधा चलते रहें।आपको बायें तरफ़, United Bank of India ATM दिखेगा। Dayanand School के दाएँ तरफ़ से गुजरने के बाद, बाएँ मुड़ें। 120 meters के बाद, Ghata Masjid Road पर, right turn करें। 280 meters तक चलते रहें। Mahatma Gandhi Marg पे रहें और, 2.9 kilometers तक Old Delhi की तरफ जाएँ। फिर, HC Sen Marg पर continue करें, और Paranthe Wali Gali तक drive करें।"

## **Understanding the Parameters**  

| Parameter | Type | Description | Range/Options |
|-----------|------|-------------|---------------|
| `text` | string | Text to convert to speech (max 2500 chars for v3) | Required |
| `model` | string | TTS model to use | `bulbul:v3` (latest), `bulbul:v2` (legacy) |
| `target_language_code` | string | Output language in BCP-47 format | `en-IN`, `hi-IN`, `bn-IN`, `ta-IN`, `te-IN`, `gu-IN`, `kn-IN`, `ml-IN`, `mr-IN`, `pa-IN`, `od-IN` |
| `speaker` | string | Voice speaker to use | See speakers below |
| `pace` | float | Speech rate | 0.5 to 2.0 (default: 1.0) |
| `temperature` | float | Controls expressiveness and randomness | 0.01 to 2.0 (default: 0.6) |
| `speech_sample_rate` | integer | Audio quality | 8000, 16000, 22050, 24000 (default), 32000, 44100, 48000 Hz |

> **Note:** `pitch`, `loudness`, and `enable_preprocessing` are **NOT supported** in bulbul:v3. Preprocessing is automatically enabled in v3.
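The ranges in the table can be checked locally before a request is sent. The following is a rough sketch that encodes only the limits listed above; `validate_tts_params` is a hypothetical helper written for this notebook, and the API itself remains the authority on what it accepts.

```python
# Limits copied from the parameter table above (bulbul:v3).
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}
MAX_TEXT_CHARS = 2500


def validate_tts_params(text, pace=1.0, temperature=0.6, speech_sample_rate=24000):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters; the limit is {MAX_TEXT_CHARS}")
    if not 0.5 <= pace <= 2.0:
        problems.append(f"pace {pace} is outside 0.5 to 2.0")
    if not 0.01 <= temperature <= 2.0:
        problems.append(f"temperature {temperature} is outside 0.01 to 2.0")
    if speech_sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append(f"speech_sample_rate {speech_sample_rate} is not a listed option")
    return problems
```

Calling `validate_tts_params(text)` on the sample text before the request is a cheap way to catch an out-of-range value without spending an API call.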

### **Available Voice Speakers (bulbul:v3)**

**Default:** Shubh

**All Speakers:**
Shubh, Aditya, Ritu, Priya, Neha, Rahul, Pooja, Rohan, Simran, Kavya, Amit, Dev, Ishita, Shreya, Ratan, Varun, Manan, Sumit, Roopa, Kabir, Aayan, Ashutosh, Advait, Amelia, Sophia, Anand, Tanya, Tarun, Sunny, Mani, Gokul, Vijay, Shruti, Suhani, Mohit, Kavitha, Rehan, Soham, Rupali


In [8]:
response = client.text_to_speech.convert(
    text=text,
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="shubh",
    pace=1.0,
    temperature=0.6,
)
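The parameter table lists a 2500-character limit on `text` for v3. If your input is longer, one possible approach is to split it into smaller chunks and convert each chunk separately. The sketch below is an assumption about how you might do that, not something the SDK provides: `split_text` is a hypothetical helper that prefers sentence boundaries (including the Devanagari danda `।`) and hard-splits only when a single sentence exceeds the limit.

```python
import re


def split_text(text, max_chars=2500):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split on whitespace that follows sentence-ending punctuation.
    # Requiring whitespace keeps numbers such as "2.9" intact.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The current chunk is full; start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to `client.text_to_speech.convert` in turn. The resulting audio clips would be separate and would need to be joined afterwards if you want a single file.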

### **3. Save/Play the audio output from TTS**

**To play the generated audio:**

In [9]:
play(response)

**To save the generated audio to a file:**

In [10]:
save(response, "output.wav")

## **Output**

After running the notebook, you will have one `output.wav` file containing the speech for the text you passed.
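To confirm that the file was written as expected, you can inspect it with Python's standard `wave` module. This is a small sketch that assumes `output.wav` is a standard PCM WAV file; `describe_wav` is a hypothetical helper name used only here.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, and duration for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Usage (after the save cell has run):
# print(describe_wav("output.wav"))
```

The reported sample rate should match the `speech_sample_rate` you requested, or the default listed in the parameter table if you did not set one.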

## **Error Handling**

You may encounter these errors while using the API:  

| Error Code | HTTP Status | Cause | Solution |
|------------|-------------|-------|----------|
| `invalid_api_key_error` | 403 Forbidden | Invalid API key | Use a valid API key from the [Sarvam AI Dashboard](https://dashboard.sarvam.ai/) |
| `insufficient_quota_error` | 429 Too Many Requests | Exceeded API quota | Check your usage, upgrade if needed, or implement exponential backoff |
| `internal_server_error` | 500 Internal Server Error | Issue on servers | Try again later. If persistent, contact support |
| `invalid_request_error` | 400 Bad Request | Incorrect request formatting | Verify your request structure and parameters |
| `rate_limit_exceeded_error` | 429 Too Many Requests | Rate limit exceeded | Implement rate limiting and retry with backoff |
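The table suggests retrying with backoff for 429 and 500 responses. The sketch below shows one way to do that with a generic wrapper. It rests on an assumption: that the exception raised by the SDK exposes the HTTP status as a `status_code` attribute. Check this against the SDK version you have installed. `call_with_backoff` is a hypothetical helper, not part of `sarvamai`.

```python
import random
import time

# Statuses from the table above that may succeed on a later attempt.
RETRYABLE_STATUS = {429, 500}


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on retryable statuses.

    Assumes the raised exception carries a `status_code` attribute;
    exceptions without one are re-raised immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus a little jitter.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            sleep(delay)


# Usage (assumes `client` and `text` from the cells above):
# response = call_with_backoff(
#     lambda: client.text_to_speech.convert(
#         text=text, target_language_code="hi-IN", model="bulbul:v3", speaker="shubh"
#     )
# )
```

Note that `insufficient_quota_error` also returns 429, and retrying will not help in that case. You may want to inspect the error code in the response body before deciding to retry.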

## **Conclusion**
This notebook provides a step-by-step guide to converting text into speech using the Sarvam AI Bulbul v3 TTS API. You can modify the text, language, speaker, and other parameters to suit your specific needs.

### **Additional Resources**

For more details, refer to our official documentation. You can also reach us for support on our Discord server:

- **Documentation**: [docs.sarvam.ai](https://docs.sarvam.ai)  
- **Community**: [Join the Discord Community](https://discord.gg/hTuVuPNF)

---

### **Final Notes**

- Keep your API key secure.
- Bulbul v3 has automatic preprocessing enabled, so there is no need to set `enable_preprocessing`.
- Experiment with 30+ speakers (Shubh, Aditya, Ritu, Priya, Neha, Rahul, Pooja, Rohan, Simran, Kavya, etc.) to find the best voice for your use case.
- Adjust `pace` and `temperature` to customize the speech output. Use a lower temperature for more stable output and a higher one for more expressive speech.
- `pitch` and `loudness` are NOT supported in bulbul:v3.

**Keep Building!**