# **Text-to-Speech Conversion using Sarvam AI API**

This notebook demonstrates how to convert text into speech using the Sarvam AI Text-to-Speech API. The resulting audio is saved as a `.wav` file.

## **Prerequisites**

Before running this notebook, ensure you have the following installed:

- Python 3.7 or higher
- Required Python package: `sarvamai`

You can install the required package using pip:

In [None]:
! pip install -Uqq sarvamai

## **Import Required Libraries**

First, let's import all the necessary libraries.

In [2]:
from sarvamai import SarvamAI
from sarvamai.play import play, save

### **2. Call the API endpoint through the SDK by passing API parameters**

To use the TTS Bulbul API, you need an API subscription key. Follow these steps to set up your API key:

1. **Obtain your API key**: If you don't have an API key, sign up on the [Sarvam AI Dashboard](https://dashboard.sarvam.ai/) to get one.
2. **Replace the placeholder key**: In the code below, replace "YOUR_SARVAM_API_KEY" with your actual API key.

In [3]:
SARVAM_API_KEY = "YOUR_SARVAM_API_KEY"
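Hard-coding the key works for a quick test, but reading it from the environment keeps it out of the notebook. The sketch below assumes the key has been exported under the variable name `SARVAM_API_KEY`; that name and the `load_api_key` helper are chosen here for illustration and are not part of the SDK.

```python
import os


def load_api_key(env_var="SARVAM_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this example.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    # Reject a missing value and the unedited placeholder alike.
    if not key or key.startswith("YOUR_"):
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Sarvam AI API key."
        )
    return key


# Usage (uncomment once the variable is set in your environment):
# SARVAM_API_KEY = load_api_key()
```

Passing a plain dictionary as `environ` makes the helper easy to try without touching the real environment.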

In [None]:
# Use the API key you set above
# SARVAM_API_KEY is already set in the previous cell

In [6]:
client = SarvamAI(api_subscription_key=SARVAM_API_KEY)

In [7]:
text = "Netaji Subhash Marg से Dayanand Road की तरफ, south की तरफ़ जाने से शुरू करें। Dayanand Road पर पहुँचने के बाद, बाएँ मुड़ जाएँ। 350 meters तक सीधा चलते रहें। आपको बायें तरफ़, United Bank of India ATM दिखेगा। Dayanand School के दाएँ तरफ़ से गुजरने के बाद, बाएँ मुड़ें। 120 meters के बाद, Ghata Masjid Road पर, right turn करें। 280 meters तक चलते रहें। Mahatma Gandhi Marg पे रहें और, 2.9 kilometers तक Old Delhi की तरफ जाएँ। फिर, HC Sen Marg पर continue करें, और Paranthe Wali Gali तक drive करें।"

## **Understanding the Parameters**  

| Parameter | Type | Description | Range/Options |
|-----------|------|-------------|---------------|
| `text` | string | Text to convert to speech | Required |
| `model` | string | TTS model to use | `bulbul:v2` |
| `target_language_code` | string | Output language in BCP-47 format | `en-IN`, `hi-IN`, `bn-IN`, `ta-IN`, `te-IN`, `gu-IN`, `kn-IN`, `ml-IN`, `mr-IN`, `pa-IN`, `od-IN` |
| `speaker` | string | Voice speaker to use | See speakers below |
| `pitch` | float | Voice pitch adjustment | -1.0 to 1.0 (default: 0.0) |
| `pace` | float | Speech rate | 0.5 to 2.0 (default: 1.0) |
| `loudness` | float | Volume control | 0.5 to 2.0 (default: 1.0) |
| `speech_sample_rate` | integer | Sample rate of the output audio | 8000, 16000, or 24000 Hz |
| `enable_preprocessing` | boolean | Text preprocessing for mixed-language text | true/false (default: false) |
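The ranges in the table can be checked locally before a request is sent. The following is a minimal sketch that uses only the limits listed above; `validate_tts_params` is a hypothetical helper, not an SDK function, and the API itself remains the authority on what it accepts.

```python
# Limits copied from the parameter table above.
NUMERIC_RANGES = {
    "pitch": (-1.0, 1.0),
    "pace": (0.5, 2.0),
    "loudness": (0.5, 2.0),
}
ALLOWED_SAMPLE_RATES = {8000, 16000, 24000}


def validate_tts_params(pitch=0.0, pace=1.0, loudness=1.0, speech_sample_rate=None):
    """Return a list of problems; an empty list means every value fits the table."""
    problems = []
    values = {"pitch": pitch, "pace": pace, "loudness": loudness}
    for name, value in values.items():
        low, high = NUMERIC_RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    if speech_sample_rate is not None and speech_sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append(
            f"speech_sample_rate={speech_sample_rate} is not one of "
            f"{sorted(ALLOWED_SAMPLE_RATES)}"
        )
    return problems


print(validate_tts_params(pitch=0.2, pace=1.1, speech_sample_rate=16000))  # → []
print(validate_tts_params(pace=3.0, speech_sample_rate=44100))
```

Returning a list of messages rather than raising on the first problem lets you report every out-of-range value at once.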

### **Available Voice Speakers**

**Female Voices:**
- **Anushka:** Clear and professional (default)
- **Manisha:** Warm and friendly
- **Vidya:** Articulate and precise
- **Arya:** Young and energetic

**Male Voices:**
- **Abhilash:** Deep and authoritative
- **Karun:** Natural and conversational
- **Hitesh:** Professional and engaging


In [8]:
response = client.text_to_speech.convert(
    text=text,
    target_language_code="hi-IN",
    speaker="anushka",
    enable_preprocessing=True,
)

### **3. Save/Play the audio output from TTS**

**Play the generated audio**

In [9]:
play(response)

**Save the generated audio to a file**

In [10]:
save(response, "output.wav")

## **Output**

After running the notebook, you will have an `output.wav` file containing the synthesized speech for the text you passed.
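To confirm that the file was written correctly, you can read its header with Python's standard `wave` module. This is a minimal sketch that assumes the saved file is standard PCM WAV; `describe_wav` is a hypothetical helper name.

```python
import wave


def describe_wav(path):
    """Read basic properties from a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }


# Usage, once output.wav exists:
# print(describe_wav("output.wav"))
```

If you set `speech_sample_rate` in the request, the reported `sample_rate_hz` is a quick way to check that the setting took effect.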

## **Error Handling**

You may encounter these errors while using the API:  

| Error Code | HTTP Status | Cause | Solution |
|------------|-------------|-------|----------|
| `invalid_api_key_error` | 403 Forbidden | Invalid API key | Use a valid API key from the [Sarvam AI Dashboard](https://dashboard.sarvam.ai/) |
| `insufficient_quota_error` | 429 Too Many Requests | Exceeded API quota | Check your usage, upgrade if needed, or implement exponential backoff |
| `internal_server_error` | 500 Internal Server Error | Server-side issue | Try again later. If the error persists, contact support |
| `invalid_request_error` | 400 Bad Request | Incorrect request formatting | Verify your request structure and parameters |
| `rate_limit_exceeded_error` | 429 Too Many Requests | Rate limit exceeded | Implement rate limiting and retry with backoff |
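The retry-with-backoff advice in the table can be sketched as a small wrapper. This example assumes that the exception raised by the SDK exposes an integer `status_code` attribute; check the exception classes in your installed `sarvamai` version before relying on that. `call_with_backoff` is a hypothetical helper.

```python
import random
import time

# Statuses from the table above where a later retry may succeed.
RETRYABLE_STATUS_CODES = {429, 500}


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on retryable errors.

    Assumption: the raised exception carries a `status_code` attribute.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            # Re-raise non-retryable errors, and give up after the last attempt.
            if status not in RETRYABLE_STATUS_CODES or attempt == max_attempts:
                raise
            # The delay doubles on each attempt (1s, 2s, 4s, ...) plus a little jitter.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            sleep(delay)


# Usage with the client from this notebook:
# response = call_with_backoff(
#     lambda: client.text_to_speech.convert(
#         text=text, target_language_code="hi-IN", speaker="anushka"
#     )
# )
```

Retrying helps with temporary rate limits and server errors. An exhausted quota also returns 429 but will keep failing until the quota is restored, so keep `max_attempts` small.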

## **Conclusion**
This notebook provides a step-by-step guide to converting text into speech using the Sarvam AI Bulbul v2 TTS API. You can modify the text, language, speaker, and other parameters to suit your specific needs.

### **Additional Resources**

For more details, refer to our official documentation. You can also reach us for help and support on our Discord server:

- **Documentation**: [docs.sarvam.ai](https://docs.sarvam.ai)  
- **Community**: [Join the Discord Community](https://discord.gg/hTuVuPNF)

---

### **Final Notes**

- Keep your API key secure.
- Use `enable_preprocessing=True` for mixed-language text.
- Experiment with different speakers (Anushka, Manisha, Vidya, Arya, Abhilash, Karun, Hitesh) to find the best voice for your use case.
- Adjust `pitch`, `pace`, and `loudness` to customize the speech output.

**Keep Building!** 🚀