# 🎙️ Telugu Voice Banking Assistant - Proof of Concept

## What This Notebook Does
This is a beginner-friendly prototype that:
1. Converts Telugu speech to text (using OpenAI Whisper)
2. Understands banking questions
3. Generates responses in Telugu
4. Converts Telugu text back to speech

## Before You Start - Setup Checklist ✅

### 1. Get Your API Keys (Free/Low Cost)

**OpenAI API Key** (for speech recognition)
- Go to: https://platform.openai.com/signup
- Create account (free)
- Go to API Keys section
- Click "Create new secret key"
- Copy the key (starts with `sk-`)
- New accounts may include a small amount of free trial credit; check the billing page, since this offer changes over time

**Anthropic API Key** (for understanding Telugu)
- Go to: https://console.anthropic.com/
- Sign up (free)
- Go to API Keys
- Create new key
- Copy the key
- New accounts may include a small amount of free trial credit; check the billing page, since this offer changes over time

### 2. Prepare Audio Files
- Record 3-5 Telugu voice messages on your phone
- Banking questions like:
  - "నా ఖాతా బ్యాలెన్స్ ఎంత?" (What's my balance?)
  - "చివరి లావాదేవీలు చూపించు" (Show recent transactions)
  - "నా ఖాతా నంబర్ ఏమిటి?" (What's my account number?)
- Save as .mp3, .wav, or .m4a files

### 3. Cost Estimate
- Whisper API: ~$0.006 per minute of audio
- Claude API: ~$0.01 per request
- **Total for testing: ~$2-3**
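
As a rough check on these figures, the arithmetic can be sketched in a few lines. The two rates are the approximate values quoted above (assumptions, not current price-list values), and `estimate_cost` is a hypothetical helper name.

```python
# Rough cost estimate using the approximate rates quoted in this notebook.
# Both rates are assumptions; check the providers' pricing pages before relying on them.
WHISPER_USD_PER_MINUTE = 0.006
CLAUDE_USD_PER_REQUEST = 0.01

def estimate_cost(audio_minutes, num_requests):
    """Return the estimated total cost in USD for one test session."""
    return audio_minutes * WHISPER_USD_PER_MINUTE + num_requests * CLAUDE_USD_PER_REQUEST

# Example: five one-minute recordings, each sent to Claude once.
print(f"Estimated cost: ${estimate_cost(5, 5):.2f}")
```

Under these rates, a handful of short recordings costs only a few cents, so the testing budget above leaves plenty of room for repeated runs.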

---

## 📝 What You'll Learn From This Experiment
- How well does speech recognition work for Telugu?
- Can AI understand different Telugu accents?
- What banking questions do elderly users actually ask?
- Do responses sound natural in Telugu?
- What breaks or confuses the system?

---

## Step 1: Install Required Libraries

Run this cell first. It will install all the tools we need.

**What this does:**
- `openai`: For speech-to-text (Whisper)
- `anthropic`: For understanding the Telugu question and generating a response (Claude)
- `gtts`: Free text-to-speech, used in Step 6
- `pydub`: For handling audio files (not used by the cells in this notebook, but handy for converting or trimming recordings)

In [None]:
# Install required packages (uses the kernel's own Python to avoid version mismatch)
import sys
!{sys.executable} -m pip install openai anthropic gtts pydub -q

print("✅ All libraries installed successfully!")

## Step 2: Enter Your API Keys

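⚠️ **IMPORTANT:** Keep these keys secret! Don't share this notebook with keys in it.

The cell below reads each key from the `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` environment variables. If a variable is not set, you'll be prompted to paste the key; it won't be shown on screen or saved in the notebook.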

In [None]:
import os
from getpass import getpass
from openai import OpenAI
from anthropic import Anthropic

# 🔑 Enter your API keys securely (you'll be prompted to paste them)
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or getpass("Enter your OpenAI API key: ")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY") or getpass("Enter your Anthropic API key: ")

# Initialize clients
openai_client = OpenAI(api_key=OPENAI_API_KEY)
anthropic_client = Anthropic(api_key=ANTHROPIC_API_KEY)

print("✅ API keys configured!")

## Step 3: Upload Your Telugu Audio File

**How to upload (in Google Colab):**
1. Click the folder icon on the left sidebar
2. Click the upload button (up arrow icon)
3. Select your Telugu audio file from your computer
4. Wait for upload to complete
5. Set `audio_filename` in the cell below to the exact filename

**Supported formats:** .mp3, .mp4, .mpeg, .mpga, .m4a, .wav, .webm

In [None]:
# 📁 Enter your audio filename here
audio_filename = 'Telugu voice memos.mp3'  # Change this to match your uploaded file's exact name

# Check if file exists
import os
if os.path.exists(audio_filename):
    print(f"✅ Found audio file: {audio_filename}")
    file_size = os.path.getsize(audio_filename) / 1024  # Size in KB
    print(f"📊 File size: {file_size:.1f} KB")
else:
    print(f"❌ File not found: {audio_filename}")
    print("Please upload the file and update 'audio_filename' above to match your file's exact name.")
    # List available audio files in the current directory to help
    audio_extensions = ('.mp3', '.wav', '.m4a', '.mp4', '.mpeg', '.mpga', '.webm')
    available = [f for f in os.listdir('.') if f.lower().endswith(audio_extensions)]
    if available:
        print(f"\n🔍 Audio files found in current directory: {available}")
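
Before sending a file to Whisper, it can help to check its extension and size up front. The sketch below is one way to do that. `validate_audio_file` is a hypothetical helper, and the 25 MB ceiling is the upload limit I understand the transcription endpoint to have, so treat it as an assumption to re-check against the current API documentation.

```python
import os

# Formats listed above as supported by the transcription API.
SUPPORTED_EXTENSIONS = ('.mp3', '.mp4', '.mpeg', '.mpga', '.m4a', '.wav', '.webm')
# Assumed upload limit (25 MB); verify against the current API documentation.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024

def validate_audio_file(path):
    """Return a list of human-readable problems; an empty list means the file looks usable."""
    if not os.path.isfile(path):
        return [f"File not found: {path}"]
    problems = []
    if not path.lower().endswith(SUPPORTED_EXTENSIONS):
        problems.append(f"Unsupported extension: {os.path.splitext(path)[1] or '(none)'}")
    size = os.path.getsize(path)
    if size == 0:
        problems.append("File is empty")
    elif size > MAX_UPLOAD_BYTES:
        problems.append(f"File is {size / 1024 / 1024:.1f} MB, above the assumed 25 MB limit")
    return problems
```

Calling `validate_audio_file(audio_filename)` before Step 4 would surface these problems without spending an API call.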

## Step 4: Convert Speech to Text (Telugu → Text)

**What this does:**
- Sends your audio to OpenAI Whisper
- Whisper transcribes the Telugu speech
- Shows you what it heard

**What to look for:**
- Is the transcription accurate?
- Did it capture the Telugu correctly?
- Any words misunderstood?

In [6]:
def transcribe_audio(audio_file_path):
    """
    Convert Telugu audio to text using OpenAI Whisper
    """
    try:
        print("🎙️ Transcribing audio...")
        
        with open(audio_file_path, 'rb') as audio_file:
            transcript = openai_client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
                prompt="Telugu language audio. తెలుగు భాష."  # Hint to Whisper that this is Telugu
            )
        
        transcribed_text = transcript.text
        print("\n" + "="*50)
        print("📝 TRANSCRIPTION:")
        print("="*50)
        print(transcribed_text)
        print("="*50 + "\n")
        
        return transcribed_text
        
    except Exception as e:
        print(f"❌ Error during transcription: {e}")
        return None

# Run transcription
transcribed_text = transcribe_audio(audio_filename)

🎙️ Transcribing audio...

📝 TRANSCRIPTION:
నక్ ఖాతా బలంస్ ఎంతా? జివరి లావా దేవిలు చూపించు. నక్ ఖాతా నంబరు ఎంతా? మీర్ చెప్పేయది నాక్ వార్దమ్ ఆవరుడ్ లేదు. నాక్ అకాండ్ నంబరు నాక్ గూర్తుక్ లేదు.
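
To put a number on "is the transcription accurate?", one common measure is the character error rate (CER): the edit distance between what was said and what was transcribed, divided by the length of what was said. The sketch below is a minimal version that counts Unicode code points rather than full Telugu syllables, so it is only a rough guide. The `reference` sentence is an assumption about what the speaker intended, not something this notebook recorded.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def character_error_rate(reference, hypothesis):
    """Edit distance divided by reference length, counted in Unicode code points."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

# Assumed example: the intended sentence vs. the first sentence Whisper returned.
reference = "నా ఖాతా బ్యాలెన్స్ ఎంత?"
hypothesis = "నక్ ఖాతా బలంస్ ఎంతా?"
print(f"CER: {character_error_rate(reference, hypothesis):.2f}")
```

A CER of 0 means a perfect match. Tracking it per speaker makes it easier to compare accents across recordings.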



## Step 5: Understand the Banking Question

**What this does:**
- Sends the Telugu text to Claude
- Claude identifies what the person is asking about
- Categorizes the intent (balance check, transactions, help, etc.)
- Generates an appropriate response in Telugu

**What to look for:**
- Did it understand the question correctly?
- Is the response natural and helpful?
- Does it sound like something a bank would say?

In [7]:
def process_banking_query(telugu_text):
    """
    Understand Telugu banking question and generate response
    """
    if not telugu_text:
        print("❌ No text to process")
        return None
    
    try:
        print("🤔 Understanding the question...")
        
        prompt = f"""You are a helpful Telugu banking assistant for elderly customers.

The customer said (in Telugu): "{telugu_text}"

Please:
1. Identify what they're asking about (balance check, recent transactions, account number, help, etc.)
2. Provide a clear, respectful response in Telugu
3. For this demo, you can make up sample data (e.g., sample balance: ₹25,000)
4. Keep the response conversational and easy to understand
5. Use respectful terms appropriate for elderly users

Respond ONLY in Telugu, as if you're a real banking assistant."""
        
        message = anthropic_client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1000,
            messages=[{"role": "user", "content": prompt}]
        )
        
        response_text = message.content[0].text
        
        print("\n" + "="*50)
        print("🤖 BANKING ASSISTANT RESPONSE:")
        print("="*50)
        print(response_text)
        print("="*50 + "\n")
        
        return response_text
        
    except Exception as e:
        print(f"❌ Error processing query: {e}")
        return None

# Process the transcribed text
banking_response = process_banking_query(transcribed_text)

🤔 Understanding the question...

🤖 BANKING ASSISTANT RESPONSE:
నమస్కారం గారు! మీ సమస్యలు అర్థమయ్యాయి. నేను మీకు సహాయం చేస్తాను.

**మీ ఖాతా వివరాలు:**

🏦 **ఖాతా బ్యాలెన్స్:** ₹25,000 (ఇరవై ఐదు వేలు రూపాయలు)

📱 **మీ ఖాతా నంబరు:** 1234567890123456

💳 **ఇటీవలి లావాదేవీలు:**
- 15/12 - పెన్షన్ జమ - ₹18,000
- 17/12 - ATM ద్వారా డబ్బు తీసుకున్నారు - ₹3,000
- 19/12 - విద్యుత్ బిల్లు - ₹1,500

గారు, మీ ఖాతా నంబరు గుర్తుంచుకోవడానికి ఇబ్బంది పడుతున్నారా? దీన్ని ఒక కాగితంలో రాసుకుని సు
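
The step description mentions categorizing the intent, but `process_banking_query` returns only free text. One option is to ask the model for a JSON object and parse it, falling back gracefully when the reply is not valid JSON. The sketch below assumes the prompt has been changed to request the hypothetical keys `intent` and `reply_te`. Neither the keys nor the intent labels come from this notebook.

```python
import json

# Hypothetical intent labels for this demo; adjust to the categories you care about.
KNOWN_INTENTS = {"balance_check", "recent_transactions", "account_number", "help", "other"}

def parse_intent_reply(raw_text):
    """Parse a model reply expected to be JSON like {"intent": ..., "reply_te": ...}.

    Returns (intent, reply). Falls back to ("other", raw_text) if parsing fails.
    """
    text = raw_text.strip()
    # Models sometimes wrap JSON in extra text; keep only the outermost braces.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return "other", raw_text
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return "other", raw_text
    if not isinstance(data, dict):
        return "other", raw_text
    intent = data.get("intent")
    if not isinstance(intent, str) or intent not in KNOWN_INTENTS:
        intent = "other"
    return intent, str(data.get("reply_te", raw_text))
```

With this in place, the intent label can go into your test notes while only the reply text is passed on to text-to-speech.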

## Step 6: Convert Response to Speech (Text → Voice)

**What this does:**
- Converts the Telugu text response to speech
- Creates an audio file you can play
- This completes the voice → voice interaction loop

**Note:** This step uses Google Text-to-Speech (gTTS), which is free but offers limited Telugu voice quality.
A production system would need a higher-quality TTS service.

**What to listen for:**
- Is the pronunciation clear?
- Would elderly users understand it?
- Does it sound natural or robotic?

In [8]:
from gtts import gTTS
from IPython.display import Audio, display
import os

def text_to_speech(telugu_text, output_filename="response.mp3"):
    """
    Convert Telugu text to speech
    """
    if not telugu_text:
        print("❌ No text to convert to speech")
        return None
    
    try:
        print("🔊 Converting to speech...")
        
        # Create TTS object
        tts = gTTS(text=telugu_text, lang='te', slow=False)
        
        # Save to file
        tts.save(output_filename)
        
        print(f"✅ Audio saved as: {output_filename}")
        print("\n🎧 Click below to play:")
        
        # Play audio in notebook
        display(Audio(output_filename, autoplay=False))
        
        return output_filename
        
    except Exception as e:
        print(f"❌ Error creating speech: {e}")
        return None

# Convert response to speech
audio_response = text_to_speech(banking_response)

🔊 Converting to speech...
✅ Audio saved as: response.mp3

🎧 Click below to play:


## 🎯 Complete End-to-End Test

This cell runs the entire pipeline at once:
1. Upload audio → Transcribe → Understand → Respond → Convert to speech

Use this for quick testing with multiple audio files!

In [None]:
def complete_voice_banking_test(audio_file):
    """
    Complete pipeline: Voice input → Voice output
    """
    print("\n" + "="*70)
    print("🚀 STARTING COMPLETE VOICE BANKING TEST")
    print("="*70 + "\n")
    
    # Step 1: Transcribe
    print("Step 1/3: Speech to Text")
    transcribed = transcribe_audio(audio_file)
    if not transcribed:
        return
    
    # Step 2: Process query
    print("\nStep 2/3: Understanding Question & Generating Response")
    response = process_banking_query(transcribed)
    if not response:
        return
    
    # Step 3: Text to speech
    # gTTS always writes MP3 data, so build an .mp3 name from the input's base name
    # (this also avoids broken paths when audio_file includes a directory)
    print("\nStep 3/3: Converting Response to Speech")
    base_name = os.path.splitext(os.path.basename(audio_file))[0]
    audio_out = text_to_speech(response, f"response_{base_name}.mp3")
    if not audio_out:
        return
    
    print("\n" + "="*70)
    print("✅ COMPLETE! Test finished successfully.")
    print("="*70)

# Run the complete test on the file chosen in Step 3
complete_voice_banking_test(audio_filename)

## 📊 Testing Framework - Record Your Observations

Use this section to document what you learn!

### Test Multiple Audio Files

Upload 3-5 different audio files and test each one.
Record your observations below.

In [None]:
# Test multiple files at once
# Add your uploaded filenames to this list
test_files = [
    # 'recording1.mp3',  # <-- Replace with your actual filenames
    # 'recording2.wav',
    # 'recording3.m4a',
]

if not test_files:
    print("⚠️ No test files configured. Add your audio filenames to the 'test_files' list above.")
else:
    for audio_file in test_files:
        if os.path.exists(audio_file):
            print(f"\n\n{'='*70}")
            print(f"Testing: {audio_file}")
            print(f"{'='*70}")
            complete_voice_banking_test(audio_file)
        else:
            print(f"❌ File not found: {audio_file}")
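
To keep results from several recordings comparable, each run could also be appended to a CSV file. This is a minimal sketch. `log_test_result` and the column names are hypothetical, and it assumes you call `transcribe_audio` and `process_banking_query` yourself so their return values are available to log.

```python
import csv
import os

# Hypothetical column names; rename them to match whatever you decide to record.
FIELDS = ["audio_file", "transcription", "response", "accuracy_rating", "notes"]

def log_test_result(csv_path, audio_file, transcription, response,
                    accuracy_rating="", notes=""):
    """Append one test result to a CSV file, writing the header row on first use."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "audio_file": audio_file,
            "transcription": transcription,
            "response": response,
            "accuracy_rating": accuracy_rating,
            "notes": notes,
        })
```

For example, `log_test_result("results.csv", audio_file, transcribed, response, notes="noisy room")` adds one row per test, and the file opens in any spreadsheet application.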

## 📝 Observation Template

Copy this template and fill it out for each test. The first test is filled in as an example:

```
TEST #1
----------------
Speaker: [Elderly family member / You / Friend]
Question asked: [What they said in Telugu]
==================================================
📝 TRANSCRIPTION:
==================================================
నక్ ఖాతా బలంస్ ఎంతా? జివరి లావా దేవిలు చూపించు. నక్ ఖాతా నంబరు ఎంతా? మీర్ చెప్పేయది నాక్ వార్దమ్ ఆవరుడ్ లేదు. నాక్ అకాండ్ నంబరు నాక్ గూర్తుక్ లేదు.
==================================================
Transcription accuracy: [Excellent / Good / Poor] - Good
Did it understand the intent?: [Yes / No / Partially] - Yes
Response quality: [Natural / Acceptable / Robotic] - Robotic
Voice output quality: [Clear / Understandable / Unclear] - Clear
Would the user be satisfied?: [Yes / No] - No (many asterisks were read aloud)
Issues noticed: [List any problems] - Parts of the transcription were inaccurate, but the meaning was still understood. The spoken response read out many of the asterisks in the text output; this should be fixable.
```
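
The asterisks reported in the test notes come from Markdown emphasis markers (`**`) in the model's reply, which the TTS engine reads aloud. One way to address this is to strip formatting before calling `text_to_speech`. The sketch below is a simple regex-based cleaner. `clean_for_speech` is a hypothetical helper, and the emoji ranges are an approximation rather than a complete list.

```python
import re

def clean_for_speech(text):
    """Remove Markdown markers and emoji-like symbols so TTS does not read them aloud."""
    text = re.sub(r"[*_`#]+", "", text)                          # emphasis, code, heading markers
    text = re.sub(r"^\s*[-•]\s+", "", text, flags=re.MULTILINE)  # bullet prefixes
    # Drop pictographic symbols (assumed ranges covering common emoji and variation selectors).
    text = re.sub(r"[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F]", "", text)
    text = re.sub(r"[ \t]+", " ", text)                          # collapse repeated spaces
    # Trim each line and drop the empty ones.
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())
```

Usage would look like `text_to_speech(clean_for_speech(banking_response))`. Adding a line to the prompt that asks for plain text without Markdown or emoji is a complementary fix.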

---

## 🎓 Next Steps After Testing

Based on your tests, consider:

### If transcription is poor:
- Research question: How to improve Telugu speech recognition for regional accents?
- Could fine-tuning Whisper on Telugu dialects help?

### If understanding is poor:
- Research question: How to build Telugu-specific intent classifiers?
- What banking terminology is unique to Telugu users?

### If users don't trust it:
- Research question: What security measures make elderly users comfortable?
- How to design voice authentication for Telugu speakers?

### If it works well:
- Research question: How to scale this to millions of users?
- What's the business model for regional language banking AI?

---

## 💡 Ideas for Expansion

If this excites you, try:
1. Test with 5-10 different family members
2. Compare Telugu vs code-switched (Telugu + English) input
3. Add actual banking API integration (mock data for now)
4. Build a simple security layer (voice verification)
5. Create a user feedback survey in Telugu
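
For idea 3, a first step might be to stop letting the model invent figures and instead pass it a fixed mock record. The sketch below uses entirely made-up data that mirrors the sample response shown earlier. `MOCK_ACCOUNT` and `build_account_context` are hypothetical names, and no real banking API is involved.

```python
# Entirely made-up sample data; no real account or bank API is involved.
MOCK_ACCOUNT = {
    "account_number": "XXXXXXXXXXXX3456",
    "balance_inr": 25000,
    "recent_transactions": [
        {"date": "15/12", "description": "Pension credit", "amount_inr": 18000},
        {"date": "17/12", "description": "ATM withdrawal", "amount_inr": -3000},
        {"date": "19/12", "description": "Electricity bill", "amount_inr": -1500},
    ],
}

def build_account_context(account):
    """Format the mock account as plain text that can be prepended to the prompt."""
    lines = [
        f"Account number: {account['account_number']}",
        f"Balance: INR {account['balance_inr']:,}",
        "Recent transactions:",
    ]
    for tx in account["recent_transactions"]:
        lines.append(f"  {tx['date']}  {tx['description']}  INR {tx['amount_inr']:+,}")
    return "\n".join(lines)

print(build_account_context(MOCK_ACCOUNT))
```

Including this text in the prompt, with an instruction to answer only from it, makes responses repeatable across test runs and easier to check for correctness.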

---

## 🆘 Troubleshooting

**"API key invalid"**
- Make sure you copied the entire key
- Check for extra spaces
- Regenerate key if needed

**"File not found"**
- Filename must match exactly (case-sensitive)
- File must be uploaded to Colab (see left sidebar)

**"Poor transcription quality"**
- Try recording in quieter environment
- Speak more clearly
- Check if audio file is corrupted

**"Rate limit exceeded"**
- You've likely hit your account's rate or usage limits
- Wait a few minutes
- Or add credits to your API account
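
"Wait a few minutes" can also be automated with a retry loop that doubles the wait after each failure (exponential backoff). This is a generic sketch. `call_with_retries` is a hypothetical helper, and it catches every exception for simplicity, whereas real code would catch only the SDK's rate-limit error.

```python
import time

def call_with_retries(func, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call func(); on an exception, wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:  # In real use, catch only the SDK's rate-limit error.
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
```

Because `transcribe_audio` and `process_banking_query` catch exceptions and return `None`, the retry only helps if it wraps the raw client call, or if those helpers are changed to re-raise.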

---

## 📧 Share Your Results!

After testing, you'll have valuable insights about:
- Whether voice banking in Telugu is technically feasible
- What challenges are worth solving
- Whether this direction excites you for a PhD

Good luck with your experiments! 🚀

---

**Created for:** PhD Research Exploration
**Focus:** Telugu Voice Banking for Elderly Users
**Difficulty:** Beginner-Friendly
**Estimated Cost:** $2-5 for testing