# 🎤 IELTS Speaking Practice - AI Pronunciation Coach

## Welcome, Students! 👋

This notebook helps you practice IELTS speaking and get **pronunciation feedback** right after you record each answer.

### How to Use:
1. Click **Runtime → Run all** (or press Ctrl+F9)
2. Wait ~2 minutes for the AI model to load
3. Scroll down to the **Gradio interface**
4. Select a test part and question
5. Record your answer
6. Get instant pronunciation feedback!

### What You'll Get:
- ✅ Phoneme-level transcription of what you said
- ✅ Accuracy score (0-100%) based on expected keywords
- ✅ Feedback message matched to your score
- ✅ Detected sounds shown in IPA symbols

**Note:** A GPU is recommended for faster processing. Enable it via Runtime → Change runtime type → GPU (T4).

## Step 1: Install Required Libraries

This will take about 1-2 minutes. Don't worry about the output!
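
Because the install cell below silences the `apt-get` output, a failed `espeak-ng` install would go unnoticed. Here is a minimal sketch of a check you could run afterwards; the helper name `has_command` is hypothetical and not part of the original notebook:

```python
import shutil


def has_command(name: str) -> bool:
    """Return True if an executable called `name` is on the PATH."""
    return shutil.which(name) is not None


if has_command("espeak-ng"):
    print("espeak-ng found")
else:
    print("espeak-ng missing: re-run the install without '> /dev/null 2>&1' to see the error")
```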

In [None]:
!pip install -q transformers torch torchaudio gradio soundfile librosa phonemizer
!apt-get install -y espeak-ng > /dev/null 2>&1

print("✅ All libraries installed successfully!")

## Step 2: Import Libraries

In [None]:
import numpy as np
import torch
import librosa
import gradio as gr
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

print("✅ Libraries imported successfully!")

## Step 3: Load AI Pronunciation Model

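This step loads `facebook/wav2vec2-xlsr-53-espeak-cv-ft`, a Wav2Vec2 model fine-tuned for phoneme recognition, and moves it to the GPU when one is available.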
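
A CTC model such as `Wav2Vec2ForCTC` predicts one token per short audio frame, so the same phoneme usually repeats over several frames, with a special blank token in between. Decoding merges the repeats and drops the blanks. The toy sketch below illustrates that idea with plain Python lists; the function name `ctc_collapse`, the sample frames, and the `<pad>` blank symbol are illustrative assumptions, and the notebook itself relies on `processor.batch_decode` rather than this function.

```python
from itertools import groupby


def ctc_collapse(frame_tokens, blank="<pad>"):
    """Greedy CTC decoding: merge consecutive repeats, then remove blanks."""
    merged = [token for token, _ in groupby(frame_tokens)]
    return [token for token in merged if token != blank]


# Hypothetical per-frame predictions for the start of "hom..."
frames = ["<pad>", "h", "h", "<pad>", "oʊ", "oʊ", "oʊ", "m", "<pad>", "m"]
print(ctc_collapse(frames))  # ['h', 'oʊ', 'm', 'm']
```

Note that the blank between the two `m` frames is what keeps them as two separate phonemes instead of one.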

In [None]:
MODEL_NAME = "facebook/wav2vec2-xlsr-53-espeak-cv-ft"
SAMPLE_RATE = 16000

print("📥 Loading model (this may take ~1 minute)...")
processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

print(f"✅ Model loaded successfully on {device}!")
if device.type == "cuda":
    print("🚀 GPU detected - fast processing enabled!")
else:
    print("⚠️  Running on CPU - processing will be slower. Enable GPU in Runtime settings for better performance.")

## Step 4: Define Helper Functions
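
The scoring helper in this step uses a simple keyword heuristic. For a stricter phoneme-level score, one common alternative is to compare the recognized phonemes against a reference phoneme sequence using edit distance. Here is a minimal sketch, assuming you already have both sequences as lists of IPA symbols; `edit_distance`, `phoneme_accuracy`, and the sample sequences are hypothetical and not part of the original notebook:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_accuracy(ref, hyp):
    """Return 1 - (edit distance / reference length), clipped at 0."""
    if not ref:
        return 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


reference = ["h", "oʊ", "m", "t", "aʊ", "n"]  # target word: "hometown"
spoken = ["h", "ɔ", "m", "t", "aʊ", "n"]      # one vowel substituted
print(round(phoneme_accuracy(reference, spoken), 2))  # 0.83
```

Unlike keyword matching, this score drops gradually with each wrong, missing, or extra sound, but it needs a reference pronunciation for the exact sentence the student is asked to say.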

In [None]:
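def process_audio(audio_data, sample_rate):
    """
    Convert a recording to mono float32 audio at 16 kHz for the model
    """
    audio_data = np.asarray(audio_data)

    # Gradio's numpy audio is typically int16, and librosa.resample needs
    # floating-point input, so scale integer samples to [-1.0, 1.0]
    if np.issubdtype(audio_data.dtype, np.integer):
        audio_data = audio_data.astype(np.float32) / np.iinfo(audio_data.dtype).max
    else:
        audio_data = audio_data.astype(np.float32)

    # Convert to mono if stereo (shape: samples x channels)
    if audio_data.ndim > 1:
        audio_data = audio_data.mean(axis=1)

    # Resample to 16kHz if needed
    if sample_rate != SAMPLE_RATE:
        audio_data = librosa.resample(audio_data, orig_sr=sample_rate, target_sr=SAMPLE_RATE)

    return audio_data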


def extract_phonemes(audio_data):
    """
    Extract IPA phoneme sequence from audio
    """
    inputs = processor(audio_data, sampling_rate=SAMPLE_RATE, return_tensors="pt", padding=True)
    inputs = {key: val.to(device) for key, val in inputs.items()}
    
    with torch.no_grad():
        logits = model(**inputs).logits
    
    predicted_ids = torch.argmax(logits, dim=-1)
    phonemes = processor.batch_decode(predicted_ids)[0]
    
    # Clean up phoneme sequence
    phoneme_list = [p for p in phonemes.split() if p.strip()]
    
    return phoneme_list


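def calculate_accuracy(expected_words, actual_phonemes):
    """
    Calculate pronunciation accuracy
    Simple heuristic: fraction of expected keywords whose phonemes appear
    in the recognized phoneme sequence
    """
    if not expected_words or not actual_phonemes:
        return 0.0

    # Imported here so this cell is self-contained (installed in Step 1)
    from phonemizer import phonemize

    # The model outputs IPA phonemes, not spelled words, so convert each
    # keyword to IPA before matching. Assumption: espeak's en-us output
    # uses the same symbols as the model's phoneme vocabulary.
    expected_ipa = phonemize(
        [word.lower() for word in expected_words],
        language="en-us", backend="espeak", strip=True
    )
    spoken = ''.join(actual_phonemes)

    # Count keywords whose phoneme string occurs in the spoken phonemes
    words_found = sum(1 for ipa in expected_ipa if ipa and ipa in spoken)
    return words_found / len(expected_words)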


def analyze_pronunciation(audio_file, expected_text):
    """
    Main analysis function
    """
    if audio_file is None:
        return "❌ No audio recorded. Please record your answer first.", "", 0.0
    
    try:
        # Load and process audio
        sample_rate, audio_data = audio_file
        
        # Handle different audio formats
        if isinstance(audio_data, tuple):
            sample_rate, audio_data = audio_data
        
        audio_data = process_audio(audio_data, sample_rate)
        
        # Extract phonemes
        phonemes = extract_phonemes(audio_data)
        
        # Calculate accuracy
        expected_words = expected_text.strip().split() if expected_text else []
        accuracy = calculate_accuracy(expected_words, phonemes)
        
        # Format results
        phoneme_str = ' '.join(phonemes)
        accuracy_percent = int(accuracy * 100)
        
        # Create feedback message
        if accuracy_percent >= 80:
            feedback = f"🎉 Excellent! {accuracy_percent}% accuracy"
        elif accuracy_percent >= 60:
            feedback = f"👍 Good! {accuracy_percent}% accuracy - Keep practicing!"
        elif accuracy_percent >= 40:
            feedback = f"📚 {accuracy_percent}% accuracy - Try speaking more clearly"
        else:
            feedback = f"💪 {accuracy_percent}% accuracy - Practice makes perfect!"
        
        return feedback, phoneme_str, accuracy_percent
    
    except Exception as e:
        return f"❌ Error analyzing audio: {str(e)}", "", 0.0

print("✅ Helper functions defined!")

## Step 5: IELTS Question Database

In [None]:
IELTS_QUESTIONS = {
    "Part 1 - Hometown": {
        "question": "Where do you come from? Can you describe your hometown?",
        "expected": "hometown city village born live",
        "time": "1-2 minutes"
    },
    "Part 1 - Work/Study": {
        "question": "Do you work or are you a student? What do you like about your work/studies?",
        "expected": "work study student job like enjoy",
        "time": "1-2 minutes"
    },
    "Part 1 - Hobbies": {
        "question": "What do you do in your free time? Do you have any hobbies?",
        "expected": "free time hobby enjoy activities",
        "time": "1-2 minutes"
    },
    "Part 1 - Family": {
        "question": "Can you tell me about your family? How much time do you spend with your family?",
        "expected": "family parents siblings spend time together",
        "time": "1-2 minutes"
    },
    "Part 2 - Describe a Person": {
        "question": "Describe a person who has influenced you.\n\nYou should say:\n- Who this person is\n- How you know them\n- What they are like\n- And explain why they have influenced you",
        "expected": "person influenced know describe important qualities admire",
        "time": "2 minutes (1 minute prep)"
    },
    "Part 2 - Describe a Place": {
        "question": "Describe a place you have visited that you particularly enjoyed.\n\nYou should say:\n- Where it is\n- When you went there\n- What you did there\n- And explain why you enjoyed it",
        "expected": "place visited enjoyed beautiful experience memorable",
        "time": "2 minutes (1 minute prep)"
    },
    "Part 2 - Describe an Event": {
        "question": "Describe a memorable event in your life.\n\nYou should say:\n- What the event was\n- When it happened\n- Who was involved\n- And explain why it was memorable",
        "expected": "event memorable happened important special experience",
        "time": "2 minutes (1 minute prep)"
    },
    "Part 3 - Education": {
        "question": "How has education changed in your country in recent years? What do you think are the most important qualities for a teacher to have?",
        "expected": "education changed important teacher qualities skills knowledge",
        "time": "2-3 minutes"
    },
    "Part 3 - Technology": {
        "question": "How has technology affected communication between people? Do you think this is a positive or negative development?",
        "expected": "technology communication positive negative impact development",
        "time": "2-3 minutes"
    },
    "Part 3 - Environment": {
        "question": "What do you think are the biggest environmental challenges facing the world today? What can individuals do to help protect the environment?",
        "expected": "environment challenges climate pollution protect individuals action",
        "time": "2-3 minutes"
    }
}

print(f"✅ Loaded {len(IELTS_QUESTIONS)} IELTS practice questions!")

## Step 6: Create Gradio Interface

**This is your practice interface!** Scroll down after running this cell to see it.

In [None]:
def get_question_info(question_name):
    """
    Get question details based on selection
    """
    if question_name in IELTS_QUESTIONS:
        q = IELTS_QUESTIONS[question_name]
        return q["question"], q["expected"], f"⏱️ Recommended time: {q['time']}"
    return "", "", ""


# Create Gradio interface
with gr.Blocks(title="IELTS Speaking Practice", theme=gr.themes.Soft()) as demo:
    gr.Markdown(
        """
        # 🎤 IELTS Speaking Practice - AI Pronunciation Coach
        
        Practice your IELTS speaking skills and get instant pronunciation feedback powered by AI!
        """
    )
    
    with gr.Row():
        with gr.Column(scale=1):
            gr.Markdown("### 📋 Step 1: Select a Question")
            question_selector = gr.Dropdown(
                choices=list(IELTS_QUESTIONS.keys()),
                label="Choose IELTS Question",
                value=list(IELTS_QUESTIONS.keys())[0]
            )
            
            time_info = gr.Textbox(
                label="Time Limit",
                interactive=False,
                value=f"⏱️ Recommended time: {IELTS_QUESTIONS[list(IELTS_QUESTIONS.keys())[0]]['time']}"
            )
            
            question_display = gr.Textbox(
                label="Question",
                lines=6,
                interactive=False,
                value=IELTS_QUESTIONS[list(IELTS_QUESTIONS.keys())[0]]["question"]
            )
            
            expected_words = gr.Textbox(
                label="Expected Keywords (for accuracy calculation)",
                interactive=False,
                value=IELTS_QUESTIONS[list(IELTS_QUESTIONS.keys())[0]]["expected"]
            )
        
        with gr.Column(scale=1):
            gr.Markdown("### 🎙️ Step 2: Record Your Answer")
            audio_input = gr.Audio(
                sources=["microphone"],
                type="numpy",
                label="Click to record your answer"
            )
            
            analyze_btn = gr.Button("🔍 Analyze My Pronunciation", variant="primary", size="lg")
            
            gr.Markdown("### 📊 Step 3: View Your Results")
            
            feedback = gr.Textbox(
                label="Feedback",
                interactive=False
            )
            
            accuracy_score = gr.Number(
                label="Accuracy Score (%)",
                interactive=False
            )
            
            phonemes = gr.Textbox(
                label="Detected Phonemes (IPA Symbols)",
                lines=3,
                interactive=False
            )
    
    gr.Markdown(
        """
        ---
        ### 💡 Tips for Better Results:
        - Speak clearly and at a normal pace
        - Use a quiet environment
        - Try to use the expected keywords in your answer
        - Practice multiple times to improve!
        
        ### 📖 About the Analysis:
        - **Phonemes**: Individual sounds in speech (IPA notation)
        - **Accuracy**: How well your speech matches expected keywords
        - **Feedback**: Personalized suggestions for improvement
        """
    )
    
    # Event handlers
    question_selector.change(
        fn=get_question_info,
        inputs=[question_selector],
        outputs=[question_display, expected_words, time_info]
    )
    
    analyze_btn.click(
        fn=analyze_pronunciation,
        inputs=[audio_input, expected_words],
        outputs=[feedback, phonemes, accuracy_score]
    )

# Launch the interface
print("\n" + "="*60)
print("🚀 Launching IELTS Speaking Practice Interface...")
print("="*60)

demo.launch(share=True, debug=False)

print("\n✅ Interface is ready! Scroll down to use it.")
print("📱 Share the public URL with your students!")

---

## 👨‍🏫 For Teachers:

### How to Share This with Students:

1. **Save this notebook to your Google Drive**
   - File → Save a copy in Drive

2. **Get shareable link**
   - Click "Share" button in top right
   - Change to "Anyone with the link can view"
   - Copy the link

3. **Share with students**
   - Send them the Colab link
   - Tell them to: Click link → Runtime → Run all → Wait 2 min → Use interface!

### Customization:

- **Add more questions**: Edit the `IELTS_QUESTIONS` dictionary in Step 5
- **Change expected keywords**: Modify the `expected` field for each question
- **Adjust time limits**: Update the `time` field
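
As a sketch of the first customization, a new entry needs the same three fields as the existing ones. The `add_question` helper and the sample question are hypothetical; the small dictionary here stands in for the one defined in Step 5 so that the snippet runs on its own:

```python
def add_question(database, name, question, expected, time):
    """Add one entry to a question dictionary shaped like IELTS_QUESTIONS."""
    if name in database:
        raise ValueError(f"Question name already exists: {name}")
    database[name] = {"question": question, "expected": expected, "time": time}
    return database


# Stand-in for the Step 5 dictionary
IELTS_QUESTIONS = {
    "Part 1 - Hometown": {
        "question": "Where do you come from? Can you describe your hometown?",
        "expected": "hometown city village born live",
        "time": "1-2 minutes",
    }
}

add_question(
    IELTS_QUESTIONS,
    name="Part 1 - Food",
    question="What kind of food do you like? Do you enjoy cooking?",
    expected="food like cook favorite meal",
    time="1-2 minutes",
)
print(list(IELTS_QUESTIONS))
```

In the notebook, add new entries before running Step 6, because the dropdown choices are read from the dictionary when the interface is built.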

### Notes:

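- **Public URL**: With `share=True`, Gradio creates a temporary public link (72 hours at the time of writing; check the expiry printed at launch). The link works only while the Colab runtime is running
- **GPU**: Recommend that students enable the GPU for faster processing
- **Privacy**: The code in this notebook does not save recordings. Audio is processed in the Colab runtime, and with `share=True` it is relayed through Gradio's share servers
- **Free**: Runs on Google Colab's free resources, subject to Colab's usage limits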

---

**Created by:** Nguyễn Minh Nhựt  
**Powered by:** Wav2Vec2 + Gradio + Google Colab