# Dynamic ASL Recognition - Words & Phrases
### Real-time Recognition of ASL Words with Movement

**This extends your static ASL model to recognize dynamic signs (words with movement)**

---
## 🔧 STEP 1: Install Required Libraries

In [None]:
!pip install mediapipe opencv-python kaggle --quiet

print("✅ Installation complete!")
print("\n🔴 RESTART RUNTIME: Runtime → Restart runtime")
print("Then run Step 2")

---
## 📚 STEP 2: Import Libraries (After Restart)

In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import cv2
import mediapipe as mp
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow
from google.colab import files
import pickle
from IPython.display import display, Javascript, HTML
from google.colab.output import eval_js
from base64 import b64decode
import time

print("✅ All libraries imported!")
print(f"TensorFlow: {tf.__version__}")
print(f"MediaPipe: {mp.__version__}")

---
## 📂 STEP 3: Download Dynamic ASL Dataset

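Dynamic signs need video sequences. Public datasets such as **WLASL** (Word-Level American Sign Language) exist, but this notebook records its own training data in Step 7. The Kaggle setup below is only needed if you want to pull a pre-recorded dataset.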

In [None]:
print("📁 Upload your kaggle.json file")
print("Get from: https://www.kaggle.com/settings\n")

uploaded = files.upload()

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

print("\n✅ Kaggle configured!")

In [None]:
# For this demo, we'll create our own dataset by recording sequences
# In production, you'd use a proper video dataset like WLASL

print("📊 For dynamic signs, we need VIDEO sequences")
print("\n🎯 We'll collect data in real-time for these words:")
print("   - HELLO")
print("   - THANK YOU")
print("   - PLEASE")
print("   - YES")
print("   - NO")
print("\n💡 Or we can use a pre-recorded dataset...")

---
## 🤖 STEP 4: Initialize MediaPipe

In [None]:
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=2,  # Support both hands for some signs
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5
)

print("✅ MediaPipe initialized for dynamic recognition!")

---
## 🎬 STEP 5: Video Capture Function for Sequences

In [None]:
def capture_sequence(num_frames=30, delay_ms=100):
    """
    Capture a sequence of frames (for dynamic signs).
    num_frames: Number of frames to capture (default 30, roughly 3 seconds at a
                nominal 10 FPS; per-frame capture overhead makes it somewhat longer)
    delay_ms: Delay between frames in milliseconds
    """
    print(f"📹 Capturing {num_frames} frames...\n")
    
    frames = []
    
    for i in range(num_frames):
        # JavaScript to capture frame
        js = Javascript('''
            async function captureFrame(quality) {
                if (!window.stream) {
                    const video = document.createElement('video');
                    video.style.display = 'block';
                    window.stream = await navigator.mediaDevices.getUserMedia({video: true});
                    
                    const div = document.createElement('div');
                    div.id = 'video-container';
                    document.body.appendChild(div);
                    div.appendChild(video);
                    video.srcObject = window.stream;
                    await video.play();
                    window.video = video;
                    
                    google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);
                    await new Promise(resolve => setTimeout(resolve, 1000));
                }
                
                const canvas = document.createElement('canvas');
                canvas.width = window.video.videoWidth;
                canvas.height = window.video.videoHeight;
                canvas.getContext('2d').drawImage(window.video, 0, 0);
                return canvas.toDataURL('image/jpeg', quality);
            }
        ''')
        
        if i == 0:
            display(js)
        
        data = eval_js('captureFrame(0.8)')
        binary = b64decode(data.split(',')[1])
        
        filename = f'frame_{i}.jpg'
        with open(filename, 'wb') as f:
            f.write(binary)
        
        frames.append(filename)
        
        if (i + 1) % 10 == 0:
            print(f"✓ Captured {i + 1}/{num_frames} frames")
        
        time.sleep(delay_ms / 1000)
    
    # Stop camera
    display(Javascript('''
        if (window.stream) {
            window.stream.getVideoTracks()[0].stop();
            document.getElementById('video-container').remove();
            window.stream = null;
        }
    '''))
    
    print(f"\n✅ Captured {num_frames} frames!")
    return frames

print("✅ Video sequence capture function ready!")
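
The "30 frames = ~3 seconds" figure assumes each frame costs exactly `delay_ms`. In practice every frame also pays for the browser round trip and the file write, so the real rate is lower. The sketch below shows one way to measure the effective rate. `measure_capture_rate` and `fake_grab` are hypothetical helpers written for this illustration and are not part of the notebook; `fake_grab` merely simulates a 20 ms capture cost.

```python
import time

def measure_capture_rate(grab_frame, num_frames=30, delay_ms=100):
    """Call grab_frame() num_frames times, sleeping delay_ms after each call
    (the same pattern as capture_sequence), and return
    (elapsed_seconds, effective_frames_per_second)."""
    start = time.perf_counter()
    for _ in range(num_frames):
        grab_frame()
        time.sleep(delay_ms / 1000)
    elapsed = time.perf_counter() - start
    return elapsed, num_frames / elapsed

def fake_grab():
    # Stand-in for the real capture step; assumed to take about 20 ms.
    time.sleep(0.02)

elapsed, fps = measure_capture_rate(fake_grab, num_frames=10, delay_ms=50)
print(f"Captured 10 frames in {elapsed:.2f}s ({fps:.1f} FPS effective)")
```

If the measured rate differs a lot between training and inference, the model sees signs at different speeds, so it may be worth keeping `delay_ms` and the capture environment identical in both steps.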

---
## 🔍 STEP 6: Extract Landmarks from Sequence

In [None]:
def extract_sequence_landmarks(frame_paths):
    """
    Extract hand landmarks from a sequence of frames
    Returns: array of shape (num_frames, 63) - 21 landmarks × 3 coordinates per frame
    """
    sequence_landmarks = []
    
    for frame_path in frame_paths:
        image = cv2.imread(frame_path)
        if image is None:
            # If frame is missing, use zeros
            sequence_landmarks.append(np.zeros(63))
            continue
        
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = hands.process(image_rgb)
        
        if results.multi_hand_landmarks:
            # Get first hand
            hand = results.multi_hand_landmarks[0]
            landmarks = []
            for lm in hand.landmark:
                landmarks.extend([lm.x, lm.y, lm.z])
            sequence_landmarks.append(np.array(landmarks))
        else:
            # No hand detected, use zeros
            sequence_landmarks.append(np.zeros(63))
    
    return np.array(sequence_landmarks)

print("✅ Sequence landmark extraction ready!")
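
Because frames without a detected hand are stored as all-zero rows, a sequence can look valid by shape while containing very little signal. The sketch below checks what fraction of frames actually contain a hand. `detection_rate` is a hypothetical helper, and the random array only stands in for the output of `extract_sequence_landmarks`.

```python
import numpy as np

def detection_rate(sequence):
    """Fraction of frames in a (num_frames, 63) array that contain a hand.
    All-zero rows are treated as 'no hand detected'."""
    sequence = np.asarray(sequence)
    detected = np.any(sequence != 0, axis=1)
    return float(detected.mean())

# Synthetic stand-in for one recorded sequence: 30 frames, 63 features each.
rng = np.random.default_rng(0)
demo_sequence = rng.random((30, 63))
demo_sequence[[3, 4, 17]] = 0.0  # simulate three frames with no detection

rate = detection_rate(demo_sequence)
print(f"Hand detected in {rate:.0%} of frames")

# Each row can be viewed as 21 landmarks with (x, y, z) coordinates.
first_frame = demo_sequence[0].reshape(21, 3)
print(first_frame.shape)
```

A sample with a low rate might be worth re-recording rather than adding to the training set; the cutoff to use is a judgment call.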

---
## 📊 STEP 7: Collect Training Data for Dynamic Signs

**We'll record multiple examples of each word**

In [None]:
# Define the words we want to recognize
WORDS = ['HELLO', 'THANK_YOU', 'PLEASE', 'YES', 'NO']

# Number of examples to collect per word
SAMPLES_PER_WORD = 5  # Increase this for better accuracy

# Number of frames per sequence
SEQUENCE_LENGTH = 30  # About 3 seconds at a nominal 10 FPS

print("📊 DATA COLLECTION SETUP")
print("="*60)
print(f"Words to learn: {WORDS}")
print(f"Samples per word: {SAMPLES_PER_WORD}")
print(f"Frames per sample: {SEQUENCE_LENGTH}")
print("="*60)
print("\n💡 We'll collect training data by recording you performing each sign")
print("   You'll perform each sign multiple times for training")

In [None]:
# Collect training data
X_sequences = []  # Store landmark sequences
y_sequences = []  # Store labels

print("\n🎬 STARTING DATA COLLECTION")
print("="*60)
print("\n📸 When ready, we'll record you performing each sign")
print("\n⚠️  Make sure:")
print("   - Good lighting")
print("   - Clear background")
print("   - Perform sign smoothly")
print("   - Repeat each sign naturally")
print("\n" + "="*60)

input("\nPress ENTER when ready to start...")

for word in WORDS:
    print(f"\n\n{'='*60}")
    print(f"📝 WORD: {word}")
    print("="*60)
    print(f"\n🎯 Learn the sign: https://www.handspeak.com/word/search/?id={word.lower()}")
    
    for sample in range(SAMPLES_PER_WORD):
        print(f"\n📹 Recording sample {sample + 1}/{SAMPLES_PER_WORD} for '{word}'")
        input(f"   Press ENTER and immediately perform the sign for '{word}'...")
        
        # Capture sequence
        frames = capture_sequence(num_frames=SEQUENCE_LENGTH, delay_ms=100)
        
        # Extract landmarks
        landmarks = extract_sequence_landmarks(frames)
        
        # Store
        X_sequences.append(landmarks)
        y_sequences.append(word)
        
        print(f"   ✅ Sample {sample + 1} recorded!")
        
        # Cleanup
        for frame in frames:
            if os.path.exists(frame):
                os.remove(frame)

# Convert to arrays
X_sequences = np.array(X_sequences)
y_sequences = np.array(y_sequences)

print("\n\n" + "="*60)
print("✅ DATA COLLECTION COMPLETE!")
print("="*60)
print(f"\n📊 Total samples: {len(X_sequences)}")
print(f"📊 Sequence shape: {X_sequences.shape}")
print(f"   ({X_sequences.shape[0]} samples, {X_sequences.shape[1]} frames, {X_sequences.shape[2]} features)")

---
## 🎯 STEP 8: Prepare Data for Training

In [None]:
# Encode labels
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y_sequences)
y_categorical = keras.utils.to_categorical(y_encoded)

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_sequences, y_categorical,
    test_size=0.2,
    random_state=42,
    stratify=y_encoded
)

print("✅ Data prepared!")
print(f"\n📊 Training: {len(X_train)} samples")
print(f"📊 Testing: {len(X_test)} samples")
print(f"📊 Classes: {label_encoder.classes_}")

# Save label encoder
with open('dynamic_label_encoder.pkl', 'wb') as f:
    pickle.dump(label_encoder, f)
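
To make the label encoding step concrete, the sketch below reproduces it with plain NumPy on a handful of made-up labels. It relies on the fact that `LabelEncoder` assigns integer ids in sorted class order, which `np.unique` also does. The variable names are illustrative only.

```python
import numpy as np

# A few made-up labels in recording order.
labels = np.array(['HELLO', 'YES', 'NO', 'HELLO', 'THANK_YOU', 'PLEASE'])

# np.unique returns the sorted class names and, with return_inverse=True,
# the integer id of each label (the same mapping LabelEncoder produces).
classes, encoded = np.unique(labels, return_inverse=True)

# One-hot rows, equivalent in content to keras.utils.to_categorical(encoded).
one_hot = np.eye(len(classes), dtype=np.float32)[encoded]

print(classes)        # sorted class names
print(encoded)        # integer id per sample
print(one_hot.shape)  # (num_samples, num_classes)

# Decoding a prediction is the reverse lookup on the argmax.
decoded = classes[one_hot.argmax(axis=1)]
print(decoded)
```

Note that the class order is alphabetical, not the order of `WORDS`, which is why the saved encoder has to travel with the model.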

---
## 🧠 STEP 9: Build LSTM Model for Sequences

In [None]:
# Build LSTM model for temporal sequences
model = keras.Sequential([
    # Input: (sequence_length, 63 features)
    layers.Input(shape=(SEQUENCE_LENGTH, 63)),
    
    # LSTM layers to capture temporal patterns
    layers.LSTM(128, return_sequences=True),
    layers.Dropout(0.3),
    
    layers.LSTM(256, return_sequences=True),
    layers.Dropout(0.3),
    
    layers.LSTM(128),
    layers.Dropout(0.4),
    
    # Dense layers
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    
    # Output
    layers.Dense(len(label_encoder.classes_), activation='softmax')
])

model.compile(
    optimizer=keras.optimizers.Adam(0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("✅ LSTM Model built!\n")
model.summary()
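
As a sanity check on `model.summary()`, the parameter counts can be worked out by hand. The sketch below assumes the standard LSTM formulation with four gates and one bias vector per gate, and assumes 5 output classes as in `WORDS`. The helper names are made up for this example.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias vector.
    return input_dim * units + units

num_classes = 5  # assumption: the five words defined in WORDS

layer_params = {
    "lstm_128 (63 inputs)": lstm_params(63, 128),
    "lstm_256": lstm_params(128, 256),
    "lstm_128": lstm_params(256, 128),
    "dense_128": dense_params(128, 128),
    "dense_64": dense_params(128, 64),
    "dense_out": dense_params(64, num_classes),
}

for name, count in layer_params.items():
    print(f"{name:>22}: {count:,}")

total_params = sum(layer_params.values())
print(f"{'total':>22}: {total_params:,}")
```

With roughly 715,000 trainable parameters and only 25 recorded sequences at the default settings, the model can memorize the training set easily, which is one reason the notes recommend collecting many more samples per word.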

---
## 🚀 STEP 10: Train Model

In [None]:
callbacks = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=7)
]

print("🚀 Training LSTM model...\n")

history = model.fit(
    X_train, y_train,
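    # Note: the test split doubles as the validation set here, so the
    # accuracy reported in Step 11 is an optimistic estimate.
    validation_data=(X_test, y_test),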
    epochs=100,
    batch_size=8,
    callbacks=callbacks,
    verbose=1
)

print("\n✅ Training complete!")

---
## 📈 STEP 11: Evaluate Model

In [None]:
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)

print("="*60)
print("📊 MODEL PERFORMANCE")
print("="*60)
print(f"\n✅ Accuracy: {test_accuracy*100:.2f}%")
print(f"✅ Loss: {test_loss:.4f}")
print("="*60)

# Plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(history.history['accuracy'], label='Train', linewidth=2)
ax1.plot(history.history['val_accuracy'], label='Val', linewidth=2)
ax1.set_title('Accuracy', fontsize=14, fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

ax2.plot(history.history['loss'], label='Train', linewidth=2)
ax2.plot(history.history['val_loss'], label='Val', linewidth=2)
ax2.set_title('Loss', fontsize=14, fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Save
model.save('dynamic_asl_model.h5')
print("\n✅ Model saved!")

---
## 🎯 STEP 12: Real-Time Dynamic Sign Recognition!

In [None]:
print("="*60)
print("🎬 REAL-TIME DYNAMIC SIGN RECOGNITION")
print("="*60)
print(f"\n📚 Trained words: {list(label_encoder.classes_)}")
print(f"\n💡 Perform a sign for {SEQUENCE_LENGTH * 0.1:.1f} seconds")
print("="*60)

input("\nPress ENTER when ready to perform a sign...")

# Capture sequence
print("\n📹 Recording... Perform your sign NOW!\n")
frames = capture_sequence(num_frames=SEQUENCE_LENGTH, delay_ms=100)

# Extract landmarks
print("\n🔍 Analyzing...")
landmarks = extract_sequence_landmarks(frames)

# Predict
landmarks = landmarks.reshape(1, SEQUENCE_LENGTH, 63)
prediction = model.predict(landmarks, verbose=0)

idx = np.argmax(prediction)
word = label_encoder.inverse_transform([idx])[0]
confidence = prediction[0][idx]

# Display result
print("\n" + "="*60)
print("🎯 PREDICTION RESULT")
print("="*60)
print(f"\n✅ Detected Word: {word}")
print(f"✅ Confidence: {confidence*100:.2f}%")

# Top predictions
print("\n📊 All predictions:")
for i in np.argsort(prediction[0])[::-1]:
    w = label_encoder.inverse_transform([i])[0]
    c = prediction[0][i] * 100
    print(f"   {w}: {c:.2f}%")

print("\n" + "="*60)
print("💡 Run this cell again to test another sign!")
print("="*60)

# Cleanup
for frame in frames:
    if os.path.exists(frame):
        os.remove(frame)
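
The cell above always reports the highest-scoring word, even when the model is unsure or no sign was performed. One possible refinement is to reject predictions below a confidence threshold. The sketch below works on a plain probability vector; `decide` and the 0.6 threshold are illustrative assumptions, not values tuned for this model.

```python
import numpy as np

def decide(probabilities, class_names, threshold=0.6):
    """Return (label_or_None, ranked) where ranked lists (name, probability)
    pairs from most to least likely. The label is None when the top
    probability falls below the threshold."""
    probabilities = np.asarray(probabilities, dtype=float)
    order = np.argsort(probabilities)[::-1]
    best = order[0]
    label = class_names[best] if probabilities[best] >= threshold else None
    ranked = [(class_names[i], float(probabilities[i])) for i in order]
    return label, ranked

class_names = ['HELLO', 'NO', 'PLEASE', 'THANK_YOU', 'YES']

# Made-up softmax outputs: one confident, one ambiguous.
confident_label, confident_ranked = decide([0.05, 0.10, 0.15, 0.62, 0.08], class_names)
unsure_label, unsure_ranked = decide([0.30, 0.25, 0.20, 0.15, 0.10], class_names)

print(confident_label)  # word accepted
print(unsure_label)     # None, prediction rejected
```

In the notebook this would be called as `decide(prediction[0], list(label_encoder.classes_))`, with the `None` case prompting the user to repeat the sign.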

---
## 💾 STEP 13: Download Model

In [None]:
files.download('dynamic_asl_model.h5')
files.download('dynamic_label_encoder.pkl')
print("✅ Model downloaded!")

---
## 📝 Notes

### 🎯 What We Built:
- **Dynamic ASL word recognition** with movement
- **LSTM model** to capture temporal patterns
- **Sequence-based prediction** (30 frames = ~3 seconds)
- Works with words like HELLO, THANK YOU, PLEASE, etc.

### 🚀 To Improve:
1. **Collect MORE samples** per word (20-50 samples)
2. **Add more words** to vocabulary
3. **Use pre-recorded dataset** like WLASL for better accuracy
4. **Increase sequence length** for longer/complex signs
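5. **Support 2-hand signs** (MediaPipe already detects up to 2 hands, but Step 6 keeps only the first one; the features would need to grow from 63 to 126 per frame)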
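
For the two-hand item, a minimal sketch of the feature layout might look like the following. It assumes each frame has already been reduced to a dictionary keyed by handedness (`'Left'` / `'Right'`, the labels MediaPipe reports in `results.multi_handedness`), with a (21, 3) array per detected hand. `combine_hands` and that dictionary format are hypothetical; the model's input shape would also have to change to `(SEQUENCE_LENGTH, 126)`.

```python
import numpy as np

FEATURES_PER_HAND = 63  # 21 landmarks x 3 coordinates

def combine_hands(hands_by_side):
    """Build a fixed 126-value frame vector: left hand in the first 63
    slots, right hand in the last 63. A missing hand stays as zeros."""
    frame = np.zeros(2 * FEATURES_PER_HAND, dtype=np.float32)
    for slot, side in enumerate(("Left", "Right")):
        hand = hands_by_side.get(side)
        if hand is not None:
            start = slot * FEATURES_PER_HAND
            frame[start:start + FEATURES_PER_HAND] = (
                np.asarray(hand, dtype=np.float32).reshape(-1)
            )
    return frame

# Synthetic example: only the right hand is visible in this frame.
right_hand = np.full((21, 3), 0.5)
two_hand_frame = combine_hands({"Right": right_hand})

print(two_hand_frame.shape)        # (126,)
print(two_hand_frame[:63].any())   # False, left slots are empty
print(two_hand_frame[63:].mean())  # 0.5, right slots are filled
```

Keeping each hand in a fixed slot means the model does not have to learn that the same hand can appear in either half of the vector.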

### 📚 ASL Resources:
- Learn signs: https://www.handspeak.com
- ASL dictionary: https://www.lifeprint.com
- Sign videos: https://www.signingsavvy.com

---
## 🎉 DYNAMIC ASL RECOGNITION COMPLETE!