# ASL Sign Language Recognition - FRESH START
### Real-time Hand Detection with MediaPipe + CNN

**IMPORTANT: Run cells IN ORDER. Do NOT skip cells!**

---
## 🔧 STEP 1: Clean Installation
**Run this cell, then RESTART RUNTIME when it finishes**

In [None]:
# Complete fresh installation
!pip install mediapipe opencv-python kaggle --quiet

print("="*60)
print("✅ INSTALLATION COMPLETE!")
print("="*60)
print("\n🔴 CRITICAL: You MUST restart runtime now!")
print("\n📍 Click: Runtime → Restart runtime")
print("📍 Or press: Ctrl+M then press .")
print("\n⚠️  After restart, run STEP 2 below")
print("="*60)

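After the restart, it can help to confirm that the packages from STEP 1 are visible to the new runtime before importing them. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is an assumption made for illustration and is not part of the notebook.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Distribution names exactly as they were passed to pip in STEP 1
report = installed_versions(["mediapipe", "opencv-python", "kaggle"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```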
---
## 📚 STEP 2: Import All Libraries
**Run this AFTER restarting runtime**

In [None]:
# Import all required libraries
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import cv2
import mediapipe as mp
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow
from google.colab import files
import pickle

print("="*60)
print("✅ ALL LIBRARIES IMPORTED SUCCESSFULLY!")
print("="*60)
print(f"\n📦 TensorFlow: {tf.__version__}")
print(f"📦 MediaPipe: {mp.__version__}")
print(f"📦 NumPy: {np.__version__}")
print("\n✅ Ready to continue! Run the next cells in order.")
print("="*60)

---
## 📂 STEP 3: Upload Kaggle API Key

In [None]:
print("📁 Upload your kaggle.json file")
print("\n🔗 Get it from: https://www.kaggle.com/settings")
print("   → Click 'Create New Token' under API section\n")

uploaded = files.upload()

# Setup Kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

print("\n✅ Kaggle API configured successfully!")

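A malformed or wrong file only shows up later, when the download in STEP 4 fails. As a hedged sketch, the uploaded token can be checked first: a Kaggle API token is a JSON object with `username` and `key` fields. The helper name `check_kaggle_token` is hypothetical, and the demo writes a throwaway file instead of touching a real token.

```python
import json
import os
import tempfile

def check_kaggle_token(path="kaggle.json"):
    """Return True if the file is JSON with non-empty 'username' and 'key' strings."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            token = json.load(f)
    except (OSError, json.JSONDecodeError):
        return False
    if not isinstance(token, dict):
        return False
    return all(isinstance(token.get(k), str) and bool(token[k])
               for k in ("username", "key"))

# Demo with a throwaway file; a real token comes from the Kaggle settings page
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "kaggle.json")
    with open(sample, "w", encoding="utf-8") as f:
        json.dump({"username": "example_user", "key": "example_key"}, f)
    sample_ok = check_kaggle_token(sample)
    missing_ok = check_kaggle_token(os.path.join(tmp, "missing.json"))

print(sample_ok, missing_ok)
```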
---
## 📥 STEP 4: Download ASL Dataset

In [None]:
print("📥 Downloading ASL Alphabet Dataset...\n")
print("⏳ This will take 2-3 minutes...\n")

!kaggle datasets download -d grassknoted/asl-alphabet --quiet

print("\n📦 Extracting dataset...")
!unzip -q asl-alphabet.zip -d asl_data

print("\n✅ Dataset ready!")
print("\n📊 Available classes:")
!ls asl_data/asl_alphabet_train/asl_alphabet_train/

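Listing the class folders shows that extraction worked, but not whether each folder actually contains images. A small sketch that counts files per class follows; `count_images_per_class` is a name assumed for illustration, and the demo runs on a temporary directory tree so it does not depend on the download.

```python
import os
import tempfile

def count_images_per_class(root):
    """Return {class_name: number_of_files} for each subdirectory of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts

# Demo on a throwaway tree; in the notebook, pass
# 'asl_data/asl_alphabet_train/asl_alphabet_train/' instead.
with tempfile.TemporaryDirectory() as tmp:
    for name, n_files in (("A", 3), ("B", 1)):
        os.makedirs(os.path.join(tmp, name))
        for i in range(n_files):
            open(os.path.join(tmp, name, f"{name}{i}.jpg"), "wb").close()
    demo_counts = count_images_per_class(tmp)

print(demo_counts)
```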
---
## 🤖 STEP 5: Initialize MediaPipe

In [None]:
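# Initialize MediaPipe Hands
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

# static_image_mode=True runs detection on every image, which suits a batch of
# unrelated dataset photos and single webcam snapshots. With False, MediaPipe
# treats consecutive inputs as a video stream and tracks the hand between them.
hands = mp_hands.Hands(
    static_image_mode=True,
    max_num_hands=1,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5  # ignored when static_image_mode=True
)

print("✅ MediaPipe Hands initialized!")
print("\n⚙️  Configuration:")
print("   - Static image mode: on")
print("   - Max hands: 1")
print("   - Detection confidence: 50%")
print("   - Tracking confidence: 50% (unused in static image mode)")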

---
## 🔍 STEP 6: Define Hand Landmark Extraction

In [None]:
def extract_landmarks(image_path):
    """
    Extract 21 hand landmarks (x,y,z) = 63 features total
    """
    image = cv2.imread(image_path)
    if image is None:
        return None
    
    # Convert to RGB
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    # Process with MediaPipe
    results = hands.process(image_rgb)
    
    if results.multi_hand_landmarks:
        # Get first hand
        hand = results.multi_hand_landmarks[0]
        
        # Extract all landmarks
        landmarks = []
        for lm in hand.landmark:
            landmarks.extend([lm.x, lm.y, lm.z])
        
        return np.array(landmarks)
    
    return None

print("✅ Landmark extraction function ready!")

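`extract_landmarks` returns raw image-relative coordinates, so the same hand shape produces different features depending on where it sits in the frame and how close it is to the camera. One optional preprocessing step, which the notebook itself does not perform, is to center the landmarks on the wrist (landmark 0 in MediaPipe Hands) and rescale them. The sketch below assumes a hypothetical helper `normalize_landmarks`; if used, it would have to be applied both when building the training set and when predicting.

```python
import numpy as np

def normalize_landmarks(flat):
    """Center 21 (x, y, z) landmarks on the wrist and scale to unit max distance."""
    points = np.asarray(flat, dtype=np.float32).reshape(21, 3)
    centered = points - points[0]  # landmark 0 is the wrist in MediaPipe Hands
    scale = np.linalg.norm(centered, axis=1).max()
    if scale > 0:
        centered = centered / scale
    return centered.reshape(-1)

# Random stand-in for one 63-value landmark vector
rng = np.random.default_rng(0)
raw = rng.random(63)
# Same pose, uniformly scaled and shifted
shifted = (raw.reshape(21, 3) * 2.0 + 0.1).reshape(-1)

same_pose = np.allclose(normalize_landmarks(raw), normalize_landmarks(shifted), atol=1e-5)
print(same_pose)
```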
---
## 📊 STEP 7: Process Dataset & Extract Features
**This will take 5-10 minutes**

In [None]:
dataset_path = 'asl_data/asl_alphabet_train/asl_alphabet_train/'
classes = sorted(os.listdir(dataset_path))

print(f"📚 Found {len(classes)} classes\n")
print(f"Classes: {classes}\n")

# Storage
X = []  # Features
y = []  # Labels

# Use 300 images per class (increase for better accuracy)
IMAGES_PER_CLASS = 300

print(f"⚙️  Processing {IMAGES_PER_CLASS} images per class...\n")
print("="*60)

for class_name in classes:
    class_path = os.path.join(dataset_path, class_name)
    # Sort before slicing: os.listdir returns files in arbitrary order,
    # so an unsorted slice can pick a different subset on each run
    images = sorted(os.listdir(class_path))[:IMAGES_PER_CLASS]
    
    success_count = 0
    
    for img_name in images:
        img_path = os.path.join(class_path, img_name)
        landmarks = extract_landmarks(img_path)
        
        if landmarks is not None:
            X.append(landmarks)
            y.append(class_name)
            success_count += 1
    
    # Report against the number of images actually attempted
    print(f"✓ '{class_name}': {success_count}/{len(images)} processed")

# Convert to arrays
X = np.array(X)
y = np.array(y)

print("="*60)
print(f"\n✅ Feature extraction complete!")
print(f"\n📊 Total samples: {len(X)}")
print(f"📊 Feature shape: {X.shape}")
print(f"📊 Classes: {len(np.unique(y))}")

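Images in which MediaPipe finds no hand contribute no samples, so a class can end up with very few samples or none at all. That matters for the next step, because a stratified split needs at least two samples of every class. A hedged sketch of a pre-split check follows; both helper names are assumptions, and the toy labels stand in for the notebook's `classes` and `y`.

```python
from collections import Counter

def underrepresented_classes(labels, min_samples=2):
    """Return {class: count} for classes with fewer than min_samples samples."""
    counts = Counter(labels)
    return {cls: n for cls, n in sorted(counts.items()) if n < min_samples}

def missing_classes(all_classes, labels):
    """Classes present on disk that produced no landmark samples at all."""
    seen = set(labels)
    return [cls for cls in all_classes if cls not in seen]

# Toy labels; in the notebook these would be `y` and `classes` from STEP 7
demo_labels = ["A", "A", "A", "B", "C", "C"]
too_few = underrepresented_classes(demo_labels)
absent = missing_classes(["A", "B", "C", "nothing"], demo_labels)
print(too_few)
print(absent)
```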
---
## 🎯 STEP 8: Prepare Training Data

In [None]:
# Encode labels
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
y_categorical = keras.utils.to_categorical(y_encoded)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y_categorical, 
    test_size=0.2, 
    random_state=42, 
    stratify=y_encoded
)

print("✅ Data prepared!\n")
print(f"📊 Training samples: {len(X_train)}")
print(f"📊 Testing samples: {len(X_test)}")
print(f"📊 Number of classes: {len(label_encoder.classes_)}")

# Save label encoder
with open('label_encoder.pkl', 'wb') as f:
    pickle.dump(label_encoder, f)

print("\n✅ Label encoder saved!")

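The split above produces two sets, and the training step later passes the test set as validation data. If a separate validation set is wanted so that the test set stays untouched until evaluation, a three-way stratified split is one option. The sketch below is NumPy-only and is not what the notebook does; the function name and the fractions are assumptions. The returned index arrays could be used as `X[train_idx]`, `y_categorical[train_idx]`, and so on.

```python
import numpy as np

def stratified_three_way_split(labels, val_frac=0.1, test_frac=0.1, seed=42):
    """Return (train, val, test) index arrays with per-class proportions preserved."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then carve off test and val parts
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(len(idx) * test_frac))
        n_val = int(round(len(idx) * val_frac))
        test.extend(idx[:n_test])
        val.extend(idx[n_test:n_test + n_val])
        train.extend(idx[n_test + n_val:])
    return (np.array(train, dtype=int),
            np.array(val, dtype=int),
            np.array(test, dtype=int))

# Toy labels: 10 samples of "A" and 20 of "B"
demo_labels = np.array(["A"] * 10 + ["B"] * 20)
train_idx, val_idx, test_idx = stratified_three_way_split(demo_labels, 0.2, 0.2)
print(len(train_idx), len(val_idx), len(test_idx))
```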
---
## 🧠 STEP 9: Build CNN Model

In [None]:
# Build model
model = keras.Sequential([
    # Input: 63 features (21 landmarks × 3 coordinates)
    layers.Input(shape=(63,)),
    layers.Reshape((21, 3)),  # Reshape to (landmarks, coordinates)
    
    # Conv1D layers
    layers.Conv1D(64, 3, activation='relu', padding='same'),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    
    layers.Conv1D(128, 3, activation='relu', padding='same'),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    
    layers.Conv1D(256, 3, activation='relu', padding='same'),
    layers.BatchNormalization(),
    layers.GlobalMaxPooling1D(),
    
    # Dense layers
    layers.Dense(512, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    
    layers.Dense(256, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.4),
    
    # Output layer
    layers.Dense(len(label_encoder.classes_), activation='softmax')
])

# Compile
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("✅ Model built!\n")
model.summary()

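The totals printed by `model.summary()` can be cross-checked by hand. The sketch below recomputes the parameter count for the layer stack above from the standard formulas: a Conv1D layer has `kernel_size * in_channels * filters + filters` parameters, a Dense layer has `in_features * units + units`, and BatchNormalization keeps four values per channel. The helper names are assumptions, and the class count of 29 in the demo assumes every class in the dataset produced samples.

```python
def conv1d_params(in_channels, filters, kernel_size):
    """Weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

def dense_params(in_features, units):
    """Weights plus one bias per unit."""
    return in_features * units + units

def batchnorm_params(channels):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * channels

def total_params(n_classes):
    total = 0
    channels = 3  # (x, y, z) per landmark after the Reshape layer
    for filters in (64, 128, 256):
        total += conv1d_params(channels, filters, 3) + batchnorm_params(filters)
        channels = filters
    features = channels  # GlobalMaxPooling1D keeps one value per channel
    for units in (512, 256):
        total += dense_params(features, units) + batchnorm_params(units)
        features = units
    # Output layer; Reshape, Dropout and pooling layers have no parameters
    return total + dense_params(features, n_classes)

print(total_params(29))
```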
---
## 🚀 STEP 10: Train Model
**This will take 5-10 minutes**

In [None]:
# Callbacks
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=10,
        restore_best_weights=True
    ),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=5,
        min_lr=0.00001
    )
]

print("🚀 Starting training...\n")
print("="*60)

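# Note: X_test doubles as the validation set here, so early stopping and
# learning-rate scheduling see the same data that STEP 11 evaluates on.
history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=50,
    batch_size=32,
    callbacks=callbacks,
    verbose=1
)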

print("\n" + "="*60)
print("✅ TRAINING COMPLETE!")
print("="*60)

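With `patience=10`, training stops once the validation loss has gone 10 consecutive epochs without improving, and `restore_best_weights=True` rolls the model back to the best epoch. The following is a simplified sketch of that patience rule in plain Python, not Keras's actual implementation; the function name and the toy loss values are assumptions.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) using 0-based epochs.

    stop_epoch is None if the patience limit is never reached.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Toy validation losses: the best value is at epoch 1, then two epochs without improvement
stop, best = early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.95], patience=2)
print(stop, best)
```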
---
## 📈 STEP 11: Evaluate Model

In [None]:
# Evaluate
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)

print("="*60)
print("📊 MODEL PERFORMANCE")
print("="*60)
print(f"\n✅ Test Accuracy: {test_accuracy*100:.2f}%")
print(f"✅ Test Loss: {test_loss:.4f}\n")
print("="*60)

# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy
ax1.plot(history.history['accuracy'], label='Train', linewidth=2, color='blue')
ax1.plot(history.history['val_accuracy'], label='Validation', linewidth=2, color='orange')
ax1.set_title('Model Accuracy', fontsize=14, fontweight='bold')
ax1.set_xlabel('Epoch', fontsize=12)
ax1.set_ylabel('Accuracy', fontsize=12)
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Loss
ax2.plot(history.history['loss'], label='Train', linewidth=2, color='blue')
ax2.plot(history.history['val_loss'], label='Validation', linewidth=2, color='orange')
ax2.set_title('Model Loss', fontsize=14, fontweight='bold')
ax2.set_xlabel('Epoch', fontsize=12)
ax2.set_ylabel('Loss', fontsize=12)
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Save model
model.save('asl_model.h5')
print("\n✅ Model saved as 'asl_model.h5'")

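A single accuracy number can hide letters that are consistently confused with each other. As a hedged sketch, per-class accuracy can be computed from class indices; in the notebook these would come from `np.argmax(y_test, axis=1)` and `np.argmax(model.predict(X_test), axis=1)`, while the demo uses toy values. The helper name `per_class_accuracy` is an assumption.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, n_classes):
    """Fraction of correctly predicted samples per class index (NaN if class absent)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            accuracy[c] = np.mean(y_pred[mask] == c)
    return accuracy

# Toy class indices standing in for real test-set labels and predictions
true_idx = [0, 0, 1, 1, 1, 2]
pred_idx = [0, 1, 1, 1, 0, 2]
class_acc = per_class_accuracy(true_idx, pred_idx, 3)
print(class_acc)
```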
---
## 📸 STEP 12: Setup Webcam Functions

In [None]:
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode

def take_photo(filename='photo.jpg', quality=0.8):
    """Capture photo from webcam"""
    js = Javascript('''
        async function takePhoto(quality) {
            const div = document.createElement('div');
            const video = document.createElement('video');
            video.style.display = 'block';
            const stream = await navigator.mediaDevices.getUserMedia({video: true});

            document.body.appendChild(div);
            div.appendChild(video);
            video.srcObject = stream;
            await video.play();

            google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);
            await new Promise((resolve) => setTimeout(resolve, 1000));

            const canvas = document.createElement('canvas');
            canvas.width = video.videoWidth;
            canvas.height = video.videoHeight;
            canvas.getContext('2d').drawImage(video, 0, 0);
            stream.getVideoTracks()[0].stop();
            div.remove();
            return canvas.toDataURL('image/jpeg', quality);
        }
    ''')
    display(js)
    data = eval_js('takePhoto({})'.format(quality))
    binary = b64decode(data.split(',')[1])
    with open(filename, 'wb') as f:
        f.write(binary)
    return filename

def predict_sign(image_path):
    """Predict ASL sign from image"""
    landmarks = extract_landmarks(image_path)
    
    if landmarks is None:
        return None, None, None
    
    # Predict
    landmarks = landmarks.reshape(1, -1)
    prediction = model.predict(landmarks, verbose=0)
    
    # Get results
    idx = np.argmax(prediction)
    letter = label_encoder.inverse_transform([idx])[0]
    confidence = prediction[0][idx]
    
    return letter, confidence, prediction[0]

print("✅ Webcam functions ready!")

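The JavaScript side of `take_photo` returns the frame as a data URL, and the Python side splits off the header and base64-decodes the rest. The sketch below isolates that decoding step so it can be tried without a webcam; `decode_data_url` is a hypothetical helper, and the four demo bytes merely imitate the start of a JPEG file.

```python
import base64

def decode_data_url(data_url):
    """Split a 'data:<mime>;base64,<payload>' URL into (mime_type, raw_bytes)."""
    header, payload = data_url.split(",", 1)
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)

# Round trip with a few stand-in bytes instead of a real camera frame
fake_jpeg = b"\xff\xd8\xff\xe0"
url = "data:image/jpeg;base64," + base64.b64encode(fake_jpeg).decode("ascii")
mime, raw = decode_data_url(url)
print(mime, raw == fake_jpeg)
```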
---
## 🎯 STEP 13: REAL-TIME ASL RECOGNITION!
**Run this cell multiple times to test different signs**

In [None]:
print("="*60)
print("📸 CAPTURING FROM WEBCAM...")
print("="*60)
print("\n👋 Show your ASL sign now!\n")

# Capture photo
photo = take_photo('capture.jpg')

# Load and process image
image = cv2.imread(photo)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Draw hand landmarks
results = hands.process(image_rgb)
if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        mp_drawing.draw_landmarks(
            image, 
            hand_landmarks, 
            mp_hands.HAND_CONNECTIONS,
            mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2, circle_radius=3),
            mp_drawing.DrawingSpec(color=(255, 0, 0), thickness=2)
        )

# Predict sign
letter, confidence, all_predictions = predict_sign(photo)

if letter:
    # Add text to image
    cv2.putText(image, f"Sign: {letter}", (10, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 3)
    cv2.putText(image, f"{confidence*100:.1f}%", (10, 100),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    
    print("="*60)
    print("🎯 PREDICTION RESULTS")
    print("="*60)
    print(f"\n✅ Detected Sign: {letter}")
    print(f"✅ Confidence: {confidence*100:.2f}%\n")
    
    # Top 3 predictions
    top3_idx = np.argsort(all_predictions)[-3:][::-1]
    print("📊 Top 3 Predictions:")
    for i, idx in enumerate(top3_idx, 1):
        l = label_encoder.inverse_transform([idx])[0]
        c = all_predictions[idx] * 100
        print(f"   {i}. {l}: {c:.2f}%")
    
else:
    cv2.putText(image, "NO HAND DETECTED!", (10, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
    print("="*60)
    print("❌ NO HAND DETECTED")
    print("="*60)
    print("\n💡 Tips:")
    print("   - Ensure good lighting")
    print("   - Keep hand centered in frame")
    print("   - Show your hand clearly")

print("\n" + "="*60)
print("📷 CAPTURED IMAGE:")
print("="*60 + "\n")
cv2_imshow(image)

print("\n" + "="*60)
print("💡 Run this cell again to test another sign!")
print("="*60)

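The cell above always reports the most probable class, even when the model is unsure. One possible refinement, sketched here under assumptions, is to rank the classes and drop any below a confidence threshold. The name `top_k_predictions` and its `min_confidence` parameter are hypothetical; the toy probabilities stand in for `all_predictions`, and the names for `label_encoder.classes_`.

```python
import numpy as np

def top_k_predictions(probabilities, class_names, k=3, min_confidence=0.0):
    """Return up to k (class_name, probability) pairs, most probable first."""
    probabilities = np.asarray(probabilities, dtype=float)
    order = np.argsort(probabilities)[::-1][:k]  # indices sorted by descending probability
    return [(class_names[i], float(probabilities[i]))
            for i in order if probabilities[i] >= min_confidence]

# Toy softmax output over five classes
probs = [0.05, 0.70, 0.20, 0.03, 0.02]
names = ["A", "B", "C", "D", "E"]
ranked = top_k_predictions(probs, names, k=3)
confident = top_k_predictions(probs, names, k=3, min_confidence=0.5)
print(ranked)
print(confident)
```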
---
## 💾 STEP 14: Download Trained Model (Optional)

In [None]:
print("📥 Downloading model files...\n")

files.download('asl_model.h5')
files.download('label_encoder.pkl')

print("\n✅ Model files downloaded!")
print("\n📦 You now have:")
print("   - asl_model.h5 (trained model)")
print("   - label_encoder.pkl (label mappings)")
print("\n💡 You can use these files later without retraining!")

---
## 📝 Notes

### ✅ What This System Does:
- Recognizes **static ASL alphabet signs** (A-Z + Space, Delete, Nothing)
- Uses **MediaPipe** for hand landmark detection (21 points, 63 features)
- Trains a **1D CNN** on the landmark coordinates
- Predicts from **single webcam snapshots**, one capture per run of STEP 13
- Expected accuracy: **90-95%** (varies with `IMAGES_PER_CLASS` and with how many images MediaPipe detects a hand in)

### 🚀 To Improve Accuracy:
1. Increase `IMAGES_PER_CLASS` to 500-1000
2. Train for more epochs
3. Ensure good lighting during testing
4. Keep hand centered and steady

### 📹 For Dynamic Signs (Words):
Dynamic signs require:
- Video dataset with motion sequences
- LSTM or GRU architecture
- Temporal feature extraction

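For the temporal feature extraction mentioned above, a common starting point is to run the same 63-value landmark extraction on every video frame and group consecutive frames into fixed-length sequences of shape `(timesteps, features)`, which is the input layout recurrent layers such as LSTM or GRU expect. The sketch below shows only the windowing step; `make_windows`, the window length and the stride are assumptions, and random numbers stand in for real per-frame landmarks.

```python
import numpy as np

def make_windows(frame_features, window, stride=1):
    """Stack per-frame feature vectors into overlapping (window, n_features) sequences."""
    frames = np.asarray(frame_features, dtype=np.float32)
    if frames.ndim != 2 or len(frames) < window:
        raise ValueError("need a 2D array with at least `window` frames")
    starts = range(0, len(frames) - window + 1, stride)
    return np.stack([frames[s:s + window] for s in starts])

# 10 frames of 63 landmark features each (random stand-ins for real video frames)
video = np.random.default_rng(0).random((10, 63))
sequences = make_windows(video, window=4, stride=2)
print(sequences.shape)
```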

---
## 🎉 PROJECT COMPLETE!
You now have a working ASL recognition system!