# Practical Tensor Operations for Deep Learning

This notebook demonstrates **real-world applications** of tensor operations in deep learning.

## Topics Covered

1. **Boolean Masking** - Attention masks, padding, data filtering
2. **Data Preprocessing** - Normalization, standardization
3. **Feature Engineering** - Polynomial features, time features
4. **Batch Operations** - Variable-length sequences, efficient batching

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.append('..')

from pytorch_lab.tensor_ops import (
    BooleanMasking,
    create_attention_mask,
    DataPreprocessor,
    normalize_features,
    FeatureEngineer,
    create_polynomial_features,
    BatchOperations,
)

# Fix the random seed so the outputs below are reproducible
torch.manual_seed(42)

## Part 1: Boolean Masking for Transformers

**Problem**: Sentences have different lengths, but we need fixed-size batches.

**Solution**: Pad sequences and use attention masks.

In [None]:
# Batch of 3 sentences with different lengths
sentence_lengths = torch.tensor([5, 3, 7])
max_length = 10

mask = create_attention_mask(sentence_lengths, max_length)
print("Attention Mask:")
print(mask)
print("\nTrue = real token, False = padding")
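
`create_attention_mask` comes from `pytorch_lab`, and its implementation is not shown in this notebook. As a rough sketch of how such a mask could be built and applied, assuming it simply compares token positions against each sequence length (the name `attention_mask_sketch` is hypothetical):

```python
import torch

def attention_mask_sketch(lengths: torch.Tensor, max_length: int) -> torch.Tensor:
    """Hypothetical stand-in: True for real tokens, False for padding."""
    positions = torch.arange(max_length).unsqueeze(0)  # shape (1, max_length)
    return positions < lengths.unsqueeze(1)            # shape (batch, max_length)

lengths = torch.tensor([5, 3, 7])
mask = attention_mask_sketch(lengths, max_length=10)

# Typical use: hide padding from softmax by setting its scores to -inf
scores = torch.zeros(3, 10)  # placeholder attention scores
weights = scores.masked_fill(~mask, float("-inf")).softmax(dim=-1)
print(weights[1])
```

Each row of `weights` sums to 1 over its real tokens, and padded positions receive zero weight.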

## Part 2: Filtering Outliers

**Problem**: Sensor data contains errors and anomalies.

**Solution**: Use boolean masking to filter outliers.

In [None]:
# Simulate sensor data with outliers
normal_data = torch.randn(95, 3) * 10 + 50
outliers = torch.tensor([[999.0, 50.0, 45.0], [48.0, -999.0, 52.0]])
data = torch.cat([normal_data, outliers], dim=0)

clean_data, mask = BooleanMasking.filter_outliers(data, n_std=3.0)
print(f"Original: {len(data)} samples")
print(f"Cleaned: {len(clean_data)} samples")
print(f"Removed: {(~mask).sum().item()} outliers")

## Part 3: Data Normalization

**Problem**: Features have very different scales (e.g. age: 0-100, income: 0-1M), so large-valued features can dominate distances and gradients.

**Solution**: Normalize or standardize features.

In [None]:
# Features on different scales
age = torch.rand(1000) * 60 + 20
income = torch.rand(1000) * 180000 + 20000
data = torch.stack([age, income], dim=1)

print("Original:")
print(f"Age: mean={age.mean():.1f}, std={age.std():.1f}")
print(f"Income: mean={income.mean():.1f}, std={income.std():.1f}")

# Normalize to [0, 1]
normalized = normalize_features(data, dim=0)
print("\nNormalized to [0, 1]:")
print(f"Range: [{normalized.min():.3f}, {normalized.max():.3f}]")
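
For reference, min-max normalization maps each feature to [0, 1] via `(x - min) / (max - min)`. The sketch below assumes `normalize_features` does something along these lines; the name `min_max_normalize` and the `eps` guard are illustrative choices, not the library's API.

```python
import torch

def min_max_normalize(x: torch.Tensor, dim: int = 0, eps: float = 1e-8) -> torch.Tensor:
    """Hypothetical min-max scaler: rescale each feature to [0, 1] along `dim`."""
    x_min = x.amin(dim=dim, keepdim=True)
    x_max = x.amax(dim=dim, keepdim=True)
    # clamp_min guards against constant features (max == min)
    return (x - x_min) / (x_max - x_min).clamp_min(eps)

# Columns: age, income
data = torch.tensor([[20.0, 20000.0],
                     [50.0, 110000.0],
                     [80.0, 200000.0]])
normalized = min_max_normalize(data, dim=0)
print(normalized)
```

After scaling, both columns span the same range, so neither dominates purely because of its units.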

## Part 4: Proper Train-Test Preprocessing

**Critical**: Fit preprocessing on training data only! Statistics computed on the full dataset leak test-set information into training.

In [None]:
data = torch.randn(1000, 5) * 10 + 50
train_data = data[:800]
test_data = data[800:]

# Fit on train only
preprocessor = DataPreprocessor()
preprocessor.fit_standardize(train_data, dim=0)

# Transform both
train_std = preprocessor.standardize(train_data)
test_std = preprocessor.standardize(test_data)

print("Train mean:", train_std.mean(dim=0).numpy())
print("Test mean:", test_std.mean(dim=0).numpy())
print("\n✓ Test mean is close to 0 but NOT exactly 0 - this is correct, it was scaled with train statistics!")
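
The fit-then-transform pattern can be sketched without the library. The class below is a hypothetical stand-in (it is not `DataPreprocessor`): it stores the mean and standard deviation seen during `fit` and reuses them for every later `transform`.

```python
import torch

class StandardizerSketch:
    """Hypothetical fit/transform helper; not the pytorch_lab class."""

    def fit(self, x: torch.Tensor, dim: int = 0) -> "StandardizerSketch":
        # Statistics come from the data passed here (the training set) only
        self.mean = x.mean(dim=dim, keepdim=True)
        self.std = x.std(dim=dim, keepdim=True).clamp_min(1e-8)
        return self

    def transform(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / self.std

torch.manual_seed(0)
data = torch.randn(1000, 5) * 10 + 50
train, test = data[:800], data[800:]

scaler = StandardizerSketch().fit(train)
train_z = scaler.transform(train)
test_z = scaler.transform(test)  # reuses the train mean and std
print(train_z.mean(dim=0), test_z.mean(dim=0))
```

The training columns come out with mean 0 and standard deviation 1 up to floating-point error, while the test columns are only approximately standardized.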

## Part 5: Polynomial Features

**Problem**: Linear models can't capture non-linear relationships.

**Solution**: Add polynomial features (x², x³, etc.).

In [None]:
x = torch.linspace(-3, 3, 100).unsqueeze(1)
y = 0.5 * x**2 + 0.3 * x + torch.randn(100, 1) * 0.5

x_poly = create_polynomial_features(x, degree=3)
print(f"Original features: {x.shape}")
print(f"With polynomial (degree=3): {x_poly.shape}")
print("Features: x, x², x³")
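
To illustrate why this helps, the sketch below builds the powers by hand and fits a plain least-squares model on them. It assumes the expansion is simply `[x, x², x³]` with no bias column; `polynomial_features_sketch` is a hypothetical name, and `create_polynomial_features` may order or augment the columns differently.

```python
import torch

def polynomial_features_sketch(x: torch.Tensor, degree: int) -> torch.Tensor:
    """Hypothetical expansion: concatenate x^1 ... x^degree along the feature axis."""
    return torch.cat([x ** d for d in range(1, degree + 1)], dim=1)

x = torch.linspace(-3, 3, 100).unsqueeze(1)
y = 0.5 * x**2 + 0.3 * x  # noise-free quadratic target

x_poly = polynomial_features_sketch(x, degree=3)  # shape (100, 3)

# A linear model on the expanded features can now represent the curve
coeffs = torch.linalg.lstsq(x_poly, y).solution.squeeze(1)
print(coeffs)  # coefficients for x, x², x³
```

The model stays linear in its parameters; only the inputs are non-linear, which is why ordinary least squares still applies.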

## Part 6: Handling Variable-Length Sequences

**Problem**: Text and time-series samples have different lengths.

**Solution**: Pad sequences for batching.

In [None]:
# Variable-length sequences
sequences = [
    torch.randn(5, 10),  # Length 5
    torch.randn(3, 10),  # Length 3
    torch.randn(7, 10),  # Length 7
]

padded, lengths = BatchOperations.pad_sequences(sequences)
print(f"Padded shape: {padded.shape}")
print(f"Original lengths: {lengths.tolist()}")
print("\n✓ All sequences now have same length for batching")
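
PyTorch ships a built-in utility for the same task. The sketch below uses `torch.nn.utils.rnn.pad_sequence` and tracks lengths separately; whether `BatchOperations.pad_sequences` wraps this function or returns the same layout is an assumption.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

sequences = [
    torch.randn(5, 10),  # length 5, 10 features per step
    torch.randn(3, 10),  # length 3
    torch.randn(7, 10),  # length 7
]

# Record the true lengths before padding so padding can be masked later
lengths = torch.tensor([seq.shape[0] for seq in sequences])

# batch_first=True gives (batch, max_len, features); shorter sequences get zeros
padded = pad_sequence(sequences, batch_first=True, padding_value=0.0)
print(padded.shape, lengths.tolist())
```

Keeping `lengths` alongside `padded` makes it possible to rebuild an attention mask for the batch.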

## Summary

### Boolean Masking Use Cases
- **Transformers**: Attention masks for padded sequences
- **Data Cleaning**: Filter outliers and invalid values
- **Feature Selection**: Select top-k important features
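
Top-k feature selection is not demonstrated in the cells above. A minimal sketch, assuming a per-feature importance score is already available (the `importance` values here are made up for illustration):

```python
import torch

features = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                         [5.0, 6.0, 7.0, 8.0]])
importance = torch.tensor([0.1, 0.7, 0.05, 0.4])  # hypothetical scores
k = 2

# Build a boolean mask that is True for the k highest-scoring features
top_idx = torch.topk(importance, k).indices
mask = torch.zeros_like(importance, dtype=torch.bool)
mask[top_idx] = True

selected = features[:, mask]  # keeps the original column order
print(selected)
```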

### Preprocessing Use Cases
- **Normalization**: Scale features to [0, 1] when a bounded input range is needed
- **Standardization**: Zero mean and unit variance, which helps gradient descent converge
- **Train-Test Split**: Always fit on the training set, then transform both sets

### Feature Engineering Use Cases
- **Polynomial Features**: Capture non-linear patterns
- **Time Features**: Cyclical encoding for periodic data
- **Interaction Features**: Capture feature relationships
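
Cyclical encoding is not shown in the cells above. A common formulation, sketched here with a hypothetical helper `cyclical_encode`, maps a periodic value onto the unit circle with sine and cosine so that the end of a cycle sits next to its start:

```python
import math
import torch

def cyclical_encode(values: torch.Tensor, period: float) -> torch.Tensor:
    """Hypothetical helper: encode periodic values as (sin, cos) pairs."""
    angle = 2 * math.pi * values / period
    return torch.stack([torch.sin(angle), torch.cos(angle)], dim=-1)

hours = torch.tensor([0.0, 6.0, 12.0, 23.0])
encoded = cyclical_encode(hours, period=24.0)

# Hour 23 ends up much closer to hour 0 than hour 12 does
dist_23_to_0 = torch.dist(encoded[3], encoded[0])
dist_12_to_0 = torch.dist(encoded[2], encoded[0])
print(dist_23_to_0.item(), dist_12_to_0.item())
```

With a raw integer hour, 23 and 0 would be the two most distant values even though they are adjacent in time.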

### Batch Operations Use Cases
- **Padding**: Handle variable-length sequences
- **Collation**: Custom batching for DataLoader
- **Stratified Split**: Maintain class balance
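
Custom collation is not demonstrated in the cells above. A minimal sketch, assuming each dataset item is a `(sequence, label)` pair, pads every batch on the fly via the `collate_fn` argument of `DataLoader`. The function `pad_collate` and the toy dataset are hypothetical.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def pad_collate(batch):
    """Hypothetical collate_fn: pad sequences in a batch to a common length."""
    sequences, labels = zip(*batch)
    lengths = torch.tensor([seq.shape[0] for seq in sequences])
    padded = pad_sequence(list(sequences), batch_first=True)
    return padded, lengths, torch.tensor(labels)

# Toy dataset: (sequence of shape (length, 10), integer label)
dataset = [
    (torch.randn(5, 10), 0),
    (torch.randn(3, 10), 1),
    (torch.randn(7, 10), 0),
]

loader = DataLoader(dataset, batch_size=3, collate_fn=pad_collate)
padded, lengths, labels = next(iter(loader))
print(padded.shape, lengths.tolist(), labels.tolist())
```

Padding per batch rather than per dataset keeps each batch only as long as its longest sequence.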