# Module 1 - Exercise 3: Tensor Mastery

## Learning Objectives
- Master advanced tensor indexing and slicing techniques
- Implement and understand broadcasting rules in detail
- Analyze memory layout and performance considerations
- Use einsum operations for complex tensor manipulations
- Implement custom tensor operations and functions
- Work with tensor storage, strides, and memory efficiency

## Prerequisites
- Completion of Exercise 1: Environment & Basics
- Completion of Exercise 2: Mathematical Implementation
- Understanding of tensor operations and memory concepts
- Basic knowledge of linear algebra and matrix operations

## Setup and Test Repository

First, let's clone the test repository and set up our environment for step-by-step validation.

In [None]:
# Clone the test repository
!git clone https://github.com/racousin/data_science_practice.git /tmp/tests 2>/dev/null || true
!cd /tmp/tests && pwd && ls -la tests/python_deep_learning/module1/

# Import the test module
import sys
sys.path.append('/tmp/tests')
print("Test repository setup complete!")

## Environment Setup

Import necessary libraries for advanced tensor operations and analysis.

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
import time
from typing import Tuple, List, Optional
import warnings

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Import test functions
from tests.python_deep_learning.module1.test_exercise3 import *

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## Section 1: Advanced Indexing Techniques

Master advanced indexing operations: fancy (integer-array) indexing, boolean masks, and conditional selection with torch.where.
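
When two integer index tensors appear in the same indexing call, PyTorch broadcasts them together and pairs them element-wise. It does not form a row-by-column grid. The sketch below uses a small made-up tensor (the names `x`, `rows`, and `cols` are illustrative only) to contrast the two behaviours:

```python
import torch

# Hypothetical small tensor of shape (6, 8, 2) so results are easy to inspect
x = torch.arange(6 * 8 * 2).reshape(6, 8, 2)
rows = torch.tensor([0, 2, 4])       # 3 indices into dim 0
cols = torch.tensor([1, 3, 5, 7])    # 4 indices into dim 1

# Index tensors in the same call are broadcast together and paired
# element-wise, so lengths 3 and 4 cannot be combined directly.
try:
    x[rows, cols]
except (IndexError, RuntimeError) as err:
    print("paired indexing failed:", err)

# Reshaping rows to (3, 1) broadcasts it against cols (4,) into a (3, 4)
# grid of (row, col) pairs: every requested row/column combination.
grid = x[rows[:, None], cols]

# Equivalent two-step version: select rows first, then columns
two_step = x[rows][:, cols]

print(grid.shape)                   # torch.Size([3, 4, 2])
print(torch.equal(grid, two_step))  # True
```

Broadcasting the index tensors into a grid selects everything in one gather, whereas the two-step version materializes the intermediate `x[rows]` first.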

In [None]:
# Create sample data for advanced indexing
data = torch.randn(100, 50, 25)
print(f"Sample data shape: {data.shape}")

# TODO: Implement fancy indexing to select specific elements
def fancy_indexing_selection(tensor: torch.Tensor) -> torch.Tensor:
    """
    Select elements from tensor using fancy indexing:
    - Select rows [0, 10, 20, 30] from first dimension
    - Select columns [5, 15, 25, 35, 45] from second dimension
    - Select all elements from third dimension
    
    Args:
        tensor: Input tensor of shape (100, 50, 25)
    
    Returns:
        Selected tensor of shape (4, 5, 25)
    """
    # TODO: Implement fancy indexing
    row_indices = None
    col_indices = None
    
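    # Use advanced indexing to select elements. Note: tensor[rows, cols] pairs the
    # two index lists element-wise, so it cannot produce a (4, 5) row x column grid.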
    selected = None
    
    return selected

# TODO: Implement boolean masking with multiple conditions
def complex_boolean_mask(tensor: torch.Tensor, threshold1: float, threshold2: float) -> torch.Tensor:
    """
    Apply complex boolean masking:
    - Find elements where absolute value > threshold1 AND value < threshold2
    - Return the masked tensor with only these elements
    
    Args:
        tensor: Input tensor
        threshold1: First threshold for absolute value
        threshold2: Second threshold for raw value
    
    Returns:
        Masked tensor (1D)
    """
    # TODO: Create complex boolean mask
    mask = None  # abs(tensor) > threshold1 AND tensor < threshold2 (combine with &, not `and`)
    
    # TODO: Apply mask
    masked_tensor = None
    
    return masked_tensor

# TODO: Implement conditional indexing with where
def conditional_replacement(tensor: torch.Tensor, condition_value: float, replacement: float) -> torch.Tensor:
    """
    Use torch.where to conditionally replace values:
    - Replace all values > condition_value with replacement
    - Keep other values unchanged
    
    Args:
        tensor: Input tensor
        condition_value: Threshold for replacement
        replacement: Value to use for replacement
    
    Returns:
        Tensor with conditional replacements
    """
    # TODO: Use torch.where for conditional replacement
    result = None
    
    return result

# Test the implementations
selected = fancy_indexing_selection(data)
print(f"Fancy indexing result shape: {selected.shape if selected is not None else 'None'}")

masked = complex_boolean_mask(data, 0.5, 1.0)
print(f"Boolean mask result size: {masked.size(0) if masked is not None else 'None'}")

replaced = conditional_replacement(data[:10, :10, 0], 0.0, -999.0)
print(f"Conditional replacement result shape: {replaced.shape if replaced is not None else 'None'}")
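
Boolean masks and torch.where follow a few syntax rules that are easy to trip over. A minimal sketch on a hand-written 2x3 tensor (values chosen only for illustration):

```python
import torch

t = torch.tensor([[-2.0, -0.3, 0.8],
                  [ 1.5,  0.6, -0.9]])

# Combine element-wise conditions with & (and), | (or), ~ (not); Python's
# `and`/`or` do not work element-wise on tensors. The parentheses are required
# because & binds more tightly than the comparison operators.
mask = (t.abs() > 0.5) & (t < 1.0)

# Boolean-mask indexing returns a new 1-D tensor of the selected values,
# in row-major order
selected = t[mask]
print(selected)   # tensor([-2.0000,  0.8000,  0.6000, -0.9000])

# torch.where(cond, a, b) takes a where cond is True and b elsewhere;
# a 0-dim tensor broadcasts against t
replaced = torch.where(t > 0.0, torch.tensor(-999.0), t)
print(replaced)
```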

In [None]:
# Test your advanced indexing implementations
try:
    test_advanced_indexing(locals())
    print("✅ Section 1: Advanced Indexing - All tests passed!")
except Exception as e:
    print(f"❌ Section 1: Advanced Indexing - Tests failed: {e}")
    print("Please complete the advanced indexing implementations above before proceeding.")

## Section 2: Broadcasting Rules and Implementation

Work through PyTorch's broadcasting rules by hand, then apply them to matrix-vector, 3D-with-2D, and singleton-dimension cases.
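
PyTorch exposes its broadcasting rules directly through `torch.broadcast_shapes`, which returns the broadcast shape or raises a `RuntimeError` when the shapes are incompatible. A small wrapper (the name `broadcast_result` is hypothetical) that returns the same `(is_compatible, result_shape)` pair as the exercise below could serve as an answer key:

```python
import torch

def broadcast_result(shape1, shape2):
    """Hypothetical reference helper: (is_compatible, result_shape) computed
    with PyTorch's own shape logic, for checking a hand-written version."""
    try:
        return True, tuple(torch.broadcast_shapes(shape1, shape2))
    except RuntimeError:
        # Raised when some trailing-aligned pair of dims differs and neither is 1
        return False, ()

for s1, s2 in [((3, 4), (4,)), ((1, 5, 1), (3, 1, 4)), ((3, 4), (5, 4))]:
    print(s1, s2, "->", broadcast_result(s1, s2))
```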

In [None]:
# TODO: Implement broadcasting rule checker
def check_broadcasting_compatibility(shape1: Tuple[int, ...], shape2: Tuple[int, ...]) -> Tuple[bool, Tuple[int, ...]]:
    """
    Check if two tensor shapes are compatible for broadcasting.
    
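    Broadcasting rules:
    1. Compare the shapes starting from the rightmost (trailing) dimension
    2. Dimensions are compatible if they are equal, one of them is 1, or one is missing
    3. Missing (leading) dimensions are treated as 1
    4. Each result dimension is the larger of the two compatible dimensions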
    
    Args:
        shape1: First tensor shape
        shape2: Second tensor shape
    
    Returns:
        (is_compatible, result_shape)
    """
    # TODO: Implement broadcasting compatibility check
    # Reverse the shapes to start from rightmost dimension
    shape1_rev = None
    shape2_rev = None
    
    # Make shapes same length by padding with 1s
    max_len = max(len(shape1_rev), len(shape2_rev))
    shape1_padded = None
    shape2_padded = None
    
    # Check compatibility and compute result shape
    result_shape_rev = []
    is_compatible = True
    
    # TODO: Implement the broadcasting logic
    for i in range(max_len):
        dim1 = None
        dim2 = None
        
        # Check compatibility
        if dim1 == dim2 or dim1 == 1 or dim2 == 1:
            result_shape_rev.append(max(dim1, dim2))
        else:
            is_compatible = False
            break
    
    result_shape = tuple(reversed(result_shape_rev)) if is_compatible else ()
    
    return is_compatible, result_shape

# TODO: Implement manual broadcasting
def manual_broadcast(tensor1: torch.Tensor, tensor2: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Manually broadcast two tensors to a common shape instead of relying on the
    implicit broadcasting performed by arithmetic operators.
    
    Args:
        tensor1: First tensor
        tensor2: Second tensor
    
    Returns:
        (broadcasted_tensor1, broadcasted_tensor2)
    """
    # Check compatibility first
    compatible, target_shape = check_broadcasting_compatibility(tensor1.shape, tensor2.shape)
    
    if not compatible:
        raise ValueError(f"Tensors with shapes {tensor1.shape} and {tensor2.shape} are not broadcastable")
    
    # TODO: Manually expand tensors to target shape
    # Use unsqueeze and expand operations
    broadcasted1 = tensor1
    broadcasted2 = tensor2
    
    # Add dimensions and expand to match target shape
    # Implement the manual broadcasting logic here
    
    return broadcasted1, broadcasted2

# TODO: Implement advanced broadcasting scenarios
def advanced_broadcasting_operations() -> dict:
    """
    Demonstrate various broadcasting scenarios and return results.
    
    Returns:
        Dictionary with operation results
    """
    results = {}
    
    # Scenario 1: Matrix-vector operations
    matrix = torch.randn(100, 50)
    vector = torch.randn(50)
    
    # TODO: Add vector to each row of matrix
    results['matrix_vector_add'] = None
    
    # Scenario 2: 3D tensor with 2D tensor
    tensor_3d = torch.randn(10, 20, 30)
    tensor_2d = torch.randn(20, 1)
    
    # TODO: Multiply 3D tensor with 2D tensor
    results['3d_2d_multiply'] = None
    
    # Scenario 3: Broadcasting with singleton dimensions
    a = torch.randn(1, 8, 1, 16)
    b = torch.randn(7, 1, 5, 1)
    
    # TODO: Add tensors with singleton dimensions
    results['singleton_broadcast'] = None
    
    return results

# Test broadcasting implementations
test_shapes = [
    ((3, 4), (4,)),
    ((2, 1, 3), (3,)),
    ((1, 5, 1), (3, 1, 4)),
    ((3, 4), (2, 3, 4)),
    ((3, 4), (5, 4)),  # This should be incompatible
]

print("Broadcasting compatibility tests:")
for shape1, shape2 in test_shapes:
    compatible, result = check_broadcasting_compatibility(shape1, shape2)
    print(f"{shape1} + {shape2} -> Compatible: {compatible}, Result: {result}")

# Test advanced operations
broadcast_results = advanced_broadcasting_operations()
print(f"\nAdvanced broadcasting results:")
for key, value in broadcast_results.items():
    if value is not None:
        print(f"{key}: shape {value.shape}")
    else:
        print(f"{key}: Not implemented")
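
The `manual_broadcast` TODO suggests `unsqueeze` and `expand`. A minimal sketch of that idea, assuming a hypothetical helper `expand_to`: prepend size-1 dimensions until the ranks match, then let `expand` stretch the size-1 dimensions. `expand` returns a view with stride 0 along each stretched dimension, so nothing is copied:

```python
import torch

def expand_to(t: torch.Tensor, target_shape) -> torch.Tensor:
    """Hypothetical helper: right-align t's dims with target_shape by
    prepending size-1 dims, then expand the size-1 dims (no data copy)."""
    missing = len(target_shape) - t.dim()
    t = t.reshape((1,) * missing + tuple(t.shape))
    return t.expand(*target_shape)

v = torch.tensor([1.0, 2.0, 3.0])   # shape (3,)
m = torch.zeros(4, 3)               # shape (4, 3)
target = torch.broadcast_shapes(v.shape, m.shape)

v_b = expand_to(v, target)
print(v_b.shape, v_b.stride())      # torch.Size([4, 3]) (0, 1)
print(v_b.data_ptr() == v.data_ptr())  # True: still v's storage
```

Because many positions of an expanded view alias the same memory, avoid writing into it in place; call `.clone()` first if a writable tensor is needed.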

In [None]:
# Test your broadcasting implementations
try:
    test_broadcasting_rules(locals())
    print("✅ Section 2: Broadcasting Rules - All tests passed!")
except Exception as e:
    print(f"❌ Section 2: Broadcasting Rules - Tests failed: {e}")
    print("Please complete the broadcasting implementations above before proceeding.")

## Section 3: Memory Layout and Performance Analysis

Analyze tensor memory layout and strides, and measure how views, copies, and in-place operations affect performance.
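
Strides record how many storage elements to skip to move one step along each dimension. A short sketch on a made-up (2, 3, 4) integer tensor shows the row-major strides, the byte count, and how a transpose produces a non-contiguous view over the same storage:

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)     # contiguous, row-major, int64
print(x.stride())                         # (12, 4, 1)
print(x.element_size() * x.nelement())    # 192 bytes: 24 elements x 8 bytes

xt = x.transpose(0, 2)                    # shape (4, 3, 2), no data copied
print(xt.stride(), xt.is_contiguous())    # (1, 4, 12) False
print(xt.data_ptr() == x.data_ptr())      # True: both read the same storage

# Element [i, j, k] sits at storage offset i*stride[0] + j*stride[1] + k*stride[2]
# (plus storage_offset(), which is 0 here)
i, j, k = 3, 1, 0
offset = sum(idx * s for idx, s in zip((i, j, k), xt.stride()))
print(xt[i, j, k].item(), x.flatten()[offset].item())   # 7 7
```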

In [None]:
# TODO: Implement memory layout analyzer
def analyze_memory_layout(tensor: torch.Tensor) -> dict:
    """
    Analyze the memory layout of a tensor.
    
    Args:
        tensor: Input tensor to analyze
    
    Returns:
        Dictionary with memory layout information
    """
    analysis = {}
    
    # TODO: Extract memory layout information
    analysis['shape'] = None
    analysis['strides'] = None
    analysis['is_contiguous'] = None
    analysis['storage_size'] = None
    analysis['element_size'] = None
    analysis['memory_usage_bytes'] = None
    analysis['data_ptr'] = None
    
    return analysis

# TODO: Implement stride calculation
def calculate_manual_strides(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """
    Manually calculate strides for a given shape (row-major order).
    
    Args:
        shape: Tensor shape
    
    Returns:
        Calculated strides
    """
    # TODO: Calculate strides from shape
    strides = []
    
    # Strides are calculated from right to left
    # stride[i] = product of all dimensions to the right of dimension i
    
    return tuple(strides)

# TODO: Implement memory-efficient operations
def memory_efficient_operations() -> dict:
    """
    Demonstrate memory-efficient tensor operations.
    
    Returns:
        Dictionary with operation timings and memory usage
    """
    results = {}
    
    # Create large tensors for testing
    size = (1000, 1000)
    tensor1 = torch.randn(size)
    tensor2 = torch.randn(size)
    
    # TODO: Compare in-place vs regular operations
    # Time regular addition (perf_counter has higher resolution than time.time)
    start_time = time.perf_counter()
    regular_result = None  # tensor1 + tensor2
    regular_time = time.perf_counter() - start_time
    
    # Time in-place addition
    tensor1_copy = tensor1.clone()
    start_time = time.perf_counter()
    # TODO: Implement in-place addition on tensor1_copy
    inplace_time = time.perf_counter() - start_time
    
    results['regular_time'] = regular_time
    results['inplace_time'] = inplace_time
    
    # TODO: Compare view vs copy operations
    original = torch.randn(100, 100, 100)
    
    # Time view operation
    start_time = time.perf_counter()
    view_result = None  # Reshape using view
    view_time = time.perf_counter() - start_time
    
    # Time copy operation
    # Note: original is already contiguous, so original.contiguous() returns it
    # unchanged without copying; force a real copy instead (e.g. clone() + view)
    start_time = time.perf_counter()
    copy_result = None  # Reshape after an explicit copy
    copy_time = time.perf_counter() - start_time
    
    results['view_time'] = view_time
    results['copy_time'] = copy_time
    
    return results

# TODO: Implement contiguity converter
def make_contiguous_if_needed(tensor: torch.Tensor) -> torch.Tensor:
    """
    Make tensor contiguous if it's not already contiguous.
    
    Args:
        tensor: Input tensor
    
    Returns:
        Contiguous tensor
    """
    # Check contiguity and copy into a contiguous layout only when needed
    # (tensor.contiguous() on its own already returns self for contiguous tensors)
    if tensor.is_contiguous():
        return tensor
    else:
        return tensor.contiguous()

# Test memory layout analysis
test_tensors = [
    torch.randn(10, 20),
    torch.randn(10, 20).t(),  # Transposed (non-contiguous)
    torch.randn(2, 3, 4, 5),
    torch.randn(100)[::2]     # Strided (non-contiguous)
]

print("Memory layout analysis:")
for i, tensor in enumerate(test_tensors):
    analysis = analyze_memory_layout(tensor)
    print(f"\nTensor {i+1}:")
    for key, value in analysis.items():
        if value is not None:
            print(f"  {key}: {value}")

# Test stride calculation
test_shapes = [(10, 20), (2, 3, 4), (5, 1, 3, 2)]
print("\nManual stride calculation:")
for shape in test_shapes:
    calculated = calculate_manual_strides(shape)
    actual = torch.randn(shape).stride()
    print(f"Shape {shape}: Calculated {calculated}, Actual {actual}")

# Test memory-efficient operations
print("\nMemory-efficient operations:")
efficiency_results = memory_efficient_operations()
for key, value in efficiency_results.items():
    if value is not None:
        print(f"{key}: {value:.6f} seconds")
    else:
        print(f"{key}: Not implemented")
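
A single call timed with time.time() is dominated by timer resolution and noise, so view timings in particular can come out as 0.0. The sketch below (the helper `time_op` is hypothetical) averages many calls with time.perf_counter. It also shows why view and copy differ: a view shares the original storage, while `.view()` on a non-contiguous tensor fails and `.reshape()` has to copy:

```python
import time
import torch

def time_op(fn, repeats=100):
    """Hypothetical helper: average seconds per call, using the
    high-resolution perf_counter and many repeats to smooth out noise."""
    fn()  # warm-up call
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

original = torch.randn(100, 100, 100)

viewed = original.view(100, -1)            # metadata only: shares storage
copied = original.clone().view(100, -1)    # allocates and copies 1M floats
print(viewed.data_ptr() == original.data_ptr())   # True
print(copied.data_ptr() == original.data_ptr())   # False

# .view() needs compatible strides; a transposed tensor cannot be flattened as a view
t = original.transpose(0, 1)
try:
    t.view(-1)
    view_ok = True
except RuntimeError:
    view_ok = False
flat = t.reshape(-1)   # reshape falls back to a copy when a view is impossible
print(view_ok, flat.is_contiguous())       # False True

print(f"view:  {time_op(lambda: original.view(100, -1)):.2e} s")
print(f"clone: {time_op(lambda: original.clone().view(100, -1)):.2e} s")
```

Absolute timings depend on hardware and PyTorch version; compare the ratio between the two numbers rather than the values themselves.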

In [None]:
# Test your memory layout implementations
try:
    test_memory_analysis(locals())
    print("✅ Section 3: Memory Layout Analysis - All tests passed!")
except Exception as e:
    print(f"❌ Section 3: Memory Layout Analysis - Tests failed: {e}")
    print("Please complete the memory analysis implementations above before proceeding.")

## Section 4: Einsum Operations

Master Einstein summation notation for complex tensor operations.
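
In an einsum subscript string, indices shared between inputs are multiplied together, and any index missing from the output (right of `->`) is summed over. One way to build intuition is to check each string against the conventional operation it replaces; a minimal sketch on small random tensors:

```python
import torch

A = torch.randn(4, 5)
B = torch.randn(5, 6)
S = torch.randn(5, 5)
u, w = torch.randn(4), torch.randn(6)

# Each entry pairs an einsum expression with its conventional equivalent
checks = {
    "matmul":    (torch.einsum("ij,jk->ik", A, B), A @ B),
    "transpose": (torch.einsum("ij->ji", A), A.T),
    "trace":     (torch.einsum("ii->", S), torch.trace(S)),
    "diagonal":  (torch.einsum("ii->i", S), torch.diagonal(S)),
    "outer":     (torch.einsum("i,j->ij", u, w), torch.outer(u, w)),
    "row_sums":  (torch.einsum("ij->i", A), A.sum(dim=1)),
}
for name, (via_einsum, reference) in checks.items():
    print(name, torch.allclose(via_einsum, reference, atol=1e-5))
```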

In [None]:
# TODO: Implement einsum operations for various scenarios
def einsum_operations() -> dict:
    """
    Implement various operations using einsum notation.
    
    Returns:
        Dictionary with einsum operation results
    """
    results = {}
    
    # Sample tensors for operations
    A = torch.randn(10, 20)
    B = torch.randn(20, 30)
    C = torch.randn(10, 20, 30)
    D = torch.randn(10, 30)
    v = torch.randn(20)
    
    # TODO: Matrix multiplication using einsum
    results['matrix_mult'] = None  # torch.einsum('ij,jk->ik', A, B)
    
    # TODO: Matrix-vector multiplication
    results['matvec_mult'] = None  # torch.einsum('ij,j->i', A, v)
    
    # TODO: Batch matrix multiplication
    batch_A = torch.randn(5, 10, 20)
    batch_B = torch.randn(5, 20, 30)
    results['batch_mult'] = None  # torch.einsum('bij,bjk->bik', batch_A, batch_B)
    
    # TODO: Element-wise multiplication and sum
    results['hadamard_sum'] = None  # torch.einsum('ij,ij->', A, A)  (sum of element-wise products)
    
    # TODO: Trace of a matrix
    square_matrix = torch.randn(15, 15)
    results['trace'] = None  # torch.einsum('ii->', square_matrix)
    
    # TODO: Transpose using einsum
    results['transpose'] = None  # torch.einsum('ij->ji', A)
    
    # TODO: Sum along specific axis
    results['sum_axis0'] = None  # torch.einsum('ijk->jk', C)
    results['sum_axis1'] = None  # torch.einsum('ijk->ik', C)
    
    # TODO: Diagonal extraction
    results['diagonal'] = None  # torch.einsum('ii->i', square_matrix)
    
    # TODO: Outer product
    u = torch.randn(10)
    v = torch.randn(15)
    results['outer_product'] = None  # torch.einsum('i,j->ij', u, v)
    
    return results

# TODO: Implement complex einsum scenarios
def complex_einsum_operations() -> dict:
    """
    Implement complex tensor operations using einsum.
    
    Returns:
        Dictionary with complex operation results
    """
    results = {}
    
    # TODO: Bilinear (quadratic) form
    # Given A (batch, n) and B (n, n), compute a_b^T @ B @ a_b for each batch row a_b
    A = torch.randn(8, 10)
    B = torch.randn(10, 10)
    results['bilinear'] = None  # torch.einsum('bi,ij,bj->b', A, B, A)
    
    # TODO: Attention mechanism computation
    # Query: (batch, seq_len, d_model)
    # Key: (batch, seq_len, d_model)
    # Compute raw attention scores (before scaling and softmax)
    Q = torch.randn(4, 20, 64)
    K = torch.randn(4, 20, 64)
    results['attention_weights'] = None  # torch.einsum('bqd,bkd->bqk', Q, K)
    
    # TODO: Tensor contraction
    # Contract over the two shared dimensions (sizes 4 and 6)
    T1 = torch.randn(3, 4, 5, 6)
    T2 = torch.randn(4, 6, 7, 8)
    results['tensor_contract'] = None  # torch.einsum('abcd,bdef->acef', T1, T2)
    
    return results

# TODO: Compare einsum vs traditional operations
def compare_einsum_performance() -> dict:
    """
    Compare performance of einsum vs traditional tensor operations.
    
    Returns:
        Dictionary with timing comparisons
    """
    results = {}
    
    # Set up large tensors
    A = torch.randn(500, 600)
    B = torch.randn(600, 700)
    
    # TODO: Time traditional matrix multiplication
    start_time = time.perf_counter()
    traditional_result = None  # torch.mm(A, B)
    traditional_time = time.perf_counter() - start_time
    
    # TODO: Time einsum matrix multiplication
    start_time = time.perf_counter()
    einsum_result = None  # torch.einsum('ij,jk->ik', A, B)
    einsum_time = time.perf_counter() - start_time
    
    results['traditional_time'] = traditional_time
    results['einsum_time'] = einsum_time
    # A ratio above 1 means the einsum version was faster in this run
    results['speedup_ratio'] = traditional_time / einsum_time if einsum_time > 0 else float('inf')
    
    return results

# Test einsum operations
print("Basic einsum operations:")
basic_results = einsum_operations()
for operation, result in basic_results.items():
    if result is not None:
        print(f"{operation}: shape {result.shape if hasattr(result, 'shape') else type(result)}")
    else:
        print(f"{operation}: Not implemented")

print("\nComplex einsum operations:")
complex_results = complex_einsum_operations()
for operation, result in complex_results.items():
    if result is not None:
        print(f"{operation}: shape {result.shape if hasattr(result, 'shape') else type(result)}")
    else:
        print(f"{operation}: Not implemented")

print("\nPerformance comparison:")
perf_results = compare_einsum_performance()
for metric, value in perf_results.items():
    if value is not None:
        print(f"{metric}: {value:.6f}{'s' if 'time' in metric else ''}")
    else:
        print(f"{metric}: Not implemented")
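
The complex cases can be checked the same way. For the tensor contraction, every summed dimension must carry the same letter in both operands: with T2 of shape (4, 6, 7, 8), its first two axes match T1's `b` (size 4) and `d` (size 6), so the operand is written `bdef`, and `torch.tensordot` over those axis pairs should agree. A sketch, with tolerances chosen loosely for float32 round-off:

```python
import torch

# Quadratic form x_b^T M x_b for each batch row b
X = torch.randn(8, 10)
M = torch.randn(10, 10)
quad = torch.einsum("bi,ij,bj->b", X, M, X)
print(torch.allclose(quad, ((X @ M) * X).sum(dim=1), atol=1e-4))

# Raw attention scores: dot product of every query with every key
Q = torch.randn(4, 20, 64)
K = torch.randn(4, 20, 64)
scores = torch.einsum("bqd,bkd->bqk", Q, K)
print(torch.allclose(scores, Q @ K.transpose(-2, -1), atol=1e-4))

# Contract T1's axes b (4) and d (6) against T2's first two axes
T1 = torch.randn(3, 4, 5, 6)
T2 = torch.randn(4, 6, 7, 8)
contracted = torch.einsum("abcd,bdef->acef", T1, T2)
reference = torch.tensordot(T1, T2, dims=([1, 3], [0, 1]))
print(contracted.shape)   # torch.Size([3, 5, 7, 8])
print(torch.allclose(contracted, reference, atol=1e-4))
```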

In [None]:
# Test your einsum implementations
try:
    test_einsum_operations(locals())
    print("✅ Section 4: Einsum Operations - All tests passed!")
except Exception as e:
    print(f"❌ Section 4: Einsum Operations - Tests failed: {e}")
    print("Please complete the einsum implementations above before proceeding.")

## Section 5: Custom Tensor Functions

Implement pooling, convolution, layer normalization, and batched linear algebra from basic tensor operations.
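
Max pooling can be expressed with `Tensor.unfold(dimension, size, step)`, which appends a trailing axis holding each sliding window as a view. A minimal sketch, assuming no padding or dilation, checked against `torch.nn.functional.max_pool2d` (the function name `max_pool2d_unfold` is hypothetical):

```python
import torch
import torch.nn.functional as F

def max_pool2d_unfold(x: torch.Tensor, kernel_size: int, stride: int) -> torch.Tensor:
    """Hypothetical sketch of 2D max pooling (no padding/dilation).

    Two unfolds turn (B, C, H, W) into (B, C, out_h, out_w, k, k) without
    copying data; the max is then taken over the two window axes.
    """
    windows = x.unfold(2, kernel_size, stride).unfold(3, kernel_size, stride)
    return windows.amax(dim=(-2, -1))

x = torch.randn(2, 3, 8, 8)
pooled = max_pool2d_unfold(x, kernel_size=2, stride=2)
print(pooled.shape)   # torch.Size([2, 3, 4, 4])
print(torch.equal(pooled, F.max_pool2d(x, kernel_size=2, stride=2)))  # True
```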

In [None]:
# TODO: Implement custom pooling operation
def custom_max_pool2d(input_tensor: torch.Tensor, kernel_size: int, stride: Optional[int] = None) -> torch.Tensor:
    """
    Implement 2D max pooling from scratch using tensor operations.
    
    Args:
        input_tensor: Input tensor of shape (batch, channels, height, width)
        kernel_size: Size of pooling window
        stride: Stride of pooling operation (defaults to kernel_size)
    
    Returns:
        Pooled tensor
    """
    if stride is None:
        stride = kernel_size
    
    batch, channels, height, width = input_tensor.shape
    
    # Calculate output dimensions
    out_height = (height - kernel_size) // stride + 1
    out_width = (width - kernel_size) // stride + 1
    
    # TODO: Implement max pooling using unfold and max operations
    # Use torch.nn.functional.unfold or manual indexing
    output = None
    
    return output

# TODO: Implement custom convolution operation
def custom_conv2d(input_tensor: torch.Tensor, kernel: torch.Tensor, stride: int = 1, padding: int = 0) -> torch.Tensor:
    """
    Implement 2D convolution from scratch.
    
    Args:
        input_tensor: Input tensor of shape (batch, in_channels, height, width)
        kernel: Convolution kernel of shape (out_channels, in_channels, kernel_height, kernel_width)
        stride: Convolution stride
        padding: Zero padding
    
    Returns:
        Convolved tensor
    """
    # TODO: Add zero padding if needed
    # (torch.nn.functional.pad pads the last two dims when given a 4-tuple)
    if padding > 0:
        input_tensor = None  # Add padding
    
    batch, in_channels, height, width = input_tensor.shape
    out_channels, _, kernel_h, kernel_w = kernel.shape
    
    # Calculate output dimensions
    out_height = (height - kernel_h) // stride + 1
    out_width = (width - kernel_w) // stride + 1
    
    # TODO: Implement convolution using unfold and matrix multiplication
    output = None
    
    return output

# TODO: Implement custom normalization
def custom_layer_norm(input_tensor: torch.Tensor, normalized_shape: List[int], eps: float = 1e-5) -> torch.Tensor:
    """
    Implement layer normalization from scratch.
    
    Args:
        input_tensor: Input tensor
        normalized_shape: Shape over which to normalize
        eps: Small value for numerical stability
    
    Returns:
        Normalized tensor
    """
    # TODO: Implement layer normalization
    # Calculate mean and variance over the normalized dimensions
    # Apply normalization: (x - mean) / sqrt(var + eps)
    
    # Determine which dimensions to normalize over
    dims_to_normalize = None
    
    # Calculate statistics
    mean = None
    var = None
    
    # Apply normalization
    normalized = None
    
    return normalized

# TODO: Implement efficient batched operations
def batched_matrix_operations(matrices: torch.Tensor, vectors: torch.Tensor) -> dict:
    """
    Implement efficient batched matrix operations.
    
    Args:
        matrices: Batch of matrices, shape (batch, n, m)
        vectors: Batch of vectors, shape (batch, m)
    
    Returns:
        Dictionary with operation results
    """
    results = {}
    
    # TODO: Batched matrix-vector multiplication
    results['matvec'] = None  # torch.bmm(matrices, vectors.unsqueeze(-1)).squeeze(-1)
    
    # TODO: Batched matrix inverse (for square matrices)
    if matrices.shape[1] == matrices.shape[2]:  # Square matrices
        results['inverse'] = None  # torch.inverse(matrices)
    
    # TODO: Batched determinant
    if matrices.shape[1] == matrices.shape[2]:  # Square matrices
        results['determinant'] = None  # torch.det(matrices)
    
    # TODO: Batched eigenvalues (complex-valued in general)
    if matrices.shape[1] == matrices.shape[2]:  # Square matrices
        try:
            results['eigenvalues'] = None  # torch.linalg.eigvals(matrices)
        except RuntimeError:
            results['eigenvalues'] = None
    
    return results

# Test custom functions
print("Testing custom tensor functions:")

# Test custom max pooling
test_input = torch.randn(2, 3, 8, 8)
pooled = custom_max_pool2d(test_input, kernel_size=2, stride=2)
print(f"Max pooling: {test_input.shape} -> {pooled.shape if pooled is not None else 'None'}")

# Test custom convolution
test_kernel = torch.randn(16, 3, 3, 3)
convolved = custom_conv2d(test_input, test_kernel, stride=1, padding=1)
print(f"Convolution: {test_input.shape} -> {convolved.shape if convolved is not None else 'None'}")

# Test custom layer norm
test_data = torch.randn(10, 20, 30)
normalized = custom_layer_norm(test_data, [20, 30])
print(f"Layer norm: {test_data.shape} -> {normalized.shape if normalized is not None else 'None'}")

# Test batched operations
batch_matrices = torch.randn(5, 10, 10)
batch_vectors = torch.randn(5, 10)
batch_results = batched_matrix_operations(batch_matrices, batch_vectors)
print("\nBatched operations:")
for op, result in batch_results.items():
    if result is not None:
        print(f"  {op}: shape {result.shape}")
    else:
        print(f"  {op}: Not implemented")
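
The convolution TODO mentions unfold plus matrix multiplication, which is the im2col approach: `torch.nn.functional.unfold` lays each receptive field out as a column, after which the convolution is a single matrix product with the flattened kernel. A sketch assuming no bias, dilation, or groups (`conv2d_im2col` is a hypothetical name), compared with `torch.nn.functional.conv2d`:

```python
import torch
import torch.nn.functional as F

def conv2d_im2col(x, weight, stride=1, padding=0):
    """Hypothetical sketch of 2D convolution (no bias/dilation/groups) via im2col."""
    batch, _, height, width = x.shape
    out_channels, _, kh, kw = weight.shape
    out_h = (height + 2 * padding - kh) // stride + 1
    out_w = (width + 2 * padding - kw) // stride + 1

    # cols: (batch, in_channels * kh * kw, out_h * out_w); each column is one
    # flattened receptive field, zero-padded by F.unfold itself
    cols = F.unfold(x, kernel_size=(kh, kw), padding=padding, stride=stride)

    # (out_channels, C*kh*kw) @ (batch, C*kh*kw, L) broadcasts over the batch
    out = weight.reshape(out_channels, -1) @ cols
    return out.reshape(batch, out_channels, out_h, out_w)

x = torch.randn(2, 3, 8, 8)
w = torch.randn(16, 3, 3, 3)
y = conv2d_im2col(x, w, stride=1, padding=1)
print(y.shape)   # torch.Size([2, 16, 8, 8])
print(torch.allclose(y, F.conv2d(x, w, stride=1, padding=1), atol=1e-4))
```

The channel-major order of `F.unfold`'s columns matches `weight.reshape(out_channels, -1)`, which is why a plain matrix product lines up the kernel weights with the right input values.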

In [None]:
# Test your custom tensor function implementations
try:
    test_custom_functions(locals())
    print("✅ Section 5: Custom Tensor Functions - All tests passed!")
except Exception as e:
    print(f"❌ Section 5: Custom Tensor Functions - Tests failed: {e}")
    print("Please complete the custom function implementations above before proceeding.")
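
Layer normalization is easy to get subtly wrong, for example by normalizing over the wrong dimensions or using the unbiased (N - 1) variance. A minimal sketch without the learnable scale and shift (the name `layer_norm_sketch` is hypothetical), compared with `torch.nn.functional.layer_norm` called without weight or bias:

```python
import torch
import torch.nn.functional as F

def layer_norm_sketch(x, normalized_shape, eps=1e-5):
    """Hypothetical sketch: normalize over the trailing len(normalized_shape)
    dimensions of x, with no learnable scale or shift."""
    dims = tuple(range(-len(normalized_shape), 0))   # e.g. (-2, -1)
    mean = x.mean(dim=dims, keepdim=True)
    # Biased variance (divide by N, not N - 1), matching F.layer_norm
    var = ((x - mean) ** 2).mean(dim=dims, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(10, 20, 30)
out = layer_norm_sketch(x, [20, 30])
print(torch.allclose(out, F.layer_norm(x, [20, 30]), atol=1e-4))  # True
```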

## Final Validation

Run the complete test suite to validate all your tensor mastery implementations.

In [None]:
# Run complete validation
print("Running complete test suite...\n")

all_tests_passed = True
test_sections = [
    ("Advanced Indexing", test_advanced_indexing),
    ("Broadcasting Rules", test_broadcasting_rules),
    ("Memory Analysis", test_memory_analysis),
    ("Einsum Operations", test_einsum_operations),
    ("Custom Functions", test_custom_functions)
]

for section_name, test_func in test_sections:
    try:
        test_func(locals())
        print(f"✅ {section_name} - PASSED")
    except Exception as e:
        print(f"❌ {section_name} - FAILED: {e}")
        all_tests_passed = False

print("\n" + "="*50)
if all_tests_passed:
    print("🎉 ALL TESTS PASSED! You have successfully completed Exercise 3.")
    print("You are now ready to proceed to Module 2: Automatic Differentiation.")
else:
    print("❌ Some tests failed. Please review the failed sections and complete the missing implementations.")
print("="*50)

## Summary

In this exercise, you have mastered advanced PyTorch tensor operations:

1. **Advanced Indexing**: Fancy indexing, boolean masking, and conditional operations
2. **Broadcasting Rules**: Understanding and implementing complex broadcasting scenarios
3. **Memory Layout**: Analyzing tensor storage, strides, and optimizing for performance
4. **Einsum Operations**: Using Einstein summation for complex tensor manipulations
5. **Custom Functions**: Implementing tensor operations from scratch

These advanced tensor manipulation skills provide the foundation for:

- **Memory-Efficient Computing**: Understanding when operations create views vs copies
- **Performance Optimization**: Using the most efficient tensor operations
- **Complex Neural Networks**: Einsum contractions and batched operations for attention mechanisms and transformers
- **Custom Layer Implementation**: Building novel neural network components
- **Scientific Computing**: Advanced mathematical operations with tensors

You now have the tensor manipulation skills needed to implement custom neural network layers and attention mechanisms, and to optimize complex tensor code in more advanced deep learning models.