# 50 Homework Assignments - Python & Pandas for Machine Learning

## 📚 Overview

This notebook contains **50 comprehensive homework assignments** covering:
- **Part 1**: Python Data Structures (Tuples, Sets, Dictionaries) - 15 problems
- **Part 2**: Python Functions & Functional Programming - 10 problems
- **Part 3**: Pandas Basics & Data Manipulation - 15 problems
- **Part 4**: Pandas Advanced Operations - 10 problems

### 📊 Difficulty Levels:
- 🟢 **Easy**: Basic concepts, direct application
- 🟡 **Medium**: Multiple concepts, some logic required
- 🔴 **Hard**: Complex logic, multiple steps, real-world scenarios

### 📝 Instructions:
1. Read each problem carefully
2. Write your solution in the provided code cell
3. Test your solution with the given examples
4. Compare with the solution (hidden in collapsed cells)
5. Try to solve without looking at solutions first!

---

# Part 1: Python Data Structures (15 Problems)

## Section A: Tuples (5 Problems)

### Problem 1: Image Dimensions Handler 🟢

**Topic**: Tuple unpacking and operations

**Description**:
You're working with image data in machine learning. Create a function that:
1. Takes an image dimension tuple `(height, width, channels)`
2. Returns a dictionary with keys: 'height', 'width', 'channels', 'total_pixels', 'is_grayscale'
3. `total_pixels = height × width`
4. `is_grayscale = True` if channels == 1, else False

**Example**:
```python
image_info((224, 224, 3))
# Output: {'height': 224, 'width': 224, 'channels': 3, 'total_pixels': 50176, 'is_grayscale': False}

image_info((128, 128, 1))
# Output: {'height': 128, 'width': 128, 'channels': 1, 'total_pixels': 16384, 'is_grayscale': True}
```

In [1]:
# YOUR SOLUTION HERE
def image_info(dimensions):
    # The tuple is ordered (height, width, channels)
    height, width, channels = dimensions
    return {
        "height": height,
        "width": width,
        "channels": channels,
        "total_pixels": height * width,
        "is_grayscale": channels == 1
    }

# Test your solution
print(image_info((224, 224, 3)))
print(image_info((128, 128, 1)))

{'height': 224, 'width': 224, 'channels': 3, 'total_pixels': 50176, 'is_grayscale': False}
{'height': 128, 'width': 128, 'channels': 1, 'total_pixels': 16384, 'is_grayscale': True}


### Problem 2: Coordinate Distance Calculator 🟢

**Topic**: Tuple arithmetic and math operations

**Description**:
Calculate the Euclidean distance between two points in 2D space.
- Formula: `distance = √((x2-x1)² + (y2-y1)²)`
- Input: Two tuples `(x, y)` representing coordinates
- Output: Float (rounded to 2 decimal places)

**Example**:
```python
distance((0, 0), (3, 4))  # Output: 5.0
distance((1, 2), (4, 6))  # Output: 5.0
```

**Hint**: Use `math.sqrt()` or `** 0.5` for square root

In [2]:
# YOUR SOLUTION HERE
import math

def distance(point1, point2):
    x1, y1 = point1
    x2, y2 = point2
    # Round to 2 decimal places, as the problem statement requires
    return round(math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2), 2)

# Test your solution
print(distance((0, 0), (3, 4)))
print(distance((1, 2), (4, 6)))

5.0
5.0


### Problem 3: Training Configuration Validator 🟡

**Topic**: Tuple immutability and validation

**Description**:
Create a function that validates ML training configurations stored as tuples:
- Input: `(learning_rate, batch_size, epochs, optimizer)`
- Validate:
  - `learning_rate`: must be between 0.0001 and 1.0
  - `batch_size`: must be a power of 2 (16, 32, 64, 128, etc.)
  - `epochs`: must be a positive integer
  - `optimizer`: must be one of ['adam', 'sgd', 'rmsprop']
- Return: tuple `(is_valid, error_messages_list)`

**Example**:
```python
validate_config((0.001, 32, 100, 'adam'))
# Output: (True, [])

validate_config((2.0, 30, -5, 'bad_optimizer'))
# Output: (False, ['learning_rate out of range', 'batch_size not power of 2', 'epochs must be positive', 'invalid optimizer'])
```

In [3]:
# YOUR SOLUTION HERE
def validate_config(config):
    learning_rate, batch_size, epochs, optimizer = config

    errors = []

    if not 0.0001 <= learning_rate <= 1.0:
        errors.append("learning_rate out of range")

    # A positive integer is a power of 2 exactly when it has a single set bit
    is_power_of_2 = (
        isinstance(batch_size, int)
        and batch_size > 0
        and batch_size & (batch_size - 1) == 0
    )
    if not is_power_of_2:
        errors.append("batch_size not power of 2")

    if not isinstance(epochs, int) or epochs <= 0:
        errors.append("epochs must be positive")

    # Guard against non-string optimizers before calling .lower()
    if not isinstance(optimizer, str) or optimizer.lower() not in ('adam', 'sgd', 'rmsprop'):
        errors.append("invalid optimizer")

    return len(errors) == 0, errors

# Test your solution
print(validate_config((0.001, 32, 100, 'adam')))
print(validate_config((2.0, 30, -5, 'bad_optimizer')))

(True, [])
(False, ['learning_rate out of range', 'batch_size not power of 2', 'epochs must be positive', 'invalid optimizer'])


### Problem 4: Data Split Generator 🟡

**Topic**: Tuple creation and list manipulation

**Description**:
Create a function that splits data indices for train/validation/test sets:
- Input: `total_samples` (int), `train_ratio`, `val_ratio`, `test_ratio` (floats that sum to 1.0)
- Output: Tuple of three ranges: `(train_indices, val_indices, test_indices)`
- Each element should be a tuple of `(start_index, end_index)`, where `end_index` is exclusive (as in Python's `range`), so consecutive splits share a boundary

**Example**:
```python
split_data(100, 0.7, 0.2, 0.1)
# Output: ((0, 70), (70, 90), (90, 100))

split_data(1000, 0.8, 0.1, 0.1)
# Output: ((0, 800), (800, 900), (900, 1000))
```

In [4]:
# YOUR SOLUTION HERE
def split_data(total_samples, train_ratio, val_ratio, test_ratio):
    # round() avoids truncating products that land just below an integer in floating point
    train_end = round(total_samples * train_ratio)
    val_end = train_end + round(total_samples * val_ratio)

    # Each split is a half-open (start, end) pair, like range(start, end)
    train_indices = (0, train_end)
    val_indices = (train_end, val_end)
    test_indices = (val_end, total_samples)

    return train_indices, val_indices, test_indices

# Test your solution
print(split_data(100, 0.7, 0.2, 0.1))
print(split_data(1000, 0.8, 0.1, 0.1))

((0, 70), (70, 90), (90, 100))
((0, 800), (800, 900), (900, 1000))


### Problem 5: Model Metrics Comparison 🔴

**Topic**: Complex tuple operations and sorting

**Description**:
You have multiple ML models with their performance metrics as tuples:
- Input: List of tuples `[(model_name, accuracy, precision, recall, f1_score), ...]`
- Tasks:
  1. Find the best model for each metric
  2. Calculate average of all metrics across models
  3. Return a dictionary with: `'best_models'`, `'averages'`, `'ranked_by_f1'`

**Example**:
```python
models = [
    ('Model_A', 0.85, 0.82, 0.88, 0.85),
    ('Model_B', 0.92, 0.90, 0.85, 0.87),
    ('Model_C', 0.88, 0.85, 0.92, 0.88)
]

compare_models(models)
# Output: {
#     'best_models': {
#         'accuracy': 'Model_B',
#         'precision': 'Model_B',
#         'recall': 'Model_C',
#         'f1_score': 'Model_C'
#     },
#     'averages': {'accuracy': 0.88, 'precision': 0.86, 'recall': 0.88, 'f1_score': 0.87},
#     'ranked_by_f1': ['Model_C', 'Model_B', 'Model_A']
# }
```

In [None]:
# YOUR SOLUTION HERE
def compare_models(models):
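    # Position of each metric inside a model tuple (index 0 holds the name)
    metric_index = {"accuracy": 1, "precision": 2, "recall": 3, "f1_score": 4}

    # Best model per metric
    best_models = {
        metric: max(models, key=lambda m: m[idx])[0]
        for metric, idx in metric_index.items()
    }

    # Average of each metric across all models, rounded to 2 decimals
    averages = {
        metric: round(sum(m[idx] for m in models) / len(models), 2)
        for metric, idx in metric_index.items()
    }

    # Model names ordered by F1 score, best first
    ranked_by_f1 = [m[0] for m in sorted(models, key=lambda m: m[4], reverse=True)]

    return {
        "best_models": best_models,
        "averages": averages,
        "ranked_by_f1": ranked_by_f1,
    }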

# Test your solution
models = [
    ('Model_A', 0.85, 0.82, 0.88, 0.85),
    ('Model_B', 0.92, 0.90, 0.85, 0.87),
    ('Model_C', 0.88, 0.85, 0.92, 0.88)
]
print(compare_models(models))

## Section B: Sets (5 Problems)

### Problem 6: Duplicate Detector 🟢

**Topic**: Set basics and uniqueness

**Description**:
Find duplicate values in a list and return statistics:
- Input: List of any values
- Output: Dictionary with:
  - `'has_duplicates'`: Boolean
  - `'unique_count'`: Number of unique values
  - `'duplicate_count'`: Number of distinct values that appear more than once
  - `'duplicates'`: Set of duplicate values

**Example**:
```python
find_duplicates([1, 2, 3, 2, 4, 3, 5])
# Output: {'has_duplicates': True, 'unique_count': 5, 'duplicate_count': 2, 'duplicates': {2, 3}}

find_duplicates([1, 2, 3, 4, 5])
# Output: {'has_duplicates': False, 'unique_count': 5, 'duplicate_count': 0, 'duplicates': set()}
```
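
One possible approach, sketched here under the assumption that every value in the list is hashable: count occurrences with `collections.Counter`, then collect the values seen more than once into a set.

```python
from collections import Counter

def find_duplicates(data):
    counts = Counter(data)  # value -> number of occurrences
    duplicates = {value for value, n in counts.items() if n > 1}
    return {
        'has_duplicates': bool(duplicates),
        'unique_count': len(counts),         # distinct values overall
        'duplicate_count': len(duplicates),  # distinct values seen more than once
        'duplicates': duplicates,
    }
```

Comparing `len(set(data))` with `len(data)` would answer `has_duplicates` alone, but the counter also tells you which values repeat.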

In [None]:
# YOUR SOLUTION HERE
def find_duplicates(data):
    # Write your code here
    pass

# Test your solution
print(find_duplicates([1, 2, 3, 2, 4, 3, 5]))
print(find_duplicates([1, 2, 3, 4, 5]))

### Problem 7: Feature Engineering - Categorical Encoder 🟡

**Topic**: Set operations for ML preprocessing

**Description**:
Create a simple categorical encoder:
- Input: List of categorical values (strings)
- Output: Dictionary with:
  - `'encoding_map'`: Dictionary mapping each unique category to an integer (0, 1, 2, ...)
  - `'encoded_data'`: List of encoded integers
  - `'num_categories'`: Number of unique categories

**Example**:
```python
encode_categories(['cat', 'dog', 'cat', 'bird', 'dog', 'cat'])
# Output: {
#     'encoding_map': {'cat': 0, 'dog': 1, 'bird': 2},
#     'encoded_data': [0, 1, 0, 2, 1, 0],
#     'num_categories': 3
# }
```
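
A minimal sketch of one way to get the mapping shown above. A plain `set` has no guaranteed order, so this version assumes codes should follow first-seen order and uses `dict.fromkeys` to deduplicate while preserving it.

```python
def encode_categories(data):
    # dict.fromkeys keeps first-seen order; iterating a set would not
    unique = list(dict.fromkeys(data))
    encoding_map = {category: code for code, category in enumerate(unique)}
    return {
        'encoding_map': encoding_map,
        'encoded_data': [encoding_map[category] for category in data],
        'num_categories': len(encoding_map),
    }
```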

In [None]:
# YOUR SOLUTION HERE
def encode_categories(data):
    # Write your code here
    pass

# Test your solution
print(encode_categories(['cat', 'dog', 'cat', 'bird', 'dog', 'cat']))

### Problem 8: Dataset Overlap Analyzer 🟡

**Topic**: Set operations (union, intersection, difference)

**Description**:
Analyze overlap between training and test datasets:
- Input: Two lists (train_ids, test_ids)
- Output: Dictionary with:
  - `'total_unique'`: Total unique IDs across both sets
  - `'overlap'`: IDs present in both sets (data leakage!)
  - `'only_train'`: IDs only in training
  - `'only_test'`: IDs only in test
  - `'has_leakage'`: Boolean (True if overlap exists)

**Example**:
```python
analyze_overlap([1, 2, 3, 4, 5], [4, 5, 6, 7, 8])
# Output: {
#     'total_unique': 8,
#     'overlap': {4, 5},
#     'only_train': {1, 2, 3},
#     'only_test': {6, 7, 8},
#     'has_leakage': True
# }
```
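
The set operators map directly onto the requested fields. A short sketch, assuming the IDs are hashable:

```python
def analyze_overlap(train_ids, test_ids):
    train, test = set(train_ids), set(test_ids)
    overlap = train & test                # intersection: IDs in both sets
    return {
        'total_unique': len(train | test),  # union
        'overlap': overlap,
        'only_train': train - test,         # difference
        'only_test': test - train,
        'has_leakage': bool(overlap),
    }
```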

In [None]:
# YOUR SOLUTION HERE
def analyze_overlap(train_ids, test_ids):
    # Write your code here
    pass

# Test your solution
print(analyze_overlap([1, 2, 3, 4, 5], [4, 5, 6, 7, 8]))

### Problem 9: Text Preprocessing - Stopword Remover 🟡

**Topic**: Set operations for text processing

**Description**:
Remove stopwords from text using sets:
- Input: Text string and set of stopwords
- Output: Dictionary with:
  - `'original_words'`: List of all words
  - `'filtered_words'`: List of words after removing stopwords
  - `'removed_count'`: Number of words removed
  - `'unique_filtered'`: Set of unique words after filtering

**Example**:
```python
stopwords = {'the', 'is', 'a', 'an', 'in', 'on', 'at'}
text = "the cat is on the mat in the house"

remove_stopwords(text, stopwords)
# Output: {
#     'original_words': ['the', 'cat', 'is', 'on', 'the', 'mat', 'in', 'the', 'house'],
#     'filtered_words': ['cat', 'mat', 'house'],
#     'removed_count': 6,
#     'unique_filtered': {'cat', 'mat', 'house'}
# }
```
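
A possible sketch, assuming words are separated by whitespace and that the stopword set is lowercase (so the comparison lowercases each word first). Both assumptions are choices of this example, not requirements stated in the problem.

```python
def remove_stopwords(text, stopwords):
    original_words = text.split()  # whitespace tokenization (assumed)
    # Set membership is O(1) on average, which is why stopwords is a set
    filtered_words = [w for w in original_words if w.lower() not in stopwords]
    return {
        'original_words': original_words,
        'filtered_words': filtered_words,
        'removed_count': len(original_words) - len(filtered_words),
        'unique_filtered': set(filtered_words),
    }
```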

In [None]:
# YOUR SOLUTION HERE
def remove_stopwords(text, stopwords):
    # Write your code here
    pass

# Test your solution
stopwords = {'the', 'is', 'a', 'an', 'in', 'on', 'at'}
text = "the cat is on the mat in the house"
print(remove_stopwords(text, stopwords))

### Problem 10: Feature Selection - Correlation Filter 🔴

**Topic**: Advanced set operations

**Description**:
Filter highly correlated feature pairs:
- Input: Dictionary of feature pairs and their correlation: `{('f1', 'f2'): 0.95, ...}`
- Threshold: Correlation threshold (e.g., 0.9)
- Output: Dictionary with:
  - `'highly_correlated_pairs'`: Set of feature pairs above threshold
  - `'features_to_remove'`: Set of features to remove (keep only one from each pair)
  - `'features_to_keep'`: Set of features to keep

**Example**:
```python
correlations = {
    ('f1', 'f2'): 0.95,
    ('f1', 'f3'): 0.5,
    ('f2', 'f4'): 0.92,
    ('f3', 'f4'): 0.3
}

filter_correlated(correlations, threshold=0.9)
# Output: {
#     'highly_correlated_pairs': {('f1', 'f2'), ('f2', 'f4')},
#     'features_to_remove': {'f2'},  # f2 appears in both correlated pairs
#     'features_to_keep': {'f1', 'f3', 'f4'}
# }
```
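
The problem leaves open which feature of a correlated pair to drop. The sketch below assumes a greedy rule: repeatedly remove the feature that appears in the most remaining correlated pairs, breaking ties alphabetically. That rule reproduces the example (`f2` is in both pairs), but it is only one reasonable choice.

```python
from collections import Counter

def filter_correlated(correlations, threshold):
    all_features = {feature for pair in correlations for feature in pair}
    high_pairs = {pair for pair, corr in correlations.items() if corr > threshold}

    to_remove = set()
    remaining = set(high_pairs)
    while remaining:
        counts = Counter(feature for pair in remaining for feature in pair)
        # Assumed rule: drop the feature involved in the most remaining pairs,
        # breaking ties alphabetically so the result is deterministic
        worst = min(counts, key=lambda f: (-counts[f], f))
        to_remove.add(worst)
        remaining = {pair for pair in remaining if worst not in pair}

    return {
        'highly_correlated_pairs': high_pairs,
        'features_to_remove': to_remove,
        'features_to_keep': all_features - to_remove,
    }
```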

In [None]:
# YOUR SOLUTION HERE
def filter_correlated(correlations, threshold):
    # Write your code here
    pass

# Test your solution
correlations = {
    ('f1', 'f2'): 0.95,
    ('f1', 'f3'): 0.5,
    ('f2', 'f4'): 0.92,
    ('f3', 'f4'): 0.3
}
print(filter_correlated(correlations, threshold=0.9))

## Section C: Dictionaries (5 Problems)

### Problem 11: Hyperparameter Grid Search 🟢

**Topic**: Dictionary basics and nested structures

**Description**:
Generate all combinations of hyperparameters for grid search:
- Input: Dictionary where keys are parameter names and values are lists of options
- Output: List of dictionaries, each representing one parameter combination

**Example**:
```python
params = {
    'learning_rate': [0.01, 0.1],
    'batch_size': [32, 64],
    'epochs': [10, 20]
}

generate_grid(params)
# Output: [
#     {'learning_rate': 0.01, 'batch_size': 32, 'epochs': 10},
#     {'learning_rate': 0.01, 'batch_size': 32, 'epochs': 20},
#     {'learning_rate': 0.01, 'batch_size': 64, 'epochs': 10},
#     ... (8 combinations total)
# ]
```
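
A compact sketch using `itertools.product`, which yields the Cartesian product of the option lists with the last parameter varying fastest. It relies on dictionaries preserving insertion order (Python 3.7+).

```python
from itertools import product

def generate_grid(params):
    names = list(params)  # parameter names, in insertion order
    # Each combo is a tuple with one value per parameter, in the same order
    return [dict(zip(names, combo)) for combo in product(*params.values())]
```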

In [None]:
# YOUR SOLUTION HERE
from itertools import product

def generate_grid(params):
    # Write your code here
    # Hint: Use itertools.product for Cartesian product
    pass

# Test your solution
params = {
    'learning_rate': [0.01, 0.1],
    'batch_size': [32, 64],
    'epochs': [10, 20]
}
result = generate_grid(params)
print(f"Total combinations: {len(result)}")
print("First 3:", result[:3])

### Problem 12: Feature Statistics Calculator 🟡

**Topic**: Dictionary operations and statistics

**Description**:
Calculate statistics for each feature in a dataset:
- Input: Dictionary where keys are feature names and values are lists of numbers
- Output: Dictionary with nested statistics for each feature:
  - 'mean', 'median', 'std', 'min', 'max', 'q1', 'q3'

**Example**:
```python
data = {
    'age': [25, 30, 35, 40, 45],
    'income': [50000, 60000, 70000, 80000, 90000]
}

calculate_stats(data)
# Output: {
#     'age': {'mean': 35.0, 'median': 35.0, 'std': 7.07, 'min': 25, 'max': 45, 'q1': 30.0, 'q3': 40.0},
#     'income': {'mean': 70000.0, 'median': 70000.0, 'std': 14142.14, 'min': 50000, 'max': 90000, 'q1': 60000.0, 'q3': 80000.0}
# }
```
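
The `std` values in the example (7.07 and 14142.14) match the population standard deviation, so this sketch uses `statistics.pstdev` rather than `stdev`. For the quartiles it assumes the "inclusive" interpolation method of `statistics.quantiles` (Python 3.8+), which gives 30.0 and 40.0 for the `age` column.

```python
import statistics

def calculate_stats(data):
    result = {}
    for feature, values in data.items():
        # n=4 returns the three cut points Q1, Q2 (median), Q3
        q1, _, q3 = statistics.quantiles(values, n=4, method='inclusive')
        result[feature] = {
            'mean': round(statistics.fmean(values), 2),
            'median': round(float(statistics.median(values)), 2),
            'std': round(statistics.pstdev(values), 2),  # population std
            'min': min(values),
            'max': max(values),
            'q1': round(float(q1), 2),
            'q3': round(float(q3), 2),
        }
    return result
```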

In [None]:
# YOUR SOLUTION HERE
import statistics

def calculate_stats(data):
    # Write your code here
    # Hint: Use the statistics module (mean, median, pstdev) and sorted() for quartiles;
    # the std values in the example are population standard deviations
    pass

# Test your solution
data = {
    'age': [25, 30, 35, 40, 45],
    'income': [50000, 60000, 70000, 80000, 90000]
}
print(calculate_stats(data))

### Problem 13: Data Aggregator 🟡

**Topic**: Dictionary aggregation and grouping

**Description**:
Group and aggregate data by category:
- Input: List of dictionaries, each with 'category' and 'value' keys
- Output: Dictionary where:
  - Keys are categories
  - Values are dictionaries with: 'count', 'sum', 'avg', 'values_list'

**Example**:
```python
data = [
    {'category': 'A', 'value': 10},
    {'category': 'B', 'value': 20},
    {'category': 'A', 'value': 15},
    {'category': 'B', 'value': 25},
    {'category': 'A', 'value': 12}
]

aggregate_data(data)
# Output: {
#     'A': {'count': 3, 'sum': 37, 'avg': 12.33, 'values_list': [10, 15, 12]},
#     'B': {'count': 2, 'sum': 45, 'avg': 22.5, 'values_list': [20, 25]}
# }
```
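
One way to sketch the grouping step is with `collections.defaultdict`, collecting values per category first and deriving the aggregates afterwards. Rounding `avg` to 2 decimals is inferred from the example output.

```python
from collections import defaultdict

def aggregate_data(data):
    groups = defaultdict(list)  # category -> list of values
    for row in data:
        groups[row['category']].append(row['value'])

    return {
        category: {
            'count': len(values),
            'sum': sum(values),
            'avg': round(sum(values) / len(values), 2),
            'values_list': values,
        }
        for category, values in groups.items()
    }
```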

In [None]:
# YOUR SOLUTION HERE
def aggregate_data(data):
    # Write your code here
    pass

# Test your solution
data = [
    {'category': 'A', 'value': 10},
    {'category': 'B', 'value': 20},
    {'category': 'A', 'value': 15},
    {'category': 'B', 'value': 25},
    {'category': 'A', 'value': 12}
]
print(aggregate_data(data))

### Problem 14: Confusion Matrix Builder 🔴

**Topic**: Nested dictionaries and ML metrics

**Description**:
Build a confusion matrix from predictions and actual values:
- Input: Two lists (actual, predicted) of same length
- Output: Dictionary with:
  - `'matrix'`: Nested dict {actual: {predicted: count}}
  - `'accuracy'`: Overall accuracy
  - `'per_class'`: Dict with precision, recall, f1 for each class

**Example**:
```python
actual = ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'C']
predicted = ['A', 'B', 'B', 'C', 'A', 'A', 'C', 'B']

build_confusion_matrix(actual, predicted)
# Output: {
#     'matrix': {
#         'A': {'A': 2, 'B': 1},
#         'B': {'B': 1, 'A': 1},
#         'C': {'C': 2, 'B': 1}
#     },
#     'accuracy': 0.625,  # 5 correct out of 8
#     'per_class': {
#         'A': {'precision': 0.67, 'recall': 0.67, 'f1': 0.67},  # 2 of the 3 predicted A's are correct
#         'B': {'precision': 0.33, 'recall': 0.5, 'f1': 0.4},
#         'C': {'precision': 1.0, 'recall': 0.67, 'f1': 0.8}
#     }
# }
```
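
A sketch under the standard definitions: precision is true positives over everything predicted as the class, recall is true positives over everything actually in the class. With the data above, class A has 2 true positives out of 3 predicted A labels and 3 actual A labels, so both come out to 0.67. Rounding to 2 decimals and returning 0.0 when a denominator is zero are assumptions of this example.

```python
from collections import Counter

def build_confusion_matrix(actual, predicted):
    pair_counts = Counter(zip(actual, predicted))  # (actual, predicted) -> count

    matrix = {}
    for (a, p), n in pair_counts.items():
        matrix.setdefault(a, {})[p] = n

    correct = sum(n for (a, p), n in pair_counts.items() if a == p)
    actual_totals = Counter(actual)
    predicted_totals = Counter(predicted)

    per_class = {}
    for label in sorted(set(actual) | set(predicted)):
        tp = pair_counts[(label, label)]  # Counter returns 0 for missing keys
        precision = tp / predicted_totals[label] if predicted_totals[label] else 0.0
        recall = tp / actual_totals[label] if actual_totals[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {
            'precision': round(precision, 2),
            'recall': round(recall, 2),
            'f1': round(f1, 2),
        }

    return {
        'matrix': matrix,
        'accuracy': correct / len(actual),
        'per_class': per_class,
    }
```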

In [None]:
# YOUR SOLUTION HERE
def build_confusion_matrix(actual, predicted):
    # Write your code here
    pass

# Test your solution
actual = ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'C']
predicted = ['A', 'B', 'B', 'C', 'A', 'A', 'C', 'B']
print(build_confusion_matrix(actual, predicted))

### Problem 15: Configuration Merger 🔴

**Topic**: Nested dictionary operations

**Description**:
Merge multiple configuration dictionaries with priority rules:
- Input: List of config dicts (later configs override earlier ones)
- Handle nested dictionaries (deep merge)
- Output: Single merged configuration

**Example**:
```python
configs = [
    {'model': {'layers': 3, 'dropout': 0.5}, 'lr': 0.01},
    {'model': {'layers': 5}, 'batch_size': 32},
    {'model': {'dropout': 0.3}, 'lr': 0.001}
]

merge_configs(configs)
# Output: {
#     'model': {'layers': 5, 'dropout': 0.3},
#     'lr': 0.001,
#     'batch_size': 32
# }
```
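
A recursive sketch of the deep merge. The helper name `deep_merge` is hypothetical, and copying the inputs (so the original config dicts are never mutated) is a design choice of this example rather than a stated requirement.

```python
import copy

def deep_merge(base, override):
    """Return a new dict with `override` merged on top of `base` (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

def merge_configs(configs):
    result = {}
    for config in configs:  # later configs override earlier ones
        result = deep_merge(result, config)
    return result
```

A shallow `dict.update` would replace the whole `'model'` dict with the latest one and lose `'layers'`, which is why the nested case needs recursion.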

In [None]:
# YOUR SOLUTION HERE
def merge_configs(configs):
    # Write your code here
    # Hint: Use recursive approach for nested dicts
    pass

# Test your solution
configs = [
    {'model': {'layers': 3, 'dropout': 0.5}, 'lr': 0.01},
    {'model': {'layers': 5}, 'batch_size': 32},
    {'model': {'dropout': 0.3}, 'lr': 0.001}
]
print(merge_configs(configs))

---

# Part 2: Python Functions & Functional Programming (10 Problems)

## Section D: Functions (5 Problems)

### Problem 16: Decorator - Execution Timer 🟡

**Topic**: Function decorators

**Description**:
Create a decorator that measures function execution time:
- Decorator should print: function name, execution time in milliseconds
- Should work with any function
- Return the original function's result

**Example**:
```python
@timer
def slow_function(n):
    time.sleep(n)
    return n * 2

result = slow_function(0.5)
# Output: "slow_function executed in 500.23 ms"
# result = 1.0
```
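
A minimal sketch of such a decorator. It uses `time.perf_counter` for the measurement and `functools.wraps` so the decorated function keeps its name and docstring. The exact millisecond figure printed will vary from run to run.

```python
import functools
import time

def timer(func):
    @functools.wraps(func)  # preserve func.__name__ and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{func.__name__} executed in {elapsed_ms:.2f} ms")
        return result  # pass the original result through unchanged
    return wrapper
```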

In [None]:
# YOUR SOLUTION HERE
import time

def timer(func):
    # Write your decorator here
    pass

# Test your decorator
@timer
def slow_function(n):
    time.sleep(n)
    return n * 2

result = slow_function(0.5)
print(f"Result: {result}")

### Problem 17: Memoization Cache 🔴

**Topic**: Closures and caching

**Description**:
Create a memoization decorator for expensive functions:
- Cache function results based on arguments
- Return cached result if same arguments are used again
- Track cache hits and misses

**Example**:
```python
@memoize
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(10))  # Computes and caches
print(fibonacci(10))  # Returns from cache
print(fibonacci.cache_info())  # {'hits': 9, 'misses': 11, 'size': 11} (8 hits inside the first call's recursion, 1 from the repeat call)
```
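
A closure-based sketch, assuming positional, hashable arguments only. Because the recursive calls inside `fibonacci` go through the decorated name, they are counted too: the first `fibonacci(10)` records 11 misses (for n = 0 through 10) and 8 hits (each `fibonacci(n-2)` for n = 3 through 10 is already cached), and a repeat call adds one more hit.

```python
import functools

def memoize(func):
    cache = {}                         # args tuple -> result
    stats = {'hits': 0, 'misses': 0}   # shared with wrapper via the closure

    @functools.wraps(func)
    def wrapper(*args):
        if args in cache:
            stats['hits'] += 1
            return cache[args]
        stats['misses'] += 1
        cache[args] = func(*args)
        return cache[args]

    # Expose the counters in the shape used by the example above
    wrapper.cache_info = lambda: {**stats, 'size': len(cache)}
    return wrapper
```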

In [None]:
# YOUR SOLUTION HERE
def memoize(func):
    # Write your memoization decorator here
    pass

# Test your decorator
@memoize
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

print("First call:", fibonacci(10))
print("Second call:", fibonacci(10))
print("Cache info:", fibonacci.cache_info())

### Problem 18: Partial Function Application 🟡

**Topic**: Closures and function factories

**Description**:
Create a function factory for ML model evaluators:
- Input: Metric name ('accuracy', 'precision', 'recall', 'f1')
- Output: Function that calculates that metric given y_true and y_pred

**Example**:
```python
accuracy_fn = create_metric('accuracy')
precision_fn = create_metric('precision')

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

print(accuracy_fn(y_true, y_pred))  # 0.8 (4/5 correct)
print(precision_fn(y_true, y_pred))  # 1.0 (all predicted 1s are correct)
```
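
A sketch of the factory, assuming binary labels where `1` is the positive class. The helpers `_confusion_counts` and `_safe_div` are hypothetical names introduced for this example, and returning 0.0 on a zero denominator is an assumed convention.

```python
def _confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def _safe_div(numerator, denominator):
    return numerator / denominator if denominator else 0.0

def create_metric(metric_name):
    formulas = {
        'accuracy': lambda tp, fp, fn, tn: _safe_div(tp + tn, tp + fp + fn + tn),
        'precision': lambda tp, fp, fn, tn: _safe_div(tp, tp + fp),
        'recall': lambda tp, fp, fn, tn: _safe_div(tp, tp + fn),
        'f1': lambda tp, fp, fn, tn: _safe_div(2 * tp, 2 * tp + fp + fn),
    }
    if metric_name not in formulas:
        raise ValueError(f"unknown metric: {metric_name!r}")
    formula = formulas[metric_name]

    def metric(y_true, y_pred):
        # `formula` is captured from the enclosing scope (a closure)
        return formula(*_confusion_counts(y_true, y_pred))

    return metric
```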

In [None]:
# YOUR SOLUTION HERE
def create_metric(metric_name):
    # Write your function factory here
    pass

# Test your solution
accuracy_fn = create_metric('accuracy')
precision_fn = create_metric('precision')

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

print("Accuracy:", accuracy_fn(y_true, y_pred))
print("Precision:", precision_fn(y_true, y_pred))

### Problem 19: Pipeline Builder 🔴

**Topic**: Function composition and *args

**Description**:
Create a pipeline that applies multiple functions in sequence:
- Input: Variable number of functions
- Output: Single function that applies all functions in order
- Data flows from left to right

**Example**:
```python
def normalize(x):
    return [i / max(x) for i in x]

def square(x):
    return [i ** 2 for i in x]

def sum_all(x):
    return sum(x)

pipeline = create_pipeline(normalize, square, sum_all)
result = pipeline([1, 2, 3, 4, 5])
# normalize: [0.2, 0.4, 0.6, 0.8, 1.0]
# square: [0.04, 0.16, 0.36, 0.64, 1.0]
# sum_all: 2.2
```
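
Left-to-right composition is a fold over the function list, so one possible sketch uses `functools.reduce` with the input data as the initial value:

```python
from functools import reduce

def create_pipeline(*functions):
    def pipeline(data):
        # Feed the running value into each function in turn, left to right
        return reduce(lambda value, fn: fn(value), functions, data)
    return pipeline
```

With no functions at all, the pipeline returns its input unchanged. Note that the final sum in the example is a float, so it is safer to compare it against 2.2 with a tolerance than with `==`.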

In [None]:
# YOUR SOLUTION HERE
def create_pipeline(*functions):
    # Write your pipeline builder here
    pass

# Test your solution
def normalize(x):
    return [i / max(x) for i in x]

def square(x):
    return [i ** 2 for i in x]

def sum_all(x):
    return sum(x)

pipeline = create_pipeline(normalize, square, sum_all)
result = pipeline([1, 2, 3, 4, 5])
print(f"Result: {result}")

### Problem 20: Currying 🔴

**Topic**: Currying and partial application

**Description**:
Transform a function that takes multiple arguments into a sequence of functions each taking a single argument:
- Input: Function with N parameters
- Output: Curried version that can be called step by step

**Example**:
```python
def add_three(a, b, c):
    return a + b + c

curried_add = curry(add_three)
result = curried_add(1)(2)(3)  # 6

# Or partial application
add_1 = curried_add(1)
add_1_2 = add_1(2)
result = add_1_2(3)  # 6
```
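
A sketch that reads the arity from `inspect.signature` and accumulates one argument per call. It assumes the wrapped function takes only required positional parameters (no defaults, `*args`, or keyword-only arguments). The inner name `collect` is hypothetical.

```python
import inspect

def curry(func):
    arity = len(inspect.signature(func).parameters)

    def collect(*collected):
        if len(collected) >= arity:
            return func(*collected)  # all arguments gathered: call through
        # Otherwise return a one-argument function that remembers what we have
        return lambda arg: collect(*collected, arg)

    return collect()
```

Each intermediate function is independent, so `add_1 = curried_add(1)` can be reused to build several different partial applications.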

In [None]:
# YOUR SOLUTION HERE
def curry(func):
    # Write your currying function here
    # Hint: Use inspect.signature to get function arity
    pass

# Test your solution
def add_three(a, b, c):
    return a + b + c

curried_add = curry(add_three)
result1 = curried_add(1)(2)(3)
print(f"Curried: {result1}")

add_1 = curried_add(1)
add_1_2 = add_1(2)
result2 = add_1_2(3)
print(f"Partial application: {result2}")

## Section E: Functional Programming (5 Problems)

### Problem 21: Data Transformation Pipeline 🟢

**Topic**: map, filter, lambda

**Description**:
Transform raw data using functional programming:
- Input: List of strings representing numbers: `['1', '2', '3', '4', '5', '6']`
- Tasks:
  1. Convert to integers
  2. Filter even numbers only
  3. Square each number
  4. Return sum
- Use map, filter, lambda (no loops!)

**Example**:
```python
transform_data(['1', '2', '3', '4', '5', '6'])
# Steps: ['1','2','3','4','5','6'] -> [1,2,3,4,5,6] -> [2,4,6] -> [4,16,36] -> 56
# Output: 56
```
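
The four steps can be sketched as a chain of lazy iterators, with `sum` consuming the result at the end:

```python
def transform_data(data):
    numbers = map(int, data)                        # '2' -> 2
    evens = filter(lambda n: n % 2 == 0, numbers)   # keep even numbers
    squares = map(lambda n: n ** 2, evens)          # 2 -> 4
    return sum(squares)
```

Since `map` and `filter` return iterators in Python 3, no intermediate list is built.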

In [None]:
# YOUR SOLUTION HERE
def transform_data(data):
    # Write your solution using map, filter, lambda
    pass

# Test your solution
result = transform_data(['1', '2', '3', '4', '5', '6'])
print(f"Result: {result}")

### Problem 22: Feature Scaler 🟡

**Topic**: Lambda and list comprehensions

**Description**:
Create different scaling functions using lambdas:
- Min-Max scaling: `(x - min) / (max - min)`
- Z-score normalization: `(x - mean) / std`
- Robust scaling: `(x - median) / IQR`
- Input: List of numbers and scaling type
- Output: Scaled list

**Example**:
```python
data = [1, 2, 3, 4, 5]

scale_features(data, 'minmax')
# Output: [0.0, 0.25, 0.5, 0.75, 1.0]

scale_features(data, 'zscore')
# Output: [-1.41, -0.71, 0.0, 0.71, 1.41]
```
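
A sketch that picks a lambda per method and applies it in a list comprehension. The z-score values in the example (±1.41) correspond to the population standard deviation, so `statistics.pstdev` is used. The quartile method for robust scaling (`method='inclusive'`) and rounding to 2 decimals are assumptions of this example. Constant input would divide by zero and is not handled here.

```python
import statistics

def scale_features(data, method):
    if method == 'minmax':
        lo, hi = min(data), max(data)
        scale = lambda x: (x - lo) / (hi - lo)
    elif method == 'zscore':
        mean, std = statistics.fmean(data), statistics.pstdev(data)
        scale = lambda x: (x - mean) / std
    elif method == 'robust':
        q1, median, q3 = statistics.quantiles(data, n=4, method='inclusive')
        scale = lambda x: (x - median) / (q3 - q1)  # q3 - q1 is the IQR
    else:
        raise ValueError(f"unknown scaling method: {method!r}")
    return [round(scale(x), 2) for x in data]
```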

In [None]:
# YOUR SOLUTION HERE
import statistics

def scale_features(data, method):
    # Write your scaling function here
    pass

# Test your solution
data = [1, 2, 3, 4, 5]
print("Min-Max:", scale_features(data, 'minmax'))
print("Z-score:", scale_features(data, 'zscore'))
print("Robust:", scale_features(data, 'robust'))

### Problem 23: reduce() - Custom Aggregations 🟡

**Topic**: reduce() for complex aggregations

**Description**:
Use `reduce()` to implement custom aggregation functions:
1. Product of all elements
2. Maximum value
3. Concatenate strings with separator
4. Flatten nested lists

**Example**:
```python
product([1, 2, 3, 4, 5])  # 120
maximum([3, 1, 4, 1, 5, 9, 2, 6])  # 9
join_strings(['Hello', 'World', 'ML'], ' ')  # 'Hello World ML'
flatten([[1, 2], [3, 4], [5, 6]])  # [1, 2, 3, 4, 5, 6]
```
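
Each aggregation is a fold with a different combining function. A possible sketch, where returning an empty string for an empty list in `join_strings` is an assumed convention:

```python
from functools import reduce
import operator

def product(numbers):
    return reduce(operator.mul, numbers, 1)  # 1 is the identity for *

def maximum(numbers):
    # No initial value: reduce raises TypeError on an empty list
    return reduce(lambda a, b: a if a >= b else b, numbers)

def join_strings(strings, separator):
    return reduce(lambda acc, s: acc + separator + s, strings) if strings else ''

def flatten(nested_list):
    return reduce(lambda acc, sub: acc + sub, nested_list, [])
```

In production code `math.prod`, `max`, `str.join`, and `itertools.chain` are the idiomatic choices; `reduce` is used here because the exercise asks for it.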

In [None]:
# YOUR SOLUTION HERE
from functools import reduce

def product(numbers):
    # Write using reduce
    pass

def maximum(numbers):
    # Write using reduce
    pass

def join_strings(strings, separator):
    # Write using reduce
    pass

def flatten(nested_list):
    # Write using reduce
    pass

# Test your solutions
print("Product:", product([1, 2, 3, 4, 5]))
print("Maximum:", maximum([3, 1, 4, 1, 5, 9, 2, 6]))
print("Join:", join_strings(['Hello', 'World', 'ML'], ' '))
print("Flatten:", flatten([[1, 2], [3, 4], [5, 6]]))

### Problem 24: Comprehension Olympics 🔴

**Topic**: Nested comprehensions and conditional logic

**Description**:
Solve these using comprehensions (no loops!):

1. **Matrix Transpose**: Transpose a 2D matrix
2. **Flatten with Condition**: Flatten matrix but only include even numbers
3. **Cartesian Product**: All pairs from two lists where sum > 5
4. **Dict Inversion**: Invert dict (values become keys), handle duplicates

**Example**:
```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transpose(matrix)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

flatten_even(matrix)  # [2, 4, 6, 8]

cartesian([1, 2, 3], [4, 5, 6])  # [(1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]

invert_dict({'a': 1, 'b': 2, 'c': 1})  # {1: ['a', 'c'], 2: ['b']}
```
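
One comprehension per task, sketched under the assumption that the matrix is non-empty and rectangular. For the Cartesian product, note that `(1, 5)` and `(1, 6)` also satisfy `sum > 5` and therefore belong in the result.

```python
def transpose(matrix):
    # Column i of the input becomes row i of the output
    return [[row[i] for row in matrix] for i in range(len(matrix[0]))]

def flatten_even(matrix):
    return [x for row in matrix for x in row if x % 2 == 0]

def cartesian(list1, list2, threshold=5):
    return [(a, b) for a in list1 for b in list2 if a + b > threshold]

def invert_dict(d):
    # dict.fromkeys deduplicates the values while keeping first-seen order
    return {
        value: [key for key, v in d.items() if v == value]
        for value in dict.fromkeys(d.values())
    }
```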

In [None]:
# YOUR SOLUTION HERE
def transpose(matrix):
    # Write using list comprehension
    pass

def flatten_even(matrix):
    # Write using list comprehension
    pass

def cartesian(list1, list2, threshold=5):
    # Write using list comprehension
    pass

def invert_dict(d):
    # Write using dict comprehension
    pass

# Test your solutions
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print("Transpose:", transpose(matrix))
print("Flatten even:", flatten_even(matrix))
print("Cartesian:", cartesian([1, 2, 3], [4, 5, 6]))
print("Invert:", invert_dict({'a': 1, 'b': 2, 'c': 1}))

### Problem 25: Function Combinator 🔴

**Topic**: Higher-order functions

**Description**:
Create function combinators:
1. `compose`: Combine functions right-to-left: `compose(f, g, h)(x)` = `f(g(h(x)))`
2. `pipe`: Combine functions left-to-right: `pipe(f, g, h)(x)` = `h(g(f(x)))`
3. `parallel`: Apply multiple functions and collect results: `parallel(f, g)(x)` = `[f(x), g(x)]`
4. `conditional`: Apply function based on predicate: `conditional(pred, f, g)(x)` = `f(x) if pred(x) else g(x)`

**Example**:
```python
add_2 = lambda x: x + 2
mult_3 = lambda x: x * 3
square = lambda x: x ** 2

compose(square, mult_3, add_2)(5)  # square(mult_3(add_2(5))) = square(21) = 441
pipe(add_2, mult_3, square)(5)  # square(mult_3(add_2(5))) = square(21) = 441
parallel(add_2, mult_3, square)(5)  # [7, 15, 25]
```
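
A short sketch of the four combinators. `pipe` is written as a fold, and `compose` reuses it with the functions reversed:

```python
from functools import reduce

def pipe(*functions):
    # Left-to-right: pipe(f, g, h)(x) == h(g(f(x)))
    return lambda x: reduce(lambda value, fn: fn(value), functions, x)

def compose(*functions):
    # Right-to-left: compose(f, g, h)(x) == f(g(h(x)))
    return pipe(*reversed(functions))

def parallel(*functions):
    return lambda x: [fn(x) for fn in functions]

def conditional(predicate, true_func, false_func):
    return lambda x: true_func(x) if predicate(x) else false_func(x)
```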

In [None]:
# YOUR SOLUTION HERE
def compose(*functions):
    # Write composer (right-to-left)
    pass

def pipe(*functions):
    # Write pipe (left-to-right)
    pass

def parallel(*functions):
    # Write parallel applicator
    pass

def conditional(predicate, true_func, false_func):
    # Write conditional applicator
    pass

# Test your solutions
add_2 = lambda x: x + 2
mult_3 = lambda x: x * 3
square = lambda x: x ** 2

print("Compose:", compose(square, mult_3, add_2)(5))
print("Pipe:", pipe(add_2, mult_3, square)(5))
print("Parallel:", parallel(add_2, mult_3, square)(5))
print("Conditional:", conditional(lambda x: x > 0, add_2, mult_3)(5))
print("Conditional:", conditional(lambda x: x > 0, add_2, mult_3)(-5))

---

# Part 3: Pandas Basics & Data Manipulation (15 Problems)

## Section F: Series & DataFrames (5 Problems)

### Problem 26: Series Statistics Calculator 🟢

**Topic**: Series creation and basic operations

**Description**:
Create a Pandas Series from a list and calculate comprehensive statistics:
- Input: List of numbers
- Output: Dictionary with: mean, median, std, var, min, max, q1, q3, skew

**Example**:
```python
import pandas as pd

series_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Output: {'mean': 5.5, 'median': 5.5, 'std': 3.03, 'var': 9.17, 'min': 1.0, 'max': 10.0, 'q1': 3.25, 'q3': 7.75, 'skew': 0.0}
```
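
A sketch built on the corresponding `pandas.Series` methods. By default `std` and `var` use the sample formulas (`ddof=1`), which is what the example values 3.03 and 9.17 reflect. Converting to `float` and rounding to 2 decimals are choices of this example.

```python
import pandas as pd

def series_stats(data):
    s = pd.Series(data, dtype='float64')
    stats = {
        'mean': s.mean(),
        'median': s.median(),
        'std': s.std(),          # sample standard deviation (ddof=1)
        'var': s.var(),          # sample variance (ddof=1)
        'min': s.min(),
        'max': s.max(),
        'q1': s.quantile(0.25),  # linear interpolation by default
        'q3': s.quantile(0.75),
        'skew': s.skew(),
    }
    # Convert NumPy scalars to plain floats and round for readability
    return {name: round(float(value), 2) for name, value in stats.items()}
```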

In [None]:
# YOUR SOLUTION HERE
import pandas as pd
import numpy as np

def series_stats(data):
    # Write your code here
    pass

# Test your solution
result = series_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(result)

### Problem 27: DataFrame from Multiple Sources 🟢

**Topic**: DataFrame creation from different data structures

**Description**:
Create a function that builds a DataFrame from multiple input formats:
- Dictionary of lists
- List of dictionaries
- List of tuples with column names
- Dictionary of Series

**Example**:
```python
# From dict of lists
create_dataframe({'A': [1, 2, 3], 'B': [4, 5, 6]})

# From list of dicts
create_dataframe([{'A': 1, 'B': 4}, {'A': 2, 'B': 5}, {'A': 3, 'B': 6}])

# Both produce:
#    A  B
# 0  1  4
# 1  2  5
# 2  3  6
```

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def create_dataframe(data, columns=None):
    # Write your code here
    # Handle different input types
    pass

# Test your solution
print("Method 1: Dict of lists")
print(create_dataframe({'A': [1, 2, 3], 'B': [4, 5, 6]}))

print("\nMethod 2: List of dicts")
print(create_dataframe([{'A': 1, 'B': 4}, {'A': 2, 'B': 5}, {'A': 3, 'B': 6}]))

### Problem 28: Custom Index Creator 🟡

**Topic**: DataFrame indexing and custom indices

**Description**:
Create a DataFrame with custom index based on requirements:
- Input: Data and index type ('numeric', 'date', 'custom')
- For 'numeric': Start from 100, step by 5
- For 'date': Daily dates from '2024-01-01'
- For 'custom': Use provided list as index

**Example**:
```python
data = {'Score': [85, 90, 78, 92, 88]}

create_with_index(data, 'numeric')
# Index: [100, 105, 110, 115, 120]

create_with_index(data, 'date')
# Index: DatetimeIndex(['2024-01-01', '2024-01-02', ...])

create_with_index(data, 'custom', custom_index=['A', 'B', 'C', 'D', 'E'])
# Index: ['A', 'B', 'C', 'D', 'E']
```
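
A possible sketch using `pd.RangeIndex` and `pd.date_range`. Raising `ValueError` for an unknown index type or a custom index of the wrong length is an assumption of this example.

```python
import pandas as pd

def create_with_index(data, index_type, custom_index=None):
    df = pd.DataFrame(data)
    n = len(df)
    if index_type == 'numeric':
        # 100, 105, 110, ... with one entry per row
        df.index = pd.RangeIndex(start=100, stop=100 + 5 * n, step=5)
    elif index_type == 'date':
        df.index = pd.date_range(start='2024-01-01', periods=n, freq='D')
    elif index_type == 'custom':
        if custom_index is None or len(custom_index) != n:
            raise ValueError("custom_index must have one label per row")
        df.index = pd.Index(custom_index)
    else:
        raise ValueError(f"unknown index_type: {index_type!r}")
    return df
```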

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def create_with_index(data, index_type, custom_index=None):
    # Write your code here
    pass

# Test your solution
data = {'Score': [85, 90, 78, 92, 88]}

print("Numeric index:")
print(create_with_index(data, 'numeric'))

print("\nDate index:")
print(create_with_index(data, 'date'))

print("\nCustom index:")
print(create_with_index(data, 'custom', custom_index=['A', 'B', 'C', 'D', 'E']))

### Problem 29: DataFrame Info Extractor 🟡

**Topic**: DataFrame inspection and metadata

**Description**:
Extract comprehensive information about a DataFrame:
- Shape, column names, dtypes, memory usage
- Missing values count per column
- Unique values count per column
- Basic statistics for numeric columns

**Example**:
```python
df = pd.DataFrame({
    'Age': [25, 30, None, 35, 40],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [50000, 60000, 70000, None, 90000]
})

extract_info(df)
# Output: {
#     'shape': (5, 3),
#     'columns': ['Age', 'Name', 'Salary'],
#     'dtypes': {'Age': 'float64', 'Name': 'object', 'Salary': 'float64'},
#     'missing_values': {'Age': 1, 'Name': 0, 'Salary': 1},
#     'unique_counts': {'Age': 4, 'Name': 5, 'Salary': 4},
#     'memory_mb': 0.00024,
#     'numeric_summary': {...}
# }
```
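
Most of the requested fields correspond to a single pandas call. The sketch below shows those building blocks on the example data; how they are assembled into the final dictionary is left open, and the variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, None, 35, 40],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [50000, 60000, 70000, None, 90000]
})

missing_values = df.isna().sum().to_dict()   # NaN/None count per column
unique_counts = df.nunique().to_dict()       # distinct non-null values per column
memory_mb = df.memory_usage(deep=True).sum() / 1024 ** 2  # bytes converted to MiB
numeric_summary = df.describe().to_dict()    # describe() keeps numeric columns by default

print(df.shape)
print(missing_values)
print(unique_counts)
```

The exact `memory_mb` value depends on the pandas version and platform, so it is best treated as approximate.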

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def extract_info(df):
    # Write your code here
    pass

# Test your solution
df = pd.DataFrame({
    'Age': [25, 30, None, 35, 40],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [50000, 60000, 70000, None, 90000]
})

info = extract_info(df)
print("DataFrame Info:")
for key, value in info.items():
    print(f"{key}: {value}")

### Problem 30: Row and Column Analyzer 🔴

**Topic**: Advanced DataFrame operations

**Description**:
Analyze rows and columns of a DataFrame:
- Find rows with any/all missing values
- Find columns with >50% missing data
- Identify duplicate rows
- Find numeric columns that are highly correlated (>0.9)
- Return detailed analysis dictionary

**Example**:
```python
df = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [None, None, None, None, 5],
    'C': [1, 2, 3, 4, 5],
    'D': [1.1, 2.1, 3.1, 4.1, 5.1]  # Highly correlated with C
})

analyze_dataframe(df)
# Output: {
#     'rows_with_missing': [0, 1, 2, 3],
#     'rows_all_missing': [],
#     'cols_over_50_missing': ['B'],
#     'duplicate_rows': [],
#     'highly_correlated_pairs': [('A', 'C', 1.0), ('A', 'D', 1.0), ('C', 'D', 1.0)]  # rounded; NaNs are skipped pairwise
# }
```
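
One possible way to compute these pieces, assuming pandas' default Pearson correlation from `DataFrame.corr`. Because `corr` skips missing values pair by pair, column `A` also comes out perfectly correlated with `C` and `D` on this data, while `B` (a single non-null value) yields NaN and is ignored:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [None, None, None, None, 5],
    'C': [1, 2, 3, 4, 5],
    'D': [1.1, 2.1, 3.1, 4.1, 5.1]
})

rows_with_missing = df[df.isna().any(axis=1)].index.tolist()
missing_share = df.isna().mean()                       # fraction missing per column
cols_over_50_missing = missing_share[missing_share > 0.5].index.tolist()
duplicate_rows = df[df.duplicated()].index.tolist()

corr = df.corr(numeric_only=True)
cols = list(corr.columns)
pairs = []
for i, left in enumerate(cols):
    for right in cols[i + 1:]:        # upper triangle only, so each pair appears once
        r = corr.loc[left, right]
        if pd.notna(r) and abs(r) > 0.9:
            pairs.append((left, right, round(float(r), 3)))

print(rows_with_missing, cols_over_50_missing, duplicate_rows)
print(pairs)
```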

In [None]:
# YOUR SOLUTION HERE
import pandas as pd
import numpy as np

def analyze_dataframe(df):
    # Write your code here
    pass

# Test your solution
df = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [None, None, None, None, 5],
    'C': [1, 2, 3, 4, 5],
    'D': [1.1, 2.1, 3.1, 4.1, 5.1]
})

analysis = analyze_dataframe(df)
print("DataFrame Analysis:")
for key, value in analysis.items():
    print(f"{key}: {value}")

## Section G: Data Selection & Filtering (5 Problems)

### Problem 31: Multi-Column Selector 🟢

**Topic**: Column selection techniques

**Description**:
Select columns from DataFrame based on different criteria:
- By data type (numeric, object, datetime)
- By name pattern (contains, starts with, ends with)
- By value threshold (mean > threshold)
- Return new DataFrame with selected columns

**Example**:
```python
df = pd.DataFrame({
    'age': [25, 30, 35],
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 60000, 70000],
    'bonus': [5000, 6000, 7000]
})

select_columns(df, method='numeric')
# Returns: DataFrame with ['age', 'salary', 'bonus']

select_columns(df, method='contains', pattern='sal')
# Returns: DataFrame with ['salary']
```
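
Each selection criterion has a direct pandas counterpart. A short sketch on the example data; the threshold of 10,000 is an arbitrary value chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'age': [25, 30, 35],
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 60000, 70000],
    'bonus': [5000, 6000, 7000]
})

numeric = df.select_dtypes(include='number')                # by data type
contains_sal = df.filter(like='sal')                        # name contains a substring
starts_with_b = df.loc[:, df.columns.str.startswith('b')]   # name prefix
ends_with_e = df.filter(regex='e$')                         # name suffix via regex

# Value threshold: keep numeric columns whose mean exceeds 10_000
means = numeric.mean()
high_mean = numeric[means[means > 10_000].index]

print(list(numeric.columns))
print(list(contains_sal.columns))
print(list(high_mean.columns))
```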

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def select_columns(df, method='all', pattern=None, dtype=None):
    # Write your code here
    pass

# Test your solution
df = pd.DataFrame({
    'age': [25, 30, 35],
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [50000, 60000, 70000],
    'bonus': [5000, 6000, 7000]
})

print("Numeric columns:")
print(select_columns(df, method='numeric'))

print("\nColumns containing 'sal':")
print(select_columns(df, method='contains', pattern='sal'))

### Problem 32: Advanced Row Filtering 🟡

**Topic**: Complex boolean indexing

**Description**:
Filter DataFrame rows using multiple conditions:
- Single condition (column > value)
- AND condition (col1 > val1 AND col2 < val2)
- OR condition (col1 > val1 OR col2 > val2)
- BETWEEN condition (val1 < col < val2)
- IN condition (col.isin([list]))
- String matching (col.str.contains())

**Example**:
```python
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'city': ['NYC', 'LA', 'NYC', 'Chicago'],
    'salary': [50000, 60000, 70000, 80000]
})

filter_rows(df, conditions={'age': ('>', 30), 'city': ('==', 'NYC')}, logic='AND')
# Returns rows where age > 30 AND city == 'NYC'
```
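
The condition format can be turned into boolean masks that are then combined with `&` or `|`. The following is one possible sketch, not the reference solution; `OPS` and `build_mask` are hypothetical names, and the 'in', 'contains', and 'between' operator strings are assumptions about how the extra condition types might be spelled:

```python
import operator
from functools import reduce

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'city': ['NYC', 'LA', 'NYC', 'Chicago'],
    'salary': [50000, 60000, 70000, 80000]
})

# Hypothetical lookup table: operator string -> function returning a boolean mask
OPS = {
    '>': operator.gt, '<': operator.lt, '>=': operator.ge,
    '<=': operator.le, '==': operator.eq, '!=': operator.ne,
    'in': lambda col, values: col.isin(values),
    'contains': lambda col, text: col.str.contains(text, regex=False),
    'between': lambda col, bounds: col.between(*bounds, inclusive='neither'),
}

def build_mask(frame, conditions, logic='AND'):
    """Combine one mask per condition with & (AND) or | (OR)."""
    masks = [OPS[op](frame[col], value) for col, (op, value) in conditions.items()]
    combine = operator.and_ if logic == 'AND' else operator.or_
    return reduce(combine, masks)

both = df[build_mask(df, {'age': ('>', 30), 'city': ('==', 'NYC')}, 'AND')]
either = df[build_mask(df, {'age': ('>', 35), 'city': ('==', 'LA')}, 'OR')]
mid_age = df[build_mask(df, {'age': ('between', (25, 40))})]

print(both['name'].tolist(), either['name'].tolist(), mid_age['name'].tolist())
```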

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def filter_rows(df, conditions, logic='AND'):
    # Write your code here
    # conditions format: {'column': ('operator', value)}
    pass

# Test your solution
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'city': ['NYC', 'LA', 'NYC', 'Chicago'],
    'salary': [50000, 60000, 70000, 80000]
})

print("Age > 30:")
print(filter_rows(df, {'age': ('>', 30)}, logic='AND'))

print("\nAge > 30 AND city == NYC:")
print(filter_rows(df, {'age': ('>', 30), 'city': ('==', 'NYC')}, logic='AND'))

### Problem 33: loc vs iloc Master 🟡

**Topic**: Label-based and position-based indexing

**Description**:
Practice both .loc and .iloc for various selection tasks:
1. Select specific rows and columns by label
2. Select specific rows and columns by position
3. Select rows based on condition and specific columns
4. Select every nth row
5. Select last 3 rows and first 2 columns

**Example**:
```python
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
}, index=['row1', 'row2', 'row3', 'row4', 'row5'])

# Using .loc
select_data(df, method='loc', rows=['row1', 'row3'], cols=['A', 'C'])
# Returns:
#       A    C
# row1  1  100
# row3  3  300

# Using .iloc
select_data(df, method='iloc', rows=[0, 2], cols=[0, 2])
# Same result but using positions
```
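
The five selection tasks can each be written as a single indexing expression. A brief sketch (variable names are illustrative), which also shows the slicing difference between the two accessors:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
}, index=['row1', 'row2', 'row3', 'row4', 'row5'])

by_label = df.loc[['row1', 'row3'], ['A', 'C']]   # 1. rows and columns by label
by_position = df.iloc[[0, 2], [0, 2]]             # 2. rows and columns by position
by_condition = df.loc[df['B'] > 20, ['A', 'C']]   # 3. boolean mask plus column labels
every_second = df.iloc[::2]                       # 4. every 2nd row
tail_block = df.iloc[-3:, :2]                     # 5. last 3 rows, first 2 columns

# .loc slices include the end label; .iloc slices exclude the end position
loc_slice = df.loc['row1':'row3']   # 3 rows
iloc_slice = df.iloc[0:2]           # 2 rows

print(by_label.equals(by_position))
```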

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def select_data(df, method='loc', rows=None, cols=None):
    # Write your code here
    pass

# Test your solution
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
}, index=['row1', 'row2', 'row3', 'row4', 'row5'])

print("Using .loc:")
print(select_data(df, method='loc', rows=['row1', 'row3'], cols=['A', 'C']))

print("\nUsing .iloc:")
print(select_data(df, method='iloc', rows=[0, 2], cols=[0, 2]))

### Problem 34: Query Method Expert 🟡

**Topic**: DataFrame.query() for readable filtering

**Description**:
Use the .query() method to filter DataFrames with string expressions:
- Simple comparisons
- Multiple conditions with 'and', 'or'
- Use of external variables with @
- String column operations

**Example**:
```python
df = pd.DataFrame({
    'age': [25, 30, 35, 40, 45],
    'salary': [50000, 60000, 70000, 80000, 90000],
    'department': ['IT', 'HR', 'IT', 'Sales', 'IT']
})

threshold = 65000

filter_with_query(df, "age > 30 and salary > @threshold")
# Returns rows where age > 30 and salary > 65000

filter_with_query(df, "department == 'IT'")
# Returns all IT department rows
```
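
One detail worth knowing: `@name` is resolved in the scope from which `.query()` is called, so a wrapper function may not see the caller's local variables. The sketch below shows a direct call and a hypothetical helper, `query_with_vars`, that forwards variables explicitly through the `local_dict` argument accepted by `pandas.eval`:

```python
import pandas as pd

df = pd.DataFrame({
    'age': [25, 30, 35, 40, 45],
    'salary': [50000, 60000, 70000, 80000, 90000],
    'department': ['IT', 'HR', 'IT', 'Sales', 'IT']
})

threshold = 65000

# Direct call: @threshold is found in the calling scope
direct = df.query("age > 30 and salary > @threshold")

def query_with_vars(frame, query_string, **variables):
    # Hypothetical helper: make @names available explicitly instead of
    # relying on what happens to be visible from inside this function
    return frame.query(query_string, local_dict=variables)

wrapped = query_with_vars(df, "age > 30 and salary > @threshold", threshold=65000)

# Membership test on a string column
it_or_sales = df.query("department in ['IT', 'Sales']")

print(direct.index.tolist(), it_or_sales.index.tolist())
```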

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def filter_with_query(df, query_string):
    # Write your code here
    pass

# Test your solution
df = pd.DataFrame({
    'age': [25, 30, 35, 40, 45],
    'salary': [50000, 60000, 70000, 80000, 90000],
    'department': ['IT', 'HR', 'IT', 'Sales', 'IT']
})

threshold = 65000

print("Age > 30 and salary > threshold:")
print(filter_with_query(df, "age > 30 and salary > @threshold"))

print("\nDepartment == 'IT':")
print(filter_with_query(df, "department == 'IT'"))

### Problem 35: Top N Selector 🔴

**Topic**: nlargest, nsmallest, and ranking

**Description**:
Create a function that finds top/bottom N rows based on multiple criteria:
- Top N by single column
- Top N by multiple columns (with priority)
- Bottom N with ties handling
- Percentage-based selection (top 20%)

**Example**:
```python
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'score': [85, 92, 88, 92, 78],
    'time': [120, 100, 110, 95, 130]
})

select_top_n(df, n=2, column='score', method='largest')
# Returns:
#      name  score  time
# 1     Bob     92   100
# 3   David     92    95

select_top_n(df, n=2, columns=['score', 'time'], method='largest')
# Columns are listed in priority order: sort by score (desc), then break ties by time (desc)
```
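
The selection variants can be sketched with `nlargest`, `nsmallest`, and `sort_values`. This is an illustration under the assumption that "top 20%" means 20% of the rows, rounded up; other readings (for example a quantile cut-off) are equally defensible:

```python
import math

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'score': [85, 92, 88, 92, 78],
    'time': [120, 100, 110, 95, 130]
})

top2 = df.nlargest(2, 'score')        # top N by one column
bottom2 = df.nsmallest(2, 'time')     # bottom N by one column

# Several columns: earlier columns take priority, later ones break ties
top2_multi = df.sort_values(['score', 'time'], ascending=[False, False]).head(2)

# keep='all' returns every row tied with the last selected one
top1_with_ties = df.nlargest(1, 'score', keep='all')

# Percentage-based: top 20% of rows, at least one row
n_rows = max(1, math.ceil(len(df) * 0.20))
top_20_pct = df.nlargest(n_rows, 'score')

print(top2['name'].tolist(), bottom2['name'].tolist(), len(top1_with_ties))
```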

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def select_top_n(df, n=5, column=None, columns=None, method='largest', keep='all'):
    # Write your code here
    pass

# Test your solution
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'score': [85, 92, 88, 92, 78],
    'time': [120, 100, 110, 95, 130]
})

print("Top 2 by score:")
print(select_top_n(df, n=2, column='score', method='largest'))

print("\nBottom 2 by time:")
print(select_top_n(df, n=2, column='time', method='smallest'))

## Section H: GroupBy & Aggregation (5 Problems)

### Problem 36: Basic GroupBy Aggregation 🟢

**Topic**: Single column groupby with simple aggregations

**Description**:
Group data by a column and apply aggregations:
- Sum, mean, median, count, std, min, max
- Custom aggregation functions
- Return dictionary with aggregation results

**Example**:
```python
df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'value': [10, 20, 30, 40, 50, 60],
    'count': [1, 2, 3, 4, 5, 6]
})

group_and_aggregate(df, groupby='category', agg_col='value', agg_func='mean')
# Output:
# {
#     'A': 30.0,
#     'B': 30.0,
#     'C': 60.0
# }
```
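
A grouped column accepts either a function name or any callable that reduces a Series to one value. A minimal sketch on the example data (the "range" aggregation is just an illustrative custom function):

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'value': [10, 20, 30, 40, 50, 60],
    'count': [1, 2, 3, 4, 5, 6]
})

grouped = df.groupby('category')['value']

mean_by_cat = grouped.agg('mean').to_dict()
sum_by_cat = grouped.agg('sum').to_dict()

# Custom aggregation: spread between the largest and smallest value per group
range_by_cat = grouped.agg(lambda s: s.max() - s.min()).to_dict()

print(mean_by_cat)
print(sum_by_cat)
print(range_by_cat)
```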

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def group_and_aggregate(df, groupby, agg_col, agg_func='mean'):
    # Write your code here
    pass

# Test your solution
df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'C'],
    'value': [10, 20, 30, 40, 50, 60],
    'count': [1, 2, 3, 4, 5, 6]
})

print("Mean by category:")
print(group_and_aggregate(df, groupby='category', agg_col='value', agg_func='mean'))

print("\nSum by category:")
print(group_and_aggregate(df, groupby='category', agg_col='value', agg_func='sum'))

### Problem 37: Multiple Aggregations 🟡

**Topic**: Apply multiple aggregation functions at once

**Description**:
Group by a column and apply several aggregations to different columns:
- Use .agg() with dictionary
- Named aggregations
- Return DataFrame with multi-level columns

**Example**:
```python
df = pd.DataFrame({
    'department': ['IT', 'HR', 'IT', 'HR', 'Sales'],
    'employee': ['A', 'B', 'C', 'D', 'E'],
    'salary': [50000, 45000, 55000, 48000, 52000],
    'experience': [5, 3, 7, 4, 6]
})

multi_agg(df, groupby='department')
# Returns DataFrame with:
#              salary                experience
#                mean    max    min        mean  max
# department
# HR          46500.0  48000  45000         3.5    4
# IT          52500.0  55000  50000         6.0    7
# Sales       52000.0  52000  52000         6.0    6
```
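
The two `.agg()` styles listed above produce differently shaped columns: the dictionary form gives two-level columns, while named aggregation gives flat ones. A short sketch (the output names `avg_salary` and `max_experience` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['IT', 'HR', 'IT', 'HR', 'Sales'],
    'employee': ['A', 'B', 'C', 'D', 'E'],
    'salary': [50000, 45000, 55000, 48000, 52000],
    'experience': [5, 3, 7, 4, 6]
})

# Dictionary form: column -> list of functions, giving (column, function) pairs
multi = df.groupby('department').agg({
    'salary': ['mean', 'max', 'min'],
    'experience': ['mean', 'max'],
})

# Named aggregation: output_name=(column, function), giving flat column names
named = df.groupby('department').agg(
    avg_salary=('salary', 'mean'),
    max_experience=('experience', 'max'),
)

print(multi)
print(named)
```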

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def multi_agg(df, groupby):
    # Write your code here
    # Apply multiple aggregations
    pass

# Test your solution
df = pd.DataFrame({
    'department': ['IT', 'HR', 'IT', 'HR', 'Sales'],
    'employee': ['A', 'B', 'C', 'D', 'E'],
    'salary': [50000, 45000, 55000, 48000, 52000],
    'experience': [5, 3, 7, 4, 6]
})

result = multi_agg(df, groupby='department')
print(result)

### Problem 38: Transform and Apply 🟡

**Topic**: GroupBy transform and apply methods

**Description**:
Use transform and apply for group-wise operations:
- Transform: Normalize values within each group (subtract group mean, divide by group std)
- Apply: Custom function to each group
- Return transformed DataFrame

**Example**:
```python
df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'value': [10, 20, 30, 40, 50, 60]
})

# Normalize within groups
group_normalize(df, groupby='group', column='value')
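# Returns DataFrame with normalized values (population std, ddof=0;
# pandas' default sample std, ddof=1, would give -0.707107 / 0.707107 instead):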
#   group  value  normalized
# 0     A     10       -1.0
# 1     A     20        1.0
# 2     B     30       -1.0
# 3     B     40        1.0
# 4     C     50       -1.0
# 5     C     60        1.0
```
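
The difference between `transform` and `apply` is mainly the shape of the result: `transform` returns one value per original row, while `apply` here reduces each group to one value. The sketch also shows that the choice of `ddof` changes the normalized values, since pandas' `std()` defaults to the sample standard deviation:

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'value': [10, 20, 30, 40, 50, 60]
})

grouped = df.groupby('group')['value']

# transform: result is aligned with the original rows
z_population = grouped.transform(lambda s: (s - s.mean()) / s.std(ddof=0))
z_sample = grouped.transform(lambda s: (s - s.mean()) / s.std())  # default ddof=1

# apply: one value per group (here the spread within each group)
value_range = grouped.apply(lambda s: s.max() - s.min())

result = df.assign(normalized=z_population)
print(result)
```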

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def group_normalize(df, groupby, column):
    # Write your code here
    # Use transform to normalize within groups
    pass

# Test your solution
df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'value': [10, 20, 30, 40, 50, 60]
})

result = group_normalize(df, groupby='group', column='value')
print(result)

### Problem 39: Pivot Table Creator 🟡

**Topic**: Pivot tables and cross-tabulation

**Description**:
Create pivot tables from DataFrames:
- Simple pivot with one index and one column
- Multiple aggregations
- Add margins (totals)
- Fill missing values

**Example**:
```python
df = pd.DataFrame({
    'date': ['2024-01', '2024-01', '2024-02', '2024-02'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [100, 150, 120, 180],
    'quantity': [10, 15, 12, 18]
})

create_pivot(df, index='date', columns='product', values='sales')
# Returns:
# product     A      B
# date               
# 2024-01   100    150
# 2024-02   120    180
```
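
`pd.pivot_table` covers the aggregation, fill, and margin options in one call, and `pd.crosstab` handles plain frequency counts. A minimal sketch; the margin label 'Total' is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01', '2024-01', '2024-02', '2024-02'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [100, 150, 120, 180],
    'quantity': [10, 15, 12, 18]
})

# Basic pivot; fill_value replaces cells that have no data
pivot = pd.pivot_table(df, index='date', columns='product',
                       values='sales', aggfunc='sum', fill_value=0)

# margins=True appends row and column totals
with_totals = pd.pivot_table(df, index='date', columns='product',
                             values='sales', aggfunc='sum',
                             margins=True, margins_name='Total')

# Cross-tabulation: how often each (date, product) combination occurs
counts = pd.crosstab(df['date'], df['product'])

print(with_totals)
```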

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def create_pivot(df, index, columns, values, aggfunc='sum', fill_value=None):
    # Write your code here
    pass

# Test your solution
df = pd.DataFrame({
    'date': ['2024-01', '2024-01', '2024-02', '2024-02'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [100, 150, 120, 180],
    'quantity': [10, 15, 12, 18]
})

print("Pivot table:")
print(create_pivot(df, index='date', columns='product', values='sales'))

### Problem 40: Advanced GroupBy with Filtering 🔴

**Topic**: Complex groupby operations with filtering

**Description**:
Perform advanced groupby operations:
- Filter groups based on group properties (size, sum, etc.)
- Rank within groups
- Calculate percentages within groups
- Find top N within each group

**Example**:
```python
df = pd.DataFrame({
    'store': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'product': ['X', 'Y', 'Z', 'X', 'Y', 'X', 'Y', 'Z', 'W'],
    'sales': [100, 150, 200, 120, 180, 90, 110, 130, 140]
})

advanced_groupby(df, groupby='store', operations={
    'filter_size': 3,  # Keep only groups with 3+ items
    'top_n': 2,  # Top 2 products per store
    'rank': True,  # Add rank within group
    'pct': True  # Add percentage of total within group
})
```
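
The four operations can be chained step by step. The sketch below is one possible ordering (filter first, then rank and percentage, then top N); the column names `rank` and `pct` and the dense ranking method are assumptions, since the assignment does not fix them:

```python
import pandas as pd

df = pd.DataFrame({
    'store': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'product': ['X', 'Y', 'Z', 'X', 'Y', 'X', 'Y', 'Z', 'W'],
    'sales': [100, 150, 200, 120, 180, 90, 110, 130, 140]
})

# Keep only stores with at least 3 rows
kept = df.groupby('store').filter(lambda g: len(g) >= 3).copy()

# Rank within each store, highest sales first
kept['rank'] = kept.groupby('store')['sales'].rank(ascending=False, method='dense')

# Share of each row in its store's total sales, in percent
kept['pct'] = kept['sales'] / kept.groupby('store')['sales'].transform('sum') * 100

# Top 2 products per store
top2 = (kept.sort_values(['store', 'sales'], ascending=[True, False])
            .groupby('store')
            .head(2))

print(top2)
```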

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def advanced_groupby(df, groupby, operations):
    # Write your code here
    # Implement filter, rank, top N, percentages
    pass

# Test your solution
df = pd.DataFrame({
    'store': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'product': ['X', 'Y', 'Z', 'X', 'Y', 'X', 'Y', 'Z', 'W'],
    'sales': [100, 150, 200, 120, 180, 90, 110, 130, 140]
})

operations = {'filter_size': 3, 'top_n': 2, 'rank': True}
result = advanced_groupby(df, groupby='store', operations=operations)
print(result)

---

# Part 4: Pandas Advanced Operations (10 Problems)

## Section I: Missing Data (3 Problems)

### Problem 41: Missing Data Detector 🟢

**Topic**: Identifying missing values

**Description**:
Create comprehensive missing data analysis:
- Count missing values per column
- Percentage of missing values
- Identify patterns (missing in pairs, etc.)
- Visualize missing data (text-based representation)

**Example**:
```python
df = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [None, 2, 3, None, 5],
    'C': [1, 2, 3, 4, 5]
})

analyze_missing(df)
# Output: {
#     'total_missing': 3,
#     'by_column': {'A': 1, 'B': 2, 'C': 0},
#     'percentage': {'A': 20.0, 'B': 40.0, 'C': 0.0},
#     'rows_with_missing': [0, 2, 3],
#     'complete_rows': 2
# }
```
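
All of these statistics derive from the boolean mask returned by `isna()`. A short sketch, including one possible text-based view in which 'X' marks a missing cell (the symbols are an arbitrary choice):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [None, 2, 3, None, 5],
    'C': [1, 2, 3, 4, 5]
})

mask = df.isna()                                   # True where a value is missing

by_column = mask.sum().to_dict()
percentage = (mask.sum() * 100 / len(df)).to_dict()
total_missing = int(mask.sum().sum())
rows_with_missing = df[mask.any(axis=1)].index.tolist()
complete_rows = int((~mask.any(axis=1)).sum())

# Text-based view: 'X' = missing, '.' = present
text_map = pd.DataFrame(np.where(mask, 'X', '.'), index=df.index, columns=df.columns)
print(text_map.to_string())
```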

In [None]:
# YOUR SOLUTION HERE
import pandas as pd
import numpy as np

def analyze_missing(df):
    # Write your code here
    pass

# Test your solution
df = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [None, 2, 3, None, 5],
    'C': [1, 2, 3, 4, 5]
})

result = analyze_missing(df)
print("Missing Data Analysis:")
for key, value in result.items():
    print(f"{key}: {value}")

### Problem 42: Smart Missing Value Imputer 🟡

**Topic**: Filling missing values with different strategies

**Description**:
Implement multiple imputation strategies:
- Fill with mean/median/mode
- Forward fill (ffill) / Backward fill (bfill)
- Interpolate (linear, polynomial)
- Fill with group mean (groupby then fill)
- Fill with predictive model (simple linear regression)

**Example**:
```python
df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value': [10, None, 30, 40, None, 60]
})

impute_missing(df, column='value', method='mean')
# Fills with overall mean: 35

impute_missing(df, column='value', method='group_mean', groupby='group')
# Fills with group mean: A=20, B=50
```
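
Most of the listed strategies are one-liners on a Series. The sketch below covers the statistical, directional, interpolation, and group-based fills; the regression-based strategy is not shown here:

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value': [10.0, None, 30.0, 40.0, None, 60.0]
})
values = df['value']

filled_mean = values.fillna(values.mean())
filled_median = values.fillna(values.median())
filled_ffill = values.ffill()                        # carry the previous value forward
filled_bfill = values.bfill()                        # pull the next value backward
filled_linear = values.interpolate(method='linear')  # straight line between neighbours

# Group mean: transform keeps the original row alignment, so fillna can use it directly
group_means = df.groupby('group')['value'].transform('mean')
filled_group = values.fillna(group_means)

print(filled_mean.tolist())
print(filled_group.tolist())
```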

In [None]:
# YOUR SOLUTION HERE
import pandas as pd
import numpy as np

def impute_missing(df, column, method='mean', groupby=None):
    # Write your code here
    # Implement different imputation strategies
    pass

# Test your solution
df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value': [10.0, None, 30.0, 40.0, None, 60.0]
})

print("Original:")
print(df)

print("\nImpute with mean:")
print(impute_missing(df, column='value', method='mean'))

print("\nImpute with group mean:")
print(impute_missing(df, column='value', method='group_mean', groupby='group'))

### Problem 43: Missing Data Dropper 🟡

**Topic**: Removing rows/columns with missing values

**Description**:
Implement smart dropping strategies:
- Drop rows with any missing values
- Drop rows with all missing values
- Drop rows with > threshold% missing
- Drop columns with > threshold% missing
- Keep only rows with at least N non-null values

**Example**:
```python
df = pd.DataFrame({
    'A': [1, None, None, 4],
    'B': [None, None, None, None],
    'C': [1, 2, 3, 4],
    'D': [1, 2, None, 4]
})

drop_missing(df, axis=0, how='any', thresh=None)
# Drops all four rows [0, 1, 2, 3]: column B is missing in every row

drop_missing(df, axis=1, how='all')
# Drops column B (all missing)
```
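
`dropna` handles the any/all/minimum-count cases, while percentage thresholds are easier to express with a boolean mask. A minimal sketch, with `max_missing` as an illustrative parameter name. Note that pandas 2.x is expected to reject calls that set both `how` and `thresh`, so a wrapper would forward only one of them:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, None, 4],
    'B': [None, None, None, None],
    'C': [1, 2, 3, 4],
    'D': [1, 2, None, 4]
})

rows_any = df.dropna(axis=0, how='any')    # drop rows with at least one missing value
rows_all = df.dropna(axis=0, how='all')    # drop rows where every value is missing
cols_all = df.dropna(axis=1, how='all')    # drop columns that are entirely missing
rows_min3 = df.dropna(axis=0, thresh=3)    # keep rows with at least 3 non-null values

# Percentage thresholds via the share of missing values
max_missing = 0.5
col_share = df.isna().mean()               # fraction missing per column
cols_kept = df.loc[:, col_share <= max_missing]
row_share = df.isna().mean(axis=1)         # fraction missing per row
rows_kept = df.loc[row_share <= max_missing]

print(len(rows_any), list(cols_all.columns), rows_min3.index.tolist())
```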

In [None]:
# YOUR SOLUTION HERE
import pandas as pd
import numpy as np

def drop_missing(df, axis=0, how='any', thresh=None, subset=None):
    # Write your code here
    pass

# Test your solution
df = pd.DataFrame({
    'A': [1, None, None, 4],
    'B': [None, None, None, None],
    'C': [1, 2, 3, 4],
    'D': [1, 2, None, 4]
})

print("Original:")
print(df)

print("\nDrop rows with any missing:")
print(drop_missing(df, axis=0, how='any'))

print("\nDrop columns with all missing:")
print(drop_missing(df, axis=1, how='all'))

## Section J: Merge & Join (4 Problems)

### Problem 44: Basic Merge Operations 🟢

**Topic**: pd.merge() with different join types

**Description**:
Implement all four join types:
- Inner join (intersection)
- Left join (all from left)
- Right join (all from right)
- Outer join (union)

**Example**:
```python
df1 = pd.DataFrame({
    'key': ['A', 'B', 'C'],
    'value1': [1, 2, 3]
})

df2 = pd.DataFrame({
    'key': ['B', 'C', 'D'],
    'value2': [4, 5, 6]
})

merge_dataframes(df1, df2, on='key', how='inner')
# Returns:
#   key  value1  value2
# 0   B       2       4
# 1   C       3       5
```
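
The four join types differ only in the `how` argument, so they can be compared side by side. A short sketch; the renamed column `code` is a made-up example of keys that are named differently in each frame:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})

# One merge per join type
joins = {how: pd.merge(df1, df2, on='key', how=how)
         for how in ('inner', 'left', 'right', 'outer')}

for how, merged in joins.items():
    print(how, merged['key'].tolist())

# Different key names on each side: use left_on / right_on
df2_renamed = df2.rename(columns={'key': 'code'})
by_different_names = pd.merge(df1, df2_renamed, left_on='key', right_on='code', how='inner')
```

Rows without a match on the other side receive NaN in the columns that come from that side.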

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def merge_dataframes(df1, df2, on=None, how='inner', left_on=None, right_on=None):
    # Write your code here
    pass

# Test your solution
df1 = pd.DataFrame({
    'key': ['A', 'B', 'C'],
    'value1': [1, 2, 3]
})

df2 = pd.DataFrame({
    'key': ['B', 'C', 'D'],
    'value2': [4, 5, 6]
})

print("Inner join:")
print(merge_dataframes(df1, df2, on='key', how='inner'))

print("\nLeft join:")
print(merge_dataframes(df1, df2, on='key', how='left'))

print("\nOuter join:")
print(merge_dataframes(df1, df2, on='key', how='outer'))

### Problem 45: Concat and Join 🟡

**Topic**: pd.concat() and DataFrame.join()

**Description**:
Concatenate DataFrames vertically and horizontally:
- Vertical concatenation (stacking rows)
- Horizontal concatenation (adding columns)
- Join on index
- Handle mismatched indices

**Example**:
```python
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

concat_dataframes(df1, df2, axis=0)
# Returns:
#    A  B
# 0  1  3
# 1  2  4
# 0  5  7
# 1  6  8

concat_dataframes(df1, df2, axis=1)
# Returns:
#    A  B  A  B
# 0  1  3  5  7
# 1  2  4  6  8
```
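
The behaviours listed above can be sketched with `pd.concat` and `DataFrame.join`. The `shifted` frame is a made-up example of mismatched indices, and the suffixes are arbitrary:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

stacked = pd.concat([df1, df2], axis=0)                        # keeps original labels 0, 1, 0, 1
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)  # fresh labels 0..3
side_by_side = pd.concat([df1, df2], axis=1)                   # duplicate column names are kept

# join aligns on the index; suffixes disambiguate overlapping column names
joined = df1.join(df2, lsuffix='_left', rsuffix='_right')

# Mismatched indices: rows are aligned by label
shifted = df2.copy()
shifted.index = [1, 2]
outer = pd.concat([df1, shifted], axis=1)                # union of labels, gaps become NaN
inner = pd.concat([df1, shifted], axis=1, join='inner')  # only labels present in both

print(outer)
```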

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def concat_dataframes(df1, df2, axis=0, ignore_index=False):
    # Write your code here
    pass

# Test your solution
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

print("Vertical concatenation:")
print(concat_dataframes(df1, df2, axis=0))

print("\nHorizontal concatenation:")
print(concat_dataframes(df1, df2, axis=1))

### Problem 46: Merge with Indicators 🟡

**Topic**: Advanced merge operations

**Description**:
Merge with indicator column to track source:
- Add _merge column
- Identify whether each row is `left_only`, `right_only`, or `both`
- Handle suffixes for overlapping columns
- Validate merge results

**Example**:
```python
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Charlie', 'David']
})

df2 = pd.DataFrame({
    'id': [2, 3, 4, 5],
    'score': [85, 90, 78, 92]
})

merge_with_indicator(df1, df2, on='id')
# Returns DataFrame with _merge column showing:
# 'left_only', 'right_only', or 'both'
```
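
`pd.merge` supports all four points through its `indicator`, `suffixes`, and `validate` arguments. A minimal sketch; `df2_named` and its abbreviated names are invented purely to demonstrate suffix handling:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'name': ['Alice', 'Bob', 'Charlie', 'David']})
df2 = pd.DataFrame({'id': [2, 3, 4, 5],
                    'score': [85, 90, 78, 92]})

# indicator=True adds the _merge column; validate raises if keys are not unique
merged = pd.merge(df1, df2, on='id', how='outer',
                  indicator=True, validate='one_to_one')

sources = merged['_merge'].tolist()
left_only_ids = merged.loc[merged['_merge'] == 'left_only', 'id'].tolist()

# Overlapping non-key columns receive the given suffixes
df2_named = df2.assign(name=['B.', 'C.', 'D.', 'E.'])
suffixed = pd.merge(df1, df2_named, on='id', how='inner',
                    suffixes=('_left', '_right'))

print(merged)
```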

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def merge_with_indicator(df1, df2, on, how='outer'):
    # Write your code here
    pass

# Test your solution
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Charlie', 'David']
})

df2 = pd.DataFrame({
    'id': [2, 3, 4, 5],
    'score': [85, 90, 78, 92]
})

result = merge_with_indicator(df1, df2, on='id')
print(result)

### Problem 47: Multi-Key Merge 🔴

**Topic**: Merging on multiple columns

**Description**:
Merge DataFrames on multiple keys:
- Merge on 2+ columns
- Handle composite keys
- Different key names in each DataFrame
- Validate merge integrity

**Example**:
```python
sales = pd.DataFrame({
    'year': [2023, 2023, 2024, 2024],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'product': ['A', 'A', 'A', 'A'],
    'revenue': [100, 150, 120, 180]
})

targets = pd.DataFrame({
    'year': [2023, 2023, 2024, 2024],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'target': [90, 140, 110, 170]
})

merge_multi_key(sales, targets, on=['year', 'quarter'])
# Merges on both keys ('product' exists only in sales, so it cannot be a merge key)
```
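
A composite key is simply a list of column names, and `validate` checks that the key combination is unique where expected. A short sketch; the `fiscal_year` and `fiscal_quarter` names and the `gap` column are invented for illustration:

```python
import pandas as pd

sales = pd.DataFrame({
    'year': [2023, 2023, 2024, 2024],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'product': ['A', 'A', 'A', 'A'],
    'revenue': [100, 150, 120, 180]
})
targets = pd.DataFrame({
    'year': [2023, 2023, 2024, 2024],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'target': [90, 140, 110, 170]
})

# Same key names on both sides; each (year, quarter) pair must be unique
same_names = pd.merge(sales, targets, on=['year', 'quarter'],
                      how='inner', validate='one_to_one')
same_names['gap'] = same_names['revenue'] - same_names['target']

# Different key names: pair the columns up positionally
targets_renamed = targets.rename(columns={'year': 'fiscal_year',
                                          'quarter': 'fiscal_quarter'})
diff_names = pd.merge(sales, targets_renamed,
                      left_on=['year', 'quarter'],
                      right_on=['fiscal_year', 'fiscal_quarter'],
                      how='inner', validate='one_to_one')

print(same_names)
```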

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def merge_multi_key(df1, df2, on=None, how='inner', validate=None):
    # Write your code here
    pass

# Test your solution
sales = pd.DataFrame({
    'year': [2023, 2023, 2024, 2024],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'product': ['A', 'A', 'A', 'A'],
    'revenue': [100, 150, 120, 180]
})

targets = pd.DataFrame({
    'year': [2023, 2023, 2024, 2024],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'target': [90, 140, 110, 170]
})

result = merge_multi_key(sales, targets, on=['year', 'quarter'])
print(result)

## Section K: Advanced Transformations (3 Problems)

### Problem 48: String Operations 🟡

**Topic**: String methods in pandas

**Description**:
Perform various string operations on DataFrame columns:
- Extract patterns (regex)
- Split strings into multiple columns
- Clean whitespace
- Change case
- Replace patterns

**Example**:
```python
df = pd.DataFrame({
    'email': ['alice@gmail.com', 'bob@yahoo.com', 'charlie@gmail.com'],
    'phone': ['123-456-7890', '234-567-8901', '345-678-9012']
})

string_operations(df, column='email', operation='extract_domain')
# Returns: ['gmail.com', 'yahoo.com', 'gmail.com']

string_operations(df, column='phone', operation='remove_hyphens')
# Returns: ['1234567890', '2345678901', '3456789012']
```
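
The `.str` accessor provides vectorized versions of these operations. A minimal sketch; the column names `area`, `prefix`, and `line` and the small `names` Series are made up for the example:

```python
import pandas as pd

df = pd.DataFrame({
    'email': ['alice@gmail.com', 'bob@yahoo.com', 'charlie@gmail.com'],
    'phone': ['123-456-7890', '234-567-8901', '345-678-9012']
})

# Extract the domain: split on '@', or capture it with a regex group
domains = df['email'].str.split('@').str[1]
domains_regex = df['email'].str.extract(r'@(.+)$', expand=False)

# Replace a literal pattern
digits_only = df['phone'].str.replace('-', '', regex=False)

# Split one column into several
phone_parts = df['phone'].str.split('-', expand=True)
phone_parts.columns = ['area', 'prefix', 'line']

# Clean whitespace and change case
names = pd.Series(['  Alice ', 'BOB'])
cleaned = names.str.strip().str.lower()

print(domains.tolist())
print(digits_only.tolist())
```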

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def string_operations(df, column, operation, pattern=None):
    # Write your code here
    # Implement different string operations
    pass

# Test your solution
df = pd.DataFrame({
    'email': ['alice@gmail.com', 'bob@yahoo.com', 'charlie@gmail.com'],
    'phone': ['123-456-7890', '234-567-8901', '345-678-9012']
})

print("Extract domain from email:")
print(string_operations(df, column='email', operation='extract_domain'))

print("\nRemove hyphens from phone:")
print(string_operations(df, column='phone', operation='remove_hyphens'))

### Problem 49: DateTime Operations 🔴

**Topic**: Working with dates and times

**Description**:
Perform datetime operations:
- Parse strings to datetime
- Extract components (year, month, day, hour)
- Calculate date differences
- Resample time series
- Create date ranges

**Example**:
```python
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-15', '2024-02-01'],
    'value': [100, 150, 200]
})

datetime_ops(df, column='date', operation='extract_month')
# Returns: [1, 1, 2]

datetime_ops(df, column='date', operation='days_since_first')
# Returns: [0, 14, 31]
```
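
Once the strings are parsed with `pd.to_datetime`, the remaining operations go through the `.dt` accessor or a datetime index. A short sketch; the month-start resampling frequency is an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-15', '2024-02-01'],
    'value': [100, 150, 200]
})

dates = pd.to_datetime(df['date'])                  # parse strings to datetime64

months = dates.dt.month.tolist()                    # extract a component
days_since_first = (dates - dates.min()).dt.days.tolist()  # date differences in days

# Resample: total value per calendar month (labelled by month start)
monthly_total = df.set_index(dates)['value'].resample('MS').sum()

# Date range: seven consecutive days
week = pd.date_range('2024-01-01', periods=7, freq='D')

print(months, days_since_first)
print(monthly_total)
```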

In [None]:
# YOUR SOLUTION HERE
import pandas as pd

def datetime_ops(df, column, operation):
    # Write your code here
    # Convert to datetime and perform operations
    pass

# Test your solution
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-15', '2024-02-01'],
    'value': [100, 150, 200]
})

print("Extract month:")
print(datetime_ops(df, column='date', operation='extract_month'))

print("\nDays since first date:")
print(datetime_ops(df, column='date', operation='days_since_first'))

### Problem 50: ML Feature Engineering Pipeline 🔴

**Topic**: Complete data preprocessing pipeline

**Description**:
Create a comprehensive feature engineering pipeline:
1. Handle missing values
2. Encode categorical variables
3. Scale numerical features
4. Create interaction features
5. Remove outliers
6. Split into train/test sets

**Example**:
```python
df = pd.DataFrame({
    'age': [25, 30, None, 40, 100],  # Has missing and outlier
    'income': [50000, 60000, 70000, 80000, 90000],
    'category': ['A', 'B', 'A', 'C', 'B'],  # Categorical
    'target': [0, 1, 0, 1, 0]
})

pipeline = FeatureEngineeringPipeline()
X_train, X_test, y_train, y_test = pipeline.fit_transform(
    df, 
    target_col='target',
    test_size=0.2
)

# Returns:
# - Cleaned data
# - Encoded categories
# - Scaled features
# - Train/test split
```
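
The six steps can be sketched with pandas alone, without the class wrapper. This is an illustration under several assumptions, not the reference solution: median imputation, a 1.5 × IQR rule on `age` for outliers, one-hot encoding, z-score scaling, and a random split via `sample`. The steps are also reordered so that outliers are removed before scaling and the scaling statistics come from the training rows only:

```python
import pandas as pd

df = pd.DataFrame({
    'age': [25, 30, None, 40, 100],
    'income': [50000, 60000, 70000, 80000, 90000],
    'category': ['A', 'B', 'A', 'C', 'B'],
    'target': [0, 1, 0, 1, 0]
})

# 1. Missing values: fill with the column median (assumed strategy)
df['age'] = df['age'].fillna(df['age'].median())

# 5. Outliers: keep rows whose age lies within 1.5 * IQR of the quartiles
q1, q3 = df['age'].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df['age'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# 4. Interaction feature built from the raw numeric columns
df = df.assign(age_x_income=df['age'] * df['income'])

# 2. One-hot encode the categorical column
X = pd.get_dummies(df.drop(columns='target'), columns=['category'], dtype=float)
y = df['target']

# 6. Split: sample the test rows, use the remaining rows for training
X_test = X.sample(frac=0.2, random_state=0)
X_train = X.drop(X_test.index)
y_train, y_test = y.loc[X_train.index], y.loc[X_test.index]

# 3. Scale numeric columns using training statistics only
numeric_cols = ['age', 'income', 'age_x_income']
mean, std = X_train[numeric_cols].mean(), X_train[numeric_cols].std()

def scale(frame):
    scaled = frame.copy()
    scaled[numeric_cols] = (frame[numeric_cols] - mean) / std
    return scaled

X_train, X_test = scale(X_train), scale(X_test)
print(X_train.shape, X_test.shape)
```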

In [None]:
# YOUR SOLUTION HERE
import pandas as pd
import numpy as np

class FeatureEngineeringPipeline:
    def __init__(self):
        self.encoders = {}
        self.scalers = {}
    
    def fit_transform(self, df, target_col, test_size=0.2):
        # Write your complete pipeline here
        # 1. Handle missing values
        # 2. Encode categorical variables
        # 3. Scale numerical features
        # 4. Create interaction features
        # 5. Remove outliers
        # 6. Split into train/test
        pass

# Test your solution
df = pd.DataFrame({
    'age': [25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
    'income': [50000, 60000, 70000, 80000, 90000, 55000, 65000, 75000, 85000, 95000],
    'category': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C'],
    'target': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
})

pipeline = FeatureEngineeringPipeline()
X_train, X_test, y_train, y_test = pipeline.fit_transform(df, target_col='target', test_size=0.2)

print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")

---

# 🎉 Congratulations! You've Completed All 50 Homework Assignments!

## Summary:

### Part 1: Python Data Structures (15 problems)
✅ **Tuples** (5 problems) - 🟢🟢🟡🟡🔴  
✅ **Sets** (5 problems) - 🟢🟡🟡🟡🔴  
✅ **Dictionaries** (5 problems) - 🟢🟡🟡🔴🔴  

### Part 2: Python Functions & Functional Programming (10 problems)
✅ **Functions & Decorators** (5 problems) - 🟡🔴🟡🔴🔴  
✅ **Functional Programming** (5 problems) - 🟢🟡🟡🔴🔴  

### Part 3: Pandas Basics & Data Manipulation (15 problems)
✅ **Series & DataFrames** (5 problems) - 🟢🟢🟡🟡🔴  
✅ **Data Selection & Filtering** (5 problems) - 🟢🟡🟡🟡🔴  
✅ **GroupBy & Aggregation** (5 problems) - 🟢🟡🟡🟡🔴  

### Part 4: Pandas Advanced Operations (10 problems)
✅ **Missing Data** (3 problems) - 🟢🟡🟡  
✅ **Merge & Join** (4 problems) - 🟢🟡🟡🔴  
✅ **Advanced Transformations** (3 problems) - 🟡🔴🔴  

---

## 📊 Difficulty Breakdown:
- 🟢 **Easy**: 11 problems (22%)
- 🟡 **Medium**: 24 problems (48%)
- 🔴 **Hard**: 15 problems (30%)

---

## 🎯 Skills Acquired:

### Python Fundamentals:
- Advanced data structures (tuples, sets, dictionaries)
- Functions, decorators, and closures
- Functional programming (map, filter, reduce, lambda)
- List/dict/set comprehensions
- Higher-order functions and composition

### Pandas Mastery:
- Series and DataFrame creation and manipulation
- Data selection, filtering, and indexing
- GroupBy operations and aggregations
- Pivot tables and cross-tabulation
- Missing data handling strategies
- Merging, joining, and concatenating datasets
- String and datetime operations
- Feature engineering pipelines

### Machine Learning Applications:
- Data preprocessing and cleaning
- Feature engineering and encoding
- Train/test splitting
- Outlier detection and removal
- Data normalization and scaling

---

## 🚀 Next Steps:

1. **Practice More**: Solve these problems again without looking at solutions
2. **Real Datasets**: Apply concepts to Kaggle datasets
3. **Projects**: Build end-to-end data science projects
4. **Advanced Topics**: 
   - scikit-learn for ML models
   - matplotlib/seaborn for visualization
   - SQL integration with pandas
   - Big data with Dask/PySpark

---

## 📚 Recommended Practice:

- **Kaggle**: Participate in competitions
- **LeetCode**: SQL and Python challenges
- **HackerRank**: Data structure challenges
- **Real World**: Personal data analysis projects

---

**Keep coding and happy learning! 🎓💻**