# Sklearn Statistics - Part 3: Encoding and Splitting

This notebook covers categorical encoding and train/test splitting.

**Topics covered:**
- Label Encoding
- One-Hot Encoding
- Ordinal Encoding
- Train/Test Split
- Stratified Splitting

**Problems:** 16, plus a warm-up Problem 0 for imports (Easy: 1-5, Medium: 6-11, Hard: 12-16)

In [None]:
# ============================================
# SETUP - Run this cell first!
# ============================================
import sys
sys.path.insert(0, '..')
from utils.checker import check

print("Checker loaded! Now import the libraries you need.")

---
## Problem 0: Import Required Libraries
**Difficulty:** Easy

### Concept
Before encoding and splitting data, you need to import the necessary libraries.

### Syntax
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
from sklearn.model_selection import train_test_split, KFold
```

### Task
Import the required libraries for this notebook.

### Expected Properties
- All modules and classes should be importable

In [None]:
# Your solution:


In [None]:
# Verification
check.is_true('np' in dir(), "P0a: NumPy imported", "Import numpy as np")
check.is_true('pd' in dir(), "P0b: Pandas imported", "Import pandas as pd")
check.is_true('LabelEncoder' in dir(), "P0c: LabelEncoder imported", "Import LabelEncoder from sklearn.preprocessing")
check.is_true('OneHotEncoder' in dir(), "P0d: OneHotEncoder imported", "Import OneHotEncoder from sklearn.preprocessing")
check.is_true('train_test_split' in dir(), "P0e: train_test_split imported", "Import train_test_split from sklearn.model_selection")

---
## Problem 1: Label Encoding Basics
**Difficulty:** Easy

### Concept
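Label Encoding converts categorical labels into integers: each unique category gets a code (0, 1, 2, ...), assigned in sorted order of the labels. `LabelEncoder` is intended for target variables in classification. Be careful when using these codes as input features: for nominal data they imply an ordering that does not exist, and for ordinal data the sorted order rarely matches the real one (see `OrdinalEncoder` in Problem 9).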

### Syntax
```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_encoded = le.fit_transform(categories)
```

### Example
```python
>>> categories = ['cat', 'dog', 'bird', 'cat']
>>> le = LabelEncoder()
>>> le.fit_transform(categories)
array([1, 2, 0, 1])  # bird=0, cat=1, dog=2
```
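
To make the mapping explicit, the sketch below pairs each learned class with its integer code. The `sizes` data and the `mapping` name are illustrative and not part of the exercises. It also shows why the codes should not be read as a ranking: they follow sorted order, not the natural order of the sizes.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative data with a natural order: small < medium < large
sizes = ['small', 'large', 'medium', 'small']

le = LabelEncoder()
codes = le.fit_transform(sizes)

# classes_ is sorted, so a label's code is its index in classes_
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)  # {'large': 0, 'medium': 1, 'small': 2}
print(codes)    # [2 0 1 2]
```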

### Task
Apply Label Encoding to `['red', 'green', 'blue', 'red', 'green']`. Store in `y_encoded`.

### Expected Properties
- `y_encoded` should be a numpy array
- Should have length 5
- Should contain exactly 3 unique values (one for each category)

In [None]:
# Your solution:
categories = ['red', 'green', 'blue', 'red', 'green']
y_encoded = None

In [None]:
# Verification
check.is_not_none(y_encoded, "P1: Not None")
check.is_type(y_encoded, np.ndarray, "P1: Type check")
check.has_length(y_encoded, 5, "P1: Correct length")
check.is_true(len(np.unique(y_encoded)) == 3, "P1: Three unique values", "Should have 3 unique encoded values")

---
## Problem 2: Get Label Encoder Classes
**Difficulty:** Easy

### Concept
After fitting a LabelEncoder, the `classes_` attribute contains the unique categories in sorted order. This shows the mapping between categories and their numeric codes.

### Syntax
```python
le = LabelEncoder()
le.fit(categories)
classes = le.classes_  # Array of unique categories in sorted order
```

### Example
```python
>>> categories = ['dog', 'cat', 'bird', 'cat']
>>> le = LabelEncoder()
>>> le.fit(categories)
>>> le.classes_
array(['bird', 'cat', 'dog'], dtype='<U4')  # Alphabetically sorted
```

### Task
Fit a LabelEncoder and get the classes. Store in `classes`.

### Expected Properties
- `classes` should be an array
- Should contain ['apple', 'banana', 'cherry'] in that order (alphabetical)

In [None]:
# Your solution:
categories = ['apple', 'banana', 'cherry', 'apple', 'banana']
le = LabelEncoder()
le.fit(categories)
classes = None

In [None]:
# Verification
check.is_not_none(classes, "P2: Not None")
check.is_type(classes, np.ndarray, "P2: Type check")
check.has_length(classes, 3, "P2: Correct length")
check.contains(list(classes), 'apple', "P2a: Contains apple")
check.contains(list(classes), 'banana', "P2b: Contains banana")
check.contains(list(classes), 'cherry', "P2c: Contains cherry")

---
## Problem 3: Basic Train/Test Split
**Difficulty:** Easy

### Concept
Splitting data into training and test sets is fundamental in machine learning. The training set is used to train the model, while the test set evaluates performance on unseen data. A common split is 80% train, 20% test.

### Syntax
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2,  # 20% for test
    random_state=42  # For reproducibility
)
```

### Example
```python
>>> X = [[1], [2], [3], [4], [5]]
>>> y = [0, 0, 1, 1, 1]
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> len(X_train), len(X_test)
(4, 1)  # 80% train, 20% test
```
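
As a quick sanity check on what a split does, the sketch below uses made-up data in which `y` equals the row index. It passes an integer `test_size`, which `train_test_split` interprets as an absolute number of test samples rather than a fraction, then confirms that every sample lands in exactly one set and that rows of `X` stay aligned with `y`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 10 samples, 1 feature; y equals the row index
X = np.arange(10).reshape(10, 1)
y = np.arange(10)

# An int test_size is an absolute count of test samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=3, random_state=0
)

print(X_train.shape, X_test.shape)  # (7, 1) (3, 1)

# Every sample appears in exactly one of the two sets
all_labels = sorted(np.concatenate([y_train, y_test]).tolist())
print(all_labels == list(range(10)))  # True

# Features and targets are shuffled together, so they stay aligned
print(np.array_equal(X_train[:, 0], y_train))  # True
```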

### Task
Split the data into 80% train and 20% test using `test_size=0.2` and `random_state=42`. Store in `X_train`, `X_test`, `y_train`, `y_test`.

### Expected Properties
- All should be numpy arrays
- X_train should have 40 samples (80% of 50)
- X_test should have 10 samples (20% of 50)

In [None]:
# Your solution:
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

X_train, X_test, y_train, y_test = None, None, None, None

In [None]:
# Verification
check.is_not_none(X_train, "P3a: X_train not None")
check.is_not_none(X_test, "P3b: X_test not None")
check.has_length(X_train, 40, "P3c: Train size")
check.has_length(X_test, 10, "P3d: Test size")

---
## Problem 4: Train/Test Split with Random State
**Difficulty:** Easy

### Concept
The `random_state` parameter seeds the shuffle that happens before splitting. Calling `train_test_split` with the same data, the same parameters, and the same `random_state` always produces the same train/test split, which is crucial for debugging and for comparing models fairly.

### Syntax
```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42  # Ensures reproducibility
)
```

### Example
```python
>>> # Same random_state = same split every time
>>> split1 = train_test_split(X, y, random_state=42)
>>> split2 = train_test_split(X, y, random_state=42)
>>> # split1 and split2 are identical
```
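
The claim can be checked directly. The following sketch uses illustrative data and compares the test targets from two calls with the same seed; the variable names `y_test_a` and `y_test_b` are placeholders chosen for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 6 samples, 2 features
X = np.arange(12).reshape(6, 2)
y = np.arange(6)

# Two independent calls with identical data, parameters, and seed
_, _, _, y_test_a = train_test_split(X, y, test_size=2, random_state=7)
_, _, _, y_test_b = train_test_split(X, y, test_size=2, random_state=7)

same_split = np.array_equal(y_test_a, y_test_b)
print(same_split)  # True
```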

### Task
Split data with `test_size=0.2` and `random_state=42`. Store in `X_train`, `X_test`, `y_train`, `y_test`.

### Expected Properties
- All should be numpy arrays
- X_train should have 8 samples
- X_test should have 2 samples

In [None]:
# Your solution:
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = None, None, None, None

In [None]:
# Verification
check.is_not_none(X_train, "P4a: X_train not None")
check.is_not_none(X_test, "P4b: X_test not None")
check.has_length(X_train, 8, "P4c: Train size")
check.has_length(X_test, 2, "P4d: Test size")

---
## Problem 5: OneHotEncoder Basics
**Difficulty:** Easy

### Concept
One-Hot Encoding creates binary columns for each category. Each sample has 1 in the column for its category and 0 elsewhere. This avoids the ordering problem of Label Encoding for nominal data.

### Syntax
```python
from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder(sparse_output=False)  # Return dense array
X_encoded = ohe.fit_transform(X)
```

### Example
```python
>>> X = [['red'], ['green'], ['blue']]
>>> ohe = OneHotEncoder(sparse_output=False)
>>> ohe.fit_transform(X)
array([[0., 0., 1.],  # red
       [0., 1., 0.],  # green
       [1., 0., 0.]]) # blue
```

### Task
Apply One-Hot Encoding to the categorical data. Use `sparse_output=False`. Store in `X_encoded`.

### Expected Properties
- `X_encoded` should be a numpy array
- Should have 3 columns (one per category)
- Each row should sum to 1 (only one category is active)

In [None]:
# Your solution:
X = np.array([['red'], ['green'], ['blue'], ['red']])
X_encoded = None

In [None]:
# Verification
check.is_not_none(X_encoded, "P5: Not None")
check.is_type(X_encoded, np.ndarray, "P5: Type check")
check.has_shape(X_encoded, (4, 3), "P5a: Correct shape")
_row_sums = X_encoded.sum(axis=1)
check.is_true(np.allclose(_row_sums, 1.0), "P5b: Row sums", "Each row should sum to 1")

---
## Problem 6: Inverse Transform LabelEncoder
**Difficulty:** Medium

### Concept
`inverse_transform()` converts encoded integers back to original category labels. This is useful for interpreting model predictions or displaying results.

### Syntax
```python
le = LabelEncoder()
y_encoded = le.fit_transform(categories)
y_decoded = le.inverse_transform(y_encoded)
```

### Example
```python
>>> categories = ['cat', 'dog', 'bird']
>>> le = LabelEncoder()
>>> encoded = le.fit_transform(categories)  # [1, 2, 0]
>>> le.inverse_transform(encoded)
array(['cat', 'dog', 'bird'], dtype='<U4')
```
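
A typical use is decoding model output. In the sketch below, `predicted_codes` stands in for the integer predictions of some classifier; it is hypothetical data, not produced by a real model.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(['spam', 'ham', 'spam'])  # classes_ becomes ['ham', 'spam']

# Hypothetical integer predictions from a classifier trained on encoded targets
predicted_codes = np.array([1, 0, 0, 1])

predicted_labels = le.inverse_transform(predicted_codes)
print(predicted_labels)  # ['spam' 'ham' 'ham' 'spam']
```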

### Task
Encode the categories with LabelEncoder, then use `inverse_transform()` to decode them back. Store in `y_decoded`.

### Expected Properties
- `y_decoded` should be an array
- Should match the original categories exactly

In [None]:
# Your solution:
categories = ['cat', 'dog', 'bird', 'cat', 'dog']
le = LabelEncoder()
y_encoded = le.fit_transform(categories)

y_decoded = None

In [None]:
# Verification
check.is_not_none(y_decoded, "P6: Not None")
check.is_type(y_decoded, np.ndarray, "P6: Type check")
check.has_length(y_decoded, 5, "P6: Correct length")
check.is_true(list(y_decoded) == categories, "P6: Matches original", "Should match original categories")

---
## Problem 7: OneHotEncoder Categories
**Difficulty:** Medium

### Concept
The `categories_` attribute shows what categories the encoder learned and their order. This is useful for understanding the column ordering in the encoded output.

### Syntax
```python
ohe = OneHotEncoder(sparse_output=False)
ohe.fit(X)
categories = ohe.categories_  # List of arrays, one per feature
```

### Example
```python
>>> X = [['S'], ['M'], ['L'], ['M']]
>>> ohe = OneHotEncoder(sparse_output=False)
>>> ohe.fit(X)
>>> ohe.categories_
[array(['L', 'M', 'S'], dtype=object)]  # Sorted alphabetically
```

### Task
Fit OneHotEncoder and get the learned categories. Store in `categories`.

### Expected Properties
- `categories` should be a list of arrays
- First (and only) element should contain ['large', 'medium', 'small'] in that order

In [None]:
# Your solution:
X = np.array([['small'], ['medium'], ['large'], ['medium']])
ohe = OneHotEncoder(sparse_output=False)
ohe.fit(X)

categories = None

In [None]:
# Verification
check.is_not_none(categories, "P7: Not None")
check.is_type(categories, list, "P7: Type check")
check.has_length(categories, 1, "P7a: One feature")
check.has_length(categories[0], 3, "P7b: Three categories")
check.contains(list(categories[0]), 'small', "P7c: Contains small")

---
## Problem 8: Stratified Split
**Difficulty:** Medium

### Concept
Stratified splitting maintains the class distribution in both train and test sets. This is crucial for imbalanced datasets to ensure both sets are representative.

### Syntax
```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    stratify=y,  # Maintain class distribution
    random_state=42
)
```

### Example
```python
>>> # y has 80% class 0, 20% class 1
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.2, stratify=y)
>>> # Both train and test will have ~80% class 0, ~20% class 1
```
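
Counting the classes on each side of the split makes the effect visible. This sketch uses illustrative data with a 75/25 class balance, chosen so the proportions divide evenly.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 20 samples, 15 of class 0 and 5 of class 1
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Class counts on each side keep the 3:1 ratio
print(np.bincount(y_train))  # [12  4]
print(np.bincount(y_test))   # [3 1]
```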

### Task
Perform stratified split with `test_size=0.2`, `random_state=42`, and `stratify=y`. Store in `X_train`, `X_test`, `y_train`, `y_test`.

### Expected Properties
- All should be numpy arrays
- Test set should maintain the 80/20 class distribution
- Ratio of class 1 in test should be approximately 0.2

In [None]:
# Your solution:
X = np.arange(100).reshape(50, 2)
# Imbalanced: 40 zeros, 10 ones
y = np.array([0]*40 + [1]*10)

X_train, X_test, y_train, y_test = None, None, None, None

In [None]:
# Verification
check.is_not_none(X_train, "P8a: X_train not None")
check.is_not_none(y_test, "P8b: y_test not None")
check.has_length(X_train, 40, "P8c: Train size")
check.has_length(X_test, 10, "P8d: Test size")
_test_ratio = y_test.sum() / len(y_test)
check.value_in_range(_test_ratio, 0.15, 0.25, "P8e: Maintains class distribution")

---
## Problem 9: OrdinalEncoder
**Difficulty:** Medium

### Concept
OrdinalEncoder is like LabelEncoder but for features (not targets) and allows specifying the category order. This is essential for ordinal categorical data where order matters (e.g., 'low' < 'medium' < 'high').

### Syntax
```python
from sklearn.preprocessing import OrdinalEncoder

enc = OrdinalEncoder(categories=[['low', 'medium', 'high']])
X_encoded = enc.fit_transform(X)
```

### Example
```python
>>> X = [['low'], ['high'], ['medium']]
>>> enc = OrdinalEncoder(categories=[['low', 'medium', 'high']])
>>> enc.fit_transform(X)
array([[0.],  # low
       [2.],  # high
       [1.]]) # medium
```
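
To see why the explicit `categories` argument matters, the sketch below encodes the same column twice: once with the default (sorted) category order and once with the intended order. The variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

X = np.array([['low'], ['medium'], ['high']])

# Default: categories are sorted, giving high=0, low=1, medium=2
default_codes = OrdinalEncoder().fit_transform(X)
print(default_codes.ravel())  # [1. 2. 0.]

# Explicit order: low=0, medium=1, high=2
ordered_enc = OrdinalEncoder(categories=[['low', 'medium', 'high']])
ordered_codes = ordered_enc.fit_transform(X)
print(ordered_codes.ravel())  # [0. 1. 2.]
```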

### Task
Apply OrdinalEncoder with the specified order: low=0, medium=1, high=2. Store in `X_encoded`.

### Expected Properties
- `X_encoded` should be a numpy array
- First element should be 0 (low)
- Third element should be 2 (high)

In [None]:
# Your solution:
X = np.array([['low'], ['medium'], ['high'], ['low'], ['high']])
X_encoded = None

In [None]:
# Verification
check.is_not_none(X_encoded, "P9: Not None")
check.is_type(X_encoded, np.ndarray, "P9: Type check")
check.has_shape(X_encoded, (5, 1), "P9: Correct shape")
check.is_true(X_encoded[0, 0] == 0.0, "P9a: low = 0", "'low' should be encoded as 0")
check.is_true(X_encoded[2, 0] == 2.0, "P9b: high = 2", "'high' should be encoded as 2")

---
## Problem 10: OneHot with drop='first'
**Difficulty:** Medium

### Concept
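Dropping the first category avoids perfect multicollinearity (the "dummy variable trap") in linear models: with all k columns present, each row sums to 1, so any one column is fully determined by the others. With k categories you only need k-1 binary features, since the dropped category is implied when all remaining columns are 0.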

### Syntax
```python
ohe = OneHotEncoder(sparse_output=False, drop='first')
X_encoded = ohe.fit_transform(X)
```

### Example
```python
>>> X = [['A'], ['B'], ['C']]
>>> ohe = OneHotEncoder(sparse_output=False, drop='first')
>>> ohe.fit_transform(X)
array([[0., 0.],  # A (dropped, all zeros)
       [1., 0.],  # B
       [0., 1.]]) # C
```

### Task
Apply One-Hot Encoding with `drop='first'`. Store in `X_encoded`.

### Expected Properties
- `X_encoded` should be a numpy array
- Should have 2 columns (3 categories - 1 dropped)

In [None]:
# Your solution:
X = np.array([['red'], ['green'], ['blue'], ['red']])
X_encoded = None

In [None]:
# Verification
check.is_not_none(X_encoded, "P10: Not None")
check.is_type(X_encoded, np.ndarray, "P10: Type check")
check.has_shape(X_encoded, (4, 2), "P10: Correct shape (k-1 columns)")

---
## Problem 11: Split with Shuffle Off
**Difficulty:** Medium

### Concept
By default, `train_test_split` shuffles data. For time series or sequential data, you want `shuffle=False` to preserve temporal order.

### Syntax
```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    shuffle=False  # Preserve order
)
```

### Example
```python
>>> X = [[1], [2], [3], [4], [5]]
>>> y = [1, 2, 3, 4, 5]
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.2, shuffle=False)
>>> y_train  # [1, 2, 3, 4]
>>> y_test   # [5]
```
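
With NumPy inputs the same behavior looks like this. The `days` and `readings` arrays are made-up stand-ins for a time-ordered series.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative time-ordered data: 10 consecutive days
days = np.arange(10).reshape(10, 1)
readings = np.arange(100, 110)

X_train, X_test, y_train, y_test = train_test_split(
    days, readings, test_size=0.2, shuffle=False
)

# The earliest rows go to train, the latest rows go to test
print(X_train.ravel())  # [0 1 2 3 4 5 6 7]
print(X_test.ravel())   # [8 9]
print(y_test)           # [108 109]
```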

### Task
Split data with `test_size=0.2` and `shuffle=False`. Store in `X_train`, `X_test`, `y_train`, `y_test`.

### Expected Properties
- All should be numpy arrays
- Without shuffle, test set should contain the last samples
- Last element of y_test should be 4 (the last value)

In [None]:
# Your solution:
X = np.arange(10).reshape(5, 2)
y = np.arange(5)

X_train, X_test, y_train, y_test = None, None, None, None

In [None]:
# Verification
check.is_not_none(X_train, "P11a: X_train not None")
check.is_not_none(y_test, "P11b: y_test not None")
check.has_length(X_train, 4, "P11c: Train size")
check.has_length(y_test, 1, "P11d: Test size")
check.is_true(y_test[-1] == 4, "P11e: Last element", "Without shuffle, last test element should be 4")

---
## Problem 12: Encode Multiple Columns
**Difficulty:** Hard

### Concept
OneHotEncoder can handle multiple categorical features at once. Each feature is encoded independently, and the results are concatenated.

### Syntax
```python
df = pd.DataFrame({'color': ['red', 'blue'], 'size': ['S', 'L']})
ohe = OneHotEncoder(sparse_output=False)
X_encoded = ohe.fit_transform(df)
```

### Example
```python
>>> df = pd.DataFrame({'A': ['x', 'y'], 'B': ['1', '2']})
>>> ohe = OneHotEncoder(sparse_output=False)
>>> ohe.fit_transform(df)
# 2 columns for A + 2 columns for B = 4 columns
```

### Task
Apply OneHotEncoder to the DataFrame with multiple categorical columns. Store in `X_encoded`.

### Expected Properties
- `X_encoded` should be a numpy array
- Should have 6 columns (3 colors + 3 sizes)

In [None]:
# Your solution:
df = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'red'],
    'size': ['S', 'M', 'L', 'S']
})

X_encoded = None

In [None]:
# Verification
check.is_not_none(X_encoded, "P12: Not None")
check.is_type(X_encoded, np.ndarray, "P12: Type check")
check.has_shape(X_encoded, (4, 6), "P12: Correct shape (3+3 columns)")

---
## Problem 13: Handle Unknown Categories
**Difficulty:** Hard

### Concept
When the test set contains categories not seen during fitting, `OneHotEncoder.transform` raises an error by default. Setting `handle_unknown='ignore'` instead encodes each unknown category as all zeros in that feature's columns.

### Syntax
```python
ohe = OneHotEncoder(
    sparse_output=False,
    handle_unknown='ignore'  # Don't error on unknown categories
)
ohe.fit(X_train)
X_test_encoded = ohe.transform(X_test)
```

### Example
```python
>>> X_train = [['cat'], ['dog']]
>>> X_test = [['cat'], ['bird']]  # 'bird' not in training
>>> ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
>>> ohe.fit(X_train)
>>> ohe.transform(X_test)
array([[1., 0.],  # cat
       [0., 0.]]) # bird (unknown -> all zeros)
```

### Task
Fit OneHotEncoder on training data with `handle_unknown='ignore'`, then transform test data that contains an unknown category. Store in `X_test_encoded`.

### Expected Properties
- `X_test_encoded` should be a numpy array
- Unknown category ('yellow') should be encoded as all zeros

In [None]:
# Your solution:
X_train = np.array([['red'], ['green'], ['blue']])
X_test = np.array([['red'], ['yellow']])  # yellow not in training

X_test_encoded = None

In [None]:
# Verification
check.is_not_none(X_test_encoded, "P13: Not None")
check.is_type(X_test_encoded, np.ndarray, "P13: Type check")
check.has_shape(X_test_encoded, (2, 3), "P13: Correct shape")
check.is_true(np.array_equal(X_test_encoded[1], np.zeros(3)), "P13: Unknown as zeros", "Unknown category should be all zeros")

---
## Problem 14: Combine Encoding with Split
**Difficulty:** Hard

### Concept
**CRITICAL**: When combining encoding with train/test split, you must:
1. Split the data first
2. Fit the encoder ONLY on training data
3. Transform both train and test using the fitted encoder

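This prevents data leakage: the encoder learns its categories from the training data alone, so nothing about the test set influences preprocessing.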

### Syntax
```python
# Step 1: Split
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Step 2: Fit on train only
ohe = OneHotEncoder(sparse_output=False)
ohe.fit(X_train)

# Step 3: Transform both
X_train_enc = ohe.transform(X_train)
X_test_enc = ohe.transform(X_test)
```

### Example
```python
>>> X = [['A'], ['B'], ['C'], ['A']]
>>> y = [0, 1, 0, 1]
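>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
>>> # With so few rows, the test row's category may be missing from train,
>>> # so unknown categories are ignored here instead of raising an error
>>> ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')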
>>> ohe.fit(X_train)  # Only on train!
>>> X_train_enc = ohe.transform(X_train)
>>> X_test_enc = ohe.transform(X_test)
```

### Task
First split the data, then fit OneHotEncoder on training data only, and transform both sets. Store in `X_train_enc` and `X_test_enc`.

### Expected Properties
- Both should be numpy arrays
- X_train_enc should have 8 rows
- X_test_enc should have 2 rows
- Both should have same number of columns

In [None]:
# Your solution:
X = np.array([['A'], ['B'], ['C'], ['A'], ['B'], ['C'], ['A'], ['B'], ['C'], ['A']])
y = np.arange(10)

# Step 1: Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 2 & 3: Fit encoder on train, transform both
X_train_enc = None
X_test_enc = None

In [None]:
# Verification
check.is_not_none(X_train_enc, "P14a: Train not None")
check.is_not_none(X_test_enc, "P14b: Test not None")
check.has_shape(X_train_enc, (8, 3), "P14c: Train shape")
check.has_shape(X_test_enc, (2, 3), "P14d: Test shape")

---
## Problem 15: Get Feature Names from OneHotEncoder
**Difficulty:** Hard

### Concept
After encoding, `get_feature_names_out()` returns descriptive names for each encoded column. This is crucial for interpretability and creating DataFrames with meaningful column names.

### Syntax
```python
ohe = OneHotEncoder(sparse_output=False)
ohe.fit(df)
feature_names = ohe.get_feature_names_out()
```

### Example
```python
>>> df = pd.DataFrame({'color': ['red', 'blue']})
>>> ohe = OneHotEncoder(sparse_output=False)
>>> ohe.fit(df)
>>> ohe.get_feature_names_out()
array(['color_blue', 'color_red'], dtype=object)
```

### Task
Fit OneHotEncoder and get the feature names using `get_feature_names_out()`. Store in `feature_names`.

### Expected Properties
- `feature_names` should be an array
- Should have length 3 (one per category)

In [None]:
# Your solution:
df = pd.DataFrame({
    'color': ['red', 'green', 'blue']
})

ohe = OneHotEncoder(sparse_output=False)
ohe.fit(df)

feature_names = None

In [None]:
# Verification
check.is_not_none(feature_names, "P15: Not None")
check.is_type(feature_names, np.ndarray, "P15: Type check")
check.has_length(feature_names, 3, "P15: Correct length")

---
## Problem 16: K-Fold Split Preview
**Difficulty:** Hard

### Concept
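K-Fold cross-validation splits data into k folds of equal size (or nearly equal, when the number of samples is not divisible by k). Each fold serves as the test set once while the remaining k-1 folds form the training set. This provides k different train/test splits for more robust model evaluation.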

### Syntax
```python
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    # Train and evaluate
```

### Example
```python
>>> X = np.array([[1], [2], [3], [4], [5]])
>>> kf = KFold(n_splits=5)
>>> for train_idx, test_idx in kf.split(X):
...     print(f"Test: {test_idx}")
Test: [0]
Test: [1]
Test: [2]
Test: [3]
Test: [4]
```
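
When the sample count is not a multiple of `n_splits`, the folds differ in size by at most one. The sketch below uses 7 illustrative samples and 3 folds, and checks that each sample is used for testing exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

# Illustrative data: 7 samples do not divide evenly into 3 folds
X = np.arange(14).reshape(7, 2)

kf = KFold(n_splits=3)  # no shuffling: folds are consecutive blocks

test_sizes = []
tested = []
for train_idx, test_idx in kf.split(X):
    test_sizes.append(len(test_idx))
    tested.extend(test_idx.tolist())

print(test_sizes)  # [3, 2, 2]
print(tested)      # [0, 1, 2, 3, 4, 5, 6]
```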

### Task
Create a KFold with 5 splits and count how many folds are created by iterating through `kf.split(X)`. Store the count in `n_folds`.

### Expected Properties
- `n_folds` should be an integer
- Should equal 5 (the n_splits parameter)

In [None]:
# Your solution:
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

kf = KFold(n_splits=5, shuffle=True, random_state=42)

n_folds = 0
# Count the folds by iterating
for train_idx, test_idx in kf.split(X):
    n_folds += 1

In [None]:
# Verification
check.is_not_none(n_folds, "P16: Not None")
check.is_type(n_folds, int, "P16: Type check")
check.is_true(n_folds == 5, "P16: Correct count", "Should have 5 folds")

---
## Summary

Run this cell to see your overall progress on this notebook.

In [None]:
check.summary()