# Data Splitting with ShuffleSplit

This notebook demonstrates how to use the `ShuffleSplit` class from `scikit-learn` to generate several randomized train-test splits of a small dataset.

### Objectives:
1. **Define a Dataset**: Create a simple feature matrix and target vector for demonstration purposes.
2. **Configure ShuffleSplit**: Initialize `ShuffleSplit` with specific parameters to generate multiple train-test splits.
3. **Generate and Display Splits**: Iterate through each split and display the indices of the samples assigned to the training and test sets.

### Steps:
1. **Import Libraries**: Import `numpy` for data manipulation and `ShuffleSplit` for splitting the dataset.
2. **Initialize Dataset**: Define a small dataset with features and target values.
3. **Set Up ShuffleSplit**: Configure `ShuffleSplit` with the number of splits, test size, and random seed.
4. **Iterate and Print Splits**: Generate train-test splits and print the indices for each split to see which samples fall into the training set and which into the test set.
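The steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's own cell: it goes one step further than printing indices by using them to slice `X` and `y`. The names `splitter` and `shapes` are introduced here for illustration only.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Toy data with the same shape as the notebook's example (4 samples, 2 features)
X = np.array([[1, 2], [3, 4], [5, 6], [6, 7]])
y = np.array([1, 2, 1, 2])

# `splitter` is an illustrative name; the parameters mirror the notebook's
splitter = ShuffleSplit(n_splits=5, test_size=0.25, random_state=111)

shapes = []
for train_idx, test_idx in splitter.split(X):
    # split() yields integer index arrays; use them to select rows
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    shapes.append((X_train.shape, X_test.shape))
    print(f"train shape: {X_train.shape}, test shape: {X_test.shape}")
```

With four samples and `test_size=0.25`, each split should hold out one sample and train on the remaining three.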

In [8]:
import numpy as np
from sklearn.model_selection import ShuffleSplit


In [9]:
# Define a small dataset with features and target values
X = np.array([[1, 2], [3, 4], [5, 6], [6, 7]])
y = np.array([1, 2, 1, 2])

In [10]:
# Initialize ShuffleSplit with 5 splits, a test size of 25%, and a random seed for reproducibility
split_data = ShuffleSplit(n_splits=5, test_size=0.25, random_state=111)

# get_n_splits returns the configured number of splits (5 here);
# store it rather than discarding the return value
n_splits = split_data.get_n_splits(X)

# Iterate through each split
for train_idx, test_idx in split_data.split(X):
    # Print the indices for the training and testing datasets
    print(f'Training dataset:: {train_idx},   Testing dataset:: {test_idx}')

Training dataset:: [2 3 0],   Testing dataset:: [1]
Training dataset:: [1 2 0],   Testing dataset:: [3]
Training dataset:: [0 2 1],   Testing dataset:: [3]
Training dataset:: [1 0 2],   Testing dataset:: [3]
Training dataset:: [2 0 1],   Testing dataset:: [3]
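
In the output above, sample 3 is the test sample in four of the five splits, while samples 0 and 2 are never held out. This is expected: `ShuffleSplit` draws each split independently, so test sets can repeat across splits and there is no guarantee that every sample is tested. The sketch below, which is an addition and not part of the original notebook, tallies how often each sample index appears in a test set and checks that the training and test indices never overlap within a split. The names `splitter` and `test_counts` are illustrative.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.array([[1, 2], [3, 4], [5, 6], [6, 7]])

# Same configuration as the notebook's cell
splitter = ShuffleSplit(n_splits=5, test_size=0.25, random_state=111)

test_counts = Counter()
for train_idx, test_idx in splitter.split(X):
    # Within a single split, no sample is in both sets
    assert set(train_idx).isdisjoint(test_idx)
    test_counts.update(int(i) for i in test_idx)

# Maps sample index -> number of splits in which it was held out
print(dict(test_counts))
```

If every sample must be tested exactly once, a partitioning strategy such as `KFold` is usually a better fit than repeated random splits.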
