# Demonstrating Data Loading Functions

This notebook demonstrates the use of the main data loading functions from `generatedata/load_data.py`:
- `load_data`
- `load_data_as_xy`
- `load_data_as_xy_onehot`

We show how to load datasets, inspect their structure, and prepare them for downstream ML workflows.

In [1]:
# Import required libraries
import pandas as pd
from generatedata.load_data import load_data, load_data_as_xy, load_data_as_xy_onehot

## Load a Dataset (Dictionary Format)

The `load_data` function returns a dictionary with three keys: `info` (dataset metadata), plus `start` and `target` (the data itself, one row per point).

In [2]:
# Example: Load the MNIST dataset (adjust name as needed)
data = load_data('MNIST', local=True)
print('Keys:', data.keys())
print('Info:', data['info'])
print('Start shape:', data['start'].shape)
print('Target shape:', data['target'].shape)

Keys: dict_keys(['info', 'start', 'target'])
Info: {'num_points': 1000, 'size': 794, 'x_y_index': 784, 'x_size': 784, 'y_size': 10, 'onehot_y': 1}
Start shape: (1000, 794)
Target shape: (1000, 794)
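
The `info` values printed above appear to be related to each other and to the array shapes: for this dataset, `size` equals `x_size + y_size`, and `x_y_index` equals `x_size`. The following is a minimal sketch of a sanity check, using the dictionary copied from the output. The helper name `check_info` is hypothetical and not part of `generatedata`, and the relationships it tests are inferred from this one example rather than from the library's documentation.

```python
# Metadata copied from the output above.
info = {'num_points': 1000, 'size': 794, 'x_y_index': 784,
        'x_size': 784, 'y_size': 10, 'onehot_y': 1}

def check_info(info, shape):
    """Hypothetical helper: check that info agrees with a (rows, cols) shape."""
    rows, cols = shape
    return (
        rows == info['num_points']                            # one row per point
        and cols == info['size']                              # total column count
        and info['x_size'] + info['y_size'] == info['size']   # features + labels
        and info['x_y_index'] == info['x_size']               # labels start after features
    )

print(check_info(info, (1000, 794)))
```

In the notebook, the same check could be run as `check_info(data['info'], data['target'].shape)`.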


## Load Data as (X, Y) for ML

The `load_data_as_xy` function splits the target data into features (X) and labels (Y) using metadata in `info.json`.
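
To illustrate the idea, here is a minimal sketch of a column split on a small stand-in table. It assumes the split happens at the `x_y_index` column, which is consistent with the metadata shown earlier but is not confirmed from the library's source. The helper `split_xy` and the toy `target` table are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for the `target` table: 4 rows, 3 feature columns, 2 label columns.
info = {'x_y_index': 3, 'x_size': 3, 'y_size': 2}
n_cols = info['x_size'] + info['y_size']
target = pd.DataFrame(
    np.arange(4 * n_cols, dtype=float).reshape(4, n_cols),
    columns=[f'x{i}' for i in range(n_cols)],
)

def split_xy(df, info):
    """Hypothetical helper: slice the columns at info['x_y_index']."""
    idx = info['x_y_index']
    return df.iloc[:, :idx], df.iloc[:, idx:]

X_demo, Y_demo = split_xy(target, info)
print(X_demo.shape, Y_demo.shape)
print(list(Y_demo.columns))
```

The label columns keep their original names, so they continue numbering where the feature columns stop.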

In [3]:
# Load as (X, Y) for regression/classification
X, Y = load_data_as_xy('MNIST', local=True)
print('X shape:', X.shape)
print('Y shape:', Y.shape)
X.head(), Y.head()

X shape: (1000, 784)
Y shape: (1000, 10)


(    x0   x1   x2   x3   x4   x5   x6   x7   x8   x9  ...  x774  x775  x776  \
 0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0  ...  -1.0  -1.0  -1.0   
 1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0  ...  -1.0  -1.0  -1.0   
 2 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0  ...  -1.0  -1.0  -1.0   
 3 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0  ...  -1.0  -1.0  -1.0   
 4 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0  ...  -1.0  -1.0  -1.0   
 
    x777  x778  x779  x780  x781  x782  x783  
 0  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  
 1  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  
 2  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  
 3  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  
 4  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  
 
 [5 rows x 784 columns],
    x784  x785  x786  x787  x788  x789  x790  x791  x792  x793
 0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.0   0.0   0.0
 1   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.0   0.0
 2   1.0   0.0   0.0   0.0

## Load Data as (X, Y) with One-Hot Labels

The `load_data_as_xy_onehot` function returns features and one-hot encoded labels (if available).

In [4]:
# Load as (X, Y) with one-hot labels (if supported by dataset)
try:
    X_oh, Y_oh = load_data_as_xy_onehot('MNIST', local=True)
    print('X shape:', X_oh.shape)
    print('Y (one-hot) shape:', Y_oh.shape)
    # Note: a bare expression such as Y_oh.head() is not displayed inside try; wrap it in print() to inspect it.
except Exception as e:
    print('One-hot loading not supported for this dataset:', e)

X shape: (1000, 784)
Y (one-hot) shape: (1000, 10)
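
Some downstream tools expect integer class labels rather than one-hot rows. A minimal sketch of the conversion follows, using a three-row stand-in that mirrors the first rows of `Y` printed earlier. It assumes that the position of the 1.0 within a row corresponds to the class index, which is not confirmed by the source; `Y_demo` is a hypothetical name.

```python
import pandas as pd

# Stand-in for Y: column names follow the x784..x793 pattern shown earlier.
cols = [f'x{i}' for i in range(784, 794)]
Y_demo = pd.DataFrame(0.0, index=range(3), columns=cols)
Y_demo.iloc[0, 7] = 1.0  # row 0 has its 1.0 in column x791
Y_demo.iloc[1, 8] = 1.0  # row 1 has its 1.0 in column x792
Y_demo.iloc[2, 0] = 1.0  # row 2 has its 1.0 in column x784

# The column position of the largest entry in each row gives the class index.
labels = Y_demo.to_numpy().argmax(axis=1)
print(labels.tolist())
```

With the real data, the same expression would be `Y_oh.to_numpy().argmax(axis=1)`.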
