# Example Usage of ExampleDatasetLoader with IonosphereDatasetLoader

This notebook demonstrates how to use the ExampleDatasetLoader class through one of its implementations, the IonosphereDatasetLoader. We will:
1. Load the dataset
2. Perform default preprocessing
3. Apply custom preprocessing: imputing missing values, scaling, and encoding categorical data


In [1]:
# Importing the required class
from rocelib.datasets.ExampleDatasets import get_example_dataset

# Instantiate the dataset loader with get_example_dataset,
# which accepts the following names:
#                - iris
#                - ionosphere
#                - adult
#                - titanic
# Alternatively, instantiate a DatasetLoader directly, e.g., IonosphereDatasetLoader()
ionosphere_loader = get_example_dataset("ionosphere")

# Load the dataset
ionosphere_loader.load_data()

# Display the first 5 rows of the dataset
ionosphere_loader.data.head()

Unnamed: 0,feature_0,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,feature_7,feature_8,feature_9,...,feature_25,feature_26,feature_27,feature_28,feature_29,feature_30,feature_31,feature_32,feature_33,target
0,1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1.0,0.0376,...,-0.51171,0.41078,-0.46168,0.21266,-0.3409,0.42267,-0.54487,0.18641,-0.453,g
1,1,0,1.0,-0.18829,0.93035,-0.36156,-0.10868,-0.93597,1.0,-0.04549,...,-0.26569,-0.20468,-0.18401,-0.1904,-0.11593,-0.16626,-0.06288,-0.13738,-0.02447,b
2,1,0,1.0,-0.03365,1.0,0.00485,1.0,-0.12062,0.88965,0.01198,...,-0.4022,0.58984,-0.22145,0.431,-0.17365,0.60436,-0.2418,0.56045,-0.38238,g
3,1,0,1.0,-0.45161,1.0,1.0,0.71216,-1.0,0.0,0.0,...,0.90695,0.51613,1.0,1.0,-0.20099,0.25682,1.0,-0.32382,1.0,b
4,1,0,1.0,-0.02401,0.9414,0.06531,0.92106,-0.23255,0.77152,-0.16399,...,-0.65158,0.1329,-0.53206,0.02431,-0.62197,-0.05707,-0.59573,-0.04608,-0.65697,g


## Preprocessing the Data Using the Default Method
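The `default_preprocess` method applies the default pipeline, which for this dataset includes standard scaling. In the output below, the target labels are also encoded numerically (`g` becomes 1 and `b` becomes 0).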

In [2]:
# Apply default preprocessing
ionosphere_loader.default_preprocess()

# Show the preprocessed data
ionosphere_loader.data.head()

Unnamed: 0,feature_0,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,feature_7,feature_8,feature_9,...,feature_25,feature_26,feature_27,feature_28,feature_29,feature_30,feature_31,feature_32,feature_33,target
0,0.348433,0.0,0.712372,-0.234257,0.484208,-0.201735,0.577059,-0.954679,0.964074,-0.29751,...,-0.867565,-0.253868,-0.713971,-0.28829,-0.617039,0.122937,-1.055054,-0.312221,-0.999595,1
1,0.348433,0.0,0.721648,-0.527811,0.634308,-1.037587,-1.339106,-2.029452,0.964074,-0.469482,...,-0.383054,-1.447849,-0.208419,-0.989185,-0.17353,-0.909063,-0.115213,-0.932605,-0.083286,0
2,0.348433,0.0,0.721648,-0.176998,0.768477,-0.241309,0.914531,-0.461494,0.746139,-0.350536,...,-0.651896,0.093506,-0.276586,0.091389,-0.28732,0.441318,-0.464092,0.404443,-0.848591,1
3,0.348433,0.0,0.721648,-1.125172,0.768477,1.92134,0.329433,-2.152585,-1.010873,-0.375331,...,1.92634,-0.04949,1.9473,1.080843,-0.341218,-0.167687,1.957315,-1.289826,2.107299,0
4,0.348433,0.0,0.721648,-0.155129,0.655594,-0.109918,0.754068,-0.676741,0.512838,-0.714742,...,-1.143025,-0.79295,-0.842112,-0.615818,-1.171144,-0.717726,-1.154227,-0.757673,-1.435736,1
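
To make the scaling step concrete, here is a minimal sketch of standard (z-score) scaling on a single column, using only the Python standard library. It is not the library's implementation: the function name `standard_scale`, the toy values, and the handling of constant columns are all assumptions made for illustration. The constant-column rule is chosen to be consistent with `feature_1` staying at 0.0 in the output above.

```python
from statistics import fmean, pstdev


def standard_scale(column):
    """Z-score a list of numbers: (x - mean) / population standard deviation."""
    mean = fmean(column)
    std = pstdev(column)
    # Assumption: a constant column (std == 0) is only centred,
    # so every value maps to 0.0 instead of raising a division error.
    if std == 0:
        std = 1.0
    return [(x - mean) / std for x in column]


# Hypothetical toy values, not taken from the ionosphere dataset
toy_column = [1.0, 2.0, 3.0]
constant_column = [0.0, 0.0, 0.0]

scaled = standard_scale(toy_column)
scaled_constant = standard_scale(constant_column)

print(scaled)           # values centred on 0 with unit population variance
print(scaled_constant)  # all zeros
```

After scaling, each non-constant column has mean 0 and unit variance, which is why the preprocessed values above are no longer confined to the original range of the raw features.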


## Custom Preprocessing
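We can also customize the preprocessing by choosing a different imputation strategy, scaling method, or encoding option. Judging by the output below, `preprocess` works from the originally loaded values rather than from the standardized output of the previous cell: for example, the raw `feature_2` value 0.99539 becomes 0.997695.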

In [3]:
# Custom preprocessing
ionosphere_loader.preprocess(
    impute_strategy_numeric='mean',  # Impute missing numeric values with mean
    scale_method='minmax',           # Apply min-max scaling
    encode_categorical=False         # No categorical encoding needed (since no categorical features)
)

# Show preprocessed data after custom preprocessing
ionosphere_loader.data.head()

Unnamed: 0,feature_0,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,feature_7,feature_8,feature_9,...,feature_25,feature_26,feature_27,feature_28,feature_29,feature_30,feature_31,feature_32,feature_33,target
0,1.0,0.0,0.997695,0.470555,0.926215,0.51153,0.91699,0.31146,1.0,0.5188,...,0.244145,0.70539,0.26916,0.60633,0.32955,0.711335,0.227565,0.593205,0.2735,1.0
1,1.0,0.0,1.0,0.405855,0.965175,0.31922,0.44566,0.032015,1.0,0.477255,...,0.367155,0.39766,0.407995,0.4048,0.442035,0.41687,0.46856,0.43131,0.487765,0.0
2,1.0,0.0,1.0,0.483175,1.0,0.502425,1.0,0.43969,0.944825,0.50599,...,0.2989,0.79492,0.389275,0.7155,0.413175,0.80218,0.3791,0.780225,0.30881,1.0
3,1.0,0.0,1.0,0.274195,1.0,1.0,0.85608,0.0,0.5,0.5,...,0.953475,0.758065,1.0,1.0,0.399505,0.62841,1.0,0.33809,1.0,0.0
4,1.0,0.0,1.0,0.487995,0.9707,0.532655,0.96053,0.383725,0.88576,0.418005,...,0.17421,0.56645,0.23397,0.512155,0.189015,0.471465,0.202135,0.47696,0.171515,1.0
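
The min-max output above can be checked by hand. The sketch below rescales the first five raw `feature_3` values from the first table, assuming the full column spans -1 to 1 (the column-wide minimum and maximum are not shown in this notebook, so that range is an assumption). The helper `min_max_scale` is a hypothetical name for illustration, not part of the library.

```python
def min_max_scale(column, lower, upper):
    """Map values linearly so that `lower` becomes 0 and `upper` becomes 1."""
    span = upper - lower
    if span == 0:
        # Assumption: a constant column is mapped to 0.0
        return [0.0 for _ in column]
    return [(x - lower) / span for x in column]


# First five raw feature_3 values, copied from the first output table
raw_feature_3 = [-0.05889, -0.18829, -0.03365, -0.45161, -0.02401]

# Assumed column-wide range of the feature: [-1, 1]
rescaled = min_max_scale(raw_feature_3, lower=-1.0, upper=1.0)

print([round(v, 6) for v in rescaled])
```

Under that assumption the rescaled values agree with the `feature_3` column printed above (0.470555, 0.405855, 0.483175, 0.274195, 0.487995).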


If you do not want to alter the state of the DatasetLoader, you can instead
retrieve the preprocessed features directly with the methods below:

In [4]:
ionosphere_loader = get_example_dataset("ionosphere")

default_preprocessed = ionosphere_loader.get_default_preprocessed_features()

print(default_preprocessed.head())

custom_preprocessed = ionosphere_loader.get_preprocessed_features(
    impute_strategy_numeric='mean',  # Impute missing numeric values with mean
    scale_method='minmax',           # Apply min-max scaling
    encode_categorical=False         # No categorical encoding needed (since no categorical features)
)

print(custom_preprocessed.head())

   feature_0  feature_1  feature_2  feature_3  feature_4  feature_5  \
0   0.348433        0.0   0.712372  -0.234257   0.484208  -0.201735   
1   0.348433        0.0   0.721648  -0.527811   0.634308  -1.037587   
2   0.348433        0.0   0.721648  -0.176998   0.768477  -0.241309   
3   0.348433        0.0   0.721648  -1.125172   0.768477   1.921340   
4   0.348433        0.0   0.721648  -0.155129   0.655594  -0.109918   

   feature_6  feature_7  feature_8  feature_9  ...  feature_24  feature_25  \
0   0.577059  -0.954679   0.964074  -0.297510  ...    0.297728   -0.867565   
1  -1.339106  -2.029452   0.964074  -0.469482  ...   -1.037790   -0.383054   
2   0.914531  -0.461494   0.746139  -0.350536  ...    0.310141   -0.651896   
3   0.329433  -2.152585  -1.010873  -0.375331  ...    1.045426    1.926340   
4   0.754068  -0.676741   0.512838  -0.714742  ...   -0.628910   -1.143025   

   feature_26  feature_27  feature_28  feature_29  feature_30  feature_31  \
0   -0.253868   -0.713971  