# CsvDatasetLoader Usage Example
This notebook demonstrates how to use the `CsvDatasetLoader` class to load and manipulate a dataset from a CSV file.
We will perform the following operations:
1. Load a dataset
2. Access features and target columns
3. Filter negative instances
4. Get a random positive instance
5. Programmatically set the dataset

In [1]:
# Import necessary libraries
import pandas as pd
from rocelib.datasets.custom_datasets.CsvDatasetLoader import CsvDatasetLoader


### Step 1: Initialize the CsvDatasetLoader with the path to the CSV and the target column name

In [2]:
# Let's assume the contents of 'test.csv' look like this:
# ----------------------------------
# feature1,feature2,target
# 1.0, 2.0, 0
# 3.0, 4.0, 1
# 5.0, 6.0, 0
# 7.0, 8.0, 1
# ----------------------------------

csv_path = "test.csv"  # Path to the CSV file
target_column = "target"  # Name of the target column

# Create an instance of CsvDatasetLoader
loader = CsvDatasetLoader(csv=csv_path, target_column=target_column)
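
The cell above assumes that `test.csv` already exists in the working directory. To reproduce the notebook from scratch, the file has to be written before the loader is constructed. A minimal sketch of one way to do that is shown here; the helper name `write_sample_csv` is hypothetical and not part of `rocelib`, and the rows are copied from the comment above with the spaces after the commas dropped.

```python
from pathlib import Path

import pandas as pd

# Sample rows copied from the comment in the cell above.
SAMPLE_CSV = """feature1,feature2,target
1.0,2.0,0
3.0,4.0,1
5.0,6.0,0
7.0,8.0,1
"""


def write_sample_csv(path="test.csv"):
    """Write the four-row sample dataset to `path` and return it as a Path.

    Hypothetical helper for this notebook only.
    """
    csv_file = Path(path)
    csv_file.write_text(SAMPLE_CSV, encoding="utf-8")
    return csv_file


csv_file = write_sample_csv("test.csv")

# Read the file back with plain pandas to confirm it was written as intended.
check = pd.read_csv(csv_file)
print(check.shape)  # (4, 3)
```

Run this once before creating the loader; afterwards `csv_path = "test.csv"` points at a real file.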

### Step 2: Access the loaded dataset (whole dataset)

In [3]:
# This will print the full dataset including feature columns and target column
print("Full Dataset:")
print(loader.data)

Full Dataset:
   feature1  feature2  target
0       1.0       2.0       0
1       3.0       4.0       1
2       5.0       6.0       0
3       7.0       8.0       1


### Step 3: Access the feature columns (X)

In [4]:
# This returns only the feature columns (without the target column)
print("\nFeature Columns (X):")
print(loader.X)


Feature Columns (X):
   feature1  feature2
0       1.0       2.0
1       3.0       4.0
2       5.0       6.0
3       7.0       8.0


### Step 4: Access the target column (y)

In [5]:
# This returns only the target column
print("\nTarget Column (y):")
print(loader.y)


Target Column (y):
   target
0       0
1       1
2       0
3       1
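
Steps 3 and 4 split the dataset into feature columns and a target column. The same split can be sketched in plain pandas, which may help when checking what the loader returns. This is an illustration under the assumption that the split is a simple column selection; it is not taken from the `CsvDatasetLoader` source. Note that the printed `y` above has a `target` header, which suggests a one-column DataFrame rather than a Series, so the sketch uses double brackets to match.

```python
import pandas as pd

# The same four rows as test.csv, built in memory.
df = pd.DataFrame({
    "feature1": [1.0, 3.0, 5.0, 7.0],
    "feature2": [2.0, 4.0, 6.0, 8.0],
    "target": [0, 1, 0, 1],
})
target_column = "target"

# Features: every column except the target.
X = df.drop(columns=[target_column])

# Target: double brackets keep a one-column DataFrame instead of a Series.
y = df[[target_column]]

print(X.columns.tolist())  # ['feature1', 'feature2']
print(y.shape)             # (4, 1)
```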


### Step 5: Filter negative instances (target value == 0)

In [6]:
# Assuming '0' is the negative value in the dataset for the target column
negative_instances = loader.get_negative_instances(neg_value=0)
print("\nNegative Instances (Target == 0):")
print(negative_instances)


Negative Instances (Target == 0):
   feature1  feature2
0       1.0       2.0
2       5.0       6.0
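
The output above keeps the original row labels (0 and 2) and drops the target column. A plain pandas sketch that produces the same shape of result is given below. The function `negative_instances` is a hypothetical stand-in written for this notebook; the actual implementation of `get_negative_instances` may differ.

```python
import pandas as pd


def negative_instances(df, target_column, neg_value=0):
    """Return the feature columns of rows whose target equals `neg_value`.

    Hypothetical stand-in, not the rocelib implementation.
    """
    mask = df[target_column] == neg_value
    # Boolean indexing keeps the original index labels of the matching rows.
    return df.loc[mask].drop(columns=[target_column])


df = pd.DataFrame({
    "feature1": [1.0, 3.0, 5.0, 7.0],
    "feature2": [2.0, 4.0, 6.0, 8.0],
    "target": [0, 1, 0, 1],
})

negatives = negative_instances(df, "target", neg_value=0)
print(negatives.index.tolist())  # [0, 2]
```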


### Step 6: Retrieve a random positive instance (target value != 0)

In [7]:
# Assuming any value other than '0' is a positive instance.
# The instance is drawn at random, so the row returned may differ between runs.
positive_instance = loader.get_random_positive_instance(neg_value=0)
print("\nRandom Positive Instance (Target != 0):")
print(positive_instance)


Random Positive Instance (Target != 0):
   feature1  feature2
3       7.0       8.0
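
With the sample data there are two positive rows (labels 1 and 3), so the row shown above is only one possible result. When a reproducible draw is needed, for example in a test, the selection can be sketched in plain pandas with a fixed seed. The function `random_positive_instance` and its `seed` parameter are assumptions made for this sketch; we have not confirmed whether `get_random_positive_instance` accepts a seed.

```python
import pandas as pd


def random_positive_instance(df, target_column, neg_value=0, seed=None):
    """Return one randomly chosen row whose target differs from `neg_value`.

    Hypothetical stand-in, not the rocelib implementation. Passing an
    integer `seed` makes the draw repeatable.
    """
    positives = df.loc[df[target_column] != neg_value]
    features = positives.drop(columns=[target_column])
    # DataFrame.sample returns a one-row DataFrame and keeps the row label.
    return features.sample(n=1, random_state=seed)


df = pd.DataFrame({
    "feature1": [1.0, 3.0, 5.0, 7.0],
    "feature2": [2.0, 4.0, 6.0, 8.0],
    "target": [0, 1, 0, 1],
})

first = random_positive_instance(df, "target", neg_value=0, seed=42)
second = random_positive_instance(df, "target", neg_value=0, seed=42)

# The same seed gives the same row; the label is always one of the positives.
print(first.equals(second))
print(first.index[0] in (1, 3))
```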


### Step 7: Set the dataset programmatically (optional)

In [8]:
# Replace the loaded data with a manually constructed DataFrame
new_data = pd.DataFrame({
    'feature1': [9.0, 10.0],
    'feature2': [11.0, 12.0],
    'target': [1, 0]
})

loader.data = new_data  # Setting the new dataset

# Display the newly set dataset
print("\nNewly Set Dataset:")
print(loader.data)


Newly Set Dataset:
   feature1  feature2  target
0       9.0      11.0       1
1      10.0      12.0       0


### Step 8: Repeat the operations on the new dataset

In [9]:
# X, y, and the instance helpers now reflect the newly assigned dataset
print("\nNew Feature Columns (X):")
print(loader.X)

print("\nNew Target Column (y):")
print(loader.y)

# Negative instances with the new data
print("\nNegative Instances (Target == 0) from the new data:")
print(loader.get_negative_instances(neg_value=0))

# Random positive instance with the new data
print("\nRandom Positive Instance (Target != 0) from the new data:")
print(loader.get_random_positive_instance(neg_value=0))


New Feature Columns (X):
   feature1  feature2
0       9.0      11.0
1      10.0      12.0

New Target Column (y):
   target
0       1
1       0

Negative Instances (Target == 0) from the new data:
   feature1  feature2
1      10.0      12.0

Random Positive Instance (Target != 0) from the new data:
   feature1  feature2
0       9.0      11.0
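
The outputs above show that `X` and `y` follow the dataset after `loader.data` is reassigned. One common way to get that behaviour is to compute `X` and `y` from the stored DataFrame on every access. The class below is a minimal sketch of that design; `MiniLoader` is hypothetical, and we are not claiming that `CsvDatasetLoader` is implemented this way.

```python
import pandas as pd


class MiniLoader:
    """Hypothetical loader whose X and y are always derived from `data`."""

    def __init__(self, data, target_column):
        self._target_column = target_column
        self.data = data  # goes through the setter below

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        # Reject frames that lack the target column instead of failing later.
        if self._target_column not in value.columns:
            raise ValueError(f"missing target column: {self._target_column}")
        self._data = value

    @property
    def X(self):
        # Recomputed on each access, so it cannot go stale.
        return self._data.drop(columns=[self._target_column])

    @property
    def y(self):
        return self._data[[self._target_column]]


old_data = pd.DataFrame({"feature1": [1.0], "feature2": [2.0], "target": [0]})
new_data = pd.DataFrame({
    "feature1": [9.0, 10.0],
    "feature2": [11.0, 12.0],
    "target": [1, 0],
})

mini = MiniLoader(old_data, target_column="target")
mini.data = new_data

print(mini.X["feature1"].tolist())  # [9.0, 10.0]
print(mini.y["target"].tolist())    # [1, 0]
```

Deriving `X` and `y` on access costs a small amount of work per call, but it removes the need to keep cached copies in sync when the dataset changes.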
