# Check Adult datasets
This notebook loads `AdultDataset` and `AdultCensusDataset` from the repository and runs quick sanity checks on each: array shapes, dtypes, NaN presence, and a preview of the first three rows. Run it inside the project environment, preferably via `uv run`.

In [2]:
# Cell 1: Imports and path setup
import sys
from pathlib import Path
# Ensure the repo root is on sys.path (assumes the working directory is notebooks/)
repo_root = Path.cwd().resolve().parent  # notebooks/.. -> repo root
sys.path.insert(0, str(repo_root))

import numpy as np
from counterfactuals.datasets.adult import AdultDataset
from counterfactuals.datasets.adult_census import AdultCensusDataset
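
The hard-coded parent lookup above depends on the directory the kernel starts in. A sketch of a more defensive alternative follows. It assumes the repo root is marked by a `pyproject.toml` file, which is an assumption about this project's layout, and `find_repo_root` is a hypothetical helper, not part of the repository:

```python
from pathlib import Path


def find_repo_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` and return the first directory containing `marker`.

    Hypothetical helper; the marker file name is an assumption about the repo layout.
    """
    start = start.resolve()
    # Check the starting directory itself, then each ancestor in turn.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {start}")
```

In the notebook, this would replace the hard-coded `repo_root` line with `repo_root = find_repo_root(Path.cwd())`, so the import works whether the kernel starts in `notebooks/` or in the repo root.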

In [3]:
# Cell 2: Load datasets
print('Loading AdultDataset...')
adult = AdultDataset()
print('Loading AdultCensusDataset...')
adult_census = AdultCensusDataset()
print('Loaded datasets')

Loading AdultDataset...
Loading AdultCensusDataset...
Loaded datasets


In [4]:
# Cell 3: Quick checks for both datasets
def check_dataset(ds, name):
    print(f'--- {name} ---')
    print('X shape:', ds.X.shape)
    print('y shape:', ds.y.shape)
    print('X dtype:', ds.X.dtype)
    print('y dtype:', ds.y.dtype)
    print('Has NaNs in X?', np.isnan(ds.X).any())
    print('Has NaNs in y?', np.isnan(ds.y).any())
    print('Sample X (first 3 rows):')
    print(ds.X[:3])
    print()

check_dataset(adult, 'AdultDataset')
check_dataset(adult_census, 'AdultCensusDataset')

--- AdultDataset ---
X shape: (32561, 8)
y shape: (32561,)
X dtype: float32
y dtype: int64
Has NaNs in X? False
Has NaNs in y? False
Sample X (first 3 rows):
[[39. 40.  0.  1.  3.  5.  1.  1.]
 [50. 13.  3.  1.  1.  5.  1.  1.]
 [38. 40.  2.  3.  0.  0.  1.  1.]]

--- AdultCensusDataset ---
X shape: (32000, 12)
y shape: (32000,)
X dtype: float32
y dtype: int64
Has NaNs in X? False
Has NaNs in y? False
Sample X (first 3 rows):
[[  39. 2174.    0.   40.   15.   25.   11.   16.    7.    9.    3.   68.]
 [  50.    0.    0.   13.   14.   25.    9.   19.    6.    9.    3.   68.]
 [  38.    0.    0.   40.   12.   27.    7.   21.    7.    9.    3.   68.]]
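
The printed checks rely on someone reading the output. A minimal sketch of the same checks written as assertions, so that a non-interactive run fails loudly, is shown here. `assert_dataset_ok` is a hypothetical helper; it assumes only that the dataset object exposes NumPy arrays `X` and `y`, as in the cells above. The `SimpleNamespace` stand-in is toy data used to keep the example self-contained:

```python
from types import SimpleNamespace

import numpy as np


def assert_dataset_ok(ds, name):
    """Raise AssertionError if a basic sanity check fails (hypothetical helper)."""
    X, y = np.asarray(ds.X), np.asarray(ds.y)
    assert X.ndim == 2, f"{name}: X should be 2-D, got shape {X.shape}"
    assert y.ndim == 1, f"{name}: y should be 1-D, got shape {y.shape}"
    assert X.shape[0] == y.shape[0], f"{name}: X and y row counts differ"
    assert not np.isnan(X).any(), f"{name}: NaNs in X"
    assert not np.isnan(y).any(), f"{name}: NaNs in y"


# Toy stand-in with the same X/y attributes; not real Adult data.
toy = SimpleNamespace(
    X=np.array([[39.0, 40.0], [50.0, 13.0]], dtype=np.float32),
    y=np.array([0, 1], dtype=np.int64),
)
assert_dataset_ok(toy, "toy")
```

With the real objects, the call would be `assert_dataset_ok(adult, 'AdultDataset')` and likewise for `adult_census`.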



## How to run
Prefer running this notebook with the project's environment. Example (from repo root):
```
uv run jupyter nbconvert --to notebook --execute notebooks/check_adult_datasets.ipynb --stdout
```
Or open the notebook with Jupyter in the environment created by `uv` and run cells interactively.