# Data Tutorial

First, we set up a logger for this notebook.
Note that the `PertData` class also makes use of Python's `logging` module.

In [None]:
import logging
import sys

# Configure the root logger
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)

# Create a logger for this notebook
log = logging.getLogger(__name__)
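
Because `PertData` logs through the standard `logging` module, its verbosity can be tuned separately from the notebook's own messages. The sketch below is one possible way to do that. It assumes the package creates its loggers with `logging.getLogger(__name__)`, so that they all sit under the `"causal_hts_modeling"` namespace; this has not been confirmed against the package source.

```python
import logging

# Assumption: the package names its loggers after its modules, so they all
# live under the "causal_hts_modeling" namespace.
package_logger = logging.getLogger("causal_hts_modeling")
package_logger.setLevel(logging.WARNING)

# A child logger without its own level inherits the level of its nearest
# configured ancestor.
child_logger = logging.getLogger("causal_hts_modeling.pertdata")
print(child_logger.getEffectiveLevel() == logging.WARNING)
```

Skip this step if you want to see the `INFO` messages emitted while the dataset is loaded.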

Next, we create a `PertData` object.
We specify that we want to load the `"norman"` dataset.

In [None]:
from causal_hts_modeling.pertdata import PertData
from causal_hts_modeling.utils import get_git_root

pert_data = PertData(data_dir=f"{get_git_root()}/data", dataset_name="norman")

Then we can access the **gene expression matrix** and the **perturbations vector**, and log their shapes together with the number of distinct perturbations in each variant of the vector.

In [None]:
log.info(f"Gene expression matrix (X) shape: {pert_data.X.shape}")
# Report the shape and the number of distinct labels for each variant of y
log.info(f"Perturbations vector (y['original']) shape: {pert_data.y['original'].shape}")
log.info(f"Found {len(pert_data.y['original'].unique())} different perturbations in y['original']")
log.info(f"Perturbations vector (y['fixed']) shape: {pert_data.y['fixed'].shape}")
log.info(f"Found {len(pert_data.y['fixed'].unique())} different perturbations in y['fixed']")
log.info(f"Perturbations vector (y['binary']) shape: {pert_data.y['binary'].shape}")
log.info(f"Found {len(pert_data.y['binary'].unique())} different perturbations in y['binary']")
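
The `.unique()` calls above suggest that each variant of `y` behaves like a pandas `Series`. Assuming that is the case, the following minimal sketch shows the same counting on a toy `Series`. The labels are hypothetical and not taken from the dataset.

```python
import pandas as pd

# Hypothetical labels for illustration only; not taken from the "norman" dataset.
y_toy = pd.Series(["control", "A", "A", "B", "control", "A"])

# Number of distinct labels, computed two equivalent ways
# (equivalent here because the Series has no missing values).
n_unique = len(y_toy.unique())
assert n_unique == y_toy.nunique()
print(n_unique)

# How many cells carry each label.
counts = y_toy.value_counts()
print(counts)
```

Note that `nunique()` ignores missing values by default, whereas `unique()` includes them, so the two counts can differ by one on data that contains `NaN`.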

We can also inspect the first few elements of each variant of the **perturbations vector**.

In [None]:
n = 5
log.info(f"First {n} elements of y['original']:")
print(f"{pert_data.y['original'][:n]}")
log.info(f"First {n} elements of y['fixed']:")
print(f"{pert_data.y['fixed'][:n]}")
log.info(f"First {n} elements of y['binary']:")
print(f"{pert_data.y['binary'][:n]}")
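
The three near-identical pairs of statements above can be folded into a loop. The sketch below assumes each variant is a pandas `Series` stored in a dict-like container. `head_of_each` is a hypothetical helper, not part of `PertData`, and the toy values are made-up placeholders that do not reflect how the variants are actually encoded.

```python
import pandas as pd


def head_of_each(variants, n=5):
    """Return the first n elements of every Series in a dict, keyed by name."""
    return {name: series.iloc[:n] for name, series in variants.items()}


# Made-up placeholder values; the real variants may be encoded differently.
toy_y = {
    "original": pd.Series(["p1", "p2", "p3"]),
    "fixed": pd.Series(["q1", "q2", "q3"]),
    "binary": pd.Series([0, 1, 1]),
}

heads = head_of_each(toy_y, n=2)
for name, head in heads.items():
    print(f"First 2 elements of y[{name!r}]:")
    print(head.to_string())
```

With the real object, the same helper could be called as `head_of_each(pert_data.y, n)`, provided `pert_data.y` supports `.items()`.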