# Synthetic Experiment Reproduction
This notebook guides you through reproducing the synthetic hidden subgroup discovery experiment.

## 1. Setup Environment

In [None]:
# Clone repo (if on Colab)
!git clone https://github.com/alceubissoto/hidden-subgroup-perf.git
%cd hidden-subgroup-perf/experiments/synthetic

In [None]:
# 1. Force NumPy 1.x and Pandas 1.x for binary compatibility in Colab
# This MUST be done first with --force-reinstall
%pip install "numpy<2.0.0" "pandas<2.0.0" --force-reinstall --quiet

# 2. Upgrade build tools
%pip install --upgrade pip setuptools wheel --quiet

# 3. Install pre-requisites
%pip install cython ujson --quiet

# 4. Install repo core dependencies
%pip install -r requirements.txt --quiet

# 5. Install CLIP (required for feature extraction)
%pip install git+https://github.com/openai/CLIP.git --quiet

# 6. Install Meerkat and Domino directly from GitHub
%pip install git+https://github.com/hazyresearch/meerkat.git --quiet
%pip install "domino[clip] @ git+https://github.com/hazyresearch/domino.git" --quiet

**IMPORTANT:** You **MUST** restart the runtime (Runtime > Restart session) after the installation finishes. This is critical to resolve the NumPy/Pandas binary incompatibility error.
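
After the restart, it can be worth confirming that the later install steps did not replace the pinned NumPy and Pandas versions. The following is a minimal sketch that reads the installed package metadata with the standard library; the helper names `major_version` and `check_pins` are hypothetical and not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version_string.split(".")[0])


def check_pins(packages=("numpy", "pandas"), max_major=1):
    """Map each package to True (pin holds), False (too new), or None (not installed)."""
    status = {}
    for name in packages:
        try:
            status[name] = major_version(version(name)) <= max_major
        except PackageNotFoundError:
            status[name] = None
    return status


# Print the pin status for the two packages pinned in the install cell.
print(check_pins())
```

If either entry is `False`, a later dependency probably pulled in a 2.x release, and rerunning the first install line followed by another runtime restart is a reasonable next step.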

## 2. Run Reproduction Pipeline

We use `configs/colab_config.yaml` for standardized runs on Google Colab GPUs. If you are running locally on an M1 Mac, use `configs/m1_config.yaml` instead.

In [None]:
# Set the config file to use
CONFIG = "configs/colab_config.yaml"
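
Because `CONFIG` is a relative path, it only resolves if the working directory is still `experiments/synthetic`, which may no longer hold after a runtime restart. As a small sketch, the check below fails early with a clear message instead of letting a later script fail; `find_config` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path


def find_config(path):
    """Return the config path as a Path, or raise if the file is missing."""
    config_path = Path(path)
    if not config_path.is_file():
        # Include the working directory, since a wrong cwd is the likely cause.
        raise FileNotFoundError(
            f"{config_path} not found (current directory: {Path.cwd()})"
        )
    return config_path
```

In the notebook, calling `find_config(CONFIG)` right after setting `CONFIG` would surface a wrong working directory before any script runs.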

In [None]:
!python scripts/1_setup_data.py
!python scripts/2_generate_synthetic.py --config {CONFIG}

In [None]:
!python scripts/3_train_model.py --config {CONFIG}

In [None]:
!python scripts/4_extract_features.py --config {CONFIG}

In [None]:
!python scripts/5_run_analysis.py --config {CONFIG}
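
Notebook `!` commands do not raise a Python exception when a script exits with an error, so a failed step can go unnoticed while later cells still run. The sketch below is one possible alternative that runs the same five scripts through `subprocess` and stops at the first failure. The script names and the `--config` flag come from the cells above; `STEPS`, `build_commands`, and `run_pipeline` are hypothetical names introduced for illustration.

```python
import subprocess
import sys

# (script, takes --config) pairs, in the order used by the cells above.
STEPS = [
    ("scripts/1_setup_data.py", False),
    ("scripts/2_generate_synthetic.py", True),
    ("scripts/3_train_model.py", True),
    ("scripts/4_extract_features.py", True),
    ("scripts/5_run_analysis.py", True),
]


def build_commands(config, python=sys.executable):
    """Build one argument list per pipeline step."""
    commands = []
    for script, needs_config in STEPS:
        cmd = [python, script]
        if needs_config:
            cmd += ["--config", config]
        commands.append(cmd)
    return commands


def run_pipeline(config):
    """Run every step in order; raise CalledProcessError on the first failure."""
    for cmd in build_commands(config):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(CONFIG)` would replace the four cells above with a single one. Keeping the cells separate remains useful when only one stage needs to be rerun.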

## 3. Results Visualization

In [None]:
import pandas as pd
results = pd.read_csv('../../results/synthetic_analysis.csv')
print("Discovered Slices:")
display(results)