# Balance CLI tutorial

This tutorial walks through using the `balance` command-line interface (CLI) to reweight a sample dataset so that it better matches a target population. We will build a small synthetic dataset, run the CLI, and inspect the outputs.


## Prerequisites

Make sure `balance` is installed and the `balance` CLI is on your `PATH`. You can also run the CLI via `python -m balance.cli` from a checkout of the repository; the cells below use that form.
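
Before running anything, it can help to confirm which entry point is available. The following is a minimal sketch using only the standard library. It assumes the console script is named `balance` and the importable package is also named `balance`, as described above; it only reports what it finds and does not install anything.

```python
import importlib.util
import shutil

# Path of the `balance` console script if it is on PATH, otherwise None.
cli_path = shutil.which("balance")

# True if this interpreter can import the `balance` package, which is
# what `python -m balance.cli` relies on.
package_found = importlib.util.find_spec("balance") is not None

print(f"balance script on PATH: {cli_path}")
print(f"balance package importable: {package_found}")
```

If the package is importable but the script is not on `PATH`, the `python -m balance.cli` form should still work from the same interpreter.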


In [None]:
import os
import subprocess
import tempfile

import numpy as np
import pandas as pd


## Create a sample + target dataset

We'll build a single DataFrame (written to CSV in the next section) with two groups: respondents (the sample) and non-respondents (the target). The CLI expects a binary sample indicator column (`is_respondent` by default), an `id` column, a `weight` column, and one or more covariate columns.


In [None]:
rng = np.random.default_rng(2021)
n_sample = 1000
n_target = 2000

sample_df = pd.DataFrame(
    {
        "age": rng.uniform(18, 80, n_sample),
        "gender": rng.choice([1, 2, 3, 4], n_sample),
        "id": range(n_sample),
        "weight": 1.0,
        "is_respondent": 1,
    }
)
target_df = pd.DataFrame(
    {
        "age": rng.uniform(18, 80, n_target),
        "gender": rng.choice([1, 2, 3, 4], n_target),
        "id": range(n_sample, n_sample + n_target),
        "weight": 1.0,
        "is_respondent": 0,
    }
)

input_df = pd.concat([sample_df, target_df], ignore_index=True)
input_df.head()
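
A quick sanity check on the input frame can catch problems before the CLI runs. The sketch below is illustrative only: `check_cli_input` is a hypothetical helper, and its rules (unique ids, a 0/1 indicator, no negative weights) are assumptions chosen for this tutorial, not the CLI's own validation. The `toy` frame is made-up data so the block runs on its own.

```python
import pandas as pd

# Columns the tutorial says the CLI expects, besides the covariates.
REQUIRED_COLUMNS = ["id", "weight", "is_respondent"]


def check_cli_input(df, covariates):
    """Return a list of problems found in the frame (empty if none)."""
    problems = []
    expected = REQUIRED_COLUMNS + list(covariates)
    missing = [col for col in expected if col not in df.columns]
    if missing:
        # Stop early: the remaining checks need these columns.
        problems.append(f"missing columns: {missing}")
        return problems
    if not df["id"].is_unique:
        problems.append("id values are not unique")
    if not set(df["is_respondent"].unique()) <= {0, 1}:
        problems.append("is_respondent must contain only 0 and 1")
    if (df["weight"] < 0).any():
        problems.append("weight contains negative values")
    return problems


# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "id": [0, 1, 2],
        "weight": [1.0, 1.0, 1.0],
        "is_respondent": [1, 0, 0],
        "age": [30.0, 41.5, 67.2],
    }
)

print(check_cli_input(toy, ["age"]))
print(check_cli_input(toy, ["age", "gender"]))
```

In the notebook, the same helper could be called as `check_cli_input(input_df, ["age", "gender"])`.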


## Run the CLI

We'll write the input dataset to disk, then call the CLI to compute weights and diagnostics.


In [None]:
import sys

with tempfile.TemporaryDirectory() as tmpdir:
    input_path = os.path.join(tmpdir, "input.csv")
    output_path = os.path.join(tmpdir, "weights_out.csv")
    diagnostics_path = os.path.join(tmpdir, "diagnostics_out.csv")

    input_df.to_csv(input_path, index=False)

    # Use the interpreter running this notebook so the CLI is loaded
    # from the same environment as the imports above.
    cmd = [
        sys.executable,
        "-m",
        "balance.cli",
        "--input_file",
        input_path,
        "--output_file",
        output_path,
        "--diagnostics_output_file",
        diagnostics_path,
        "--covariate_columns",
        "age,gender",
        "--method",
        "ipw",
    ]

    subprocess.check_call(cmd)

    # Load both outputs while the temporary directory still exists.
    adjusted_df = pd.read_csv(output_path)
    diagnostics_df = pd.read_csv(diagnostics_path)

adjusted_df.head()
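
`subprocess.check_call` raises on a non-zero exit code but does not include the CLI's error output in the exception. As a hedged alternative, the sketch below captures stderr and puts it in the error message. `run_and_report` is a hypothetical helper name, and the command used here is a stand-in so the block runs without `balance` installed.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return its stdout; raise RuntimeError with stderr on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {result.returncode}:\n{result.stderr}"
        )
    return result.stdout


# Stand-in command; in the notebook, pass the `cmd` list built above instead.
out = run_and_report([sys.executable, "-c", "print('ok')"])
print(out.strip())
```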


## Inspect diagnostics

The diagnostics output is a flat table that includes adjustment metadata and balance metrics. Filter on the `metric` column to pull out specific rows, such as the adjustment method that was used.


In [None]:
diagnostics_df.query("metric == 'adjustment_method'")
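
To see what else the table holds, one option is to count rows per metric before filtering. The sketch below runs on a made-up stand-in frame: only the `metric` column and the `adjustment_method` value come from this tutorial, while `placeholder_metric`, the `value` column, and all numbers are hypothetical and do not reflect the CLI's real output.

```python
import pandas as pd

# Hypothetical stand-in for diagnostics_df; real contents come from the CLI.
toy_diagnostics = pd.DataFrame(
    {
        "metric": ["adjustment_method", "placeholder_metric", "placeholder_metric"],
        "value": [0.0, 1.5, 2.5],
    }
)

# Number of rows recorded under each metric name.
metric_counts = toy_diagnostics["metric"].value_counts()
print(metric_counts)

# Same filter as in the cell above, applied to the stand-in frame.
method_rows = toy_diagnostics.query("metric == 'adjustment_method'")
print(method_rows)
```

In the notebook, `diagnostics_df["metric"].value_counts()` would give the same overview for the real output.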


## Next steps

- Try `--method cbps` or `--method rake` for alternative weighting approaches.
- Use `--outcome_columns` to control which columns are treated as outcomes.
- Supply `--ipw_logistic_regression_kwargs` to tune the IPW model.
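
To try several methods without retyping the argument list, the command can be built by a small function. This is a sketch under stated assumptions: `build_balance_cmd` is a hypothetical helper, the file names are placeholders, and it assumes `--outcome_columns` takes a comma-separated list in the same way as `--covariate_columns`. It only constructs the list and does not run the CLI.

```python
import sys


def build_balance_cmd(
    input_path,
    output_path,
    diagnostics_path,
    covariates,
    method="ipw",
    outcome_columns=None,
):
    """Return the argument list for one `python -m balance.cli` run."""
    cmd = [
        sys.executable,
        "-m",
        "balance.cli",
        "--input_file",
        input_path,
        "--output_file",
        output_path,
        "--diagnostics_output_file",
        diagnostics_path,
        "--covariate_columns",
        ",".join(covariates),
        "--method",
        method,
    ]
    if outcome_columns:
        # Assumption: comma-separated, like --covariate_columns.
        cmd += ["--outcome_columns", ",".join(outcome_columns)]
    return cmd


# Placeholder paths; one command per method named in the list above.
commands = {
    method: build_balance_cmd(
        "input.csv", "weights_out.csv", "diagnostics_out.csv",
        ["age", "gender"], method=method,
    )
    for method in ("ipw", "cbps", "rake")
}
print(commands["cbps"][-2:])
```

Each resulting list can be passed to `subprocess.check_call` in the same way as `cmd` in the "Run the CLI" section.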
