acteval

A Python library for evaluating synthetic activity schedules by comparing them to observed data. Given a population of daily activity sequences (who did what and when), acteval measures how well a synthetic population reproduces the observed distribution across multiple dimensions: activity frequencies, timing, transitions, participation rates, and novelty.

Install

pip install acteval

Or with uv:

uv add acteval

Development

# Install dependencies (including dev tools)
uv sync

# Run tests (benchmarks excluded by default)
uv run pytest tests/

# Run tests with coverage
uv run pytest --cov=src/acteval tests/

# Lint with ruff
ruff check src/ tests/

# Auto-fix lint issues
ruff check --fix src/ tests/

# Check formatting with black
black --check src/ tests/

# Auto-format
black src/ tests/

Benchmarks

Two benchmark suites are included, both using pytest-benchmark.

Population evaluation (`compare`)

Tests compare() at 1k, 20k, and 100k rows (observed + synthetic populations):

uv run pytest tests/test_bench_evaluate.py --benchmark-only -v

Pairwise distances (`pairwise_distances`)

Tests pairwise_distances() at 256, 512, and 1024 schedules:

uv run pytest tests/test_bench_pairwise.py --benchmark-only -v

Both suites together

uv run pytest tests/test_bench_evaluate.py tests/test_bench_pairwise.py --benchmark-only -v

Comparing runs

# Save a baseline
uv run pytest tests/test_bench_evaluate.py tests/test_bench_pairwise.py --benchmark-only --benchmark-save=baseline

# Compare against it after making changes
uv run pytest tests/test_bench_evaluate.py tests/test_bench_pairwise.py --benchmark-only --benchmark-compare=baseline

Input format

Data is passed as a pandas DataFrame with one row per activity episode:

column	type	description
`pid`	int/str	Person identifier
`act`	str	Activity label (e.g. `"home"`, `"work"`, `"shop"`)
`start`	numeric	Start time (any consistent unit, e.g. hours)
`end`	numeric	End time
`duration`	numeric	Duration (`end - start`)

import pandas as pd

observed = pd.DataFrame([
    {"pid": 0, "act": "home", "start": 0,  "end": 6,  "duration": 6},
    {"pid": 0, "act": "work", "start": 6,  "end": 14, "duration": 8},
    {"pid": 0, "act": "home", "start": 14, "end": 24, "duration": 10},
    {"pid": 1, "act": "home", "start": 0,  "end": 10, "duration": 10},
    {"pid": 1, "act": "work", "start": 10, "end": 24, "duration": 14},
])

API

`compare(observed, synthetic, **kwargs)`

Compare one or more synthetic populations to an observed population.

from acteval import compare

# Single synthetic population
result = compare(observed, synthetic)

# Multiple models side-by-side
result = compare(observed, {"model_a": synthetic_a, "model_b": synthetic_b})

result is a dict with six DataFrames:

key	description
`"distances"`	Per-feature distances
`"group_distances"`	Distances aggregated by feature group
`"domain_distances"`	Distances aggregated by domain (participations, transitions, timing)
`"descriptions"`	Per-feature descriptive statistics
`"group_descriptions"`	Group-level descriptive statistics
`"domain_descriptions"`	Domain-level descriptive statistics

Pass report_stats=True to include additional statistics in the output.

`Evaluator`

Use Evaluator when comparing multiple synthetic populations against the same observed data — it computes and caches the observed features once. Each compare() call is independent.

from acteval import Evaluator

evaluator = Evaluator(observed)

result_v1 = evaluator.compare({"v1": synthetic_v1})
result_v2 = evaluator.compare({"v2": synthetic_v2})

For fine-grained control, compare one model at a time then assemble:

evaluator.compare_population("v1", synthetic_v1)
evaluator.compare_population("v2", synthetic_v2)
result = evaluator.report()

`pairwise_distances(schedules, specs=None)`

Compute a single NxN distance matrix between individual schedules. Useful for clustering, outlier detection, or directly comparing a small batch of schedules.

from acteval import pairwise_distances

result = pairwise_distances(schedules)
result.matrix          # numpy array, shape (N, N)
result.pids            # original pid values, length N

# Get a labeled DataFrame with original pid values as index/columns
df = result.to_dataframe()

# Example: find the two most similar schedules
import numpy as np
dist = result.to_dataframe()
dist.values[np.arange(len(dist)), np.arange(len(dist))] = np.inf
i, j = np.unravel_index(dist.values.argmin(), dist.shape)
print(f"Most similar: {dist.index[i]} and {dist.columns[j]}")

The result matrix is symmetric with zeros on the diagonal. All values are in 0–1.

Pluggable distance specs

By default, three equal-weight semantic-distance specs are used (participations, transitions, timing via MAE). Pass a custom specs list to change the metrics or their relative weights:

from acteval.pairwise import chamfer_spec, soft_dtw_spec, default_pairwise_specs

# Chamfer distance on EOS-padded activity sequences
result = pairwise_distances(schedules, specs=[chamfer_spec()])

# Soft-DTW on EOS-padded activity sequences
result = pairwise_distances(schedules, specs=[soft_dtw_spec(gamma=1.0)])

# Mix metrics with custom weights
result = pairwise_distances(schedules, specs=[
    *default_pairwise_specs(),       # weight=1.0 each
    chamfer_spec(weight=1.0),
    soft_dtw_spec(weight=2.0),
])

Each spec defines a feature_fn (extracts a (N, ...) array from the population) and a distance_fn (computes the (N, N) matrix). The final matrix is a weighted average across all active specs.

Factory	Description
`default_pairwise_specs()`	MAE on participation counts, bi-gram counts, mean durations
`chamfer_spec(max_len, weight)`	Chamfer distance on EOS-padded `(N, L, 2)` sequences
`soft_dtw_spec(max_len, gamma, weight)`	Soft-DTW on EOS-padded `(N, L, 2)` sequences

`Population`

For direct access to the underlying data structure:

from acteval import Population

pop = Population(observed)
print(pop.acts)       # activity labels per episode
print(pop.durations)  # durations as numpy array

Reading the results

The six DataFrames

compare() returns an EvalResult with six DataFrames at three levels of aggregation:

Property	Index	Content
`result.distances`	`(domain, feature, segment)`	Per-feature distances — lower is closer to observed
`result.descriptions`	`(domain, feature, segment)`	Per-feature descriptive statistics (e.g. average start time)
`result.group_distances`	`(domain, feature)`	Distances averaged across segments
`result.group_descriptions`	`(domain, feature)`	Descriptions averaged across segments
`result.domain_distances`	`(domain,)`	Distances averaged across features — one row per domain
`result.domain_descriptions`	`(domain,)`	Descriptions averaged across features

Distances are in the range 0–1 (lower is better). A distance of 0.0 means the synthetic distribution perfectly matches observed; 1.0 is the maximum penalty.

Note on timing features: A distance of 1.0 for a timing feature means the activity is entirely absent from the synthetic population — not just timed differently. This is treated as a maximum-penalty missing feature rather than a distributional difference.

Evaluation domains

Domain	What it measures
`participations`	Who does what and how often — participation rates, joint participation, sequence lengths
`transitions`	Activity sequences — 2-, 3-, and 4-gram transition patterns
`timing`	When and how long — start times, durations, and their joint distributions
`creativity`	How novel and diverse the synthetic schedules are relative to observed
`feasibility`	Structural validity — home-based schedules, no consecutive duplicate activities

Ranking models

result = compare(observed, {"model_a": df_a, "model_b": df_b})

# Mean domain distance per model (lower is better)
print(result.rank_models())
# model_a    0.12
# model_b    0.19

# Best model
print(result.best_model)   # "model_a"

# Domain-level summary table
print(result.summary())
#                   model_a  model_b
# domain
# creativity           0.08     0.14
# feasibility          0.03     0.05
# participations       0.11     0.18
# timing               0.16     0.22
# transitions          0.22     0.35

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.github/workflows		.github/workflows
scripts		scripts
src/acteval		src/acteval
tests		tests
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

acteval

Install

Development

Benchmarks

Population evaluation (`compare`)

Pairwise distances (`pairwise_distances`)

Both suites together

Comparing runs

Input format

API

`compare(observed, synthetic, **kwargs)`

`Evaluator`

`pairwise_distances(schedules, specs=None)`

Pluggable distance specs

`Population`

Reading the results

The six DataFrames

Evaluation domains

Ranking models

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

acteval

Install

Development

Benchmarks

Population evaluation (compare)

Pairwise distances (pairwise_distances)

Both suites together

Comparing runs

Input format

API

compare(observed, synthetic, **kwargs)

Evaluator

pairwise_distances(schedules, specs=None)

Pluggable distance specs

Population

Reading the results

The six DataFrames

Evaluation domains

Ranking models

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Population evaluation (`compare`)

Pairwise distances (`pairwise_distances`)

`compare(observed, synthetic, **kwargs)`

`Evaluator`

`pairwise_distances(schedules, specs=None)`

`Population`

Packages