# End-to-End Results Demo

This notebook is the single maintained walkthrough for the compact results companion repository.

It does three things:
- runs the small demo script
- loads the committed summary tables
- displays the verified markdown conclusions


In [1]:
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

import pandas as pd
from IPython.display import display

def find_repo_root(start: Path) -> Path:
    """Walk upward from `start` and return the first directory containing `.git`."""
    for candidate in (start, *start.parents):
        if (candidate / '.git').exists():
            return candidate
    raise RuntimeError('Could not find repository root (missing .git).')

repo_root = find_repo_root(Path.cwd())
agg_dir = repo_root / "aggregated_results"
print(f"Repo root: {repo_root.name}")


Repo root: crispr-perturbation-manifold-benchmarks


In [2]:
demo_script = repo_root / "scripts" / "demo" / "run_end_to_end_results_demo.py"
# check=True raises CalledProcessError if the demo script exits non-zero.
cmd = [sys.executable, str(demo_script)]
completed = subprocess.run(cmd, cwd=repo_root, capture_output=True, text=True, check=True)
print(completed.stdout)


# End-to-End Results Demo Summary

## Verified Conclusions

1. LSFT adds little on top of the strongest single-cell baseline.
   - Mean Δr (`lpm_selftrained`): 0.0006
   - Mean Δr (`lpm_scgptGeneEmb`): 0.0002
   - Mean Δr (`lpm_randomPertEmb`): -0.0163

2. Self-trained PCA (`lpm_selftrained`) is the top baseline across datasets.
   - Single-cell best baseline: `lpm_selftrained`
   - Pseudobulk best baseline: `lpm_selftrained`

3. More local training data improves pseudobulk LSFT for `lpm_selftrained`.
   - adamson: 1% 0.925 -> 10% 0.943 (Δr=0.019)
   - k562: 1% 0.677 -> 10% 0.706 (Δr=0.029)
   - rpe1: 1% 0.776 -> 10% 0.793 (Δr=0.017)

4. PCA also leads in LOGO generalization.
   - Single-cell LOGO top baseline: `lpm_selftrained` (mean r=0.327)
   - Pseudobulk LOGO top baseline: `lpm_selftrained` (mean r=0.773)

## Sponsorship
This project was sponsored by the **NIH Bridges to Baccalaureate** program.

## Outputs
- aggregated_results/baseline_performance_all_analyses.csv
- aggregated_re
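
Because the cell above combines `capture_output=True` with `check=True`, a failing demo script raises `subprocess.CalledProcessError` without printing the child's stderr. The following is a minimal sketch of one way to surface it. The helper name `run_and_report` is hypothetical and is not part of the repository.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def run_and_report(script_args: list[str], cwd: Path | None = None) -> str:
    """Run a Python child process and return its stdout.

    Hypothetical helper for illustration only; on a non-zero exit it
    prints the captured stderr before re-raising the original error.
    """
    try:
        completed = subprocess.run(
            [sys.executable, *script_args],
            cwd=cwd,
            capture_output=True,
            text=True,
            check=True,
        )
    except subprocess.CalledProcessError as exc:
        # exc.stderr holds the error stream captured from the child process.
        print(exc.stderr, file=sys.stderr)
        raise
    return completed.stdout
```

In this notebook the call would look like `print(run_and_report([str(demo_script)], cwd=repo_root))`, assuming `demo_script` and `repo_root` are defined as in the earlier cells.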

In [3]:
best = pd.read_csv(agg_dir / "best_baseline_per_dataset.csv")
lsft = pd.read_csv(agg_dir / "lsft_improvement_summary.csv")
logo = pd.read_csv(agg_dir / "logo_generalization_all_analyses.csv")
trend = pd.read_csv(agg_dir / "selftrained_pseudobulk_data_scale_trend.csv")

print("Best baseline by dataset and analysis type:")
display(best)

print("Single-cell LSFT mean delta for key baselines:")
key_baselines = ["lpm_selftrained", "lpm_scgptGeneEmb", "lpm_randomPertEmb"]
display(lsft.loc[lsft["baseline"].isin(key_baselines), ["dataset", "baseline", "mean_delta_r"]])

print("Pseudobulk self-trained LSFT data-scale trend:")
display(trend)

print("Top LOGO baseline by analysis type:")
display(
    logo.groupby(["analysis_type", "baseline"], as_index=False)["pearson_r"]
    .mean()
    .sort_values(["analysis_type", "pearson_r"], ascending=[True, False])
    .groupby("analysis_type")
    .head(1)
)


Best baseline by dataset and analysis type:


|   | dataset | analysis_type | best_baseline | pearson_r |
|---|---|---|---|---|
| 0 | adamson | pseudobulk | lpm_selftrained | 0.94648 |
| 1 | k562 | pseudobulk | lpm_selftrained | 0.663806 |
| 2 | rpe1 | pseudobulk | lpm_selftrained | 0.767834 |
| 3 | adamson | single_cell | lpm_selftrained | 0.395973 |
| 4 | k562 | single_cell | lpm_selftrained | 0.261948 |
| 5 | rpe1 | single_cell | lpm_selftrained | 0.395125 |


Single-cell LSFT mean delta for key baselines:


|   | dataset | baseline | mean_delta_r |
|---|---|---|---|
| 2 | adamson | lpm_randomPertEmb | -0.044419 |
| 4 | adamson | lpm_scgptGeneEmb | -0.008315 |
| 5 | adamson | lpm_selftrained | 0.001146 |
| 7 | k562 | lpm_randomPertEmb | -0.00309 |
| 9 | k562 | lpm_scgptGeneEmb | 0.00377 |
| 10 | k562 | lpm_selftrained | 0.00048 |
| 13 | rpe1 | lpm_randomPertEmb | -0.001503 |
| 15 | rpe1 | lpm_scgptGeneEmb | 0.005081 |
| 16 | rpe1 | lpm_selftrained | 0.000255 |
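
The per-baseline Mean Δr figures in the verified conclusions can be cross-checked against the table above. The sketch below assumes those figures are unweighted means over the three datasets. That assumption matches the printed values to four decimals, but it is not confirmed by the demo script itself. The row values are copied from the displayed table, and `lsft_key` is an illustrative name.

```python
import pandas as pd

# Rows copied from the displayed table: (dataset, baseline, mean_delta_r).
rows = [
    ("adamson", "lpm_randomPertEmb", -0.044419),
    ("adamson", "lpm_scgptGeneEmb", -0.008315),
    ("adamson", "lpm_selftrained", 0.001146),
    ("k562", "lpm_randomPertEmb", -0.00309),
    ("k562", "lpm_scgptGeneEmb", 0.00377),
    ("k562", "lpm_selftrained", 0.00048),
    ("rpe1", "lpm_randomPertEmb", -0.001503),
    ("rpe1", "lpm_scgptGeneEmb", 0.005081),
    ("rpe1", "lpm_selftrained", 0.000255),
]
lsft_key = pd.DataFrame(rows, columns=["dataset", "baseline", "mean_delta_r"])

# Assumption: the summary uses an unweighted mean across datasets per baseline.
mean_by_baseline = lsft_key.groupby("baseline")["mean_delta_r"].mean().round(4)
print(mean_by_baseline)
```

Under that assumption the rounded means are 0.0006 for `lpm_selftrained`, 0.0002 for `lpm_scgptGeneEmb`, and -0.0163 for `lpm_randomPertEmb`, which agrees with conclusion 1.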


Pseudobulk self-trained LSFT data-scale trend:


|   | dataset | start_pct | end_pct | start_r | end_r | delta_r |
|---|---|---|---|---|---|---|
| 0 | adamson | 0.01 | 0.1 | 0.924709 | 0.94321 | 0.018502 |
| 1 | k562 | 0.01 | 0.1 | 0.676557 | 0.705624 | 0.029067 |
| 2 | rpe1 | 0.01 | 0.1 | 0.776398 | 0.793398 | 0.017 |
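
As a consistency check on the trend table above, `delta_r` should equal `end_r - start_r` up to display rounding. This is a minimal sketch using the values copied from the table. The name `trend_rows` and the tolerance are illustrative choices, not part of the repository.

```python
import pandas as pd

# Values copied from the displayed trend table.
trend_rows = pd.DataFrame(
    {
        "dataset": ["adamson", "k562", "rpe1"],
        "start_r": [0.924709, 0.676557, 0.776398],
        "end_r": [0.94321, 0.705624, 0.793398],
        "delta_r": [0.018502, 0.029067, 0.017],
    }
)

# Recompute the gain from 1% to 10% local training data.
recomputed = trend_rows["end_r"] - trend_rows["start_r"]

# Largest disagreement with the stored column; the displayed values are
# rounded to six decimals, so a gap of a few 1e-6 is expected.
max_gap = (recomputed - trend_rows["delta_r"]).abs().max()
print(f"max |recomputed - delta_r| = {max_gap:.2e}")
```

All three datasets show a positive gain, which is the basis for conclusion 3.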


Top LOGO baseline by analysis type:


|   | analysis_type | baseline | pearson_r |
|---|---|---|---|
| 7 | pseudobulk_logo | lpm_selftrained | 0.772928 |
| 13 | single_cell_logo | lpm_selftrained | 0.327289 |


In [4]:
summary_path = agg_dir / "final_conclusions_verified.md"
print(summary_path.relative_to(repo_root))
print("-" * 80)
print(summary_path.read_text(encoding="utf-8"))


aggregated_results/final_conclusions_verified.md
--------------------------------------------------------------------------------
# End-to-End Results Demo Summary

## Verified Conclusions

1. LSFT adds little on top of the strongest single-cell baseline.
   - Mean Δr (`lpm_selftrained`): 0.0006
   - Mean Δr (`lpm_scgptGeneEmb`): 0.0002
   - Mean Δr (`lpm_randomPertEmb`): -0.0163

2. Self-trained PCA (`lpm_selftrained`) is the top baseline across datasets.
   - Single-cell best baseline: `lpm_selftrained`
   - Pseudobulk best baseline: `lpm_selftrained`

3. More local training data improves pseudobulk LSFT for `lpm_selftrained`.
   - adamson: 1% 0.925 -> 10% 0.943 (Δr=0.019)
   - k562: 1% 0.677 -> 10% 0.706 (Δr=0.029)
   - rpe1: 1% 0.776 -> 10% 0.793 (Δr=0.017)

4. PCA also leads in LOGO generalization.
   - Single-cell LOGO top baseline: `lpm_selftrained` (mean r=0.327)
   - Pseudobulk LOGO top baseline: `lpm_selftrained` (mean r=0.773)

## Sponsorship
This project was sponsored by th