# Lab 10: Export + Reproducibility

## Objectives
- Export a processed dataset to common formats
- Define what to save vs what to recompute
- Create a minimal reproducibility bundle (config + versions + logs)

## Outputs
- `results/repro_bundle/` folder containing:
  - `versions.txt`
  - `params.json`
  - `exported/` (your exported matrices)
  - `README.md` (how to re-run)
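You can check a finished bundle against this Outputs list with a small helper that reports what is missing. The sketch below is minimal. `missing_bundle_items` and the two name constants are hypothetical, and an empty `exported/` folder counts as missing.

```python
from pathlib import Path
from typing import List

# Names from the lab's Outputs list
EXPECTED_FILES = ("versions.txt", "params.json", "README.md")
EXPECTED_DIRS = ("exported",)


def missing_bundle_items(bundle_dir: Path) -> List[str]:
    """List expected bundle entries that are absent (directories must also be non-empty)."""
    bundle_dir = Path(bundle_dir)
    missing = [name for name in EXPECTED_FILES if not (bundle_dir / name).is_file()]
    for name in EXPECTED_DIRS:
        folder = bundle_dir / name
        if not (folder.is_dir() and any(folder.iterdir())):
            missing.append(name + "/")
    return missing
```

For example, `missing_bundle_items(Path('../results/repro_bundle'))` returns an empty list once every item is in place.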

---

Rule: **Never overwrite raw inputs.** Export processed outputs into a new folder with timestamps or version tags.
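One way to follow this rule is to create a new export folder for every run and let the filesystem refuse to reuse a name. The sketch below is minimal. `new_export_dir` and the `%Y%m%dT%H%M%SZ` timestamp format are illustrative choices, not part of the lab's code.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def new_export_dir(base: Path, tag: Optional[str] = None) -> Path:
    """Create a new export folder named after `tag`, or a UTC timestamp if no tag is given."""
    name = tag or datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(base) / name
    # exist_ok=False: reusing a name raises FileExistsError instead of overwriting earlier exports
    path.mkdir(parents=True, exist_ok=False)
    return path
```

A second run with the same tag, or two untagged runs within the same second, fails loudly rather than silently replacing earlier outputs.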


In [None]:
import json
import platform
from importlib import metadata
from pathlib import Path

import scanpy as sc

# Load example data (replace with your own filtered dataset if you have one).
# Note: pbmc3k() downloads the PBMC 3k matrix on first use; it has not been QC-filtered.
adata = sc.datasets.pbmc3k()

out_dir = Path('../results/repro_bundle')
export_dir = out_dir / 'exported'
export_dir.mkdir(parents=True, exist_ok=True)

# Record the interpreter, the OS, and the versions of the key analysis packages
versions = {
    'python': platform.python_version(),
    'platform': platform.platform(),
}
for pkg in ('scanpy', 'anndata', 'numpy', 'scipy', 'pandas'):
    try:
        versions[pkg] = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        versions[pkg] = 'not installed'
(out_dir / 'versions.txt').write_text('\n'.join([f"{k}: {v}" for k, v in versions.items()]) + '\n')

# Record parameters (replace with the thresholds you actually used)
params = {
    'min_genes': 200,
    'max_genes': 2500,
    'max_pct_mt': 5,
}
(out_dir / 'params.json').write_text(json.dumps(params, indent=2) + '\n')

print(f"Wrote {out_dir / 'versions.txt'} and {out_dir / 'params.json'}")

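The objectives list logs as part of the bundle, but the setup cell only writes versions and parameters. Below is a minimal sketch of a file log built on Python's standard `logging` module. `start_run_log` and the `run.log` file name are hypothetical choices.

```python
import logging
import os
from pathlib import Path


def start_run_log(out_dir: Path, name: str = "repro") -> logging.Logger:
    """Return a logger that appends timestamped lines to out_dir/run.log."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    log_path = os.path.abspath(out_dir / "run.log")
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Re-running a notebook cell must not attach a second handler to the same file
    already_attached = any(
        isinstance(h, logging.FileHandler) and h.baseFilename == log_path
        for h in logger.handlers
    )
    if not already_attached:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

In the notebook, `log = start_run_log(out_dir)` followed by calls such as `log.info('params: %s', params)` would append one timestamped line per step.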

In [None]:
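# Export formats
import pandas as pd
import scipy.io
import scipy.sparse

# 1) H5AD (AnnData's native format; keeps obs, var, layers and uns together)
# The file name assumes adata is your QC-filtered object; the pbmc3k example is not filtered
h5ad_path = export_dir / 'adata_filtered.h5ad'
adata.write_h5ad(h5ad_path)
print(f"Wrote {h5ad_path}")

# 2) MTX (Matrix Market), laid out like Cell Ranger's legacy (v2) output so that
#    sc.read_10x_mtx() can read it back. Scanpy has no 10x MTX writer, so the
#    three files are written by hand.
mtx_dir = export_dir / 'mtx'
mtx_dir.mkdir(exist_ok=True)
# 10x stores features as rows and cells as columns, so transpose the cells x genes matrix
scipy.io.mmwrite(str(mtx_dir / 'matrix.mtx'), scipy.sparse.csr_matrix(adata.X).T)
# genes.tsv: gene ID and gene symbol, one feature per line, no header
gene_ids = adata.var['gene_ids'] if 'gene_ids' in adata.var.columns else adata.var_names
pd.DataFrame({'gene_id': list(gene_ids), 'gene_symbol': list(adata.var_names)}).to_csv(
    mtx_dir / 'genes.tsv', sep='\t', header=False, index=False)
# barcodes.tsv: one cell barcode per line
(mtx_dir / 'barcodes.tsv').write_text('\n'.join(adata.obs_names) + '\n')
print(f"Wrote MTX folder {mtx_dir}")

# 3) CSV for per-cell metadata (obs); the index column holds the cell barcodes
adata.obs.to_csv(export_dir / 'cell_metadata.csv')
print(f"Wrote {export_dir / 'cell_metadata.csv'}")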

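Deciding what to save and what to recompute is easier when you can tell whether a recomputed export matches the saved one. The minimal sketch below assumes you want a checksum manifest, which this lab's code does not produce. `write_manifest` and the `MANIFEST.sha256` file name are hypothetical. Binary formats such as H5AD are not guaranteed to be byte-identical across runs or library versions, so a mismatch there means you should compare contents. It does not prove the result changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large matrices do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(export_dir: Path, manifest_name: str = "MANIFEST.sha256") -> Path:
    """Write '<sha256>  <relative path>' lines for every file under export_dir."""
    export_dir = Path(export_dir)
    manifest = export_dir / manifest_name
    lines = [
        f"{sha256_of(p)}  {p.relative_to(export_dir).as_posix()}"
        for p in sorted(export_dir.rglob("*"))
        if p.is_file() and p != manifest  # never hash the manifest itself
    ]
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

Each line uses the `<hash>  <path>` layout that GNU `sha256sum -c MANIFEST.sha256` can verify when run from inside `exported/`.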

## Create a tiny README

In `results/repro_bundle/README.md`, record:
- which raw inputs were used (e.g., FASTQ paths)
- which reference was used (FASTA/GTF files and their release version)
- the commands to re-run the pipeline
- where the outputs are stored

Together with `versions.txt` and `params.json`, this is the minimum someone else needs to re-run your analysis.
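You can also generate the README instead of writing it by hand. The sketch below fills the four items from lists you supply and appends the parameters saved in `params.json`. `write_readme`, `bullets` and the section titles are hypothetical, and every value you pass in must come from your real run.

```python
import json
from pathlib import Path
from typing import Iterable

README_TEMPLATE = """# Reproducibility bundle

## Raw inputs
{raw_inputs}

## Reference (FASTA/GTF + version)
{reference}

## Commands to re-run
{commands}

## Where outputs are stored
{outputs}

## Parameters (from params.json)
{params}
"""


def bullets(items: Iterable[str]) -> str:
    """Render each item as a Markdown bullet."""
    return "\n".join(f"- {item}" for item in items)


def write_readme(bundle_dir: Path, raw_inputs: Iterable[str], reference: Iterable[str],
                 commands: Iterable[str], outputs: Iterable[str]) -> Path:
    """Fill README_TEMPLATE from your own values plus the saved params.json."""
    bundle_dir = Path(bundle_dir)
    params = json.loads((bundle_dir / "params.json").read_text())
    text = README_TEMPLATE.format(
        raw_inputs=bullets(raw_inputs),
        reference=bullets(reference),
        # Backticks render each command as inline code
        commands=bullets(f"`{cmd}`" for cmd in commands),
        outputs=bullets(outputs),
        params=bullets(f"{key}: {value}" for key, value in params.items()),
    )
    readme = bundle_dir / "README.md"
    readme.write_text(text)
    return readme
```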
