# Setup of the AnnData object
**Author:** [Severin Dicks](https://github.com/Intron7)
**Copyright:** [scverse](https://scverse.org)

This notebook handles the download and storage of AnnData objects required for the rapids-singlecell tutorials.

We will:
- **Download example datasets** from online repositories
- **Save the AnnData objects** locally for subsequent analysis
- **Convert the larger datasets** to chunked Zarr stores

This setup ensures all required data is available locally before proceeding with GPU-accelerated analysis workflows.

In [1]:
import os
import anndata as ad
import wget
import scanpy as sc
# Create the download directory if it does not exist yet
os.makedirs("./h5", exist_ok=True)

In [2]:
wget.download('https://figshare.com/ndownloader/files/45788454',
              "./h5/adata.raw.h5ad")

'./h5/adata.raw.h5ad'

In [2]:
import cellxgene_census
CENSUS_VERSION = "2025-01-30"
with cellxgene_census.open_soma(census_version=CENSUS_VERSION) as census:
    adata = cellxgene_census.get_anndata(
        census,
        "Homo sapiens",
        obs_value_filter='dataset_id=="ae29ebd0-1973-40a4-a6af-d15a5f77a80f"',
    )
# Keep only the cells profiled with the listed 10x assays
adata = adata[adata.obs["assay"].isin(["10x 3' v3", "10x 5' v1", "10x 5' v2"])].copy()
adata.write("h5/dli_census.h5ad")

In [2]:
wget.download('https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/1M_brain_cells_10X.sparse.h5ad',
              "h5/nvidia_1.3M.h5ad")

100% [....................................................................] 5652968495 / 5652968495

'h5/nvidia_1.3M (1).h5ad'
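
The returned name `'h5/nvidia_1.3M (1).h5ad'` suggests that the target file already existed, so the download was stored as a second copy. One way to avoid repeating a multi-gigabyte transfer is to check for the file first. The helper below is a minimal sketch using only the standard library; `download_if_missing` is a hypothetical name that is not part of `wget` or this notebook.

```python
import os
import urllib.request


def download_if_missing(url: str, dest: str) -> str:
    """Download url to dest unless dest already exists; return dest."""
    if os.path.exists(dest):
        # Skip the transfer so that no second copy is created.
        return dest
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    # Write to a temporary name first so that an interrupted transfer
    # never leaves a partial file under the final name.
    tmp = dest + ".part"
    urllib.request.urlretrieve(url, tmp)
    os.replace(tmp, dest)
    return dest
```

Under these assumptions, calling `download_if_missing(url, "h5/nvidia_1.3M.h5ad")` a second time would return immediately and leave the existing file untouched.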

In [None]:
from packaging.version import parse as parse_version

if parse_version(ad.__version__) < parse_version("0.12.0rc1"):
    from anndata.experimental import read_elem_as_dask as read_dask
else:
    from anndata.experimental import read_elem_lazy as read_dask
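
The threshold in the check above is the release candidate `0.12.0rc1` rather than `0.12.0`. Pre-releases sort before the final release, so comparing against the release candidate also routes `0.12` pre-release builds to `read_elem_lazy`. The sketch below illustrates the comparison on a few version strings chosen for illustration only; `uses_legacy_reader` is a hypothetical helper name.

```python
from packaging.version import parse as parse_version

THRESHOLD = parse_version("0.12.0rc1")


def uses_legacy_reader(anndata_version: str) -> bool:
    """Return True when the version sorts before the 0.12.0rc1 threshold."""
    return parse_version(anndata_version) < THRESHOLD


# Illustrative version strings, not a list of actual anndata releases
for v in ["0.11.4", "0.12.0rc1", "0.12.0", "0.12.1"]:
    reader = "read_elem_as_dask" if uses_legacy_reader(v) else "read_elem_lazy"
    print(v, "->", reader)
```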

In [3]:
import h5py

SPARSE_CHUNK_SIZE = 20_000
data_pth = "h5/nvidia_1.3M.h5ad"


# Open read-only; X is read lazily in row chunks, obs and var eagerly
with h5py.File(data_pth, "r") as f:
    X = f["X"]
    shape = X.attrs["shape"]
    adata = ad.AnnData(
        X=read_dask(X, (SPARSE_CHUNK_SIZE, shape[1])),
        obs=ad.io.read_elem(f["obs"]),
        var=ad.io.read_elem(f["var"]),
    )

adata.write_zarr("zarr/nvidia_1.3M.zarr")

  utils.warn_names_duplicates("var")
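
The chunk shape passed to `read_dask` in the cell above, `(SPARSE_CHUNK_SIZE, shape[1])`, asks for blocks of 20,000 rows that each span every column. The following sketch shows how such row blocks would tile a matrix, assuming a made-up cell count; `row_chunks` is a hypothetical helper and does not reproduce how anndata or Dask compute chunks internally.

```python
def row_chunks(n_obs: int, chunk_rows: int) -> list[tuple[int, int]]:
    """Return (start, stop) row ranges that cover n_obs rows in blocks of chunk_rows."""
    return [
        (start, min(start + chunk_rows, n_obs))
        for start in range(0, n_obs, chunk_rows)
    ]


SPARSE_CHUNK_SIZE = 20_000
n_obs = 1_310_000  # hypothetical cell count, not read from the file

chunks = row_chunks(n_obs, SPARSE_CHUNK_SIZE)
print(len(chunks))  # number of row blocks
print(chunks[-1])   # the last block may be shorter than the others
```

A smaller `SPARSE_CHUNK_SIZE` would give more, smaller blocks, which lowers peak memory per block at the cost of more tasks to schedule.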


In [11]:
wget.download('https://datasets.cellxgene.cziscience.com/3817734b-0f82-433b-8c38-55b214200fff.h5ad',
              "h5/cell_atlas.h5ad")

100% [..................................................................] 46534275253 / 46534275253

'h5/cell_atlas.h5ad'

In [14]:
# Reuses read_dask from the version-check cell above

import h5py

SPARSE_CHUNK_SIZE = 20_000
data_pth = "h5/cell_atlas.h5ad"


# Open read-only; X is read lazily in row chunks, obs and var eagerly
with h5py.File(data_pth, "r") as f:
    X = f["X"]
    shape = X.attrs["shape"]
    adata = ad.AnnData(
        X=read_dask(X, (SPARSE_CHUNK_SIZE, shape[1])),
        obs=ad.io.read_elem(f["obs"]),
        var=ad.io.read_elem(f["var"]),
    )

adata.write_zarr("zarr/cell_atlas.zarr")