# Setup of the AnnData object
**Author:** [Severin Dicks](https://github.com/Intron7) (IBSM Freiburg)

This notebook only downloads the data and sets up the [AnnData objects](https://anndata.readthedocs.io/en/latest/index.html) we will be working with. In this example workflow we'll be looking at a dataset of ca. 90,000 cells from lung cancer patients published by [Qian et al., Cell Research 2020](https://www.nature.com/articles/s41422-020-0355-0).
In the Pearson Residuals example we'll use a dataset of 200,000 brain cells from [Nvidia](https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/1M_brain_cpu_analysis.ipynb).



In [1]:
import gdown
import os
import wget
import scanpy as sc
# Create the output directory for the downloaded .h5ad files
os.makedirs("./h5", exist_ok=True)

In [2]:
url = 'https://drive.google.com/file/d/1eoK0m2ML1uNLc80L6yBuPrkJqsDF-QWj/view?usp=sharing'
output = './h5/adata.raw.h5ad'
gdown.download(url, output, quiet=True, fuzzy=True)

'./h5/adata.raw.h5ad'

In [3]:
import cellxgene_census
CENSUS_VERSION = "2025-01-30"
with cellxgene_census.open_soma(census_version=CENSUS_VERSION) as census:
    adata = cellxgene_census.get_anndata(
        census,
        "Homo sapiens",
        obs_value_filter='dataset_id=="ae29ebd0-1973-40a4-a6af-d15a5f77a80f"',
    )
# Keep only cells from the 10x 3' v3, 5' v1 and 5' v2 assays
adata = adata[adata.obs["assay"].isin(["10x 3' v3", "10x 5' v1", "10x 5' v2"])].copy()
adata.write("h5/dli_census.h5ad")

In [4]:
wget.download('https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/1M_brain_cells_10X.sparse.h5ad',
              "h5/nvidia_1.3M.h5ad")

100% [....................................................................] 5652968495 / 5652968495

'h5/nvidia_1.3M.h5ad'

In [11]:
wget.download('https://datasets.cellxgene.cziscience.com/3817734b-0f82-433b-8c38-55b214200fff.h5ad',
              "h5/cell_atlas.h5ad")

100% [..................................................................] 46534275253 / 46534275253

'h5/cell_atlas.h5ad'
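
The downloads above range from a few gigabytes to roughly 46 GB, so re-running the notebook can be expensive. The following is a minimal sketch, not part of the original workflow, of a guard that skips a download when the target file is already present. `fetch_once` and its parameters are hypothetical names introduced for illustration. The `download` argument is assumed to be any callable taking `(url, path)`, such as `wget.download`. The optional `expected_bytes` can be taken from the byte counts printed in the progress output above.

```python
import os
from typing import Callable, Optional


def fetch_once(
    url: str,
    path: str,
    download: Callable[[str, str], object],
    expected_bytes: Optional[int] = None,
) -> bool:
    """Download `url` to `path` unless a usable copy already exists.

    Returns True if a download was performed, False if it was skipped.
    `fetch_once` is an illustrative helper, not part of wget or gdown.
    """
    if os.path.exists(path):
        size = os.path.getsize(path)
        if expected_bytes is None or size == expected_bytes:
            return False  # file already present, nothing to do
        # Size mismatch: treat the file as incomplete and remove it so the
        # downloader writes to the intended path again.
        os.remove(path)
    # Make sure the target directory exists before downloading.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    download(url, path)
    return True


# Example call (commented out because it would start a large download):
# import wget
# fetch_once(
#     "https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/1M_brain_cells_10X.sparse.h5ad",
#     "h5/nvidia_1.3M.h5ad",
#     wget.download,
#     expected_bytes=5652968495,
# )
```

A size comparison only catches truncated files. It does not verify the content, so a checksum would be the stricter choice where one is published.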