
# Installation
# Requirements
The Census API requires a Linux or macOS system with:



*   Python 3.10 to 3.12, or R (supported versions TBD).
*   Recommended: >16 GB of memory.
*   Recommended: >5 Mbps internet connection.
*   Recommended: for better performance, use the API from an AWS EC2 instance in the us-west-2 region. The Census data builds are hosted in an AWS S3 bucket in that region.
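
As a quick sanity check against the Python requirement above, the interpreter version can be tested before installing anything. This is a minimal sketch using only the standard library; the helper name `python_version_supported` is hypothetical and not part of the Census API, and the bounds are simply copied from the list above.

```python
import sys

# Supported range copied from the requirements list: Python 3.10 through 3.12.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 12)


def python_version_supported(version=None):
    """Return True if (major, minor) falls within the supported range.

    `version` defaults to the running interpreter. Hypothetical helper,
    not part of cellxgene_census.
    """
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


current = ".".join(str(part) for part in sys.version_info[:3])
status = "supported" if python_version_supported() else "outside the supported range"
print(f"Python {current} is {status}")
```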



Optionally, create and activate a virtual environment in your working directory:

In [None]:
python -m venv ./venv
source ./venv/bin/activate

Install the `cellxgene-census` package via pip:

In [None]:
pip install -U cellxgene-census

There are also “experimental” add-on modules that are less stable than the main API and may have more complex dependencies. To install them:

In [None]:
pip install -U "cellxgene-census[experimental]"
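
To confirm that the installation above succeeded, the installed package version can be looked up from Python. The sketch below relies only on the standard-library `importlib.metadata` module; `installed_version` is a hypothetical helper name, not something the Census package provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a pip distribution, or None."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("cellxgene-census")
print(f"cellxgene-census: {found or 'not installed'}")
```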

# Python quick start
Below are three examples of common operations you can perform with the Census. As a reminder, the reference documentation for the API is available via `help()`:

In [None]:
import cellxgene_census

help(cellxgene_census)
help(cellxgene_census.get_anndata)
# etc

Help on package cellxgene_census:

NAME
    cellxgene_census - An API to facilitate use of the CZI Science CELLxGENE Census. The Census is a versioned container of single-cell data hosted at `CELLxGENE Discover`_.

DESCRIPTION
    The API is built on the `tiledbsoma` SOMA API, and provides a number of helper functions including:
    
        * Open a named version of the Census, for use with the SOMA API
        * Get a list of available Census versions, and for each version, a description
        * Get a slice of the Census as an AnnData, for use with ScanPy
        * Get the URI for, or directly download, underlying data in H5AD format
    
    For more information on the API, visit the `cellxgene_census repo`_. For more information on SOMA, see the `tiledbsoma repo`_.
    
    .. _CELLxGENE Discover:
        https://cellxgene.cziscience.com/
    
    .. _cellxgene_census repo:
        https://github.com/chanzuckerberg/cellxgene-census/
    
    .. _tiledbsoma repo:
        https://g

# Querying a slice of cell metadata
The following reads the cell metadata, **filters** for female cells whose cell type is `microglial cell` or `neuron`, and selects the **columns** assay, cell_type, tissue, tissue_general, suspension_type, and disease.

In [None]:
import cellxgene_census

with cellxgene_census.open_soma() as census:

    # Reads SOMADataFrame as a slice
    cell_metadata = census["census_data"]["homo_sapiens"].obs.read(
        value_filter = "sex == 'female' and cell_type in ['microglial cell', 'neuron']",
        column_names = ["assay", "cell_type", "tissue", "tissue_general", "suspension_type", "disease"]
    )

    # Concatenates results to pyarrow.Table
    cell_metadata = cell_metadata.concat()

    # Converts to pandas.DataFrame
    cell_metadata = cell_metadata.to_pandas()

    print(cell_metadata)

The "stable" release is currently 2024-07-01. Specify 'census_version="2024-07-01"' in future calls to open_soma() to ensure data consistency.


            assay        cell_type                          tissue  \
0       10x 5' v1           neuron                            lung   
1       10x 5' v1           neuron                            lung   
2       10x 5' v1           neuron                            lung   
3       10x 5' v1           neuron                            lung   
4       10x 5' v1           neuron                            lung   
...           ...              ...                             ...   
732776  10x 3' v3  microglial cell  dorsolateral prefrontal cortex   
732777  10x 3' v3  microglial cell  dorsolateral prefrontal cortex   
732778  10x 3' v3  microglial cell  dorsolateral prefrontal cortex   
732779  10x 3' v3  microglial cell  dorsolateral prefrontal cortex   
732780  10x 3' v3  microglial cell  dorsolateral prefrontal cortex   

       tissue_general suspension_type   disease     sex  
0                lung            cell    normal  female  
1                lung            cell    no
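
The `value_filter` argument used above is a plain string expression, so it can be assembled programmatically when the criteria come from variables. The sketch below is a hypothetical helper (`build_value_filter` is not part of the Census or TileDB-SOMA APIs). It covers only the `==`, `in`, and `and` forms that appear in this notebook and assumes the values contain no single quotes.

```python
def build_value_filter(criteria):
    """Build a value_filter string from a {column: value or list of values} mapping.

    Hypothetical helper: it only produces the `==`, `in`, and `and` forms
    used in this notebook and rejects values containing single quotes.
    """
    clauses = []
    for column, values in criteria.items():
        if isinstance(values, str):
            values = [values]
        if not values:
            raise ValueError(f"no values given for column {column!r}")
        for value in values:
            if "'" in value:
                raise ValueError(f"unsupported quote in value: {value!r}")
        if len(values) == 1:
            clauses.append(f"{column} == '{values[0]}'")
        else:
            quoted = ", ".join(f"'{value}'" for value in values)
            clauses.append(f"{column} in [{quoted}]")
    return " and ".join(clauses)


value_filter = build_value_filter(
    {"sex": "female", "cell_type": ["microglial cell", "neuron"]}
)
print(value_filter)
```

The resulting string could then be passed as `value_filter` to `obs.read()` in place of the literal.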

# Obtaining a slice as AnnData

The following creates an anndata.AnnData object on demand with the same cell filtering criteria as above, keeping only the genes ENSG00000161798 and ENSG00000188229.

In [None]:
import cellxgene_census

with cellxgene_census.open_soma() as census:
    adata = cellxgene_census.get_anndata(
        census = census,
        organism = "Homo sapiens",
        var_value_filter = "feature_id in ['ENSG00000161798', 'ENSG00000188229']",
        obs_value_filter = "sex == 'female' and cell_type in ['microglial cell', 'neuron']",
        obs_column_names = ["assay", "cell_type", "tissue", "tissue_general", "suspension_type", "disease"],
    )

    print(adata)

The "stable" release is currently 2024-07-01. Specify 'census_version="2024-07-01"' in future calls to open_soma() to ensure data consistency.


AnnData object with n_obs × n_vars = 732781 × 2
    obs: 'assay', 'cell_type', 'tissue', 'tissue_general', 'suspension_type', 'disease', 'sex'
    var: 'soma_joinid', 'feature_id', 'feature_name', 'feature_length', 'nnz', 'n_measured_obs'


The resulting object, with about 730K cells and 2 genes, can now be used for downstream analysis with scanpy.
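
As one example of a first downstream step, the cell metadata in `adata.obs` is a pandas DataFrame and can be summarized directly. The sketch below uses a small made-up stand-in table rather than real Census data, and `cells_per_group` is a hypothetical helper name.

```python
import pandas as pd


def cells_per_group(obs, by):
    """Count cells per combination of the metadata columns listed in `by`."""
    # observed=True skips unused categories when the columns are categorical.
    return obs.groupby(by, observed=True).size().sort_values(ascending=False)


# Made-up stand-in for adata.obs, only to illustrate the call.
toy_obs = pd.DataFrame(
    {
        "cell_type": ["neuron", "neuron", "microglial cell"],
        "tissue_general": ["lung", "brain", "brain"],
    }
)

counts = cells_per_group(toy_obs, ["cell_type"])
print(counts)
```

With the real object, the same call would be `cells_per_group(adata.obs, ["cell_type", "tissue_general"])`.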


# Memory-efficient queries
This example demonstrates how to access the data for larger-than-memory operations using TileDB-SOMA.

First, we initiate a lazy-evaluation query to access all male brain cells from human. The query must be closed, either explicitly with `query.close()` or by creating it in a context manager (`with ...`).
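
A minimal sketch of the context-manager form, which closes the query automatically when the block exits. The function name `count_male_brain_cells` is hypothetical, and the pinned version is the one reported as stable in the log output of this notebook. The imports are placed inside the function so that merely defining it needs neither the packages nor network access.

```python
# Same filter as the query described above.
OBS_FILTER = "tissue == 'brain' and sex == 'male'"


def count_male_brain_cells(census_version="2024-07-01"):
    """Return the number of cells matching OBS_FILTER (requires network access)."""
    import cellxgene_census
    import tiledbsoma

    with cellxgene_census.open_soma(census_version=census_version) as census:
        human = census["census_data"]["homo_sapiens"]
        # The inner `with` closes the query on exit, so no query.close() is needed.
        with human.axis_query(
            measurement_name="RNA",
            obs_query=tiledbsoma.AxisQuery(value_filter=OBS_FILTER),
        ) as query:
            return query.n_obs
```

Calling `count_male_brain_cells()` opens the Census and therefore needs a network connection.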



In [None]:
import cellxgene_census
import tiledbsoma

with cellxgene_census.open_soma() as census:

    human = census["census_data"]["homo_sapiens"]
    query = human.axis_query(
       measurement_name = "RNA",
       obs_query = tiledbsoma.AxisQuery(
           value_filter = "tissue == 'brain' and sex == 'male'"
       )
    )

    # Continued below


The "stable" release is currently 2024-07-01. Specify 'census_version="2024-07-01"' in future calls to open_soma() to ensure data consistency.
