Script to query data from Cell Census.
    
    
SOMA = Stack Of Matrices, Annotated: https://github.com/single-cell-data/SOMA/blob/main/abstract_specification.md

CELLxGENE dataset schema: https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md

Helpful links:
https://github.com/chanzuckerberg/cell-census/blob/main/api/python/notebooks/api_demo/census_query_extract.ipynb

Overview of AnnData: https://adamgayoso.com/posts/ten_min_to_adata/

Functions to write:

1) get data using get_anndata
2) check data query for existing keywords so it doesn't time out - DONE

In [1]:
import cellxgene_census
import anndata as ad

from pronto import Ontology


In [2]:
cellxgene_census.__version__

'1.5.1'

In [3]:
census = cellxgene_census.open_soma(census_version="stable")


The "stable" release is currently 2023-07-25. Specify 'census_version="2023-07-25"' in future calls to open_soma() to ensure data consistency.


```obs``` contains several ontology-related columns. One of these might be our target variable, perhaps cell_type?

- ```cell_type_ontology_term_id``` 
- ```development_stage_ontology_term_id``` 
- ```disease_ontology_term_id``` 
- ```self_reported_ethnicity_ontology_term_id``` 
- ```sex_ontology_term_id``` 
- ```tissue_ontology_term_id``` 
- ```tissue_general_ontology_term_id```

```obs``` = cell metadata
```var``` = feature metadata

The expression data are stored in ```adata.X```, which is a sparse matrix.

## Load Ontology

In [4]:
cl = Ontology.from_obo_library('cl.owl')



In [9]:
def select_ontology_target_leafs(target_branch):
    '''
    Identify all leaves under target_branch in an open ontology.
    
    Assumes the ontology is already loaded as cl.
    
    Parameters
    ----------
    target_branch : string
        ontology ID of the branch whose leaves you want to identify
        
    Returns
    -------
    leaf_list : list
        list of ontology IDs for all leaves of target_branch
    
    '''
    root_node = cl[target_branch] 

    leaf_list = []

    for term in root_node.subclasses(distance=None,with_self=False).to_set():
        if term.is_leaf():
            leaf_list.append(term.id)
            
    return leaf_list

In [10]:
target_branch = 'CL:0000738' # leukocyte

leaf_list = select_ontology_target_leafs(target_branch)


In [13]:
len(list(set(leaf_list)))

324
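
To make the traversal concrete, here is a minimal sketch of the same leaf selection on a toy hierarchy, using only the standard library. The `TOY:` IDs and the `select_leaves` helper are hypothetical and are not part of pronto or the Cell Ontology. The sketch only mirrors the idea: collect every descendant of a branch, then keep the terms that have no subclasses of their own.

```python
from collections import deque

# Hypothetical toy hierarchy: parent ID -> direct subclass IDs.
# The "TOY:" IDs are made up for illustration; they are not Cell Ontology terms.
TOY_CHILDREN = {
    "TOY:root": ["TOY:a", "TOY:b"],
    "TOY:a": ["TOY:a1", "TOY:a2"],
    "TOY:b": [],
    "TOY:a1": [],
    "TOY:a2": ["TOY:a2x"],
    "TOY:a2x": [],
}


def select_leaves(children, target_branch):
    """Return the sorted IDs of all leaf descendants of target_branch.

    The branch itself is excluded, matching with_self=False in the function above.
    """
    seen = set()
    queue = deque(children[target_branch])
    while queue:
        term = queue.popleft()
        if term in seen:
            continue  # a term can be reachable through more than one parent
        seen.add(term)
        queue.extend(children.get(term, []))
    # A leaf is a descendant with no subclasses of its own.
    return sorted(term for term in seen if not children.get(term))


leaves = select_leaves(TOY_CHILDREN, "TOY:root")
print(leaves)
```

Tracking visited terms in a set also removes duplicates, which is what the `set(...)` call in the cell above guards against.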

## Check Query

Code to check a query before running get_anndata, which can crash the kernel when the filter is invalid.

The function below opens its own census handle, pinned to census version 2023-07-25.

In [42]:
cell_10v3.columns

Index(['soma_joinid', 'dataset_id', 'assay', 'assay_ontology_term_id',
       'cell_type', 'cell_type_ontology_term_id', 'development_stage',
       'development_stage_ontology_term_id', 'disease',
       'disease_ontology_term_id', 'donor_id', 'is_primary_data',
       'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id',
       'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue',
       'tissue_ontology_term_id', 'tissue_general',
       'tissue_general_ontology_term_id'],
      dtype='object')

In [5]:
def check_cell_census_query(metadata_columns,col_vals):
    '''
    This function checks whether each value you plan to filter on exists in the corresponding obs column.
    This is a quick way to check your query before running get_anndata(), which can crash the kernel if
    the filter is not correct.
    
    Opens its own census handle, pinned to census_version="2023-07-25". Assumes you only want to query
    on cell metadata. Gene metadata querying is not currently supported.
    
    Parameters
    ----------
    metadata_columns : list
        list of strings naming the obs columns to query
        
    col_vals : list
        list of strings with the value to filter on in each column, in the same order as metadata_columns
        
    Returns
    -------
        None. Prints whether each value is present in its column.
    
    '''
    with cellxgene_census.open_soma(census_version="2023-07-25") as census:
        cell_metadata_check = census["census_data"]["homo_sapiens"].obs.read(column_names=metadata_columns).concat().to_pandas()

    for x in range(len(metadata_columns)):
        if col_vals[x] in cell_metadata_check[metadata_columns[x]].unique():
            print(col_vals[x], ' is in query')
        else:
            print(col_vals[x], ' is NOT in query. Rewrite query before running get_anndata()')

In [6]:
metadata_columns = ['tissue_general_ontology_term_id','assay']
col_vals = ['UBERON:0002405',"10x 3' v3"]

check_cell_census_query(metadata_columns,col_vals)

UBERON:0002405  is in query
10x 3' v3  is in query
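
The same membership check can be sketched without a census connection, which is handy for testing the logic itself. This is only an illustration: `check_query_values` and the `known_values` mapping are hypothetical names, and the mapping holds a small hand-picked sample of values seen in this notebook, not the full contents of the census. Returning booleans instead of printing makes the result easy to act on in code.

```python
def check_query_values(known_values, metadata_columns, col_vals):
    """Return {(column, value): True/False} for each requested column/value pair.

    known_values maps a column name to the set of values known to exist in it.
    """
    if len(metadata_columns) != len(col_vals):
        raise ValueError("metadata_columns and col_vals must have the same length")
    results = {}
    for column, value in zip(metadata_columns, col_vals):
        # An unknown column is treated as having no valid values.
        results[(column, value)] = value in known_values.get(column, set())
    return results


# Abbreviated sample of values (assumption: stands in for the real obs columns).
known_values = {
    "tissue_general_ontology_term_id": {"UBERON:0002405", "UBERON:0000992"},
    "assay": {"10x 3' v3", "10x 3' v2"},
}

results = check_query_values(
    known_values,
    ["tissue_general_ontology_term_id", "assay"],
    ["UBERON:0002405", "10x 3' v3"],
)
for (column, value), ok in results.items():
    print(column, value, "is in query" if ok else "is NOT in query")
```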


## Find Query Options

Before you run a query, see the options for a subset of columns.

The *obs* columns available to query are:

- soma_joinid
- dataset_id
- assay
- assay_ontology_term_id
- cell_type
- cell_type_ontology_term_id
- development_stage
- development_stage_ontology_term_id
- disease
- disease_ontology_term_id
- donor_id
- is_primary_data
- self_reported_ethnicity
- self_reported_ethnicity_ontology_term_id
- sex
- sex_ontology_term_id
- suspension_type
- tissue
- tissue_ontology_term_id
- tissue_general
- tissue_general_ontology_term_id


In [12]:
def see_cell_census_column_options(column_to_check):
    '''
    This function checks an active census object to identify the unique values contained in the
    columns of interest.
    
    Assumes there is an active census object already open as census. Assumes you only want to query on
    cell metadata. Gene metadata querying is not currently supported.
    
    Parameters
    ----------
    column_to_check : list
        list of strings naming the obs columns to inspect
                
    Returns
    -------
        None. Prints the unique values for each input column.
    
    '''
    # Read only the requested obs columns.
    cell_column_check = census["census_data"]["homo_sapiens"].obs.read(column_names=column_to_check).concat().to_pandas()
    for col in column_to_check:
        print('The unique values in ', col, ' are')
        print(cell_column_check[col].unique())
        print('')

In [14]:
column_check = ['cell_type_ontology_term_id']

see_cell_census_column_options(column_check)

The unique values in  cell_type_ontology_term_id  are
['CL:0000649' 'CL:0002187' 'CL:0000148' 'CL:0000312' 'CL:0000242'
 'CL:0000988' 'CL:2000092' 'CL:0002189' 'CL:0000499' 'CL:0000623'
 'CL:0000192' 'CL:0000151' 'CL:0000067' 'CL:0000235' 'CL:0000669'
 'CL:0000236' 'CL:0000097' 'CL:0000115' 'CL:0002138' 'CL:0000738'
 'CL:1000334' 'CL:0019032' 'CL:0002071' 'CL:0009039' 'CL:0000677'
 'CL:1000495' 'CL:0009042' 'CL:0009041' 'CL:0009043' 'CL:0009017'
 'CL:0002254' 'CL:0009012' 'CL:0009011' 'CL:0011026' 'CL:0009006'
 'CL:1000343' 'CL:1000353' 'CL:0000576' 'CL:0000451' 'CL:0000084'
 'CL:4030006' 'CL:0000057' 'CL:0000786' 'CL:0000003' 'CL:0000171'
 'CL:0000173' 'CL:0000169' 'CL:0002275' 'CL:1000329' 'CL:0000787'
 'CL:0000798' 'CL:0000909' 'CL:1000348' 'CL:0000064' 'CL:0000898'
 'CL:0000939' 'CL:0005012' 'CL:0000775' 'CL:0000158' 'CL:0000068'
 'CL:0000453' 'CL:0017000' 'CL:0000788' 'CL:0000990' 'CL:0000814'
 'CL:0000890' 'CL:0001065' 'CL:0000076' 'CL:0001058' 'CL:0000815'
 'CL:0000938' 'CL:0000
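
Unique values alone do not show how many cells carry each value. A minimal sketch of a share-per-value summary follows, using only the standard library. The `value_shares` helper and the four-element `toy_column` are hypothetical stand-ins for a real obs column, so the numbers below say nothing about the census itself.

```python
from collections import Counter


def value_shares(values):
    """Return {value: fraction of entries}, ordered from most to least common."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


# Toy column (assumption): three leukocyte rows and one lymphocyte row.
toy_column = ["CL:0000738", "CL:0000738", "CL:0000542", "CL:0000738"]

shares = value_shares(toy_column)
for value, share in shares.items():
    print(f"{value}: {share:.0%}")
```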

## Check Subset Of Data

Let's write a function so that we can filter on one set of data and check for the presence of a possible secondary filter.

In [14]:
def check_subset(value_filter, col):
    '''
    This function checks an active census object to identify the unique values contained in the
    column of interest, after filtering on an initial column.
    
    Assumes there is an active census object already open as census. Assumes you only want to query on
    cell metadata. Gene metadata querying is not currently supported. Currently only supports inspecting
    one column at a time.
    
    Parameters
    ----------
    value_filter : string
        string containing the obs value filter (named value_filter to avoid shadowing the built-in filter)
        
    col : string
        string containing column of interest for identifying unique values
                
    Returns
    -------
    unique_values : array
        unique values of col after applying value_filter; the values are also printed
    
    '''
    cell_data = (
        census["census_data"]["homo_sapiens"]
        .obs.read(value_filter=value_filter)
        .concat()
        .to_pandas()
    )
    
    unique_values = cell_data[col].unique()
    print('After filtering on ', value_filter, 'the unique values for ', col, 'are:')
    print(unique_values)
    return unique_values
    

In [15]:
test = check_subset(
    '''assay == "10x 3\' v3" and cell_type_ontology_term_id in {}'''.format(leaf_list_in_cl),
    'cell_type_ontology_term_id',
)



After filtering on  assay == "10x 3' v3" and cell_type_ontology_term_id in ['CL:0000985', 'CL:0000987', 'CL:0000913', 'CL:0000905', 'CL:0000091', 'CL:0000895', 'CL:0000794', 'CL:0000583', 'CL:0002399', 'CL:0000903', 'CL:0000910', 'CL:0000938', 'CL:0000915', 'CL:0000899', 'CL:0000904', 'CL:0000900', 'CL:0002396', 'CL:0000934', 'CL:0000940', 'CL:0000907', 'CL:0001057', 'CL:0011025', 'CL:0001050', 'CL:0000939', 'CL:0001062', 'CL:0001044', 'CL:0002394', 'CL:2000055', 'CL:0000807', 'CL:0000808', 'CL:0001058', 'CL:0001043', 'CL:0001049', 'CL:0001076', 'CL:0002057'] the unique values for  cell_type_ontology_term_id are:
['CL:0000985' 'CL:0000987' 'CL:0000913' 'CL:0000905' 'CL:0000091'
 'CL:0000895' 'CL:0000794' 'CL:0000583' 'CL:0002399' 'CL:0000903'
 'CL:0000910' 'CL:0000938' 'CL:0000915' 'CL:0000899' 'CL:0000904'
 'CL:0000900' 'CL:0002396' 'CL:0000934' 'CL:0000940' 'CL:0000907'
 'CL:0001057' 'CL:0011025' 'CL:0001050' 'CL:0000939' 'CL:0001062'
 'CL:0001044' 'CL:0002394' 'CL:2000055' 'CL:00008

In [None]:
#obs_val_filter = '''assay == "10x 3\' v3" and cell_type_ontology_term_id in {}'''.format(leaf_list)
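
Building the filter with `.format()` on a Python list works, but the quoting is easy to get wrong because the assay name itself contains an apostrophe. A small sketch of a helper that assembles the same kind of filter string is shown here. `build_value_filter` is a hypothetical name, not part of `cellxgene_census`, and it only covers the two columns used in this notebook.

```python
def build_value_filter(assay, term_ids):
    """Build an obs value filter on assay and cell_type_ontology_term_id.

    Values are wrapped in double quotes so that an apostrophe in the assay
    name (as in 10x 3' v3) needs no escaping inside the filter.
    """
    values = [assay, *term_ids]
    if any('"' in value for value in values):
        raise ValueError("values must not contain double quotes")
    if not term_ids:
        raise ValueError("term_ids must not be empty")
    id_list = ", ".join(f'"{term_id}"' for term_id in term_ids)
    return f'assay == "{assay}" and cell_type_ontology_term_id in [{id_list}]'


obs_val_filter = build_value_filter("10x 3' v3", ["CL:0000738", "CL:0000542"])
print(obs_val_filter)
```

The resulting string can be passed as the first argument of `check_subset` in place of the hand-written filter.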


In [15]:
unique_vals = check_subset('''assay == "10x 3\' v3"''',
             'cell_type_ontology_term_id')

After filtering on  assay == "10x 3' v3" the unique values for  cell_type_ontology_term_id are:
['CL:0000525' 'CL:2000060' 'CL:0008036' 'CL:0002488' 'CL:0000499'
 'CL:0000003' 'CL:0000235' 'CL:0002601' 'CL:0009095' 'CL:0000084'
 'CL:0002343' 'CL:0000066' 'CL:0000623' 'CL:0002138' 'CL:0000815'
 'CL:0001078' 'CL:3000001' 'CL:0009092' 'CL:0000236' 'CL:2000042'
 'CL:0000786' 'CL:0000451' 'CL:0000094' 'CL:0002064' 'CL:0000115'
 'CL:0000763' 'CL:0002410' 'CL:0000814' 'CL:0000057' 'CL:0002079'
 'CL:0000169' 'CL:0000097' 'CL:0002275' 'CL:0000171' 'CL:0000173'
 'CL:0002623' 'CL:0000788' 'CL:0000787' 'CL:0000492' 'CL:0000669'
 'CL:0002503' 'CL:1000398' 'CL:0000576' 'CL:0000068' 'CL:0000646'
 'CL:0000625' 'CL:0000775' 'CL:0000185' 'CL:0009005' 'CL:0005006'
 'CL:0010008' 'CL:0000746' 'CL:0000182' 'CL:0000192' 'CL:0002548'
 'CL:0000186' 'CL:1001428' 'CL:0002144' 'CL:0002543' 'CL:0000784'
 'CL:1000329' 'CL:0000064' 'CL:0000151' 'CL:0000319' 'CL:1000330'
 'CL:0000624' 'CL:0000809' 'CL:1000320' 'CL:00

In [18]:
leaf_list_in_cl = [x for x in unique_vals if x in leaf_list]



In [16]:
len(unique_vals)

431

In [17]:
len(leaf_list)

324

In [19]:
len(leaf_list_in_cl)

36

In [21]:
print(leaf_list_in_cl)

['CL:0002343', 'CL:3000001', 'CL:0000895', 'CL:0000900', 'CL:0002394', 'CL:0002399', 'CL:2000055', 'CL:0001050', 'CL:0001044', 'CL:0000807', 'CL:0000808', 'CL:0000794', 'CL:0000985', 'CL:0000987', 'CL:0000913', 'CL:0000905', 'CL:0000091', 'CL:0000903', 'CL:0000910', 'CL:0000938', 'CL:0000915', 'CL:0000899', 'CL:0000904', 'CL:0002396', 'CL:0000934', 'CL:0000940', 'CL:0000907', 'CL:0001057', 'CL:0011025', 'CL:0000939', 'CL:0001062', 'CL:0001058', 'CL:0001043', 'CL:0001049', 'CL:0001076', 'CL:0002057']


Currently downloaded on Great Lakes:
['CL:0000985', 'CL:0000987', 'CL:0000913', 'CL:0000905', 'CL:0000091', 'CL:0000895', 'CL:0000794', 'CL:0000583', 'CL:0002399', 'CL:0000903', 'CL:0000910', 'CL:0000938', 'CL:0000915', 'CL:0000899', 'CL:0000904', 'CL:0000900', 'CL:0002396', 'CL:0000934', 'CL:0000940', 'CL:0000907', 'CL:0001057', 'CL:0011025', 'CL:0001050', 'CL:0000939', 'CL:0001062', 'CL:0001044', 'CL:0002394', 'CL:2000055', 'CL:0000807', 'CL:0000808', 'CL:0001058', 'CL:0001043', 'CL:0001049', 'CL:0001076', 'CL:0002057']

The following values are NOT in the new list:
CL:0000583 (This ID makes up almost 25 percent of the current distribution of values. It is also not a leaf.)


Currently in the new leaf_list:
['CL:0002343', 'CL:3000001', 'CL:0000895', 'CL:0000900', 'CL:0002394', 'CL:0002399', 'CL:2000055', 'CL:0001050', 'CL:0001044', 'CL:0000807', 'CL:0000808', 'CL:0000794', 'CL:0000985', 'CL:0000987', 'CL:0000913', 'CL:0000905', 'CL:0000091', 'CL:0000903', 'CL:0000910', 'CL:0000938', 'CL:0000915', 'CL:0000899', 'CL:0000904', 'CL:0002396', 'CL:0000934', 'CL:0000940', 'CL:0000907', 'CL:0001057', 'CL:0011025', 'CL:0000939', 'CL:0001062', 'CL:0001058', 'CL:0001043', 'CL:0001049', 'CL:0001076', 'CL:0002057']

The following values are NOT in the old list:
CL:0002343
CL:3000001
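
The comparison above was done by eye. A minimal sketch of doing it with set operations follows. `compare_id_lists` is a hypothetical helper, and the two short lists are an abbreviated subset of the IDs listed above, chosen so that the differences are the same ones noted in the text.

```python
def compare_id_lists(old_ids, new_ids):
    """Return the IDs found only in the old list, only in the new list, and in both."""
    old_set, new_set = set(old_ids), set(new_ids)
    return {
        "only_in_old": sorted(old_set - new_set),
        "only_in_new": sorted(new_set - old_set),
        "shared": sorted(old_set & new_set),
    }


# Abbreviated subsets of the two lists above (assumption: shortened for readability).
old_ids = ["CL:0000985", "CL:0000583", "CL:0000895"]
new_ids = ["CL:0002343", "CL:3000001", "CL:0000895", "CL:0000985"]

diff = compare_id_lists(old_ids, new_ids)
print("NOT in the new list:", diff["only_in_old"])
print("NOT in the old list:", diff["only_in_new"])
```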



In [16]:
check_subset('tissue_general_ontology_term_id == "UBERON:0002405"', 'assay')

["10x 5' v1" "10x 3' v2"]


In [5]:
check_subset('assay == "10x 3\' v3"',
             'cell_type_ontology_term_id')

After filtering on  assay == "10x 3' v3" the unique values for  cell_type_ontology_term_id are:
['CL:0000151' 'CL:0000115' 'CL:0000499' 'CL:0000192' 'CL:0000669'
 'CL:0000623' 'CL:0000236' 'CL:0002138' 'CL:0000235' 'CL:0000097'
 'CL:0000067' 'CL:0000738' 'CL:1000334' 'CL:0019032' 'CL:0002071'
 'CL:0009039' 'CL:0000677' 'CL:1000495' 'CL:0009042' 'CL:0009041'
 'CL:0009043' 'CL:0009017' 'CL:0002254' 'CL:0009012' 'CL:0009011'
 'CL:0011026' 'CL:0009006' 'CL:1000343' 'CL:1000353' 'CL:0000576'
 'CL:0000451' 'CL:0000084' 'CL:4030006' 'CL:0000057' 'CL:0000786'
 'CL:0000003' 'CL:4023040' 'CL:0002605' 'CL:4023051' 'CL:4023070'
 'CL:4023012' 'CL:4023013' 'CL:0000128' 'CL:4023041' 'CL:4023017'
 'CL:1001602' 'CL:4023011' 'CL:4023038' 'CL:4023016' 'CL:4023036'
 'CL:4023018' 'CL:0000129' 'CL:4023015' 'CL:0002453' 'CL:0000583'
 'CL:0002063' 'CL:0002632' 'CL:0002062' 'CL:0000064' 'CL:0000745'
 'CL:0000750' 'CL:0000749' 'CL:0000636' 'CL:0000127' 'CL:0000604'
 'CL:0000573' 'CL:0000561' 'CL:1001509' 'CL:00

## Find 10X 3' V3 data from human immune cells



From (https://www.cancer.gov/publications/dictionaries/cancer-terms/def/immune-cell), immune cells include neutrophils, eosinophils, basophils, mast cells, monocytes, macrophages, dendritic cells, natural killer cells, and lymphocytes (B cells and T cells).

All of these show up as a ```cell_type``` in ```obs```. Some show up in multiple ways. We could create a list of cell types to search for.

Use ```cellxgene_census.get_anndata``` (named ```cell_census.get_anndata``` in earlier releases of the package) to get the gene expression data.
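
A hedged sketch of what that call could look like is given here, wrapped in a function so that nothing is downloaded until it is called. `fetch_immune_anndata` and `OBS_VALUE_FILTER` are hypothetical names. The arguments passed to `get_anndata` follow the signature shown by `help()` in this notebook, and the census version is the one reported when the "stable" release was opened. The package is imported inside the function, so defining it does not require `cellxgene_census` to be installed.

```python
# Filter used elsewhere in this notebook: 10x 3' v3 cells annotated as
# leukocyte (CL:0000738) or lymphocyte (CL:0000542).
OBS_VALUE_FILTER = (
    'assay == "10x 3\' v3" '
    'and cell_type_ontology_term_id in ["CL:0000738", "CL:0000542"]'
)


def fetch_immune_anndata(
    census_version="2023-07-25",
    obs_columns=("cell_type_ontology_term_id",),
):
    """Return an AnnData object with expression data for the cells matching OBS_VALUE_FILTER.

    Calling this needs network access and can use a lot of memory,
    so check the filter on obs metadata first.
    """
    # Imported lazily so that this module loads without the package installed.
    import cellxgene_census

    with cellxgene_census.open_soma(census_version=census_version) as census:
        return cellxgene_census.get_anndata(
            census=census,
            organism="Homo sapiens",
            obs_value_filter=OBS_VALUE_FILTER,
            column_names={"obs": list(obs_columns)},
        )
```

Calling `adata = fetch_immune_anndata()` would run the query. Pinning `census_version` rather than using "stable" keeps repeated runs on the same data release.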



In [3]:
# obs.read brings in only the cell metadata
# get_anndata brings in the gene/cell-level expression data



cell_10v3 = (
   census["census_data"]["homo_sapiens"].obs.read(value_filter='''assay == "10x 3\' v3" and cell_type_ontology_term_id in ["CL:0000738","CL:0000542"]''').concat().to_pandas()
)

# adata = cellxgene_census.get_anndata(
#         census=census,
#         organism = "Homo sapiens",
#         obs_value_filter = '''assay == "10x 3\' v3" and cell_type_ontology_term_id in ["CL:0000738","CL:0000542"]''',
#         column_names={"obs": ["development_stage"]},
#         )

display(cell_10v3)


Unnamed: 0,soma_joinid,dataset_id,assay,assay_ontology_term_id,cell_type,cell_type_ontology_term_id,development_stage,development_stage_ontology_term_id,disease,disease_ontology_term_id,...,is_primary_data,self_reported_ethnicity,self_reported_ethnicity_ontology_term_id,sex,sex_ontology_term_id,suspension_type,tissue,tissue_ontology_term_id,tissue_general,tissue_general_ontology_term_id
0,132580,d1207c81-7309-43a7-a5a0-f4283670b62b,10x 3' v3,EFO:0009922,leukocyte,CL:0000738,62-year-old human stage,HsapDv:0000156,normal,PATO:0000461,...,True,European,HANCESTRO:0005,female,PATO:0000383,cell,ovary,UBERON:0000992,ovary,UBERON:0000992
1,132607,d1207c81-7309-43a7-a5a0-f4283670b62b,10x 3' v3,EFO:0009922,leukocyte,CL:0000738,62-year-old human stage,HsapDv:0000156,normal,PATO:0000461,...,True,European,HANCESTRO:0005,female,PATO:0000383,cell,ovary,UBERON:0000992,ovary,UBERON:0000992
2,132635,d1207c81-7309-43a7-a5a0-f4283670b62b,10x 3' v3,EFO:0009922,leukocyte,CL:0000738,62-year-old human stage,HsapDv:0000156,normal,PATO:0000461,...,True,European,HANCESTRO:0005,female,PATO:0000383,cell,ovary,UBERON:0000992,ovary,UBERON:0000992
3,132641,d1207c81-7309-43a7-a5a0-f4283670b62b,10x 3' v3,EFO:0009922,leukocyte,CL:0000738,62-year-old human stage,HsapDv:0000156,normal,PATO:0000461,...,True,European,HANCESTRO:0005,female,PATO:0000383,cell,ovary,UBERON:0000992,ovary,UBERON:0000992
4,132649,d1207c81-7309-43a7-a5a0-f4283670b62b,10x 3' v3,EFO:0009922,leukocyte,CL:0000738,62-year-old human stage,HsapDv:0000156,normal,PATO:0000461,...,True,European,HANCESTRO:0005,female,PATO:0000383,cell,ovary,UBERON:0000992,ovary,UBERON:0000992
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
111055,43539341,ec6ceff8-c8bc-488d-b6bf-30df2fa92169,10x 3' v3,EFO:0009922,lymphocyte,CL:0000542,unknown,unknown,normal,PATO:0000461,...,True,unknown,unknown,unknown,unknown,nucleus,liver,UBERON:0002107,liver,UBERON:0002107
111056,43539769,ec6ceff8-c8bc-488d-b6bf-30df2fa92169,10x 3' v3,EFO:0009922,lymphocyte,CL:0000542,unknown,unknown,normal,PATO:0000461,...,True,unknown,unknown,unknown,unknown,nucleus,liver,UBERON:0002107,liver,UBERON:0002107
111057,43540093,ec6ceff8-c8bc-488d-b6bf-30df2fa92169,10x 3' v3,EFO:0009922,lymphocyte,CL:0000542,unknown,unknown,normal,PATO:0000461,...,True,unknown,unknown,unknown,unknown,nucleus,liver,UBERON:0002107,liver,UBERON:0002107
111058,43540098,ec6ceff8-c8bc-488d-b6bf-30df2fa92169,10x 3' v3,EFO:0009922,lymphocyte,CL:0000542,unknown,unknown,normal,PATO:0000461,...,True,unknown,unknown,unknown,unknown,nucleus,liver,UBERON:0002107,liver,UBERON:0002107


In [8]:
cell_10v3.shape

(111060, 21)

In [7]:
cell_10v3.columns

Index(['soma_joinid', 'dataset_id', 'assay', 'assay_ontology_term_id',
       'cell_type', 'cell_type_ontology_term_id', 'development_stage',
       'development_stage_ontology_term_id', 'disease',
       'disease_ontology_term_id', 'donor_id', 'is_primary_data',
       'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id',
       'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue',
       'tissue_ontology_term_id', 'tissue_general',
       'tissue_general_ontology_term_id'],
      dtype='object')

In [6]:

cell_10v3['cell_type_ontology_term_id'].unique()


array(['CL:0000738', 'CL:0000542'], dtype=object)

In [3]:
cell_test = (
   census["census_data"]["homo_sapiens"].obs.read(value_filter='''assay == "10x 3\' v3" ''').concat().to_pandas()
)

cell_test['cell_type_ontology_term_id'].unique()


array(['CL:0000151', 'CL:0000115', 'CL:0000499', 'CL:0000192',
       'CL:0000669', 'CL:0000623', 'CL:0000236', 'CL:0002138',
       'CL:0000235', 'CL:0000097', 'CL:0000067', 'CL:0000738',
       'CL:0000576', 'CL:0000451', 'CL:0000084', 'CL:4030006',
       'CL:0000057', 'CL:0000786', 'CL:0000003', 'CL:0002306',
       'CL:1001107', 'CL:1001431', 'CL:1001106', 'CL:1001432',
       'CL:1000500', 'CL:1000768', 'CL:1001111', 'CL:1000849',
       'CL:1000452', 'CL:0000653', 'CL:1000692', 'CL:1001318',
       'CL:0002319', 'CL:1000597', 'CL:1000449', 'CL:0000134',
       'CL:1000334', 'CL:0019032', 'CL:0002071', 'CL:0009039',
       'CL:0000677', 'CL:1000495', 'CL:0009042', 'CL:0009041',
       'CL:0009043', 'CL:0009017', 'CL:0002254', 'CL:0009012',
       'CL:0009011', 'CL:0011026', 'CL:0009006', 'CL:1000343',
       'CL:1000353', 'CL:4023040', 'CL:0002605', 'CL:4023051',
       'CL:4023070', 'CL:4023012', 'CL:4023013', 'CL:0000128',
       'CL:4023041', 'CL:4023017', 'CL:1001602', 'CL:40

In [4]:
cell_test.head()

Unnamed: 0,soma_joinid,dataset_id,assay,assay_ontology_term_id,cell_type,cell_type_ontology_term_id,development_stage,development_stage_ontology_term_id,disease,disease_ontology_term_id,...,is_primary_data,self_reported_ethnicity,self_reported_ethnicity_ontology_term_id,sex,sex_ontology_term_id,suspension_type,tissue,tissue_ontology_term_id,tissue_general,tissue_general_ontology_term_id
0,71428,90d4a63b-5c02-43eb-acde-c49345681601,10x 3' v3,EFO:0009922,secretory cell,CL:0000151,62-year-old human stage,HsapDv:0000156,normal,PATO:0000461,...,True,European,HANCESTRO:0005,female,PATO:0000383,cell,ampulla of uterine tube,UBERON:0012648,fallopian tube,UBERON:0003889
1,71429,90d4a63b-5c02-43eb-acde-c49345681601,10x 3' v3,EFO:0009922,endothelial cell,CL:0000115,62-year-old human stage,HsapDv:0000156,normal,PATO:0000461,...,True,European,HANCESTRO:0005,female,PATO:0000383,cell,ampulla of uterine tube,UBERON:0012648,fallopian tube,UBERON:0003889
2,71430,90d4a63b-5c02-43eb-acde-c49345681601,10x 3' v3,EFO:0009922,endothelial cell,CL:0000115,62-year-old human stage,HsapDv:0000156,normal,PATO:0000461,...,True,European,HANCESTRO:0005,female,PATO:0000383,cell,ampulla of uterine tube,UBERON:0012648,fallopian tube,UBERON:0003889
3,71431,90d4a63b-5c02-43eb-acde-c49345681601,10x 3' v3,EFO:0009922,stromal cell,CL:0000499,62-year-old human stage,HsapDv:0000156,normal,PATO:0000461,...,True,European,HANCESTRO:0005,female,PATO:0000383,cell,ampulla of uterine tube,UBERON:0012648,fallopian tube,UBERON:0003889
4,71432,90d4a63b-5c02-43eb-acde-c49345681601,10x 3' v3,EFO:0009922,smooth muscle cell,CL:0000192,62-year-old human stage,HsapDv:0000156,normal,PATO:0000461,...,True,European,HANCESTRO:0005,female,PATO:0000383,cell,ampulla of uterine tube,UBERON:0012648,fallopian tube,UBERON:0003889


In [5]:
help(cell_census.get_anndata)

Help on function get_anndata in module cell_census.get_anndata:

get_anndata(census: tiledbsoma.collection.Collection, organism: str, measurement_name: str = 'RNA', X_name: str = 'raw', obs_value_filter: Union[str, NoneType] = None, obs_coords: Union[NoneType, int, slice, Sequence[int], pyarrow.lib.Array, pyarrow.lib.ChunkedArray, numpy.ndarray[Any, numpy.dtype[numpy.integer]]] = None, var_value_filter: Union[str, NoneType] = None, var_coords: Union[NoneType, int, slice, Sequence[int], pyarrow.lib.Array, pyarrow.lib.ChunkedArray, numpy.ndarray[Any, numpy.dtype[numpy.integer]]] = None, column_names: Union[somacore.query.query.AxisColumnNames, NoneType] = None) -> anndata._core.anndata.AnnData
    Convience wrapper around soma.Experiment query, to build and execute a query,
    and return it as an AnnData object.
    
    [lifecycle: experimental]
    
    Parameters
    ----------
    census : soma.Collection
        The census object, usually returned by `cell_census.open_soma()`
    o

In [27]:
print(cell_types)

['T cell' 'monocyte' 'dendritic cell' 'alveolar macrophage'
 'natural killer cell' 'B cell' 'mast cell' 'macrophage' 'plasma cell'
 'type II pneumocyte' 'endothelial cell'
 'epithelial cell of lower respiratory tract' 'smooth muscle cell'
 'fibroblast' 'type I pneumocyte' 'endothelial cell of lymphatic vessel'
 'ciliated cell' 'pericyte' 'enterocyte of epithelium of small intestine'
 'intestinal tuft cell' 'enterocyte of epithelium of large intestine'
 'colon goblet cell' 'gut absorptive cell' 'small intestine goblet cell'
 'enteroendocrine cell of colon' 'tuft cell of colon'
 'intestinal crypt stem cell of colon'
 'intestinal crypt stem cell of small intestine'
 'epithelial cell of small intestine'
 'transit amplifying cell of small intestine'
 'transit amplifying cell of colon' 'progenitor cell'
 'enteroendocrine cell of small intestine'
 'paneth cell of epithelium of small intestine'
 'microfold cell of epithelium of small intestine'
 'luminal epithelial cell of mammary gland' 'basa