cellxgene documentation
lauraluebbert committed May 2, 2023
1 parent 67c2a15 commit 39de9bf
Showing 4 changed files with 204 additions and 1 deletion.
2 changes: 2 additions & 0 deletions docs/src/SUMMARY.md
@@ -13,6 +13,7 @@
- [gget archs4](archs4.md)
- [gget blast](blast.md)
- [gget blat](blat.md)
- [gget cellxgene](cellxgene.md)
- [gget enrichr](enrichr.md)
- [gget gpt](gpt.md)
- [gget info](info.md)
@@ -25,4 +26,5 @@

---

[Contributing Guide](contributing.md)
[Terms of Use](cite.md)
128 changes: 128 additions & 0 deletions docs/src/cellxgene.md
@@ -0,0 +1,128 @@
> Python arguments are equivalent to long-option arguments (`--arg`), unless otherwise specified. Flags are True/False arguments in Python. The manual for any gget tool can be called from the command-line using the `-h` `--help` flag.
## gget cellxgene 🍱
Query data from [CZ CELLxGENE Discover](https://cellxgene.cziscience.com/) using the [CZ CELLxGENE Discover Census](https://github.com/chanzuckerberg/cellxgene-census).

**General optional arguments**
`-s` `--species`
Choice of 'homo_sapiens' or 'mus_musculus'. Default: 'homo_sapiens'.

`-g` `--gene`
Str or list of gene name(s) or Ensembl ID(s). Default: None.
NOTE: Use `-e / --ensembl` (Python: `ensembl=True`) when providing Ensembl ID(s) instead of gene name(s).
See https://cellxgene.cziscience.com/gene-expression for examples of available genes.

`-cn` `--column_names`
List of metadata columns to return (stored in AnnData.obs).
Default: ['dataset_id', 'assay', 'suspension_type', 'sex', 'tissue_general', 'tissue', 'cell_type']
For more options see: https://api.cellxgene.cziscience.com/curation/ui/#/ -> Schemas -> dataset

`-o` `--out`
Path to file to save generated AnnData .h5ad file (or .csv with `-mo / --meta_only` (`anndata=False`)).
Required when running from the command line!

**General flags**
`-e` `--ensembl`
Use when genes are provided as Ensembl IDs instead of gene names.

`-mo` `--meta_only`
Command-line only. Returns only the metadata dataframe (corresponds to AnnData.obs).
Python: Use `anndata=False`.

`-q` `--quiet`
Command-line only. Prevents progress information from being displayed.
Python: Use `verbose=False` to prevent progress information from being displayed.
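
The correspondence between the command-line flags above and their Python keyword arguments can be summarized in a small sketch. The helper `cli_flags_to_python_kwargs` is hypothetical and not part of gget; it only restates the mapping described in this section, assuming the Python defaults are `anndata=True` and `verbose=True`.

```python
# Hypothetical helper (not part of gget): translate the command-line flags
# described above into the equivalent Python keyword arguments.
def cli_flags_to_python_kwargs(meta_only=False, quiet=False, ensembl=False):
    return {
        "anndata": not meta_only,  # --meta_only -> anndata=False
        "verbose": not quiet,      # --quiet     -> verbose=False
        "ensembl": ensembl,        # --ensembl   -> ensembl=True
    }


kwargs = cli_flags_to_python_kwargs(meta_only=True, quiet=True)
print(kwargs)
```

The resulting dictionary could then be unpacked into a call such as `gget.cellxgene(**kwargs, gene=...)`.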

**Optional arguments corresponding to CZ CELLxGENE Discover metadata attributes**
`--tissue`
Str or list of tissue(s), e.g. ['lung', 'blood']. Default: None.
See https://cellxgene.cziscience.com/gene-expression for examples of available tissues.

`--cell_type`
Str or list of celltype(s), e.g. ['mucus secreting cell', 'neuroendocrine cell']. Default: None.
See https://cellxgene.cziscience.com/gene-expression and select a tissue to see examples of available celltypes.

`--development_stage`
Str or list of development stage(s). Default: None.

`--disease`
Str or list of disease(s). Default: None.

`--sex`
Str or list of sex(es), e.g. 'female'. Default: None.

`--dataset_id`
Str or list of CELLxGENE dataset ID(s). Default: None.

`--tissue_general_ontology_term_id`
Str or list of high-level tissue UBERON ID(s). Default: None.
Tissue labels and their corresponding UBERON IDs are listed [here](https://github.com/chanzuckerberg/single-cell-data-portal/blob/9b94ccb0a2e0a8f6182b213aa4852c491f6f6aff/backend/wmg/data/tissue_mapper.py).

`--tissue_general`
Str or list of high-level tissue label(s). Default: None.
Tissue labels and their corresponding UBERON IDs are listed [here](https://github.com/chanzuckerberg/single-cell-data-portal/blob/9b94ccb0a2e0a8f6182b213aa4852c491f6f6aff/backend/wmg/data/tissue_mapper.py).

`--tissue_ontology_term_id`
Str or list of tissue ontology term ID(s) as defined in the CELLxGENE dataset schema. Default: None.

`--assay_ontology_term_id`
Str or list of assay ontology term ID(s) as defined in the CELLxGENE dataset schema. Default: None.

`--assay`
Str or list of assay(s) as defined in the CELLxGENE dataset schema. Default: None.

`--cell_type_ontology_term_id`
Str or list of celltype ontology term ID(s) as defined in the CELLxGENE dataset schema. Default: None.

`--development_stage_ontology_term_id`
Str or list of development stage ontology term ID(s) as defined in the CELLxGENE dataset schema. Default: None.

`--disease_ontology_term_id`
Str or list of disease ontology term ID(s) as defined in the CELLxGENE dataset schema. Default: None.

`--donor_id`
Str or list of donor ID(s) as defined in the CELLxGENE dataset schema. Default: None.

`--self_reported_ethnicity_ontology_term_id`
Str or list of self-reported ethnicity ontology ID(s) as defined in the CELLxGENE dataset schema. Default: None.

`--self_reported_ethnicity`
Str or list of self-reported ethnicity(ies) as defined in the CELLxGENE dataset schema. Default: None.

`--sex_ontology_term_id`
Str or list of sex ontology ID(s) as defined in the CELLxGENE dataset schema. Default: None.

`--suspension_type`
Str or list of suspension type(s) as defined in the CELLxGENE dataset schema. Default: None.
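
Since every metadata attribute above defaults to None, several filters can be combined in one Python call by passing only the ones that are set. The following is a minimal sketch: `build_cellxgene_query` and `run_query` are hypothetical helpers, and the filter values are taken from the examples in this manual. The actual query is wrapped in a function because it needs gget installed and network access.

```python
# Hypothetical helper (not part of gget): keep only the filters that were
# explicitly set, so unset attributes fall back to their default of None.
def build_cellxgene_query(**filters):
    return {name: value for name, value in filters.items() if value is not None}


query = build_cellxgene_query(
    species="homo_sapiens",
    gene=["ACE2"],
    tissue="lung",
    sex="female",
    disease=None,  # left unset, so it is dropped from the query
)
print(query)


def run_query(query):
    # Imported lazily: requires gget to be installed and network access.
    import gget

    return gget.cellxgene(**query)
```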


### Examples
```bash
gget cellxgene --gene ACE2 ABCA1 SLC5A1 --tissue lung --cell_type 'mucus secreting cell' 'neuroendocrine cell' -o example_adata.h5ad
```
```python
# Python
adata = gget.cellxgene(
gene = ["ACE2", "ABCA1", "SLC5A1"],
tissue = "lung",
cell_type = ["mucus secreting cell", "neuroendocrine cell"]
)
adata
```
→ Returns an AnnData object containing the scRNAseq ACE2, ABCA1, and SLC5A1 count matrix of 3322 human lung mucus secreting and neuroendocrine cells from CZ CELLxGENE Discover and their corresponding metadata.

Fetch metadata (corresponds to AnnData.obs) only:
```bash
gget cellxgene --meta_only --gene ENSMUSG00000015405 --ensembl --tissue lung --species mus_musculus -o example_meta.csv
```
```python
# Python
df = gget.cellxgene(
anndata = False,
gene = "ENSMUSG00000015405",
ensembl = True,
tissue = "lung",
species = "mus_musculus"
)
df
```
→ Returns only the metadata from ENSMUSG00000015405 (Ace2) expression datasets corresponding to mouse lung cells.
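
A metadata file saved with `--meta_only` can be summarized with the standard library alone. The sketch below assumes the .csv has a header row containing the requested column names. The rows are made up for illustration and are not real CELLxGENE results, and `count_values` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_values(csv_file, column):
    """Count how often each value occurs in one metadata column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Made-up rows standing in for a file such as example_meta.csv.
sample = io.StringIO(
    "dataset_id,tissue,cell_type,sex\n"
    "d1,lung,neuroendocrine cell,female\n"
    "d1,lung,mucus secreting cell,male\n"
    "d2,lung,neuroendocrine cell,female\n"
)

counts = count_values(sample, "cell_type")
print(counts.most_common())
```

For a real file, the same function could be called on `open("example_meta.csv", newline="")` instead of the in-memory sample.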
73 changes: 73 additions & 0 deletions docs/src/contributing.md
@@ -0,0 +1,73 @@
# Welcome to gget's contributing guide <!-- omit in toc -->

Thank you for investing your time in contributing to our project! Any contribution you make will be reflected on the [gget repo](https://github.com/pachterlab/gget). ✨

Read our [Code of Conduct](./CODE_OF_CONDUCT.md) to keep our community approachable and respectable.

In this guide you will get an overview of the contribution workflow, from opening an issue and creating a PR to reviewing and merging the PR.

## Getting started

### Issues

#### Create a new issue

If you spot a problem with gget or you have an idea for a new feature, [check if an issue already exists](https://github.com/pachterlab/gget/issues). If a related issue doesn't exist, you can open a new issue using the relevant [issue form](https://github.com/pachterlab/gget/issues/new/choose).

#### Solve an issue

Scan through our [existing issues](https://github.com/pachterlab/gget/issues) to find one that interests you. You can narrow down the search using `labels` as filters. If you find an issue to work on, you are welcome to open a PR with a fix.

### Make Changes

### Getting started

1. Fork the repository.
- Using GitHub Desktop:
- [Getting started with GitHub Desktop](https://docs.github.com/en/desktop/installing-and-configuring-github-desktop/getting-started-with-github-desktop) will guide you through setting up Desktop.
- Once Desktop is set up, you can use it to [fork the repo](https://docs.github.com/en/desktop/contributing-and-collaborating-using-github-desktop/cloning-and-forking-repositories-from-github-desktop)!

- Using the command line:
- [Fork the repo](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo#fork-an-example-repository) so that you can make your changes without affecting the original project until you're ready to merge them.

2. Create a working branch and start with your changes!

### Commit your update

Commit the changes once you are happy with them.

### ‼️ Self-review the following before creating a Pull Request ‼️

1. Review the content for technical accuracy.
2. Copy-edit the changes/comments for grammar, spelling, and adherence to the general style of existing gget code.
3. Format your code using [black](https://black.readthedocs.io/en/stable/getting_started.html).
4. Make sure the unit tests pass:
- Developer dependencies can be installed with `pip install -r dev-requirements.txt`
- Run existing unit tests from the gget repository root with `coverage run -m pytest -ra -v tests && coverage report --omit=main.py,tests*`
5. Add new unit tests if applicable:
- Arguments and expected results are stored in json files in ./tests/fixtures/
- Unit tests can be added to ./tests/test_*.py and will be automatically detected
6. Make sure the edits are compatible with both the Python and the command line interface
- The command line interface and arguments are defined in ./gget/main.py
7. Add new modules/arguments to the documentation if applicable:
- The manual for each module can be edited/added as ./docs/src/*.md
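
As a rough illustration of step 5, a fixture-driven unit test might look like the sketch below. The fixture layout (`args` and `expected_result` keys), the file name `test_example.json`, and the function `to_upper` are assumptions made for this example; check the existing files in ./tests/fixtures/ and ./tests/ for the conventions gget actually uses.

```python
import json
import tempfile
import unittest
from pathlib import Path


def to_upper(symbol):
    """Placeholder for the function under test (hypothetical)."""
    return symbol.upper()


# Hypothetical fixture layout: test name -> arguments and expected result.
fixture = {"test_upper": {"args": {"symbol": "ace2"}, "expected_result": "ACE2"}}

# Written to a temporary directory here; in the repository, the file
# would live in ./tests/fixtures/.
fixture_path = Path(tempfile.mkdtemp()) / "test_example.json"
fixture_path.write_text(json.dumps(fixture))

with open(fixture_path) as f:
    cases = json.load(f)


class TestExample(unittest.TestCase):
    def test_upper(self):
        case = cases["test_upper"]
        self.assertEqual(to_upper(**case["args"]), case["expected_result"])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping arguments and expected results in JSON means new cases can be added without touching the test code.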

If you have any questions, feel free to start a [discussion](https://github.com/pachterlab/gget/discussions) or create an issue as described above.

### Pull Request

When you're finished with the changes, [create a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request), also known as a PR.

‼️ Please make all PRs against the `dev` branch of the gget repository.

- Don't forget to [link PR to issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue) if you are solving one.
- Enable the checkbox to [allow maintainer edits](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork) so the branch can be updated for a merge.
- If you run into any merge issues, check out this [git tutorial](https://github.com/skills/resolve-merge-conflicts) to help you resolve merge conflicts and other issues.

Once you submit your PR, a gget team member will review your proposal. We may ask questions or request additional information.

### Your PR is merged!

Congratulations! 🎉 The gget team thanks you. ✨

Once your PR is merged, your contributions will be publicly visible on the [gget repo](https://github.com/pachterlab/gget).
2 changes: 1 addition & 1 deletion gget/main.py
@@ -1213,7 +1213,7 @@ def main():
],
help=(
"""
- List of metadata columns to return (stored in .obs when anndata=True).
+ List of metadata columns to return (stored in .obs).
Default: ["dataset_id", "assay", "suspension_type", "sex", "tissue_general", "tissue", "cell_type"]
For more options see: https://api.cellxgene.cziscience.com/curation/ui/#/ -> Schemas -> dataset
"""