Add scArches #9

Closed
scottgigante opened this issue Jul 20, 2020 · 3 comments

@scottgigante
Collaborator

No description provided.

@scottgigante scottgigante added this to the CZI Webinar milestone Jul 20, 2020
@dburkhardt dburkhardt mentioned this issue Aug 31, 2020
9 tasks
@scottgigante scottgigante modified the milestones: CZI Webinar, Sept 7 2020 Aug 31, 2020
@scottgigante
Collaborator Author

@M0hammadL Malte suggested adding at least scArches, to fill this out to three methods.

@scottgigante scottgigante changed the title Add Label Projection SoTA methods Add scArches Sep 4, 2020
@M0hammadL
Collaborator

@scottgigante I don't think scArches will be possible here, since the method does not perform classification by itself. It aligns the query data to the reference, after which we can use a simple kNN classifier to carry labels from the reference to the query. Therefore I am not sure whether it can be considered a classification method or not!

@dburkhardt
Member

I think that having scArches + a kNN classifier would be a great baseline to have. Thumbing through the preprint, I think these results are compelling:

Building upon the query-reference embedding, we investigated the transfer of cell-type labels from the reference dataset. We approached this classification problem by first training a simple kNN classifier on the latent space representation of the reference TS. Then each cell in the query TM was annotated using its closest neighbors in the reference dataset. Additionally, our classification pipeline provides an uncertainty score for each cell while reporting cells with more than 50 % uncertainty as unknown (see Methods). Our model transferred the labels from the reference atlas to the query atlas with ≈ 89% accuracy for all the tissues except tracheal cells (Figure 3d). Moreover, all misclassified cells and cells from the out-of-distribution tissue received high uncertainty scores (Figure 3e-f). Overall, the classification results across tissues indicated a robust prediction accuracy across most tissues (Figure 3g) while highlighting which cells were not mappable to the reference. The robust performance of a simple KNN classifier on the integrated latent space demonstrates that scArches can successfully merge large and complex query datasets into reference atlases.

I understand you would typically include some manual fine-tuning, but I would love to see these results added to Open Problems.
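A minimal sketch of the kNN label-transfer step described in the quoted passage. It assumes the reference and query cells have already been embedded into a shared latent space (for example by scArches) and are given as plain coordinate tuples. The function names `knn_label_transfer` and `accuracy`, the Euclidean metric, the default `k=5`, and the uncertainty definition (1 minus the fraction of the k neighbours voting for the majority label) are illustrative assumptions, not the scArches implementation or the definition in its Methods. Cells whose uncertainty exceeds 0.5 are reported as `Unknown`, mirroring the preprint's 50% cut-off.

```python
import math
from collections import Counter


def knn_label_transfer(ref_latent, ref_labels, query_latent, k=5, threshold=0.5):
    """Label each query cell by majority vote of its k nearest reference cells.

    Hypothetical helper. Uncertainty is assumed to be 1 - (fraction of the k
    neighbours agreeing with the majority label); the scArches preprint may
    define it differently.
    """
    if not 1 <= k <= len(ref_latent):
        raise ValueError("k must be between 1 and the number of reference cells")
    predictions, uncertainties = [], []
    for q in query_latent:
        # Brute-force Euclidean distance from this query cell to every reference cell
        dists = [math.dist(q, r) for r in ref_latent]
        nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        label, count = votes.most_common(1)[0]
        uncertainty = 1.0 - count / k
        # Cells above the uncertainty threshold are reported as unknown
        predictions.append("Unknown" if uncertainty > threshold else label)
        uncertainties.append(uncertainty)
    return predictions, uncertainties


def accuracy(predicted, truth):
    """Fraction of correct labels; here an "Unknown" call counts as incorrect (assumption)."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("predicted and truth must be non-empty and the same length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

For example, with `ref = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]`, labels `["T cell"] * 3 + ["B cell"] * 3` and queries `[(0.5, 0.5), (10.5, 10.5)]`, calling `knn_label_transfer(ref, labels, queries, k=3)` returns `(["T cell", "B cell"], [0.0, 0.0])`. The brute-force distance scan keeps the sketch dependency-free; for real atlases a nearest-neighbour index from a library would be the practical choice.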

@dburkhardt dburkhardt modified the milestones: Sept 7 2020, October 13 - Community Call Sep 21, 2020
lazappi added a commit to michalk8/SingleCellOpenProblems that referenced this issue May 4, 2021
scottgigante-immunai referenced this issue in scottgigante-immunai/openproblems Jun 14, 2022
* tangram first

* tangram first

* tangram first

* tangram first

* flake8 + isort _destvi_utils

* tangram update; pancreas add string index

* tangram update; pancreas add string index

* tangram update; pancreas add string index; n_obs = 1000 in synth data

* tangram update; pancreas add string index; n_obs = 1000 in synth data

* new synth

* add tangram-sc to docker

* new synth approach

* new synth approach

* new synth approach

* new synth approach

* new synth approach

* new synth approach

* new synth approach

* new synth approach

* pancreas subset integer; comment pancreas dataset [skip actions]

* pancreas subset integer; comment pancreas dataset [skip actions]

* pancreas subset integer; comment pancreas dataset [skip actions]

* pancreas subset integer; comment pancreas dataset [skip actions]

* pancreas subset integer; comment pancreas dataset [skip actions]

* merge and split sc and st data

* merged anndata in methods

* merged anndata in methods

* fix destvi

* add code reference

* shorten

* Update openproblems/tasks/spatial_decomposition/_utils.py

Co-authored-by: Giovanni Palla <giov.pll@gmail.com>

* Update openproblems/tasks/spatial_decomposition/datasets/_sc_to_sp_utils.py [skip actions]

Co-authored-by: Giovanni Palla <giov.pll@gmail.com>

* Update openproblems/tasks/spatial_decomposition/datasets/_sc_to_sp_utils.py [skip actions]

Co-authored-by: Giovanni Palla <giov.pll@gmail.com>

* Update openproblems/tasks/spatial_decomposition/datasets/_sc_to_sp_utils.py  [skip actions]

Co-authored-by: Giovanni Palla <giov.pll@gmail.com>

* comment fix

* comment fix

* fix pancreas dataset

* update readme

* fix destvi genertaion

* fix sparse

* minor fix

* drop csr_matrix; fix double merge of anndata; update seurat v3

* updates

* fix test for sparse arrays

* test=False

* add geos to r-extras

* geos before r install

* add software-properties-common

* add python-software-properties

* add RUN before command

* rm geos from r-base

* fix merging of anndata by pinning higher version

* revert back anndata

* fix obs_names and pin anndata

* try to add swap

* reduce number of spatial spots

* remove swap

* reduce obs

* remove swap

* remove step in CI

* decrease dataset size

* remove sparse

* remove copy

* remove datasets

* remove datasets from init

* address scott comments

* skip all pancreas

* fix import

* remove destiv

* Merge `main` into `synthetic-data-generation` (#10)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: MalteDLuecken <m.d.luecken@gmail.com>
Co-authored-by: Scott Gigante <84813314+scottgigante-immunai@users.noreply.github.com>
Co-authored-by: SingleCellOpenProblems <singlecellopenproblems@protonmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Strobl <50872326+danielStrobl@users.noreply.github.com>

* change test

* fix from_cache

* pre-commit

* update data generation to remove inf

* change test

* check task

* resolve suggestions from scott

Co-authored-by: almaan <almaan@kth.se>
Co-authored-by: Giovanni Palla <giov.pll@gmail.com>
Co-authored-by: Scott Gigante <84813314+scottgigante-immunai@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: MalteDLuecken <m.d.luecken@gmail.com>
Co-authored-by: SingleCellOpenProblems <singlecellopenproblems@protonmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Strobl <50872326+danielStrobl@users.noreply.github.com>
scottgigante-immunai added a commit that referenced this issue Jul 21, 2022
* init spatial

* spatial decomposition

* init cleanup

* run precommit

* README update

* updated readme

* api update

* pre-commit

* readme update

* api stylefix

* pre-commit

* metrics update

* linting

* pre-commit

* black

* linting fix

* pre-commit

* cleanup

* cleanup

* linting

* linting

* task name change

* Rctd (#6)

Co-authored-by: almaan <almaan@kth.se>

* add stereoscope - nnls - nusvr - vanillanmf - nmfreg (#4)

Co-authored-by: Hirak Sarkar <hiraksarkar.cs@gmail.com>

* Seurat (#8)

Co-authored-by: almaan <almaan@kth.se>

* adding simulation (#7)

Co-authored-by: giovp <giov.pll@gmail.com>
Co-authored-by: almaan <almaan@kth.se>

* reorder requirements

* Update mse.py

* update R2 description

* review comments, populated __init__.py files for import

* update import statements

* fix label dataset

* Specify image

* fix random

* fix labels

* pre-commit

* add test=False

* add synth data from destVI

* remove logger

* return spatial reference in correct format

* specify cell type label

* add destVI simulation to datasets

* fix random

* fix nusvr

* fix stereoscope

* added destvi

* added destvi

* try fix data generation

* fix from previous delete

* add scvitools version

* Synthetic data generation (#9)

* tangram first

* tangram first

* tangram first

* tangram first

* flake8 + isort _destvi_utils

* tangram update; pancreas add string index

* tangram update; pancreas add string index

* tangram update; pancreas add string index; n_obs = 1000 in synth data

* tangram update; pancreas add string index; n_obs = 1000 in synth data

* new synth

* add tangram-sc to docker

* new synth approach

* new synth approach

* new synth approach

* new synth approach

* new synth approach

* new synth approach

* new synth approach

* new synth approach

* pancreas subset integer; comment pancreas dataset [skip actions]

* pancreas subset integer; comment pancreas dataset [skip actions]

* pancreas subset integer; comment pancreas dataset [skip actions]

* pancreas subset integer; comment pancreas dataset [skip actions]

* pancreas subset integer; comment pancreas dataset [skip actions]

* merge and split sc and st data

* merged anndata in methods

* merged anndata in methods

* fix destvi

* add code reference

* shorten

* Update openproblems/tasks/spatial_decomposition/_utils.py

Co-authored-by: Giovanni Palla <giov.pll@gmail.com>

* Update openproblems/tasks/spatial_decomposition/datasets/_sc_to_sp_utils.py [skip actions]

Co-authored-by: Giovanni Palla <giov.pll@gmail.com>

* Update openproblems/tasks/spatial_decomposition/datasets/_sc_to_sp_utils.py [skip actions]

Co-authored-by: Giovanni Palla <giov.pll@gmail.com>

* Update openproblems/tasks/spatial_decomposition/datasets/_sc_to_sp_utils.py  [skip actions]

Co-authored-by: Giovanni Palla <giov.pll@gmail.com>

* comment fix

* comment fix

* fix pancreas dataset

* update readme

* fix destvi genertaion

* fix sparse

* minor fix

* drop csr_matrix; fix double merge of anndata; update seurat v3

* updates

* fix test for sparse arrays

* test=False

* add geos to r-extras

* geos before r install

* add software-properties-common

* add python-software-properties

* add RUN before command

* rm geos from r-base

* fix merging of anndata by pinning higher version

* revert back anndata

* fix obs_names and pin anndata

* try to add swap

* reduce number of spatial spots

* remove swap

* reduce obs

* remove swap

* remove step in CI

* decrease dataset size

* remove sparse

* remove copy

* remove datasets

* remove datasets from init

* address scott comments

* skip all pancreas

* fix import

* remove destiv

* Merge `main` into `synthetic-data-generation` (#10)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: MalteDLuecken <m.d.luecken@gmail.com>
Co-authored-by: Scott Gigante <84813314+scottgigante-immunai@users.noreply.github.com>
Co-authored-by: SingleCellOpenProblems <singlecellopenproblems@protonmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Strobl <50872326+danielStrobl@users.noreply.github.com>

* change test

* fix from_cache

* pre-commit

* update data generation to remove inf

* change test

* check task

* resolve suggestions from scott

Co-authored-by: almaan <almaan@kth.se>
Co-authored-by: Giovanni Palla <giov.pll@gmail.com>
Co-authored-by: Scott Gigante <84813314+scottgigante-immunai@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: MalteDLuecken <m.d.luecken@gmail.com>
Co-authored-by: SingleCellOpenProblems <singlecellopenproblems@protonmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Strobl <50872326+danielStrobl@users.noreply.github.com>

* Remove duplicate line

* Fill in baseline decorator

* Check R version of seurat

* pre-commit

* Remove reference to __from_cache__

* Clean up proportions assert

* pre-commit

* set merge='unique' to retain uns

* fixes from scott comments

* Fix code_version for new API

* Set `max_epochs` on `test`

* pre-commit

* Temporarily remove destvi

* Add dataset metadata fields

* Add task summary

* Temporarily remove steroscope

* pre-commit

* Fix typo

* Copy `uns`

* fix uns_merge to include _from_cache

* convert NaNs in categorical dtypes

* convert string dtypes

* bump tangram-sc

* fix string dtypes

* convert strings to categoricals inside pancreas

* change api label to str

* obsm cannot be pd.DataFrame

* revert anndata change

* fix rctd

* fix R2

* fix lots of things

* fix nmfreg and sample method

* address scott comments

* add metadata attribute decorator

* Update r_requirements.txt

* Handle comments in `r_requirements.txt`

* Rename spacexr

* Move API below metrics

* Rename NNLS

* Fix RCTD code URL

* Set n_pcs in RCTD python call

* Revert 2077c35

* Set n_pca in seuratv3.py

* use `n_pcs` in seuratv3.R

* Split string rather than skipping QA

* Shorten line lengths

* shorten line lengths

* Clean up comment

* Delete pbmc3k_raw.h5ad

* Rename R2.py to r2.py

* Fix reference to r2.py

* pre-commit

* Rename sc_to_sp.py to pancreas.py

* Rename _sc_and_sp_utils.py to utils.py

* rename _utils.py to utils.py

* pre-commit

* import all pancreas datasets

* fix typo

* fix namespace clash

* need to pass test arg

* fix method name (0_1 -> 0_5)

* check tower auth explicitly

* filter genes and cells

* filter_genes_cells is in-place

* remaining todos from scott

* add destvi dataset

* delete scvi models and dataset

* fix shell string

* one more syntactic fix

* Add tangram to readme

* Specify cell types in description

* pre-commit

* Better dataset descriptors

* Clean up

* Split don't skip

* handle random_state

* Fix doi URL

* Move import inside

* Shorten line lengths

* Remove commented imports

* Shorten descriptors

* Fix seuratv3 URL

* Remove unused projection_type arg

* Remove unused toarray

* Remove unused toarray

* Update vanillanmf.py

* Remove unused DataFrame handler

* Remove unused categorical handler

* Remove unused pandas import

* update nmfreg

* fix nmfreg

* fix vanilla

* fix nmf

* fix alpha

* rctd

* pre-commit

* add dataset_reference

* shorten line lengths

* document PYTEST_MAX_RETRIES

* Allow 429 too many requests

Co-authored-by: almaan <almaan@kth.se>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Alma Andersson <kangarooblood@gmail.com>
Co-authored-by: Hirak Sarkar <hiraksarkar.cs@gmail.com>
Co-authored-by: Daniel Burkhardt <burkhardt.d.b@gmail.com>
Co-authored-by: Scott Gigante <84813314+scottgigante-immunai@users.noreply.github.com>
Co-authored-by: MalteDLuecken <m.d.luecken@gmail.com>
Co-authored-by: SingleCellOpenProblems <singlecellopenproblems@protonmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Strobl <50872326+danielStrobl@users.noreply.github.com>
Co-authored-by: Scott Gigante <scott.gigante@immunai.com>
rcannood pushed a commit that referenced this issue Sep 4, 2024
…hods-and-metrics

Feat/label projection methods and metrics

Former-commit-id: 44d0805

4 participants