Skip to content

Batch integration data#355

Merged
danielStrobl merged 59 commits intoopenproblems-bio:mainfrom
danielStrobl:batch-integration-data
May 11, 2022
Merged

Batch integration data#355
danielStrobl merged 59 commits intoopenproblems-bio:mainfrom
danielStrobl:batch-integration-data

Conversation

@danielStrobl
Copy link
Copy Markdown
Collaborator

@danielStrobl danielStrobl commented Apr 13, 2022

Add immune cell dataset from Luecken et al., 2022 for batch integration task.

Submission type

  • This submission adds a new dataset
  • This submission adds a new method
  • This submission adds a new metric
  • This submission adds a new task
  • This submission adds a new Docker image
  • This submission fixes a bug (link to related issue: )
  • This submission adds a new feature not listed above

Testing

  • This submission was written on a forked copy of SingleCellOpenProblems
  • GitHub Actions "Run Benchmark" tests are passing on this base branch of this pull request (include link to passed test: https://github.com/danielStrobl/SingleCellOpenProblems/runs/6011178735?check_suite_focus=true )
  • If this pull request is not ready for review (including passing the "Run Benchmark" tests), I will open this PR as a draft (click on the down arrow next to the "Create Pull Request" button) still waiting on run benchmark

Submission guidelines

  • This submission follows the guidelines in our Contributing document
  • I have checked to ensure there aren't other open Pull Requests for the same update/change

PR review checklist

This PR will be evaluated on the basis of the following checks:

  • The task addresses a valid open problem in single-cell analysis
  • The latest version of master is merged and tested
  • The methods/metrics are imported to __init__.py and were tested in the pipeline
  • Method and metric decorators are annotated with paper title, year, author, code version, and date
  • The README gives an outline of the methods, metrics and datasets in the folder
  • The README provides a satisfactory task explanation (for new tasks)
  • The sample test data is appropriate to test implementation of all methods and metrics (for new tasks)

Copy link
Copy Markdown
Collaborator

@LuckyMD LuckyMD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments on the order of filtering in the immune data. Also, shouldn't this PR also include the task dataset functions from the batch integration graph task? In those, you should assign adata.X = adata.layers['log_scran'] again.

Comment thread .trunk/out Outdated
Comment thread openproblems/data/immune_cells.py Outdated
Comment thread openproblems/data/immune_cells.py
Comment thread openproblems/data/immune_cells.py Outdated
Comment thread openproblems/data/pancreas.py
@LuckyMD
Copy link
Copy Markdown
Collaborator

LuckyMD commented Apr 13, 2022

I guess the PR should also be tested...

@danielStrobl danielStrobl marked this pull request as ready for review April 13, 2022 15:37
@danielStrobl danielStrobl force-pushed the batch-integration-data branch from 2e569eb to 099aacf Compare April 20, 2022 10:05
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 20, 2022

Codecov Report

Merging #355 (ea1adef) into main (84c8287) will increase coverage by 0.37%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #355      +/-   ##
==========================================
+ Coverage   91.48%   91.86%   +0.37%     
==========================================
  Files          83       89       +6     
  Lines        1973     2064      +91     
  Branches      117      118       +1     
==========================================
+ Hits         1805     1896      +91     
  Misses        125      125              
  Partials       43       43              
Flag Coverage Δ
unittests 91.86% <100.00%> (+0.37%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
openproblems/data/immune_cells.py 100.00% <100.00%> (ø)
openproblems/data/pancreas.py 100.00% <100.00%> (ø)
.../_batch_integration/batch_integration_graph/api.py 100.00% <100.00%> (ø)
...gration/batch_integration_graph/datasets/immune.py 100.00% <100.00%> (ø)
...ation/batch_integration_graph/datasets/pancreas.py 100.00% <100.00%> (ø)
...tegration/batch_integration_graph/methods/bbknn.py 100.00% <100.00%> (ø)
...integration/batch_integration_graph/metrics/ari.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 84c8287...ea1adef. Read the comment docs.

@scottgigante-immunai
Copy link
Copy Markdown
Collaborator

So this looks like it's all succeeded? The tower.nf logs strangely show the nextflow pipeline is still running, despite everything being complete.

@danielStrobl
Copy link
Copy Markdown
Collaborator Author

That's weird, run_benchmark is still shown as running on my branch, and has been so for 26 hours... not sure what's up with that

@scottgigante-immunai
Copy link
Copy Markdown
Collaborator

I'm looking into it, I think it's a nextflow bug (that I might have just fixed.)

@danielStrobl
Copy link
Copy Markdown
Collaborator Author

The issue seems to persist, still running for 5 hours

@danielStrobl
Copy link
Copy Markdown
Collaborator Author

Screenshot 2022-05-05 at 16 32 47

@scottgigante-immunai
Copy link
Copy Markdown
Collaborator

scottgigante-immunai commented May 5, 2022 via email

@danielStrobl danielStrobl requested a review from LuckyMD May 10, 2022 14:59
Copy link
Copy Markdown
Collaborator

@LuckyMD LuckyMD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few questions about the documentation and the testing that matches it, otherwise this is ready to go :). Nice job!


The `openproblems-python-batch-integration` docker container is used for the methods that
can be installed without package conflicts. For R methods, the `openproblems-r-extras`
container is used.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add other docker containers here, or make a note to add them in the other, larger PR.

* `full_scaled`
* `hvg_scaled`

An example script can be found [here](methods/_example.py)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this example script meant to be?

def check_dataset(adata):
"""Check that dataset output fits expected API."""

assert "X_uni" in adata.obsm
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is old code and should now be a test for adata.layers['counts']?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, this is not stated in the README and should be added... it seems to be used as a representation below for the sample_method function.


def check_method(adata):
"""Check that method output fits expected API."""
assert "connectivities" in adata.obsp
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either check for "distances" as well, or remove this from the API description in the README.

* `adata.layers['counts']` with raw, integer UMI count data, and
* `adata.obsp['uni_connectivities']` with an unintegrated connectivity matrix generated
by `scanpy.pp.neighbors()`
* `adata.X` with log-normalized data
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add adata.obsm['X_uni'] as the PCA embedding of the unintegrated representation here.

@danielStrobl danielStrobl merged commit afafab4 into openproblems-bio:main May 11, 2022
@scottgigante-immunai
Copy link
Copy Markdown
Collaborator

Woohoo!

@LuckyMD
Copy link
Copy Markdown
Collaborator

LuckyMD commented May 11, 2022

Nice :). And by the deadline :D

rcannood pushed a commit that referenced this pull request Sep 4, 2024
* Update with to accomodate multimodal data

* rename bash script
rcannood pushed a commit that referenced this pull request Sep 4, 2024
* Update with to accomodate multimodal data

* rename bash script

Former-commit-id: 3a811a7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants