Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring with MuData #26

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Refactoring with MuData #26

wants to merge 8 commits into from

Conversation

emdann
Copy link
Owner

@emdann emdann commented Aug 22, 2022

The original cell-level AnnData object and the sample x nhoods AnnData are stored in a common MuData object with .mod cells and samples. Samples are in mdata["samples"].obs and nhoods are in mdata["samples"].var.

Notes

  • The make_nhoods function is left as is, the MuData object gets created by count_nhoods. The rationale is that here samples are not taken into consideration yet.
  • This MuData doesn't follow any of the possible options for axis as defined in MuData documentation, since neither .obs nor .var are shared between cells and samples

To do

  • Update documentation
  • Update vignette
  • Update README
  • ?

Copy link

@Zethson Zethson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

A few general comments:

  1. You should consider adding all doc related files to the gitignore. They pollute pull requests. A CI job should build the docs and with for example RTD integration you can also preview the docs before merging. This does not matter for our pertpy plans, but wanted to write it nevertheless.
  2. A couple of tools of pertpy have a "load" function that prepares the input for the subsequent tool. I could see us having a load function that returns a MuData object (by creating or appending) that is ready for all downstream Milopy steps.
    ->
milopy = pt.tl.Milopy()
mdata = milopy.load(adata)  # adds compositional adata object and returns mdata object
# alternative
mdata = milopy.load(mdata_old, rna="rna")  # where rna refers to the existing RNA modality in the AnnData object
  1. I didn't add this comment everywhere, but we should ensure that the naming is consistent with unwritten muon/MuData usage rules -> "rna" and "compositional" ("samples" might be too unspecified). Likely should even name it "milo_compositional", because tascCODA might add its own "compositional"

== 1, "nhood_kth_distance"].values
# adata.uns["sample_adata"] = sample_adata
# Make MuData object
milo_mdata = MuData({"cells": adata, "samples": sample_adata})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens if I already use a MuData object as input?
The new sample_adata should in this case be added to the existing MuData object.

Also think that the naming is a bit off. The objects inside the MuData object are usually named according to the modality -> adata should be "rna" and sample_adata should be "compositional" or so.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I would parameterize this a bit more to allow for more flexibility of the inputs (and potentially the naming of the newly added modality)


import matplotlib.pyplot as plt
import seaborn as sns


def plot_nhood_graph(
adata: AnnData,
milo_mdata: MuData,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to ensure here that we're using the RNA + compositional modalities. Need good checks for this and sane defaults.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation only uses milo_mdata["compositional"] for plotting. Do you mean we need to use both RNA + compositional modalities? @Zethson

@Zethson Zethson mentioned this pull request Aug 29, 2022
11 tasks
@Zethson
Copy link

Zethson commented Sep 26, 2022

@emdann

@xinyuejohn started with the integration of milopy into pertpy here: theislab/pertpy#165
He could also tackle my feedback that I posted here if it's fine with you? Then we don't need to duplicate the work and John and me can figure this out directly.

What do you think?

@emdann
Copy link
Owner Author

emdann commented Sep 26, 2022

Hi @Zethson
Apologies for the late react on this and thanks a lot for the feedback. I am a bit slumped with some urgent work at the moment so I am more than happy for @xinyuejohn to take over and implement these changes during integration!

@xinyuejohn
Copy link

Great!

A few general comments:

  1. You should consider adding all doc related files to the gitignore. They pollute pull requests. A CI job should build the docs and with for example RTD integration you can also preview the docs before merging. This does not matter for our pertpy plans, but wanted to write it nevertheless.
  2. A couple of tools of pertpy have a "load" function that prepares the input for the subsequent tool. I could see us having a load function that returns a MuData object (by creating or appending) that is ready for all downstream Milopy steps.
    ->
milopy = pt.tl.Milopy()
mdata = milopy.load(adata)  # adds compositional adata object and returns mdata object
# alternative
mdata = milopy.load(mdata_old, rna="rna")  # where rna refers to the existing RNA modality in the AnnData object
  1. I didn't add this comment everywhere, but we should ensure that the naming is consistent with unwritten muon/MuData usage rules -> "rna" and "compositional" ("samples" might be too unspecified). Likely should even name it "milo_compositional", because tascCODA might add its own "compositional"

As for your comment 2, the current workflow works like this:

  1. Use milo.make_nhoods(adata, ...) to get neighbourhoods, which saved in adata itself
  2. Use milo_mdata = milo.count_nhoods(adata, sample_col="sample") to get a MuData object.

Therefore, if I need to add a mdata = milo.load() function, it should substitute the original milo.count_nhoods function. What do you think? Do you think it follows the best practice? @Zethson

@Zethson
Copy link

Zethson commented Sep 29, 2022

Therefore, if I need to add a mdata = milo.load() function, it should substitute the original milo.count_nhoods function. What do you think? Do you think it follows the best practice? @Zethson

What I had in mind is that the load function only generates the actual object that we will use for downstream tasks and not that it does any computation already. It basically only sets up the MuData object.

Does it make sense?

@xinyuejohn
Copy link

Therefore, if I need to add a mdata = milo.load() function, it should substitute the original milo.count_nhoods function. What do you think? Do you think it follows the best practice? @Zethson

What I had in mind is that the load function only generates the actual object that we will use for downstream tasks and not that it does any computation already. It basically only sets up the MuData object.

Does it make sense?

Makes sense to me and it makes the whole pipeline more clear.
In this way, the MuData object the load function generates only includes one 'rna' modality. And all the downstream functions take this mudata as the input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants