Refactoring with MuData #26

emdann · 2022-08-22T10:48:31Z

The original cell-level AnnData object and the sample x nhoods AnnData are stored in a common MuData object with .mod cells and samples. Samples are in mdata["samples"].obs and nhoods are in mdata["samples"].var.

Notes

The make_nhoods function is left as is, the MuData object gets created by count_nhoods. The rationale is that here samples are not taken into consideration yet.
This MuData doesn't follow any of the possible options for axis as defined in MuData documentation, since neither .obs nor .var are shared between cells and samples

To do

Update documentation
Update vignette
Update README
?

Zethson

Great!

A few general comments:

You should consider adding all doc related files to the gitignore. They pollute pull requests. A CI job should build the docs and with for example RTD integration you can also preview the docs before merging. This does not matter for our pertpy plans, but wanted to write it nevertheless.
A couple of tools of pertpy have a "load" function that prepares the input for the subsequent tool. I could see us having a load function that returns a MuData object (by creating or appending) that is ready for all downstream Milopy steps.
->

milopy = pt.tl.Milopy()
mdata = milopy.load(adata)  # adds compositional adata object and returns mdata object
# alternative
mdata = milopy.load(mdata_old, rna="rna")  # where rna refers to the existing RNA modality in the AnnData object

I didn't add this comment everywhere, but we should ensure that the naming is consistent with unwritten muon/MuData usage rules -> "rna" and "compositional" ("samples" might be too unspecified). Likely should even name it "milo_compositional", because tascCODA might add its own "compositional"

Zethson · 2022-08-29T11:48:10Z

milopy/core.py

+                                                     == 1, "nhood_kth_distance"].values
+    # adata.uns["sample_adata"] = sample_adata
+    # Make MuData object
+    milo_mdata = MuData({"cells": adata, "samples": sample_adata})


So what happens if I already use a MuData object as input?
The new sample_adata should in this case be added to the existing MuData object.

Also think that the naming is a bit off. The objects inside the MuData object are usually named according to the modality -> adata should be "rna" and sample_adata should be "compositional" or so.

Generally, I would parameterize this a bit more to allow for more flexibility of the inputs (and potentially the naming of the newly added modality)

Zethson · 2022-08-29T11:57:41Z

milopy/plot.py


 import matplotlib.pyplot as plt
 import seaborn as sns


 def plot_nhood_graph(
-    adata: AnnData,
+    milo_mdata: MuData,


We need to ensure here that we're using the RNA + compositional modalities. Need good checks for this and sane defaults.

The current implementation only uses milo_mdata["compositional"] for plotting. Do you mean we need to use both RNA + compositional modalities? @Zethson

Zethson · 2022-09-26T08:03:24Z

@emdann

@xinyuejohn started with the integration of milopy into pertpy here: theislab/pertpy#165
He could also tackle my feedback that I posted here if it's fine with you? Then we don't need to duplicate the work and John and me can figure this out directly.

What do you think?

emdann · 2022-09-26T21:31:04Z

Hi @Zethson
Apologies for the late react on this and thanks a lot for the feedback. I am a bit slumped with some urgent work at the moment so I am more than happy for @xinyuejohn to take over and implement these changes during integration!

xinyuejohn · 2022-09-29T11:10:55Z

Great!

A few general comments:

You should consider adding all doc related files to the gitignore. They pollute pull requests. A CI job should build the docs and with for example RTD integration you can also preview the docs before merging. This does not matter for our pertpy plans, but wanted to write it nevertheless.

A couple of tools of pertpy have a "load" function that prepares the input for the subsequent tool. I could see us having a load function that returns a MuData object (by creating or appending) that is ready for all downstream Milopy steps.
->
milopy = pt.tl.Milopy()
mdata = milopy.load(adata)  # adds compositional adata object and returns mdata object
# alternative
mdata = milopy.load(mdata_old, rna="rna")  # where rna refers to the existing RNA modality in the AnnData object
I didn't add this comment everywhere, but we should ensure that the naming is consistent with unwritten muon/MuData usage rules -> "rna" and "compositional" ("samples" might be too unspecified). Likely should even name it "milo_compositional", because tascCODA might add its own "compositional"

As for your comment 2, the current workflow works like this:

Use milo.make_nhoods(adata, ...) to get neighbourhoods, which saved in adata itself
Use milo_mdata = milo.count_nhoods(adata, sample_col="sample") to get a MuData object.

Therefore, if I need to add a mdata = milo.load() function, it should substitute the original milo.count_nhoods function. What do you think? Do you think it follows the best practice? @Zethson

Zethson · 2022-09-29T11:18:00Z

Therefore, if I need to add a mdata = milo.load() function, it should substitute the original milo.count_nhoods function. What do you think? Do you think it follows the best practice? @Zethson

What I had in mind is that the load function only generates the actual object that we will use for downstream tasks and not that it does any computation already. It basically only sets up the MuData object.

Does it make sense?

xinyuejohn · 2022-09-29T11:28:17Z

Therefore, if I need to add a mdata = milo.load() function, it should substitute the original milo.count_nhoods function. What do you think? Do you think it follows the best practice? @Zethson

What I had in mind is that the load function only generates the actual object that we will use for downstream tasks and not that it does any computation already. It basically only sets up the MuData object.

Does it make sense?

Makes sense to me and it makes the whole pipeline more clear.
In this way, the MuData object the load function generates only includes one 'rna' modality. And all the downstream functions take this mudata as the input.

emdann added 8 commits August 22, 2022 11:36

MVP with mudata objects

74e8829

refactored plotting functions

565d375

removed tests for I/O

6b4d23b

add leiden to requirements

1ef75bd

updated quick example on README

e4241e1

updated documentation

19463c9

updated documentation for pl and utils

eecfeb7

build docs

9a8d231

Zethson reviewed Aug 29, 2022

View reviewed changes

Zethson mentioned this pull request Aug 29, 2022

Integrate Milopy theislab/pertpy#164

Closed

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactoring with MuData #26

Refactoring with MuData #26

emdann commented Aug 22, 2022 •

edited

Zethson left a comment

Zethson Aug 29, 2022

Zethson Aug 29, 2022

Zethson Aug 29, 2022

xinyuejohn Sep 29, 2022 •

edited

Zethson commented Sep 26, 2022

emdann commented Sep 26, 2022

xinyuejohn commented Sep 29, 2022

Zethson commented Sep 29, 2022

xinyuejohn commented Sep 29, 2022

Refactoring with MuData #26

Are you sure you want to change the base?

Refactoring with MuData #26

Conversation

emdann commented Aug 22, 2022 • edited

Zethson left a comment

Choose a reason for hiding this comment

Zethson Aug 29, 2022

Choose a reason for hiding this comment

Zethson Aug 29, 2022

Choose a reason for hiding this comment

Zethson Aug 29, 2022

Choose a reason for hiding this comment

xinyuejohn Sep 29, 2022 • edited

Choose a reason for hiding this comment

Zethson commented Sep 26, 2022

emdann commented Sep 26, 2022

xinyuejohn commented Sep 29, 2022

Zethson commented Sep 29, 2022

xinyuejohn commented Sep 29, 2022

emdann commented Aug 22, 2022 •

edited

xinyuejohn Sep 29, 2022 •

edited