# MuData nuances

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/scverse/mudata/blob/master/docs/source/notebooks/nuances.ipynb)

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/scverse/mudata/master?labpath=docs%2Fsource%2Fnotebooks%2Fnuances.ipynb)

This is the *sharp bits* page for ``mudata``: it covers the nuances of working with ``MuData`` objects.

First, install and import `mudata` and other libraries.

In [1]:
%pip install mudata




In [2]:
import mudata
from mudata import MuData, AnnData

In [3]:
mudata.set_options(pull_on_update=False)

<mudata._core.config.set_options at 0x148b0fa90>

In [4]:
import numpy as np
import pandas as pd

Prepare some simple [AnnData](https://anndata.readthedocs.io/) objects:

In [5]:
n, d1, d2, k = 1000, 100, 200, 10

np.random.seed(1)
z = np.random.normal(loc=np.arange(k), scale=np.arange(k)*2, size=(n,k))
w1 = np.random.normal(size=(d1,k))
w2 = np.random.normal(size=(d2,k))

mod1 = AnnData(X=np.dot(z, w1.T))
mod2 = AnnData(X=np.dot(z, w2.T))

## Variable names

> ***NB:** It is best to keep variable names unique across all the modalities. This helps to avoid ambiguity and improves the performance of some functionality such as updating (see below).*

``MuData`` is designed on the assumption that features (variables) differ between modalities. Hence their names should be unique across modalities, and ``.var_names`` are checked for uniqueness across modalities.

This behaviour ensures all the functions are easy to reason about. For instance, if there is a ``var_name`` that is present in both modalities, what happens during plotting a joint embedding from ``.obsm`` coloured by this ``var_name`` is not strictly defined.

Nevertheless, ``MuData`` can accommodate modalities with duplicated ``.var_names``. For the typical workflows, we recommend renaming them manually or calling ``.var_names_make_unique()``.

In [6]:
mdata = MuData({"mod1": mod1, "mod2": mod2})
print(mdata.var_names)
mdata.var_names_make_unique()
print(mdata.var_names)

Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
       ...
       '190', '191', '192', '193', '194', '195', '196', '197', '198', '199'],
      dtype='object', length=300)
Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',
       'mod1:7', 'mod1:8', 'mod1:9',
       ...
       'mod2:190', 'mod2:191', 'mod2:192', 'mod2:193', 'mod2:194', 'mod2:195',
       'mod2:196', 'mod2:197', 'mod2:198', 'mod2:199'],
      dtype='object', length=300)
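
The effect of prefixing can be illustrated with plain `pandas` indices. This is only a minimal sketch of the idea, not the `mudata` implementation: the feature names and the `var_names` dictionary are made up for illustration.

```python
import pandas as pd

# Hypothetical feature names for two modalities; both reuse "0", "1", "2"
var_names = {
    "mod1": pd.Index(["0", "1", "2"]),
    "mod2": pd.Index(["0", "1", "2", "3"]),
}

# Names that occur in both modalities
shared = var_names["mod1"].intersection(var_names["mod2"])
print(list(shared))

# A joint index built by concatenation is not unique
joint = var_names["mod1"].append(var_names["mod2"])
print(joint.is_unique)

# Prefixing every name with its modality resolves the clash
prefixed = {
    mod: pd.Index([f"{mod}:{name}" for name in names])
    for mod, names in var_names.items()
}
joint_prefixed = prefixed["mod1"].append(prefixed["mod2"])
print(joint_prefixed.is_unique)
```

As in the output above, every name receives a prefix in this sketch, including names such as `"3"` that occur in only one modality.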




### Variable names in AnnData objects

In the example above it is worth pointing out that `.var_names_make_unique()` is an in-place operation, just as [the same method](https://anndata.readthedocs.io/en/stable/anndata.AnnData.var_names_make_unique.html) is in `anndata`.

Hence original AnnData objects' `.var_names` have also been modified:

In [7]:
mdata["mod1"].var_names[:10]

Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',
       'mod1:7', 'mod1:8', 'mod1:9'],
      dtype='object')
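
The reason is ordinary Python reference semantics: a container that stores an object keeps a reference to it rather than a copy. The following sketch shows this with a plain dictionary of `pandas` tables; the names `mod1_var`, `container`, `original` and `detached` are hypothetical and stand in for a modality and its parent object.

```python
import pandas as pd

# Hypothetical stand-in for a modality: a table indexed by feature names
mod1_var = pd.DataFrame(index=pd.Index(["0", "1", "2"]))

# The container stores a reference to the same object, not a copy
container = {"mod1": mod1_var}

# Rename the features through the container
container["mod1"].index = [f"mod1:{name}" for name in container["mod1"].index]

print(container["mod1"] is mod1_var)
print(list(mod1_var.index))

# Storing a copy instead leaves the original names untouched
original = pd.DataFrame(index=pd.Index(["0", "1", "2"]))
detached = {"mod1": original.copy()}
detached["mod1"].index = [f"mod1:{name}" for name in detached["mod1"].index]
print(list(original.index))
```

If the original names have to be preserved, one option is to pass copies of the objects when building the container.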

## Update

> ***NB:** If individual modalities are changed, updating the MuData object containing them might be required.*

Modalities in ``MuData`` objects are full-featured ``AnnData`` objects. Hence they can be operated on individually, and their ``MuData`` parent will have to be updated to fetch this information.

>**NB:** Starting from `v0.3`, `mudata` will be adopting a more flexible approach to metadata management: updating global index with `.update()` will become independent from managing columns, which can now be done with `.pull_obs()`/`.pull_var()` and `.push_obs()`/`.push_var()`.

See more about annotations management in [the respective tutorial](./annotations_management.ipynb).

### Filtering data

In some cases some observations (or variables) can be dropped from all the contained modalities:

In [8]:
mdata.obs["dummy_index"] = range(len(mdata))

smaller_mdata = mdata.copy()

smaller_mdata.mod['mod1'] = mod1[:900]
smaller_mdata.mod['mod2'] = mod2[:900]

While `smaller_mdata` now includes modalities with fewer observations, it currently does not know about this change:

In [9]:
smaller_mdata

In [10]:
print(max(smaller_mdata.obs['dummy_index']))

999


The `.update()` method will fetch these updates:

In [11]:
smaller_mdata.update()
smaller_mdata

In [12]:
print(max(smaller_mdata.obs['dummy_index']))

899


Notice the global dimensions are now correctly reflected in the `MuData` object.
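
Conceptually, the update recomputes the global observation index from the modalities and realigns the global table to it. The sketch below mimics that with `pandas` only and is an assumption about the general idea, not the actual `mudata` logic, which handles more cases. The cell names and the `mod_obs_names` dictionary are hypothetical.

```python
import pandas as pd

# Hypothetical global table with 10 observations
names = [f"cell{i}" for i in range(10)]
global_obs = pd.DataFrame({"dummy_index": range(10)}, index=names)

# Both modalities have been filtered down to the first 9 observations
mod_obs_names = {
    "mod1": pd.Index(names[:9]),
    "mod2": pd.Index(names[:9]),
}

# Before the update, the global table is stale
print(len(global_obs), global_obs["dummy_index"].max())

# Rebuild the global index from the modalities and realign the table
new_index = mod_obs_names["mod1"].union(mod_obs_names["mod2"], sort=False)
updated_obs = global_obs.reindex(new_index)

print(len(updated_obs), updated_obs["dummy_index"].max())
```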

### Observations annotations

Consider the following example: a new column has been added to a modality-specific metadata table:

In [13]:
mod1.obs["mod1_profiled"] = True

While `mdata` includes `mod1` as its first modality, nothing has changed at the global level of the annotation:

In [14]:
mdata.obs.columns

Index(['dummy_index'], dtype='object')

The `.update()` method will only sync the `obs_names`:

In [15]:
# default from v0.4
mudata.set_options(pull_on_update=False)

<mudata._core.config.set_options at 0x1491e95d0>

In [16]:
mdata.update()
print(mdata.obs.columns)

Index(['dummy_index'], dtype='object')


If we need the annotation at the global level, we can copy it from all the underlying modalities:

In [17]:
mdata.pull_obs()
print(mdata.obs.columns)

Index(['dummy_index', 'mod1:mod1_profiled'], dtype='object')


In [18]:
del mdata.obs["mod1:mod1_profiled"]

As `MuData` objects are designed with shared observations by default, this annotation is automatically prefixed with the name of the modality it originated from.

There is, however, some flexibility in how prefixes are used for observation annotations that are specific to individual modalities:

In [19]:
mdata.pull_obs(prefix_unique=False)
print(mdata.obs.columns)

Index(['dummy_index', 'mod1_profiled'], dtype='object')
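
The two naming schemes can be mimicked with a `pandas` join. This is a minimal sketch under the assumption that all tables share the same observation names; `pull_columns` is a hypothetical helper written for illustration and is not part of `mudata`.

```python
import pandas as pd

obs_names = ["cell0", "cell1", "cell2"]
global_obs = pd.DataFrame({"dummy_index": [0, 1, 2]}, index=obs_names)

# Hypothetical per-modality tables; only mod1 carries an annotation
mod_obs = {
    "mod1": pd.DataFrame({"mod1_profiled": True}, index=obs_names),
    "mod2": pd.DataFrame(index=obs_names),
}

def pull_columns(global_df, per_mod, prefix=True):
    """Join modality columns onto a copy of the global table (illustration only)."""
    out = global_df.copy()
    for mod, df in per_mod.items():
        # Optionally mark each column with the modality it came from
        renamed = df.add_prefix(f"{mod}:") if prefix else df
        out = out.join(renamed)
    return out

with_prefix = pull_columns(global_obs, mod_obs, prefix=True)
without_prefix = pull_columns(global_obs, mod_obs, prefix=False)
print(list(with_prefix.columns))
print(list(without_prefix.columns))
```

Dropping the prefix is only safe in this sketch because the column name occurs in a single modality; with the same column in several modalities, the plain join would raise an error.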


### Variables

On the other hand, for variables, the default assumption is that they are unique to their modalities. This allows us to merge annotations across modalities, when possible.

In [20]:
mod1.var["assay"] = "A"
mod2.var["assay"] = "B"

# Will fetch these values
mdata.pull_var()

In [21]:
np.random.seed(10)
mdata.var.sample(5)

|          | assay |
|----------|-------|
| mod1:24  | A     |
| mod1:65  | A     |
| mod2:13  | B     |
| mod2:161 | B     |
| mod2:88  | B     |
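
Because each variable belongs to exactly one modality, a column with the same name in several modalities can be stacked into a single global column. The sketch below shows this idea with `pandas.concat`; the small tables are hypothetical and this is not how `mudata` implements `.pull_var()`.

```python
import pandas as pd

# Hypothetical per-modality variable tables with unique, prefixed names
mod1_var = pd.DataFrame({"assay": "A"}, index=["mod1:0", "mod1:1"])
mod2_var = pd.DataFrame({"assay": "B"}, index=["mod2:0", "mod2:1", "mod2:2"])

# Rows do not overlap, so the shared column merges into one
global_var = pd.concat([mod1_var, mod2_var])
print(global_var)
```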


See how e.g. ``muon`` operates with ``MuData`` objects and enables access to modality-specific slots beyond just metadata [in the tutorials](https://muon-tutorials.readthedocs.io/en/latest/).