# Generating a 4D-Communication Tensor from computed communication scores

After inferring communication scores for combinations of ligand-receptor and sender-receiver cell pairs, we can use that information to generate a 4D-Communication Tensor.

This tutorial shows how to load the LIANA output containing the communication scores of every sample/context, split it into one dataframe per sample/context, and use those dataframes to generate a 4D-Communication Tensor that can later be analyzed with Tensor-cell2cell.

**First, import the necessary libraries**

In [1]:
import cell2cell as c2c
import liana as li

import numpy as np
import pandas as pd

## Directories

In [2]:
data_folder = '../../data/liana-outputs/'

In [3]:
output_folder = '../../data/tc2c-outputs/'
c2c.io.directories.create_directory(output_folder)

../../data/tc2c-outputs/ already exists.


## Load Data

Open the dataframe containing LIANA results for every sample/context. These results contain the communication scores of the combinations of ligand-receptor pairs and sender-receiver pairs.

In [4]:
liana_df = pd.read_csv(data_folder + 'LIANA_by_sample.csv')
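
LIANA's `magnitude_rank` is a rank-based score in which lower values indicate stronger communication. To obtain a score in which higher values mean stronger communication, we compute `1 - magnitude_rank`.

In [5]:
liana_df['one_minus_magnitude_rank'] = 1. - liana_df['magnitude_rank']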

Once the dataframe is loaded, it can be grouped by sample and converted into a dictionary containing one dataframe per sample/context.

In [6]:
data = dict(list(liana_df.groupby('sample_new')))

In [7]:
data.keys()

dict_keys(['HC1', 'HC2', 'HC3', 'M1', 'M2', 'M3', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6'])
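To illustrate what `data` looks like, here is a small sketch on a made-up dataframe with the same column names used in this tutorial (`sample_new`, `source`, `target`, `ligand_complex`, `receptor_complex`, `magnitude_rank`). The toy values and the names `toy_liana` and `toy_data` are hypothetical; the dictionary comprehension is an equivalent way of writing `dict(list(df.groupby(...)))`.

```python
import pandas as pd

# Toy stand-in for LIANA_by_sample.csv; the values are made up for illustration
toy_liana = pd.DataFrame({
    'sample_new': ['HC1', 'HC1', 'S1'],
    'source': ['T', 'B', 'T'],
    'target': ['B', 'T', 'T'],
    'ligand_complex': ['CCL5', 'CD40LG', 'CCL5'],
    'receptor_complex': ['CCR5', 'CD40', 'CCR5'],
    'magnitude_rank': [0.25, 0.75, 0.5],
})
# Lower magnitude_rank means stronger communication, so invert it
toy_liana['one_minus_magnitude_rank'] = 1. - toy_liana['magnitude_rank']

# Equivalent to dict(list(toy_liana.groupby('sample_new'))): one dataframe per sample
toy_data = {sample: df for sample, df in toy_liana.groupby('sample_new')}

print(list(toy_data.keys()))  # ['HC1', 'S1']
print(len(toy_data['HC1']))   # 2
```

Each value of the dictionary keeps all the original columns, which is why the column names can be passed to the tensor-building function later on.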

## Create 4D-Communication Tensor

### Specify the order of the samples/contexts

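Here, we specify the order of the samples/contexts according to the condition they belong to (HC or *Control*, M or *Moderate COVID-19*, S or *Severe COVID-19*). Because each sample name starts with its condition prefix, sorting the names alphabetically groups them by condition.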

In [8]:
sorted_names = sorted(data.keys())

In [9]:
sorted_names

['HC1', 'HC2', 'HC3', 'M1', 'M2', 'M3', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6']
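Alphabetical sorting works here because no sample number exceeds 9. With names such as `'S10'`, plain `sorted()` would place `'S10'` before `'S2'`. A minimal sketch of a condition-aware sort key, assuming every name is a letter prefix followed by a number; `CONDITION_ORDER`, `context_sort_key`, and `example_names` are hypothetical names introduced for illustration:

```python
import re

# Condition prefixes of this dataset, in the desired order
CONDITION_ORDER = {'HC': 0, 'M': 1, 'S': 2}

def context_sort_key(name):
    """Sort key for names like 'S12': condition rank first, then the numeric suffix."""
    match = re.fullmatch(r'([A-Za-z]+)(\d+)', name)
    if match is None:
        raise ValueError(f'Unexpected sample name: {name}')
    prefix, number = match.group(1), int(match.group(2))
    return (CONDITION_ORDER[prefix], number)

example_names = ['S10', 'HC2', 'S2', 'M1', 'HC1']  # hypothetical; 'S10' is not in this dataset
print(sorted(example_names))                        # ['HC1', 'HC2', 'M1', 'S10', 'S2']
print(sorted(example_names, key=context_sort_key))  # ['HC1', 'HC2', 'M1', 'S2', 'S10']
```

For the 12 samples in this dataset, both approaches give the same order.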

### Generate tensor

To generate the tensor, we need the dictionary of dataframes and the names of the columns that contain, in every dataframe, the sender cells, receiver cells, ligands, receptors, and communication scores.

To decide which elements (cells and ligand-receptor pairs) to include, we can use the `how` parameter:

- `how='inner'`: include only the cells and LR pairs present in all samples/contexts.
- `how='outer'`: include cells and LR pairs even if they are not present in all samples/contexts.
- `how='outer_cells'`: include cells even if they are not present in all samples/contexts, but only the LR pairs present in all of them.
- `how='outer_lrs'`: include LR pairs even if they are not present in all samples/contexts, but only the cells present in all of them.

When cells and/or LR pairs present in only some of the samples/contexts are included, `outer_fraction` controls the minimum fraction of samples/contexts in which they must be present. For example, with 16 samples/contexts, keeping only the cells and/or LR pairs present in at least 8 of them requires `outer_fraction=8./16.`. If `outer_fraction=0.0`, all cells and/or LR pairs present across the dataframes are included. Here we use `outer_fraction=1/3.`, so with 12 samples/contexts an element must be present in at least 4 of them.
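The selection rule described above can be sketched in plain Python. This is a rough illustration, not cell2cell's implementation: the helper `elements_to_keep` and the toy dataframes are hypothetical, and the sketch assumes an element counts as present in a context if it appears in any of the given columns (e.g. `source` or `target` for cells).

```python
import pandas as pd

def elements_to_keep(context_df_dict, cols, how='outer', outer_fraction=0.0):
    """Hypothetical helper: elements found in `cols` that survive an inner/outer rule."""
    n_contexts = len(context_df_dict)
    n_present = {}  # element -> number of contexts where it appears
    for df in context_df_dict.values():
        present = set()
        for col in cols:
            present.update(df[col].dropna().unique())
        for element in present:
            n_present[element] = n_present.get(element, 0) + 1
    if how == 'inner':
        # Keep only elements present in every context
        return sorted(e for e, n in n_present.items() if n == n_contexts)
    # 'outer': keep elements present in at least outer_fraction of the contexts
    return sorted(e for e, n in n_present.items() if n / n_contexts >= outer_fraction)

toy_dfs = {
    'A': pd.DataFrame({'source': ['T', 'B'], 'target': ['B', 'T']}),
    'B': pd.DataFrame({'source': ['T'], 'target': ['NK']}),
    'C': pd.DataFrame({'source': ['T'], 'target': ['T']}),
}
cell_cols = ['source', 'target']
print(elements_to_keep(toy_dfs, cell_cols, how='inner'))                      # ['T']
print(elements_to_keep(toy_dfs, cell_cols, how='outer', outer_fraction=0.0))  # ['B', 'NK', 'T']
print(elements_to_keep(toy_dfs, cell_cols, how='outer', outer_fraction=0.5))  # ['T']
```

Under this sketch, `how='outer_cells'` would correspond to applying the outer rule to the cell columns and the inner rule to the LR pairs, and `how='outer_lrs'` to the reverse.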

In [10]:
# Use this once LIANA does not have the issue of duplicated rows

# li.multi.to_tensor_c2c(liana_res=liana_df,
#                        sample_key='sample_new', # Column name of the samples
#                        source_key='source', # Column name of the sender cells
#                        target_key='target', # Column name of the receiver cells
#                        ligand_key='ligand_complex', # Column name of the ligands
#                        receptor_key='receptor_complex', # Column name of the receptors
#                        score_key='magnitude_rank', # Column name of the communication scores
#                        non_expressed_fill=None, # Value to replace missing values with 
#                        how='outer', # What to include across all samples
#                        outer_fraction=0.5, # Fraction of samples as threshold to include cells and LR pairs.
#                        lr_sep='^', # How to separate ligand and receptor names to name LR pair
#                        context_order=sorted_names, # Order to store the contexts in the tensor
#                        sort_elements=True # Whether to sort the element names in each tensor dimension. Does not apply to the context order if context_order is passed.
#                       )

In [11]:
tensor = c2c.tensor.dataframes_to_tensor(context_df_dict=data,
                                         sender_col='source', # Column name of the sender cells
                                         receiver_col='target', # Column name of the receiver cells
                                         ligand_col='ligand_complex', # Column name of the ligands
                                         receptor_col='receptor_complex', # Column name of the receptors
                                         score_col='one_minus_magnitude_rank', # Column name of the communication scores
                                         how='outer', # What to include across all samples
                                         outer_fraction=1/3., # Fraction of samples as threshold to include cells and LR pairs.
                                         lr_sep='^', # How to separate ligand and receptor names to name LR pair
                                         context_order=sorted_names, # Order to store the contexts in the tensor
                                         sort_elements=True # Whether to sort the element names in each tensor dimension. Does not apply to the context order if context_order is passed.
                                        )

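To make the layout of the 4D tensor concrete, here is a minimal NumPy/pandas sketch of how long-format dataframes can be stacked into an array with dimensions (contexts, LR pairs, sender cells, receiver cells). It is not cell2cell's implementation: the function `toy_dataframes_to_tensor` and the toy data are hypothetical. It keeps every element (roughly `how='outer'` with `outer_fraction=0.0`), uses one shared cell list for senders and receivers, and lets a later duplicated row overwrite an earlier one.

```python
import numpy as np
import pandas as pd

def toy_dataframes_to_tensor(context_df_dict, context_order, score_col, lr_sep='^'):
    """Hypothetical sketch: stack long-format dataframes into a 4D array
    with dimensions (contexts, LR pairs, sender cells, receiver cells)."""
    all_df = pd.concat(list(context_df_dict.values()))
    lr_pairs = sorted(set(all_df['ligand_complex'] + lr_sep + all_df['receptor_complex']))
    # Simplification: the same (union) cell list is used for senders and receivers
    cells = sorted(set(all_df['source']) | set(all_df['target']))
    lr_idx = {lr: i for i, lr in enumerate(lr_pairs)}
    cell_idx = {c: i for i, c in enumerate(cells)}
    # Start with NaN everywhere: entries without a score remain missing
    tensor = np.full((len(context_order), len(lr_pairs), len(cells), len(cells)), np.nan)
    for k, context in enumerate(context_order):
        df = context_df_dict[context]
        for row in df.itertuples(index=False):
            lr = row.ligand_complex + lr_sep + row.receptor_complex
            tensor[k, lr_idx[lr], cell_idx[row.source], cell_idx[row.target]] = getattr(row, score_col)
    return tensor, lr_pairs, cells

toy_dfs = {
    'HC1': pd.DataFrame({'source': ['T', 'B'], 'target': ['B', 'T'],
                         'ligand_complex': ['CCL5', 'CD40LG'], 'receptor_complex': ['CCR5', 'CD40'],
                         'score': [0.75, 0.25]}),
    'S1': pd.DataFrame({'source': ['T'], 'target': ['T'],
                        'ligand_complex': ['CCL5'], 'receptor_complex': ['CCR5'],
                        'score': [0.5]}),
}
toy_tensor, toy_lrs, toy_cells = toy_dataframes_to_tensor(toy_dfs, ['HC1', 'S1'], 'score')
print(toy_tensor.shape)    # (2, 2, 2, 2)
print(toy_lrs, toy_cells)  # ['CCL5^CCR5', 'CD40LG^CD40'] ['B', 'T']
```

Only 3 of the 16 entries receive a score; the other 13 stay NaN, which is the same kind of missingness measured below for the real tensor.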

## Evaluate some tensor properties

### Tensor shape
This indicates the number of elements in each tensor dimension: (Contexts, LR pairs, Sender cells, Receiver cells)

In [12]:
tensor.tensor.shape

(12, 1410, 10, 10)

### Missing values
This represents the fraction of values that are missing. In this case, missing values are combinations of contexts x LR pairs x Sender cells x Receiver cells that did not have a communication score or were missing in the dataframes.

In [13]:
tensor.missing_fraction()

0.8896790780141844

### Sparsity
This represents the fraction of tensor entries that are real zeros, that is, entries with a communication score of 0 (missing values are not counted as zeros).

In [14]:
tensor.sparsity_fraction()

0.0693161938534279

### Fraction of excluded elements
This represents the fraction of values that are ignored (masked) in the analysis. In this case, it coincides with the missing fraction because we did not create a new `tensor.mask` to manually exclude specific values; instead, the missing values were excluded automatically.

In [15]:
tensor.excluded_value_fraction() # Fraction of values in the tensor that are masked/missing

0.8896790780141844
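The three fractions above can be re-computed by hand on a NumPy array. This is an illustrative sketch, not cell2cell's code: `tensor_fractions` is a hypothetical helper, and it assumes that each fraction is taken over the total number of entries (here 12 × 1410 × 10 × 10 = 1,692,000) and that the default mask marks every non-missing entry as included.

```python
import numpy as np

def tensor_fractions(values):
    """Illustrative re-computation of the three fractions for a NumPy array with NaNs."""
    total = values.size
    missing = np.isnan(values)           # entries without a communication score
    real_zeros = values == 0             # NaN == 0 is False, so missing entries are not zeros
    mask = (~missing).astype(float)      # 1.0 = included in the analysis, 0.0 = excluded
    return {
        'missing_fraction': missing.sum() / total,
        'sparsity_fraction': real_zeros.sum() / total,
        'excluded_value_fraction': 1.0 - mask.sum() / total,
    }

toy_values = np.array([[[0.0, np.nan], [0.5, np.nan]]])  # shape (1, 2, 2), 4 entries
# Missing: 2 of 4, real zeros: 1 of 4, excluded: 2 of 4
print(tensor_fractions(toy_values))
```

With a mask derived only from the NaNs, the excluded fraction equals the missing fraction, matching the two identical values reported above.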

## Prepare Tensor Metadata

To interpret the analysis of the tensor, we can assign major groups to each sample/context and to every element in the other dimensions (LR pairs and cells).

These dictionaries can be generated manually or automatically from databases.

**Default dictionary that returns 'Unknown' for elements without an assigned major group**

In [16]:
from collections import defaultdict

element_dict = defaultdict(lambda: 'Unknown')
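A short standalone demonstration of how such a default dictionary behaves, using the standard library only. The names `demo_defaults` and `demo_dict` are hypothetical, chosen so they do not overwrite the variables of this notebook. Note that `copy()` keeps the default factory, and that looking up a missing key inserts it.

```python
from collections import defaultdict

demo_defaults = defaultdict(lambda: 'Unknown')
demo_dict = demo_defaults.copy()  # copy() keeps the 'Unknown' default factory
demo_dict.update({'HC1': 'Control'})

print(demo_dict['HC1'])             # Control
print(demo_dict['not_a_sample'])    # Unknown
print('not_a_sample' in demo_dict)  # True: the lookup inserted the key
print(len(demo_defaults))           # 0: the original dictionary is unaffected
```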

**Major groups of the samples/contexts**

Note that `context_dict` could also be generated directly from the `adata` object in the [Notebook for Inferring the Communication Scores](./02-Infer-Communication-Scores.ipynb) with the following command, which returns a plain `dict` (without the `'Unknown'` default):

```context_dict = adata.obs.set_index('sample_new')['condition'].sort_values().to_dict()```
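Because `adata.obs` has one row per cell, the command above relies on every row of a sample carrying the same condition, so repeated sample keys collapse to a single entry. A sketch that makes this explicit with `drop_duplicates` and keeps the `'Unknown'` default, using a toy dataframe as a hypothetical stand-in for `adata.obs` (only the `sample_new` and `condition` column names come from the command above; the values are made up):

```python
from collections import defaultdict

import pandas as pd

# Toy stand-in for adata.obs: one row per cell (hypothetical values)
toy_obs = pd.DataFrame({
    'sample_new': ['HC1', 'HC1', 'M1', 'S1', 'S1'],
    'condition': ['Control', 'Control', 'Moderate COVID-19',
                  'Severe COVID-19', 'Severe COVID-19'],
})

# Keep one row per sample, then map sample -> condition
mapping = (toy_obs.drop_duplicates('sample_new')
                  .set_index('sample_new')['condition']
                  .to_dict())

# Wrap the plain dict so unknown samples still map to 'Unknown'
toy_context_dict = defaultdict(lambda: 'Unknown')
toy_context_dict.update(mapping)

print(dict(toy_context_dict))  # {'HC1': 'Control', 'M1': 'Moderate COVID-19', 'S1': 'Severe COVID-19'}
```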

In [17]:
context_dict = element_dict.copy()

context_dict.update({'HC1' : 'Control',
                     'HC2' : 'Control',
                     'HC3' : 'Control',
                     'M1' : 'Moderate COVID-19',
                     'M2' : 'Moderate COVID-19',
                     'M3' : 'Moderate COVID-19',
                     'S1' : 'Severe COVID-19',
                     'S2' : 'Severe COVID-19',
                     'S3' : 'Severe COVID-19',
                     'S4' : 'Severe COVID-19',
                     'S5' : 'Severe COVID-19',
                     'S6' : 'Severe COVID-19',
                    })

**Generate a list containing metadata for each tensor order/dimension, later used for coloring factor plots**

In [18]:
meta_tf = c2c.tensor.generate_tensor_metadata(interaction_tensor=tensor,
                                              metadata_dicts=[context_dict, None, None, None],
                                              fill_with_order_elements=True
                                             )

To color the elements of another dimension by major groups, replace the corresponding `None` in `metadata_dicts=[context_dict, None, None, None]` with a dictionary whose keys are the element names of that dimension and whose values are their major groups. For example, to color LR pairs, create a dictionary whose keys are the names in `tensor.order_names[1]` and pass it (e.g. as `lr_dict`) in `metadata_dicts=[context_dict, lr_dict, None, None]`. The same applies to sender and receiver cells.
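A minimal sketch of building such an `lr_dict`. The LR names below are a hypothetical stand-in for `tensor.order_names[1]` (they follow the `lr_sep='^'` convention used when building the tensor), and the ligand-to-group mapping is an invented example of a major grouping, not a curated annotation.

```python
from collections import defaultdict

# Stand-in for tensor.order_names[1]; names follow the lr_sep='^' convention
lr_names = ['CCL5^CCR5', 'CXCL10^CXCR3', 'CD40LG^CD40']

# Hypothetical grouping of ligands into major groups
ligand_groups = {'CCL5': 'Chemokine', 'CXCL10': 'Chemokine'}

# Group each LR pair by its ligand (the part before '^'); unmatched pairs get 'Unknown'
lr_dict = defaultdict(lambda: 'Unknown')
lr_dict.update({lr: ligand_groups.get(lr.split('^')[0], 'Unknown') for lr in lr_names})

print(lr_dict['CCL5^CCR5'])    # Chemokine
print(lr_dict['CD40LG^CD40'])  # Unknown
# It would then be passed as metadata_dicts=[context_dict, lr_dict, None, None]
```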

## Export Tensor

Here we will save the `tensor` as a pickle object with `cell2cell`, so we can use it later with other analyses.

In [19]:
c2c.io.export_variable_with_pickle(tensor, output_folder + '/BALF-Tensor.pkl')

../../data/tc2c-outputs//BALF-Tensor.pkl  was correctly saved.


## Export Tensor Metadata

In [20]:
c2c.io.export_variable_with_pickle(meta_tf, output_folder + '/BALF-Tensor-Metadata.pkl')

../../data/tc2c-outputs//BALF-Tensor-Metadata.pkl  was correctly saved.
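Assuming the exported files are standard pickle streams, they can be read back with Python's `pickle` module. The sketch below shows the round trip on a small stand-in object in a temporary directory; `load_pickle` and `meta_example` are hypothetical names. Unpickling the actual `tensor` also requires cell2cell to be importable, because pickle needs the object's class.

```python
import os
import pickle
import tempfile

def load_pickle(filename):
    """Read back a single object stored with pickle."""
    with open(filename, 'rb') as f:
        return pickle.load(f)

# Round trip with a stand-in object (not the real tensor or metadata)
meta_example = {'contexts': ['HC1', 'M1'], 'groups': ['Control', 'Moderate COVID-19']}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.pkl')
    with open(path, 'wb') as f:
        pickle.dump(meta_example, f)
    restored = load_pickle(path)

print(restored == meta_example)  # True
```

In this notebook, the same helper would be called with `output_folder + '/BALF-Tensor.pkl'` or `output_folder + '/BALF-Tensor-Metadata.pkl'`.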


**To avoid errors when loading the metadata in the future, use the same pandas version as the one used to save it:**

In [21]:
pd.__version__

'1.4.2'
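One way to guard against a mismatch is to compare the installed version against the one reported above before unpickling. A minimal sketch; `PICKLED_WITH` and `check_pandas_version` are hypothetical names, and an exact string match is a deliberately conservative choice.

```python
import warnings

import pandas as pd

PICKLED_WITH = '1.4.2'  # pandas version reported when the metadata was saved

def check_pandas_version(expected=PICKLED_WITH, current=None):
    """Warn and return False if the installed pandas differs from `expected`."""
    current = pd.__version__ if current is None else current
    if current != expected:
        warnings.warn(f'pandas {current} differs from {expected}; '
                      'unpickling the metadata may fail.')
        return False
    return True

check_pandas_version()  # warns if the installed pandas is not 1.4.2
```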