# Reporting ViScore benchmark results

> **&copy; David Novak 2024**, see [LICENSE](https://github.com/saeyslab/ViScore/blob/main/LICENSE)

Once our [benchmark](https://github.com/saeyslab/ViScore/tree/main/benchmarking) is completed, we will want to report the results.
This notebook helps you create tables and informative figures to that end.

We will need a Python environment with ViScore and its dependencies, plus `funkyheatmappy` and `adjustText`.
`funkyheatmappy` and `adjustText` can be installed by running the following commands in a shell or Anaconda Prompt:

```
pip install git+https://github.com/funkyheatmap/funkyheatmappy.git
pip install git+https://github.com/Phlya/adjustText.git
```

We assume that you followed instructions in `ViScore/benchmarking/README.md` for designing and running your benchmark.
In accordance with that, we assume that

* results of benchmark are stored in `./results`
* all datasets listed in `./datasets.txt` were used
* all methods listed in `./config.json` were used
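
Before running the rest of the notebook, it can help to confirm that these files are actually present in the working directory. The cell below is a minimal sketch of such a check; `missing_benchmark_files` is a hypothetical helper (not part of ViScore), and it only tests for existence, not for complete results.

```python
import os

def missing_benchmark_files(root='.'):
    """Return the expected benchmark paths that are absent under `root`."""
    expected = ['results', 'datasets.txt', 'config.json']
    return [name for name in expected
            if not os.path.exists(os.path.join(root, name))]

missing = missing_benchmark_files('.')
if missing:
    print('Missing from the working directory:', ', '.join(missing))
```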

In [1]:
import os, re, copy, pandas as pd, json, numpy as np, matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

You can adjust the colour palette used for cell populations when plotting embeddings, as well as the colours and marker symbols used for each DR method when plotting structure-preservation scores.

In [2]:
palette_pops = [
    '#726ca6','#8ff56b','#79d0f9','#fba56a','#eefc85','#aeaead','#6e85ff','#b97671','#dbbafd','#6bb277',
    '#6af1b0','#b26ae7','#fb6c98','#fdc4b8','#c1c86c','#699dc0','#d889c1','#a89ef4','#95d598','#757469',
    '#78fefe','#f1f7c3','#b2ddfb','#cad9a3','#9b9f69','#aa7caa','#74c7c0','#face7e','#fe9cdb','#ce9f81',
    '#bafc85','#fdd5f1','#e97f6c','#8d89d7','#839095','#d68ef9','#a5d1ca','#d1fcfe','#6eaaee','#f799a1',
    '#d7b2c8','#70d87a','#99faa9','#6b70d4','#dbd8d2','#fb77c9','#88f4d2','#d17a98','#90b2d3','#aafef7',
    '#debc9d','#d2e96a','#96c96a','#8c6ff5','#927286','#7cff8e','#80b19e','#adbcfa','#d86fdd','#aee276',
    '#eee1a5','#feb6fe','#996dc9','#b699cc','#ad908d','#76946b','#d2fea8','#a7b883','#b881fe','#69e7da',
    '#92e8f5','#b5b6d4','#dadcfa','#bf6cbe','#9199b6','#70d79f','#6afd6e','#dcb26c','#d69fae','#b5eab1',
    '#fce96a','#6987aa','#8dadfc','#938afd','#c7ebe4','#de6a7f','#938669','#c4cce8','#e36daf','#e8f1e8',
    '#86e1b6','#ff6b69','#ed9ffc','#87d7d6','#feb58d','#b96a93','#dcd189','#adc9a7'
]
palette_methods = ['orange',      # PCA 
                   'teal',        # UMAP
                   'darkmagenta', # DensMAP
                   'darkblue',    # tSNE
                   'maroon',      # PHATE
                   '#404040',     # PaCMAP
                   'firebrick',   # TriMap
                   'darkkhaki',   # SQuad-MDS
                   'olivedrab',   # VAE
                   'plum',        # ivis
                   'darkorange',  # ViVAE
                   'darkred'      # ViVAE-EncoderOnly
]
markers_methods = ['o', # PCA
                   '^', # UMAP
                   'v', # DensMAP
                   's', # tSNE
                   'o', # PHATE
                   '^', # PaCMAP
                   'v', # TriMap
                   's', # SQuad-MDS
                   'o', # VAE
                   '^', # ivis
                   'v', # ViVAE
                   's'  # ViVAE-EncoderOnly
]
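
If you change the set of methods, the two style lists need at least one entry per method. The sketch below pairs methods with colours and markers by position, which is what the comments in the cell above suggest; that the plotting functions in `aux.py` match them by position is an assumption, and `method_styles` is a hypothetical helper.

```python
def method_styles(methods, colours, markers):
    """Pair each method with a (colour, marker) tuple by position."""
    if len(colours) < len(methods) or len(markers) < len(methods):
        raise ValueError('Need at least one colour and one marker per method')
    return {m: (c, k) for m, c, k in zip(methods, colours, markers)}

# Toy example using the first two entries of the lists above
styles = method_styles(['PCA', 'UMAP'], ['orange', 'teal'], ['o', '^'])
print(styles['UMAP'])  # ('teal', '^')
```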

## **0.** Collecting results

We start by aggregating quantitative results.

* The `rnx`, `sl` and `sg` dictionaries will contain denoised and non-denoised RNX, Local SP and Global SP values for all method-dataset combinations.
* The `df_all` dataframe will contain all Local SP, Global SP and Balanced SP (using either geometric mean or harmonic mean) per method-dataset-denoising combination, for each run (random seed).
* The `df_avg` dataframe will contain mean and standard-deviation values, aggregated across runs.
* The `df_time` dataframe will contain mean and standard-deviation values for the running times required for training, aggregated across runs.
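
To make the aggregation across runs concrete, here is a minimal sketch of how a mean and standard deviation could be computed from per-run scores. It uses the sample standard deviation (as pandas does by default); whether `aux.py` uses the sample or population form is an assumption, and `summarise_runs` is a hypothetical helper.

```python
import statistics

def summarise_runs(scores):
    """Return (mean, sample standard deviation) of per-run scores."""
    mean = statistics.mean(scores)
    # The sample standard deviation is undefined for a single run
    sd = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, sd

# Toy scores from two runs with different random seeds
mean, sd = summarise_runs([1.0, 3.0])
```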

<hr>

The first step is to determine the datasets and methods used in this benchmark.
**If this is anything other than what is indicated in `./datasets.txt` and `./config.json`, you will need to adapt this manually.**

In [3]:
fname_datasets = './datasets.txt'
with open(fname_datasets, 'r') as f:
    # Read one dataset name per line, skipping blank lines
    datasets = [line.strip() for line in f if line.strip()]
fname_config = './config.json'
with open(fname_config, encoding='utf-8') as f:
    conf = json.load(f)
methods = list(conf['methods'].keys())

Next, we need to specify:

* Which target dimensionality we are working with (`zdim`).

* How many repeated runs of each set-up we have (`nruns`).

* Whether we are working with results for denoised inputs. The `denoised` variable can be `False` (use results on non-denoised data), `True` (use results on denoised data) or `'ViVAE'` (only use denoised data results for ViVAE: this is what ViVAE was designed for).
**Crucially, this does not mean ViVAE is evaluated against denoised inputs (this would be an unfair comparison): it is evaluated the same way all other methods are.**

* Whether to use `'geometric_mean'` or `'harmonic_mean'` for computing balanced (local-global) structure preservation. There is a case to be made for either, but the more comprehensive way to look at results is to plot both Local and Global SP in a biaxial plot (which we also do).
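
For reference, the two candidate balanced measures can be sketched with their textbook definitions. `balanced_sp` is a hypothetical helper, and the exact computation in `aux.py` is not shown here, so treat this as an illustration that assumes non-negative scores.

```python
import math

def balanced_sp(local_sp, global_sp, measure='harmonic_mean'):
    """Combine Local and Global SP into a single balanced score."""
    if measure == 'geometric_mean':
        return math.sqrt(local_sp * global_sp)
    if measure == 'harmonic_mean':
        total = local_sp + global_sp
        # Guard against division by zero when both scores are 0
        return 0.0 if total == 0 else 2 * local_sp * global_sp / total
    raise ValueError(f'Unknown measure: {measure}')

# An unbalanced toy result: weak Local SP, perfect Global SP
geometric = balanced_sp(0.25, 1.0, 'geometric_mean')
harmonic = balanced_sp(0.25, 1.0, 'harmonic_mean')
```

For unequal scores the harmonic mean is always the lower of the two, so it penalises a lopsided local-global trade-off more strongly.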

In [16]:
zdim = 2
nruns = 5
denoised = 'ViVAE'
balanced_measure = 'harmonic_mean'

We load some helper and plotting functions from an auxiliary script (they are mostly documented and should be easy to tweak if needed).

In [17]:
## Import
from aux import collect_dicts, collect_df_avg, collect_df_all, collect_df_times_avg, get_denoised_mask, prepare_denoising_data

## Plotting
from aux import plot_separate_sp, plot_sp_tradeoffs, fh, plot_funky_heatmap, plot_rnx_curves, plot_embeddings, plot_denoising_sp_change, plot_denoising_rnx_change

Let's collect the results.

In [18]:
rnx, sl, sg = collect_dicts(datasets, methods, zdim=zdim, nruns=nruns)
df_avg      = collect_df_avg(datasets, methods, nruns=nruns, zdim=zdim, balanced_measure=balanced_measure, wide=True)
df_all      = collect_df_all(datasets, methods, nruns=nruns, zdim=zdim, balanced_measure=balanced_measure, wide=True)
df_time     = collect_df_times_avg(datasets, methods, nruns=nruns, zdim=zdim, wide=True)

We create a `report` directory where outputs will be saved (unless it already exists).
**If you already generated outputs for a previous benchmark using this notebook, they will be overwritten as the cells below are run.**

In [21]:
## Create the output directory; do nothing if it already exists
os.makedirs('./report', exist_ok=True)

## **1.** Plotting structure-preservation values

We will plot the Local, Global and Balanced SP using scatterplots with errorbars for separate categories and a scatterplot showing the Local-Global trade-off.
By default, we take results for ViVAE run on de-noised inputs and results for other methods on original inputs.
This is because ViVAE was designed specifically to work with the de-noising, which is part of the algorithm.
However, it is fair to also examine the effects of de-noising on SP by other methods (this amounts to an ablation experiment), so we also do that below.

<hr>

First, we plot Local, Global and Balanced SP separately for each dataset, using points with error bars (mean and standard deviation).
This is not the easiest plot to look at, but we use it to show the standard deviations as indicators of stability.

The figure is exported as a PNG and SVG file: `report/01_sp_separate.[png|svg]`.

In [22]:
plot_separate_sp(datasets, methods, df_all, palette=palette_methods)
plt.close()

Second, we plot the trade-off/balance between Local and Global using a scatter plot.
We do this dataset by dataset, using the *x*-axis for Local SP, the *y*-axis for Global SP.

The Pareto front is indicated, so that the reader can easily check which methods offer a favourable trade-off between the two criteria.
(Note that this **does not** mean that any methods not on the Pareto front in your benchmark are not useful or summarily worse than the other methods!
There are many ways to evaluate a method, depending on the type of analysis we're doing.
Also, while using a decent number of datasets to evaluate a method on increases the informativeness of a benchmark, this is still an empirical evaluation that may give different results on different datasets.)
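
To make the notion of a Pareto front concrete: a method is on the front if no other method is at least as good on both Local and Global SP and strictly better on one of them. The sketch below is a plain illustration of that definition; `pareto_front` is a hypothetical helper and the toy points are made up, not benchmark results.

```python
def pareto_front(points):
    """Return indices of points not dominated by any other point.

    Each point is a (local_sp, global_sp) pair; higher is better for both.
    """
    front = []
    for i, (xi, yi) in enumerate(points):
        dominated = any(
            xj >= xi and yj >= yi and (xj > xi or yj > yi)
            for j, (xj, yj) in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Toy (Local SP, Global SP) pairs for four hypothetical methods
toy_points = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4)]
print(pareto_front(toy_points))  # [0, 1, 2]
```

The last toy point is dominated by the second one, so it is the only one left off the front.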

The figure is exported as a PNG file: `report/01_sp_tradeoffs.png`.

In [23]:
%%capture
plot_sp_tradeoffs(datasets, methods, df_avg, df_all, palette_methods, markers_methods)

## **2.** Plotting structure preservation in a heatmap

As an additional visualisation method, we use a [funky heatmap](https://funkyheatmap.github.io/funkyheatmap/) to plot Local, Global and Balanced SP.
This is not included in our paper, but for large benchmarks it is a nice way of plotting quantitative results.
In order for things to work correctly for us, we monkey-patched some functions in the `funkyheatmappy` module (see `aux.py`).

We define a `plot_funky_heatmap` function.
The `geom` argument determines the visualisation technique for all scores.
It can be set to `'bar'`, `'funkyrect'` or `'circle'`.

**By default, the plotted values are min-max scaled within each of the 3 categories.
If `scale_column` is set to `True`, values are instead scaled per column.**
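
The difference between the two scaling modes can be sketched as follows. This is an interpretation, not the code in `aux.py`: "per category" is taken to mean pooling all columns of a category before scaling, and `min_max_scale` and the toy scores are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly so that the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: constant input maps to 0
    return [(v - lo) / (hi - lo) for v in values]

# Toy scores for one category: rows are methods, columns are two datasets
scores = [[0.25, 0.5], [0.5, 0.75], [0.75, 1.0]]

# Per column: each dataset is scaled on its own
per_column = list(zip(*[min_max_scale(col) for col in zip(*scores)]))

# Per category: all values of the category are pooled before scaling
flat = min_max_scale([v for row in scores for v in row])
per_category = [flat[i:i + 2] for i in range(0, len(flat), 2)]
```

Per-column scaling stretches every dataset to the full range, whereas per-category scaling keeps differences between datasets visible.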

The figure is exported as a PNG and SVG file: `report/02_funky_heatmap.[png|svg]`.

In [24]:
%%capture
plot_funky_heatmap(df_avg, datasets, methods)

## **3.** Plotting $R_{\mathrm{NX}}$ curves

The $R_{\mathrm{NX}}$ curve approximations (from which Local and Global SP are calculated) can be plotted directly for each dataset and method, and we can show the effect of de-noising as well.
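
One common way to summarise such a curve in a single number is a weighted area under the curve: weighting each neighbourhood size $K$ by $1/K$ emphasises local structure, while uniform weights emphasise global structure. The sketch below illustrates that idea. Whether ViScore computes Local and Global SP exactly this way is an assumption here; `rnx_auc` and the toy curve are hypothetical.

```python
def rnx_auc(rnx_values, log_scale):
    """Weighted average of an R_NX curve sampled at K = 1, 2, ..., len(rnx_values).

    log_scale=True weights each K by 1/K (area on a logarithmic K axis,
    emphasising small neighbourhoods); log_scale=False weights all K equally.
    """
    weights = [1.0 / k if log_scale else 1.0
               for k in range(1, len(rnx_values) + 1)]
    return sum(w * r for w, r in zip(weights, rnx_values)) / sum(weights)

# Toy curve: neighbourhoods are preserved well for small K only
toy_curve = [0.8, 0.6, 0.4, 0.2]
local_like = rnx_auc(toy_curve, log_scale=True)
global_like = rnx_auc(toy_curve, log_scale=False)
```

For this toy curve the log-scaled summary is higher than the linear one, reflecting that small neighbourhoods are better preserved than large ones.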

The figure is exported as a PNG and SVG file: `report/03_rnx_curves.[png|svg]`.

In [25]:
%%capture
plot_rnx_curves(rnx, datasets, methods)

## **4.** Plotting labelled embeddings

We create a plot of embeddings of all datasets by all tested methods, with points coloured by labelled cell populations, and export it as a PNG file: `report/04_embeddings.png`.
Legends for the colour scheme will be saved separately for each dataset in `report/04_legends`.

SVG files are not generated here, because they might be huge (depending on sizes of embedded datasets).

In [27]:
%%capture
plot_embeddings(datasets, methods, palette_pops)

## **5.** Plotting effects of denoising

Our ViVAE pipeline includes nearest neighbour-based denoising ('smoothing') of inputs prior to training the model on them.
This typically results in local structures being better preserved in the embedding by ViVAE.
Our interpretation is that denoising forces ViVAE to model the truly important structures rather than be overwhelmed by spurious noise patterns.

However, in our study we are interested in what effect this denoising might have on other DR methods.
In particular, VAE-based methods often benefit from denoising.

To document this, we report the effects of denoising for all methods and datasets, as an ablation study.

We plot the difference in $R_{\mathrm{NX}}$ curves for each method and dataset with and without denoising, and the Local and Global SP shift due to denoising.

**Only run the code in this section if your benchmark also included denoising.**

In [30]:
rnx_lims, l_vals0, l_vals1, l_diffs, g_vals0, g_vals1, g_diffs, l_diff_lims, g_diff_lims = prepare_denoising_data(datasets, methods, rnx, sl, sg)

In [31]:
%%capture
plot_denoising_rnx_change(datasets, methods, rnx)
plot_denoising_sp_change(datasets, methods, l_diffs, g_diffs, l_diff_lims, g_diff_lims)

## **6.** Creating tables with structure-preservation results

We generate a table with SP results in two formats: LaTeX and CSV.

<hr>

LaTeX is perhaps the best way to report numerical results in a report or paper.
**However, you might need to do some formatting tweaks to make it look nice.**

If the `highlight_best` argument is set to `True`, we put the best average score per dataset in each category in bold.
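
The bolding itself amounts to wrapping the contents of a math-mode table cell in `\mathbf{...}`. The sketch below shows the idea with plain string slicing; `bold_math` is a hypothetical helper, and the cell further down achieves the same with regular expressions.

```python
def bold_math(cell):
    """Wrap the contents of a '$...$' LaTeX cell in \\mathbf{...}."""
    if not (cell.startswith('$') and cell.endswith('$') and len(cell) >= 2):
        return cell  # leave anything that is not a math-mode cell untouched
    return '$\\mathbf{' + cell[1:-1] + '}$'

print(bold_math('$0.5 \\pm 0.1$'))  # $\mathbf{0.5 \pm 0.1}$
```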

The LaTeX code is saved in a text file: `report/06_results_table.txt`.
This code can be used in a `.tex` file.
The `longtable` and `multirow` packages need to be loaded for the source file to compile, as does `booktabs` (the table generated by pandas uses its `\toprule` and `\midrule` commands).
The easiest way to compile is via [Overleaf](https://www.overleaf.com/).

In [32]:
highlight_best = True
label = 'tab:sp'

## `nruns` is taken from the configuration cell above
caption = f"""Mean and standard deviation values from {nruns} runs of each set-up are reported.
Entries with highest mean value per dataset are in bold."""

d = copy.deepcopy(df_avg)

## Find the row with the best mean score per dataset in each category
## (the lists stay empty if `highlight_best` is False)
idcs_best_localsp = []
idcs_best_globalsp = []
idcs_best_balancedsp = []

if highlight_best:
    for dataset in datasets:
        idcs = np.where(d['Dataset']==dataset)[0]
        idcs_best_localsp.append(idcs[np.argmax(d['LocalSP_Mean'].to_numpy()[idcs])])
        idcs_best_globalsp.append(idcs[np.argmax(d['GlobalSP_Mean'].to_numpy()[idcs])])
        idcs_best_balancedsp.append(idcs[np.argmax(d['BalancedSP_Mean'].to_numpy()[idcs])])

d['Denoising'] = ['On' if x==True else 'Off' for x in d['Denoised']]
d['Local SP']  = [f'${np.round(d["LocalSP_Mean"][i], 3)} \\pm {np.round(d["LocalSP_SD"][i], 3)}$' for i in range(d.shape[0])]
d['Global SP'] = [f'${np.round(d["GlobalSP_Mean"][i], 3)} \\pm {np.round(d["GlobalSP_SD"][i], 3)}$' for i in range(d.shape[0])]
d['Balanced SP'] = [f'${np.round(d["BalancedSP_Mean"][i], 3)} \\pm {np.round(d["BalancedSP_SD"][i], 3)}$' for i in range(d.shape[0])]

d = d[['Dataset', 'Method', 'Denoising', 'Local SP', 'Global SP', 'Balanced SP']]

if highlight_best:
    ## Wrap the best entries in \mathbf{...}
    ## (row positions are used as labels, which assumes the default integer index of `df_avg`)
    for i in idcs_best_localsp:
        s = d['Local SP'][i]
        d.loc[i, 'Local SP'] = re.sub(r'\$$', '}$', re.sub(r'^\$', r'$\\mathbf{', s))
    for i in idcs_best_globalsp:
        s = d['Global SP'][i]
        d.loc[i, 'Global SP'] = re.sub(r'\$$', '}$', re.sub(r'^\$', r'$\\mathbf{', s))
    for i in idcs_best_balancedsp:
        s = d['Balanced SP'][i]
        d.loc[i, 'Balanced SP'] = re.sub(r'\$$', '}$', re.sub(r'^\$', r'$\\mathbf{', s))

## Merge adjacent cells containing method names
## (each method is assumed to have two consecutive rows per dataset: denoising off and on)

for method in methods:
    idcs = np.where(d['Method']==method)[0]

    idcs_multirow = idcs[0::2] # first row of each pair
    idcs_empty = idcs[1::2]    # second row of each pair

    d.loc[idcs_multirow, 'Method'] = '\\multirow{2}{*}{'+method+'}'
    d.loc[idcs_empty, 'Method'] = ''

## Merge adjacent cells containing dataset names

for dataset in datasets:
    idcs = np.where(d['Dataset']==dataset)[0]
    n = len(idcs)
    d.loc[idcs[0], 'Dataset'] = '\\multirow{'+str(n)+'}{*}{'+dataset+'}'
    d.loc[idcs[1:], 'Dataset'] = ''

## Make LaTeX code
d = d.set_index('Dataset', append=True).swaplevel(0, 1)
code = d.to_latex(index=True)

## Make table page-breakable
code = code.replace('begin{tabular}', 'begin{longtable}')
code = code.replace('end{tabular}', 'end{longtable}')

## Fix alignment of rows and columns
code = re.sub(pattern=r'} & [0-9]+ &', repl='} &', string=code)
code = re.sub(pattern=r'\\\n & [0-9]+ &', repl='\\\\\n &', string=code)
code = re.sub(pattern=r'&  & Method', repl='& Method', string=code)
code = code.replace('$ \\\\\n\\cline{1-7}\n\\multirow[t]', '$ \\\\\n\\multirow[t]')
## Move 'Dataset' into the header row and rename the 'Denoising' column header
code = code.replace('\n & Method & Denoising & Local SP & Global SP & Balanced SP \\\\\nDataset &', '\n Dataset & Method & De-noising & Local SP & Global SP & Balanced SP \\\\\n &')
code = code.replace('\\bottomrule\n', '')

## Add caption and label
code = code.replace('\\end{longtable}', '\\caption{'+caption+'}\n\\label{'+label+'}\n\\end{longtable}')

## Adjust font size
code = '{\n\\renewcommand{\\arraystretch}{0.45}\n\\tiny'+code+'}'

## Save as textfile
with open('./report/06_results_table.txt', 'w') as text_file:
    text_file.write(code)

Next, we create a CSV file with the same information.
CSV files are easy to read programmatically, and can thus be used for making custom plots.
Additionally, if your results table is too large to fit on a single page of a manuscript, you will likely need to include a stand-alone CSV or Excel file anyway.

The CSV file is saved as `report/06_results_table.csv`.
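
As an illustration of reading such a file back, the sketch below parses a small in-memory stand-in with Python's `csv` module and looks up the best method for a given score. The column names follow `df_avg`, but the values are made up and `best_method` is a hypothetical helper; the real file contains more columns.

```python
import csv
import io

# Toy stand-in for report/06_results_table.csv (values are made up)
toy_csv = io.StringIO(
    'Dataset,Method,LocalSP_Mean,GlobalSP_Mean\n'
    'toy_data,MethodA,0.50,0.25\n'
    'toy_data,MethodB,0.25,0.75\n'
)

def best_method(rows, column):
    """Return the name of the method with the highest value in `column`."""
    return max(rows, key=lambda row: float(row[column]))['Method']

rows = list(csv.DictReader(toy_csv))
print(best_method(rows, 'LocalSP_Mean'))  # MethodA
```

With the real file, `pd.read_csv('./report/06_results_table.csv', index_col=0)` loads the same table as a dataframe.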

In [34]:
df_avg.to_csv('./report/06_results_table.csv', sep=',')

## **7.** Creating tables with running times

We also create a LaTeX and CSV table that aggregates the running time of each set-up.

This generates `report/07_times_table.txt` and `report/07_times_table.csv`.

In [37]:
## `nruns` is taken from the configuration cell above
label = 'tab:times'

caption=f"""Mean and standard deviation values for running times from {nruns} runs of each set-up are reported."""

d = copy.deepcopy(df_time)

d['Denoising'] = ['On' if x==True else 'Off' for x in d['Denoised']]
d['RunningTime']  = [f'${np.round(d["RunningTime_Mean"][i], 1)}\\pm {np.round(d["RunningTime_SD"][i], 1)}$' for i in range(d.shape[0])]

d = d[['Dataset', 'Method', 'Denoising', 'RunningTime']]

## Merge adjacent cells containing method names
## (each method is assumed to have two consecutive rows per dataset: denoising off and on)
for method in methods:
    idcs = np.where(d['Method']==method)[0]

    idcs_multirow = idcs[0::2] # first row of each pair
    idcs_empty = idcs[1::2]    # second row of each pair

    d.loc[idcs_multirow, 'Method'] = '\\multirow{2}{*}{'+method+'}'
    d.loc[idcs_empty, 'Method'] = ''

## Merge adjacent cells containing dataset names
for dataset in datasets:
    idcs = np.where(d['Dataset']==dataset)[0]
    n = len(idcs)
    d.loc[idcs[0], 'Dataset'] = '\\multirow{'+str(n)+'}{*}{'+dataset+'}'
    d.loc[idcs[1:], 'Dataset'] = ''

## Make LaTeX code
code = d.to_latex(index=False)

## Adjust headers (this must happen after the LaTeX code is generated)
code = code.replace('Dataset & Method & Denoising & RunningTime', 'Dataset & Method & De-noising & Running time (seconds)')

## Make table page-breakable
code = code.replace('begin{tabular}', 'begin{longtable}')
code = code.replace('end{tabular}', 'end{longtable}')

## Adjust formatting
code = code.replace('\\\\\n\\multirow{'+str(len(methods)*2)+'}{*}', '\\\\\n\\cline{1-4}\n\\multirow{'+str(len(methods)*2)+'}{*}')

## Add caption and label
code = code.replace('\\end{longtable}', '\\caption{'+caption+'}\n\\label{'+label+'}\n\\end{longtable}')


## Adjust font size
code = '{\n\\renewcommand{\\arraystretch}{0.45}\n\\tiny'+code+'}'

## Save as textfile
with open('./report/07_times_table.txt', 'w') as text_file:
    text_file.write(code)

In [38]:
df_time.to_csv('./report/07_times_table.csv', sep=',')