# Multiverse Meta-Analysis

This notebook contains all code required for **Multiverse Meta-Analyses**, including the generation of specifications, bootstrap data, and visualizations.

## Imports

In [None]:
import numpy as np
import pandas as pd

from bootstrap import generate_boot_data
from config import read_config
from data import prepare_data
from plotting import (get_cluster_fill_data, get_spec_fill_data,
                      get_colors, plot_treemap, plot_multiverse,
                      plot_caterpillar, plot_sample_size, plot_cluster_size,
                      plot_spec_tiles, plot_cluster_tiles, plot_inferential,
                      plot_p_hist)
from specs import generate_specs
from user_data import preprocess_data

import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"

In [None]:
%load_ext autoreload
%autoreload 2

## Dashboard

The interactive Dashboard can be launched from this notebook.

In [None]:
%run -i "./dashboard.py"

## Constants

In this cell, set the **title**, the **working directory**, and the **path to the dataset** for this analysis. The **config**, **preprocessed data**, **specs**, and **bootstrap data** paths are derived from the working directory and the title. This naming convention can be changed, but the prefixes (i.e. `boot`, `config`, `data`, and `specs`) are required for the **Dashboard** to work. The configuration file must exist; all other data can either be generated or loaded, as controlled by the boolean flags. Generated data is stored at the specified paths, and data that is not generated is loaded from those paths.

In [None]:
# TITLE = "R2D4D_2"
# DIR = "../examples/R2D4D"
# DATA_PATH = f"{DIR}/R2D4D.csv"

# TITLE = "Chernobyl_3"
# DIR = "../examples/Chernobyl"
# DATA_PATH = f"{DIR}/Chernobyl.rda"

TITLE = "IandR_3"
DIR = "../examples/IandR"
DATA_PATH = f"{DIR}/iandr.sav"

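PREPROCESS_DATA = True    # True: preprocess the raw data; False: load it from PP_DATA_PATH
GENERATE_SPECS = True     # True: generate the specs; False: load them from SPECS_PATH
GENERATE_BOOTDATA = True  # True: generate the bootstrap data; False: load it from BOOT_PATH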

PP_DATA_PATH = f"{DIR}/data_{TITLE}.csv"
CONFIG_PATH = f"{DIR}/config_{TITLE}.json"
SPECS_PATH = f"{DIR}/specs_{TITLE}.csv"
BOOT_PATH = f"{DIR}/boot_{TITLE}.csv"
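The naming convention and the load-or-generate flags above can be summarized in a small helper. This is a minimal sketch: `build_paths` and `required_inputs` are hypothetical names that are not part of this project, and the `<prefix>_<title>.<ext>` pattern simply mirrors the constants cell. It lists which files must already exist before the notebook runs.

```python
import os

# Prefixes required by the Dashboard, with the extensions used in the constants cell.
PREFIXES = {"boot": "csv", "config": "json", "data": "csv", "specs": "csv"}


def build_paths(directory, title):
    """Hypothetical helper: derive all file paths from the directory and the title."""
    return {prefix: f"{directory}/{prefix}_{title}.{ext}" for prefix, ext in PREFIXES.items()}


def required_inputs(paths, preprocess, generate_specs, generate_boot):
    """Return the files that must already exist, given the three boolean flags."""
    required = [paths["config"]]  # the configuration file is always required
    if not preprocess:
        required.append(paths["data"])
    if not generate_specs:
        required.append(paths["specs"])
    if not generate_boot:
        required.append(paths["boot"])
    return required


if __name__ == "__main__":
    paths = build_paths("../examples/IandR", "IandR_3")
    missing = [p for p in required_inputs(paths, True, True, True) if not os.path.exists(p)]
    print("Missing inputs:", missing)
```

With all three flags set to `True`, only the configuration file has to be present.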

## Configuration

In this cell, the configuration file is processed. The cell prints the parsed configuration so the user can double-check whether the result is as expected.

In [None]:
config = read_config(path=CONFIG_PATH)
if config is not None:
    c_info = [
        f"{config['level']} - Level Meta-Analysis",
        f"   Minimum Nr. of Samples to include Specification: {config['k_min']}",
        f"   Bootstrap Iterations: {config['n_boot_iter']}",
        f"   {config['n_which']} Which-Factors:",
        *[f"     {k} : {', '.join(v)}" for k, v in config['which_lists'].items()],
        f"   {config['n_how']} How-Factors:",
        *[f"     {k} : {', '.join(v)}" for k, v in config['how_lists'].items()],
        "   Labels",
        *[f"     {label}" for label in config['labels']],
        "   Column-Map",
        *[f"     {k} : {v}" for k, v in config['colmap'].items()]
    ]
    print("\n".join(c_info))
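Besides reading the printout, the parsed configuration can be checked programmatically. The sketch below only tests for the keys that this notebook accesses; `check_config` is a hypothetical helper, and the assumption that `n_which` and `n_how` equal the number of entries in `which_lists` and `how_lists` is not confirmed by `read_config()`.

```python
# Keys that the cells of this notebook read from the parsed configuration.
REQUIRED_KEYS = (
    "title", "level", "k_min", "n_boot_iter",
    "n_which", "which_lists", "n_how", "how_lists",
    "labels", "colmap",
)


def check_config(config):
    """Hypothetical sanity check; returns a list of problems (empty if none)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in config]
    if problems:
        return problems
    # Assumption: n_which / n_how count the factors listed in which_lists / how_lists.
    if config["n_which"] != len(config["which_lists"]):
        problems.append("n_which does not match the number of which-factors")
    if config["n_how"] != len(config["how_lists"]):
        problems.append("n_how does not match the number of how-factors")
    if not (isinstance(config["k_min"], int) and config["k_min"] >= 1):
        problems.append("k_min must be a positive integer")
    return problems
```

In the notebook, `check_config(config)` would be called right after `read_config()`, and an empty list means no problems were found.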

## Preprocess and Prepare Data

In this cell, the dataset is either preprocessed and stored at `PP_DATA_PATH`, or the preprocessed dataset is loaded from `PP_DATA_PATH`. The cell prints the dimensions and the head of the data. If preprocessing is desired, the function `preprocess_data()` must be defined by the user in the file `user_data.py`.

In [None]:
if PREPROCESS_DATA:
    ma_data = preprocess_data(DATA_PATH, title=TITLE)
else:
    ma_data = pd.read_csv(PP_DATA_PATH)
print(f"Data Shape: {ma_data.shape}")
ma_data.head()
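What `preprocess_data()` does is up to the user. The following is only a sketch of what `user_data.py` could contain, under several assumptions: the reader is chosen by file extension, the cleaning step (dropping fully empty rows) is a placeholder, and the output is written next to the raw file so that it lands at `PP_DATA_PATH = f"{DIR}/data_{TITLE}.csv"`. Reading `.sav` files with `pd.read_spss` needs the optional `pyreadstat` package; `.rda` files would need a separate reader such as `pyreadr`.

```python
import os

import pandas as pd


def preprocess_data(path, title):
    """Sketch of a user-defined preprocessing step (not the project's implementation)."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".csv":
        df = pd.read_csv(path)
    elif ext == ".sav":
        df = pd.read_spss(path)  # requires pyreadstat
    else:
        raise NotImplementedError(f"no reader configured for {ext!r} files")

    # Placeholder cleaning step: drop rows in which every value is missing.
    df = df.dropna(how="all").reset_index(drop=True)

    # Assumption: store the result next to the raw data as data_<title>.csv.
    out_path = os.path.join(os.path.dirname(path), f"data_{title}.csv")
    df.to_csv(out_path, index=False)
    return df
```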

In this cell, the preprocessed dataset is prepared for meta-analysis. Preparation adds **cluster** and **effect IDs**, sets data types, etc. For details, consult the function documentation of `prepare_data()`. The cell prints the dimensions and the head of the prepared data.

In [None]:
data = prepare_data(config["colmap"], data=ma_data)
print(f"Data Shape: {data.shape}")
data.head()
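To illustrate what cluster and effect IDs are, here is a minimal pandas sketch. It assumes a hypothetical `study` column identifies the cluster; the actual column names come from `config["colmap"]`, and `prepare_data()` may number IDs differently. Each study receives one cluster ID, and each effect size within a study receives a running effect ID.

```python
import pandas as pd


def add_ids(df, study_col="study"):
    """Hypothetical sketch: one cluster ID per study, one effect ID per row within a study."""
    out = df.copy()
    groups = out.groupby(study_col, sort=False)
    # ngroup() numbers groups 0..G-1; sort=False keeps the order of first appearance.
    out["cluster_id"] = groups.ngroup() + 1
    # cumcount() numbers the rows within each group 0..n-1.
    out["effect_id"] = groups.cumcount() + 1
    return out
```

For studies `["B", "A", "B", "C"]` this yields cluster IDs `[1, 2, 1, 3]` and effect IDs `[1, 1, 2, 1]`.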

## Specifications

In this cell, the specifications are either generated and stored at `SPECS_PATH`, or loaded from `SPECS_PATH`. For details, consult the function documentation of `generate_specs()`.

In [None]:
if GENERATE_SPECS:
    specs = generate_specs(
        data,
        config["which_lists"],
        config["how_lists"],
        config["colmap"],
        config["k_min"],
        config["level"],
        SPECS_PATH
    )
else:
    specs = pd.read_csv(SPECS_PATH)
print(specs.shape)
specs.head()
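Conceptually, a multiverse crosses every subset of the data (which-factors) with every analytic choice (how-factors) and keeps only specifications with at least `k_min` effects. The sketch below enumerates such a grid. It is a simplified assumption, not the logic of `generate_specs()`: each which-factor is given an extra "all levels" option (`None`), `k` is counted as rows, and no summary effects are computed.

```python
import itertools

import pandas as pd


def enumerate_specs(data, which_lists, how_lists, k_min):
    """Simplified sketch: cross which-factor subsets with how-factor choices."""
    which_names = list(which_lists)
    how_names = list(how_lists)
    # Assumption: every which-factor can also be left unrestricted (None = all levels).
    which_options = [list(which_lists[name]) + [None] for name in which_names]
    rows = []
    for which_combo in itertools.product(*which_options):
        mask = pd.Series(True, index=data.index)
        for name, level in zip(which_names, which_combo):
            if level is not None:
                mask &= data[name] == level
        k = int(mask.sum())
        if k < k_min:
            continue  # too few effects for this subset
        for how_combo in itertools.product(*(how_lists[n] for n in how_names)):
            row = dict(zip(which_names, which_combo))
            row.update(zip(how_names, how_combo))
            row["k"] = k
            rows.append(row)
    return pd.DataFrame(rows)
```

Because the grid is a Cartesian product, the number of specifications grows multiplicatively with every added factor level.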

## Bootstrap Data

In this cell, the bootstrap data is either generated and stored at `BOOT_PATH`, or loaded from `BOOT_PATH`. For details, consult the function documentation of `generate_boot_data()`.

In [None]:
if GENERATE_BOOTDATA:
    boot_data = generate_boot_data(
        specs,
        config["n_boot_iter"],
        data,
        config["colmap"],
        config["level"],
        BOOT_PATH
    )
else:
    boot_data = pd.read_csv(BOOT_PATH)
print(boot_data.shape)
boot_data.head()
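The bootstrap data feeds the inferential specification plot. As a generic illustration of bootstrapping under a null hypothesis, the sketch below centres a set of effect sizes on zero, resamples them `n_boot_iter` times, and compares the observed mean with the resulting null distribution. This is one common textbook approach and an assumption here; `generate_boot_data()` may simulate the null differently and works per specification rather than on a single vector.

```python
import numpy as np


def null_bootstrap_means(effects, n_boot_iter, seed=0):
    """Resample zero-centred effects; returns n_boot_iter bootstrap means under the null."""
    rng = np.random.default_rng(seed)
    y = np.asarray(effects, dtype=float)
    centred = y - y.mean()  # impose the null hypothesis: mean effect = 0
    idx = rng.integers(0, len(y), size=(n_boot_iter, len(y)))
    return centred[idx].mean(axis=1)


def bootstrap_p_value(effects, n_boot_iter, seed=0):
    """Two-sided p-value: share of null means at least as extreme as the observed mean."""
    y = np.asarray(effects, dtype=float)
    null_means = null_bootstrap_means(y, n_boot_iter, seed)
    return float(np.mean(np.abs(null_means) >= abs(y.mean())))
```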

## Plotting

In this cell, the **cluster-** and **specification-** fill data for the respective tile maps is prepared, as well as the list of colors that constitute the color scheme. For details, consult the respective function documentation.

In [None]:
cluster_fill_data = get_cluster_fill_data(
    data,
    specs,
    config["colmap"]
)
spec_fill_data = get_spec_fill_data(
    config["n_which"],
    config["which_lists"],
    config["n_how"],
    config["how_lists"],
    specs
)
fill_levels = len(np.unique([v for v in spec_fill_data.values()]))
colors = get_colors(fill_levels)
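The number of colors equals the number of distinct fill codes: `np.unique` flattens the (equal-length) value lists of `spec_fill_data` and returns the sorted distinct values. The sketch reproduces that count and pairs it with a simple linear RGB palette. `linear_palette` is a hypothetical stand-in for `get_colors()`, and the endpoint colors are arbitrary choices, not the project's color scheme.

```python
import numpy as np


def count_fill_levels(spec_fill_data):
    """Number of distinct fill codes across all factors (values must have equal length)."""
    return len(np.unique(list(spec_fill_data.values())))


def linear_palette(n, start=(33, 102, 172), end=(178, 24, 43)):
    """Hypothetical palette: n colors interpolated linearly between two RGB endpoints."""
    colors = []
    for t in np.linspace(0.0, 1.0, n).tolist():
        r, g, b = (round(s + (e - s) * t) for s, e in zip(start, end))
        colors.append(f"rgb({r}, {g}, {b})")
    return colors
```

For example, `linear_palette(count_fill_levels({"a": [0, 1, 1], "b": [2, 0, 1]}))` returns three colors, one per fill level.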

Here, we define variables that are reused across several plots, which keeps the plotting calls short and readable.

In [None]:
colmap = config["colmap"]
k_range = [config["k_min"], max(specs["k"])]
labels = config["labels"]
level = config["level"]
n_total_specs = len(specs)
title = config["title"]

### Treemap

Treemap of the meta-analytic dataset. It visualizes each study and its reported effect sizes, with colors indicating the study's sample size `N` (hot colors for small, cold colors for large samples). If a study reports multiple effect sizes, the size of its tile corresponds to the number of reported effect sizes, and the tile's color indicates the average sample size of those effects.

In [None]:
treemap = plot_treemap(data, title, colmap)
treemap.show()
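The two quantities the treemap encodes per study, tile size (number of effect sizes) and tile color (mean sample size), can be computed with a pandas group-by. The column names `study` and `N` are assumptions for illustration; in the notebook they are resolved through `colmap`, and `plot_treemap()` may aggregate differently.

```python
import pandas as pd


def treemap_summary(df, study_col="study", n_col="N"):
    """Per study: number of effect sizes (tile size) and mean sample size (tile color)."""
    grouped = df.groupby(study_col, sort=False)[n_col]
    return pd.DataFrame(
        {"n_effects": grouped.size(), "mean_n": grouped.mean()}
    ).reset_index()
```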

### Inferential Specification Plot

In [None]:
fig_inferential = plot_inferential(boot_data, title, n_total_specs)
fig_inferential.show()

### p-Value Histogram

In [None]:
fig_p_hist = plot_p_hist(specs, title, n_total_specs)
fig_p_hist.show()

### Multiverse

In [None]:
fig = plot_multiverse(
    specs,
    n_total_specs,
    k_range,
    cluster_fill_data,
    spec_fill_data,
    labels,
    colors,
    level,
    title,
    fill_levels
)
fig.show()

# fig.write_image("multiverse.pdf")
# fig.write_image("multiverse.pdf", width=1000, height=1500)

### Individual Multiverse Components

In [None]:
fig_cluster_tiles = plot_cluster_tiles(specs, cluster_fill_data, n_total_specs, title)
fig_cluster_tiles.show()

In [None]:
fig_caterpillar = plot_caterpillar(specs, n_total_specs, colors, k_range, title, fill_levels)
fig_caterpillar.show()

In [None]:
fig_cluster_size = plot_cluster_size(specs, k_range, n_total_specs, title)
fig_cluster_size.show()

In [None]:
fig_sample_size = plot_sample_size(specs, k_range, n_total_specs, title)
fig_sample_size.show()

In [None]:
fig_spec_tiles = plot_spec_tiles(specs, n_total_specs, spec_fill_data, labels, colors, k_range, title, fill_levels)
fig_spec_tiles.show()