# OHBM 2025 IBMA with NeuroVault tutorial

## IBMA: Image-based meta-analysis

IBMA, considered the gold standard of neuroimaging meta-analysis (Salimi-Khorshidi et al., 2009; Salo et al., 2023), aggregates group-level, whole-brain statistical maps from independently conducted functional magnetic resonance imaging (fMRI) studies. IBMA outperforms other popular meta-analysis methods, such as coordinate-based meta-analysis (CBMA). Because IBMA methods use whole-brain statistics, all existing voxel-wise statistical methods for analyzing subject-level data within studies are available (Lazar et al., 2002). IBMA is known to produce richer and more detailed results, revealing additional brain structures that are often absent from CBMA results. IBMA also has greater power; thus, one could potentially achieve similar or even better results with a small fraction of the studies generally required in CBMA. In addition, when both the parameter and variance estimates are available, hierarchical mixed-effects models can be used to account for both within- and between-study variance (Salimi-Khorshidi et al., 2009).

## Tools

### NiMARE

![NiMARE banner](images/nimare_banner.png)

[NiMARE](https://nimare.readthedocs.io/en/latest/) is a Python library for performing neuroimaging meta-analyses and related analyses, like automated annotation and functional decoding. The goal of NiMARE is to centralize and standardize implementations of common meta-analytic tools, so that researchers can use whatever tool is most appropriate for a given research question.

### NeuroVault

![NeuroVault logo](images/neurovault-logo.svg)

[NeuroVault](https://neurovault.org) is a web-based repository of fMRI statistical maps from neuroimaging studies (Gorgolewski et al., 2015). The brain maps are grouped in collections that are created and updated voluntarily. This repository can be explored and downloaded with the help of an API, which is supported by some Python neuroimaging tools (e.g., Nilearn and NiMARE).

### Cognitive Atlas

![CogAt logo](images/cogat-logo.png)

[Cognitive Atlas](https://www.cognitiveatlas.org/) (Poldrack et al., 2011) is an online repository of cumulative knowledge from experienced researchers from the psychology, cognitive science, and neuroscience fields. The repository currently offers two knowledge bases: 907 cognitive concepts and 841 tasks with definitions and properties. Cognitive concepts contain relationships with other concepts and tasks, with the goal of establishing a map between mental processes and brain function. It provides an API to download the database, which is also integrated into NiMARE.

## Goals for this tutorial

1. Become familiar with NeuroVault data and download it
2. Identify usable NeuroVault images for IBMA
3. Learn to identify and exclude outlier images
4. Use NiMARE to run IBMA
5. Interpret IBMA results

In [None]:
import tarfile
import os.path as op
import warnings
import os

import pandas as pd
from nimare.reports.base import run_reports
from nimare.workflows import IBMAWorkflow
import requests

from utils import download_images, convert_to_nimare_dataset
from outliers import _rm_nonstat_maps, _rm_extreme_maps, _rm_duplicates_maps

warnings.filterwarnings('ignore')
# Slide on the motivation for using NeuroVault data and potential use cases for this notebook,
# since people are going to be using it for their own analyses in the future

In [None]:
data_dir = "./data"
filename = "november_2022"
nv_data_dir = op.join(data_dir, filename)
image_dir = op.join(data_dir, "nv_images")
report_dir = op.join(data_dir, "report")
report_clean_dir = op.join(data_dir, "report_clean")
report_clean_clean_dir = op.join(data_dir, "report_clean_clean")

# Download NeuroVault data

To explore the NeuroVault database, we created an SQL query and exported the database contents to human-readable tables while filtering sensitive user information. This provided sufficient metadata from all collections and images to investigate the entire database without downloading the files. The images identified as usable for IBMA (see the following section on the image selection framework) were downloaded along with their metadata and converted to a NiMARE Dataset object to leverage existing IBMA methods implemented in NiMARE.

In [None]:
github_url = (
    f"https://raw.githubusercontent.com/NeuroVault/nv-data/master/{filename}.tar.gz"
)
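response = requests.get(github_url)
response.raise_for_status()  # Stop early if the download failed

os.makedirs(data_dir, exist_ok=True)  # Make sure the target directory exists
file_path = op.join(data_dir, f"{filename}.tar.gz")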

In [None]:
with open(file_path, "wb") as f:
    f.write(response.content)

In [None]:
with tarfile.open(file_path, "r:gz") as tar:
    tar.extractall(path=data_dir)

In [None]:
sorted(os.listdir(nv_data_dir))
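
The download-and-extract pattern above can be rehearsed offline before pointing it at the real archive. The sketch below builds a tiny `.tar.gz` in a temporary directory and extracts it after a basic check that no member would be written outside the target directory. The archive contents and the helper name `safe_extract` are made up for illustration; they are not part of the NeuroVault export.

```python
import io
import os
import tarfile
import tempfile


def safe_extract(archive_path, dest_dir):
    """Extract a .tar.gz archive, refusing members that would escape dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(path=dest_root)


# Build a toy archive that mimics the layout "<filename>/<table>.csv"
work_dir = tempfile.mkdtemp()
archive_path = os.path.join(work_dir, "toy_export.tar.gz")
payload = b"id,name\n1,example map\n"
with tarfile.open(archive_path, "w:gz") as tar:
    info = tarfile.TarInfo(name="toy_export/statmaps_toy.csv")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

safe_extract(archive_path, work_dir)
extracted = sorted(os.listdir(os.path.join(work_dir, "toy_export")))
print(extracted)
```

The path check only covers member names; it is a starting point rather than a complete defense against malicious archives.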

The available files are the following:

- `django_content_type.csv`: Django's internal table of content types (app labels and model names)
- `statmaps_atlas.csv`: Contains some atlases available in NeuroVault
- `statmaps_basecollectionitem.csv`: Contains the image ID, name, description, and associated collection ID
- `statmaps_cognitiveatlascontrast.csv`: Contains the contrast names and IDs from Cognitive Atlas
- `statmaps_cognitiveatlastask.csv`: Contains the task names and IDs from Cognitive Atlas
- `statmaps_collection.csv`: Contains the collection-level metadata, including name, DOI, authors, etc.
- `statmaps_collection_communities.csv`: Contains the communities linked to a collection
- `statmaps_collection_contributors.csv`: Contains the user IDs linked to a collection
- `statmaps_community.csv`: Contains information about communities in NeuroVault, such as their label and description
- `statmaps_image.csv`: Contains the image file name and demographic information linked to the image
- `statmaps_statisticmap.csv`: Contains image metadata such as map type, modality, task, analysis level, etc.

To identify images usable for IBMA, we first get each image's ID (`id`), name (`name`), and affiliated collection ID (`collection_id`).

In [None]:
# Get images and collection IDs
cmeta_cols = ["id", "collection_id", "name"]
image_colection = pd.read_csv(
    op.join(nv_data_dir, "statmaps_basecollectionitem.csv"),
    usecols=cmeta_cols,
)

Second, from `statmaps_image.csv` we get the file name, which we will use later to download the images using the NeuroVault API. In this table, `basecollectionitem_ptr_id` refers to the image ID (`id`).

In [None]:
# Get path to images in NeuroVault
image_cols = ["file", "basecollectionitem_ptr_id"]
image = pd.read_csv(
    op.join(nv_data_dir, "statmaps_image.csv"),
    usecols=image_cols,
)
image = image.rename(columns={"basecollectionitem_ptr_id": "id"})

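Third, we extract relevant image metadata from `statmaps_statisticmap.csv`. The selected metadata include:
- `image_ptr_id`: the image ID
- `map_type`: the type of statistical map (e.g., `T` or `Z`)
- `modality`: the imaging modality (e.g., `fMRI-BOLD`)
- `analysis_level`: the level of the analysis (e.g., `G` for group-level)
- `number_of_subjects`: the sample size of the analysis
- `is_thresholded`: whether the map is thresholded
- `brain_coverage`: the percentage of the brain covered by the map
- `not_mni`: whether the map is not in MNI space
- `cognitive_paradigm_cogatlas_id`: the ID of the associated Cognitive Atlas task

The complete list of Cognitive Atlas tasks and their IDs is available in the file `statmaps_cognitiveatlastask.csv` and on the Cognitive Atlas website (https://www.cognitiveatlas.org/tasks); it is also accessible via the API at https://www.cognitiveatlas.org/api/v-alpha/task.

Here, the image ID is identified by `image_ptr_id`.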

In [None]:
# Get image metadata
imeta_cols = [
    "image_ptr_id",
    "map_type",
    "modality",
    "analysis_level",
    "number_of_subjects",
    "is_thresholded",
    "brain_coverage",
    "not_mni",
    "cognitive_paradigm_cogatlas_id",
]
image_meta = pd.read_csv(
    op.join(nv_data_dir, "statmaps_statisticmap.csv"),
    usecols=imeta_cols,
)
image_meta = image_meta.rename(columns={"image_ptr_id": "id"})

Finally, we combine the previously extracted information into a single pandas DataFrame by merging on the image ID (`id`).

In [None]:
image_merge = pd.merge(image_colection, image, how="left", on="id")
image_full_df = pd.merge(image_merge, image_meta, how="left", on="id")
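
To see what the two left merges do, here is a toy version with three made-up images. The column names follow the tables above, but every value is invented. The point is that an image missing from a right-hand table is kept, with `NaN` in the new columns, rather than dropped.

```python
import pandas as pd

# Toy stand-ins for the three NeuroVault tables (all values are invented)
items = pd.DataFrame(
    {"id": [1, 2, 3], "collection_id": [10, 10, 11], "name": ["a", "b", "c"]}
)
files = pd.DataFrame(
    {
        "basecollectionitem_ptr_id": [1, 2, 3],
        "file": ["images/10/a.nii.gz", "images/10/b.nii.gz", "images/11/c.nii.gz"],
    }
)
meta = pd.DataFrame({"image_ptr_id": [1, 3], "map_type": ["Z", "T"]})

# Align the key name so every table can be merged on "id"
files = files.rename(columns={"basecollectionitem_ptr_id": "id"})
meta = meta.rename(columns={"image_ptr_id": "id"})

# Left merges keep every row of the left table
toy_full = items.merge(files, how="left", on="id").merge(meta, how="left", on="id")
print(toy_full)
```

Image 2 has no row in `meta`, so its `map_type` is missing after the merge; such rows are later dropped by any filter on `map_type`.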

# Select images for IBMA

Using the available metadata from the retrieved tables, we set several inclusion criteria for images to be considered for a meta-analysis. We focused on fMRI-BOLD images, as they are the most prevalent modality in NeuroVault. The methods presented here should work with other image modalities (e.g., PET, diffusion MRI, structural MRI), but only fMRI-BOLD had enough data in NeuroVault for meta-analyses. We then kept only images from group-level analyses of studies with a sample size greater than ten subjects. Next, we selected images classified as T or Z statistics. Although best practices in meta-analysis suggest using meaningful units and incorporating uncertainty through standard errors, T/Z statistic maps are the most commonly shared images in NeuroVault (Maumet and Nichols, 2016). We discuss this further in the following sections. Note that many images in NeuroVault are labeled as “Other” for the image type, even though most of them correspond to known image types (e.g., T/Z statistics). We therefore relabeled those images with their original type if keywords such as “zstat,” “tstat,” “Z_,” or “T_” were present in the image name, file name, or image description. Following that, we retained unthresholded images that cover more than 40% of the brain and are in MNI space. Finally, we narrowed our selection to images associated with a Cognitive Atlas task; in this tutorial, we use the n-back task.
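
The keyword-based relabeling of “Other” maps is not performed by the query in this notebook. A minimal sketch of how it could look is given here, assuming the label is stored as the string `"Other"` and that the name, file name, and description are available as plain strings. The function `infer_map_type` and the exact patterns are hypothetical.

```python
import re

# Hypothetical patterns: "zstat"/"tstat" in any case, or "Z_"/"T_" not preceded
# by a letter (so that "CONTRAST_1" is not mistaken for a "T_" keyword)
Z_PATTERN = re.compile(r"(?i:zstat)|(?<![A-Za-z])Z_")
T_PATTERN = re.compile(r"(?i:tstat)|(?<![A-Za-z])T_")


def infer_map_type(map_type, *texts, other_label="Other"):
    """Return "Z" or "T" for maps labeled `other_label` when a keyword gives the type away."""
    if map_type != other_label:
        return map_type
    joined = " ".join(text for text in texts if text)
    has_z = bool(Z_PATTERN.search(joined))
    has_t = bool(T_PATTERN.search(joined))
    if has_z and not has_t:
        return "Z"
    if has_t and not has_z:
        return "T"
    return map_type  # Ambiguous or no keyword: leave the label untouched


print(infer_map_type("Other", "sub01_zstat1.nii.gz"))
print(infer_map_type("Other", "T_nback_2gt0", None, "group map"))
print(infer_map_type("Other", "CONTRAST_1"))
```

Leaving ambiguous cases untouched is a conservative choice: a map that stays “Other” is simply excluded by the T/Z criterion.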

In [None]:
# Apply the inclusion criteria to the merged table
image_selected_df = image_full_df.query(
    'modality == "fMRI-BOLD"'
    ' & analysis_level == "G"'
    ' & number_of_subjects > 10'
    ' & (map_type == "Z" | map_type == "T")'
    ' & is_thresholded == "f"'
    ' & brain_coverage > 40'
    ' & not_mni == "f"'
    ' & cognitive_paradigm_cogatlas_id == "tsk_4a57abb949bcd"'  # n-back task
)

# Download images for IBMA

Next, we attempt to download the selected images from NeuroVault. To facilitate this, we implemented a function named `download_images`, which takes the image IDs from the previously created pandas DataFrame and the path to a directory where the images will be saved. Note that not all images can be downloaded: some belong to private collections, while others have corrupted files.

In [None]:
# Keep downloaded images only
# Explain file name meaning: col_id, and img_id + name of the files as represented in NeuroVault
image_ids = image_selected_df["id"].unique()
image_usable_df = download_images(image_ids, image_dir)
print(f"Usable images: {len(image_usable_df)}/{len(image_ids)}")
# Focus on input and outputs

In [None]:
image_df = pd.merge(image_selected_df, image_usable_df, on="id")
image_df
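
The internals of `download_images` are not shown in this notebook. As a rough idea of the pattern, the sketch below downloads each file, skips the ones that fail (e.g., private collections) or that do not look like gzip files (e.g., corrupted uploads), and returns only the usable ones. The URL prefix, the record keys, and the function `download_usable` are assumptions for illustration, and a fake fetcher stands in for the network so the example runs offline.

```python
import os
import tempfile
from urllib.request import urlopen

# Assumption: files are served under this prefix; the real helper may differ
MEDIA_URL = "https://neurovault.org/media/{file}"


def fetch_bytes(url, timeout=30):
    """Default fetcher: return the raw bytes behind `url`."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_usable(records, out_dir, fetch=fetch_bytes):
    """Download each record's file and keep only the ones that succeed.

    `records` is a list of dicts with hypothetical keys "id" and "file".
    """
    os.makedirs(out_dir, exist_ok=True)
    usable = []
    for rec in records:
        try:
            content = fetch(MEDIA_URL.format(file=rec["file"]))
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print(f"Skipping image {rec['id']}: {err}")
            continue
        # gzip files start with the bytes 0x1f 0x8b; anything else is treated as corrupted
        if not content.startswith(b"\x1f\x8b"):
            print(f"Skipping image {rec['id']}: not a gzip file")
            continue
        path = os.path.join(out_dir, f"{rec['id']}.nii.gz")
        with open(path, "wb") as f:
            f.write(content)
        usable.append({"id": rec["id"], "path": path})
    return usable


def fake_fetch(url):
    """Offline stand-in for the network, used only for this demonstration."""
    if "private" in url:
        raise OSError("HTTP 403")
    if "broken" in url:
        return b"not a gzip file"
    return b"\x1f\x8b" + b"\x00" * 8


records = [
    {"id": 1, "file": "images/10/ok.nii.gz"},
    {"id": 2, "file": "images/11/private.nii.gz"},
    {"id": 3, "file": "images/12/broken.nii.gz"},
]
usable = download_usable(records, tempfile.mkdtemp(), fetch=fake_fetch)
print(f"Usable images: {len(usable)}/{len(records)}")
```

Passing the fetcher as an argument keeps the filtering logic testable without network access.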

# Create NiMARE Dataset object for IBMA

Then, we create a NiMARE Dataset object from the downloaded images and their metadata.

In [None]:
dset = convert_to_nimare_dataset(image_df)
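
NiMARE can build a Dataset from a nested dictionary of studies, contrasts, images, and metadata. The sketch below arranges a few made-up image records in that layout. It is only an illustration of the structure: `build_dataset_dict`, the record keys, and the ID naming scheme are hypothetical, and `convert_to_nimare_dataset` may do more (or differently) than this.

```python
def build_dataset_dict(rows):
    """Arrange image records in the nested study/contrast layout.

    `rows` is a list of dicts with hypothetical keys: "id", "collection_id",
    "path", "map_type" ("Z" or "T"), and "number_of_subjects".
    """
    dataset_dict = {}
    for row in rows:
        study_id = f"col{row['collection_id']}"  # One study per collection
        contrast_id = f"img{row['id']}"  # One contrast per image
        study = dataset_dict.setdefault(study_id, {"contrasts": {}})
        study["contrasts"][contrast_id] = {
            # Image type keys are lowercase, e.g. "z" or "t"
            "images": {row["map_type"].lower(): row["path"]},
            "metadata": {"sample_sizes": [int(row["number_of_subjects"])]},
        }
    return dataset_dict


rows = [
    {"id": 1, "collection_id": 10, "path": "/tmp/1.nii.gz", "map_type": "Z", "number_of_subjects": 24},
    {"id": 2, "collection_id": 10, "path": "/tmp/2.nii.gz", "map_type": "T", "number_of_subjects": 24},
    {"id": 3, "collection_id": 11, "path": "/tmp/3.nii.gz", "map_type": "Z", "number_of_subjects": 31},
]
toy_dict = build_dataset_dict(rows)
print(toy_dict["col10"]["contrasts"]["img2"])
```

With NiMARE installed and real image paths, a dictionary like this could be passed to `nimare.dataset.Dataset`. The sample sizes matter because they are needed to convert between T and Z maps.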

# Initialize an IBMA workflow with NiMARE



In [None]:
# Use all the defaults for teaching purposes
# List the defaults so that people understand the flexibility of the workflow
workflow = IBMAWorkflow()

# Apply the workflow to the Dataset object

In [None]:
results = workflow.fit(dset)

# Generate HTML report of IBMA results 

In [None]:
run_reports(results, report_dir)

# Exclude images

Even after applying the strict preliminary inclusion criteria above, we still found many wrongly annotated images, especially images representing other modalities and images with extreme values. Therefore, we developed an automatic heuristic selection to remove those spurious images from the meta-analysis. First, we removed all images from collections that lacked a link to a publication. Second, images with a maximum Z value smaller than 1.96 (i.e., the Z score for a p-value of 0.05) were removed, as they potentially consisted of mislabeled correlation maps or inverted p-value maps, or did not contain any statistically significant voxels. We also excluded images with a maximum Z score larger than 50. Although the number 50 is arbitrary, we wanted to detect images with an unusually large signal, for example, mislabeled BOLD or COPE (contrast of parameter estimates) images or images resulting from studies with a huge sample size. Finally, using the image metadata, we analyzed the image and file names and removed those containing keywords such as "ICA," "PCA," "PPI," "seed," "functional connectivity," "cope," "tfce," and "correlation," which represent modalities not of interest for this meta-analysis.
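
The two checks described above can be sketched in a few lines. This is not the code behind `_rm_nonstat_maps` or `_rm_extreme_maps`, whose internals are not shown here. The function names are hypothetical, the map is represented as a plain list of values, and the sketch assumes that both bounds (1.96 and 50) are compared against the largest value in the map.

```python
import re

EXCLUDED_KEYWORDS = (
    "ica", "pca", "ppi", "seed", "functional connectivity",
    "cope", "tfce", "correlation",
)


def has_excluded_keyword(*texts):
    """Check names/descriptions for keywords that signal a non-statistic map."""
    joined = " ".join(text for text in texts if text).lower()
    # Turn separators such as "_" and "." into spaces before matching
    normalized = re.sub(r"[^a-z0-9]+", " ", joined)
    for keyword in EXCLUDED_KEYWORDS:
        # Require non-letters around the keyword so "musical" does not match "ica"
        if re.search(rf"(?<![a-z]){re.escape(keyword)}(?![a-z])", normalized):
            return True
    return False


def is_extreme(values, low=1.96, high=50.0):
    """Flag a map whose largest value is below `low` or above `high`."""
    peak = max(values)
    return peak < low or peak > high


print(has_excluded_keyword("sub_ICA_comp3.nii.gz"))
print(has_excluded_keyword("musical_imagery_zstat1.nii.gz"))
print(is_extreme([0.1, 0.5, 1.2]))
print(is_extreme([-3.0, 0.0, 4.2]))
```

Matching on letter boundaries rather than raw substrings is a design choice of this sketch; a plain substring test would be simpler but would flag unrelated names.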

In [None]:
image_clean_df = _rm_nonstat_maps(image_df)
print(image_clean_df.shape)
image_clean_df = _rm_extreme_maps(image_clean_df)
image_clean_df

In [None]:
# Make explicit that we are transforming Z->T and T->Z
# Explain why we are using T/Z

dset_clean = convert_to_nimare_dataset(image_clean_df)

In [None]:
# Run the meta-analysis on the heuristically selected images and compare
workflow_clean = IBMAWorkflow()
results_clean = workflow_clean.fit(dset_clean)
run_reports(results_clean, report_clean_dir)

It is quite common for NeuroVault users to upload inverted contrasts and duplicates. For example, one might find two images representing the same contrast (such as House > Face) but with the signs reversed (i.e., Face > House). This creates problems for meta-analyses, as these images effectively cancel each other out when aggregated. Additionally, users often upload multiple images of the same contrast that differ only by the covariate used in the group-level analysis. These can be considered duplicates, especially when the covariate does not influence the final estimate. To identify duplicates, we use the correlation matrix of the input samples. Image pairs with a correlation close to 1 are considered duplicates, while those with a correlation close to -1 are labeled as inverted contrasts. From the identified duplicates, we randomly select one image from each pair. For pairs of inverted contrasts, we choose the image with a positive slope relative to the median image.

In [None]:
dset_clean_clean = _rm_duplicates_maps(dset_clean)

In [None]:
dset_clean_clean
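
The correlation-based screening described above can be illustrated on toy data. In the sketch below, each "map" is a short list of made-up values, and pairs are flagged when their Pearson correlation exceeds a cutoff in absolute value. The cutoff of 0.99 and the function names are assumptions; this is not the implementation of `_rm_duplicates_maps`.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_pairs(maps, threshold=0.99):
    """Split highly correlated pairs into duplicates (r ~ 1) and inverted contrasts (r ~ -1)."""
    duplicates, inverted = [], []
    for (name_a, a), (name_b, b) in combinations(maps.items(), 2):
        r = pearson(a, b)
        if r >= threshold:
            duplicates.append((name_a, name_b))
        elif r <= -threshold:
            inverted.append((name_a, name_b))
    return duplicates, inverted


toy_maps = {
    "house_gt_face": [1.0, 2.0, 3.0, 4.0, 5.0],
    "face_gt_house": [-1.0, -2.0, -3.0, -4.0, -5.0],  # Same contrast, sign flipped
    "house_gt_face_cov": [1.0, 2.0, 3.0, 4.0, 5.1],  # Same contrast, extra covariate
    "other_contrast": [2.0, -1.0, 0.0, 3.0, -2.0],
}
duplicates, inverted = flag_pairs(toy_maps)
print("Duplicates:", duplicates)
print("Inverted:", inverted)
```

Real maps have many thousands of voxels, so in practice the correlation matrix would be computed with array operations rather than Python loops.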

In [None]:
from nimare.meta.ibma import Stouffers

workflow_clean_clean = IBMAWorkflow(
    estimator=Stouffers(aggressive_mask=False),
    corrector="fdr",
    diagnostics="jackknife",
    voxel_thresh=3.2,
    cluster_threshold=90,
    output_dir=None,
    n_cores=1,
)
results_clean_clean = workflow_clean_clean.fit(dset_clean_clean)
run_reports(results_clean_clean, report_clean_clean_dir)
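
For intuition about what a Stouffer-type combination does at a single voxel, here is a minimal sketch of the unweighted formula, Z = sum(z_i) / sqrt(k), applied to made-up z-scores. NiMARE's `Stouffers` estimator works on whole maps and has options not shown here, so this is an illustration of the idea rather than a reimplementation.

```python
from math import sqrt
from statistics import NormalDist


def stouffers_z(z_values):
    """Combine one voxel's z-scores across k studies: Z = sum(z_i) / sqrt(k)."""
    return sum(z_values) / sqrt(len(z_values))


voxel_z = [2.0, 1.5, 2.5, 1.0]  # Made-up z-scores from four studies
combined = stouffers_z(voxel_z)
p_one_sided = 1 - NormalDist().cdf(combined)
print(f"combined Z = {combined:.2f}, one-sided p = {p_one_sided:.5f}")

# An inverted contrast cancels its counterpart when both are included
cancelled = stouffers_z([3.0, -3.0])
print(f"combined Z for an inverted pair = {cancelled:.2f}")
```

The second call shows why inverted contrasts are removed before the final analysis: a map and its sign-flipped copy contribute nothing to the combined statistic.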

# Next steps


# References

For additional detail 
