This R repo is a development branch; the actively developed repo is in Python at https://github.com/neurodata/hyppo.
Multiscale Graph Correlation (MGC)
Contents
- Overview
- Repo Contents
- System Requirements
- Installation Guide
- Instructions for Use
- License
- Issues
- Citation
- Reproducibility
Overview
In modern scientific discovery, it is becoming increasingly critical to uncover whether one property of a dataset is related to another. MGC (pronounced "magic"), or Multiscale Graph Correlation, provides a framework for investigating the relationships between properties of a dataset and the underlying geometries of those relationships, all while requiring sample sizes feasible in real data scenarios.
Repo Contents
- R: R package code.
- docs: package documentation.
- man: package manual for help in R session.
- tests: R unit tests written using the testthat package.
- vignettes: R vignettes for R session html help pages.
System Requirements
Hardware Requirements
The MGC package requires only a standard computer with enough RAM to support the operations defined by a user. For minimal performance, this will be a computer with about 2 GB of RAM. For optimal performance, we recommend a computer with the following specs:
RAM: 16+ GB
CPU: 4+ cores, 3.3+ GHz/core
The runtimes below were generated using a computer with the recommended specs (16 GB RAM, 4 cores@3.3 GHz) and an internet connection speed of 25 Mbps.
Software Requirements
OS Requirements
This package is supported for Linux operating systems. The package has been tested on the following systems:
Linux: Ubuntu 20.04, 18.04
Mac OSX:
Windows:
Before setting up the MGC package, users should have R version 3.4.0 or higher, and several packages set up from CRAN.
Installing R version 3.4.2 on Ubuntu 16.04
The latest version of R can be installed by adding the CRAN repository to apt:
echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get update
sudo apt-get install r-base r-base-dev
which should install in about 20 seconds.
Package dependencies
Users should install the following packages prior to installing mgc, from an R terminal:
install.packages(c('ggplot2', 'reshape2', 'Rmisc', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'latex2exp', 'MASS'))
which will install in about 80 seconds on a recommended machine.
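Before installing, it can help to see which prerequisites are already present. The following is a minimal sketch, not part of the mgc package: it checks the running R version against the documented minimum and reports which of the packages listed above are missing. The variable names are illustrative, and the install call is left commented out so that nothing is installed unintentionally.

```r
# Packages listed in this README as prerequisites for mgc.
required_pkgs <- c("ggplot2", "reshape2", "Rmisc", "devtools", "testthat",
                   "knitr", "rmarkdown", "latex2exp", "MASS")

# TRUE when the running R meets the documented minimum version (3.4.0).
r_ok <- getRversion() >= "3.4.0"

# requireNamespace() returns FALSE (rather than erroring) for absent packages.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1),
                       quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (!r_ok) {
  message("R ", as.character(getRversion()),
          " is older than the required 3.4.0")
}

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install only what is missing
} else {
  message("All prerequisite packages are installed.")
}
```

Installing only the missing packages may take less time than the full install.packages() call above.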
Package Versions
The mgc package works with the latest versions of all packages as they appeared on CRAN on October 15, 2017. Users can check the CRAN snapshot for details. The specific software versions are:
ggplot2: 2.2.1
reshape2: 1.4.2
Rmisc: 1.5
devtools: 1.13.3
testthat: 0.2.0
knitr: 1.17
rmarkdown: 1.6
latex2exp: 0.4.0
MASS: 7.3
If you encounter a problem that you believe is tied to software versioning, please open an Issue.
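When reporting a suspected versioning problem, a side-by-side table of documented and installed versions can be useful. The sketch below is one way to build it, assuming only base R; the `documented` vector copies the versions listed above, and the other names are illustrative. Packages that are not installed appear as NA.

```r
# Versions listed in this README (CRAN as of October 15, 2017).
documented <- c(ggplot2 = "2.2.1", reshape2 = "1.4.2", Rmisc = "1.5",
                devtools = "1.13.3", testthat = "0.2.0", knitr = "1.17",
                rmarkdown = "1.6", latex2exp = "0.4.0", MASS = "7.3")

# Look up the locally installed version of each package, or NA if absent.
installed <- vapply(names(documented), function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}, character(1))

# One row per package, suitable for pasting into an Issue.
report <- data.frame(package = names(documented),
                     documented = unname(documented),
                     installed = unname(installed),
                     stringsAsFactors = FALSE)
print(report, row.names = FALSE)
```

The output of sessionInfo() is a useful companion to this table, since it also records the R version and operating system.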
Installation Guide
From an R session, type:
require(devtools)
install_github('neurodata/r-mgc', build_vignettes=TRUE) # install mgc with the vignettes
require(mgc) # source the package now that it is set up
vignette("MGC", package="mgc") # view one of the basic vignettes
The package should take approximately 20 seconds to install with vignettes on a recommended computer.
Instructions for Use
Please see the vignettes for help using the package:
vignette("MGC", package="mgc")
vignette("Discriminability", package="mgc")
vignette("simulations", package="mgc")
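If one of the vignette names above is not found, the installed copy of the package may have been built without vignettes. A small sketch for listing whatever vignettes are actually available locally, using base R's vignette() listing (the `available` variable is illustrative):

```r
# Start with an empty result so the script is safe when mgc is not installed.
available <- character(0)

if (requireNamespace("mgc", quietly = TRUE)) {
  # vignette(package = ...) returns a listing whose $results matrix has one
  # row per vignette; the "Item" column holds the vignette name.
  available <- unname(vignette(package = "mgc")$results[, "Item"])
  print(available)
} else {
  message("mgc is not installed; see the Installation Guide above.")
}
```

An empty result with mgc installed suggests reinstalling with build_vignettes=TRUE, as in the Installation Guide.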
Pseudocode
Pseudocode for the methods employed in the mgc package can be found in the MGC paper on arXiv, in Appendix C (starting on page 30).
Citation
For citing code or the paper, please use the citations found in citation.bib.
Reproducibility
MGC
All the code to reproduce the figures from https://arxiv.org/abs/1609.05148 is available at https://github.com/neurodata/mgc-paper.
Discriminability
Here, we describe how to reproduce the manuscript figures from the discriminability paper. To begin, clone this repository locally:
git clone https://github.com/neurodata/r-mgc.git
We assume that the directory r-mgc placed locally on the system is <package_root>. Note that all figures were stylized using Adobe Photoshop prior to submission.
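The scripts below each start with a setwd() call relative to <package_root>. As a convenience, the following sketch builds those paths from a single variable and checks that they exist before any long-running job is started. The value of `package_root` is an assumption (a clone in the current working directory); the subdirectory names are the ones used in the instructions that follow.

```r
# Assumption: the repository was cloned into the current working directory.
package_root <- "r-mgc"

# Directory holding the discriminability paper scripts.
paper_dir <- file.path(package_root, "docs", "discriminability", "paper")

# Working directories used by the figure-specific instructions.
script_dirs <- file.path(paper_dir,
                         c("simulations", "discr_computation", "dcor_fig"))

# Report which directories are present before starting any long-running job.
found <- dir.exists(script_dirs)
print(data.frame(directory = script_dirs, found = found))
```

With this in place, a call such as setwd(script_dirs[1]) can stand in for the literal path in the corresponding step.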
- Figure 1: Mini Sims Figure. This figure demonstrates the behavior of discriminability, Fingerprinting, ICC/I2C2, and Kernel methods under a range of basic simulation settings in 1 dimension.
- Figure 2: Multisim Figure. This figure demonstrates the behavior of discriminability, ICC, and I2C2 under a variety of simulation benchmark settings. To execute the script with fresh data:
setwd('<package_root>/docs/discriminability/paper/simulations')
source('shared_scripts.R')
Note: the scripts multithread automatically, but the simulation benchmarks still take quite a while to execute (1.5 days on a 96 core machine with 1 TB of RAM).
Using the included bound, one sample, and two sample data, you can reproduce the figure by opening the R notebook simulation plots and executing it.
- Figure 3: 64 pipelines figure. To regenerate the source data for this portion of the manuscript, users can use the following two scripts from an R terminal:
setwd('<package_root>/docs/discriminability/paper/discr_computation')
# edit lines 17 and 18, and lines 210 and 211, and set to your local path where
# preprocessed brains are located
source('./real_data_driver.R') # runs the discriminability calculations
# edit lines 17 and 18, and lines 108 and 109, to the location of the
# preprocessed brains
source('./realdat_perm_testing.R') # runs the two sample testing
Again, the scripts will multithread, but can be expected to take approximately 3 days on a 96 core, 1 TB RAM machine.
To regenerate Figure 3 from the manuscript, users can execute the 64 Pipelines Figure notebook.
- Figure 4: Marginalized Options Comparison. Users can regenerate the figure by using the notebook Multi Modal Opts.
- Figure 5: Effect Size Investigation. Users can reproduce the data collected with:
setwd('<package_root>/docs/discriminability/paper/dcor_fig')
source('./dep_wt_driver.R')
Results can be expected to take 2 days on a 96 core, 1 TB RAM machine.
To reproduce the figure, users can use the Effect Size Investigation notebook.