Implementation of the manuscript "Latent multimodal functional graphical model estimation".

stair-lab/jasa-multimodalfGGM

Latent Multimodal Functional Graphical Model Estimation

This repository implements the method developed in Latent Multimodal Functional Graphical Model Estimation.

This form documents the artifacts associated with the article (i.e., the data and code supporting the computational findings) and describes how to reproduce the findings.

Part 1: Data

  • This paper does not involve analysis of external data (i.e., no data are used or the only data are generated by the authors via simulation in their code).
  • I certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.

Abstract

The dataset consists of simulated data and real concurrent EEG-fMRI data. The simulated data, along with the data-generation code, are available. We provide generation code for four types of simulated graphs under two different noise models, as detailed in Appendix K. We will provide a download link for simulated datasets covering these four graph types with sample size 100 and dimensions 50, 100, and 150. The real data are available upon request: since we do not own the original dataset of concurrent EEG-fMRI measurements, we kindly ask that requests be sent to the original authors referenced in the manuscript.

Availability

  • Data are publicly available.
  • Data cannot be made publicly available.

If the data are publicly available, see the Publicly available data section. Otherwise, see the Non-publicly available data section, below.

Publicly available data

  • Data are available online at: here

  • Data are available as part of the paper’s supplementary material.

  • Data are publicly available by request, following the process described here:

  • Data are or will be made available through some other mechanism, described here:

The data we use are originally from Morillon et al. (2010). We have contacted the data owner, Anne-Lise Giraud, about data sharing, and she has agreed to share the data with individuals upon request. Please contact Anne-Lise Giraud (email: anne-lise.giraud-mamessier@pasteur.fr) to request the data. The simulated data are available at the link above. The partial simulated data, i.e., the data used for the sample-complexity experiments in Section 7.2 of the manuscript, are not available online because of their size (~1 TB), but the data-generation code is provided so practitioners can generate the data on their own.

Reference

Morillon, B., Lehongre, K., Frackowiak, R. S., Ducorps, A., Kleinschmidt, A., Poeppel, D., & Giraud, A. L. (2010). Neurophysiological origin of human brain asymmetry for speech and language. Proceedings of the National Academy of Sciences, 107(43), 18688-18693.

Non-publicly available data

Description

File format(s)

  • CSV or other plain text.
  • Software-specific binary format (.Rda, Python pickle, etc.): pickle
  • Standardized binary format (e.g., netCDF, HDF5, etc.):
  • Other (please specify):

Data dictionary

  • Provided by authors in the following file(s): data/README.md
  • Data file(s) is(are) self-describing (e.g., netCDF files)
  • Available at the following URL:

Additional Information (optional)

Part 2: Code

Abstract

The code contains source files and testing files. We briefly outline the content of each directory. The code/synth_data directory contains files to generate synthetic data and script files to generate batches of synthetic data. The code/src directory contains all the source code. We do not provide the code for the other methods we compare against, as we do not own it. Under the directory code/tests, the notebook directory contains all the step-by-step instructions and visualization code, and the script folder contains the execution scripts. The code/experiments directory contains the data-preprocessing code and the graph-estimation code for the real data.

Description

Code format(s)

  • Script files
    • R
    • Python
    • Matlab
    • Other:
  • Package
    • R
    • Python
    • MATLAB toolbox
    • Other:
  • Reproducible report
    • R Markdown
    • Jupyter notebook
    • Other:
  • Shell script
  • Other (please specify):

Supporting software requirements

Version of primary software used

R version 3.6.0; Python version 3.7.3

Libraries and dependencies used by the code

  • R-packages
    • wordspace_0.2-6
    • fields_12.5
    • viridis_0.6.1
    • viridisLite_0.4.0
    • spam_2.7-0
    • dotCall64_1.0-1
    • plotly_4.10.0
    • ggplot2_3.3.5
    • pracma_2.3.3
    • R.matlab_3.6.2
    • far_0.6-5
    • nlme_3.1-139
    • matrixcalc_1.0-5
    • poweRlaw_0.70.6
    • fgm_1.0
    • mvtnorm_1.1-2
    • fda_5.4.0
    • deSolve_1.30
    • fds_1.8
    • RCurl_1.98-1.5
    • rainbow_3.6
    • pcaPP_1.9-74
    • MASS_7.3-51.3
    • Matrix_1.2-17
    • RSpectra_0.16-0
    • doParallel_1.0.16
    • iterators_1.0.10
    • foreach_1.4.4
  • Python packages
    • numpy_1.19.1
    • scipy_1.5.2
    • pathos_0.2.8
    • matplotlib_3.4.3
    • multiprocessing_0.70.12.2
    • nilearn_0.9.0
    • rpy2_2.9.4

Supporting system/hardware requirements (optional)

Platform: x86_64-conda_cos6-linux-gnu (64-bit), running under CentOS Linux 7 (Core).

All experiments were run on a cluster, and no GPUs are required. Any single experiment run can be executed on a standalone desktop; however, if practitioners want to vary parameters to generate ROC curves, we highly recommend using a cluster.
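The single-node, multi-core pattern the experiments rely on can be sketched with plain shell background jobs. This is a hedged illustration only: the .sh files under code/tests/script are the authoritative drivers, and `run_one` below is a hypothetical stand-in for one independent experiment run.

```shell
#!/bin/sh
# Fan independent experiment runs out to separate cores with background
# jobs, then wait for all of them. `run_one` is a hypothetical stand-in
# for a single run; in the repository each run is its own script invocation.
run_one() { printf 'finished run for p=%s\n' "$1"; }

for p in 50 100 150; do
  run_one "$p" &   # one background job per dimension setting
done
wait               # block until every run has completed
```

On a cluster, the same fan-out is typically expressed as one scheduler job per run instead of shell background jobs.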

Parallelization used

  • No parallel code used
  • Multi-core parallelization on a single machine/node
    • Number of cores used: 8-160
  • Multi-machine/multi-node parallelization
    • Number of nodes and cores used:

License

  • MIT License (default)
  • BSD
  • GPL v3.0
  • Creative Commons
  • Other: (please specify)

Additional information (optional)

Part 3: Reproducibility workflow

Scope

The provided workflow reproduces:

  • Any numbers provided in text in the paper
  • The computational method(s) presented in the paper (i.e., code is provided that implements the method(s))
  • All tables and figures in the paper
  • Selected tables and figures in the paper, as explained and justified below:

Workflow

Location

The workflow is available:

  • As part of the paper’s supplementary material.
  • In this Git repository: The Git repository will be made public if the manuscript is accepted. For now, we include its contents under the directory code/
  • Other (please specify):

Format(s)

  • Single master code file
  • Wrapper (shell) script(s)
  • Self-contained R Markdown file, Jupyter notebook, or other literate programming approach
  • Text file (e.g., a readme-style file) that documents workflow
  • Makefile
  • Other (more detail in Instructions below)

Instructions

Each simulated experiment consists of three steps: (i) generate the simulated data, (ii) run the proposed algorithm (with variable selection), and (iii) visualize the results. The code for the first step is under the directory code/synth_data. The files for the second step are under code/tests; alternatively, the proposed algorithm can be run in batch via the script files in code/tests. The tools to visualize the results are under code/tests/notebook. Each directory also contains a README.md file with more detailed instructions.
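The three steps above can be strung together as a single driver script. The sketch below is a dry run: it only prints each command, because the concrete .sh file name inside the estimation directory varies by experiment (the name used here is hypothetical), and the notebook step is normally run interactively.

```shell
#!/bin/sh
# Dry-run sketch of one simulated experiment; `run` echoes the command
# instead of executing it, so the sequence can be inspected anywhere.
set -eu
run() { printf 'would run: %s\n' "$*"; }

# (i) generate simulated data
run bash ./code/synth_data/run_dgp_N1.sh
# (ii) run the proposed algorithm in batch (hypothetical script name;
#      use the actual *.sh file under code/tests/script for your setting)
run bash ./code/tests/script/noise_model_1/example_N100.sh
# (iii) visualize the results
run jupyter nbconvert --execute ./code/tests/notebook/plot_Comparison.ipynb
```

To execute for real, replace `run` with direct invocation and point step (ii) at the script matching your noise model and sample size.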

Figure 2/Table 3
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_N2.sh
    • Data download link. Store the files under data_batch_N2
  • Estimation
    • Script file: ./code/tests/script/noise_model_2/*N100.sh. Modify the script file to specify the conda environment, file path, and save path.
  • Visualization
    • Result download link. The directory /proposed contains the results of the proposed method. The directory /comparison/ contains the results of the other comparison methods.
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instructions to generate the table: to print the AUC and AUC15, set verbose=True
Figure 3
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_sample.sh
  • Estimation
    • Script file: ./code/tests/script/sample_sample/. Modify the script file to specify the conda environment, file path, and save path.
  • Visualization
    • Result download link. Please download the directories ./p50, ./p100, and ./p150
    • Visualization notebook: /code/notebook/plot_SampleComplexity.ipynb
Figure 5/Table 2
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_N1.sh
    • Data download link. Store the files under data_batch_N1
  • Estimation
    • Script file: ./code/tests/script/noise_model_1/*N100.sh. Modify the script file to specify the conda environment, file path, and save path.
  • Visualization
    • Result download link. The directory /proposed contains the results of the proposed method. The directory /comparison/ contains the results of the other comparison methods.
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instructions to generate the table: to print the AUC and AUC15, set verbose=True
Figure 6/Table 4
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_N2.sh
    • Data download link. Store the files under data_batch_N2
  • Estimation
    • Script file: ./code/tests/script/noise_model_2/. Modify the script file to specify the conda environment, file path, and save path.
  • Visualization
    • Result download link. The directory /proposed contains the results of the proposed method.
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instructions to generate the table: to print the AUC and AUC15, set verbose=True
Figure 7/Table 5
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_N1.sh
    • Data download link. Store the files under data_batch_N1
  • Estimation
    • Script file: ./code/tests/script/noise_model_1/. Modify the script file to specify the conda environment, file path, and save path.
  • Visualization
    • Result download link. The directory /proposed contains the results of the proposed method.
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instructions to generate the table: to print the AUC and AUC15, set verbose=True
Figure 8
  • Data preparation
    • Data generation script: /code/synth_data/run_dgp_k.sh
    • Data download link
  • Estimation
    • Script file: ./code/tests/script/sample_k/.
  • Visualization
    • Result download link
    • Visualization notebook: /code/notebook/plot_SampleComplexity_2.ipynb.
Figure 9
  • Estimation
    • Run /code/tests/notebook/plot_elbo.ipynb and save the result
  • Visualization
    • Visualization notebook: /code/tests/notebook/plot_elbo2.ipynb
Figure 10
  • Estimation
    • Run /code/tests/notebook/plot_elbo.ipynb and save the result
  • Visualization
    • Visualization notebook: /code/tests/notebook/plot_elbo2.ipynb
Figure 11
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_N2.sh
    • Data download link. Store the files under data_batch_N2
  • Estimation
    • Script file: ./code/tests/script/noise_model_2/. Modify the script file to specify the conda environment, file path, and save path.
  • Visualization
    • Result download link
    • Visualization notebook: /code/tests/notebook/plot_VariableSelection.ipynb
Figure 12
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_sample.sh
  • Estimation
    • Script file: ./code/tests/script/sample_alpha/.
  • Visualization
    • Result download link
    • Visualization notebook: /code/notebook/plot_SampleComplexity_2.ipynb
Figure 13/Table 6
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_kmk.sh
    • Data download link
  • Estimation
    • Script file: ./code/tests/script/noise_model1_varykmk/
  • Visualization
    • Result download link
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instructions to generate the table: to print the AUC and AUC15, set verbose=True
Figure 14/Table 7
  • Data preparation
    • Data generation script: ./code/synth_data/run_dgp_kmk.sh
    • Data download link
  • Estimation
    • Script file: ./code/tests/script/noise_model1_varykmk/
  • Visualization
    • Result download link
    • Visualization notebook: /code/notebook/plot_Comparison.ipynb
    • Instructions to generate the table: to print the AUC and AUC15, set verbose=True

Expected run-time

Approximate time needed to reproduce the analyses on a standard desktop machine:

  • < 1 minute
  • 1-10 minutes
  • 10-60 minutes
  • 1-8 hours
  • > 8 hours
  • Not feasible to run on a desktop machine, as described here: It is safest to run on a cluster, as the original tests are implemented with parallelization. One can reduce the number of cores in the test files to make them suitable for a desktop machine.

Additional information (optional)

We provide a demo example that can be run on a standard desktop. Please see /code/README.md for further instructions.

Notes (optional)
