# GTEx model building with MOFA2

Factor analysis of GTEx RNA-seq expression data with MOFA2 (Multi-Omics Factor Analysis)

## Libraries

In [1]:
library(data.table)
library(here)
library(MOFA2)
library(dplyr)

here() starts at /home/msubirana/Documents/pivlab/plier2-analyses


Attaching package: ‘MOFA2’


The following object is masked from ‘package:stats’:

    predict



Attaching package: ‘dplyr’


The following objects are masked from ‘package:data.table’:

    between, first, last


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union




## Input

In [2]:
gtex_data <- readRDS(here('output/gtex/df_gtex_fbm_filt.rds'))
head(gtex_data)

gene,GTEX-1117F-0226-SM-5GZZ7,GTEX-1117F-0426-SM-5EGHI,GTEX-1117F-0526-SM-5EGHJ,GTEX-1117F-0626-SM-5N9CS,GTEX-1117F-0726-SM-5GIEN,GTEX-1117F-1326-SM-5EGHH,GTEX-1117F-2426-SM-5EGGH,GTEX-1117F-2526-SM-5GZY6,GTEX-1117F-2826-SM-5GZXL,GTEX-1117F-2926-SM-5GZYI,⋯,GTEX-ZZPU-1126-SM-5N9CW,GTEX-ZZPU-1226-SM-5N9CK,GTEX-ZZPU-1326-SM-5GZWS,GTEX-ZZPU-1426-SM-5GZZ6,GTEX-ZZPU-1826-SM-5E43L,GTEX-ZZPU-2126-SM-5EGIU,GTEX-ZZPU-2226-SM-5EGIV,GTEX-ZZPU-2426-SM-5E44I,GTEX-ZZPU-2626-SM-5E45Y,GTEX-ZZPU-2726-SM-5NQ8O
(row name),<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
WASH7P,3.2874723,2.2812531,3.0616034,3.5933538,2.1063483,2.6755901,3.6993295,4.1659119,3.4646683,3.7548875,⋯,1.3818371,1.708408,2.674913,1.7268312,1.789103,2.328549,1.5469562,1.9474791,0.9027296,1.663117
RP11-34P13.15,2.0755326,0.3210045,1.2363395,1.5165195,0.9458324,1.760008,1.4302853,2.9412941,1.6031219,2.5395311,⋯,2.1156992,3.018812,5.148934,3.1322478,2.537545,4.292782,3.3077199,2.1210154,0.7822408,2.867106
RP11-34P13.16,3.0021624,0.5819778,0.5661819,1.5134907,1.4745658,2.2097653,1.08882,3.6701605,2.304511,2.7911889,⋯,3.2237314,4.214125,5.580447,4.1667154,3.230818,5.26341,4.5014391,2.7152345,0.9684963,3.764474
RP11-34P13.18,3.5741015,2.4800066,3.4025858,3.4672795,1.8976277,2.4192692,4.1424134,3.7949357,3.0134623,3.6519127,⋯,1.75446,2.301295,2.827616,1.9500951,2.272023,2.955871,1.6780719,2.3152764,1.4076247,2.553361
AP006222.2,0.8335783,0.2752455,0.4659222,0.4257815,0.2309408,0.2118844,0.2096405,0.1629834,0.3923174,0.1650436,⋯,0.6360793,1.226509,1.14991,0.4867659,0.625177,1.531569,0.4006472,0.5525738,0.4739427,1.921817
MTND1P23,3.252779,5.0386996,3.8288346,2.3030501,3.257765,5.165108,2.9783787,2.5192903,4.2517191,3.3279747,⋯,6.4319572,3.848998,3.836934,3.6892992,4.092546,3.941106,3.3367118,3.9030383,4.6707271,3.879706


## Create model

In [3]:
X <- as.matrix(gtex_data)

In [4]:
data_list <- list(RNA = X)
mofa <- create_mofa(data_list)

Creating MOFA object from a list of matrices (features as rows, sample as columns)...
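
The message above states the layout `create_mofa` expects for a list of matrices: features as rows, samples as columns. A minimal sketch of a pre-flight check on that layout follows, using a made-up 3 x 4 toy data frame in place of `gtex_data`. The helper `check_expression_matrix` and the toy values are hypothetical and not part of MOFA2 or of this notebook.

```r
# Toy stand-in for the expression data: 3 genes x 4 samples (made-up values).
toy <- data.frame(
  S1 = c(1.2, 0.4, 3.1),
  S2 = c(1.0, 0.6, 2.9),
  S3 = c(1.5, 0.3, 3.3),
  S4 = c(1.1, 0.5, 3.0),
  row.names = c("geneA", "geneB", "geneC")
)

# Same conversion as in the notebook: data frame -> numeric matrix.
X_toy <- as.matrix(toy)

# Hypothetical helper: basic sanity checks before building a model object.
check_expression_matrix <- function(m) {
  stopifnot(
    is.matrix(m),
    is.numeric(m),
    !is.null(rownames(m)),        # feature (gene) names are present
    !is.null(colnames(m)),        # sample names are present
    !anyDuplicated(rownames(m)),  # no repeated feature names
    !anyDuplicated(colnames(m))   # no repeated sample names
  )
  invisible(dim(m))               # c(n_features, n_samples)
}

dims <- check_expression_matrix(X_toy)
```

Applied to the real matrix, the same call would be `check_expression_matrix(X)`; a data frame with a non-numeric column would fail the `is.numeric` check after `as.matrix`, because the whole matrix would be coerced to character.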




In [5]:
data_opts  <- get_default_data_options(mofa)
# The data are already normalized, so do not rescale views or groups
data_opts$scale_views  <- FALSE
data_opts$scale_groups <- FALSE

In [6]:
model_opts <- get_default_model_options(mofa)
model_opts$num_factors <- 412

In [7]:
train_opts <- get_default_training_options(mofa)
train_opts$convergence_mode <- "fast" # quick run
train_opts$maxiter <- 1000            # default; you can lower to ~500 for speed

In [11]:
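# set.seed() only affects R's random number generator;
# the mofapy2 training run takes its seed from train_opts$seed
set.seed(1)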
MOFAobject <- prepare_mofa(
  object = mofa,
  data_options = data_opts,
  model_options = model_opts,
  training_options = train_opts
)

“Some view(s) have a lot of features, it is recommended to perform a more stringent feature selection before creating the MOFA object....”
Checking data options...

Checking training options...

Checking model options...

“The number of factors is very large, training will be slow...”
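
The first warning above recommends a more stringent feature selection before creating the MOFA object. One common way to act on it is to keep only the most variable features; the notebook itself does not do this, so the following is a sketch under that assumption. The helper `select_top_variable`, the toy matrix, and the value of `n_top` are all hypothetical.

```r
# Toy matrix: 6 features x 5 samples of random values (illustration only).
set.seed(1)
toy <- matrix(
  rnorm(6 * 5),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("S", 1:5))
)

# Hypothetical helper: keep the n_top features with the largest variance
# across samples (features are rows, samples are columns).
select_top_variable <- function(m, n_top) {
  feature_var <- apply(m, 1, var)
  n_keep <- min(n_top, nrow(m))
  keep <- order(feature_var, decreasing = TRUE)[seq_len(n_keep)]
  m[keep, , drop = FALSE]
}

X_small <- select_top_variable(toy, n_top = 3)
```

On the real data the call would take the form `select_top_variable(X, n_top = ...)` before `create_mofa`. The cutoff is a judgment call that trades training time against how many genes the factors can describe.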


In [12]:
MOFAobject

Untrained MOFA model with the following characteristics: 
 Number of views: 1 
 Views names: RNA 
 Number of features (per view): 21613 
 Number of groups: 1 
 Groups names: group1 
 Number of samples (per group): 17382 
 

In [None]:
outfile <- file.path(tempdir(), "model.hdf5")
MOFAobject.trained <- run_mofa(MOFAobject, outfile, use_basilisk=TRUE)

Connecting to the mofapy2 package using basilisk. 
    Set 'use_basilisk' to FALSE if you prefer to manually set the python binary using 'reticulate'.

+ /home/msubirana/.cache/R/basilisk/1.14.1/0/bin/conda create --yes --prefix /home/msubirana/.cache/R/basilisk/1.14.1/MOFA2/1.12.0/mofa_env 'python=3.10.5' --quiet -c conda-forge

+ /home/msubirana/.cache/R/basilisk/1.14.1/0/bin/conda install --yes --prefix /home/msubirana/.cache/R/basilisk/1.14.1/MOFA2/1.12.0/mofa_env 'python=3.10.5' -c conda-forge

+ /home/msubirana/.cache/R/basilisk/1.14.1/0/bin/conda install --yes --prefix /home/msubirana/.cache/R/basilisk/1.14.1/MOFA2/1.12.0/mofa_env -c conda-forge 'python=3.10.5' 'python=3.10.5' 'numpy=1.23.1' 'scipy=1.8.1' 'pandas=1.4.3' 'h5py=3.6.0' 'scikit-learn=1.1.1' 'dtw-python=1.2.2'
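
The training cell writes `model.hdf5` under `tempdir()`, which R normally removes when the session ends, so a model trained this way would not survive a kernel restart. A minimal sketch of building a persistent, descriptively named output path instead is shown below. The helper `make_model_path`, the `mofa2` subdirectory, and the file-name pattern are hypothetical and not taken from this project.

```r
# Hypothetical helper: build an output file path and create its directory.
make_model_path <- function(base_dir, num_factors) {
  out_dir <- file.path(base_dir, "mofa2")
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  file.path(out_dir, sprintf("model_k%d.hdf5", as.integer(num_factors)))
}

# tempdir() is used here only so the sketch runs anywhere; in the notebook
# a project directory (for example one resolved with here()) would replace it.
outfile_demo <- make_model_path(tempdir(), num_factors = 412)
```

The returned path could then be passed as the `outfile` argument of `run_mofa`. Encoding the number of factors in the file name keeps runs with different settings from overwriting each other.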

