# Run GWAS using GMMAT
**Authors:** Rafaella Ormond and Jose Jaime Martinez-Magana  
**Date:** July 25, 2025  

This script was developed to run GMMAT on the LAGC cohorts.  

**INPUT:**
- Genotype data in PLINK pgen/pvar/sample format (.pgen, .pvar, .sample)
- Phenotype and covariate files for the cohort
- Covariates for adjustment in the GWAS
- Genetic relationship matrix (GRM), estimated in the PCAir script: [LINK HERE](https://github.com/ormondr/Smoking_GWAS_LAGC/blob/main/English/01PC/00PCAir.ipynb)

**OUTPUT:** 
- GMMAT GWAS association results (GMMAT files with summary statistics)

### Requirements

### Download and Install R

You can download and install R from the Comprehensive R Archive Network (CRAN) for your operating system:

- [R for Windows](https://cran.r-project.org/bin/windows/base/)
- [R for macOS](https://cran.r-project.org/bin/macosx/)
- [R for Linux](https://cran.r-project.org/bin/linux/)

After installing R, install the required packages as shown below.

### Download GMMAT
GMMAT is an R package for genetic association analysis. To install GMMAT, you need R (version 3.5 or later) and the devtools package. 
[Access the GMMAT GitHub repository here](https://github.com/hanchenphd/GMMAT)<br>
or<br>
Run the following commands in R:

`if (!requireNamespace("devtools", quietly = TRUE))`

`install.packages("devtools")`

`devtools::install_github("hanchenphd/GMMAT")`

### Install other R packages
`install.packages("optparse")`

`install.packages("plyr")`
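Before running the analysis, it can help to confirm that the packages listed above are actually available. The following is a minimal sketch using only base R; the helper name `missing_packages` is hypothetical and is not part of GMMAT or this repository.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (instead of failing) when a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this notebook
required <- c("GMMAT", "optparse", "plyr")
not_installed <- missing_packages(required)

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

If anything is reported as missing, install it with the commands above before continuing.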

This method is recommended mainly for samples with fewer than 300 individuals [(see analysis plan)](https://docs.google.com/document/d/1RzD5kBlj9rfiomda1G3NfxYDXLdmIUO7VX0cSNj70Kk/edit?usp=sharing).

### Analysis Steps:
1) Build null models
2) Run association tests using the GMMAT score test

### 1. Build Null Models

**Description:**  
This script builds the **null model** using **GMMAT**, separately for males and females.  
The model includes **10 principal components (PCs)** and **age** as fixed-effect covariates, and uses the **genetic relationship matrix (GRM)** as the covariance structure of the random effect.  
Adjust the input file paths and variable names to match your dataset before running the script.
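The fixed-effect part of the null model (phenotype regressed on age and the first 10 PCs) can also be built programmatically, which avoids retyping the PC terms for each phenotype. This is a minimal sketch in base R; the helper name `build_null_formula` is hypothetical, and the resulting formula object should be usable as the `fixed` argument of `glmmkin`.

```r
# Hypothetical helper: build `phenotype ~ age + PC1 + ... + PCn`
build_null_formula <- function(phenotype, n_pcs = 10) {
  covariates <- c("age", paste0("PC", seq_len(n_pcs)))
  # reformulate() comes from the stats package, which R attaches by default
  reformulate(covariates, response = phenotype)
}

# Formulas for the two example phenotypes used in this notebook
formula_quantitative <- build_null_formula("AgeSmkInit")
formula_binary <- build_null_formula("SmkCes")

print(formula_quantitative)
print(formula_binary)
```

The column names (`age`, `PC1` to `PC10`, `AgeSmkInit`, `SmkCes`) follow the examples in this notebook and must match the columns of your own phenotype table.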

In [None]:
########### 
# This script needs to be adjusted for your files
###########

# load GMMAT
library(GMMAT)

# read phenotype files
# each phenotype file needs to have the FID, IID and all the phenotypes of this cohort
# male
pheno_male=read.table("/path_to_your_data/cohort_name_pheno_male.txt", header=TRUE)
# female
pheno_female=read.table("/path_to_your_data/cohort_name_pheno_female.txt", header=TRUE)

# read pca and GRM
grm=readRDS("/path_to_your_data/cohort_name.gds_prunned_grm_pca.rds")

# extract the first 10 PCs for the model
pcs=grm$PCair$vectors[,1:10]
pcs=as.data.frame(pcs)
colnames(pcs)=paste0("PC",1:10)

# add sample ID
pcs$SampleID=rownames(pcs)

# make a SampleID column (same as IID)
# male
pheno_male$SampleID=pheno_male$IID
# female
pheno_female$SampleID=pheno_female$IID

# merging 
library(plyr)
# male
pheno_pcs_male <- join_all(list(pheno_male, pcs), by = "SampleID", type = "inner")
#female
pheno_pcs_female <- join_all(list(pheno_female, pcs), by = "SampleID", type = "inner")

# subset GRM to the samples that have phenotype and PC data
# male
keep_male=rownames(grm$grm_sparse) %in% pheno_pcs_male$SampleID
grm_subset_male=grm$grm_sparse[keep_male,keep_male]
# female
keep_female=rownames(grm$grm_sparse) %in% pheno_pcs_female$SampleID
grm_subset_female=grm$grm_sparse[keep_female,keep_female]

# make the phenotype rows follow the same sample order as the GRM
# extract ordering index
# male
reorder_idx_male=match(rownames(grm_subset_male), pheno_pcs_male$SampleID)
# female
reorder_idx_female=match(rownames(grm_subset_female), pheno_pcs_female$SampleID)

# reorder pheno based on index
# male
pheno_pcs_male_or=pheno_pcs_male[reorder_idx_male,]
# female
pheno_pcs_female_or=pheno_pcs_female[reorder_idx_female,]


### Quantitative Phenotypes
# Substitute here the quantitative phenotypes you have for this cohort (this example uses only "AgeSmkInit")
# get phenotypes to run linear models
# for quantitative traits we use a Gaussian distribution for the underlying model
# Please make sure that the "family" argument of the glmmkin function is set to: gaussian(link = "identity")
# this step is run separately for each phenotype

## Age of Smoking Initiation (AgeSmkInit)
# male
model_null_AgeSmkInit_10pcs_male=glmmkin(fixed = AgeSmkInit ~ age + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10,
                                      data=pheno_pcs_male_or,
                                      id="SampleID",
                                      kins=grm_subset_male,
                                      family=gaussian(link = "identity"))

# female
model_null_AgeSmkInit_10pcs_female=glmmkin(fixed = AgeSmkInit ~ age + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10,
                                      data=pheno_pcs_female_or,
                                      id="SampleID",
                                      kins=grm_subset_female,
                                      family=gaussian(link = "identity"))

### Binary Phenotypes
# Substitute here the binary phenotypes you have for this cohort (this example uses only "SmkCes")
# get phenotypes to run logistic models
# adjust coding for binary traits
# this step is run separately for each phenotype
# for binary traits we use a Binomial distribution for the underlying model. Please make sure that the "family" argument of the glmmkin function is set to: binomial(link = "logit")

## Smoking Cessation (SmkCes)
# male
model_null_SmkCes_10pcs_male=glmmkin(fixed = SmkCes ~ age + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10,
                   data=pheno_pcs_male_or,
                   id="SampleID",
                   kins=grm_subset_male,
                   family=binomial(link = "logit"))
# female
model_null_SmkCes_10pcs_female=glmmkin(fixed = SmkCes ~ age + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10,
                   data=pheno_pcs_female_or,
                   id="SampleID",
                   kins=grm_subset_female,
                   family=binomial(link = "logit"))
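Because the null model pairs each phenotype row with a row of the GRM, it is worth checking that both objects end up in the same sample order. The following is a minimal, self-contained sketch of the subset-and-reorder logic used above, run on toy data; all `toy_*` objects are made up for illustration and stand in for `grm$grm_sparse` and the phenotype tables.

```r
# Toy relatedness matrix for four samples (identity matrix, illustration only)
ids <- c("S1", "S2", "S3", "S4")
toy_grm <- diag(4)
dimnames(toy_grm) <- list(ids, ids)

# Toy phenotype table: three samples, in a different order than the GRM
toy_pheno <- data.frame(SampleID = c("S3", "S1", "S4"),
                        AgeSmkInit = c(17, 15, 21),
                        stringsAsFactors = FALSE)

# Keep only the GRM rows and columns of samples present in the phenotype table
keep <- rownames(toy_grm) %in% toy_pheno$SampleID
toy_grm_subset <- toy_grm[keep, keep]

# Reorder the phenotype rows to follow the GRM row order
toy_pheno_or <- toy_pheno[match(rownames(toy_grm_subset), toy_pheno$SampleID), ]

# Sanity check: stop if the two objects are not in the same sample order
stopifnot(identical(rownames(toy_grm_subset), toy_pheno_or$SampleID))
```

The same `stopifnot(identical(...))` check could be applied to `grm_subset_male` and `pheno_pcs_male_or` (and the female equivalents) before calling `glmmkin`, converting `SampleID` with `as.character()` if it is not stored as character.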


In this step, we save the null models that will be used in the second stage of GMMAT for association testing.<br>
The models are saved as RDS objects, which can later be loaded back into R.

In [None]:
# save the linear null models built for Quantitative Traits
# Substitute the path here
# male
AgeSmkInit_null_10pcs_male="/path_to_your_data/model_null_AgeSmkInit_10pcs_male.rds"
saveRDS(model_null_AgeSmkInit_10pcs_male,
        file=AgeSmkInit_null_10pcs_male)
# female
AgeSmkInit_null_10pcs_female="/path_to_your_data/model_null_AgeSmkInit_10pcs_female.rds"
saveRDS(model_null_AgeSmkInit_10pcs_female,
        file=AgeSmkInit_null_10pcs_female)

# save the logistic null models built for Binary Traits
# male
SmkCes_null_10pcs_male="/path_to_your_data/model_null_SmkCes_10pcs_male.rds"
saveRDS(model_null_SmkCes_10pcs_male,
        file=SmkCes_null_10pcs_male)
# female
SmkCes_null_10pcs_female="/path_to_your_data/model_null_SmkCes_10pcs_female.rds"
saveRDS(model_null_SmkCes_10pcs_female,
        file=SmkCes_null_10pcs_female)

# exit R
q()
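Since the score test depends entirely on the saved RDS files, a quick round-trip check can catch a wrong path or a truncated file early. This is a minimal sketch under stated assumptions: `toy_model` is a made-up list standing in for a fitted `glmmkin` object, and a temporary file replaces `/path_to_your_data/`.

```r
# Stand-in for a fitted null model (a real one would come from glmmkin())
toy_model <- list(id_include = c("S1", "S2", "S3"), converged = TRUE)

# Write to a temporary path instead of /path_to_your_data/
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy_model, file = rds_path)

# Confirm that the file exists and that the reloaded object is unchanged
stopifnot(file.exists(rds_path))
reloaded <- readRDS(rds_path)
stopifnot(identical(reloaded, toy_model))
```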

### 2. Score Test with GMMAT

### 2.1 Create an R script to run the GMMAT score test
Rscript: gmmat_wrapper.Rscript<br>
This script can be downloaded from GitHub: [LINK HERE](https://github.com/ormondr/Smoking_GWAS_LAGC/blob/main/English/02GWAS/01GMMAT/gmmat_wrapper.Rscript)

In [None]:
####################################################################################
# script to run GMMAT scoring test in PLINK files
# day: 25 jul 2025
# authors: Rafaella Ormond and Jose Jaime Martinez-Magana
####################################################################################
# This script runs the GMMAT score test using the null models built in step 1
####################################################################################
# loading libraries
library(optparse)
library(GMMAT)
####################################################################################
# set parameters
# this function uses the library optparse to add arguments to the script
# adding arguments to the script
option_list = list(
    make_option(c("--nullmodel_path"), type="character", default=NULL,
                help="complete path to the rds object with the null model built with GMMAT. Example: /data/nul_model/null_model_hgt.rds", metavar="character"),
    make_option(c("--genofile"), type="character", default=NULL,
                help="prefix of the PLINK bed/bim/fam files, without extension. Example: /data/example", metavar="character"),
    make_option(c("--remove_fid"), type="character", default="FALSE",
                help="TRUE to strip the FID from the sample IDs stored in the null model, splitting on underscore. Example: FID_IID becomes IID", metavar="character"),
    make_option(c("--outfile_path"), type="character", default=NULL,
                help="output file name for the tsv with association statistics. Example: /data/analysis/results_ancestry_specific.tsv", metavar="character")
)
# setting parameters
opt_parser = OptionParser(option_list=option_list);
opt = parse_args(opt_parser);
####################################################################################
# reading data
# reading null models
model0=readRDS(opt$nullmodel_path)
# setting path for genofile
geno_file=opt$genofile
# setting path for storing summary stats results
outfile=opt$outfile_path

# optionally strip the FID from the sample IDs (FID_IID -> IID)
# isTRUE(as.logical()) also handles a missing --remove_fid argument
if(isTRUE(as.logical(opt$remove_fid))){
    print("Removing FID from null model IDs based on _")
    fid_id_s=strsplit(as.character(model0$id_include), "_")
    # keep the second element (the IID) of each split ID
    model0$id_include=vapply(fid_id_s, function(x) x[2], character(1))
}

# running scoring test with GMMAT
glmm.score(model0,
           infile=geno_file,
           outfile=outfile)
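One caveat of splitting on `_` and keeping the second element is that an IID which itself contains an underscore gets truncated. The sketch below contrasts that approach with a regular expression that removes only the text up to the first underscore; the helper name `strip_fid` and the example IDs are hypothetical.

```r
# Hypothetical helper: drop everything up to and including the first underscore
strip_fid <- function(ids) {
  sub("^[^_]*_", "", as.character(ids))
}

ids <- c("FAM1_IND1", "FAM2_IND2", "FAM3_IND_3")

# Approach used in the wrapper: second element after splitting on "_"
split_second <- vapply(strsplit(ids, "_"), function(x) x[2], character(1))
print(split_second)
# "IND1" "IND2" "IND"

# Alternative: keeps the full IID even when it contains an underscore
print(strip_fid(ids))
# "IND1" "IND2" "IND_3"
```

Neither approach can recover the IID when the FID itself contains an underscore, so it is worth checking how the IDs in your cohort were built before setting `--remove_fid=TRUE`.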

### Substitute the paths and files here
### Substitute the phenotypes available in your cohort

In [None]:
### Substitute here the path and files
### Substitute the phenotypes you have in the cohort

## Age of Smoking Initiation (AgeSmkInit)
## male
gmmat_wrapper="gmmat_wrapper.Rscript"
AgeSmkInit_male_models="/path_to_your_data/model_null_AgeSmkInit_10pcs_male.rds"
genofile="/path_to_your_data/cohort_name.allchr-merge"
outfile="/path_to_your_data/cohort_path/AgeSmkInit_male_score.txt"
# run the score test
Rscript ${gmmat_wrapper} --nullmodel_path=${AgeSmkInit_male_models} \
--genofile=${genofile} \
--remove_fid=FALSE \
--outfile_path=${outfile}

## female
gmmat_wrapper="gmmat_wrapper.Rscript"
AgeSmkInit_female_models="/path_to_your_data/model_null_AgeSmkInit_10pcs_female.rds"
genofile="/path_to_your_data/cohort_name.allchr-merge"
outfile="/path_to_your_data/AgeSmkInit_female_score.txt"
# run the score test
Rscript ${gmmat_wrapper} --nullmodel_path=${AgeSmkInit_female_models} \
--genofile=${genofile} \
--remove_fid=FALSE \
--outfile_path=${outfile}

## Smoking Cessation (SmkCes)
## male
gmmat_wrapper="gmmat_wrapper.Rscript"
SmkCes_male_models="/path_to_your_data/model_null_SmkCes_10pcs_male.rds"
genofile="/path_to_your_data/cohort_name.allchr-merge"
outfile="/path_to_your_data/SmkCes_male_score.txt"
# run the score test
Rscript ${gmmat_wrapper} --nullmodel_path=${SmkCes_male_models} \
--genofile=${genofile} \
--remove_fid=FALSE \
--outfile_path=${outfile}

## female
gmmat_wrapper="gmmat_wrapper.Rscript"
SmkCes_female_models="/path_to_your_data/model_null_SmkCes_10pcs_female.rds"
genofile="/path_to_your_data/cohort_name.allchr-merge"
outfile="/path_to_your_data/SmkCes_female_score.txt"
# run the score test
Rscript ${gmmat_wrapper} --nullmodel_path=${SmkCes_female_models} \
--genofile=${genofile} \
--remove_fid=FALSE \
--outfile_path=${outfile}
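The four calls above differ only in phenotype and sex, so the command lines can also be generated rather than typed by hand. This is a minimal sketch in base R that only builds and prints the commands; the directory and file names are the same placeholders used in this notebook, and the flags are the ones defined in `gmmat_wrapper.Rscript`.

```r
# Phenotypes and sexes used in this notebook
phenotypes <- c("AgeSmkInit", "SmkCes")
sexes <- c("male", "female")

# Placeholders, as in the cells above
data_dir <- "/path_to_your_data"
genofile <- file.path(data_dir, "cohort_name.allchr-merge")

# One row per phenotype/sex combination
grid <- expand.grid(pheno = phenotypes, sex = sexes, stringsAsFactors = FALSE)

# Build one Rscript command per row
commands <- sprintf(
  "Rscript gmmat_wrapper.Rscript --nullmodel_path=%s --genofile=%s --remove_fid=FALSE --outfile_path=%s",
  file.path(data_dir, sprintf("model_null_%s_10pcs_%s.rds", grid$pheno, grid$sex)),
  genofile,
  file.path(data_dir, sprintf("%s_%s_score.txt", grid$pheno, grid$sex))
)

# Print the commands for review before running them in a shell
cat(commands, sep = "\n")
```

Printing first makes it easy to review the paths; the commands can then be pasted into a shell or passed one by one to `system()`.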
