# Workshop 5. GWAS: association mapping for a complex trait of interest
Genetics of Complex Traits 2024-01 - Universidad Nacional de Colombia<br>
Instructor: Johana Carolina Soto Sedano - jcsotos@unal.edu.co
 
__Hans D. Escobar H.__ ([hdescobarh](https://github.com/hdescobarh))

In [8]:
# Configuration
options(digits = 3)

# Validations

results_directory <- "../Results/ncomponents_3"
if (!dir.exists(results_directory)) {
  stop("Results directory does not exist.", call. = FALSE)
}

phenotypes <- c("Cyanidin", "Delphinidin")
missing <- character()
results_paths <- character()
for (p in phenotypes) {
  current_file <- sprintf(
    "%s/GAPIT.Association.GWAS_Results.MLM.%s.csv",
    results_directory, p
  )
  if (file.exists(current_file)) {
    results_paths <- append(results_paths, current_file)
  } else {
    missing <- append(missing, p)
  }
}

if (length(missing) > 0) {
  stop(
    paste(
      "Did not find the following phenotypes:",
      paste(missing, collapse = ", ")
    ),
    call. = FALSE
  )
} else {
  names(results_paths) <- phenotypes
}


## Significantly associated SNPs


In [9]:
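obtener_significativos <- function(
    results_file, p_value_threshold = 0.05, fdr_threshold = 0.05) {
  # The first CSV column (SNP identifier) becomes the row names
  df <- read.csv(
    results_file,
    row.names = 1
  )
  # Keep SNPs that pass both the raw p-value and the FDR-adjusted thresholds
  df <- df[
    df$P.value < p_value_threshold & df$H.B.P.Value < fdr_threshold,
  ]
  # Sort by ascending p-value (the result must be assigned to take effect)
  df <- df[order(df$P.value), ]
  # Drop columns 4, 5 and 7 by position
  df <- subset(df, select = -c(4:5, 7))
  df
}
significativos <- lapply(results_paths, obtener_significativos)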
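The filter above requires both a raw p-value and an `H.B.P.Value` below their thresholds. Assuming `H.B.P.Value` is a Benjamini-Hochberg FDR-adjusted p-value, the following minimal sketch reproduces the same logic on made-up numbers with base R's `p.adjust`. The SNP names and p-values are hypothetical and are not taken from the GAPIT results.

```r
# Toy raw p-values (hypothetical, for illustration only)
p_raw <- c(snp_a = 1e-6, snp_b = 1e-6, snp_c = 0.004, snp_d = 0.03, snp_e = 0.6)

# Benjamini-Hochberg adjustment from base R (stats package)
p_bh <- p.adjust(p_raw, method = "BH")

toy <- data.frame(P.value = p_raw, H.B.P.Value = p_bh)

# Same two-condition filter as obtener_significativos(), then sort by raw p-value
hits <- toy[toy$P.value < 0.05 & toy$H.B.P.Value < 0.05, ]
hits <- hits[order(hits$P.value), ]
print(hits)
```

In this sketch, SNPs with tied raw p-values receive the same adjusted value, which is a general property of the Benjamini-Hochberg procedure.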


In [10]:
# Find the tables with phenotypic variance explained (PVE) information
pve_files <- list.files(results_directory, pattern = "PVE\\.MLM")
pve_files
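Note that `list.files()` interprets its `pattern` argument as a regular expression, not as a shell-style wildcard. The sketch below, using hypothetical file names, shows how `glob2rx()` from the `utils` package translates a wildcard into an equivalent regular expression, and how an explicit regular expression expresses the same intent.

```r
# Hypothetical file names, only to illustrate pattern matching
files <- c(
  "GAPIT.Association.PVE.MLM.Cyanidin.csv",
  "GAPIT.Association.GWAS_Results.MLM.Cyanidin.csv",
  "GAPIT.Association.PVExMLM.csv"
)

# glob2rx() (utils) translates a shell-style wildcard into a regular expression
rx <- glob2rx("*PVE.MLM*")

# In the translated pattern the dot is escaped, so it only matches a literal "."
matched <- files[grepl(rx, files)]
print(matched)

# An explicit regular expression with the same intent
matched_explicit <- files[grepl("PVE\\.MLM", files)]
print(matched_explicit)
```

The same regular expression can be passed directly as the `pattern` argument of `list.files()`.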


In [11]:
# Only Cyanidin has a PVE table; extract it and append it
significativos$Cyanidin <- merge(
  significativos$Cyanidin,
  subset(read.csv(
    paste(results_directory, "/", pve_files[[1]], sep = ""),
    row.names = 1
  ), select = c("Phenotype_Variance_Explained...")),
  by = "row.names"
)
rownames(significativos$Cyanidin) <- significativos$Cyanidin$Row.names
significativos$Cyanidin <- subset(significativos$Cyanidin, select = -c(1))
colnames(significativos$Cyanidin)[
  which(
    names(significativos$Cyanidin) == "Phenotype_Variance_Explained..."
  )
] <- "PVE"
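The join above relies on `merge(..., by = "row.names")`, which returns the key in a `Row.names` column and, by default, sorts the result on that key. A minimal sketch with hypothetical SNP identifiers and values shows the behavior and how to restore an order by p-value afterwards, if that order matters.

```r
# Toy tables keyed by row names (hypothetical SNP ids and values)
assoc <- data.frame(P.value = c(1e-7, 3e-6), row.names = c("snp_2", "snp_1"))
pve <- data.frame(PVE = c(10.5, 4.2, 1.0), row.names = c("snp_1", "snp_2", "snp_3"))

# by = "row.names" joins on row names and returns them in a "Row.names" column
merged <- merge(assoc, pve, by = "row.names")

# Restore the row names and drop the helper column
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL

# merge() sorts on the key by default, so re-sort by p-value if that order matters
merged <- merged[order(merged$P.value), ]
print(merged)
```

Only SNPs present in both tables survive, because `merge()` performs an inner join unless `all`, `all.x`, or `all.y` is set.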


### Cyanidin

In [12]:
significativos$Cyanidin


| SNP | Chr | Pos | P.value | H.B.P.Value | PVE |
|---|---|---|---|---|---|
| | `<chr>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` |
| SNP54914 | ST4.03ch08 | 20571240 | 4.59e-08 | 0.00384 | 14.84 |
| SNP83234 | ST4.03ch12 | 59154862 | 3.12e-07 | 0.00809 | 68.94 |
| SNP9317 | ST4.03ch01 | 76112400 | 3.86e-07 | 0.00809 | 3.12 |
| SNP9318 | ST4.03ch01 | 76112450 | 3.86e-07 | 0.00809 | 2.32 |


![GWAS_Manhattan_Cyanidin](../Results/ncomponents_3/GAPIT.Association.Manhattan_Geno.MLM.Cyanidin_1.png)

### Delphinidin

In [14]:
significativos$Delphinidin


| SNP | Chr | Pos | P.value | H.B.P.Value |
|---|---|---|---|---|
| | `<chr>` | `<int>` | `<dbl>` | `<dbl>` |
| SNP12429 | ST4.03ch02 | 7161936 | 2.91e-06 | 0.0489 |
| SNP12430 | ST4.03ch02 | 7161938 | 2.91e-06 | 0.0489 |
| SNP12431 | ST4.03ch02 | 7161942 | 2.91e-06 | 0.0489 |
| SNP12432 | ST4.03ch02 | 7161962 | 2.91e-06 | 0.0489 |
| SNP12433 | ST4.03ch02 | 7162010 | 2.91e-06 | 0.0489 |


![GWAS_Manhattan_Delphinidin](../Results/ncomponents_3/GAPIT.Association.Manhattan_Geno.MLM.Delphinidin_1.png)

It strikes me that GAPIT treated the Delphinidin results as non-significant. However, the _double_ type carries roughly 15 to 17 significant decimal digits, so the comparison 0.0488524 < 0.05 is not a rounding artifact; strictly speaking, these SNPs are positives.
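The precision argument can be checked directly in R. The sketch below compares the gap between the adjusted p-value quoted above and the 0.05 threshold against `.Machine$double.eps`, the relative spacing of doubles near 1. The variable names are illustrative only.

```r
# Adjusted p-value quoted in the text and the FDR threshold
fdr_value <- 0.0488524
fdr_threshold <- 0.05

# Relative spacing of doubles near 1 (about 2.2e-16)
eps <- .Machine$double.eps

# The gap to the threshold is many orders of magnitude larger than eps
gap <- fdr_threshold - fdr_value
is_significant <- fdr_value < fdr_threshold

cat("gap:", gap, "eps:", eps, "significant:", is_significant, "\n")
```

A gap of about 1e-3 against a spacing of about 1e-16 leaves no room for floating-point error to flip the comparison.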

## Model fit

![QQ_plot_Cyanidin](../Results/ncomponents_3/GAPIT.Association.QQ.MLM.Cyanidin_1.png)

![QQ_plot_Delphinidin](../Results/ncomponents_3/GAPIT.Association.QQ.MLM.Delphinidin_1.png)
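A common numerical companion to a QQ plot is the genomic inflation factor, the ratio of the observed median association statistic to its expected median under the null. This is not part of the GAPIT output shown here. The sketch below is only an illustration on simulated p-values, which stand in for a `P.value` column; values near 1 suggest little inflation.

```r
set.seed(1)
# Simulated p-values under the null (hypothetical stand-in for the P.value column)
p_values <- runif(10000)

# Convert p-values to 1-df chi-square statistics
chisq_stats <- qchisq(p_values, df = 1, lower.tail = FALSE)

# Genomic inflation factor: observed median over the expected median under the null
lambda_gc <- median(chisq_stats) / qchisq(0.5, df = 1)
print(lambda_gc)
```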

## Population structure

![Kinship](../Results/ncomponents_3/GAPIT.Genotype.Kin_Zhang_1.png)

## References

Wang, J., & Zhang, Z. (2021). GAPIT Version 3: Boosting Power and Accuracy for Genomic Association and Prediction. Genomics, Proteomics & Bioinformatics, 19(4), 629–640. https://doi.org/10.1016/j.gpb.2021.08.005
