# Extract literature space for a GWAS trait

This script shows how to query EpiGraphDB for the literature space of a GWAS trait, i.e. the set of literature-mined triples linked to that trait.

In [1]:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(epigraphdb))

In [3]:
# helper that runs a Cypher query against EpiGraphDB and returns the result as a table
query_epigraphdb_as_table <- function(query){
  # return the result directly rather than assigning it to an unused local variable
  query_epigraphdb(
    route = "/cypher",
    params = list(query = query),
    method = "POST",
    mode = "table")
}

In [14]:
# set GWAS id
gwas_id = 'prot-a-710' # cardiotrophin-1

In [15]:
# query for extracting a literature space of triples
query =  paste0("
    MATCH (gwas:Gwas)-[gs1:GWAS_TO_LITERATURE_TRIPLE]->(s1:LiteratureTriple) -[:SEMMEDDB_OBJ]->(st:LiteratureTerm)
    WHERE gwas.id = '", gwas_id, "'
    AND gs1.pval < 0.01
    MATCH (s1)-[:SEMMEDDB_SUB]-(st1:LiteratureTerm) 
    MATCH (gwas)-[:GWAS_TO_LITERATURE]-(lit:Literature)-[]-(s1)
    RETURN lit.id, lit.year,  gwas {.id, .trait}, 
    gs1 {.pval, .localCount}, st1 {.name, .type}, s1 {.id, .subject_id, .object_id, .predicate}, st {.name, .type}
    ")
litspace <- query_epigraphdb_as_table(query)

In [16]:
head(litspace)

lit.id,lit.year,gwas.trait,gwas.id,gs1.localCount,gs1.pval,st1.name,st1.type,s1.subject_id,s1.predicate,s1.id,s1.object_id,st.name,st.type
<chr>,<int>,<chr>,<chr>,<int>,<dbl>,<chr>,<list>,<chr>,<chr>,<chr>,<chr>,<chr>,<list>
11749038,2001,Cardiotrophin-1,prot-a-710,2,7.660981e-07,cytokine,"aapp, gngm",C0079189,STIMULATES,C0079189:STIMULATES:C0044602,C0044602,1-Phosphatidylinositol 3-Kinase,"aapp, gngm, enzy"
22207116,2012,Cardiotrophin-1,prot-a-710,2,8.362033e-08,cardiotrophin 1,"aapp, gngm",C0294361,ASSOCIATED_WITH,C0294361:ASSOCIATED_WITH:C1135196,C1135196,Diastolic heart failure,dsyn
15361284,2004,Cardiotrophin-1,prot-a-710,2,8.362033e-08,STAT3,"aapp, gngm",6774,STIMULATES,6774:STIMULATES:C0294361,C0294361,cardiotrophin 1,"aapp, gngm"
15361284,2004,Cardiotrophin-1,prot-a-710,2,8.362033e-08,STAT3 gene,"aapp, gngm",C1367307,STIMULATES,C1367307:STIMULATES:C0294361,C0294361,cardiotrophin 1,"aapp, gngm"
16269246,2005,Cardiotrophin-1,prot-a-710,2,0.0005867606,nesiritide,"aapp, gngm, phsu, horm",C0054015,ASSOCIATED_WITH,C0054015:ASSOCIATED_WITH:C0018801,C0018801,Heart failure,dsyn
12948841,2003,Cardiotrophin-1,prot-a-710,2,0.0005867606,nesiritide,"aapp, gngm, phsu, horm",C0054015,ASSOCIATED_WITH,C0054015:ASSOCIATED_WITH:C0018801,C0018801,Heart failure,dsyn
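
In the output above, the same triple can appear in more than one row when several papers support it (the nesiritide/Heart failure triple is listed for two papers). A minimal sketch of counting supporting papers per triple follows. It rebuilds three columns of the six rows shown above as a small data frame so that it runs without a database connection; `litspace_head` and `triple_support` are names introduced here for illustration only.

```r
library(dplyr)

# Three columns of the six rows printed by head(litspace) above
litspace_head <- data.frame(
  lit.id = c("11749038", "22207116", "15361284", "15361284", "16269246", "12948841"),
  s1.id = c(
    "C0079189:STIMULATES:C0044602",
    "C0294361:ASSOCIATED_WITH:C1135196",
    "6774:STIMULATES:C0294361",
    "C1367307:STIMULATES:C0294361",
    "C0054015:ASSOCIATED_WITH:C0018801",
    "C0054015:ASSOCIATED_WITH:C0018801"
  ),
  s1.predicate = c(
    "STIMULATES", "ASSOCIATED_WITH", "STIMULATES",
    "STIMULATES", "ASSOCIATED_WITH", "ASSOCIATED_WITH"
  ),
  stringsAsFactors = FALSE
)

# Number of distinct papers supporting each triple, most supported first
triple_support <- litspace_head %>%
  distinct(lit.id, s1.id, s1.predicate) %>%
  count(s1.id, s1.predicate, name = "n_papers", sort = TRUE)

triple_support
```

The same pipeline should apply to the full `litspace` table, assuming its columns are named as in the output above.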


NB: if the query takes a long time to run, the literature space for the selected trait is probably very large, and the query may not complete. Splitting the query into two steps can reduce the computational burden. See the function [`extract_literature_space`](https://github.com/mvab/epigraphdb-breast-cancer/blob/main/R/02_literature_related/scripts/app3_sankey_app/functions_literature.R#L6) in the literature data code of the main dev repo.
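
One possible way to split the query is sketched below. It is derived from the single query in this notebook and is not necessarily how `extract_literature_space` does it. The helper names `build_triples_query` and `build_literature_query`, and the `pval_threshold` argument, are hypothetical. Step 1 fetches the triples and their subject/object terms; step 2 fetches only the papers linked to those triples. Whether this is faster in practice depends on the server and has not been benchmarked here.

```r
# Step 1 (hypothetical helper): triples and their terms for a GWAS trait
build_triples_query <- function(gwas_id, pval_threshold = 0.01) {
  paste0("
    MATCH (gwas:Gwas)-[gs1:GWAS_TO_LITERATURE_TRIPLE]->(s1:LiteratureTriple)-[:SEMMEDDB_OBJ]->(st:LiteratureTerm)
    WHERE gwas.id = '", gwas_id, "'
    AND gs1.pval < ", pval_threshold, "
    MATCH (s1)-[:SEMMEDDB_SUB]-(st1:LiteratureTerm)
    RETURN gwas {.id, .trait}, gs1 {.pval, .localCount}, st1 {.name, .type},
    s1 {.id, .subject_id, .object_id, .predicate}, st {.name, .type}
    ")
}

# Step 2 (hypothetical helper): papers that link the GWAS trait to those triples
build_literature_query <- function(gwas_id, pval_threshold = 0.01) {
  paste0("
    MATCH (gwas:Gwas)-[gs1:GWAS_TO_LITERATURE_TRIPLE]->(s1:LiteratureTriple)
    WHERE gwas.id = '", gwas_id, "'
    AND gs1.pval < ", pval_threshold, "
    MATCH (gwas)-[:GWAS_TO_LITERATURE]-(lit:Literature)-[]-(s1)
    RETURN lit.id, lit.year, s1 {.id}
    ")
}

# Usage (requires network access and the helper defined earlier in this notebook).
# Assumes the key column comes back as `s1.id` in both tables:
# triples  <- query_epigraphdb_as_table(build_triples_query(gwas_id))
# papers   <- query_epigraphdb_as_table(build_literature_query(gwas_id))
# litspace <- dplyr::inner_join(papers, triples, by = "s1.id")
```

Each step returns a narrower table than the combined query, and the join on the triple id is then done locally.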

In [11]:
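# flatten the list-typed 'type' columns into "/"-separated strings
# (list columns cannot be written to a flat file such as CSV)
litspace <- litspace %>% rowwise() %>% 
      mutate(st1.type = paste0(unlist(st1.type), collapse="/")) %>%  
      mutate(st.type = paste0(unlist(st.type), collapse="/")) %>%
      ungroup() # drop the rowwise grouping so later operations act on whole columns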
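
The code comment above notes that the `type` columns must be flattened before the data frame can be saved. The sketch below illustrates this on a two-row toy table and writes the result to a temporary CSV file. The term names and types are copied from the output shown earlier; `toy`, `collapse_types`, and `out_file` are names introduced here for illustration. `vapply` is used in place of `rowwise()` and should give the same strings.

```r
library(dplyr)

# Toy stand-in for `litspace` with list-typed 'type' columns
toy <- tibble(
  st1.name = c("cytokine", "cardiotrophin 1"),
  st1.type = list(c("aapp", "gngm"), c("aapp", "gngm")),
  st.name  = c("1-Phosphatidylinositol 3-Kinase", "Diastolic heart failure"),
  st.type  = list(c("aapp", "gngm", "enzy"), "dsyn")
)

# Collapse each list element into a single "/"-separated string
collapse_types <- function(x) {
  vapply(x, function(v) paste0(unlist(v), collapse = "/"), character(1))
}

toy_flat <- toy %>%
  mutate(st1.type = collapse_types(st1.type),
         st.type  = collapse_types(st.type))

# With plain character columns the table can be written to file
out_file <- tempfile(fileext = ".csv")
write.csv(toy_flat, out_file, row.names = FALSE)

# Read it back to confirm the round trip
toy_back <- read.csv(out_file, stringsAsFactors = FALSE)
```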

The extracted literature space may be difficult to work with because of redundancies in term and type names. Further tidying options are available in the main dev repo, e.g. starting with the function [`tidy_lit_space`](https://github.com/mvab/epigraphdb-breast-cancer/blob/main/R/02_literature_related/scripts/app3_sankey_app/functions_literature.R#L86).

_The code for tidying literature data and for overlapping literature spaces will be made available in a separate (more structured) tutorial / R package at a later date._