## How to query EpiGraphDB to find all traits with an effect on an outcome of interest

Basic examples of querying EpiGraphDB with the `epigraphdb-r` R package are provided in the ["Getting started"](https://mrcieu.github.io/epigraphdb-r/articles/getting-started-with-epigraphdb-r.html#explore-mendelian-randomization-studies) guide.

To collect all traits connected to an outcome of interest via MR-EvE, we need to run a more complex Cypher query, which is sent to EpiGraphDB through `epigraphdb-r`. Some examples of this are provided in the guide's [_Advanced examples_](https://mrcieu.github.io/epigraphdb-r/articles/getting-started-with-epigraphdb-r.html#advanced-examples).

In [1]:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(epigraphdb))

In [4]:
# helper that sends a Cypher query to EpiGraphDB and returns the result as a table
query_epigraphdb_as_table <- function(query){
  # return the result directly so it is also visible when the call is not assigned
  query_epigraphdb(
    route = "/cypher",
    params = list(query = query),
    method = "POST",
    mode = "table")
}

### Basic query to extract all exposures for one outcome (`ieu-a-1126`):

In [18]:
# query all MR results for the outcome, not restricting by p-value
query <- "
    MATCH (exposure:Gwas)-[mr:MR_EVE_MR]->(outcome:Gwas)
    WHERE outcome.id = 'ieu-a-1126'
    RETURN exposure.id, exposure.trait, outcome.id,
           mr.pval, mr.b, mr.se, mr.nsnp, mr.method, mr.moescore
    "

results <- query_epigraphdb_as_table(query)

The `MATCH` clause finds all pairs of `Gwas` nodes connected by an `MR_EVE_MR` relationship; each pair consists of an exposure GWAS and an outcome GWAS. The `WHERE` clause keeps only the pairs whose outcome is `ieu-a-1126`, and `RETURN` selects the columns to include in the result.
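
To reuse this query for other outcomes, the string can be generated by a small helper. The following is only a sketch: `build_outcome_query` is a hypothetical name (not part of `epigraphdb`), and it assumes that outcome IDs contain no quote characters.

```r
# Hypothetical helper (not part of the epigraphdb package): builds the
# single-outcome Cypher query for a given GWAS id.
build_outcome_query <- function(outcome_id) {
  # Assumption: ids look like 'ieu-a-1126'; reject quotes so the
  # generated Cypher string stays valid
  stopifnot(is.character(outcome_id), length(outcome_id) == 1,
            !grepl("['\"]", outcome_id))
  paste0("
    MATCH (exposure:Gwas)-[mr:MR_EVE_MR]->(outcome:Gwas)
    WHERE outcome.id = '", outcome_id, "'
    RETURN exposure.id, exposure.trait, outcome.id,
           mr.pval, mr.b, mr.se, mr.nsnp, mr.method, mr.moescore
    ")
}

query_1127 <- build_outcome_query("ieu-a-1127")
cat(query_1127)
# results_1127 <- query_epigraphdb_as_table(query_1127)  # needs network access
```

Building the string in one place makes it easier to run the same query for several outcomes without copying the Cypher text.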

In [11]:
head(results)

| exposure.id | exposure.trait | outcome.id | mr.pval | mr.b | mr.se | mr.nsnp | mr.method | mr.moescore |
|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<chr>` | `<dbl>` |
| ukb-d-XIII_MUSCULOSKELET | Diseases of the musculoskeletal system and connective tissue | ieu-a-1126 | 0.3858781 | 0.30825968 | 0.3554997 | 5 | FE IVW | 1.0 |
| ukb-d-XI_DIGESTIVE | Diseases of the digestive system | ieu-a-1126 | 0.1808853 | 0.62550974 | 0.4674841 | 3 | FE IVW | 1.0 |
| ukb-d-PULM_MEDICATIO_COMORB | Medication related adverse effects (Asthma/COPD) | ieu-a-1126 | 0.5327973 | -0.41299666 | 0.6621278 | 11 | Weighted median | 0.79 |
| ukb-d-RHEUMA_NOS | Other/unspecified rheumatoid arthritis | ieu-a-1126 | 0.9915101 | 0.01963965 | 1.8457168 | 5 | FE IVW | 1.0 |
| ukb-d-ULCERNAS | Ulcerative colitis, NAS | ieu-a-1126 | 0.7811931 | 0.7316548 | 2.6340832 | 5 | FE IVW | 1.0 |
| ukb-d-M13_SOFTTISSUEOTH | Other soft tissue disorders, not elsewhere classified | ieu-a-1126 | 0.5 | -2.67781029 | -2.6778103 | 1 | Wald ratio | 1.0 |
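
Once the table is in R, it can be sorted or filtered locally. The sketch below uses base R on a small stand-in data frame (`results_demo`, a hypothetical name) built from three rows copied from the output above; the 0.5 cut-off is arbitrary and chosen only so that the toy example keeps some rows.

```r
# Stand-in for `results`: three rows copied from the head(results) output
results_demo <- data.frame(
  exposure.id = c("ukb-d-XIII_MUSCULOSKELET",
                  "ukb-d-XI_DIGESTIVE",
                  "ukb-d-PULM_MEDICATIO_COMORB"),
  mr.pval = c(0.3858781, 0.1808853, 0.5327973),
  mr.b    = c(0.30825968, 0.62550974, -0.41299666),
  stringsAsFactors = FALSE
)

# Sort by p-value, smallest first
results_sorted <- results_demo[order(results_demo$mr.pval), ]

# Keep rows below an arbitrary, illustrative cut-off
results_filtered <- results_sorted[results_sorted$mr.pval < 0.5, ]
print(results_filtered)
```

The same steps could be written with `dplyr::arrange()` and `dplyr::filter()`, since `dplyr` is already loaded in this notebook.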


### Query to find all exposures for multiple outcomes

An analogous query was run to collect all exposures (i.e. potential risk factors) for all available breast cancer outcomes. Here we include three outcomes.

In [20]:
# list of outcome datasets
outcomes_list <- c('ieu-a-1126', 'ieu-a-1127', 'ieu-a-1128')

# query all MR results for the outcomes (mr.pval < 1 keeps nearly all of them)
query <-
  paste0("
    MATCH (exposure:Gwas)-[mr:MR_EVE_MR]->(outcome:Gwas)
    WHERE outcome.id IN ['", paste0(outcomes_list, collapse = "', '"), "']
    AND NOT exposure.id IN ['", paste0(outcomes_list, collapse = "', '"), "']
    AND NOT (toLower(exposure.trait) CONTAINS 'breast')
    AND mr.pval < 1
    WITH mr, exposure, outcome
    ORDER BY mr.pval
    RETURN exposure.id, exposure.trait, exposure.sample_size, exposure.sex, exposure.note,
           toInteger(exposure.year) AS year, exposure.author AS author, exposure.consortium AS consortium,
           outcome.id, outcome.sample_size, toInteger(outcome.ncase) AS N_case, outcome.year, outcome.nsnp,
           mr.pval, mr.b, mr.se, mr.nsnp, mr.method, mr.moescore
    ")

results_multiple <- query_epigraphdb_as_table(query)

In this query, we restrict the results to the three breast cancer outcomes in `outcomes_list` and make sure that these datasets do not appear among the exposures. We also exclude every exposure whose trait name contains the keyword `breast` (e.g. other breast cancer GWAS that are not in our `outcomes_list`). The condition `mr.pval < 1` keeps essentially all results, dropping only those with a p-value of exactly 1 or with no p-value, because we filter later; a stricter threshold could be applied in the query instead. Finally, this query returns many more columns than the basic one.
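
As an illustration of filtering by p-value inside the query, the sketch below builds a shortened version of the multi-outcome query with a configurable cut-off. The helper names `cypher_list` and `build_multi_outcome_query` are hypothetical (not part of `epigraphdb`), the returned columns are reduced for brevity, and the 0.05 threshold is only an example value.

```r
# Hypothetical helpers (not part of the epigraphdb package).

# Turn an R character vector into a Cypher list literal, e.g. ['a', 'b']
cypher_list <- function(ids) {
  paste0("['", paste0(ids, collapse = "', '"), "']")
}

# Build a shortened multi-outcome query with a configurable p-value cut-off.
# pval_threshold = 1 reproduces the condition used in the text.
build_multi_outcome_query <- function(outcomes, pval_threshold = 1) {
  ids <- cypher_list(outcomes)
  paste0("
    MATCH (exposure:Gwas)-[mr:MR_EVE_MR]->(outcome:Gwas)
    WHERE outcome.id IN ", ids, "
    AND NOT exposure.id IN ", ids, "
    AND NOT (toLower(exposure.trait) CONTAINS 'breast')
    AND mr.pval < ", format(pval_threshold, scientific = FALSE), "
    RETURN exposure.id, exposure.trait, outcome.id, mr.pval, mr.b, mr.se
    ORDER BY mr.pval
    ")
}

outcomes_list <- c('ieu-a-1126', 'ieu-a-1127', 'ieu-a-1128')
query_filtered <- build_multi_outcome_query(outcomes_list, pval_threshold = 0.05)
cat(query_filtered)
# results_filtered <- query_epigraphdb_as_table(query_filtered)  # needs network access
```

Filtering in the query reduces the amount of data transferred, while filtering later in R keeps the full set of results available for other analyses.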

In [21]:
head(results_multiple)

| exposure.id | exposure.trait | exposure.sample_size | exposure.sex | exposure.note | year | author | consortium | outcome.id | outcome.sample_size | N_case | outcome.year | outcome.nsnp | mr.pval | mr.b | mr.se | mr.nsnp | mr.method | mr.moescore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<chr>` | `<dbl>` |
| ukb-d-20406 | Ever addicted to alcohol | 6514.0 | Males and Females | | 2018 | Neale lab | | ieu-a-1126 | 228951.0 | 122977 | 2017.0 | 10680257 | 0 | -0.05482452 | 0.0010902995 | 2 | FE IVW | 1 |
| prot-a-2007 | Neural cell adhesion molecule 1 | 3301.0 | Males and Females | | 2018 | Sun BB | | ieu-a-1127 | 175475.0 | 69501 | 2017.0 | 10680257 | 0 | 0.05072097 | 0.0009244511 | 2 | FE IVW | 1 |
| ukb-b-9127 | Illnesses of father: Chronic bronchitis/emphysema | 402389.0 | Males and Females | 20107#6: Output from GWAS pipeline using Phesant derived variables from UKBiobank | 2018 | Ben Elsworth | MRC-IEU | ieu-a-1127 | 175475.0 | 69501 | 2017.0 | 10680257 | 0 | 0.9729306 | 0.008222273 | 2 | FE IVW | 1 |
| ukb-b-3672 | Diagnoses - secondary ICD10: K44.9 Diaphragmatic hernia without obstruction or gangrene | 463010.0 | Males and Females | 41204#K449: Output from GWAS pipeline using Phesant derived variables from UKBiobank | 2018 | Ben Elsworth | MRC-IEU | ieu-a-1127 | 175475.0 | 69501 | 2017.0 | 10680257 | 0 | -0.29575793 | 0.0051181577 | 2 | FE IVW | 1 |
| ukb-a-60 | Cancer code self-reported: squamous cell carcinoma | 337159.0 | Males and Females | | 2017 | Neale | Neale Lab | ieu-a-1127 | 175475.0 | 69501 | 2017.0 | 10680257 | 0 | 2.05372125 | 0.0516566127 | 2 | FE IVW | 1 |
| ukb-a-295 | Chest pain or discomfort | 334053.0 | Males and Females | | 2017 | Neale | Neale Lab | ieu-a-1128 | 127442.0 | 21468 | 2017.0 | 10680257 | 0 | 1.4451447 | 0.0121743415 | 2 | FE IVW | 1 |
