#### Main methodology results

*Disclaimer*: many parts of this code require preprocessed ADNI data (both genetic and diagnostic) as input. These data have not been uploaded to the repository for privacy reasons.

In [None]:

import networks.get_PPI_STRING as string
import create_datasets.create_nx_datasets
import networkx as nx
import datetime, pickle, subprocess


**1. Obtain genes of interest**

We use DisGeNET to obtain gene-disease associations (GDAs) for Alzheimer's disease (AD gene set) and for other neurodegenerative diseases (ND gene set). This step was already completed in the [first part of the results](1_main_methodology.ipynb).

**2. Obtain biological networks**

Using the genes of interest obtained from DisGeNET, we retrieve the protein-protein interactions (PPI) between them from STRING. This step was already completed in the [first part of the results](1_main_methodology.ipynb).

**3. Data preprocessing**

Please refer to the `data_preprocessing` subdirectory for this part.
1. [make_BED_files.R](data_preprocessing/make_BED_files.R) creates BED files with the genomic coordinates of the genes of interest. This step was already completed in the [first part of the results](1_main_methodology.ipynb).
2. [extract_and_annotate_missense_LOAD.sh](data_preprocessing/extract_and_annotate_missense_LOAD.sh) extracts and annotates missense variants from the VCF files.

**4. Create graph datasets**

Create graph datasets (one graph per patient). The cell below builds the `LOAD` dataset for the AD and ND gene sets.

In [None]:
dataset = 'LOAD'
target  = 'LOAD'
diseases = ['AD', 'ND']
network = 'original'

for disease in diseases:

    indir = 'data'
    outdir = f'data/graph_datasets/{target}'
    print('Input directory:', indir)
    print('Output directory:', outdir)
    print()

    start_time = datetime.datetime.now()
    print()

    result_nodes = create_datasets.create_nx_datasets.main(indir, dataset, target, disease, network, 'missense', None)
    print('Coding: number of missense variants per node')

    outfile = f'{outdir}/{disease}_PPI_missense.pkl'
    print('Resulting dataset saved at:', outfile)
    print()

    with open(outfile, 'wb') as f:
        pickle.dump(result_nodes, f)

    result_nodes_time = datetime.datetime.now()
    print('Processing time:', result_nodes_time - start_time)
    print('\n\n')
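
The cell above reports the node coding as "number of missense variants per node". The snippet below is a minimal sketch of what such an encoding could look like for one patient. It is not the implementation in `create_nx_datasets`. The function name `build_patient_graph`, the plain-dict graph layout, and the toy inputs are all assumptions made for illustration, and plain dictionaries are used instead of networkx only to keep the example dependency-free.

```python
from collections import Counter


def build_patient_graph(ppi_edges, patient_variant_genes):
    """Return one patient's graph as plain dicts (hypothetical layout)."""
    # Count how many missense variants this patient carries in each gene.
    counts = Counter(patient_variant_genes)
    # Every gene that appears in the PPI network becomes a node.
    genes = sorted({gene for edge in ppi_edges for gene in edge})
    # Genes without variants get a count of 0; variants in genes outside
    # the network are ignored.
    nodes = {gene: {"missense_count": counts.get(gene, 0)} for gene in genes}
    # The edge list is shared across patients; only node features differ.
    return {"nodes": nodes, "edges": list(ppi_edges)}


# Toy, made-up inputs: a three-gene network and one patient's variant calls.
toy_edges = [("APOE", "APP"), ("APP", "PSEN1")]
toy_variants = ["APP", "APP", "PSEN1"]  # one entry per missense variant
toy_graph = build_patient_graph(toy_edges, toy_variants)
print(toy_graph["nodes"])
```

Under this assumed layout every patient shares the same topology, so a graph classifier can only separate patients through their node features.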

Create graph datasets without APOE for the `PET` and `PETandDX` targets with the ADNI dataset.

In [None]:
dataset = 'ADNI'
targets = ['PET', 'PETandDX']
diseases = ['AD', 'ND']  # comma required: 'AD' 'ND' would concatenate into the single string 'ADND'
network = 'noAPOE'

for target in targets:

    for disease in diseases:

        indir = 'data'
        outdir = f'data/graph_datasets/{target}'
        print('Input directory:', indir)
        print('Output directory:', outdir)
        print()

        start_time = datetime.datetime.now()
        print()

        result_nodes = create_datasets.create_nx_datasets.main(indir, dataset, target, disease, network, 'missense', None)
        print('Coding: number of missense variants per node')

        outfile = f'{outdir}/{disease}_PPI_noAPOE_missense.pkl'
        print('Resulting dataset saved at:', outfile)
        print()

        with open(outfile, 'wb') as f:
            pickle.dump(result_nodes, f)

        result_nodes_time = datetime.datetime.now()
        print('Processing time:', result_nodes_time - start_time)
        print('\n\n')
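
The cells above write each dataset with `pickle.dump` into `data/graph_datasets/{target}`. Note that `open(outfile, 'wb')` raises `FileNotFoundError` when that directory does not exist yet. The following is a minimal sketch of a save/load pair that creates the directory first. The helper names `save_dataset` and `load_dataset` and the stand-in object are hypothetical and not part of the repository.

```python
import os
import pickle
import tempfile


def save_dataset(obj, outfile):
    # Create the parent directory if needed; open(..., 'wb') does not do this.
    parent = os.path.dirname(outfile)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(outfile, "wb") as f:
        pickle.dump(obj, f)


def load_dataset(infile):
    # Only unpickle files you created yourself: pickle can run code on load.
    with open(infile, "rb") as f:
        return pickle.load(f)


# Round trip with a stand-in object inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph_datasets", "PET", "AD_PPI_noAPOE_missense.pkl")
    save_dataset([{"patient": "toy_001", "label": 1}], path)
    restored = load_dataset(path)

print(restored)
```

The same pattern could replace the `with open(outfile, 'wb')` block in the loops above, assuming the object returned by `create_nx_datasets.main` is picklable, as the existing cells already rely on.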

**5. Graph classification with GNNs**

We then evaluated and tested different GNNs using the [GraphGym](https://github.com/snap-stanford/GraphGym) framework (You *et al.*, 2020).

The configuration and grid files used are in the [graphgym_files](graphgym_files) subdirectory.

Summarized results obtained with GraphGym and other models are in [results/GNN_comparison](results/GNN_comparison).