## Basic API Usage of KGWAS

KGWAS consists of two main classes: `KGWAS`, the model class, and `KGWAS_Data`, which handles data loading and processing. To keep the default experience fast, KGWAS runs in a fast mode that uses Enformer embeddings as variant features and ESM embeddings as gene features, instead of the baselineLD variant annotations and PoPS gene features, which are large files. In fast mode you do not need to download any data yourself: the KGWAS API downloads the required files automatically. This mode can be used to apply KGWAS to your own GWAS summary statistics.

In [1]:
import sys
sys.path.append('../')

from kgwas import KGWAS, KGWAS_Data
data = KGWAS_Data(data_path = './data/')
data.load_kg()

All required data files are present.
--loading KG---
--using enformer SNP embedding--
--using random go embedding--
--using ESM gene embedding--


At this point, the data needed for training has been downloaded from the server and the knowledge graph (KG) is loaded. Next, we load the GWAS file. Here we use an example GWAS file, which is also downloaded automatically from the server, but you can use your own GWAS file instead. The GWAS file should be a pandas DataFrame with the columns `CHR` (or `#CHROM`), `SNP`, `P`, and `N`. Note that the variants in our knowledge graph are currently the UK Biobank directly genotyped variant set, so KGWAS automatically restricts the analysis to the variants that overlap with the KG. Work is underway to improve the coverage of the KG.
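If you bring your own summary statistics, it can help to check the required columns and preview the overlap with the KG before loading them. The sketch below does this with the standard library only. It is a hypothetical helper, not part of the KGWAS API: it assumes a tab-separated file, uses the column names listed above (the processed example table further down names the variant column `ID`, so adjust the names to match your file), and uses a small stand-in set in place of the KG's actual variant list.

```python
import csv
import io
from typing import Dict, List, Set, TextIO

# Hypothetical helper for illustration; it is not part of the KGWAS API.
REQUIRED_COLUMNS = ("SNP", "P", "N")
CHROM_COLUMNS = ("CHR", "#CHROM")  # either name is accepted for the chromosome


def check_and_overlap(handle: TextIO, kg_variants: Set[str],
                      delimiter: str = "\t") -> List[Dict[str, str]]:
    """Check required sumstats columns, then keep rows whose SNP is in the KG variant set."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    columns = set(reader.fieldnames or [])
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if not columns.intersection(CHROM_COLUMNS):
        missing.append("CHR or #CHROM")
    if missing:
        raise ValueError("missing required column(s): " + ", ".join(missing))
    return [row for row in reader if row["SNP"] in kg_variants]


# Toy input: one rsID from the example data and one made-up placeholder ID.
sumstats = io.StringIO(
    "#CHROM\tSNP\tP\tN\n"
    "1\trs3131962\t0.634282\t9988\n"
    "1\trs_placeholder\t0.5\t9988\n"
)
kg_variants = {"rs3131962", "rs12562034"}  # stand-in for the KG's variant set
kept = check_and_overlap(sumstats, kg_variants)
print([row["SNP"] for row in kept])  # ['rs3131962']
```

Rows whose variant is not in the set are simply dropped, mirroring the overlap behavior described above; a missing column fails early with a `ValueError` instead of surfacing later in the pipeline.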

In [2]:
data.load_external_gwas(example_file = True)
data.process_gwas_file()
data.prepare_split()

Loading example GWAS file...
Example file already exists locally.
Loading GWAS file from ./data/biochemistry_Creatinine_fastgwa_full_10000_1.fastGWA...
Using ldsc weight...
ldsc_weight mean:  0.9999999999999993


In [3]:
data.lr_uni

| | #CHROM | ID | POS | A1 | A2 | N | AF1 | BETA | SE | P | ld_score | w_ld_score | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | rs3131962 | 756604 | A | G | 9988 | 0.131007 | -0.117134 | 0.246231 | 0.634282 | 72.862240 | 4.474788 | 0.226298 |
| 1 | 1 | rs12562034 | 768448 | A | G | 9978 | 0.104981 | -0.064894 | 0.273746 | 0.812611 | 34.749233 | 1.877341 | 0.056197 |
| 2 | 1 | rs4040617 | 779322 | G | A | 9975 | 0.129123 | -0.001462 | 0.247254 | 0.995281 | 72.271390 | 4.208873 | 0.000035 |
| 3 | 1 | rs79373928 | 801536 | G | T | 9994 | 0.014659 | 0.081544 | 0.688261 | 0.905688 | 16.740126 | 1.949177 | 0.014037 |
| 4 | 1 | rs11240779 | 808631 | G | A | 9919 | 0.226737 | -0.184268 | 0.198982 | 0.354418 | 50.215000 | 2.825456 | 0.857575 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 542753 | 22 | rs73174435 | 51174939 | T | C | 9979 | 0.056118 | -0.158762 | 0.362390 | 0.661316 | 21.981667 | 1.363001 | 0.191929 |
| 542754 | 22 | rs3810648 | 51175626 | G | A | 9931 | 0.058856 | 0.272493 | 0.352508 | 0.439515 | 34.619377 | 1.804193 | 0.597548 |
| 542755 | 22 | rs5771002 | 51183255 | A | G | 9840 | 0.333638 | 0.116325 | 0.175675 | 0.507869 | 16.231083 | 1.273770 | 0.438456 |
| 542756 | 22 | rs3865764 | 51185848 | G | A | 9974 | 0.051133 | -0.026670 | 0.376132 | 0.943472 | 18.649513 | 1.010000 | 0.005028 |
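The rows of `data.lr_uni` shown above are consistent with `y` being the squared Wald statistic `(BETA / SE)**2`, i.e. the 1-df chi-square statistic, and with `P` being the matching two-sided normal p-value. The snippet below recomputes both from a few displayed rows using only the standard library. Treat it as a consistency check on the printed values, not as a description of how KGWAS computes `y` internally; `chi2_and_p` is a hypothetical helper.

```python
import math
from typing import Tuple


def chi2_and_p(beta: float, se: float) -> Tuple[float, float]:
    """Return the 1-df chi-square statistic and its two-sided p-value for one variant."""
    z = beta / se                                # Wald z-statistic
    chi2 = z * z                                 # squared z, i.e. 1-df chi-square
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal p-value
    return chi2, p


# (BETA, SE, P, y) copied from rows of `data.lr_uni` displayed above
rows = {
    "rs3131962": (-0.117134, 0.246231, 0.634282, 0.226298),
    "rs12562034": (-0.064894, 0.273746, 0.812611, 0.056197),
    "rs79373928": (0.081544, 0.688261, 0.905688, 0.014037),
    "rs11240779": (-0.184268, 0.198982, 0.354418, 0.857575),
}

for snp, (beta, se, p_shown, y_shown) in rows.items():
    chi2, p = chi2_and_p(beta, se)
    print(f"{snp}: chi2={chi2:.6f} vs y={y_shown}, p={p:.6f} vs P={p_shown}")
```

The recomputed values agree with the table up to the last printed digit or so; the small residual comes from `BETA` and `SE` being displayed rounded to six decimals.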


Now we are ready to train the model. Here we train for a single epoch (`epoch = 1`) for demonstration purposes; in practice, use more epochs for better performance.

In [4]:
run = KGWAS(data, device = 'cuda:0', exp_name = 'test')  # set device to a GPU available on your machine
run.initialize_model()
run.train(epoch = 1)

Creating data loader...
Start Training...
Training Progress Epoch 1/1:  52%|█████▏    | 500/956 [12:56<15:47,  2.08s/it]Epoch 1 Step 501 Train Loss: 1.8115
Training Progress Epoch 1/1: 100%|██████████| 956/956 [24:26<00:00,  1.53s/it]
100%|██████████| 50/50 [00:58<00:00,  1.17s/it]
Epoch 1: Validation MSE: 2.1730 Validation Pearson: 0.0096. 
Saving models to ./data//model/test
100%|██████████| 54/54 [00:56<00:00,  1.04s/it]
100%|██████████| 1061/1061 [05:40<00:00,  3.11it/s]


KGWAS prediction and p-values saved to ./data//model_pred/new_experiments/test_pred.csv


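The model's predictions and p-values are saved to `{data_path}/model_pred/new_experiments/{exp_name}_pred.csv` (here, `./data/model_pred/new_experiments/test_pred.csv`) and are also available in memory as `run.kgwas_res`. The trained model is saved to `{data_path}/model/{exp_name}`.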

In [5]:
run.kgwas_res

| | #CHROM | ID | POS | A1 | A2 | N | AF1 | BETA | SE | P | ld_score | w_ld_score | y | pred | P_weighted | KGWAS_P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | rs3131962 | 756604 | A | G | 9988 | 0.131007 | -0.117134 | 0.246231 | 0.634282 | 72.862240 | 4.474788 | 0.226298 | 1.082365 | 0.234167 | 0.346428 |
| 1 | 1 | rs12562034 | 768448 | A | G | 9978 | 0.104981 | -0.064894 | 0.273746 | 0.812611 | 34.749233 | 1.877341 | 0.056197 | 1.087724 | 0.382894 | 0.566456 |
| 2 | 1 | rs4040617 | 779322 | G | A | 9975 | 0.129123 | -0.001462 | 0.247254 | 0.995281 | 72.271390 | 4.208873 | 0.000035 | 1.058530 | 0.995281 | 1 |
| 3 | 1 | rs79373928 | 801536 | G | T | 9994 | 0.014659 | 0.081544 | 0.688261 | 0.905688 | 16.740126 | 1.949177 | 0.014037 | 1.105125 | 0.225107 | 0.333025 |
| 4 | 1 | rs11240779 | 808631 | G | A | 9919 | 0.226737 | -0.184268 | 0.198982 | 0.354418 | 50.215000 | 2.825456 | 0.857575 | 1.081468 | 0.041646 | 0.061612 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 542753 | 22 | rs73174435 | 51174939 | T | C | 9979 | 0.056118 | -0.158762 | 0.362390 | 0.661316 | 21.981667 | 1.363001 | 0.191929 | 1.008835 | 0.233609 | 0.345602 |
| 542754 | 22 | rs3810648 | 51175626 | G | A | 9931 | 0.058856 | 0.272493 | 0.352508 | 0.439515 | 34.619377 | 1.804193 | 0.597548 | 1.034187 | 0.439515 | 0.650221 |
| 542755 | 22 | rs5771002 | 51183255 | A | G | 9840 | 0.333638 | 0.116325 | 0.175675 | 0.507869 | 16.231083 | 1.273770 | 0.438456 | 1.093221 | 0.449038 | 0.66431 |
| 542756 | 22 | rs3865764 | 51185848 | G | A | 9974 | 0.051133 | -0.026670 | 0.376132 | 0.943472 | 18.649513 | 1.010000 | 0.005028 | 0.987747 | 0.943472 | 1 |
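Because the results are also written to a CSV file, they can be post-processed outside the notebook. Below is a minimal standard-library sketch that ranks variants by `KGWAS_P` and counts how many fall below a threshold. Assumptions: the saved file is comma-separated and contains the `ID` and `KGWAS_P` columns shown in `run.kgwas_res`; `top_variants` is a hypothetical helper; and the `5e-8` default is the conventional genome-wide significance threshold, not a value prescribed by KGWAS.

```python
import csv
import io
from typing import List, TextIO, Tuple


def top_variants(handle: TextIO, p_col: str = "KGWAS_P", k: int = 10,
                 alpha: float = 5e-8) -> Tuple[List[Tuple[str, float]], int]:
    """Return the k smallest p-values in `p_col` as (ID, p) pairs, plus the count below alpha."""
    reader = csv.DictReader(handle)
    pairs = [(row["ID"], float(row[p_col])) for row in reader]
    pairs.sort(key=lambda item: item[1])          # ascending p-value
    n_significant = sum(1 for _, p in pairs if p < alpha)
    return pairs[:k], n_significant


# On the real output you would open the saved file, e.g.:
# with open("./data/model_pred/new_experiments/test_pred.csv", newline="") as f:
#     top, n_sig = top_variants(f)

# Self-contained demo using three rows copied from `run.kgwas_res` above.
sample = io.StringIO(
    "ID,P,KGWAS_P\n"
    "rs3131962,0.634282,0.346428\n"
    "rs12562034,0.812611,0.566456\n"
    "rs11240779,0.354418,0.061612\n"
)
top, n_sig = top_variants(sample, k=2)
print(top, n_sig)  # [('rs11240779', 0.061612), ('rs3131962', 0.346428)] 0
```

Passing `p_col="P"` ranks by the original GWAS p-value instead, which makes it easy to compare the two rankings side by side.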


To reload a previously trained model, pass its directory to `run.load_pretrained()`.

In [None]:
run.load_pretrained('./data/model/test')

That covers the basic KGWAS interface. See the other notebooks for more of KGWAS's capabilities.