# GWAS Locus Browser Other Summary Stats Script
- **Author** - Frank Grenn
- **Date Started** - January 2020
- **Quick Description:** Organize the summary statistics from other relevant GWAS into files for the app.
- **Data:** 
Input files obtained from: [META5](https://www.ncbi.nlm.nih.gov/pubmed/31701892), [Age at onset GWAS](https://www.ncbi.nlm.nih.gov/pubmed/30957308), [GBA modifier GWAS](https://www.ncbi.nlm.nih.gov/pubmed/31755958), [LRRK2 modifier GWAS](https://onlinelibrary.wiley.com/doi/full/10.1002/mds.27974)  
Final statistics curated by Cornelis Blauwendraat.


In [None]:
import pandas as pd


In [5]:
loci = pd.read_csv("$PATH1/GWAS_loci_overview.csv")
print(loci.head())

   Locus Number DONE?  Date when done? Volunteer1 Volunteer2          SNP  \
0             1   NaN              NaN  Corrnelis      Lynne  rs114138760   
1             1   NaN              NaN  Corrnelis      Lynne   rs35749011   
2             1   NaN              NaN  Corrnelis      Lynne   rs76763715   
3             2   NaN              NaN        NaN        NaN    rs6658353   
4             3   NaN              NaN    Jillian   Emmeline   rs11578699   

  CHR:BP (hg19)     full region (hg19)  Number of genes  CHR  ...  \
0   1:154898185  1:153898185-155898185               92    1  ...   
1   1:155135036  1:154135036-156135036               92    1  ...   
2   1:155205634  1:154205634-156205634               92    1  ...   
3   1:161469054  1:160469054-162469054               55    1  ...   
4   1:171719769  1:170719769-172719769               24    1  ...   

  Effect allele Other allele Effect allele frequency Beta, all studies  \
0             c            g                  0.

### Age at onset

In [3]:
aoo = pd.read_csv("$PATH2/sorted_AAO_april3_18_final_all_data.txt", sep="\t")
print(aoo.shape)
print(aoo.head())

(7426111, 15)
      MarkerName Allele1 Allele2   Freq1  FreqSE  MinFreq  MaxFreq  Effect  \
0  chr4:90666041       t       c  0.6148  0.0218   0.5505   0.6301  0.6264   
1  chr4:90641340       t       c  0.3880  0.0210   0.3700   0.4497 -0.6131   
2  chr4:90637601       a       g  0.6136  0.0211   0.5534   0.6319  0.6123   
3  chr4:90678541       a       g  0.5333  0.0281   0.4736   0.5953  0.5440   
4  chr4:90668614       t       c  0.4684  0.0242   0.4226   0.5267 -0.5430   

   StdErr       P-value           Direction  HetISq  HetChiSq  HetDf  HetPVal  
0  0.0890  1.901000e-12  +-++++++-+++++++++    38.7    27.720     17  0.04832  
1  0.0880  3.256000e-12  -+------+---------    37.5    27.189     17  0.05535  
2  0.0881  3.719000e-12  +-++++++-+++++++++    34.3    25.883     17  0.07662  
3  0.0889  9.488000e-10  +-++-++--+++++++++    35.3    26.282     17  0.06947  
4  0.0888  9.550000e-10  -+--+---+---------    34.7    26.027     17  0.07398  


In [12]:
print(list(aoo.columns.values))

['MarkerName', 'Allele1', 'Allele2', 'Freq1', 'FreqSE', 'MinFreq', 'MaxFreq', 'Effect', 'StdErr', 'P-value', 'Direction', 'HetISq', 'HetChiSq', 'HetDf', 'HetPVal']


In [9]:
# MarkerName in the age at onset file is formatted as chr<CHR>:<BP>, so build a matching key
loci['chrbp'] = 'chr' + loci['CHR:BP (hg19)']
print(loci.head())

   Locus Number DONE?  Date when done? Volunteer1 Volunteer2          SNP  \
0             1   NaN              NaN  Corrnelis      Lynne  rs114138760   
1             1   NaN              NaN  Corrnelis      Lynne   rs35749011   
2             1   NaN              NaN  Corrnelis      Lynne   rs76763715   
3             2   NaN              NaN        NaN        NaN    rs6658353   
4             3   NaN              NaN    Jillian   Emmeline   rs11578699   

  CHR:BP (hg19)     full region (hg19)  Number of genes  CHR  ...  \
0   1:154898185  1:153898185-155898185               92    1  ...   
1   1:155135036  1:154135036-156135036               92    1  ...   
2   1:155205634  1:154205634-156205634               92    1  ...   
3   1:161469054  1:160469054-162469054               55    1  ...   
4   1:171719769  1:170719769-172719769               24    1  ...   

  Other allele Effect allele frequency Beta, all studies SE, all studies  \
0            g                  0.0112        

In [10]:
merge_aoo = pd.merge(loci, aoo, how='left', left_on='chrbp', right_on='MarkerName')
print(merge_aoo.shape)
print(merge_aoo.head())

(90, 39)
   Locus Number DONE?  Date when done? Volunteer1 Volunteer2          SNP  \
0             1   NaN              NaN  Corrnelis      Lynne  rs114138760   
1             1   NaN              NaN  Corrnelis      Lynne   rs35749011   
2             1   NaN              NaN  Corrnelis      Lynne   rs76763715   
3             2   NaN              NaN        NaN        NaN    rs6658353   
4             3   NaN              NaN    Jillian   Emmeline   rs11578699   

  CHR:BP (hg19)     full region (hg19)  Number of genes  CHR  ... MinFreq  \
0   1:154898185  1:153898185-155898185               92    1  ...  0.0049   
1   1:155135036  1:154135036-156135036               92    1  ...  0.0084   
2   1:155205634  1:154205634-156205634               92    1  ...  0.9864   
3   1:161469054  1:160469054-162469054               55    1  ...  0.4543   
4   1:171719769  1:170719769-172719769               24    1  ...  0.1565   

  MaxFreq  Effect  StdErr   P-value           Direction  HetISq  

In [31]:
final_aoo = merge_aoo[['SNP','Allele1','Allele2','Freq1','Effect','StdErr','P-value']]
print(final_aoo.head())

           SNP Allele1 Allele2   Freq1  Effect  StdErr   P-value
0  rs114138760       c       g  0.0107  0.2097  0.4276  0.623900
1   rs35749011       a       g  0.0237 -0.8508  0.2820  0.002551
2   rs76763715       t       c  0.9950  2.5995  0.5533  0.000003
3    rs6658353       c       g  0.5017 -0.0650  0.0877  0.458700
4   rs11578699       t       c  0.1948 -0.0195  0.1132  0.863100


In [32]:
final_aoo.to_csv("$PATH1/results/aoo_stats.csv",index=False)
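
The merge above is a left join, so the result should keep exactly one row per row of `loci` (90 here). That only holds if each marker appears at most once in the summary-statistics file; a duplicated marker would silently add rows. The following is a minimal sketch, on made-up toy data, of how the `validate` and `indicator` arguments of `pd.merge` can check this. The names `loci_toy`, `stats_toy`, and `merged_toy` are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the loci table and a summary-statistics file (values are made up).
loci_toy = pd.DataFrame({
    "SNP": ["rsA", "rsB", "rsC"],
    "CHR:BP (hg19)": ["1:100", "1:200", "2:300"],
})
stats_toy = pd.DataFrame({
    "MarkerName": ["chr1:100", "chr2:300", "chr9:999"],
    "Effect": [0.21, -0.85, 0.10],
})

# Build the same "chr"-prefixed key the notebook uses for the age at onset file.
loci_toy["chrbp"] = "chr" + loci_toy["CHR:BP (hg19)"]

# validate="many_to_one" raises pandas.errors.MergeError if a marker occurs
# more than once in stats_toy; indicator=True adds a "_merge" column that
# records whether each row was found in both tables or only in the left one.
merged_toy = pd.merge(
    loci_toy, stats_toy, how="left",
    left_on="chrbp", right_on="MarkerName",
    validate="many_to_one", indicator=True,
)

# A left merge against unique keys keeps exactly one row per locus variant.
assert len(merged_toy) == len(loci_toy)

# Variants with no entry in the summary-statistics table.
unmatched = merged_toy.loc[merged_toy["_merge"] == "left_only", "SNP"].tolist()
print(unmatched)
```

If such a check were used in the notebook, the `_merge` column would need to be left out of the final column selection, which the existing `merge_aoo[[...]]` step already does implicitly.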

### GBA age at onset

In [6]:
gba_aoo = pd.read_csv("$PATH2/GBA_case_age_at_onset_GWAS.assoc", sep="\t")
print(gba_aoo.shape)
print(gba_aoo.head())

(692963, 9)
   CHROM      POS REF ALT  N_INFORMATIVE       Test      Beta        SE  \
0      1  1060355   G   C           1353  1:1060355 -0.556163  0.454815   
1      1  1061166   T   C           1353  1:1061166 -0.509397  0.442056   
2      1  1062025   A   G           1353  1:1062025 -0.624011  0.441528   
3      1  1062638   A   C           1353  1:1062638 -0.351802  0.441894   
4      1  1063241   T   G           1353  1:1063241 -0.310846  0.445900   

     Pvalue  
0  0.221393  
1  0.249183  
2  0.157568  
3  0.425961  
4  0.485727  


In [15]:
# The 'Test' column holds CHR:BP without a 'chr' prefix, so join on the raw hg19 position
merge_gba_aoo = pd.merge(loci, gba_aoo, how='left', left_on='CHR:BP (hg19)', right_on='Test')
print(merge_gba_aoo.shape)
print(merge_gba_aoo.head())

(90, 33)
   Locus Number DONE?  Date when done? Volunteer1 Volunteer2          SNP  \
0             1   NaN              NaN  Corrnelis      Lynne  rs114138760   
1             1   NaN              NaN  Corrnelis      Lynne   rs35749011   
2             1   NaN              NaN  Corrnelis      Lynne   rs76763715   
3             2   NaN              NaN        NaN        NaN    rs6658353   
4             3   NaN              NaN    Jillian   Emmeline   rs11578699   

  CHR:BP (hg19)     full region (hg19)  Number of genes  CHR  ...  \
0   1:154898185  1:153898185-155898185               92    1  ...   
1   1:155135036  1:154135036-156135036               92    1  ...   
2   1:155205634  1:154205634-156205634               92    1  ...   
3   1:161469054  1:160469054-162469054               55    1  ...   
4   1:171719769  1:170719769-172719769               24    1  ...   

            chrbp CHROM          POS  REF  ALT  N_INFORMATIVE         Test  \
0  chr1:154898185   1.0  154898185.

In [16]:
final_gba_aoo = merge_gba_aoo[['SNP','REF','ALT','Beta','SE','Pvalue']]
print(final_gba_aoo.head())

           SNP  REF  ALT      Beta        SE    Pvalue
0  rs114138760    G    C  0.378928  1.038380  0.715169
1   rs35749011    G    A  0.522085  0.661859  0.430220
2   rs76763715  NaN  NaN       NaN       NaN       NaN
3    rs6658353    C    G -0.849169  0.418237  0.042321
4   rs11578699    C    T -0.043143  0.538940  0.936197


In [17]:
final_gba_aoo.to_csv("$PATH1/results/gba_aoo_stats.csv",index=False)
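
In the output above, rs76763715 has `NaN` in every statistics column because that position has no entry in the GBA age at onset file. A minimal sketch of listing such variants before writing the CSV is shown here on made-up toy data; `final_toy`, `stat_cols`, and `missing_snps` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Toy version of a final table; the second SNP has no statistics (values are made up).
final_toy = pd.DataFrame({
    "SNP": ["rs1", "rs2", "rs3"],
    "Beta": [0.38, None, -0.85],
    "SE": [1.04, None, 0.42],
    "Pvalue": [0.72, None, 0.04],
})

# A variant counts as missing only if every statistics column is NaN.
stat_cols = ["Beta", "SE", "Pvalue"]
all_missing = final_toy[stat_cols].isna().all(axis=1)
missing_snps = final_toy.loc[all_missing, "SNP"].tolist()

print(f"{len(missing_snps)} of {len(final_toy)} SNPs have no statistics: {missing_snps}")
```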

### GBA modifier

In [7]:
meta_gba = pd.read_csv("$PATH2/META_GBA_penetrance_modifier_carriers_GWAS.txt", sep="\t")
print(meta_gba.shape)
print(meta_gba.head())

(684609, 15)
   MarkerName Allele1 Allele2   Freq1  FreqSE  MinFreq  MaxFreq  Effect  \
0  4:90637601       a       g  0.6095  0.0045   0.6080   0.6232 -0.3191   
1  4:90635338       c       g  0.0796  0.0039   0.0694   0.0811  0.5733   
2  4:90641340       t       c  0.3916  0.0048   0.3771   0.3932  0.3139   
3  4:90666041       t       c  0.6101  0.0041   0.6087   0.6224 -0.3127   
4  4:90684278       a       g  0.9264  0.0015   0.9258   0.9299 -0.5916   

   StdErr       P-value Direction  HetISq  HetChiSq  HetDf  HetPVal  
0  0.0650  9.257000e-07        --    77.5     4.451      1  0.03488  
1  0.1184  1.285000e-06        ++    45.8     1.845      1  0.17440  
2  0.0651  1.410000e-06        ++    78.1     4.563      1  0.03268  
3  0.0650  1.485000e-06        --    73.4     3.755      1  0.05265  
4  0.1236  1.689000e-06        --     0.0     0.821      1  0.36490  


In [18]:
merge_gba = pd.merge(loci, meta_gba, how='left', left_on='CHR:BP (hg19)', right_on='MarkerName')
print(merge_gba.shape)
print(merge_gba.head())

(90, 39)
   Locus Number DONE?  Date when done? Volunteer1 Volunteer2          SNP  \
0             1   NaN              NaN  Corrnelis      Lynne  rs114138760   
1             1   NaN              NaN  Corrnelis      Lynne   rs35749011   
2             1   NaN              NaN  Corrnelis      Lynne   rs76763715   
3             2   NaN              NaN        NaN        NaN    rs6658353   
4             3   NaN              NaN    Jillian   Emmeline   rs11578699   

  CHR:BP (hg19)     full region (hg19)  Number of genes  CHR  ... MinFreq  \
0   1:154898185  1:153898185-155898185               92    1  ...  0.0433   
1   1:155135036  1:154135036-156135036               92    1  ...  0.3021   
2   1:155205634  1:154205634-156205634               92    1  ...     NaN   
3   1:161469054  1:160469054-162469054               55    1  ...  0.4974   
4   1:171719769  1:170719769-172719769               24    1  ...  0.1889   

  MaxFreq  Effect  StdErr   P-value  Direction  HetISq  HetChiSq 

In [33]:
final_gba = merge_gba[['SNP','Allele1','Allele2','Freq1','Effect','StdErr','P-value']]
print(final_gba.head())

           SNP Allele1 Allele2   Freq1  Effect  StdErr   P-value
0  rs114138760       c       g  0.0453 -0.1422  0.1515  0.347800
1   rs35749011       a       g  0.3075  0.2988  0.1039  0.004036
2   rs76763715     NaN     NaN     NaN     NaN     NaN       NaN
3    rs6658353       c       g  0.5018  0.0499  0.0625  0.424900
4   rs11578699       t       c  0.1900 -0.0564  0.0780  0.469700


In [34]:
final_gba.to_csv("$PATH1/results/gba_stats.csv",index=False)
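
The exported tables report alleles as they appear in each source file (lowercase `Allele1`/`Allele2` here, uppercase `REF`/`ALT` in the GBA age at onset file), and the tested allele is not guaranteed to be the `Effect allele` listed in the loci table. The notebook does not harmonize them. If that were wanted, one possible approach is sketched below on made-up toy data. It assumes that `Effect` and `Freq1` refer to `Allele1`, which should be checked against the source file's documentation, and it does not handle strand flips. All names introduced here (`toy`, `aligned_effect`, `aligned_freq`) are hypothetical.

```python
import pandas as pd

# Toy rows: "Effect allele"/"Other allele" mimic the loci table,
# Allele1/Allele2/Freq1/Effect mimic a meta-analysis file (values are made up).
toy = pd.DataFrame({
    "SNP": ["rs1", "rs2", "rs3"],
    "Effect allele": ["c", "t", "a"],
    "Other allele": ["g", "c", "g"],
    "Allele1": ["c", "c", "t"],
    "Allele2": ["g", "t", "c"],
    "Freq1": [0.01, 0.80, 0.50],
    "Effect": [0.21, 0.50, -0.10],
})

# Compare alleles case-insensitively.
ref_effect = toy["Effect allele"].str.upper()
ref_other = toy["Other allele"].str.upper()
a1 = toy["Allele1"].str.upper()
a2 = toy["Allele2"].str.upper()

same = (a1 == ref_effect) & (a2 == ref_other)      # already aligned
flipped = (a1 == ref_other) & (a2 == ref_effect)   # alleles swapped

# Keep the value when aligned, negate/complement it when swapped,
# and set it to NaN when the allele pair matches neither orientation.
toy["aligned_effect"] = toy["Effect"].where(same, -toy["Effect"]).where(same | flipped)
toy["aligned_freq"] = toy["Freq1"].where(same, 1 - toy["Freq1"]).where(same | flipped)

print(toy[["SNP", "aligned_effect", "aligned_freq"]])
```

Rows that match neither orientation are left as `NaN` rather than guessed, so they can be inspected by hand.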

### LRRK2 modifier

In [25]:
lrrk2 = pd.read_csv("$PATH1/othersummarystats/LRRK2_GWAS_risk_Variants.csv")
print(lrrk2.shape)
print(lrrk2.head())

(88, 6)
          name      beta  std.error   p.value A1  A1_freq
0   rs823118_C -0.413478   0.152453  0.006684  C   0.4225
1  rs2280104_T  0.387323   0.151989  0.010823  T   0.3684
2    rs26431_G -0.394072   0.162944  0.015587  G   0.2970
3  rs8087969_G  0.339419   0.141325  0.016319  G   0.4498
4  rs2251086_T -0.443993   0.214743  0.038682  T   0.1346


In [27]:
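# Drop the trailing allele suffix (e.g. "rs823118_C" -> "rs823118") so names match loci['SNP']
lrrk2['name'] = lrrk2['name'].astype(str).str.rsplit('_', n=1).str[0]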
print(lrrk2.head())

        name      beta  std.error   p.value A1  A1_freq
0   rs823118 -0.413478   0.152453  0.006684  C   0.4225
1  rs2280104  0.387323   0.151989  0.010823  T   0.3684
2    rs26431 -0.394072   0.162944  0.015587  G   0.2970
3  rs8087969  0.339419   0.141325  0.016319  G   0.4498
4  rs2251086 -0.443993   0.214743  0.038682  T   0.1346


In [28]:
merge_lrrk2 = pd.merge(loci, lrrk2, how='left', left_on='SNP', right_on='name')
print(merge_lrrk2.shape)
print(merge_lrrk2.head())

(90, 30)
   Locus Number DONE?  Date when done? Volunteer1 Volunteer2          SNP  \
0             1   NaN              NaN  Corrnelis      Lynne  rs114138760   
1             1   NaN              NaN  Corrnelis      Lynne   rs35749011   
2             1   NaN              NaN  Corrnelis      Lynne   rs76763715   
3             2   NaN              NaN        NaN        NaN    rs6658353   
4             3   NaN              NaN    Jillian   Emmeline   rs11578699   

  CHR:BP (hg19)     full region (hg19)  Number of genes  CHR  ...  \
0   1:154898185  1:153898185-155898185               92    1  ...   
1   1:155135036  1:154135036-156135036               92    1  ...   
2   1:155205634  1:154205634-156205634               92    1  ...   
3   1:161469054  1:160469054-162469054               55    1  ...   
4   1:171719769  1:170719769-172719769               24    1  ...   

  Known GWAS locus within 1MB Locus within 250KB Odds Ratio           chrbp  \
0                           1     

In [35]:
final_lrrk2 = merge_lrrk2[['SNP','A1','A1_freq','beta','std.error','p.value']]
print(final_lrrk2.head())

           SNP A1   A1_freq      beta  std.error   p.value
0  rs114138760  C  0.010380 -0.161807   1.511186  0.914731
1   rs35749011  A  0.017370 -1.126067   0.828984  0.174346
2   rs76763715  C  0.006009 -0.268274   0.450911  0.551870
3    rs6658353  C  0.497200 -0.291482   0.143856  0.042743
4   rs11578699  T  0.192400 -0.388740   0.190157  0.040922


In [36]:
final_lrrk2.to_csv("$PATH1/results/lrrk2_stats.csv",index=False)
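
All four sections follow the same pattern: read a summary-statistics file, left-merge it onto `loci`, keep a handful of columns, and write a CSV. As a rough sketch only, the merge-and-select step could be factored into one helper. The function `extract_stats` and the toy tables below are hypothetical and not part of the original notebook.

```python
import pandas as pd

def extract_stats(loci, stats, left_on, right_on, columns):
    """Left-merge loci with a summary-statistics table and keep SNP plus the given columns."""
    merged = pd.merge(loci, stats, how="left", left_on=left_on, right_on=right_on)
    return merged[["SNP"] + list(columns)]

# Toy data in the shape of the LRRK2 section (values are made up).
loci_toy = pd.DataFrame({"SNP": ["rs1", "rs2", "rs3"]})
lrrk2_toy = pd.DataFrame({
    "name": ["rs1_C", "rs3_T"],
    "beta": [-0.41, 0.39],
    "A1": ["C", "T"],
})

# Strip the allele suffix after the last underscore, then merge on the rsID.
lrrk2_toy["name"] = lrrk2_toy["name"].str.rsplit("_", n=1).str[0]
result = extract_stats(loci_toy, lrrk2_toy, "SNP", "name", ["A1", "beta"])
print(result)

# Writing the file would then be a single call, for example:
# result.to_csv("lrrk2_stats.csv", index=False)
```

The join keys still differ per study (`chrbp` for age at onset, `CHR:BP (hg19)` for the GBA files, `SNP` for LRRK2), which is why they are passed in as arguments rather than fixed inside the helper.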