<b>Once the genetic loci associated with a specific trait have been detected, understanding how the variants affect downstream biological pathways is not a trivial task. In contrast to Mendelian disorders, in which a single variant affects a gene and causes the phenotype, non-communicable diseases (NCDs) are highly polygenic: the phenotype is influenced by the interplay of numerous low-effect SNPs and environmental factors. Moreover, most of these SNPs fall outside coding regions, so they do not directly alter the sequence of a gene, and they are often correlated with one another, a phenomenon called linkage disequilibrium (LD).
As a result, it is hard to establish which variants are causal and which genes they affect.



![title](Locus.png)

<b>Many repositories now collect GWAS output, in which each SNP is associated with the trait of interest through a P-value and an effect size. However, when a SNP falls in a non-coding region of the genome, the common practice is to assign the closest gene as the functional element.
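To make the "closest gene" heuristic concrete, here is a minimal sketch of it. The gene names and coordinates are invented for illustration, and the helper names `distance_to_gene` and `nearest_gene` are hypothetical. They are not part of any library used in this notebook.

```python
# Toy gene models: (name, start, end) on the same chromosome.
# Names and coordinates are invented for illustration only.
TOY_GENES = [
    ("GENE_A", 1_000, 5_000),
    ("GENE_B", 20_000, 30_000),
    ("GENE_C", 31_000, 60_000),
]

def distance_to_gene(position, start, end):
    """Distance in bp from a position to a gene body (0 if inside it)."""
    if start <= position <= end:
        return 0
    return min(abs(position - start), abs(position - end))

def nearest_gene(position, genes=TOY_GENES):
    """Return the name of the gene whose body is closest to the position."""
    return min(genes, key=lambda g: distance_to_gene(position, g[1], g[2]))[0]

# An intergenic SNP is assigned to whichever gene happens to be closer
print(nearest_gene(12_000))
```

The heuristic uses genomic distance only: it ignores any regulatory evidence, such as expression data, that could link the SNP to a more distant gene.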

    
<b>In the last few years, however, many studies have tried to move beyond this poor practice by combining diverse statistical methods, genomic data and tools to better understand how a non-coding variant could affect the phenotype.
A notable example is the <a href="https://genetics.opentargets.org/">Open Targets Genetics Portal</a>, which integrates functional and biological data from multiple sources to detect the genes functionally implicated in complex traits.
    
<b> In particular, they built a pipeline consisting of several steps:
    <ul>
      <li>Conditional analysis: identify the distinct signals at a locus</li>
      <li>Fine mapping: determine the posterior probability (PP) that each variant is causal for the trait</li>
      <li>Colocalization: determine the effect of the variants on the expression of neighbouring genes</li>
      <li>Mining of multiple data sources to describe each GWAS locus</li>
      <li>Machine learning model to learn feature importance and prioritize genes at GWAS loci</li>
    </ul>
        By programmatically querying the application programming interface (API) of this database, we will obtain a set of genes likely to be affected by the trait-associated variants we have obtained.
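To give an intuition for the fine-mapping step listed above, the following sketch turns GWAS summary statistics into posterior probabilities using Wakefield's approximate Bayes factors. It assumes a single causal variant per locus. This is a common textbook approach and is not claimed to be the exact method used by the Open Targets pipeline. The summary statistics, the prior standard deviation of 0.15 and the function names are all assumptions made for illustration.

```python
import math

def log_abf(beta, se, prior_sd=0.15):
    """Log approximate Bayes factor in favour of association (Wakefield)."""
    v = se ** 2          # variance of the effect estimate
    w = prior_sd ** 2    # prior variance of the true effect (assumed value)
    z = beta / se
    return 0.5 * math.log(v / (v + w)) + 0.5 * z ** 2 * w / (v + w)

def posterior_probabilities(stats):
    """Normalise the Bayes factors so that they sum to 1 across the locus."""
    logs = {snp: log_abf(beta, se) for snp, (beta, se) in stats.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {snp: math.exp(value - top) for snp, value in logs.items()}
    total = sum(weights.values())
    return {snp: weight / total for snp, weight in weights.items()}

# Invented summary statistics (beta, standard error) for three SNPs at one locus
toy_locus = {"snp1": (0.10, 0.02), "snp2": (0.08, 0.02), "snp3": (0.02, 0.02)}
pp = posterior_probabilities(toy_locus)
for snp, value in pp.items():
    print(snp, round(value, 4))
```

Because the three toy SNPs share the same standard error, the posterior probabilities are ranked by the size of their z-scores.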


In [1]:
import biomapy as bp
import requests

In [2]:
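def QueryOT(ListofVariants,score=0.1,output='genes'):
    """Query Open Targets Genetics for the genes linked to each variant.

    output='all'     -> {variant: [(gene_id, overallScore), ...]}, unfiltered
    output='GenScor' -> same mapping, keeping only pairs with overallScore > score
    anything else    -> deduplicated list of gene IDs with overallScore > score
    """
    query="""{
              genesForVariant(variantId:"%s"){
                overallScore
                gene{id}
              }
            }
            """
    OT_url='https://api.genetics.opentargets.org/graphql'

    results={}
    for variant in ListofVariants:
        r = requests.post(OT_url, json={'query': query % (variant)})
        r.raise_for_status()  # fail early on HTTP errors
        r = r.json()
        # Collect (Ensembl gene ID, overall variant-to-gene score) pairs
        results[variant]=[(data['gene']['id'],data['overallScore'])
                          for data in r['data']['genesForVariant']]
    if output=='all':
        return results
    # Keep only the gene/score pairs above the threshold
    filtered={key:[value for value in values if value[1]>score] for (key,values) in results.items()}
    if output=='GenScor':
        return filtered
    # Flatten to a list of unique gene IDs (the order of a set is arbitrary)
    return list({gene for values in filtered.values() for (gene,_) in values})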
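The query above interpolates the variant ID into the query string with `%s`. As an alternative sketch, GraphQL also accepts the value through a separate `variables` object in the request body, which avoids quoting problems. The helper name `build_payload` is hypothetical, and the `String!` type given to `variantId` is an assumption about the Open Targets Genetics schema that should be checked against the API documentation.

```python
import json

# Assumption: the variantId argument is a required String in the schema.
GENES_FOR_VARIANT = """
query genesForVariant($variantId: String!) {
  genesForVariant(variantId: $variantId) {
    overallScore
    gene { id }
  }
}
"""

def build_payload(variant_id):
    """Build the JSON body of a GraphQL POST, passing the ID as a variable."""
    return {"query": GENES_FOR_VARIANT, "variables": {"variantId": variant_id}}

payload = build_payload("1_154453788_C_T")
print(json.dumps(payload["variables"]))
# The payload would then be sent with: requests.post(OT_url, json=payload)
```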
    
            
            
    

In [5]:
def ConvertVariants(ListOfVariants,source='variantid',target='rsid'):
    """Convert between Open Targets variant IDs (chr_pos_ref_alt) and rsIDs."""
    OT_url='https://api.genetics.opentargets.org/graphql'
    MappingDict={}
    if source=='variantid' and target=='rsid':
        query="""{
               variantInfo(variantId:"%s"){
                   rsId
                   }
              }"""
        for variant in ListOfVariants:
            r = requests.post(OT_url, json={'query': query % (variant)})
            JsonResponse=r.json()
            MappingDict[variant]=JsonResponse['data']['variantInfo']['rsId']

    elif source=='rsid' and target=='variantid':
        query="""{
              search(queryString:"%s"){
                  variants{
                      id
                      }
                   }
              }
              """
        for variant in ListOfVariants:
            r = requests.post(OT_url, json={'query': query % (variant)})
            JsonResponse=r.json()
            # Take the first variant returned by the search
            MappingDict[variant]=JsonResponse['data']['search']['variants'][0]['id']
    else:
        # Previously an unsupported combination silently returned None
        raise ValueError("Unsupported conversion: %s -> %s" % (source,target))
    # Return the converted IDs in the same order as the input
    return list(map(MappingDict.get,ListOfVariants))





    

In [6]:
rsids = ['rs4129267', 'rs4133213']


In [7]:
Variants = ConvertVariants(rsids,source='rsid',target='variantid')
Variants

['1_154453788_C_T', '1_154422736_C_A']

In [12]:
rsids_converted=ConvertVariants(Variants)
rsids_converted

['rs4129267', 'rs4133213']

In [10]:
genes=QueryOT(Variants,output='GenScor')
genes

{'1_154453788_C_T': [('ENSG00000160712', 0.5396378269617707),
  ('ENSG00000163239', 0.1804828973843058),
  ('ENSG00000169291', 0.1402414486921529)],
 '1_154422736_C_A': [('ENSG00000160714', 0.1128772635814889),
  ('ENSG00000160712', 0.4197183098591549),
  ('ENSG00000163239', 0.17384305835010058)]}
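
The `GenScor` output lists every gene above the threshold for each variant. When a single candidate per variant is needed, one simple option is to keep the highest-scoring gene. The sketch below applies this to the result shown above. The helper name `top_gene_per_variant` is hypothetical, and taking the maximum score is only one possible prioritization rule.

```python
# Output of QueryOT(Variants, output='GenScor'), copied from the cell above
gen_scor = {
    '1_154453788_C_T': [('ENSG00000160712', 0.5396378269617707),
                        ('ENSG00000163239', 0.1804828973843058),
                        ('ENSG00000169291', 0.1402414486921529)],
    '1_154422736_C_A': [('ENSG00000160714', 0.1128772635814889),
                        ('ENSG00000160712', 0.4197183098591549),
                        ('ENSG00000163239', 0.17384305835010058)],
}

def top_gene_per_variant(gene_scores):
    """Map each variant to its highest-scoring gene, skipping empty lists."""
    return {variant: max(pairs, key=lambda pair: pair[1])[0]
            for variant, pairs in gene_scores.items() if pairs}

top = top_gene_per_variant(gen_scor)
print(top)
# {'1_154453788_C_T': 'ENSG00000160712', '1_154422736_C_A': 'ENSG00000160712'}
```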

In [11]:
genes=QueryOT(Variants)
genes

['ENSG00000160712', 'ENSG00000163239', 'ENSG00000160714', 'ENSG00000169291']

In [15]:
Entrez=bp.gene_mapping_many(genes,'ensembl','entrez')
Entrez


[3570, 126669, 126668, 55585]