## Simple showcase of the PyGeneBe library

Install `genebe` with pip:
```
pip install -U genebe
```

In [1]:
import genebe as gnb

print('Version:' + gnb.__version__)

Version:0.1.15


GeneBe makes it easy to parse HGVS, dbSNP and other variant notations. It handles the `c`, `m`, `g` and `n` HGVS representations and partially supports `p`.

In [4]:
parsed = gnb.parse_variants(['ENST00000679957.1:c.803C>T',
                         'ENST00000404276.6:c.1100del',
                         'NC_000003.12:g.39394574A>T',
                         'NC_012920.1:m.1243T>C',
                         '22 28695868 AG A',
                         '22-28695869--G', # look, there is no ref here as this is an insertion
                         'AGT M259T', # just gene with protein change
                         'chrX:153803771:1:A', # SPDI
                         'rs11', # rsID
                         'invalid one', # invalid query, should return ''
                         'PAH:p.Glu57Lys'] # HGVS query with protein change
                         )
parsed

100%|██████████| 1/1 [00:00<00:00,  2.14it/s]


['1-230710021-G-A',
 '22-28695868-AG-A',
 '3-39394574-A-T',
 'M-1243-T-C',
 '22-28695868-AG-A',
 '22-28695868-A-AG',
 '1-230710048-A-G',
 'X-153803772-C-A',
 '7-11324574-C-T',
 '',
 '12-102894918-C-T']
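
The output above suggests that `parse_variants` returns one result per query, in input order, with an empty string for anything it could not resolve. Under that assumption, failed queries can be found by pairing inputs with outputs. The sketch below does not call GeneBe; the lists are a shortened copy of the cell above, and `split_failed` is a hypothetical helper, not part of the library.

```python
# Queries and results copied (shortened) from the example above.
queries = ['ENST00000679957.1:c.803C>T', 'invalid one', 'rs11']
results = ['1-230710021-G-A', '', '7-11324574-C-T']


def split_failed(queries, results):
    """Hypothetical helper: separate resolved queries from failed ones."""
    if len(queries) != len(results):
        raise ValueError("expected exactly one result per query")
    # An empty string marks a query that could not be parsed.
    resolved = {q: r for q, r in zip(queries, results) if r}
    failed = [q for q, r in zip(queries, results) if not r]
    return resolved, failed


resolved, failed = split_failed(queries, results)
print(failed)  # ['invalid one']
```

Dropping the failed entries before annotation avoids sending empty strings to the annotation step.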

Here is an example of annotating a list of genetic variants, each represented as `chr-pos-ref-alt`, into a list of `dict` objects.

In [5]:
flat = gnb.annotate(parsed, flatten_consequences=True, use_refseq=False, output_format="list")
flat

INFO:root:I will try to log in as ps209497@gmail.com
100%|██████████| 1/1 [00:00<00:00,  2.79it/s]


[{'chr': '1',
  'pos': 230710021,
  'ref': 'G',
  'alt': 'A',
  'effect': 'missense_variant',
  'transcript': 'ENST00000366667.6',
  'consequences': 'missense_variant',
  'gene_symbol': 'AGT',
  'gene_hgnc_id': 333,
  'dbsnp': 'rs1228544607',
  'frequency_reference_population': 6.8406566e-06,
  'hom_count_reference_population': 0,
  'allele_count_reference_population': 10,
  'gnomad_exomes_af': 6.840659807494376e-06,
  'gnomad_genomes_af': None,
  'gnomad_exomes_ac': 10,
  'gnomad_genomes_ac': None,
  'gnomad_exomes_homalt': 0,
  'gnomad_genomes_homalt': None,
  'gnomad_mito_homoplasmic': None,
  'gnomad_mito_heteroplasmic': None,
  'computational_score_selected': 0.14022260904312134,
  'computational_prediction_selected': 'Benign',
  'computational_source_selected': 'MetaRNN',
  'splice_score_selected': 0.0,
  'splice_prediction_selected': 'Benign',
  'splice_source_selected': 'max_spliceai',
  'revel_score': 0.18700000643730164,
  'revel_prediction': 'Benign',
  'alphamissense_score'
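
Each element of the returned list is a plain `dict`, so ordinary Python is enough to pull out the fields of interest. The sketch below uses a subset of the record shown above and does not call GeneBe. `summarize` and its `max_af` threshold are hypothetical, and treating `None` as "not observed in that dataset" is an assumption, not documented behaviour.

```python
# A subset of the first annotated record shown above.
record = {
    'chr': '1', 'pos': 230710021, 'ref': 'G', 'alt': 'A',
    'effect': 'missense_variant',
    'gene_symbol': 'AGT',
    'gnomad_exomes_af': 6.840659807494376e-06,
    'gnomad_genomes_af': None,
}


def summarize(record, max_af=0.01):
    """Hypothetical helper: one-line summary of an annotated variant."""
    variant = f"{record['chr']}-{record['pos']}-{record['ref']}-{record['alt']}"
    # Frequency fields can be None; only compare the values that are present.
    afs = [record.get(k) for k in ('gnomad_exomes_af', 'gnomad_genomes_af')]
    known = [af for af in afs if af is not None]
    rare = not known or max(known) < max_af
    return f"{variant} {record.get('gene_symbol')} {record.get('effect')} rare={rare}"


print(summarize(record))  # 1-230710021-G-A AGT missense_variant rare=True
```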

It is natural to work with lists of data represented as a Pandas dataframe. Here is an example of annotating a list of variants into a Pandas dataframe. The `clingen-erepo.ipynb` example file contains more examples of annotating and joining variants using Pandas.

In [7]:
df = gnb.annotate(parsed, flatten_consequences=True, use_ensembl=False, output_format="dataframe")
print(df.columns)
df

100%|██████████| 1/1 [00:00<00:00,  3.36it/s]

Index(['chr', 'pos', 'ref', 'alt', 'effect', 'transcript', 'consequences',
       'gene_symbol', 'gene_hgnc_id', 'dbsnp',
       'frequency_reference_population', 'hom_count_reference_population',
       'allele_count_reference_population', 'gnomad_exomes_af',
       'gnomad_genomes_af', 'gnomad_exomes_ac', 'gnomad_genomes_ac',
       'gnomad_exomes_homalt', 'gnomad_genomes_homalt',
       'gnomad_mito_homoplasmic', 'gnomad_mito_heteroplasmic',
       'computational_score_selected', 'computational_prediction_selected',
       'computational_source_selected', 'splice_score_selected',
       'splice_prediction_selected', 'splice_source_selected', 'revel_score',
       'revel_prediction', 'alphamissense_score', 'alphamissense_prediction',
       'bayesdelnoaf_score', 'bayesdelnoaf_prediction', 'phylop100way_score',
       'phylop100way_prediction', 'spliceai_max_score',
       'spliceai_max_prediction', 'dbscsnv_ada_score',
       'dbscsnv_ada_prediction', 'apogee2_score', 'apogee2_predic




| | chr | pos | ref | alt | effect | transcript | consequences | gene_symbol | gene_hgnc_id | dbsnp | ... | acmg_score | acmg_classification | acmg_criteria | acmg_by_gene | clinvar_disease | clinvar_classification | phenotype_combined | pathogenicity_classification_combined | custom_annotations | hgvs_c |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 230710021.0 | G | A | missense_variant | NM_001384479.1 | missense_variant | AGT | 333.0 | rs1228544607 | ... | 0.0 | Uncertain_significance | PM2,BP4_Moderate | [] | | | | | | c.803C>T |
| 1 | 22 | 28695868.0 | AG | A | frameshift_variant | NM_007194.4 | frameshift_variant | CHEK2 | 16627.0 | rs555607708 | ... | 16.0 | Pathogenic | PVS1,PP5_Very_Strong | [] | Li-Fraumeni syndrome 2,Hereditary cancer-predi... | Pathogenic | Li-Fraumeni syndrome 2\|Hereditary cancer-predi... | Pathogenic | | c.1100delC |
| 2 | 3 | 39394574.0 | A | T | stop_gained,splice_region_variant | NM_017875.4 | stop_gained,splice_region_variant | SLC25A38 | 26054.0 | rs121918332 | ... | 9.0 | Likely_pathogenic | PVS1_Strong,PM2,PP3,PP5_Moderate | [] | Sideroblastic anemia 2 | Pathogenic | Sideroblastic anemia 2 | Pathogenic | | c.790A>T |
| 3 | M | 1243.0 | T | C | non_coding_transcript_exon_variant | | [{'protein_coding': False, 'consequences': ['n... | RNR1 | | rs28358572 | ... | -16.0 | Benign | BP6_Very_Strong,BA1 | [] | not provided,not specified | Benign | not provided\|not specified | Benign | | |
| 4 | 22 | 28695868.0 | AG | A | frameshift_variant | NM_007194.4 | frameshift_variant | CHEK2 | 16627.0 | rs555607708 | ... | 16.0 | Pathogenic | PVS1,PP5_Very_Strong | [] | Li-Fraumeni syndrome 2,Hereditary cancer-predi... | Pathogenic | Li-Fraumeni syndrome 2\|Hereditary cancer-predi... | Pathogenic | | c.1100delC |
| 5 | 22 | 28695868.0 | A | AG | frameshift_variant | NM_007194.4 | frameshift_variant | CHEK2 | 16627.0 | rs2052566665 | ... | 12.0 | Pathogenic | PVS1,PM2,PP5_Moderate | [] | Hereditary cancer-predisposing syndrome | Pathogenic | Hereditary cancer-predisposing syndrome | Pathogenic | | c.1100dupC |
| 6 | 1 | 230710048.0 | A | G | missense_variant | NM_001384479.1 | missense_variant | AGT | 333.0 | rs699 | ... | -20.0 | Benign | BP4_Strong,BP6_Very_Strong,BA1 | [] | Hypertension, essential, susceptibility to,not... | Benign | Hypertension, essential, susceptibility to\|not... | Benign | | c.776T>C |
| 7 | X | 153803772.0 | C | A | missense_variant | NM_001303512.2 | missense_variant | PDZD4 | 21167.0 | | ... | 4.0 | Uncertain_significance | PM2,PP3_Moderate | [] | | | | | | c.1909G>T |
| 8 | 7 | 11324574.0 | C | T | | | [] | | | rs11 | ... | -2.0 | Likely_benign | PM2,BP4_Strong | [] | | | | | | |
| 9 | | | | | | | [] | | | | ... | | | | [] | | | | | | |
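
Once variants are annotated, a common next step is to tally the ACMG classifications and pick out the variants worth a closer look. The sketch below works on (variant, `acmg_classification`) pairs copied by hand from the output above rather than on a live dataframe, so it runs without GeneBe or Pandas. The choice of which classes to flag is an illustrative assumption.

```python
from collections import Counter

# (variant, acmg_classification) pairs copied from the output above;
# (None, None) stands in for the empty row produced by the invalid query.
rows = [
    ('1-230710021-G-A', 'Uncertain_significance'),
    ('22-28695868-AG-A', 'Pathogenic'),
    ('3-39394574-A-T', 'Likely_pathogenic'),
    ('M-1243-T-C', 'Benign'),
    ('22-28695868-AG-A', 'Pathogenic'),
    ('22-28695868-A-AG', 'Pathogenic'),
    ('1-230710048-A-G', 'Benign'),
    ('X-153803772-C-A', 'Uncertain_significance'),
    ('7-11324574-C-T', 'Likely_benign'),
    (None, None),
]

# Count classifications, skipping rows without an annotation.
counts = Counter(c for _, c in rows if c is not None)

# Unique variants in the classes chosen for follow-up (an assumption here).
flag_classes = {'Pathogenic', 'Likely_pathogenic'}
flagged = sorted({v for v, c in rows if c in flag_classes})

print(counts['Pathogenic'])  # 3
print(flagged)
```

Using a set for `flagged` collapses the duplicate CHEK2 deletion, which appears twice because two different queries resolved to the same variant.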


## Liftover example
Lifting variants over from one genome build to another is easy with `GeneBe`:

In [9]:
import genebe as gnb
gnb.lift_over_variants(['chr6-161006172-T-G'], from_genome='hg19', dest_genome='hg38')

100%|██████████| 1/1 [00:00<00:00,  3.77it/s]


['chr6-160585140-T-G']
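
A quick sanity check after a liftover is to confirm that only the position changed. The sketch below splits the `chr-pos-ref-alt` strings from the example above and compares them. `Variant` and `parse_variant` are hypothetical helpers written for this illustration, not part of GeneBe.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant(text: str) -> Variant:
    """Hypothetical helper: split a 'chr-pos-ref-alt' string into fields."""
    chrom, pos, ref, alt = text.split('-')
    return Variant(chrom, int(pos), ref, alt)


# Input and output of the liftover call shown above.
hg19 = parse_variant('chr6-161006172-T-G')
hg38 = parse_variant('chr6-160585140-T-G')

# Chromosome and alleles should be unchanged; only the coordinate moves.
same_alleles = (hg19.chrom, hg19.ref, hg19.alt) == (hg38.chrom, hg38.ref, hg38.alt)
shift = hg38.pos - hg19.pos
print(same_alleles, shift)  # True -421032
```

A change in the alleles would not necessarily be an error, for example when a region maps to the opposite strand, but it is worth inspecting by hand.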