# Variant effect prediction
Variant effect prediction offers a simple way to predict the effects of SNVs using any model that takes DNA sequence as input. Several scoring methods can be chosen, but all of them rely on in-silico mutagenesis. The default input is a VCF, and the default output is again a VCF annotated with predictions of variant effects.

This IPython notebook goes through the basic programmatic steps needed to perform variant effect prediction. First a variant-centered approach is taken, and then overlap-based variant effect prediction is presented. For details on how this is done programmatically, please refer to the documentation in Postprocessing/Variant effect prediction.

## Variant-centered effect prediction
Models that use `kipoiseq.dataloaders.SeqIntervalDl` as their default dataloader can make use of variant-centered effect prediction. This procedure starts from the query VCF and generates genomic regions of the same length as the model input, each centered on an individual variant in the VCF. The sequences of these regions are then mutated according to the alleles in the VCF. The model's batch prediction function is run on all resulting sequence sets, and finally the scoring method is applied.
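The region generation and mutation steps described above can be illustrated with a minimal, pure-Python sketch. It is not the kipoi-veff2 implementation: the function names `centered_window` and `ref_alt_sequences` are hypothetical, the window placement for even lengths is an assumption, and only SNVs on an in-memory sequence string are handled.

```python
def centered_window(pos, window_len):
    """Return the 0-based, half-open [start, end) window centered on a
    1-based VCF position. For even lengths the variant sits just right
    of the midpoint (an assumption made for this sketch)."""
    center = pos - 1  # convert the 1-based VCF position to a 0-based index
    start = center - window_len // 2
    return start, start + window_len


def ref_alt_sequences(chrom_seq, pos, ref, alt, window_len):
    """Build the reference and alternative sequences for a single SNV."""
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch only handles SNVs")
    start, end = centered_window(pos, window_len)
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window extends beyond the sequence")
    ref_seq = chrom_seq[start:end]
    offset = (pos - 1) - start  # index of the variant inside the window
    if ref_seq[offset].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    # In-silico mutagenesis: swap the reference base for the alternative base.
    alt_seq = ref_seq[:offset] + alt + ref_seq[offset + 1:]
    return ref_seq, alt_seq


# Toy example: a 10 bp "chromosome" with an A>G variant at 1-based position 5.
ref_seq, alt_seq = ref_alt_sequences("ACGTACGTAC", pos=5, ref="A", alt="G", window_len=5)
print(ref_seq, alt_seq)
```

Both sequences would then be passed to the model, and the two sets of predictions compared by the chosen scoring method.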

The selected scoring methods compare model predictions for sequences carrying the reference or the alternative allele. The scoring method can be `Diff` for a simple subtraction of predictions, `Logit` for a subtraction of logit-transformed model predictions, or `DeepSEA_effect`, a combination of `Diff` and `Logit` that was described in Troyanskaya et al. (2015).
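As a rough illustration of how these scores relate to one another, the sketch below computes them for a single pair of predicted probabilities. It rests on stated assumptions and is not the library's code: the difference is taken as alternative minus reference, predictions are clipped before the logit transform to avoid infinities, and the "combination" in `DeepSEA_effect` is assumed to be the product of the absolute `Diff` and absolute `Logit` scores. The function names are hypothetical.

```python
import math


def logit(p, eps=1e-7):
    """Logit transform with clipping so that p = 0 or p = 1 stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def diff_score(ref_pred, alt_pred):
    # Assumption: the difference is taken as alternative minus reference.
    return alt_pred - ref_pred


def logit_score(ref_pred, alt_pred):
    return logit(alt_pred) - logit(ref_pred)


def deepsea_effect_score(ref_pred, alt_pred):
    # Assumption: the combination is |Logit| * |Diff|.
    return abs(logit_score(ref_pred, alt_pred)) * abs(diff_score(ref_pred, alt_pred))


ref_pred, alt_pred = 0.2, 0.8
print(diff_score(ref_pred, alt_pred))
print(logit_score(ref_pred, alt_pred))
print(deepsea_effect_score(ref_pred, alt_pred))
```

The logit transform stretches differences near 0 and 1, so `Logit` emphasizes changes in confident predictions that `Diff` alone would report as small.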

This IPython notebook assumes that it is executed in an environment in which kipoi-veff2 is installed. For more information, see https://github.com/kipoi/kipoi-veff2#install-the-conda-environment

In [9]:
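from kipoi_veff2 import variant_centered

# Input VCF, reference genome, output path and the Kipoi model to score with.
vcf_file = "example_data/clinvar_donor_acceptor_chr22.vcf"
fasta_file = "example_data/hg19_chr22.fa"
output_file = "output.tsv"
model_name = "DeepSEA/variantEffects"

# Look up the configuration for the model group ("DeepSEA" here);
# fall back to an empty dict if the group has no entry.
model_group = model_name.split("/")[0]
model_group_config_dict = (
    variant_centered.VARIANT_CENTERED_MODEL_GROUP_CONFIGS.get(
        model_group, {}
    )
)

# Build the model configuration from the model name and group settings.
model_config = variant_centered.get_model_config(
    model_name, **model_group_config_dict
)

# Score every variant in the VCF and write the results to output_file.
variant_centered.score_variants(
    model_config=model_config,
    vcf_file=vcf_file,
    fasta_file=fasta_file,
    output_file=output_file,
)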


Using downloaded and verified file: /Users/b260/.kipoi/models/DeepSEA/variantEffects/downloaded/model_files/weights/35956ab9c28960b5a3693f470fe980c1


Let's have a look at the annotated output TSV:

In [12]:
import pandas as pd

output_dataframe = pd.read_csv("output.tsv", sep='\t')
print(output_dataframe.iloc[: 5, : 10])

  #CHROM       POS   ID REF ALT  DeepSEA/variantEffects/8988T_DNase_None/diff  \
0  chr22  41320486    4   G   T                                     -0.001468   
1  chr22  31009031    9   T   G                                     -0.038191   
2  chr22  43024150   15   C   G                                      0.013784   
3  chr22  43027392   16   A   G                                     -0.060475   
4  chr22  37469571  122   C   T                                     -0.015216   

   DeepSEA/variantEffects/AoSMC_DNase_None/diff  \
0                                      0.001205   
1                                     -0.019323   
2                                      0.001041   
3                                     -0.186859   
4                                      0.012377   

   DeepSEA/variantEffects/Chorion_DNase_None/diff  \
0                                       -0.001497   
1                                       -0.009417   
2                                        0.0072
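
One way to work with a table like this is to rank variants by their largest absolute score across all model outputs. The sketch below does so on a small in-memory TSV that reuses the first two score columns and first three rows printed above. The derived column names `max_abs_diff` and `top_task` are introduced here for illustration only.

```python
import io

import pandas as pd

# Toy TSV with the same layout as the output above (two score columns only).
toy_tsv = (
    "#CHROM\tPOS\tID\tREF\tALT"
    "\tDeepSEA/variantEffects/8988T_DNase_None/diff"
    "\tDeepSEA/variantEffects/AoSMC_DNase_None/diff\n"
    "chr22\t41320486\t4\tG\tT\t-0.001468\t0.001205\n"
    "chr22\t31009031\t9\tT\tG\t-0.038191\t-0.019323\n"
    "chr22\t43024150\t15\tC\tG\t0.013784\t0.001041\n"
)
df = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

# Score columns are identified by their "/diff" suffix.
score_cols = [col for col in df.columns if col.endswith("/diff")]
abs_scores = df[score_cols].abs()

# Largest absolute effect per variant, and the model output it comes from.
df["max_abs_diff"] = abs_scores.max(axis=1)
df["top_task"] = abs_scores.idxmax(axis=1)

ranked = df.sort_values("max_abs_diff", ascending=False)
print(ranked[["#CHROM", "POS", "ID", "max_abs_diff", "top_task"]])
```

The same steps should apply to the full `output.tsv` by replacing the in-memory string with the file path.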

For more information and examples, please see https://github.com/kipoi/kipoi-veff2