---
title: Testing the Enformer pipeline's handling of unphased VCFs
description: 'In the previous test, we added an option to read unphased VCFs as if they were phased. The pipeline correctly ran Enformer on both "haplotypes", but to reduce the number of runs, we switched to predicting on the mean haplotype for unphased VCFs. Here we check that the pipeline returns the same results as running Enformer directly.'
author: Sabrina Mi
date: 8/16/2023

---


### Read in h5 prediction file

In [2]:
import numpy as np
import h5py

In [24]:
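# open the pipeline's prediction file read-only; the context manager closes it after loading
predictions_path = "/Users/sabrinami/Desktop/2022-23/tutorials/enformer_pipeline_test/predictions_folder/personalized_enformer_rat_single_gene/predictions_2023-08-16/enformer_predictions/000789972A/haplotype0/chr20_12118762_12118762_predictions.h5"
with h5py.File(predictions_path, "r") as f:
    predictions1 = f['chr20_12118762_12118762'][()]  # [()] reads the full dataset into a NumPy array
print(predictions1)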

[[0.23258275 0.2962714  0.52013165 ... 0.19615567 1.1101408  0.25560504]
 [0.15570731 0.20205402 0.3755348  ... 0.04365927 0.24989623 0.08517855]
 [0.1536611  0.21689793 0.4510562  ... 0.05227472 0.2147567  0.08478698]
 ...
 [0.1794057  0.22463816 0.29514343 ... 0.01105995 0.02652512 0.03385386]
 [0.1694869  0.20448665 0.26207498 ... 0.01688805 0.04071837 0.06028533]
 [0.15269741 0.20196484 0.22278813 ... 0.02438667 0.03900523 0.05988767]]


### Run non-pipeline Enformer

In [5]:
import EnformerVCF
import kipoiseq
fasta_file = '/Users/sabrinami/Desktop/2022-23/tutorials/enformer_pipeline_test/rn7_data/reference_genome/rn7_genome.fasta'
fasta_extractor = EnformerVCF.FastaStringExtractor(fasta_file)

In [7]:
# read VCFs and encode haplotypes
target_interval = kipoiseq.Interval("chr20", 12118762, 12118762)
chr20_vcf = EnformerVCF.read_vcf("/Users/sabrinami/enformer_rat_data/genotypes/BrainVCFs/chr20.vcf.gz")
haplo1, haplo2 = EnformerVCF.vcf_to_seq(target_interval, '000789972A', chr20_vcf, fasta_extractor)
haplo1_enc = EnformerVCF.one_hot_encode("".join(haplo1))[np.newaxis]
haplo2_enc = EnformerVCF.one_hot_encode("".join(haplo2))[np.newaxis]

In [21]:
mean_haplo = (haplo1_enc + haplo2_enc) / 2
predictions2 = EnformerVCF.model.predict_on_batch(mean_haplo)['human'][0]
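
To make the averaging step concrete, here is a minimal sketch of what a "mean haplotype" looks like on a toy sequence. The `one_hot` helper below is hypothetical and written only for this illustration; it is not `EnformerVCF.one_hot_encode`, which may order channels or treat unknown bases differently.

```python
import numpy as np

# Hypothetical helper for illustration only (not EnformerVCF.one_hot_encode).
def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) float32 array with columns A, C, G, T."""
    alphabet = "ACGT"
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        idx = alphabet.find(base)
        if idx >= 0:  # in this sketch, unknown bases such as N stay all-zero
            encoded[i, idx] = 1.0
    return encoded

hap_a = one_hot("ACGT")
hap_b = one_hot("ATGT")  # differs from hap_a at position 1 (C vs T)

# Averaging keeps homozygous positions one-hot and splits
# heterozygous positions evenly between the two alleles.
mean_hap = (hap_a + hap_b) / 2
print(mean_hap)
```

In this toy case, the row for position 1 carries 0.5 in both the C and T columns, while every other row is unchanged. The same arithmetic is applied above to the two full-length encoded haplotypes.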

In [25]:
# count the elements that differ between the pipeline and direct predictions
print("There are", np.count_nonzero(predictions1 != predictions2), "differences between the two matrices.")

There are 0 differences between the two matrices.
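
Exact equality holds in this run, but it can be a brittle check if the two runs ever happen on different hardware or library versions, where small floating-point differences may appear. As a hedged sketch, a slightly more informative comparison might also report the largest absolute difference and a tolerance-based match. The function name `compare_predictions` and the default tolerances are assumptions made for this example, not part of the pipeline.

```python
import numpy as np

# Hypothetical helper for illustration; not part of EnformerVCF or the pipeline.
def compare_predictions(a, b, rtol=1e-5, atol=1e-8):
    """Summarize how two prediction arrays of the same shape differ."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return {
        "n_different": int(np.count_nonzero(a != b)),       # exact mismatches
        "max_abs_diff": float(np.max(np.abs(a - b))) if a.size else 0.0,
        "allclose": bool(np.allclose(a, b, rtol=rtol, atol=atol)),
    }

# Toy arrays standing in for predictions1 and predictions2.
x = np.array([[0.25, 0.5], [0.75, 1.0]])
y = x.copy()
y[0, 1] += 1e-9  # tiny perturbation, well inside the tolerance

result_same = compare_predictions(x, x.copy())
result_close = compare_predictions(x, y)
print(result_same)
print(result_close)
```

With the real arrays, the call would be `compare_predictions(predictions1, predictions2)`. Raising on a shape mismatch is a deliberate choice here, so that a wrong file or head selection fails loudly instead of being silently compared.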


The pipeline's mean-haplotype predictions match the direct Enformer run element for element, so it looks like our edits worked!