---
title: Testing the Enformer pipeline option to output both human and mouse heads together
author: Sabrina Mi
date: 8/31/23
---

## Personalized Test

We test on the rat gene `ENSRNOG00000054549`, with the input centered on its TSS at chr20:12118762.

High-throughput run on Polaris:

```
module load conda
conda activate /lus/grand/projects/TFXcan/imlab/shared/software/conda_envs/enformer-predict-tools

python /home/s1mi/Github/enformer_epigenome_pipeline/enformer_predict.py --parameters /home/s1mi/Github/deep-learning-in-genomics/posts/2023-08-31-testing-multiple-heads-in-pipeline/local_test_personalized.json

```

Local:

```
conda activate enformer-predict-tools
python /Users/sabrinami/Github/enformer_epigenome_pipeline/enformer_predict.py --parameters /Users/sabrinami/Github/deep-learning-in-genomics/posts/2023-08-31-testing-multiple-heads-in-pipeline/local_test_personalized2.json

```

In [1]:
import h5py
import numpy as np
f = h5py.File("/Users/sabrinami/Desktop/2022-23/tutorials/enformer_pipeline_test/predictions_folder/personalized_enformer_rat_single_gene/predictions_2023-08-31/enformer_predictions/000789972A/haplotype0/chr20_12118762_12118762_predictions.h5", "r")
human_prediction = f['human'][()]
mouse_prediction = f['mouse'][()]

In [3]:
import EnformerVCF
import kipoiseq
fasta_file = '/Users/sabrinami/Desktop/2022-23/tutorials/enformer_pipeline_test/rn7_data/rn7_genome.fasta'
fasta_extractor = EnformerVCF.FastaStringExtractor(fasta_file)

In [5]:
target_interval = kipoiseq.Interval("chr20", 12118762, 12118762)
chr20_vcf = EnformerVCF.read_vcf("/Users/sabrinami/Desktop/2022-23/tutorials/enformer_pipeline_test/rn7_data/chr20.vcf.gz")
haplo1, haplo2 = EnformerVCF.vcf_to_seq(target_interval, '000789972A', chr20_vcf, fasta_extractor)
haplo1_enc = EnformerVCF.one_hot_encode("".join(haplo1))[np.newaxis]
haplo2_enc = EnformerVCF.one_hot_encode("".join(haplo2))[np.newaxis]

In [6]:
mean_haplo = (haplo1_enc + haplo2_enc) / 2
output = EnformerVCF.model.predict_on_batch(mean_haplo)
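
The check above feeds the model the average of the two one-hot encoded haplotypes rather than each haplotype on its own. The sketch below illustrates what that average looks like on a toy sequence. It is a dependency-free stand-in: `one_hot` and `average_haplotypes` are hypothetical helpers, and the A, C, G, T column order is an assumption that may not match `EnformerVCF.one_hot_encode`.

```python
# Hypothetical illustration only; not the EnformerVCF implementation.
BASES = "ACGT"  # assumed column order


def one_hot(seq):
    """Return one row per base, with 1.0 in that base's column (unknown bases give all zeros)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def average_haplotypes(seq_a, seq_b):
    """Average the one-hot encodings of two equal-length haplotype sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("haplotype sequences must have the same length")
    enc_a, enc_b = one_hot(seq_a), one_hot(seq_b)
    return [
        [(x + y) / 2 for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(enc_a, enc_b)
    ]


# Toy haplotypes that differ only at the third base (G vs. T).
mean_enc = average_haplotypes("ACGT", "ACTT")
print(mean_enc[0])  # homozygous A   -> [1.0, 0.0, 0.0, 0.0]
print(mean_enc[2])  # heterozygous G/T -> [0.0, 0.0, 0.5, 0.5]
```

Homozygous positions keep a single 1.0, while heterozygous positions split the weight evenly between the two alleles.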

In [7]:
print(human_prediction)
print(output['human'][0])

[[0.23258275 0.2962714  0.52013165 ... 0.19615567 1.1101408  0.25560504]
 [0.15570731 0.20205402 0.3755348  ... 0.04365927 0.24989623 0.08517855]
 [0.1536611  0.21689793 0.4510562  ... 0.05227472 0.2147567  0.08478698]
 ...
 [0.1794057  0.22463816 0.29514343 ... 0.01105995 0.02652512 0.03385386]
 [0.1694869  0.20448665 0.26207498 ... 0.01688805 0.04071837 0.06028533]
 [0.15269741 0.20196484 0.22278813 ... 0.02438667 0.03900523 0.05988767]]
[[0.23258275 0.2962714  0.52013165 ... 0.19615567 1.1101408  0.25560504]
 [0.15570731 0.20205402 0.3755348  ... 0.04365927 0.24989623 0.08517855]
 [0.1536611  0.21689793 0.4510562  ... 0.05227472 0.2147567  0.08478698]
 ...
 [0.1794057  0.22463816 0.29514343 ... 0.01105995 0.02652512 0.03385386]
 [0.1694869  0.20448665 0.26207498 ... 0.01688805 0.04071837 0.06028533]
 [0.15269741 0.20196484 0.22278813 ... 0.02438667 0.03900523 0.05988767]]


In [10]:
print("There are", sum(sum(human_prediction != output['human'][0])), "differences between the human heads and", sum(sum(mouse_prediction != output['mouse'][0])), "differences in the mouse heads.")

There are 0 differences between the human heads and 0 differences in the mouse heads.
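
Exact equality is a strict test for floating-point output. If a later run were to differ slightly, a summary like the one sketched below could help tell rounding noise from a real mismatch. `compare_predictions` is a hypothetical helper, not part of the pipeline, and the toy arrays stand in for the real prediction matrices.

```python
import numpy as np


def compare_predictions(a, b, rtol=1e-5, atol=1e-8):
    """Summarize how two non-empty prediction arrays of the same shape differ."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return {
        # Number of entries that are not bit-for-bit equal
        "n_mismatched": int(np.count_nonzero(a != b)),
        # Largest absolute difference over all entries
        "max_abs_diff": float(np.max(np.abs(a - b))),
        # True if every entry agrees within the given tolerances
        "allclose": bool(np.allclose(a, b, rtol=rtol, atol=atol)),
    }


# Toy arrays standing in for pipeline and direct-model predictions.
x = np.array([[0.25, 0.5], [1.0, 2.0]], dtype=np.float32)
y = x.copy()
y[1, 1] += 0.5  # introduce a single deliberate difference

same = compare_predictions(x, x)
diff = compare_predictions(x, y)
print(same)
print(diff)
```

Reporting the maximum absolute difference alongside the mismatch count makes it easier to judge whether a nonzero count matters.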


## Reference Test

```
conda activate enformer-predict-tools

python /Users/sabrinami/Github/enformer_epigenome_pipeline/enformer_predict.py --parameters /Users/sabrinami/Github/deep-learning-in-genomics/posts/2023-08-31-testing-multiple-heads-in-pipeline/local_test_reference.json

```

### Check Predictions

In [8]:
import h5py
import kipoiseq
import numpy as np
import EnformerVCF

# Predictions written by the pipeline for the reference sequence
f = h5py.File("/Users/sabrinami/Desktop/2022-23/tutorials/enformer_pipeline_test/predictions_folder/reference_enformer_rat_single_gene/predictions_2023-08-31/enformer_predictions/reference_enformer_rat/haplotype0/chr20_12118762_12118762_predictions.h5", "r")
human_prediction1 = f['human'][()]
mouse_prediction1 = f['mouse'][()]

# FASTA extractor used to recompute the predictions directly from the model
fasta_file = '/Users/sabrinami/Desktop/2022-23/tutorials/enformer_pipeline_test/rn7_data/rn7_genome.fasta'
fasta_extractor = EnformerVCF.FastaStringExtractor(fasta_file)

In [9]:
SEQUENCE_LENGTH = 393216
target_interval = kipoiseq.Interval("chr20", 12118762, 12118762)
sequence_one_hot = EnformerVCF.one_hot_encode(fasta_extractor.extract(target_interval.resize(SEQUENCE_LENGTH)))
output = EnformerVCF.model.predict_on_batch(sequence_one_hot[np.newaxis])
mouse_prediction2 = output['mouse'][0]
human_prediction2 = output['human'][0]
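
The `resize(SEQUENCE_LENGTH)` call above expands the zero-width TSS interval into a 393,216 bp window around it. As a rough sketch of the arithmetic, assuming a half-open window split evenly around the center (the exact off-by-one convention in `kipoiseq` may differ), the hypothetical helper below computes the window bounds.

```python
SEQUENCE_LENGTH = 393216  # input length used in the cell above


def centered_window(center, length=SEQUENCE_LENGTH):
    """Return (start, end) of a half-open window of `length` bases centered on `center`.

    Hypothetical helper for illustration; it does not clip at chromosome ends,
    so `start` can be negative for positions near the start of a chromosome.
    """
    start = center - length // 2
    end = start + length
    return start, end


start, end = centered_window(12118762)
print(start, end, end - start)  # 11922154 12315370 393216
```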

In [10]:
print("There are", sum(sum(human_prediction1 != human_prediction2)), "differences between human predictions and", sum(sum(mouse_prediction1 != mouse_prediction2)), "differences between mouse predictions.")

There are 0 differences between human predictions and 0 differences between mouse predictions.
