---
title: Testing the Enformer pipeline option to output both human and mouse head together
author: Sabrina Mi
date: 8/31/23
---

## Personalized Test

We chose `ENSRNOG00000054549`, centered at the TSS chr20:12118762.

```
module load conda
conda activate /lus/grand/projects/TFXcan/imlab/shared/software/conda_envs/enformer-predict-tools

python /home/s1mi/Github/enformer_epigenome_pipeline/enformer_predict.py --parameters /home/s1mi/Github/deep-learning-in-genomics/posts/2023-08-31-testing-multiple-heads-in-pipeline/local_test_personalized.json

```

In [11]:
import h5py
import numpy as np
f = h5py.File("/home/s1mi/Br_predictions/predictions_folder/personalized_enformer_rat_single_gene/predictions_2023-09-01/enformer_predictions/000789972A/haplotype0/chr20_12118762_12118762_predictions.h5", "r")
human_prediction = f['human'][()]
mouse_prediction = f['mouse'][()]
print(human_prediction)
print(mouse_prediction)

[[0.23274943 0.2965444  0.52038455 ... 0.19658908 1.1122763  0.25645596]
 [0.15580618 0.20222025 0.3758387  ... 0.04369472 0.2502851  0.08527476]
 [0.1537556  0.21700647 0.45132023 ... 0.05224136 0.21457893 0.08480939]
 ...
 [0.17944263 0.22469036 0.2953365  ... 0.01106308 0.02653556 0.03387162]
 [0.16953152 0.20453896 0.26224014 ... 0.01689028 0.04074223 0.06032879]
 [0.15270805 0.20201151 0.2229496  ... 0.02438593 0.03898373 0.05988253]]
[[0.07024036 0.5618067  0.39709392 ... 0.38007784 1.7112566  1.046375  ]
 [0.05916372 0.31586623 0.268665   ... 0.45626092 1.6155146  1.5257939 ]
 [0.06996798 0.21288775 0.19785598 ... 0.87312496 2.7714255  2.7897897 ]
 ...
 [0.06064176 0.09356464 0.12812886 ... 0.19675446 0.2861318  0.3974655 ]
 [0.05536072 0.11563224 0.1446574  ... 0.2367305  0.40365022 0.4892622 ]
 [0.05442411 0.12119047 0.14838645 ... 0.24355783 0.44523096 0.4760301 ]]


In [4]:
import EnformerVCF
import kipoiseq
fasta_file = '/home/s1mi/enformer_rat_data/reference_genome/rn7_genome.fasta'
fasta_extractor = EnformerVCF.FastaStringExtractor(fasta_file)

In [8]:
target_interval = kipoiseq.Interval("chr20", 12118762, 12118762)
chr20_vcf = EnformerVCF.read_vcf("/home/s1mi/enformer_rat_data/genotypes/BrainVCFs/chr20.vcf.gz")
haplo1, haplo2 = EnformerVCF.vcf_to_seq(target_interval, '000789972A', chr20_vcf, fasta_extractor)
haplo1_enc = EnformerVCF.one_hot_encode("".join(haplo1))[np.newaxis]
haplo2_enc = EnformerVCF.one_hot_encode("".join(haplo2))[np.newaxis]

In [12]:
mean_haplo = (haplo1_enc + haplo2_enc) / 2
output = EnformerVCF.model.predict_on_batch(mean_haplo)

In [23]:
print(human_prediction)
print(output['human'][0])

[[0.23274943 0.2965444  0.52038455 ... 0.19658908 1.1122763  0.25645596]
 [0.15580618 0.20222025 0.3758387  ... 0.04369472 0.2502851  0.08527476]
 [0.1537556  0.21700647 0.45132023 ... 0.05224136 0.21457893 0.08480939]
 ...
 [0.17944263 0.22469036 0.2953365  ... 0.01106308 0.02653556 0.03387162]
 [0.16953152 0.20453896 0.26224014 ... 0.01689028 0.04074223 0.06032879]
 [0.15270805 0.20201151 0.2229496  ... 0.02438593 0.03898373 0.05988253]]
[[0.2328074  0.29660922 0.5203649  ... 0.19664825 1.1117275  0.25627536]
 [0.15581839 0.20223683 0.3758044  ... 0.04368182 0.24998024 0.08520468]
 [0.15374774 0.21699725 0.4512595  ... 0.05223367 0.21435499 0.0847122 ]
 ...
 [0.17940459 0.22463025 0.29519853 ... 0.01106335 0.02654492 0.03387169]
 [0.16953702 0.20454739 0.26222825 ... 0.01689191 0.04074373 0.06033437]
 [0.15271626 0.20200802 0.22293249 ... 0.02438697 0.03900735 0.05990075]]


In [16]:
print("There are", sum(sum(human_prediction == output['human'][0])), "differences between the human heads and", sum(sum(mouse_prediction != output['mouse'][0]))), "differences in the mouse heads."

There are 4757949 differences between the human heads and 1471367


(None, 'differences in the mouse heads.')

## Reference Test

```
conda activate enformer-predict-tools

python /Users/sabrinami/Github/enformer_epigenome_pipeline/enformer_predict.py --parameters /Users/sabrinami/Github/deep-learning-in-genomics/posts/2023-08-31-testing-multiple-heads-in-pipeline/local_test_reference.json

```

### Check Predictions

In [8]:
import h5py
f = h5py.File("/Users/sabrinami/Desktop/2022-23/tutorials/enformer_pipeline_test/predictions_folder/reference_enformer_rat_single_gene/predictions_2023-08-31/enformer_predictions/reference_enformer_rat/haplotype0/chr20_12118762_12118762_predictions.h5", "r")
import kipoiseq 
from kipoiseq import Interval
import EnformerVCF
import numpy as np
fasta_file = '/Users/sabrinami/Desktop/2022-23/tutorials/enformer_pipeline_test/rn7_data/rn7_genome.fasta'
fasta_extractor = EnformerVCF.FastaStringExtractor(fasta_file)
human_prediction1 = f['human'][()]
mouse_prediction1 = f['mouse'][()]

In [9]:
SEQUENCE_LENGTH = 393216
target_interval = kipoiseq.Interval("chr20", 12118762, 12118762)
sequence_one_hot = EnformerVCF.one_hot_encode(fasta_extractor.extract(target_interval.resize(SEQUENCE_LENGTH)))
output = EnformerVCF.model.predict_on_batch(sequence_one_hot[np.newaxis])
mouse_prediction2 = output['mouse'][0]
human_prediction2 = output['human'][0]

In [10]:
print("There are", sum(sum(human_prediction1 != human_prediction2)), "differences between human predictions and", sum(sum(human_prediction1 != human_prediction2)), "differences between mouse predictions.")

There are 0 differences between human predictions and 0 differences between mouse predictions.
