---
title: Computing Haplotype Probabilities for 8K HS Rats
author: Sabrina Mi
date: 7/31/2025
---
# Preparing 8K rat genotypes with body length and BMI data

## Write samples file

First, we obtain the list of rats with body length and BMI data, then intersect it with the list of genotyped rats.

In [5]:
import pandas as pd

# Rename by column name, not position: read_csv ignores the order of usecols
pheno_cols = {
    'dissection:regressedlr_length_w_tail_cm': 'bodylen',
    'dissection:regressedlr_bmi_w_tail': 'bmi',
}
all_pheno = pd.read_csv("/home/s1mi/enformer_rat_data/phenotypes/ALLTRAITSALLNORMALIZES_19jul24.csv",
                        usecols=['rfid', *pheno_cols],
                        index_col='rfid').rename(columns=pheno_cols)
# Keep rats with a non-missing body length
pheno_rats = all_pheno[all_pheno['bodylen'].notna()].index




In [None]:
import pysam
vcf = pysam.VariantFile("/home/s1mi/enformer_rat_data/genotypes/ratgtex_v3_round10_5.rn7.vcf.gz")
geno_rats = list(vcf.header.samples)
samples = pheno_rats.intersection(geno_rats)
# with open("samples.txt", "w") as f:
    # f.write("\n".join(samples))
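
The intersect-and-write step can also be sketched without pandas or pysam. This is a minimal sketch, not the code used for this post: `write_sample_list` is a hypothetical helper, and the IDs in the usage comment are made up. It assumes a file with one ID per line is what the later filtering step expects.

```python
from pathlib import Path


def write_sample_list(pheno_ids, geno_ids, out_path):
    """Write IDs present in both lists to out_path, one per line.

    Output order follows pheno_ids; IDs are compared as strings.
    """
    geno_set = {str(s) for s in geno_ids}
    shared = [str(s) for s in pheno_ids if str(s) in geno_set]
    Path(out_path).write_text("\n".join(shared) + "\n")
    return shared


# Usage with made-up IDs (hypothetical, not real rat RFIDs):
# write_sample_list(["A1", "A2", "A3"], ["A3", "A1", "B9"], "samples.txt")
```

Comparing IDs as strings guards against the case where one source parses RFIDs as numbers and the other as text.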



I downloaded the PLINK [bed, bim, and fam files](https://library.ucsd.edu/dc/object/bb5610743d), then filtered them to the 8K rats with phenotype data.

```
curl -L -o bb5610743d.bed https://library.ucsd.edu/dc/object/bb5610743d/_2_1.bed/download
```

## Filter Samples

```
plink2 --bfile bb5610743d --keep samples.txt --export vcf bgz --out bodylen_bmi_8K_samples
bcftools index -t bodylen_bmi_8K_samples.vcf.gz
```
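
After filtering, it may be worth confirming that every requested rat made it into the exported VCF, since IDs that do not match the genotype files can be dropped without an obvious error. The sketch below uses only the standard library; `vcf_sample_ids` and `missing_samples` are hypothetical helpers. It assumes the VCF sample IDs are written exactly as they appear in `samples.txt`. If the export adds a family-ID prefix, the comparison would need adjusting.

```python
import gzip


def vcf_sample_ids(vcf_path):
    """Return the sample IDs from the #CHROM header line of a (b)gzipped VCF."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed VCF fields; samples follow.
                return line.rstrip("\n").split("\t")[9:]
    raise ValueError(f"no #CHROM header line found in {vcf_path}")


def missing_samples(samples_path, vcf_path):
    """Return requested IDs that do not appear in the VCF header."""
    with open(samples_path) as handle:
        requested = [line.strip() for line in handle if line.strip()]
    present = set(vcf_sample_ids(vcf_path))
    return [s for s in requested if s not in present]


# Usage (paths as in the commands above):
# missing_samples("samples.txt", "bodylen_bmi_8K_samples.vcf.gz")
```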

## Split VCF
```
mkdir -p ~/enformer_rat_data/genotypes/bodylen_bmi_VCFs
# Split VCF by chromosome
vcf_in=bodylen_bmi_8K_samples.vcf.gz

vcf_out_prefix=~/enformer_rat_data/genotypes/bodylen_bmi_VCFs/chr

for i in {1..20}
do
    echo "Working on chromosome ${i}..."
    bcftools view ${vcf_in} --regions ${i} -o ${vcf_out_prefix}${i}.vcf.gz -Oz
done


# Index VCFs
for i in {1..20}
do
    echo "Indexing chromosome ${i}..."
    bcftools index -t ${vcf_out_prefix}${i}.vcf.gz
done
```
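
The loop above passes bare numbers (`1` to `20`) to `--regions`, so it only extracts records if the VCF names its chromosomes the same way rather than `chr1`, `chr2`, and so on. A quick way to check, sketched here with hypothetical helpers `contig_ids` and `unmatched_regions`, is to read the `##contig` lines from the header. This assumes the header declares its contigs; if it does not, the first function returns an empty list and the check is uninformative.

```python
import gzip
import re


def contig_ids(vcf_path):
    """Return contig IDs declared in ##contig header lines of a (b)gzipped VCF."""
    ids = []
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # meta-information lines are over
            match = re.match(r"##contig=<ID=([^,>]+)", line)
            if match:
                ids.append(match.group(1))
    return ids


def unmatched_regions(vcf_path, regions):
    """Return region names that are not declared as contigs in the header."""
    declared = set(contig_ids(vcf_path))
    return [str(r) for r in regions if str(r) not in declared]


# Usage: an empty list means every region name matches a declared contig.
# unmatched_regions("bodylen_bmi_8K_samples.vcf.gz", range(1, 21))
```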

## Compute Haplotype Probabilities

```
conda activate genomics
cd ~/Github/deep-learning-in-genomics/posts/2023-11-22-qtl2-founder-haps-pbs-job-array
GENO_DIR=/Users/sabrinami/Desktop/HS_genotypes
DATA_DIR=/Users/sabrinami/Desktop/qtl2_data
for CHR in {1..20}
do
    python make_qtl2_inputs.py \
        $GENO_DIR/bodylen_bmi_VCFs/chr${CHR}.vcf.gz \
        $GENO_DIR/FounderVCFs/chr${CHR}.vcf.gz \
        --working-dir $DATA_DIR/chr${CHR}_qtl2_founder_haps \
        --gmap-dir $DATA_DIR/genetic_map
done
```
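
Because the loop runs once per chromosome, a missing input file only surfaces when the loop reaches it. A small pre-flight check, sketched below, lists the chromosomes whose sample or founder VCF is absent. `chromosomes_missing_inputs` is a hypothetical helper, and the directory layout is assumed from the paths in the command above.

```python
from pathlib import Path


def chromosomes_missing_inputs(geno_dir, chromosomes=range(1, 21)):
    """Map each chromosome to the input VCFs that are missing for it.

    Assumes the layout used above: <geno_dir>/bodylen_bmi_VCFs/chrN.vcf.gz
    and <geno_dir>/FounderVCFs/chrN.vcf.gz.
    """
    geno_dir = Path(geno_dir)
    missing = {}
    for chrom in chromosomes:
        expected = [
            geno_dir / "bodylen_bmi_VCFs" / f"chr{chrom}.vcf.gz",
            geno_dir / "FounderVCFs" / f"chr{chrom}.vcf.gz",
        ]
        absent = [str(p) for p in expected if not p.exists()]
        if absent:
            missing[chrom] = absent
    return missing


# Usage: an empty dict means all inputs are in place.
# chromosomes_missing_inputs("/Users/sabrinami/Desktop/HS_genotypes")
```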