## Data source

Download the data from the paper "Genetics of trans-regulatory variation in gene expression", eLife 2018;7:e35471, doi: 10.7554/eLife.35471. We use Additional files 1 (gene expression data), 2 (covariates), and 3 (genotypes), together with the hotspot table in `elife-35471-data8-v2.xlsx`.

In [4]:
%matplotlib notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import zipfile

# location of downloaded files
path = '/Home/ii/tomm/Projects/data-public/bayes-net-sysgen/eQTL_BYxRM/'

# relevant file names
f_expr = 'SI_Data_01_expressionValues.txt'
f_cov = 'elife-35471-data2-v2.xlsx'
f_geno = 'SI_Data_03_genotypes.txt'
f_hotspot = 'elife-35471-data8-v2.xlsx'
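Before loading anything, it can help to confirm that the files are actually present in the download folder. The following is a minimal sketch, not part of the original notebook: `missing_files` is a hypothetical helper, and a temporary directory stands in for the real download location.

```python
import os
import tempfile

def missing_files(directory, filenames):
    """Return the names in `filenames` that are not regular files in `directory`."""
    return [f for f in filenames if not os.path.isfile(os.path.join(directory, f))]

# Demo on a temporary directory (a stand-in for the real download folder).
expected = ['SI_Data_01_expressionValues.txt', 'SI_Data_03_genotypes.txt']
with tempfile.TemporaryDirectory() as tmp:
    # Create only the first file, so the second one is reported as missing.
    open(os.path.join(tmp, expected[0]), 'w').close()
    missing = missing_files(tmp, expected)

print(missing)
```

In the notebook, the call would be `missing_files(path, [f_expr, f_cov, f_geno, f_hotspot])`; an empty list means all four files were found.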

Read expression data

In [2]:
df_expr = pd.read_table(path + f_expr)

# truncate row names to match genotype data
df_expr.index = [x.split("-")[0] for x in df_expr.index]
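The effect of the truncation can be illustrated on a toy table. The sample IDs below are made up for illustration and the real files may follow a different pattern; the only assumption is that the part before the first hyphen is the identifier shared with the genotype data. Because truncation could in principle map two rows to the same ID, the sketch also checks that the new index is still unique.

```python
import pandas as pd

# Hypothetical sample IDs; the real expression file may look different.
expr_ids = ["A01_01-batch1", "A01_02-batch1", "A01_03-batch2"]
geno_ids = ["A01_01", "A01_02", "A01_03"]

toy = pd.DataFrame({"geneX": [0.1, -0.3, 0.7]}, index=expr_ids)

# Keep only the part of each row name before the first hyphen.
toy.index = toy.index.str.split("-").str[0]

# Truncation must not create duplicate row names.
index_is_unique = toy.index.is_unique
matches_genotypes = list(toy.index) == geno_ids
```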

Read covariate data

In [11]:
df_cov = pd.read_excel(path + f_cov, index_col='segregant')
# truncate row names to match genotype data
df_cov.index = [x.split("-")[0] for x in df_cov.index]

Read genotype data

In [12]:
df_geno = pd.read_table(path + f_geno)
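Since the row names were truncated specifically to match the genotype data, a quick consistency check across the tables may be worthwhile. This is a sketch on toy data, assuming that the rows of every table are samples (this has not been verified for the genotype file here); `shared_samples` is a hypothetical helper, not something defined in the notebook.

```python
import pandas as pd

def shared_samples(*frames):
    """Return the row labels that occur in the index of every frame."""
    common = frames[0].index
    for df in frames[1:]:
        common = common.intersection(df.index)
    return common

# Toy stand-ins for the expression, covariate and genotype tables.
toy_expr = pd.DataFrame({"geneX": [0.1, -0.3, 0.7]}, index=["s1", "s2", "s3"])
toy_cov = pd.DataFrame({"batch": [1, 1, 2]}, index=["s1", "s2", "s3"])
toy_geno = pd.DataFrame({"marker1": [1, -1]}, index=["s1", "s3"])

common = shared_samples(toy_expr, toy_cov, toy_geno)

# Samples with expression data that are missing from at least one other table.
only_in_expr = toy_expr.index.difference(common)
```

On the real data, `shared_samples(df_expr, df_cov, df_geno)` would be expected to have the same length as each table if the IDs line up.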

Read hotspot file

In [12]:
# keep only spreadsheet column A of the hotspot file
df_hotspot = pd.read_excel(path + f_hotspot, usecols="A")

Export to CSV

In [6]:
df_expr.to_csv('../../data/BYxRM_1000/elife-35471-expression.csv')

In [7]:
df_cov.to_csv('../../data/BYxRM_1000/elife-35471-covariates.csv')

In [18]:
df_geno.to_csv('../../data/BYxRM_1000/elife-35471-genotypes.csv')

In [13]:
df_hotspot.to_csv('../../data/BYxRM_1000/elife-35471-hotspots.csv')
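Two practical points about the export: `to_csv` does not create missing directories, and it writes the row index as the first column, so the index has to be restored explicitly when the file is read back. A minimal round-trip sketch on toy data, using a temporary directory in place of `../../data/BYxRM_1000/` (the file name `toy-expression.csv` is made up):

```python
import os
import tempfile
import pandas as pd

toy = pd.DataFrame({"geneX": [0.5, -1.25]}, index=["s1", "s2"])

with tempfile.TemporaryDirectory() as tmp:
    # Create the (nested) output directory before writing.
    out_dir = os.path.join(tmp, "data", "BYxRM_1000")
    os.makedirs(out_dir, exist_ok=True)

    out_file = os.path.join(out_dir, "toy-expression.csv")
    toy.to_csv(out_file)

    # index_col=0 turns the first CSV column back into the row index.
    restored = pd.read_csv(out_file, index_col=0)
```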

In [13]:
df_expr.shape

(1012, 5720)

The expression table has 1012 rows (segregants) and 5720 columns (genes).