# Tutorial

Simulate genes under a range of selection coefficients (from 0 to 799 in $2N_e$ units for the heterozygous state) and three demographic models (1-, 2-, and 3-epoch), with selection acting on a de novo mutation from 15 kya. Each scenario is run 125 times and the whole process is repeated 10 times, for a total of $800 \times 125 \times 10 \times 3 = 3 \times 10^6$ simulated genes.
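
The total can be checked with a few lines of arithmetic. The variable names below are only illustrative; the values are those quoted in the text.

```python
# Simulation budget described above (all values taken from the text).
n_coefficients = 800   # selection coefficients 0..799, in 2Ne units
n_replicates = 125     # simulations per scenario
n_repetitions = 10     # repetitions of the whole process
n_models = 3           # 1-, 2-, and 3-epoch demographic models

total = n_coefficients * n_replicates * n_repetitions * n_models
print(f"{total:,} simulated genes")
```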

In [None]:
# bash generate_data.sh

In [1]:
import os
import gzip

import numpy as np
import scipy.stats

import skimage.transform
from keras import models, layers, optimizers, regularizers
from keras.utils import to_categorical, plot_model

import matplotlib.pyplot as plt
import pymc3 # this will be removed
import pydot # optional

Using TensorFlow backend.


In [None]:
%run -i ImaGene.py

As an illustration, we use the 1-epoch model for the first repetition taking only 5 simulations per scenario.

Read simulations and store in object.

In [None]:
myfile = ImaFile(simulations_folder='/home/mfumagal/Data/ImaGene/Simulations1.Epoch1', nr_samples=128, model_name='Marth-3epoch-CEU')

In [None]:
mypop = myfile.read_simulations(parameter_name='selection_coeff_hetero', max_nrepl=1)

(optional) Save the description of the simulation files to a .json file.

In [None]:
import json

with open('/home/mfumagal/Data/ImaGene/simulations1.Epoch1.json', 'w') as fp:
    json.dump(mypop.description, fp, sort_keys=True, allow_nan=False, indent=4)

(optional) Save the original data object.

In [None]:
import pickle  # in Python 3 this already uses the C implementation when available

In [None]:
# to save, create the folder (no error if it already exists)
os.makedirs('/home/mfumagal/Data/ImaGene/Images1.Epoch1', exist_ok=True)

In [None]:
# save
with open('/home/mfumagal/Data/ImaGene/Images1.Epoch1/mypop','wb') as fp:
    pickle.dump(mypop, fp)

In [None]:
# to load:
with open('/home/mfumagal/Data/ImaGene/Images1.Epoch1/mypop','rb') as fp:
    mypop = pickle.load(fp)

Explore the object.

In [None]:
mypop.summary()

In [None]:
mypop.plot(0)
print(mypop.description[0])

Switch to major/minor allele polarisation.

In [None]:
mypop.majorminor()
mypop.plot(0)
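
To make the idea of major/minor polarisation concrete, here is a minimal NumPy sketch. It assumes a 0/1 matrix with haplotypes as rows and sites as columns, and recodes every column so that 1 marks the minor allele. The function name `to_major_minor` is hypothetical, and this is not necessarily how `majorminor()` is implemented in ImaGene.

```python
import numpy as np

def to_major_minor(haplotypes):
    """Recode a 0/1 matrix so that 1 marks the minor allele in every column."""
    haplotypes = np.asarray(haplotypes)
    # Frequency of allele 1 at each site (column).
    allele_freq = haplotypes.mean(axis=0)
    # Columns where allele 1 is the major allele must be flipped.
    flip = allele_freq > 0.5
    recoded = haplotypes.copy()
    recoded[:, flip] = 1 - recoded[:, flip]
    return recoded

# Toy example: 4 haplotypes, 3 sites; only the first column has allele 1 as major.
toy = np.array([[1, 0, 1],
                [1, 0, 0],
                [1, 1, 0],
                [0, 0, 0]])
recoded = to_major_minor(toy)
print(recoded)
```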

Filter out rare variants.

In [None]:
mypop.filter_freq(0.01)
mypop.plot(0)
mypop.data[0].shape
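
Conceptually, a frequency filter drops the columns (sites) whose minor allele frequency falls below a threshold, which changes the width of each image. The sketch below illustrates this on a 0/1 matrix. The function name `filter_by_frequency` and the choice of an inclusive threshold are assumptions, not ImaGene's documented behaviour.

```python
import numpy as np

def filter_by_frequency(haplotypes, min_freq):
    """Keep only the columns whose minor allele frequency is at least min_freq."""
    haplotypes = np.asarray(haplotypes)
    freq = haplotypes.mean(axis=0)
    # The minor allele frequency is the smaller of freq and 1 - freq.
    minor_freq = np.minimum(freq, 1 - freq)
    keep = minor_freq >= min_freq
    return haplotypes[:, keep]

# Toy example: 5 haplotypes, 4 sites; only the third site passes a 0.3 threshold.
toy = np.array([[1, 0, 0, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 0, 1],
                [0, 1, 0, 0]])
filtered = filter_by_frequency(toy, min_freq=0.3)
print(filtered.shape)
```

Because the number of retained columns differs between simulations, the images generally end up with different widths after filtering.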

In [None]:
mypop.summary()

In [None]:
# optional
with open('/home/mfumagal/Data/ImaGene/Images1.Epoch1/mypop_majorminor_filtered','wb') as fp:
    pickle.dump(mypop, fp)

Sort the images using different orderings, then resize them all to the same dimensions (128x128).
Finally, convert them to float NumPy arrays and shuffle them, always using the same permutation (`shuffle_index`).

In [None]:
shuffle_index = np.random.permutation(len(mypop.data))
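
Drawing the permutation once and reusing it keeps arrays that describe the same simulations aligned after shuffling. The sketch below shows this with plain NumPy arrays. The arrays `images` and `labels` are toy stand-ins, not attributes of the ImaGene object.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy stand-ins: 5 "images" of 2x2 pixels and one label per image.
images = np.arange(5 * 2 * 2).reshape(5, 2, 2)
labels = np.arange(5)

# One permutation, drawn once and applied to every array that must stay aligned.
shuffle_index = rng.permutation(len(images))
images_shuffled = images[shuffle_index]
labels_shuffled = labels[shuffle_index]

# Each image still sits next to its own label after shuffling.
for image, label in zip(images_shuffled, labels_shuffled):
    print(label, image[0, 0])
```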

In [None]:
# sort by rows
mypop.sort('rows_freq')
mypop.plot(0)

In [None]:
# sort by cols too
mypop.sort('cols_freq')
mypop.plot(0)
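
One plausible reading of frequency-based ordering is to group identical rows and place the most common ones first. Columns can be handled the same way by transposing. The sketch below implements that reading with NumPy. The helper `sort_rows_by_frequency` is hypothetical and ImaGene's own `sort` may differ in detail.

```python
import numpy as np

def sort_rows_by_frequency(matrix):
    """Group identical rows and put the most common ones first."""
    matrix = np.asarray(matrix)
    # Distinct rows and how many times each one occurs.
    uniques, counts = np.unique(matrix, axis=0, return_counts=True)
    # Most frequent first; a stable sort keeps ties in lexicographic order.
    order = np.argsort(-counts, kind="stable")
    # Rebuild the matrix by repeating each distinct row as often as it occurred.
    return np.repeat(uniques[order], counts[order], axis=0)

toy = np.array([[0, 1],
                [1, 1],
                [0, 1],
                [1, 0],
                [0, 1],
                [1, 1]])
sorted_rows = sort_rows_by_frequency(toy)
# Columns: sort the transposed matrix, then transpose back.
sorted_cols = sort_rows_by_frequency(toy.T).T
print(sorted_rows)
```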

In [None]:
# resize
mypop.resize((128, 128))
mypop.plot(0)
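
Resizing maps matrices of different shapes onto a common grid. As an illustration only, the sketch below uses nearest-neighbour index sampling in pure NumPy. The function `resize_nearest` is hypothetical, and the interpolation used by ImaGene's `resize` may well be different (the imports above suggest it relies on `skimage.transform`).

```python
import numpy as np

def resize_nearest(matrix, shape):
    """Resize a 2D array to `shape` by nearest-neighbour index sampling."""
    matrix = np.asarray(matrix)
    # For every output row/column, pick the source index it maps back to.
    rows = np.arange(shape[0]) * matrix.shape[0] // shape[0]
    cols = np.arange(shape[1]) * matrix.shape[1] // shape[1]
    return matrix[np.ix_(rows, cols)]

toy = np.array([[1, 2, 3],
                [4, 5, 6]])
enlarged = resize_nearest(toy, (4, 6))
print(enlarged)
```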

In [None]:
# once happy with data processing, convert image into float array and shuffle it
mypop.convert()
mypop.shuffle(shuffle_index)
mypop.plot(0)

In [None]:
# save
with open('/home/mfumagal/Data/ImaGene/Images1.Epoch1/mypop_sortedrowsfreq','wb') as fp:
    pickle.dump(mypop, fp, protocol=4)

In [None]:
mypop.classes

In [None]:
mypop.dimensions[0][0]

In [None]:
mynet = ImaNet(mypop.data)