# Running MetaSIPSim simulations

Samuel Barnett

### Introduction

This notebook creates the MetaSIPSim configuration files and lists the commands used to run the simulations.


## 1) Initialization

First I import the Python modules I'll use, set some variables, and move into the working directory, checking that both it and the genome directory exist.

In [12]:
import os
workDir = '/home/sam/data/SIPSim2_data/RealWorld_study3/'
genomeDir = '/home/sam/databases/ncbi_genomes/ncbi-genomes-2019-01-25/'

nprocs = 20

In [13]:
import sys
import pandas as pd
import numpy as np
import ConfigParser


In [14]:
# checking that the directories exist
## working directory
if not os.path.isdir(workDir):
    print("Working directory does not exist!!!")
%cd $workDir

## genome directory
if not os.path.isdir(genomeDir):
    print("Genome directory does not exist!!!")
else:
    print(genomeDir)

/home/sam/data/SIPSim2_data/RealWorld_study3
/home/sam/databases/ncbi_genomes/ncbi-genomes-2019-01-25/


### Make new directories to store reads

In [None]:
# making directories for each community and read depth
lowreadDir = os.path.join(workDir, 'low_GC_skew')
medreadDir = os.path.join(workDir, 'medium_GC')
highreadDir = os.path.join(workDir, 'high_GC_skew')

for readDir in [lowreadDir, medreadDir, highreadDir]:
    for depth in ['depth5MM', 'depth10MM']:
        depthDir = os.path.join(readDir, depth)
        # check each depth directory separately so a partially created
        # tree still gets completed; makedirs also creates the parent
        if not os.path.isdir(depthDir):
            os.makedirs(depthDir)


## 2) Configuration file

Now I'll make the configuration files needed to run the MetaSIPSim simulations. In this case I need a separate file for each simulation (three communities × two read depths).

### Initial config file
This configuration file contains the settings shared by all simulations; I'll then modify a copy of it for each individual simulation. For each simulation I need to set separately:

* Genome index (genome_index_file)
* Community composition table (community_file)
* Incorporator table (incorporator_file)
* Read depth (final_number_of_sequences)
* Number of reads per iteration, which equals the read depth since I only run one iteration (number_of_sequences_per_iteration)
* Logfile name just to keep things straight (logfile) 


In [15]:
config = ConfigParser.SafeConfigParser()

## Other parameters
config.add_section('Other')
config.set('Other', 'temp_directory', './tmp')
config.set('Other', 'threads', str(nprocs))
#config.set('Other', 'logfile', 'simulation.log')
#config.set('Other', 'endpoint', 'fragment_list')
#config.set('Other', 'endpoint', 'read_list')
config.set('Other', 'endpoint', 'read_sequences')

## Library parameters
config.add_section('Library')
config.set('Library', 'library_list', '1, 2, 3, 4, 5, 6')

config.set('Library', 'window_or_fraction', 'window')
config.set('Library', 'min_bouyant_density_sequenced', '1.72')
config.set('Library', 'max_bouyant_density_sequenced', '1.77')

## Fragment parameters
config.add_section('Fragment')
#config.set('Fragment', 'genome_index_file', 'genome_index.txt')
config.set('Fragment', 'genomeDir', genomeDir)
config.set('Fragment', 'frag_length_distribution', 'skewed-normal,9000,2500,-5')
config.set('Fragment', 'coverage_of_fragments', '100')
config.set('Fragment', 'temp_fragment_file', 'tmp.frags')

## Gradient parameters
config.add_section('Gradient')
config.set('Gradient', 'temperature', '293.15')
config.set('Gradient', 'avg_density', '1.69')
config.set('Gradient', 'angular_velocity', '33172837')
config.set('Gradient', 'min_rotation_radius', '2.6')
config.set('Gradient', 'max_rotation_radius', '4.85')
config.set('Gradient', 'tube_angle', '28.6')
config.set('Gradient', 'tube_radius', '0.66')
config.set('Gradient', 'tube_height', '4.7')
config.set('Gradient', 'fraction_frag_in_DBL', '0.001')
config.set('Gradient', 'isotope', 'C')

## Model parameters
config.add_section('Model')
config.set('Model', 'min_bouyant_density', '1.67')
config.set('Model', 'max_bouyant_density', '1.775')
config.set('Model', 'bouyant_density_step', '0.0001')
config.set('Model', 'fraction_table_file', os.path.join(workDir, 'fractions.txt'))

## Community parameters
#config.add_section('Community')
#config.set('Community', 'community_file', 'full_comm.txt')
#config.set('Community', 'incorporator_file', 'incorporators.txt')


## Sequencing parameters
config.add_section('Sequencing')
config.set('Sequencing', 'max_read_length', '151')
config.set('Sequencing', 'avg_insert_size', '1000')
config.set('Sequencing', 'stddev_insert_size', '5')
#config.set('Sequencing', 'final_number_of_sequences', '10000000')
#config.set('Sequencing', 'number_of_sequences_per_iteration', '10000000')

# Writing the shared configuration to 'initial_parameters.cfg' in the working directory
with open('initial_parameters.cfg', 'wb') as configfile:
    config.write(configfile)
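
The gradient and model values above can be sanity-checked with a little arithmetic. The sketch below is written for Python 3's `configparser` (the notebook itself uses Python 2's `ConfigParser`), and `check_gradient_config` is a hypothetical helper. It assumes, without confirmation from the MetaSIPSim documentation, that `angular_velocity` holds ω² in rad²/s², in which case 33172837 corresponds to roughly 55,000 rpm. It also counts the density steps in the modeled range and checks that the sequenced window lies inside that range.

```python
import configparser
import math


def check_gradient_config(cfg):
    """Derive a few quantities from the Gradient, Model and Library sections.

    Assumption (unconfirmed): 'angular_velocity' is omega squared in rad^2/s^2.
    """
    omega_sq = cfg.getfloat('Gradient', 'angular_velocity')
    omega = math.sqrt(omega_sq)             # rad/s
    rpm = omega * 60.0 / (2.0 * math.pi)    # revolutions per minute

    # number of density steps across the modeled buoyant density range
    bd_min = cfg.getfloat('Model', 'min_bouyant_density')
    bd_max = cfg.getfloat('Model', 'max_bouyant_density')
    bd_step = cfg.getfloat('Model', 'bouyant_density_step')
    n_steps = int(round((bd_max - bd_min) / bd_step))

    # the sequenced window should fall inside the modeled range
    seq_min = cfg.getfloat('Library', 'min_bouyant_density_sequenced')
    seq_max = cfg.getfloat('Library', 'max_bouyant_density_sequenced')
    window_ok = bd_min <= seq_min < seq_max <= bd_max

    return {'rpm': rpm, 'n_steps': n_steps, 'window_ok': window_ok}


if __name__ == '__main__':
    cfg = configparser.ConfigParser(interpolation=None)
    # read() returns the list of files it parsed; skip if the file is absent
    if cfg.read('initial_parameters.cfg'):
        print(check_gradient_config(cfg))
```

With the values used here this gives about 55,000 rpm, 1050 density steps between 1.67 and 1.775, and a sequenced window (1.72 to 1.77) inside the modeled range.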

### Low GC community

5,000,000 reads

In [17]:
config = ConfigParser.SafeConfigParser()
config.read(os.path.join(workDir, 'initial_parameters.cfg'))


config.set('Other', 'logfile', 'low_GC_skew_simulation.log')
config.set('Fragment', 'genome_index_file', os.path.join(workDir, 'low_GC_skew_genome_index.txt'))

## Community parameters
config.add_section('Community')
config.set('Community', 'community_file', os.path.join(workDir, 'low_GC_skew_comm.txt'))
config.set('Community', 'incorporator_file', os.path.join(workDir, 'low_GC_skew_incorporators.txt'))

## Sequencing parameters
config.set('Sequencing', 'final_number_of_sequences', '5000000')
config.set('Sequencing', 'number_of_sequences_per_iteration', '5000000')

# Writing the configuration file to low_GC_skew/depth5MM/
with open(os.path.join(lowreadDir, 'depth5MM/low_GC_skew_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)
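
Because the six per-simulation cells are near copies of each other, comparing each written file against `initial_parameters.cfg` can catch copy-paste slips (for example a depth10MM config that still says 5000000). This is a minimal sketch using Python 3's `configparser`; `load_config` and `diff_configs` are hypothetical helpers, not part of MetaSIPSim.

```python
import configparser


def load_config(path):
    """Read a .cfg file with interpolation disabled, so values are taken verbatim."""
    cfg = configparser.ConfigParser(interpolation=None)
    if not cfg.read(path):
        raise IOError('could not read ' + path)
    return cfg


def diff_configs(base, sim):
    """Return {(section, option): (base_value, sim_value)} for every option in
    `sim` that is missing from `base` or has a different value there."""
    changes = {}
    for section in sim.sections():
        for option, value in sim.items(section):
            if base.has_option(section, option):
                old = base.get(section, option)
            else:
                old = None  # option (or whole section) added for this simulation
            if old != value:
                changes[(section, option)] = (old, value)
    return changes
```

For the cell above, `diff_configs(load_config('initial_parameters.cfg'), load_config('low_GC_skew/depth5MM/low_GC_skew_parameters.cfg'))` should report exactly the six per-simulation options listed earlier: logfile, genome_index_file, community_file, incorporator_file, final_number_of_sequences and number_of_sequences_per_iteration.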

10,000,000 reads

In [18]:
config = ConfigParser.SafeConfigParser()
config.read(os.path.join(workDir, 'initial_parameters.cfg'))


config.set('Other', 'logfile', 'low_GC_skew_simulation.log')
config.set('Fragment', 'genome_index_file', os.path.join(workDir, 'low_GC_skew_genome_index.txt'))

## Community parameters
config.add_section('Community')
config.set('Community', 'community_file', os.path.join(workDir, 'low_GC_skew_comm.txt'))
config.set('Community', 'incorporator_file', os.path.join(workDir, 'low_GC_skew_incorporators.txt'))

## Sequencing parameters
config.set('Sequencing', 'final_number_of_sequences', '10000000')
config.set('Sequencing', 'number_of_sequences_per_iteration', '10000000')

# Writing the configuration file to low_GC_skew/depth10MM/
with open(os.path.join(lowreadDir, 'depth10MM/low_GC_skew_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)

### Medium GC community
5,000,000 reads

In [19]:
config = ConfigParser.SafeConfigParser()
config.read(os.path.join(workDir, 'initial_parameters.cfg'))


config.set('Other', 'logfile', 'medium_GC_simulation.log')
config.set('Fragment', 'genome_index_file', os.path.join(workDir, 'medium_GC_genome_index.txt'))

## Community parameters
config.add_section('Community')
config.set('Community', 'community_file', os.path.join(workDir, 'medium_GC_comm.txt'))
config.set('Community', 'incorporator_file', os.path.join(workDir, 'medium_GC_incorporators.txt'))

## Sequencing parameters
config.set('Sequencing', 'final_number_of_sequences', '5000000')
config.set('Sequencing', 'number_of_sequences_per_iteration', '5000000')

# Writing the configuration file to medium_GC/depth5MM/
with open(os.path.join(medreadDir, 'depth5MM/medium_GC_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)

10,000,000 reads

In [20]:
config = ConfigParser.SafeConfigParser()
config.read(os.path.join(workDir, 'initial_parameters.cfg'))


config.set('Other', 'logfile', 'medium_GC_simulation.log')
config.set('Fragment', 'genome_index_file', os.path.join(workDir, 'medium_GC_genome_index.txt'))

## Community parameters
config.add_section('Community')
config.set('Community', 'community_file', os.path.join(workDir, 'medium_GC_comm.txt'))
config.set('Community', 'incorporator_file', os.path.join(workDir, 'medium_GC_incorporators.txt'))

## Sequencing parameters
config.set('Sequencing', 'final_number_of_sequences', '10000000')
config.set('Sequencing', 'number_of_sequences_per_iteration', '10000000')

# Writing the configuration file to medium_GC/depth10MM/
with open(os.path.join(medreadDir, 'depth10MM/medium_GC_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)

### High GC community
5,000,000 reads

In [21]:
config = ConfigParser.SafeConfigParser()
config.read(os.path.join(workDir, 'initial_parameters.cfg'))


config.set('Other', 'logfile', 'high_GC_skew_simulation.log')
config.set('Fragment', 'genome_index_file', os.path.join(workDir, 'high_GC_skew_genome_index.txt'))

## Community parameters
config.add_section('Community')
config.set('Community', 'community_file', os.path.join(workDir, 'high_GC_skew_comm.txt'))
config.set('Community', 'incorporator_file', os.path.join(workDir, 'high_GC_skew_incorporators.txt'))

## Sequencing parameters
config.set('Sequencing', 'final_number_of_sequences', '5000000')
config.set('Sequencing', 'number_of_sequences_per_iteration', '5000000')

# Writing the configuration file to high_GC_skew/depth5MM/
with open(os.path.join(highreadDir, 'depth5MM/high_GC_skew_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)

10,000,000 reads

In [22]:
config = ConfigParser.SafeConfigParser()
config.read(os.path.join(workDir, 'initial_parameters.cfg'))


config.set('Other', 'logfile', 'high_GC_skew_simulation.log')
config.set('Fragment', 'genome_index_file', os.path.join(workDir, 'high_GC_skew_genome_index.txt'))

## Community parameters
config.add_section('Community')
config.set('Community', 'community_file', os.path.join(workDir, 'high_GC_skew_comm.txt'))
config.set('Community', 'incorporator_file', os.path.join(workDir, 'high_GC_skew_incorporators.txt'))

## Sequencing parameters
config.set('Sequencing', 'final_number_of_sequences', '10000000')
config.set('Sequencing', 'number_of_sequences_per_iteration', '10000000')

# Writing the configuration file to high_GC_skew/depth10MM/
with open(os.path.join(highreadDir, 'depth10MM/high_GC_skew_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)
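
The six cells above differ only in the community prefix and the read depth, so they could be collapsed into a single loop. The sketch below shows one way to do that with Python 3's `configparser` (in the notebook's Python 2 environment, `ConfigParser.SafeConfigParser` and the `'wb'` file mode would be used instead). It assumes the directory layout and file-naming pattern used above; `write_sim_configs` is a hypothetical helper, not part of MetaSIPSim.

```python
import configparser
import os

# community prefixes and read depths used in this study
COMMUNITIES = ['low_GC_skew', 'medium_GC', 'high_GC_skew']
DEPTHS = {'depth5MM': 5000000, 'depth10MM': 10000000}


def write_sim_configs(work_dir, base_cfg='initial_parameters.cfg'):
    """Write one parameter file per community and read depth.

    Returns the list of config paths written.
    """
    written = []
    for prefix in COMMUNITIES:
        for depth_name, n_reads in DEPTHS.items():
            config = configparser.ConfigParser(interpolation=None)
            config.read(os.path.join(work_dir, base_cfg))

            config.set('Other', 'logfile', prefix + '_simulation.log')
            config.set('Fragment', 'genome_index_file',
                       os.path.join(work_dir, prefix + '_genome_index.txt'))

            # the Community section is absent from the initial config
            config.add_section('Community')
            config.set('Community', 'community_file',
                       os.path.join(work_dir, prefix + '_comm.txt'))
            config.set('Community', 'incorporator_file',
                       os.path.join(work_dir, prefix + '_incorporators.txt'))

            # one iteration: reads per iteration equals the final read depth
            config.set('Sequencing', 'final_number_of_sequences', str(n_reads))
            config.set('Sequencing', 'number_of_sequences_per_iteration', str(n_reads))

            out_dir = os.path.join(work_dir, prefix, depth_name)
            if not os.path.isdir(out_dir):
                os.makedirs(out_dir)
            out_path = os.path.join(out_dir, prefix + '_parameters.cfg')
            with open(out_path, 'w') as configfile:  # text mode in Python 3
                config.write(configfile)
            written.append(out_path)
    return written
```

Keeping the naming rule in one place means a new community or depth only needs one extra list or dict entry rather than another copied cell.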

## 3) Get fragments

Since these simulations take a while to run, I run them in a terminal rather than through the Jupyter notebook. Each command is run from the directory holding its configuration file (e.g. low_GC_skew/depth5MM/ or low_GC_skew/depth10MM/).

### Low GC community


#### SIP simulation
screen -L -S SIPSim2_low_GC_skew python /home/sam/notebooks/SIPSim_metagenome/bin/SIPSim_metagenome.py low_GC_skew_parameters.cfg

#### non-SIP simulation
screen -L -S SIPSim2_low_GC_skew python /home/sam/notebooks/SIPSim_metagenome/bin/nonSIP_metagenome.py low_GC_skew_parameters.cfg

### Medium GC community


#### SIP simulation
screen -L -S SIPSim2_medium_GC python /home/sam/notebooks/SIPSim_metagenome/bin/SIPSim_metagenome.py medium_GC_parameters.cfg

#### non-SIP simulation
screen -L -S SIPSim2_medium_GC python /home/sam/notebooks/SIPSim_metagenome/bin/nonSIP_metagenome.py medium_GC_parameters.cfg

### High GC community


#### SIP simulation
screen -L -S SIPSim2_high_GC_skew python /home/sam/notebooks/SIPSim_metagenome/bin/SIPSim_metagenome.py high_GC_skew_parameters.cfg

#### non-SIP simulation
screen -L -S SIPSim2_high_GC_skew python /home/sam/notebooks/SIPSim_metagenome/bin/nonSIP_metagenome.py high_GC_skew_parameters.cfg
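
The commands above are listed once per community, but each has to be launched from both the depth5MM and depth10MM directories. The sketch below (Python 3) prints every `cd ... && screen ...` line. The script paths and screen flags are copied from the commands above; the depth and SIP/nonSIP suffixes on the session names are an added assumption so that every session gets a distinct name, and `build_commands` is a hypothetical helper.

```python
import os

BIN_DIR = '/home/sam/notebooks/SIPSim_metagenome/bin'
COMMUNITIES = ['low_GC_skew', 'medium_GC', 'high_GC_skew']
DEPTHS = ['depth5MM', 'depth10MM']
SCRIPTS = {'SIP': 'SIPSim_metagenome.py', 'nonSIP': 'nonSIP_metagenome.py'}


def build_commands(work_dir):
    """Return one shell command per community, read depth and simulation type."""
    commands = []
    for prefix in COMMUNITIES:
        for depth in DEPTHS:
            run_dir = os.path.join(work_dir, prefix, depth)
            cfg = prefix + '_parameters.cfg'
            # sorted() puts 'SIP' before 'nonSIP' (uppercase sorts first)
            for sim_type, script in sorted(SCRIPTS.items()):
                session = 'SIPSim2_{0}_{1}_{2}'.format(prefix, depth, sim_type)
                commands.append('cd {0} && screen -L -S {1} python {2} {3}'.format(
                    run_dir, session, os.path.join(BIN_DIR, script), cfg))
    return commands


if __name__ == '__main__':
    for cmd in build_commands('/home/sam/data/SIPSim2_data/RealWorld_study3/'):
        print(cmd)
```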