# Running MetaSIPSim for the follow-up simulations

Sam Barnett

### Introduction

This notebook generates the configuration files and commands needed to run the follow-up simulations. These simulations are as follows:

* Varying number of incorporators: In these two simulations, I'll take the low GC skewed community and use either 25 or 100 incorporators per sample, for a total of 50 or 200 incorporators, respectively.

* Different buoyant density (BD) windows: In these simulations I'll use a lighter (1.70-1.75 g/ml) and a heavier (1.74-1.79 g/ml) BD window, alongside the previously used medium window (1.72-1.77 g/ml), with the high GC skewed community.

For both sets of follow-up simulations I will generate only 5,000,000 reads.


## 1) Initialization

First I need to import the Python modules I'll use, set some variables, and move into the working directory.

In [4]:
import os
workDir = '/home/sam/data/SIPSim2_data/RealWorld_study3/'
genomeDir = '/home/sam/databases/ncbi_genomes/ncbi-genomes-2019-01-25/'

nprocs = 20

In [5]:
import sys
import pandas as pd
import numpy as np
import ConfigParser


In [6]:
# checking that the directories exist
## working directory
if not os.path.isdir(workDir):
    print("Working directory does not exist!!!")
%cd $workDir

## genome directory
if not os.path.isdir(genomeDir):
    print("Genome directory does not exist!!!")
else:
    print(genomeDir)

/home/sam/data/SIPSim2_data/RealWorld_study3
/home/sam/databases/ncbi_genomes/ncbi-genomes-2019-01-25/


### Make new directories to store reads for the followup experiments

In [None]:
# making directories (each one is created only if it is missing,
# so an existing followup_sims directory no longer skips the subdirectories)
followupDir = os.path.join(workDir, 'followup_sims')
simNames = ['incorp25_lowGC', 'incorp100_lowGC', 'lightwindow_highGC',
            'mediumwindow_highGC', 'heavywindow_highGC']
for simName in simNames:
    simDir = os.path.join(followupDir, simName)
    if not os.path.isdir(simDir):
        os.makedirs(simDir)

## 2) Configuration file

Now I'll make the file containing all the configurations needed to run the MetaSIPSim simulations. I'll need a separate one for each simulation.

### Initial configuration file

This configuration file holds the settings that are the same across all simulations. I'll modify it afterwards for each individual simulation. For each individual simulation I need to set separately:

* Genome index (genome_index_file)
* Community composition table (community_file)
* Incorporator table (incorporator_file)
* Minimum and maximum buoyant densities for the sequenced window (min_bouyant_density_sequenced, max_bouyant_density_sequenced)
* Minimum and maximum buoyant densities for the gradient model (min_bouyant_density, max_bouyant_density)
* Logfile name, just to keep things straight (logfile)
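
The pattern used in the cells below (a shared base configuration plus a handful of per-simulation settings) can be sketched as a small helper. This is only an illustration: `apply_overrides` and the `medium_window` dictionary are hypothetical names, not part of MetaSIPSim, and only two of the shared settings are included. The import shim is there because this notebook runs on Python 2 (`ConfigParser`), while Python 3 names the module `configparser`.

```python
try:
    import configparser                    # Python 3
except ImportError:
    import ConfigParser as configparser    # Python 2, as used in this notebook


def apply_overrides(config, overrides):
    """Set per-simulation options on a parser, creating sections as needed.

    `overrides` maps section name -> {option: value}. Hypothetical helper.
    """
    for section, options in overrides.items():
        if not config.has_section(section):
            config.add_section(section)
        for option, value in options.items():
            config.set(section, option, str(value))
    return config


# Shared settings (only two shown here for brevity).
base = configparser.RawConfigParser()
base.add_section('Library')
base.set('Library', 'window_or_fraction', 'window')
base.add_section('Model')
base.set('Model', 'bouyant_density_step', '0.0001')

# Per-simulation settings, using the values of the "medium" BD window.
medium_window = {
    'Other': {'logfile': 'mediumwindow_highGC_simulation.log'},
    'Library': {'min_bouyant_density_sequenced': '1.72',
                'max_bouyant_density_sequenced': '1.77'},
    'Model': {'min_bouyant_density': '1.67',
              'max_bouyant_density': '1.80'},
}
apply_overrides(base, medium_window)
print(base.get('Library', 'max_bouyant_density_sequenced'))
```

Keeping the per-simulation values in one dictionary per simulation makes it easier to see at a glance which settings differ between runs.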

In [23]:
config = ConfigParser.SafeConfigParser()

## Other parameters
config.add_section('Other')
config.set('Other', 'temp_directory', './tmp')
config.set('Other', 'threads', str(nprocs))
#config.set('Other', 'logfile', 'simulation.log')
#config.set('Other', 'endpoint', 'fragment_list')
#config.set('Other', 'endpoint', 'read_list')
config.set('Other', 'endpoint', 'read_sequences')

## Library parameters
config.add_section('Library')
config.set('Library', 'library_list', '1, 2, 3, 4, 5, 6')

config.set('Library', 'window_or_fraction', 'window')
#config.set('Library', 'min_bouyant_density_sequenced', '1.72')
#config.set('Library', 'max_bouyant_density_sequenced', '1.77')

## Fragment parameters
config.add_section('Fragment')
#config.set('Fragment', 'genome_index_file', 'genome_index.txt')
config.set('Fragment', 'genomeDir', genomeDir)
config.set('Fragment', 'frag_length_distribution', 'skewed-normal,9000,2500,-5')
config.set('Fragment', 'coverage_of_fragments', '100')
config.set('Fragment', 'temp_fragment_file', 'tmp.frags')

## Gradient parameters
config.add_section('Gradient')
config.set('Gradient', 'temperature', '293.15')
config.set('Gradient', 'avg_density', '1.69')
config.set('Gradient', 'angular_velocity', '33172837')
config.set('Gradient', 'min_rotation_radius', '2.6')
config.set('Gradient', 'max_rotation_radius', '4.85')
config.set('Gradient', 'tube_angle', '28.6')
config.set('Gradient', 'tube_radius', '0.66')
config.set('Gradient', 'tube_height', '4.7')
config.set('Gradient', 'fraction_frag_in_DBL', '0.001')
config.set('Gradient', 'isotope', 'C')

## Model parameters
config.add_section('Model')
#config.set('Model', 'min_bouyant_density', '1.67')
#config.set('Model', 'max_bouyant_density', '1.775')
config.set('Model', 'bouyant_density_step', '0.0001')
config.set('Model', 'fraction_table_file', os.path.join(workDir, 'fractions.txt'))

## Community parameters
#config.add_section('Community')
#config.set('Community', 'community_file', 'full_comm.txt')
#config.set('Community', 'incorporator_file', 'incorporators.txt')


## Sequencing parameters
config.add_section('Sequencing')
config.set('Sequencing', 'max_read_length', '151')
config.set('Sequencing', 'avg_insert_size', '1000')
config.set('Sequencing', 'stddev_insert_size', '5')
config.set('Sequencing', 'final_number_of_sequences', '5000000')
config.set('Sequencing', 'number_of_sequences_per_iteration', '5000000')

# Writing the shared configuration file
with open(os.path.join(followupDir, 'initial_followupDir_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)
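
Before building the per-simulation files on top of this one, it can be worth reading the file back to confirm that the sections and values were written as intended. The following is a minimal sketch, assuming a throwaway temporary file (`roundtrip_check.cfg`, a made-up name) rather than the real `initial_followupDir_parameters.cfg`. It opens the file in text mode (`'w'`), which works on both Python 2 and Python 3; the `'wb'` mode used in the cell above only works with Python 2's `ConfigParser`.

```python
import os
import tempfile

try:
    import configparser                    # Python 3
except ImportError:
    import ConfigParser as configparser    # Python 2, as used in this notebook

# Build a tiny config with two of the shared settings from the cell above.
written = configparser.RawConfigParser()
written.add_section('Sequencing')
written.set('Sequencing', 'max_read_length', '151')
written.set('Sequencing', 'final_number_of_sequences', '5000000')

# Write to a temporary file (a stand-in for the real .cfg path).
tmp_dir = tempfile.mkdtemp()
cfg_path = os.path.join(tmp_dir, 'roundtrip_check.cfg')
with open(cfg_path, 'w') as handle:
    written.write(handle)

# Read it back and compare section by section.
reloaded = configparser.RawConfigParser()
reloaded.read(cfg_path)
for section in written.sections():
    assert dict(reloaded.items(section)) == dict(written.items(section))

print(sorted(reloaded.sections()))
```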

### Config files for varying number of incorporators

Simulation with 25 incorporators/sample

In [24]:
config = ConfigParser.SafeConfigParser()
config.read(os.path.join(followupDir, 'initial_followupDir_parameters.cfg'))

config.set('Other', 'logfile', 'incorp25_lowGC_simulation.log')
config.set('Fragment', 'genome_index_file', os.path.join(workDir, 'low_GC_skew_genome_index.txt'))

## Model parameters
config.set('Model', 'min_bouyant_density', '1.67')
config.set('Model', 'max_bouyant_density', '1.775')

## Library parameters
config.set('Library', 'min_bouyant_density_sequenced', '1.72')
config.set('Library', 'max_bouyant_density_sequenced', '1.77')

## Community parameters
config.add_section('Community')
config.set('Community', 'community_file', os.path.join(workDir, 'low_GC_skew_comm.txt'))
config.set('Community', 'incorporator_file', os.path.join(workDir, 'low_GC_skew_I25_incorporators.txt'))

# Writing the configuration file for the 25-incorporator simulation
with open(os.path.join(followupDir, 'incorp25_lowGC', 'incorp25_lowGC_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)

Simulation with 100 incorporators/sample

In [25]:
config = ConfigParser.SafeConfigParser()
config.read(os.path.join(followupDir, 'initial_followupDir_parameters.cfg'))

config.set('Other', 'logfile', 'incorp100_lowGC_simulation.log')
config.set('Fragment', 'genome_index_file', os.path.join(workDir, 'low_GC_skew_genome_index.txt'))

## Model parameters
config.set('Model', 'min_bouyant_density', '1.67')
config.set('Model', 'max_bouyant_density', '1.775')

## Library parameters
config.set('Library', 'min_bouyant_density_sequenced', '1.72')
config.set('Library', 'max_bouyant_density_sequenced', '1.77')

## Community parameters
config.add_section('Community')
config.set('Community', 'community_file', os.path.join(workDir, 'low_GC_skew_comm.txt'))
config.set('Community', 'incorporator_file', os.path.join(workDir, 'low_GC_skew_I100_incorporators.txt'))

# Writing the configuration file for the 100-incorporator simulation
with open(os.path.join(followupDir, 'incorp100_lowGC', 'incorp100_lowGC_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)

### Config files for different BD windows

Simulation with the light BD window (1.70-1.75 g/ml)

In [26]:
config = ConfigParser.SafeConfigParser()
config.read(os.path.join(followupDir, 'initial_followupDir_parameters.cfg'))

config.set('Other', 'logfile', 'lightwindow_highGC_simulation.log')
config.set('Fragment', 'genome_index_file', os.path.join(workDir, 'high_GC_skew_genome_index.txt'))

## Model parameters
config.set('Model', 'min_bouyant_density', '1.67')
config.set('Model', 'max_bouyant_density', '1.80')

## Library parameters
config.set('Library', 'min_bouyant_density_sequenced', '1.70')
config.set('Library', 'max_bouyant_density_sequenced', '1.75')

## Community parameters
config.add_section('Community')
config.set('Community', 'community_file', os.path.join(workDir, 'high_GC_skew_comm.txt'))
config.set('Community', 'incorporator_file', os.path.join(workDir, 'high_GC_skew_incorporators.txt'))

# Writing the configuration file for the light BD window simulation
with open(os.path.join(followupDir, 'lightwindow_highGC', 'lightwindow_highGC_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)

Simulation with the previously used "medium" BD window (1.72-1.77 g/ml)

In [28]:
config = ConfigParser.SafeConfigParser()
config.read(os.path.join(followupDir, 'initial_followupDir_parameters.cfg'))

config.set('Other', 'logfile', 'mediumwindow_highGC_simulation.log')
config.set('Fragment', 'genome_index_file', os.path.join(workDir, 'high_GC_skew_genome_index.txt'))

## Model parameters
config.set('Model', 'min_bouyant_density', '1.67')
config.set('Model', 'max_bouyant_density', '1.80')

## Library parameters
config.set('Library', 'min_bouyant_density_sequenced', '1.72')
config.set('Library', 'max_bouyant_density_sequenced', '1.77')

## Community parameters
config.add_section('Community')
config.set('Community', 'community_file', os.path.join(workDir, 'high_GC_skew_comm.txt'))
config.set('Community', 'incorporator_file', os.path.join(workDir, 'high_GC_skew_incorporators.txt'))

# Writing the configuration file for the medium BD window simulation
with open(os.path.join(followupDir, 'mediumwindow_highGC', 'mediumwindow_highGC_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)

Simulation with the heavy BD window (1.74-1.79 g/ml)

In [29]:
config = ConfigParser.SafeConfigParser()
config.read(os.path.join(followupDir, 'initial_followupDir_parameters.cfg'))

config.set('Other', 'logfile', 'heavywindow_highGC_simulation.log')
config.set('Fragment', 'genome_index_file', os.path.join(workDir, 'high_GC_skew_genome_index.txt'))

## Model parameters
config.set('Model', 'min_bouyant_density', '1.67')
config.set('Model', 'max_bouyant_density', '1.80')

## Library parameters
config.set('Library', 'min_bouyant_density_sequenced', '1.74')
config.set('Library', 'max_bouyant_density_sequenced', '1.79')

## Community parameters
config.add_section('Community')
config.set('Community', 'community_file', os.path.join(workDir, 'high_GC_skew_comm.txt'))
config.set('Community', 'incorporator_file', os.path.join(workDir, 'high_GC_skew_incorporators.txt'))

# Writing the configuration file for the heavy BD window simulation
with open(os.path.join(followupDir, 'heavywindow_highGC', 'heavywindow_highGC_parameters.cfg'), 'wb') as configfile:
    config.write(configfile)
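
The three window configurations above differ only in the sequenced BD range, so a quick consistency check is easy to write. The sketch below assumes (this is my assumption, not something documented by MetaSIPSim) that each sequenced window should lie inside the modeled gradient range. `check_window` is a hypothetical helper, and the values are copied from the three cells above.

```python
def check_window(model_min, model_max, seq_min, seq_max):
    """Return True if the sequenced BD window is ordered and lies
    within the modeled BD range (assumed requirement)."""
    return model_min <= seq_min < seq_max <= model_max


# (min, max) sequenced BD in g/ml, copied from the three cells above.
windows = {
    'lightwindow_highGC': (1.70, 1.75),
    'mediumwindow_highGC': (1.72, 1.77),
    'heavywindow_highGC': (1.74, 1.79),
}
model_range = (1.67, 1.80)  # min_bouyant_density, max_bouyant_density

results = {}
for name in sorted(windows):
    seq_min, seq_max = windows[name]
    results[name] = check_window(model_range[0], model_range[1],
                                 seq_min, seq_max)
    status = 'ok' if results[name] else 'OUT OF RANGE'
    print('{0}: {1}-{2} g/ml, {3}'.format(name, seq_min, seq_max, status))
```

This also shows why the model maximum is raised to 1.80 for these runs: with the 1.775 maximum used in the incorporator simulations, the heavy window (up to 1.79) would extend past the modeled range.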

## 3) Get fragments

Since these take a while to run, I'm going to run them in my terminal rather than through the Jupyter notebook. Each command is run from within the corresponding simulation directory (the one containing its configuration file).

### a) Simulating varying number of incorporators

#### i) 25 incorporators
##### SIP simulation
screen -L -S SIPSim2_incorp25 python /home/sam/notebooks/SIPSim_metagenome/bin/SIPSim_metagenome.py incorp25_lowGC_parameters.cfg
##### non-SIP simulation
screen -L -S SIPSim2_incorp25 python /home/sam/notebooks/SIPSim_metagenome/bin/nonSIP_metagenome.py incorp25_lowGC_parameters.cfg

#### ii) 100 incorporators
##### SIP simulation
screen -L -S SIPSim2_incorp100 python /home/sam/notebooks/SIPSim_metagenome/bin/SIPSim_metagenome.py incorp100_lowGC_parameters.cfg
##### non-SIP simulation
screen -L -S SIPSim2_incorp100 python /home/sam/notebooks/SIPSim_metagenome/bin/nonSIP_metagenome.py incorp100_lowGC_parameters.cfg

### b) Simulating different BD windows

#### i) light BD window
##### SIP simulation
screen -L -S SIPSim2_lightwindow python /home/sam/notebooks/SIPSim_metagenome/bin/SIPSim_metagenome.py lightwindow_highGC_parameters.cfg
##### non-SIP simulation
screen -L -S SIPSim2_lightwindow python /home/sam/notebooks/SIPSim_metagenome/bin/nonSIP_metagenome.py lightwindow_highGC_parameters.cfg

#### ii) medium BD window
##### SIP simulation
screen -L -S SIPSim2_mediumwindow python /home/sam/notebooks/SIPSim_metagenome/bin/SIPSim_metagenome.py mediumwindow_highGC_parameters.cfg
##### non-SIP simulation
screen -L -S SIPSim2_mediumwindow python /home/sam/notebooks/SIPSim_metagenome/bin/nonSIP_metagenome.py mediumwindow_highGC_parameters.cfg

#### iii) heavy BD window
##### SIP simulation
screen -L -S SIPSim2_heavywindow python /home/sam/notebooks/SIPSim_metagenome/bin/SIPSim_metagenome.py heavywindow_highGC_parameters.cfg
##### non-SIP simulation
screen -L -S SIPSim2_heavywindow python /home/sam/notebooks/SIPSim_metagenome/bin/nonSIP_metagenome.py heavywindow_highGC_parameters.cfg
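
Since the ten commands above differ only in the simulation name and the script, they could also be generated rather than typed by hand. Here is a rough sketch that only builds and prints the command strings (it does not launch anything). The `SIMULATIONS` list and `build_command` helper are illustrative names; the script paths, screen session names, and config file names are the ones listed above.

```python
BIN_DIR = '/home/sam/notebooks/SIPSim_metagenome/bin'

# (screen session suffix, simulation directory / config prefix)
SIMULATIONS = [
    ('incorp25', 'incorp25_lowGC'),
    ('incorp100', 'incorp100_lowGC'),
    ('lightwindow', 'lightwindow_highGC'),
    ('mediumwindow', 'mediumwindow_highGC'),
    ('heavywindow', 'heavywindow_highGC'),
]
# SIP simulation first, then the non-SIP simulation.
SCRIPTS = ['SIPSim_metagenome.py', 'nonSIP_metagenome.py']


def build_command(session, sim_name, script):
    """Assemble one screen command in the same form as those listed above."""
    return 'screen -L -S SIPSim2_{0} python {1}/{2} {3}_parameters.cfg'.format(
        session, BIN_DIR, script, sim_name)


commands = [build_command(session, sim_name, script)
            for session, sim_name in SIMULATIONS
            for script in SCRIPTS]

for command in commands:
    print(command)
```

Each printed command still needs to be run from inside its own simulation directory, since the config file is referenced by name only.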