# Assembling contigs for follow-up simulations

Samuel Barnett

## Introduction
Assembling contigs from simulated reads using MEGAHIT. These reads come from the follow-up simulations. I will run a separate co-assembly for each experiment type and simulation test (6 libraries per assembly).

## 1) Initialization

First I need to import the Python modules I'll use, set some variables, and move into the working directory.

In [1]:
import os

# Directory holding the follow-up simulation reads
workDir = '/home/sam/data/SIPSim2_data/RealWorld_study3/followup_sims'
# Number of threads passed to MEGAHIT
nprocs = 10

In [2]:
if not os.path.isdir(workDir):
    print("Working directory does not exist!!!")
%cd $workDir

/home/sam/data/SIPSim2_data/RealWorld_study3/followup_sims
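
The check above only prints a warning and then attempts the `%cd` anyway. A minimal sketch of a stricter version, assuming I would rather stop the notebook than continue in the wrong place. The helper name `enter_workdir` is hypothetical, and it uses `os.chdir` in place of the `%cd` magic so that it also runs outside IPython:

```python
import os
import tempfile

def enter_workdir(path):
    """Change into `path`, raising instead of printing if it is missing."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Working directory does not exist: {path}")
    os.chdir(path)
    return os.getcwd()

# Demonstration with a temporary directory standing in for workDir.
demo_dir = tempfile.mkdtemp()
print(enter_workdir(demo_dir))
```

In the notebook itself, `enter_workdir(workDir)` would take the place of the `if`/`%cd` pair.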


## 2) Assembly

Co-assemblies will be done separately for each experiment type (SIP and nonSIP) and simulation test (25 incorporators, 100 incorporators, light window, medium window, and heavy window).
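
As a quick sanity check on the plan, here is a small sketch that lists the assemblies this design implies. The set and experiment names are the ones used in the assembly cell; the variable names `followup_sets`, `exp_types`, and `assembly_names` are only illustrative:

```python
from itertools import product

followup_sets = ['incorp25_lowGC', 'incorp100_lowGC',
                 'lightwindow_highGC', 'mediumwindow_highGC', 'heavywindow_highGC']
exp_types = ['SIP', 'nonSIP']

# One co-assembly per (follow-up set, experiment type) combination.
assembly_names = ['_'.join(pair) for pair in product(followup_sets, exp_types)]
for name in assembly_names:
    print(name)
```

Five simulation tests times two experiment types gives ten co-assemblies, each named the same way as the output directories created below.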

In [None]:
# Substring that identifies each experiment type's read files
exp_dict = {'SIP': 'window', 'nonSIP': 'nonSIP'}

# All co-assemblies are written under one directory
assemblyDir = os.path.join(workDir, 'coassembly')
if not os.path.exists(assemblyDir):
    os.makedirs(assemblyDir)

for followup_set in ['incorp25_lowGC', 'incorp100_lowGC',
                     'lightwindow_highGC', 'mediumwindow_highGC', 'heavywindow_highGC']:
    # MEGAHIT gets bare file names, so run it from the directory holding the reads
    subDir = os.path.join(workDir, followup_set)
    %cd $subDir

    for exp_type in ['SIP', 'nonSIP']:
        print(' '.join(['\nGenerating assembly command for followup simulation',
                        followup_set, exp_type, 'experiment\n']))

        # Forward and reverse read files for this experiment type
        F_filelist = [f for f in os.listdir(subDir) if 'f.fastq.gz' in f and exp_dict[exp_type] in f]
        R_filelist = [f for f in os.listdir(subDir) if 'r.fastq.gz' in f and exp_dict[exp_type] in f]

        # One output directory per follow-up set and experiment type
        outputDir = os.path.join(assemblyDir, '_'.join([followup_set, exp_type]))

        # Sort both lists so forward and reverse files stay in matching order
        cmd = ' '.join(['megahit',
                        '-1', ','.join(sorted(F_filelist)),
                        '-2', ','.join(sorted(R_filelist)),
                        '-t', str(nprocs),
                        '-m', '0.8',
                        '-o', outputDir])
        print(cmd)
        print('\n')
        os.system(cmd)

/home/sam/data/SIPSim2_data/RealWorld_study3/followup_sims/incorp25_lowGC

Generating assembly command for followup simulation incorp25_lowGC SIP experiment

megahit -1 library_1_window_1.72-1.77_reads_f.fastq.gz,library_2_window_1.72-1.77_reads_f.fastq.gz,library_3_window_1.72-1.77_reads_f.fastq.gz,library_4_window_1.72-1.77_reads_f.fastq.gz,library_5_window_1.72-1.77_reads_f.fastq.gz,library_6_window_1.72-1.77_reads_f.fastq.gz -2 library_1_window_1.72-1.77_reads_r.fastq.gz,library_2_window_1.72-1.77_reads_r.fastq.gz,library_3_window_1.72-1.77_reads_r.fastq.gz,library_4_window_1.72-1.77_reads_r.fastq.gz,library_5_window_1.72-1.77_reads_r.fastq.gz,library_6_window_1.72-1.77_reads_r.fastq.gz -t 10 -m 0.8 -o /home/sam/data/SIPSim2_data/RealWorld_study3/followup_sims/coassembly/incorp25_lowGC_SIP
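
The command above relies on the sorted forward and reverse lists lining up file by file. A minimal sketch of how that pairing could be checked before launching MEGAHIT, assuming mates differ only in the `f`/`r` before `.fastq.gz`. The helpers `paired_reads` and `megahit_args` are hypothetical, and the file listing is made up to mirror the names printed above:

```python
import shlex

def paired_reads(filenames, tag):
    """Return sorted forward and reverse read files whose names contain `tag`."""
    fwd = sorted(f for f in filenames if f.endswith('f.fastq.gz') and tag in f)
    rev = sorted(f for f in filenames if f.endswith('r.fastq.gz') and tag in f)
    # Each forward file should have a reverse mate differing only in the f/r suffix.
    expected_rev = sorted(f[:-len('f.fastq.gz')] + 'r.fastq.gz' for f in fwd)
    if rev != expected_rev:
        raise ValueError('Forward and reverse read files do not pair up')
    return fwd, rev

def megahit_args(fwd, rev, out_dir, threads=10, mem_fraction=0.8):
    """Build the MEGAHIT call as an argument list (same flags as the cell above)."""
    return ['megahit',
            '-1', ','.join(fwd),
            '-2', ','.join(rev),
            '-t', str(threads),
            '-m', str(mem_fraction),
            '-o', out_dir]

# Hypothetical listing modelled on the file names in the printed command.
names = [f'library_{i}_window_1.72-1.77_reads_{d}.fastq.gz'
         for i in range(1, 7) for d in ('f', 'r')]
fwd, rev = paired_reads(names, 'window')
args = megahit_args(fwd, rev, 'coassembly/incorp25_lowGC_SIP')
print(shlex.join(args))
```

This only builds and prints the command. Passing `args` to `subprocess.run(args, check=True)` would run it and raise on a non-zero exit, which `os.system` does not do. As far as I know, MEGAHIT also refuses to write into an output directory that already exists, so a rerun would need the old directory removed or a new `-o` value.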




In [4]:
print("done")

done
