# Assembling contigs

Samuel Barnett

### Introduction
Assembling contigs from simulated reads using MEGAHIT. I will run a separate co-assembly for each experiment type, genome set, and read depth, just like in a real experiment (6 libraries per assembly).

## 1) Initialization

First I need to import the Python modules I'll use, set some variables, and move into the working directory.

In [1]:
import os
workDir = '/home/sam/data/SIPSim2_data/RealWorld_study3/'
nprocs = 20

In [2]:
if not os.path.isdir(workDir):
    print("Working directory does not exist!!!")
%cd $workDir

/home/sam/data/SIPSim2_data/RealWorld_study3


## 2) Assembly

Co-assemblies will be done separately for each experiment type (SIP and nonSIP), reference genome set (lowGC, medGC, highGC), and sequencing depth (5MM and 10MM).
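As a quick sanity check on the design above, the combinations can be enumerated up front. This is only a sketch: the helper name `assembly_names` is hypothetical, and the labels are copied from the dictionaries used in the cell that follows.

```python
from itertools import product

# Directory name -> short label, as used in the assembly cell.
GENOME_SETS = {'low_GC_skew': 'lowGC',
               'medium_GC': 'medGC',
               'high_GC_skew': 'highGC'}
DEPTHS = {'depth5MM': '5MM',
          'depth10MM': '10MM'}
EXP_TYPES = ['SIP', 'nonSIP']


def assembly_names():
    """Return one output name per (genome set, depth, experiment type)."""
    return ['_'.join([GENOME_SETS[g], DEPTHS[d], e])
            for g, d, e in product(GENOME_SETS, DEPTHS, EXP_TYPES)]


names = assembly_names()
print(len(names), 'co-assemblies planned')
for name in names:
    print(name)
```

With 3 genome sets, 2 depths, and 2 experiment types, this should list 12 co-assemblies.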

In [None]:
genset_dict = {'low_GC_skew': 'lowGC', 
               'medium_GC': 'medGC', 
               'high_GC_skew': 'highGC'}
depth_dict = {'depth5MM': '5MM', 
              'depth10MM': '10MM'}
exp_dict = {'SIP': 'window', 'nonSIP': 'nonSIP'}

assemblyDir = os.path.join(workDir, 'coassembly')
if not os.path.exists(assemblyDir):
    os.makedirs(assemblyDir)

for genome_set in ['low_GC_skew', 'medium_GC', 'high_GC_skew']:
    for depth in ['depth5MM', 'depth10MM']:
        subDir = os.path.join(workDir, genome_set, depth)
        %cd $subDir
                
        for exp_type in ['SIP', 'nonSIP']:
            print(' '.join(['\nGenerating assembly command for', genset_dict[genome_set], 
                            depth_dict[depth], exp_type, 'experiment\n']))
            
            # Forward (f) and reverse (r) read files for this experiment type
            F_filelist = [f for f in os.listdir(subDir) if 'f.fastq.gz' in f and exp_dict[exp_type] in f]
            R_filelist = [f for f in os.listdir(subDir) if 'r.fastq.gz' in f and exp_dict[exp_type] in f]
            outputDir = '_'.join([genset_dict[genome_set], depth_dict[depth], exp_type])

            outputDir = os.path.join(assemblyDir, outputDir)

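            # Build the megahit command: comma-separated forward (-1) and reverse (-2)
            # read files, thread count, memory fraction, and output directory
            cmd = ' '.join(['megahit', 
                            '-1', ','.join(sorted(F_filelist)), 
                            '-2', ','.join(sorted(R_filelist)), 
                            '-t', str(nprocs),
                            '-m', '0.8',
                            '-o', outputDir])
            
            # Print the command before running it so it is logged even if the run fails
            print(cmd)
            status = os.system(cmd)
            if status != 0:
                print('megahit exited with non-zero status: ' + str(status))
            print('\n')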


/home/sam/data/SIPSim2_data/RealWorld_study3/low_GC_skew/depth5MM

Generating assembly command for lowGC 5MM SIP experiment
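Since each co-assembly depends on the forward and reverse file lists lining up, it may be worth checking the pairing before launching megahit. The following is a minimal sketch, not the method used above: the helper names `paired_fastqs` and `megahit_args` are hypothetical, and the example file names are made up. It assumes read files end in `f.fastq.gz` or `r.fastq.gz` and that mates share the same prefix.

```python
def paired_fastqs(filenames, tag):
    """Return sorted forward and reverse read files whose names contain `tag`.

    Assumes (hypothetically) that mates differ only in the final
    'f.fastq.gz' / 'r.fastq.gz' suffix.
    """
    fwd = sorted(f for f in filenames if f.endswith('f.fastq.gz') and tag in f)
    rev = sorted(f for f in filenames if f.endswith('r.fastq.gz') and tag in f)
    # Derive the expected mate of every forward file and compare with what was found.
    expected = [f[:-len('f.fastq.gz')] + 'r.fastq.gz' for f in fwd]
    if expected != rev:
        raise ValueError('forward and reverse read files do not pair up')
    return fwd, rev


def megahit_args(fwd, rev, out_dir, threads=20, memory='0.8'):
    """Build the megahit call as an argument list (no shell quoting needed)."""
    return ['megahit',
            '-1', ','.join(fwd),
            '-2', ','.join(rev),
            '-t', str(threads),
            '-m', str(memory),
            '-o', out_dir]


# Made-up file names, for illustration only.
files = ['lib2_window_r.fastq.gz', 'lib1_window_f.fastq.gz',
         'lib1_window_r.fastq.gz', 'lib2_window_f.fastq.gz',
         'lib1_nonSIP_f.fastq.gz', 'lib1_nonSIP_r.fastq.gz']
fwd, rev = paired_fastqs(files, 'window')
args = megahit_args(fwd, rev, 'coassembly/lowGC_5MM_SIP')
print(' '.join(args))
```

The argument list could then be passed to `subprocess.run(args, check=True)`, which raises an error if megahit exits with a non-zero status instead of continuing silently.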



In [4]:
print("done")

done
