# Converting fasta files to fastq for followup simulations

Samuel Barnett

### Introduction

The MetaSIPSim follow-up simulations generated reads in fasta format. I need them in fastq format, so I'll do the conversion here. This requires InSilicoSeq (https://github.com/HadrienG/InSilicoSeq).
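
For orientation, the main difference between the two formats is that each fastq record carries a per-base quality string in addition to the sequence. The sketch below only illustrates that record structure: it assigns a constant placeholder quality to every base, whereas the conversion in this notebook relies on an InSilicoSeq error model. The function name `fasta_to_fastq_text` and the placeholder quality character are assumptions made for this example.

```python
def fasta_to_fastq_text(fasta_text, qual_char='I'):
    """Convert FASTA text to FASTQ text using a constant placeholder quality.

    Illustration only: 'I' encodes Phred 40 in the Phred+33 scheme.
    """
    records = []
    header, seq_parts = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, ''.join(seq_parts)))
            header, seq_parts = line[1:], []
        else:
            # Sequence lines of one record may be wrapped; join them
            seq_parts.append(line)
    if header is not None:
        records.append((header, ''.join(seq_parts)))

    out = []
    for name, seq in records:
        # FASTQ record: @name, sequence, '+', one quality char per base
        out.extend(['@' + name, seq, '+', qual_char * len(seq)])
    return ('\n'.join(out) + '\n') if out else ''
```

Real quality strings vary by position and read direction, which is why the conversion below needs an error model and a direction argument.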

## 1) Initialization

First I need to import the Python modules I'll use, set some variables, and move into the working directory.

In [2]:
import os
workDir = '/home/sam/data/SIPSim2_data/RealWorld_study3/followup_sims/'
nprocs = 15

In [3]:
if not os.path.isdir(workDir):
    print("Working directory does not exist!!!")
%cd $workDir

/home/sam/data/SIPSim2_data/RealWorld_study3/followup_sims


## 2) Converting fasta to fastq

This conversion needs to be done for each individual read file and is direction dependent.
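
Direction dependence here means that forward and reverse read files must each be converted with the matching direction argument. A minimal sketch of how the direction can be inferred from the file name, assuming the `reads_f` / `reads_r` suffixes used by the read files in this dataset (the helper name `read_direction` is hypothetical):

```python
def read_direction(filename):
    """Return 'forward' or 'reverse' based on the read-file suffix.

    Accepts both compressed (.fasta.gz) and uncompressed (.fasta) names.
    Raises ValueError for names that match neither pattern, so an
    unexpected file cannot silently inherit a direction.
    """
    name = filename[:-3] if filename.endswith('.gz') else filename
    if name.endswith('reads_f.fasta'):
        return 'forward'
    if name.endswith('reads_r.fasta'):
        return 'reverse'
    raise ValueError('Cannot determine read direction for ' + filename)
```

Raising on an unrecognized name is a design choice: it makes a stray file in the directory fail loudly rather than be converted with the wrong direction.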

In [None]:
for followup_set in ['incorp25_lowGC', 'incorp100_lowGC',
                     'lightwindow_highGC', 'mediumwindow_highGC', 'heavywindow_highGC']:
    subDir = os.path.join(workDir, followup_set)
    %cd $subDir
    print(' '.join(['Converting read files in', subDir, '\n']))

    filelist = [f for f in os.listdir(subDir) if 'fasta.gz' in f]
    filelist.sort()
    for fastafile in filelist:
        # Determine read direction from the file name; skip anything else
        if fastafile.endswith('reads_f.fasta.gz'):
            direction = 'forward'
        elif fastafile.endswith('reads_r.fasta.gz'):
            direction = 'reverse'
        else:
            print('Skipping ' + fastafile + ': cannot determine read direction\n')
            continue

        # Unzip fasta file, keeping the compressed original (-k)
        cmd = 'pigz -d -k -p ' + str(nprocs) + ' ' + fastafile
        os.system(cmd)
        fastafile = fastafile.replace(".gz", "")

        # Convert fasta to fastq with NovaSeq model
        cmd = ' '.join(['python /home/sam/notebooks/SIPSim_metagenome/bin/fasta2fastq.py',
                        fastafile, direction,
                        '/home/sam/data/SIPSim2_data/ISS_error_models/NovaSeq 151 tmp',
                        str(nprocs)])
        print(cmd)
        os.system(cmd)

        # Cleanup: remove the unzipped fasta file
        cmd = 'rm ' + fastafile
        print(cmd + '\n')
        os.system(cmd)
    print('---\n')
print('------\n')


/home/sam/data/SIPSim2_data/RealWorld_study3/followup_sims/incorp25_lowGC
Converting read files in /home/sam/data/SIPSim2_data/RealWorld_study3/followup_sims/incorp25_lowGC 

python /home/sam/notebooks/SIPSim_metagenome/bin/fasta2fastq.py library_1_window_1.72-1.77_reads_f.fasta forward /home/sam/data/SIPSim2_data/ISS_error_models/NovaSeq 151 tmp 15
rm library_1_window_1.72-1.77_reads_f.fasta

python /home/sam/notebooks/SIPSim_metagenome/bin/fasta2fastq.py library_1_window_1.72-1.77_reads_r.fasta reverse /home/sam/data/SIPSim2_data/ISS_error_models/NovaSeq 151 tmp 15
rm library_1_window_1.72-1.77_reads_r.fasta

python /home/sam/notebooks/SIPSim_metagenome/bin/fasta2fastq.py library_2_window_1.72-1.77_reads_f.fasta forward /home/sam/data/SIPSim2_data/ISS_error_models/NovaSeq 151 tmp 15
rm library_2_window_1.72-1.77_reads_f.fasta

python /home/sam/notebooks/SIPSim_metagenome/bin/fasta2fastq.py library_2_window_1.72-1.77_reads_r.fasta reverse /home/sam/data/SIPSim2_data/ISS_error_models/N

rm nonSIP_library_1_reads_f.fasta

python /home/sam/notebooks/SIPSim_metagenome/bin/fasta2fastq.py nonSIP_library_1_reads_r.fasta reverse /home/sam/data/SIPSim2_data/ISS_error_models/NovaSeq 151 tmp 15
rm nonSIP_library_1_reads_r.fasta

python /home/sam/notebooks/SIPSim_metagenome/bin/fasta2fastq.py nonSIP_library_2_reads_f.fasta forward /home/sam/data/SIPSim2_data/ISS_error_models/NovaSeq 151 tmp 15


In [6]:
print('Done')

Done
