# Working with Loops
Let's kick it up another notch: we have six pilot samples, so let's run our analysis on all of them!

## Shell Variables
Assign the shell variables used in this notebook by sourcing the configuration file:

    source bioinf_intro_config.sh
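
If the config file has not been sourced in the current shell, variables such as `$TRIMMED` expand to empty strings, and the commands below quietly build wrong paths (for example `/27_MA_P_S38_L002_R1_001.trim.fastq.gz`). Here is a minimal sketch of a guard. It assumes the config defines the variable names used later in this notebook (`MYINFO`, `RAW_FASTQS`, `TRIMMED`, `GENOME_DIR`, `STAR_OUT`). `check_vars` is a hypothetical helper written for this example, not part of any tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from any library): report every named variable
# that is unset or empty, and return 1 if any were found.
check_vars() {
    local missing=0
    local VAR
    for VAR in "$@"
    do
        # ${!VAR} is Bash indirect expansion: the value of the variable named by $VAR
        if [[ -z "${!VAR:-}" ]]; then
            echo "NOT SET: ${VAR}"
            missing=1
        fi
    done
    return "$missing"
}

# The variable names this notebook relies on
check_vars MYINFO RAW_FASTQS TRIMMED GENOME_DIR STAR_OUT \
    || echo "Did you run: source bioinf_intro_config.sh ?"
```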

## A Brief journey into `for` loops
Most of the steps in our pipeline aren't so simple. To apply our pipeline to multiple sample files, we need to change things in multiple places. For example, just to run trimming with fastq-mcf, we need to change two things between runs: the input FASTQ and the output FASTQ. If we were working with paired-end reads, we would have to change four things (two inputs and two outputs). Doing this by hand is not only tedious, but error-prone. Doing almost the same thing over and over is something that people are bad at, but computers are very good at!

So let's get the computer to do the hard work. Because the Bash shell is almost magical (it is a full-fledged programming language), we can do this. We will use a `for` loop. This is analogous to how you would teach a child to set the table: "FOR each place at the table, put a plate, a napkin, a fork, a spoon, and a knife."
In shell-style pseudocode, you would phrase it like this:

    for PERSON in Alice Bob Carol Dave Eve
    do
        put plate at PERSON's place
        put napkin at PERSON's place
        put fork at PERSON's place
        put spoon at PERSON's place
        put knife at PERSON's place
    done

Here is a real example:

    for SAMPLE in A B C D E F
    do
        echo "______${SAMPLE}________"
    done

The `for` loop in Bash is conceptually the same as a "for each" loop in other programming languages, although the syntax differs. The `do` and `done` keywords are essential: `do` goes before the "loop body" (the commands to be repeated) and `done` goes after it. Each time through the loop, `SAMPLE` takes the next value from the list, and `${SAMPLE}` is replaced by that value.
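
The table-setting pseudocode translates almost directly into runnable Bash. This sketch nests a second `for` loop over the items instead of writing five separate lines, and collects each instruction in an array named `ACTIONS` (a name chosen just for this example) so the instructions can be printed and counted afterwards:

```shell
#!/usr/bin/env bash
# Collect one instruction per (person, item) pair.
ACTIONS=()
for PERSON in Alice Bob Carol Dave Eve
do
    for ITEM in plate napkin fork spoon knife
    do
        ACTIONS+=("put ${ITEM} at ${PERSON}'s place")
    done
done

printf '%s\n' "${ACTIONS[@]}"   # one instruction per line
echo "Total instructions: ${#ACTIONS[@]}"
```

The inner loop runs to completion for each person, so all five items go to Alice before the outer loop moves on to Bob.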

So let's try something almost useful:

    for SAMPLE in 27_MA_P_S38_L002_R1
    do
        echo "RUNNING SAMPLE: ${SAMPLE}"
    done
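
Typing all six sample names by hand reintroduces exactly the error-prone step we are trying to avoid. A common alternative is to derive the names from the files themselves, using a glob and Bash parameter expansion. This sketch assumes the raw files follow the `<SAMPLE>_001.fastq.gz` pattern seen in this notebook. It uses a throwaway directory in place of `$RAW_FASTQS`, and the second file name is made up for the demo:

```shell
#!/usr/bin/env bash
# Throwaway directory standing in for $RAW_FASTQS (demo only).
DEMO_DIR=$(mktemp -d)
touch "$DEMO_DIR/27_MA_P_S38_L002_R1_001.fastq.gz" \
      "$DEMO_DIR/HYPOTHETICAL_SAMPLE_R1_001.fastq.gz"   # made-up second sample

SAMPLES=()
for FASTQ in "$DEMO_DIR"/*_001.fastq.gz
do
    NAME=$(basename "$FASTQ")            # drop the directory part
    SAMPLES+=("${NAME%_001.fastq.gz}")   # drop the fixed suffix
done

for SAMPLE in "${SAMPLES[@]}"
do
    echo "RUNNING SAMPLE: ${SAMPLE}"
done

rm -r "$DEMO_DIR"   # clean up the demo files
```

If the glob matches nothing, Bash passes the pattern through literally (unless `shopt -s nullglob` is set), so it is worth printing the list before launching a long job.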

## Now for the real thing . . .
### Let's run fastq-mcf in a loop:

    for SAMPLE in 27_MA_P_S38_L002_R1
    do
        echo "TRIMMING: $SAMPLE"
        # arguments: adapter FASTA, input FASTQ, trimming options, output FASTQ (-o)
        fastq-mcf "$MYINFO/neb_e7600_adapters.fasta" \
            "$RAW_FASTQS/${SAMPLE}_001.fastq.gz" \
            -q 20 -x 0.5 \
            -o "$TRIMMED/${SAMPLE}_001.trim.fastq.gz"
    done
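
Before launching a real run over all samples, it can help to do a "dry run" that only prints the commands the loop would execute, so typos in paths show up immediately. In this sketch, the directories fall back to placeholder values when the config has not been sourced, and the command is built as text and printed rather than run. `DRY_RUN`, `ADAPTERS`, `IN_DIR` and `OUT_DIR` are names invented for this example:

```shell
#!/usr/bin/env bash
# Use the real directories if the config has been sourced, otherwise placeholders.
ADAPTERS="${MYINFO:-/path/to/myinfo}/neb_e7600_adapters.fasta"
IN_DIR="${RAW_FASTQS:-/path/to/raw_fastqs}"
OUT_DIR="${TRIMMED:-/path/to/trimmed}"

DRY_RUN=()
for SAMPLE in 27_MA_P_S38_L002_R1
do
    # Build the command as text and print it; nothing is executed here.
    CMD="fastq-mcf ${ADAPTERS} ${IN_DIR}/${SAMPLE}_001.fastq.gz -q 20 -x 0.5 -o ${OUT_DIR}/${SAMPLE}_001.trim.fastq.gz"
    echo "WOULD RUN: ${CMD}"
    DRY_RUN+=("$CMD")
done
```

Once the printed paths look right, run the real loop above with the full sample list.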

### Now let's do the same thing for STAR

    for SAMPLE in 27_MA_P_S38_L002_R1
    do
        echo "MAP: $SAMPLE"
        # align this sample's trimmed reads; GeneCounts also writes <prefix>ReadsPerGene.out.tab
        STAR \
            --runMode alignReads \
            --twopassMode None \
            --genomeDir "$GENOME_DIR" \
            --readFilesIn "$TRIMMED/${SAMPLE}_001.trim.fastq.gz" \
            --readFilesCommand gunzip -c \
            --outFileNamePrefix "${STAR_OUT}/${SAMPLE}_" \
            --quantMode GeneCounts
    done

Output:

    MAP: 27_MA_P_S38_L002_R1
    Jul 22 20:11:33 ..... started STAR run
    Jul 22 20:11:33 ..... loading genome
    Jul 22 20:11:34 ..... started mapping
    Jul 22 20:12:35 ..... finished successfully
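
If a trimmed file is missing (for example, because trimming failed for one sample), that sample's STAR run cannot succeed. This sketch shows one way to report and skip such samples with `continue`, so that the rest of the loop still runs. It uses a demo directory in place of `$TRIMMED`, a made-up second sample name, and an `echo` where the STAR command from the cell above would go:

```shell
#!/usr/bin/env bash
# Demo directory standing in for $TRIMMED: only the first sample has a file.
DEMO_TRIMMED=$(mktemp -d)
touch "$DEMO_TRIMMED/27_MA_P_S38_L002_R1_001.trim.fastq.gz"

READY=()
SKIPPED=()
for SAMPLE in 27_MA_P_S38_L002_R1 HYPOTHETICAL_MISSING_R1   # second name is made up
do
    INPUT="$DEMO_TRIMMED/${SAMPLE}_001.trim.fastq.gz"
    if [[ ! -f "$INPUT" ]]; then
        echo "SKIPPING ${SAMPLE}: ${INPUT} not found"
        SKIPPED+=("$SAMPLE")
        continue                  # jump straight to the next sample
    fi
    echo "MAP: ${SAMPLE}"         # the STAR command would go here
    READY+=("$SAMPLE")
done

rm -r "$DEMO_TRIMMED"   # clean up the demo files
```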


You can also confirm that the raw FASTQ file for this sample exists:

    ls /data/hts2018_pilot/Granek_4837_180427A5/27_MA_P_S38_L002_R1_001.fastq.gz


### And let's check the result

    ls ${STAR_OUT}

Output:

    27_MA_P_S38_L002_R1_Aligned.out.sam   27_MA_P_S38_L002_R1_ReadsPerGene.out.tab
    27_MA_P_S38_L002_R1_Log.final.out     27_MA_P_S38_L002_R1_SJ.out.tab
    27_MA_P_S38_L002_R1_Log.out           genome_Log.out
    27_MA_P_S38_L002_R1_Log.progress.out
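
With six samples, scanning `ls` output by eye gets tedious. This sketch loops over the samples and flags any that lack the two files we rely on here, `Log.final.out` and `ReadsPerGene.out.tab` (both visible in the listing above). It uses a demo directory in place of `${STAR_OUT}` and a made-up second sample name:

```shell
#!/usr/bin/env bash
# Demo directory standing in for ${STAR_OUT}; only the real sample gets (empty) files.
DEMO_STAR_OUT=$(mktemp -d)
touch "$DEMO_STAR_OUT/27_MA_P_S38_L002_R1_Log.final.out" \
      "$DEMO_STAR_OUT/27_MA_P_S38_L002_R1_ReadsPerGene.out.tab"

INCOMPLETE=()
for SAMPLE in 27_MA_P_S38_L002_R1 HYPOTHETICAL_SAMPLE_R1   # second name is made up
do
    for SUFFIX in Log.final.out ReadsPerGene.out.tab
    do
        if [[ ! -f "$DEMO_STAR_OUT/${SAMPLE}_${SUFFIX}" ]]; then
            echo "MISSING: ${SAMPLE}_${SUFFIX}"
            INCOMPLETE+=("$SAMPLE")
            break   # one missing file is enough to flag this sample
        fi
    done
done

echo "Samples needing attention: ${INCOMPLETE[*]:-none}"
rm -r "$DEMO_STAR_OUT"   # clean up the demo files
```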


    head ${STAR_OUT}/27_MA_P_S38_L002_R1_ReadsPerGene.out.tab

Output:

    N_unmapped	27178	27178	27178
    N_multimapping	44107	44107	44107
    N_noFeature	7990	1438633	11639
    N_ambiguous	108226	1155	480
    CNAG_04548	0	0	0
    CNAG_07303	0	0	0
    CNAG_07304	12	0	12
    CNAG_00001	0	0	0
    CNAG_07305	0	0	0
    CNAG_00002	51	0	51
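
According to the STAR manual, column 1 of `ReadsPerGene.out.tab` is the gene ID. Columns 2, 3 and 4 are counts for unstranded data, for reads whose 1st read strand aligns with the RNA (htseq-count `-s yes`), and for reads whose 2nd read strand aligns with the RNA (`-s reverse`). The four `N_` lines at the top are summary counters, not genes. This sketch uses `awk` to drop those lines and total each count column. Its input is only the ten rows shown above, copied into a temporary file, so the totals cover just those rows and not the whole file:

```shell
#!/usr/bin/env bash
# The ten rows displayed above, copied into a temporary file for the demo.
DEMO_TAB=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    N_unmapped 27178 27178 27178 \
    N_multimapping 44107 44107 44107 \
    N_noFeature 7990 1438633 11639 \
    N_ambiguous 108226 1155 480 \
    CNAG_04548 0 0 0 \
    CNAG_07303 0 0 0 \
    CNAG_07304 12 0 12 \
    CNAG_00001 0 0 0 \
    CNAG_07305 0 0 0 \
    CNAG_00002 51 0 51 > "$DEMO_TAB"

# Skip the N_ summary lines; sum columns 2 (unstranded), 3 and 4 over genes.
TOTALS=$(awk -F'\t' '$1 !~ /^N_/ { u += $2; s1 += $3; s2 += $4 }
    END { print u+0, s1+0, s2+0 }' "$DEMO_TAB")
echo "unstranded / 1st-strand / 2nd-strand totals: ${TOTALS}"

rm "$DEMO_TAB"   # clean up the demo file
```

Pointing the same `awk` command at `${STAR_OUT}/${SAMPLE}_ReadsPerGene.out.tab` inside a `for SAMPLE` loop would give per-sample totals. Comparing the column 3 and column 4 totals is one common way to judge which column matches a library's strandedness.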
