# Bash for loops

## Intro

Like most programming languages, Bash has a flow control mechanism called a `for` loop for repeating a command or set of commands on multiple values or files.
Instead of executing the same statement over and over again...

```bash
echo 1
echo 2
echo 3
echo 4
echo 5
```

```
1
2
3
4
5
```


...you can write a `for` loop so that you only have to invoke the statement once.

```bash
for i in 1 2 3 4 5; do echo "$i"; done
```

```
1
2
3
4
5
```
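
Typing out every value by hand gets tedious for long ranges. Bash's brace expansion can generate the list for you: `{1..5}` is rewritten as the five words `1 2 3 4 5` before the loop runs. Below is a minimal sketch. Brace expansion is a Bash feature, so this assumes the script runs under `bash` and not a plain POSIX `sh`.

```bash
#!/usr/bin/env bash
# Bash expands {1..5} into the words 1 2 3 4 5 before the loop starts,
# so this loop behaves exactly like "for i in 1 2 3 4 5".
for i in {1..5}; do
    echo "$i"
done

# The expansion can also be seen on its own:
echo {1..5}    # prints: 1 2 3 4 5
```

The range endpoints must be literal numbers. Bash performs brace expansion before it substitutes variables, so `{1..$n}` is not expanded as a range.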


The example above shows how I typically write for loops on the command line: compressed onto a single line.
But if you're writing the loop in a script, you can improve readability by spreading it across multiple lines, as shown here.

```bash
for i in 1 2 3 4 5; do
    echo "$i"
done
```

```
1
2
3
4
5
```


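The first line of the loop names the loop variable (`i`) and lists the values it will take, one per iteration.
The indented body of the loop declares the operation(s) to perform on each input; inside the body, `$i` refers to the current value.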
The final line `done` indicates the end of the loop.

> *Note 1: the indentation of the loop body is optional but improves readability*

> *Note 2: it's common for the body of the loop to include multiple lines/operations on the input*
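
For example, the body can compute a value and then report it. Every line between `do` and `done` runs once per input, in order. This is a small illustrative sketch, and the variable name `square` is chosen only for illustration.

```bash
for i in 1 2 3; do
    square=$(( i * i ))              # first operation: compute i squared
    echo "$i squared is $square"     # second operation: report the result
done
# prints:
# 1 squared is 1
# 2 squared is 4
# 3 squared is 9
```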

The for loop above is a trivial example in a couple of ways.
First, using a for loop for 5 inputs doesn't save you *that* much time or typing.
Second, the "operation" we're performing on this input (printing the value to the terminal) is about as trivial as it gets.
However, the for loop is a huge time saver when you need to operate on dozens, hundreds, or even thousands of inputs, or when the operation you're performing takes minutes or hours instead of milliseconds.
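
The most common real-world inputs are files, which the intro mentions alongside plain values. A glob such as `*.txt` expands to every matching file name, and the loop body runs once per file. The sketch below works in a throwaway directory created with `mktemp -d`. The file names `sample1.txt` through `sample3.txt` are hypothetical and exist only for the demo.

```bash
#!/usr/bin/env bash
# Work in a throwaway directory so the demo never touches real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create three small example files (hypothetical names, for illustration only).
for name in sample1 sample2 sample3; do
    printf 'line 1\nline 2\n' > "$name.txt"
done

# The glob *.txt expands to every matching file name, in sorted order,
# and the body runs once per file. Quoting "$f" keeps names with spaces intact.
n=0
for f in *.txt; do
    echo "First line of $f: $(head -n 1 "$f")"
    n=$(( n + 1 ))
done
echo "Processed $n files"

# Clean up: return to the previous directory and delete the demo files.
cd - > /dev/null || exit 1
rm -r -- "$workdir"
```

If no file matches, Bash passes the pattern `*.txt` through literally, so the body runs once on a file that does not exist. Running `shopt -s nullglob` before the loop makes an unmatched glob expand to nothing instead.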

## Nested for loops

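A `for` loop can be placed inside the body of another `for` loop.
The inner loop runs to completion for every value of the outer loop, so each value of `i` is paired with each value of `j`.
As with a single loop, a nested loop can be compressed onto one line or spread across several lines for readability.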

```bash
for i in 1 2 3 4 5; do for j in a b c; do echo "${i}${j}"; done; done
```

```
1a
1b
1c
2a
2b
2c
3a
3b
3c
4a
4b
4c
5a
5b
5c
```


```bash
for i in 1 2 3 4 5; do
    for j in a b c; do
        echo "${i}${j}"
    done
done
```

```
1a
1b
1c
2a
2b
2c
3a
3b
3c
4a
4b
4c
5a
5b
5c
```
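
The inner loop runs in full for every value of the outer loop. The body of a nested loop therefore executes (number of outer values) × (number of inner values) times, which here is 5 × 3 = 15, one run per line of the output above. The sketch below counts the iterations instead of printing them. The variable `count` is introduced only for illustration.

```bash
count=0
for i in 1 2 3 4 5; do
    for j in a b c; do
        count=$(( count + 1 ))   # one increment per inner-loop iteration
    done
done
echo "$count"   # prints: 15
```

This multiplication is worth keeping in mind when each iteration is expensive. Adding a third nested loop over 10 values, for instance, would raise the total to 150 runs.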


## Exercise

To go here!

## Parallel execution with the for loop

Normally, a for loop performs operations on the input *sequentially*.
That is, it executes operation(s) in the body of the loop on the first input, then it moves to the second input, and so on until all inputs have been processed.
It is possible to have a for loop execute operations on all inputs simultaneously.

The example above is too trivial to make an effective demonstration, so we'll define a new operation here, which I have affectionately called `pipeline`.
It simply sleeps for 4 seconds and then prints `Pipeline $1 complete`, where `$1` is the first argument passed to the function (the input).

> Note: this example doesn't work well in the Jupyter notebook.
> I recommend running it in the Jupyter terminal.

```bash
pipeline() {
    sleep 4                        # stand-in for a slow task
    echo "Pipeline $1 complete"    # $1 is the function's first argument
}
```

If we run the for loop sequentially, as before, processing the 5 inputs should take about 20 seconds (5 inputs × 4 seconds each).

```bash
for i in 1 2 3 4 5; do pipeline "$i"; done
```

```
Pipeline 1 complete
Pipeline 2 complete
Pipeline 3 complete
Pipeline 4 complete
Pipeline 5 complete
```
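
You can check the timing by measuring the loop. The sketch below uses Bash's built-in `SECONDS` variable, which reports whole seconds elapsed since it was last assigned, so assigning 0 to it restarts the count. To keep the demo short, it defines a hypothetical copy of `pipeline` that sleeps for 1 second instead of 4. With 5 inputs, it should report about 5 seconds.

```bash
#!/usr/bin/env bash
# Hypothetical shortened copy of pipeline (1 s instead of 4 s) for a quick demo.
pipeline() {
    sleep 1
    echo "Pipeline $1 complete"
}

SECONDS=0                                   # restart Bash's elapsed-seconds counter
for i in 1 2 3 4 5; do pipeline "$i"; done  # runs one input after another
elapsed=$SECONDS
echo "Sequential loop took about $elapsed seconds"
```

Alternatively, prefixing the loop with Bash's `time` keyword prints the real, user and system time the loop took.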


But we can replace the second `;` (the one right after the call to `pipeline`) with `&`, which tells Bash to run that command in the background.
This lets the for loop move on to the second input before the first one has finished processing, and so on until processing has started for every input.
Since all inputs are then processed simultaneously, the whole run should take only about 4 seconds.

```bash
# Run this in a terminal; background jobs don't display well in a notebook cell
for i in 1 2 3 4 5; do pipeline "$i" & done
```
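
One subtlety: once every job has been launched in the background, the loop itself returns immediately. The `Pipeline ... complete` messages arrive afterwards, in whatever order the jobs happen to finish. In a script you usually want to pause until all the jobs are done, and the `wait` builtin does exactly that. The sketch below again assumes a hypothetical 1-second `pipeline` so it finishes quickly.

```bash
#!/usr/bin/env bash
# Hypothetical shortened copy of pipeline (1 s instead of 4 s) for a quick demo.
pipeline() {
    sleep 1
    echo "Pipeline $1 complete"
}

SECONDS=0
for i in 1 2 3 4 5; do
    pipeline "$i" &    # launch each call as a background job
done
wait                   # block until every background job has exited
elapsed=$SECONDS
echo "Parallel loop took about $elapsed seconds"
```

Because all five jobs sleep at the same time, the elapsed time should be close to 1 second, not the 5 seconds of the sequential version.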

> **IMPORTANT NOTE**: This approach should **NEVER** be used when the number of inputs exceeds the number of processors on your computer.
> If you have 100 inputs and 8 processors, executing a for loop in parallel will cause Bad Things™ to happen.
> In cases where the number of inputs exceeds the number of available processors, GNU `parallel` is the recommended solution.
> See `parallel.ipynb` for more details.
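
GNU `parallel` remains the recommended tool. If it isn't available, one plain-Bash workaround is to launch jobs in batches no larger than your processor count and `wait` for each batch before starting the next. This sketch is an illustration under stated assumptions, not the method from `parallel.ipynb`:

* `max_jobs` is a hypothetical setting hard-coded to 2. On Linux, `nproc` from GNU coreutils reports the number of available processors.
* `pipeline` is again a hypothetical 1-second version.

```bash
#!/usr/bin/env bash
# Hypothetical shortened copy of pipeline (1 s instead of 4 s) for a quick demo.
pipeline() {
    sleep 1
    echo "Pipeline $1 complete"
}

max_jobs=2    # hypothetical cap; set this to the number of processors you can use
running=0     # background jobs launched in the current batch
batches=0     # number of batches completed

for i in 1 2 3 4 5; do
    pipeline "$i" &
    running=$(( running + 1 ))
    # Once a full batch is running, wait for all of it before launching more.
    if (( running >= max_jobs )); then
        wait
        running=0
        batches=$(( batches + 1 ))
    fi
done

# Wait for the last, partially filled batch (only input 5 here).
if (( running > 0 )); then
    wait
    batches=$(( batches + 1 ))
fi
echo "Finished in $batches batches"   # prints: Finished in 3 batches
```

The drawback of batching is that each batch waits for its slowest job. GNU `parallel`, or the `wait -n` option added in Bash 4.3, can start a new job as soon as any single job finishes.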