# Calculating Multiple Fibonacci Sequences with LLMapReduce

When most people hear MapReduce, they usually think of Hadoop, Java, Big Data, and so on. MapReduce is actually a parallel programming model that has been around for a long time. It consists of two steps: a map step, where an operation is executed on a number of files in parallel (like the throughput example we just talked about), and a reduce step, where the output of the map step is combined into a single output.

We have a file containing multiple numbers that we'd like to use as inputs to our Fibonacci code. We can use LLMapReduce to execute our code on each of these inputs. For each input, the map step calculates the first `n` values in the sequence and writes them to a text file. The reduce step combines those individual text files into a single file. We then delete the intermediate files, since they are no longer needed.

The figure below shows an example of MapReduce for a word count problem. The map step reads multiple files and calculates a word count for each, and the reduce step merges those counts into an overall count. Although the problem is different, the process is similar.

<img src="images/MapReduce.png" alt="Map Reduce" style="height: 200px;"/>
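To make the two steps in the figure concrete, here is the word count example as a small serial bash sketch. It runs entirely in one shell, with no LLMapReduce or scheduler involved, and the two input files are hypothetical. The map loop counts the words in each file independently, and the reduce step merges the per-file counts with `awk`.

```bash
#!/bin/bash
# Serial word count split into a map step and a reduce step (illustration only).
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/input" "$workdir/mapped"

# Hypothetical input files
printf 'the cat sat\n' > "$workdir/input/a.txt"
printf 'the dog sat\n' > "$workdir/input/b.txt"

# Map: count the words in each file independently (one output file per input)
for f in "$workdir"/input/*.txt; do
    tr -s '[:space:]' '\n' < "$f" | grep -v '^$' | sort | uniq -c \
        > "$workdir/mapped/$(basename "$f").counts"
done

# Reduce: add up the per-file counts into a single overall count
awk '{ total[$2] += $1 } END { for (w in total) print w, total[w] }' \
    "$workdir"/mapped/*.counts | sort > "$workdir/wordcount.out"

cat "$workdir/wordcount.out"
```

Because no map iteration looks at another file's counts, the loop body could be handed to separate workers, which is the part a tool like LLMapReduce parallelizes.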

## MapReduce with LLMapReduce

By providing an input file, you can use LLMapReduce to run your executable on multiple inputs without having to edit and recompile your code. All you need to do is write two short wrapper bash scripts.

LLMapReduce is a command available on LLSC systems and the MIT SuperCloud. It is a bit like running a job array for the map step, followed by a serial job for the reduce step.

### Starting Point: Submission Script for Serial Job

For reference, here is the starting submission (wrapper) script.

```bash
#!/bin/bash

# Specify Input Number
INPUTNUM=10

echo "Input number: " $INPUTNUM

# Run the executable
../bin/fibonacci $INPUTNUM
```

### Identify Map and Reduce Steps

First, we want to identify what our map step is and what our reduce step will be. The map step is usually something you could write as a for loop whose iterations are independent of one another. The reduce step takes the results of the map step and produces a single result. Not every problem has a reduce step.

Here we have a single operation (our `fibonacci` executable) that we want to map across many inputs. This is our map step. The map step creates a file for each input, so we can use a reduce step to combine these files into a single output file. This is easy enough to do by hand after the job runs, but it's a nice thing to automate.
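As a rough illustration, the whole Fibonacci pipeline can be written as a serial loop in plain bash. This is a sketch, not what LLMapReduce runs internally: the `fibonacci` shell function is a hypothetical stand-in for `../bin/fibonacci`, and the input values, directory name, and output file names are made up. Each loop iteration depends only on its own input, which is what makes it a map step; the `cat` afterwards is the reduce step.

```bash
#!/bin/bash
# Serial sketch of the Fibonacci map/reduce pipeline (no LLMapReduce involved).
set -euo pipefail

# Hypothetical stand-in for ../bin/fibonacci: print the first n Fibonacci numbers
fibonacci() {
    local n=$1 a=0 b=1 i next
    for ((i = 0; i < n; i++)); do
        echo "$a"
        next=$((a + b))
        a=$b
        b=$next
    done
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 5 3 7 > "$workdir/inputs.txt"   # hypothetical input file
mkdir "$workdir/fib_intermediate"

# Map: iterations are independent of each other, so they could run in parallel
while read -r n; do
    fibonacci "$n" > "$workdir/fib_intermediate/fib_$n.out"
done < "$workdir/inputs.txt"

# Reduce: combine the per-input files into one output, then remove them
cat "$workdir"/fib_intermediate/* > "$workdir/fib_all.out"
rm -r "$workdir/fib_intermediate"

cat "$workdir/fib_all.out"
```

LLMapReduce takes over the loop: it splits the input file, runs the map calls across several workers, and then runs the reduce step once at the end.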

We'll start with our original submission script and rename it `mapper.sh`. For now, `reducer.sh` can be an empty bash script.

### Edit `mapper.sh`: Pass Arg1 to `fibonacci`

LLMapReduce takes the input file, splits it up for you, and calls your script, passing in two arguments: an input from your original file, and a filename where you can write out the result of the map step.

The original script defines an `$INPUTNUM` shell variable and passes it to our `fibonacci` executable. Instead, we want to pass in the first argument given to this script, which is available in the positional parameter `$1`. Note that we are also printing the first argument to the log file, which can help with debugging.

```bash
#!/bin/bash

# Print out args
echo "My arg 1: $1"

# Run the executable (quote $1 so the argument is passed through unchanged)
../bin/fibonacci "$1"
```

### Edit `mapper.sh`: Write Result of `fibonacci` to Arg2

Next, we want `mapper.sh` to write the output of the `fibonacci` code to the file named by the second argument, `$2`. Remember, this is the output file designated for this input. The `>` redirects the output of `../bin/fibonacci $1` to the file `$2`. Again, we print out the second argument to help with any debugging or troubleshooting we may have to do.

```bash
#!/bin/bash

# Print out args
echo "My arg 1: $1"
echo "My arg 2: $2"

# Run the executable and write its output to the file named by arg 2
../bin/fibonacci "$1" > "$2"
```
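Since LLMapReduce just calls `mapper.sh` with two arguments, you can do the same by hand before submitting anything. The sketch below does that in a throwaway directory so it runs anywhere: the `fibonacci` script it creates is a hypothetical stub standing in for the real `../bin/fibonacci`, the copy of `mapper.sh` has its comments removed, and the arguments `6` and `fib_6.out` are arbitrary. With the real executable you would simply run `bash mapper.sh 6 fib_6.out` from the directory containing `mapper.sh`.

```bash
#!/bin/bash
# Call mapper.sh by hand with the two arguments LLMapReduce would pass.
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/bin" "$root/run"

# Hypothetical stub for the real fibonacci executable: prints the first n values
cat > "$root/bin/fibonacci" <<'EOF'
#!/bin/bash
a=0; b=1
for ((i = 0; i < $1; i++)); do echo "$a"; next=$((a + b)); a=$b; b=$next; done
EOF
chmod +x "$root/bin/fibonacci"

# mapper.sh from above (comments omitted)
cat > "$root/run/mapper.sh" <<'EOF'
#!/bin/bash
echo "My arg 1: $1"
echo "My arg 2: $2"
../bin/fibonacci "$1" > "$2"
EOF

# Arg 1 is one input value, arg 2 is the output file for that value
cd "$root/run"
bash mapper.sh 6 fib_6.out
cat fib_6.out
```

The two `echo` lines go to the terminal (or, under LLMapReduce, to the log file), while only the `fibonacci` output lands in `fib_6.out`.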

### Test `mapper.sh`

Now might be a good time to make sure you are getting what you expect from the Map step. Go ahead and run:

```bash
LLMapReduce --mapper mapper.sh --input ../../../data/fibonacci/inputFile_10 --output fib_intermediate --np=4 --keep=true
```

When testing any parallel program, it's good to start with a small set of inputs; here we're using a file of 10 inputs. We're also using the `--keep=true` flag, which keeps the log files after the job completes. These log files are deleted by default.

Shortly after you run this, you should see two new directories: one called `fib_intermediate` and another with the prefix `MAPRED.`. The `fib_intermediate` directory should contain 10 files, one for each input in `inputFile_10`, and each file should contain the expected output of one `fibonacci` run. If it doesn't, check the log files in the `MAPRED.####/logs` directory for errors.
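Counting files by hand gets tedious with larger input files. The helper below is hypothetical (it is not part of LLMapReduce): it compares the number of non-empty lines in the input file with the number of files in the output directory, and lists any zero-byte output files, which usually point to a failed map task. It assumes one output file per input line, as described above.

```bash
#!/bin/bash
# Hypothetical helper: sanity-check the output of the map step.
# Usage: check_map_output INPUT_FILE OUTPUT_DIR
# Returns 0 if there is one output file per non-empty input line.
check_map_output() {
    local input_file=$1 out_dir=$2 n_inputs n_outputs
    n_inputs=$(grep -c . "$input_file" || true)   # non-empty input lines
    n_outputs=$(find "$out_dir" -maxdepth 1 -type f | wc -l)
    echo "inputs: $n_inputs, output files: $n_outputs"
    # Zero-byte outputs usually mean a map task failed; check the logs for these
    find "$out_dir" -maxdepth 1 -type f -empty -print
    [ "$n_inputs" -eq "$n_outputs" ]
}
```

For the test run above, you might call it as `check_map_output ../../../data/fibonacci/inputFile_10 fib_intermediate`.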

### Create `reducer.sh`

When `LLMapReduce` runs your `reducer.sh` script, it will again pass in two inputs:
- `$1`: the directory where the map step wrote its results
- `$2`: the name of a file to write any output of the reduce step

In our `reducer.sh` script, we first print out these two arguments, as we did in `mapper.sh`. Then we use the `cat` command to concatenate all the files in directory `$1`, and use `>` again to redirect the result to the output file, `$2`. Finally, we clean up after ourselves by removing the intermediate files. Feel free to comment out this last line the first few times you run, or whenever you need to inspect the intermediate files for debugging.

```bash
#!/bin/bash

# Echo out the inputs
echo "My arg 1: $1"
echo "My arg 2: $2"

# Combine all intermediate files into one output file
cat "$1"/* > "$2"

# Remove Intermediate Files
rm -r "$1"
```
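As with the mapper, you can try `reducer.sh` by hand on a directory you build yourself. In the sketch below the three intermediate files and their names are hypothetical (LLMapReduce chooses its own names), and the script is `reducer.sh` from above with the comments removed.

```bash
#!/bin/bash
# Run reducer.sh by hand on a small, hand-made intermediate directory.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# reducer.sh from above (comments omitted)
cat > reducer.sh <<'EOF'
#!/bin/bash
echo "My arg 1: $1"
echo "My arg 2: $2"
cat "$1"/* > "$2"
rm -r "$1"
EOF

# Hypothetical intermediate files, as if written by three map tasks
mkdir fib_intermediate
printf '0\n1\n' > fib_intermediate/fib_2.out
printf '0\n1\n1\n' > fib_intermediate/fib_3.out
printf '0\n1\n1\n2\n3\n5\n8\n13\n21\n34\n' > fib_intermediate/fib_10.out

# Arg 1 is the map output directory, arg 2 is the combined output file
bash reducer.sh fib_intermediate combined.out
cat combined.out
```

One design point worth knowing: `"$1"/*` expands in alphabetical order of file name, so `fib_10.out` is concatenated before `fib_2.out`. If the order of the combined output matters to you, sort the file names (for example numerically) before passing them to `cat`.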

### Call LLMapReduce

The final step is to call `LLMapReduce`:


```bash
LLMapReduce --mapper mapper.sh --reducer reducer.sh --input ../../../data/fibonacci/inputFile_10 --output fib_intermediate --np=4
```

After this runs, you should see a file called `llmapreduce.out`, which contains the result of your reduce step. If you want it to have a different name, you can replace `$2` in your reduce script with the name you want to give it.

If you would like to keep the log files, say for debugging, you can add the option `--keep=true`. Note that the intermediate files in `fib_intermediate` are still removed by the `rm -r` line in `reducer.sh`, so comment out that line as well if you want to keep them.

There are many options to `LLMapReduce`. You can see them with a short description by running `LLMapReduce -h` at the command line.

### Final Scripts

`mapper.sh`:
```bash
#!/bin/bash

# Print out args
echo "My arg 1: $1"
echo "My arg 2: $2"

# Run the executable and write its output to the file named by arg 2
../bin/fibonacci "$1" > "$2"
```

`reducer.sh`:
```bash
#!/bin/bash

# Echo out the inputs
echo "My arg 1: $1"
echo "My arg 2: $2"

# Combine all intermediate files into one output file
cat "$1"/* > "$2"

# Remove Intermediate Files
rm -r "$1"
```