# Building Software and Pipelines

To build (compile) software written in a compiled language, one usually relies on a so-called build system.

For software written in C, C++ or Fortran, a tool called **make** is normally used.

## Building Software
### Example 
(taken from "A Simple Makefile Tutorial" <http://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/> )

A small C program could consist of these three files:
* **hellofunc.c**:

```c
#include <stdio.h>
#include <hellomake.h>
void myPrintHelloMake(void) {
  printf("Hello makefiles!\n");
  return;
}
```

* **hellomake.c**:

```c
#include <hellomake.h>
int main() {
  // call a function in another file
  myPrintHelloMake();
  return(0);
}
```

* **hellomake.h**:

```c
/*
example include file
*/
void myPrintHelloMake(void);
```

This could then be built with the command:

```shell
$ gcc -o hellomake hellomake.c hellofunc.c -I.
```

This builds the executable **hellomake** (`-o hellomake`) from the source files `hellomake.c` and `hellofunc.c`, looking for further include files (also called header files) in the current directory (`-I.`).

While this works for small projects, the command for compiling a program consisting of dozens of files becomes very long and complicated, and it recompiles every file each time, even if only a single file has changed.


## Enter Make

The developer usually creates a Makefile, which describes the components and steps of the build process. When **make** runs, it reads the Makefile and builds the software based on the *targets* defined there.

A Makefile for the above example might look like this:

#### Makefile1
```Makefile
hellomake: hellomake.c hellofunc.c
	gcc -o hellomake hellomake.c hellofunc.c -I.

```
**Important:** The command lines in a Makefile must be indented with a tab character (not spaces)!

Now one can build `hellomake` with one command:

```shell
$ make -f Makefile1
make: 'hellomake' is up to date.
```

1. If the makefile were called just `Makefile` (not `Makefile1`), typing `make` would be enough.
2. Make notices that `hellomake` has already been built and is up to date, so it does nothing.


### Only compile the files that have changed

In `Makefile1` the first line defines `hellomake` as a **target** for which `hellomake.c` and `hellofunc.c` are dependencies. If the target does not already exist or at least one of the dependencies has a newer timestamp than the target, make will run the indented block of commands to create (build) the target.
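
The rebuild decision described above can be sketched in a few lines of Python. This is only an illustration of the timestamp comparison, not how make is actually implemented; the function name `needs_rebuild` is made up for this example, and it assumes all dependencies already exist on disk (real make would first try to build a missing dependency).

```python
import os

def needs_rebuild(target, dependencies):
    """Return True if `target` is missing or older than any dependency.

    Simplified sketch: every file in `dependencies` is assumed to exist.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild when at least one dependency was modified after the target.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

For the example above, `needs_rebuild("hellomake", ["hellomake.c", "hellofunc.c"])` would mirror the check make performs before running the `gcc` command.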

We can split the build process into pieces, creating a compiled object file from each of the .c files and linking these into the final `hellomake` executable.

#### Makefile2
```Makefile
hellomake: hellomake.o hellofunc.o 
	gcc -o hellomake hellomake.o hellofunc.o -I.
    
hellomake.o: hellomake.c
	gcc -c -o hellomake.o hellomake.c -I.

hellofunc.o: hellofunc.c
	gcc -c -o hellofunc.o hellofunc.c -I.

```

The `-c` option tells the C compiler to compile only: it produces the intermediate object file and skips the linking step.

In addition, we can introduce variables for our C compiler and compiler flags:

#### Makefile2b
```makefile
CC=gcc
CFLAGS=-I.

hellomake: hellomake.o hellofunc.o 
	$(CC) -o hellomake hellomake.o hellofunc.o $(CFLAGS)

hellomake.o: hellomake.c
	$(CC) -c -o hellomake.o hellomake.c $(CFLAGS)

hellofunc.o: hellofunc.c
	$(CC) -c -o hellofunc.o hellofunc.c $(CFLAGS)
```



We can avoid writing (and maintaining) a new target for every single object (.o) file that we want to create from a .c file by defining a general rule (a *pattern rule*):

#### Makefile3
```makefile
CC=gcc
CFLAGS=-I.
DEPS = hellomake.h

hellomake: hellomake.o hellofunc.o 
	$(CC) -o hellomake hellomake.o hellofunc.o $(CFLAGS)

%.o: %.c $(DEPS)
	$(CC) -c -o  $@  $<  $(CFLAGS)
```

* The line **`%.o: %.c $(DEPS)`** says: any target that ends in **`.o`** depends on a file with the same base name ending in **`.c`**, in addition to what is listed in the variable called **`DEPS`**.
* In the compiler command, the **`$@`** macro (an *automatic variable* in GNU make terms) is replaced with the full name of the target (before the `:`), and
* the **`$<`** macro is replaced with the first item of the dependency list (after the `:`).
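
To make the substitution concrete, here is a small Python sketch that expands such a rule for one target. It is a simplified model for illustration only: the function `expand_pattern_rule` is hypothetical, it handles a single `%` per pattern, and it only knows about `$@` and `$<`.

```python
def expand_pattern_rule(target, target_pattern, dep_pattern, extra_deps, recipe):
    """Expand a single-% pattern rule for `target`.

    Returns (dependencies, command), or None if the target does not match.
    """
    prefix, suffix = target_pattern.split("%", 1)
    if not (target.startswith(prefix) and target.endswith(suffix)):
        return None
    # The stem is the part of the target name matched by '%'.
    stem = target[len(prefix):len(target) - len(suffix)]
    if not stem:
        return None
    deps = [dep_pattern.replace("%", stem, 1)] + list(extra_deps)
    # $@ is the target name, $< is the first dependency.
    command = recipe.replace("$@", target).replace("$<", deps[0])
    return deps, command


deps, command = expand_pattern_rule(
    "hellofunc.o", "%.o", "%.c", ["hellomake.h"], "gcc -c -o $@ $< -I."
)
print(deps)     # ['hellofunc.c', 'hellomake.h']
print(command)  # gcc -c -o hellofunc.o hellofunc.c -I.
```

A target that does not end in `.o`, such as `hellomake`, does not match the pattern, so the sketch returns `None` for it.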



Following the **DRY** principle (**D**on't **R**epeat **Y**ourself), we can simplify a bit more by defining the list of objects that `hellomake` depends on in one place and using the **`$^`** macro, which is replaced by the full list of dependencies of a target:

#### Makefile4
```Makefile
CC=gcc
CFLAGS=-I.
DEPS = hellomake.h
OBJ= hellomake.o hellofunc.o

hellomake: $(OBJ) 
	$(CC) -o  $@  $^  $(CFLAGS)

%.o: %.c $(DEPS)
	$(CC) -c -o  $@  $<  $(CFLAGS)
```


Finally, we add a "phony" target called "clean" that deletes all objects and the executable:

#### Makefile
```Makefile
CC=gcc
CFLAGS=-I.
DEPS = hellomake.h
OBJ= hellomake.o hellofunc.o

hellomake: $(OBJ) 
	$(CC) -o  $@  $^  $(CFLAGS)

%.o: %.c $(DEPS)
	$(CC) -c -o  $@  $<  $(CFLAGS)

.PHONY: clean

clean:
	rm $(OBJ)
	rm hellomake
```

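The `.PHONY` declaration tells make that `clean` is not the name of a file to be built, so its commands run every time `make clean` is called, even if a file named `clean` happens to exist.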

```shell
$ make clean
rm hellomake.o hellofunc.o
rm hellomake

$ make
gcc -c -o  hellomake.o  hellomake.c  -I.
gcc -c -o  hellofunc.o  hellofunc.c  -I.
gcc -o  hellomake  hellomake.o hellofunc.o  -I.

```

## How can Make be used to automate an analysis workflow?

Imagine you have:

1. several files of raw data,
2. a script that processes the raw data and writes the processed data to a different file,
3. a script that generates a plot/figure/image from the processed data,
4. one or more LaTeX files and bibliography files for a thesis, report, manuscript, etc.

And you want to quickly regenerate your report whenever you get new data.
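
Such a workflow has the same structure as the C build: each output depends on some inputs. As a rough sketch, the dependencies could be written down as a graph and sorted into a valid build order. All file names below are hypothetical placeholders, not the ones used in this tutorial, and `graphlib` requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter

# Hypothetical file names: each key is a target, each value the set of
# files it depends on (just like "target: dependencies" in a Makefile).
dependencies = {
    "report.pdf": {"report.tex", "references.bib", "figure.png"},
    "figure.png": {"processed.csv", "plot.py"},
    "processed.csv": {"raw1.txt", "raw2.txt", "process.py"},
}

# static_order() yields every node after all of its dependencies;
# keep only the targets that actually have to be built.
build_order = [
    node
    for node in TopologicalSorter(dependencies).static_order()
    if node in dependencies
]
print(build_order)  # ['processed.csv', 'figure.png', 'report.pdf']
```

Make derives this kind of ordering from the targets and dependencies listed in a Makefile, and additionally skips every step whose target is already up to date.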


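Generate some data:

```python
import os
import numpy as np
import pandas as pd

# Work inside the report directory (assumed to exist already).
os.chdir("make_report")
# Added safeguard: np.savetxt does not create missing directories.
os.makedirs("data", exist_ok=True)

# Sample sin, cos and tan on [0, 2*pi) in steps of pi/100.
x = np.arange(0, 2*np.pi, np.pi/100)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.tan(x)

# Write each series to its own text file, one value per line.
np.savetxt("data/data1.txt", y1, delimiter=',')
np.savetxt("data/data2.txt", y2, delimiter=',')
np.savetxt("data/data3.txt", y3, delimiter=',')
```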