# Week 3 - September 13 - Code reusability, argument parsing, configuration files

## Recap:

- A **script** is a `.py` file with a sequence of instructions that are executed each time the script is called.
- A **module** is a collection of related code (variables, functions, classes) saved in a `.py` file that can be imported and re-used.
- A **package** is a directory containing a collection of related modules. Each regular package (or subpackage) must have an `__init__.py` file.
- A **library** is a collection of packages, but the term is often used interchangeably with package, or as an umbrella term for reusable code.

You can `import` from the Python Standard Library, installed third-party libraries, or from your own user-defined modules/packages. You can also publish your packages to online repositories so others can use them.

***Remember:***
- Sets of instructions (especially those that are called several times) should be written inside functions for better code reusability.
- More generally, avoid writing code that isn't wrapped in a function, to avoid global variables.
- Functions (or other bits of code) that are called from several scripts should be written inside a module, so that only the module is imported in the different scripts (do not copy-and-paste your functions in the different scripts!).

***Note:* For this class, we will be writing a lot of the code to `.py` files, so that we can run them from the command line. This is not possible for Jupyter notebooks, so a lot of the code here will have to be copy-pasted into scripts when you are running it at home.**

### Docstrings

We have seen that multiline strings that are not assigned to a variable are discarded, and *can*, but shouldn't be used as comments. However, they *should* be used to document functions and classes.

In [None]:
def multiply(a, b):
    """Multiply two numbers
    
    Args:   a (int): First number
            b (int): Second number
            
    Returns:    int: Product of a and b
    """
    
    return a * b

The `help` function reads these docstrings

In [None]:
help(multiply)
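
`help` is not the only way to reach a docstring. As a minimal sketch, the text is also stored on the function's `__doc__` attribute, and `inspect.getdoc` from the Standard Library returns it with the leading indentation removed. The `multiply` function is repeated here (with a slightly tidied docstring layout) only so that the block runs on its own.

```python
import inspect


def multiply(a, b):
    """Multiply two numbers.

    Args:
        a (int): First number
        b (int): Second number

    Returns:
        int: Product of a and b
    """
    return a * b


# The raw docstring is stored on the function object itself.
raw_doc = multiply.__doc__

# inspect.getdoc returns the same text with the indentation cleaned up.
clean_doc = inspect.getdoc(multiply)

# Print only the one-line summary.
print(clean_doc.splitlines()[0])
```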

## Argument parsing

Once you've written your script/module, it would be ideal if you could run it without having to open Jupyter, Spyder, or any other IDE.

To do this, any user inputs, or parameters that are allowed to change, should not be hard-coded into your scripts.

We have already seen one way of receiving user input:

In [None]:
def get_name():
    """Gets user's name"""
    name = input("What is your name? ").strip().capitalize()
    
    # setting a default value if the user provides no input.
    if name == "":
        name = "user"
        
    return name


def main():
    """Says hello to the user"""
    name = get_name()
    print(f"\nHello, {name}!")
    

if __name__ == "__main__":
    main()

The same code above can be saved in a script (named `hello.py` here) and run from the command line with 

`python hello.py`

But this can get tedious when you have multiple inputs, and the code gets quite verbose once you include default values for every input.

In [None]:
from datetime import date

def get_name():
    """Gets user's name"""
    name = input("What is your name? ").strip().capitalize()
    
    # setting a default value if the user provides no input.
    if name == "":
        name = "user"
        
    return name


def get_birthyear():
    """Gets user's birth year"""
    year = input("What is your birthyear? ").strip()
    
    # setting a default value if the user provides no input.
    if year == "":
        year = 1970
        print("Using default birthyear of 1970.")
    else:
        year = int(year)
        
    return year


def main():
    """Says hello to the user"""
    name = get_name()
    
    birthyear = get_birthyear()
    age = date.today().year - birthyear
    
    print(f"\nHello, {name}! You are {age} years old today.")
    
    
if __name__ == "__main__":
    main()

Instead, it would be better to get these user inputs at the same time the script is run from the command line. Something like:

`python hello.py [name] [birthyear]`

In this case, `[name]` and `[birthyear]` are command line arguments, and Python can parse them with the **`argparse`** module, which is included in the Standard Library.

The code below can also be saved to a script. Let's name it `hello_arg.py`

In [None]:
import argparse
from datetime import date

def get_args():
    """Gets command line arguments"""
    
    # Create an ArgumentParser object
    parser = argparse.ArgumentParser()
    
    # Add arguments
    parser.add_argument("name", help="User's name")
    parser.add_argument("birthyear", type=int, help="User's birth year")
    
    # Parse arguments
    args = parser.parse_args()
    
    return args


def main():
    """Says hello to the user"""
    args = get_args()
    name = args.name.strip().capitalize()
    age = date.today().year - args.birthyear
    
    print(f"\nHello, {name}! You are {age} years old today.")


if __name__ == "__main__":
    main()

You can now pass the arguments in the command line as:

`python hello_arg.py name birthyear`

In this case, `name` and `birthyear` are positional arguments. Like positional arguments in a function call, they have to be passed in the correct order/position.
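
Since command line scripts cannot be run directly from a notebook cell, one way to try the parser out is to hand `parse_args` an explicit list of strings instead of letting it read the real command line. The sketch below assumes the same two positional arguments as `hello_arg.py`; the helper name `build_parser` and the values `"alice"` and `"1990"` are made up for illustration.

```python
import argparse
from datetime import date


def build_parser():
    """Builds a parser with two positional arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("name", help="User's name")
    parser.add_argument("birthyear", type=int, help="User's birth year")
    return parser


# Equivalent to running: python hello_arg.py alice 1990
args = build_parser().parse_args(["alice", "1990"])

# type=int has already converted the second argument from a string.
age = date.today().year - args.birthyear
print(f"Hello, {args.name.capitalize()}! You are {age} years old today.")
```

Swapping the two list entries would make `argparse` try to convert `"alice"` to an integer and exit with an error, which is the order dependence described above.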

Just like functions, you can also have optional arguments and give them default values. Optional arguments are denoted by `--`

In [None]:
import argparse
from datetime import date

def get_args():
    """Gets command line arguments"""
    
    # Create an ArgumentParser object
    parser = argparse.ArgumentParser()
    
    # Add OPTIONAL arguments
    parser.add_argument("--name", default="user", help="User's name")
    parser.add_argument("--birthyear", default=1970, type=int, help="User's birth year")
    
    # Parse arguments
    args = parser.parse_args()
    
    return args


def main():
    """Says hello to the user"""
    args = get_args()
    name = args.name.strip().capitalize()
    age = date.today().year - args.birthyear
    
    print(f"\nHello, {name}! You are {age} years old today.")


if __name__ == "__main__":
    main()

You can also provide a single-letter option for each argument using `-`.

In [None]:
import argparse
from datetime import date

def get_args():
    """Gets command line arguments"""
    
    # Create an ArgumentParser object
    parser = argparse.ArgumentParser()
    
    # Add arguments
    parser.add_argument("--name", "-n", default="user", help="User's name")
    parser.add_argument("--birthyear", "-y", default=1970, type=int, help="User's birth year")
    
    # Parse arguments
    args = parser.parse_args()
    
    return args


def main():
    """Says hello to the user"""
    args = get_args()
    name = args.name.strip().capitalize()
    age = date.today().year - args.birthyear
    
    print(f"\nHello, {name}! You are {age} years old today.")


if __name__ == "__main__":
    main()
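
As a quick check of how the optional arguments behave, the sketch below passes explicit lists to `parse_args` rather than reading the real command line. The values `"alice"` and `"1990"` are made up for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--name", "-n", default="user", help="User's name")
parser.add_argument("--birthyear", "-y", default=1970, type=int, help="User's birth year")

# Long and short forms can be mixed, and optional arguments can come in any order.
given = parser.parse_args(["-y", "1990", "--name", "alice"])

# Options that are left out fall back to their defaults.
defaults = parser.parse_args([])

print(given.name, given.birthyear)
print(defaults.name, defaults.birthyear)
```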

You don't have to memorize anything. When you use `argparse`, some documentation is automatically created for you. You can access it with the `--help` or `-h` flag.

`python hello_arg.py -h`

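which will print out the following (on Python 3.10 and later, the heading reads `options:` instead of `optional arguments:`):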
```
usage: hello_arg.py [-h] [--name NAME] [--birthyear BIRTHYEAR]

optional arguments:
  -h, --help            show this help message and exit
  --name NAME, -n NAME  User's name
  --birthyear BIRTHYEAR, -y BIRTHYEAR
                        User's birth year
```

There is much more functionality in `parser.add_argument` that I will let you discover yourself. For example:
- `nargs` will let you specify how many inputs each argument should get. For multiple inputs, you will get a list of all the values.
- `action` will let you specify how the input is stored. We saw the default (`store`) above, but you can also store a `True` or `False` value to create a flag (`store_true`, `store_false`), or store a fixed value, such as the function `sum`, with `store_const`.
- `const` holds the constant value that actions like `store_const` store, regardless of input.
- `metavar` changes the name shown for the input in the help text, and `dest` changes the name of the attribute the value is stored under.

In [None]:
import argparse

def get_args():
    """Gets command line arguments"""
    
    # Create an ArgumentParser object
    parser = argparse.ArgumentParser()
    
    # Add argument with any number of inputs
    parser.add_argument("numbers", nargs="*", type=int, help="Numbers to be added")
    
    # Add argument as a flag
    parser.add_argument("--verbose", action="store_true", default=False, help="Flag for verbose print")
    
    # Parse arguments
    args = parser.parse_args()
    
    return args


def main():
    """Prints the sum of the numbers given on the command line"""
    args = get_args()
    numbers_sum = sum(args.numbers)
    
    if args.verbose:
        print(f"\nThe sum of all the entered numbers is {numbers_sum}.")
    else:
        print(f"\nSum = {numbers_sum}")


if __name__ == "__main__":
    main()
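
The script above covers `nargs` and the `store_true` action. The remaining options from the list (`const`, `metavar`, and `dest`) can be sketched as follows. The option names `--out` and `--max`, the attribute names, and the input values are all made up for illustration, and the arguments are passed as an explicit list so the block can run outside the command line.

```python
import argparse

parser = argparse.ArgumentParser()

# dest: the value is stored as args.output_path instead of args.out.
# metavar: the help text shows "--out FILE" instead of "--out OUT".
parser.add_argument("--out", dest="output_path", metavar="FILE",
                    default="sum.txt", help="File to write the result to")

# store_const: the flag takes no value; if it is present, const is stored,
# otherwise the default is used.
parser.add_argument("--max", dest="reduce", action="store_const",
                    const=max, default=sum,
                    help="Report the largest number instead of the sum")

# nargs="+" requires at least one number.
parser.add_argument("numbers", nargs="+", type=int, help="Numbers to process")

args = parser.parse_args(["--max", "--out", "result.txt", "3", "7", "5"])
result = args.reduce(args.numbers)

print(args.output_path, result)
```

Without `--max`, `args.reduce` would be `sum` and the same inputs would give 15.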

`argparse` is not the only package to offer this functionality. `click` is another package, which is not in the standard library but is included with Anaconda, that I personally prefer.

## Configuration files

While command line arguments offer some nice functionality, they can still be limiting when you have a large number of parameters that may change.

Let's assume we have a bunch of CSV files from simple compression tests. Assume that we have written a script to
- import the CSV file from each experiment
- clean the data
- calculate stress and strain (from the diameter and length of the specimen)
- plot these and save the figure to a new file
- calculate the Young's modulus (E)
- calculate the material wavespeed, $C_0 = \sqrt{E/\rho}$
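
The last step is a one-line formula, sketched below. The function name `wave_speed` is hypothetical, and the Young's modulus of 70 GPa is an assumed textbook-order value for aluminum used only for illustration, not a result from these experiments. The density of 2700 kg/m3 is the aluminum value used in this section.

```python
import math


def wave_speed(youngs_modulus, density):
    """Returns C_0 = sqrt(E / rho) in m/s, for E in Pa and rho in kg/m3."""
    return math.sqrt(youngs_modulus / density)


E_ALUMINUM = 70e9    # Pa, assumed illustrative value
RHO_ALUMINUM = 2700  # kg/m3

c0 = wave_speed(E_ALUMINUM, RHO_ALUMINUM)
print(f"C_0 = {c0:.0f} m/s")
```

With these inputs the result is on the order of 5 km/s. Note that both parameters have to be in SI units for the result to come out in m/s.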

We'll see how to do all this later in the course; but for now, let's focus on the following problem. You wrote the code a month ago for aluminum specimens with the following parameters: $d = 10 \, mm, l = 30 \, mm, \rho = 2700 \, kg/m^3$.

But today, you performed an experiment on steel specimens with different parameters: $d = 8 \, mm, l = 25 \, mm, \rho = 7800 \, kg/m^3$.

You *could* have this at the top of your code:

In [None]:
diameter = 10e-3 # m
length = 30e-3 # m
density = 2700 # kg/m3

but we're trying to avoid having to edit our code each time we run it.

We saw how we can use command line arguments, and that's definitely an option, but in some cases there are way too many parameters that need to be passed. This is where configuration files are useful.

There are a few options for these files.
- INI
- TOML
- YAML
- JSON

### TOML

A TOML file looks like this

```toml
name = "Compression test specimen parameters"
material = "Aluminum"
sample_numbers = [101, 105, 117, 125, 142]

[material_properties]
density = 2700  # kg/m3

[geometric_parameters]
diameter = 10e-3  # m
length = 30e-3  # Distance from the strain gage to the junction (m)

[oscilloscope_channels]
2 = "Input"
3 = "Intermediate"
4 = "Output"
```

Each parameter is stored as a `key = value` pair. You can separate parameters into collections with the `[...]` header.

The file can be imported as a dictionary with the `toml` package

In [1]:
import toml

parameters = toml.load("parameters.toml")
parameters

{'name': 'Compression test specimen parameters',
 'material': 'Aluminum',
 'sample_numbers': [101, 105, 117, 125, 142],
 'material_properties': {'density': 2700},
 'geometric_parameters': {'diameter': 0.01, 'length': 0.03},
 'oscilloscope_channels': {'2': 'Input', '3': 'Intermediate', '4': 'Output'}}

Each parameter/collection is imported as a `key: value` pair in a dictionary. For collections, the key is the name of the collection and the value is another nested dictionary of the parameters within the collection.

In [None]:
parameters["sample_numbers"]

In [None]:
parameters["geometric_parameters"]["diameter"]

In [None]:
parameters["material_properties"]

So all your parameters can be easily imported and assigned to variables.

In [None]:
import argparse
import toml

def get_params():
    """Gets parameters from the TOML file named on the command line"""
    
    # ArgumentParser with a single argument to get the TOML file
    parser = argparse.ArgumentParser()
    parser.add_argument("parameterfile", help="Path to parameter file.")
    args = parser.parse_args()
    
    # Import parameters
    parameters = toml.load(args.parameterfile)

    return parameters


def main():
    parameters = get_params()
    sample_numbers = parameters["sample_numbers"]
    diameter = parameters["geometric_parameters"]["diameter"]
    length = parameters["geometric_parameters"]["length"]
    density = parameters["material_properties"]["density"]
    
    # Adjacent f-strings are joined as-is, so each one needs its own newline.
    print(f"Sample numbers = {sample_numbers}\n"
          f"Diameter = {diameter} m\n"
          f"Length = {length} m\n"
          f"Density = {density} kg/m3")
    
    
if __name__ == "__main__":
    main() 

### Other file types

Configuration files can also be written from Python. Let's use this feature to convert from one file type to another, so you can see the differences.

In [None]:
import toml

with open("parameters_toml.toml", "w") as toml_file:
    toml.dump(parameters, toml_file)

with open("parameters_toml.toml", "r") as toml_file:
    parameters_toml = toml.load(toml_file)
    
parameters_toml

**YAML**

In [None]:
import yaml

with open("parameters.yaml", "w") as yaml_file:
    yaml.safe_dump(parameters, yaml_file)
    
with open("parameters.yaml", "r") as yaml_file:
    parameters_yaml = yaml.safe_load(yaml_file)
    
parameters_yaml

**JSON**

In [3]:
import json

with open("parameters.json", "w") as json_file:
    json.dump(parameters, json_file)
    
with open("parameters.json", "r") as json_file:
    parameters_json = json.load(json_file)
    
parameters_json

{'name': 'Compression test specimen parameters',
 'material': 'Aluminum',
 'sample_numbers': [101, 105, 117, 125, 142],
 'material_properties': {'density': 2700},
 'geometric_parameters': {'diameter': 0.01, 'length': 0.03},
 'oscilloscope_channels': {'2': 'Input', '3': 'Intermediate', '4': 'Output'}}

For a more human-readable JSON:

In [4]:
with open("parameters.json", "w") as json_file:
    json.dump(parameters, json_file, indent=4)
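
**INI**

INI was the first format in the list of options but is not shown above. As a minimal sketch, the Standard Library's `configparser` can read it. The INI text below is a hand-written, shortened version of the parameter file (kept in a string named `INI_TEXT` for illustration), not a conversion of `parameters`.

```python
import configparser

# Hand-written INI version of part of the parameter file.
INI_TEXT = """
[material_properties]
density = 2700

[geometric_parameters]
diameter = 10e-3
length = 30e-3
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Every INI value is read as a string...
raw_density = config["material_properties"]["density"]

# ...so numbers have to be converted explicitly.
density = config.getint("material_properties", "density")
diameter = config.getfloat("geometric_parameters", "diameter")

print(repr(raw_density), density, diameter)
```

Unlike TOML, YAML, and JSON, INI has no native lists or numbers, so something like `sample_numbers` would have to be stored as text and split by hand.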


I personally prefer TOML for simple configurations, and YAML for more complex cases. I use JSON for storing objects, as it's the least human-readable, but the closest to a Python dictionary.