# Command-Line Programs

## Questions

- How can I write Python programs that will work like Unix command-line tools?

## Objectives

- Use the values of command-line arguments in a program.

- Handle flags and files separately in a command-line program.

The Jupyter Notebook and other interactive tools are great for prototyping code and exploring data, but sooner or later we will want to use our program in a pipeline or run it in a batch script to process thousands of data files. To do that, we need to make our programs work like other command-line tools (we could even give them a graphical user interface, or GUI, but that is well beyond the scope of this course). For example, we may want a program that reads a luminance dataset and prints the mean in Cd/m^2.


This is what running such a program looks like: it prints the average luminance of the given dataset over a fixed range.

###### Command Line
```
> python ./luminance_processor.py --mean simulation01.txt 
```
###### Output
```
343000
```
(actual values may differ)

We might also want to look at luminance values in several files one after another:

###### Command Line

```
> python ./luminance_processor.py --mean simulation01.txt simulation02.txt 
```

Our scripts should do the following:

1. If one or more filenames are given, read data from them and report statistics for each file separately.
2. Use the `--mean` or `--std` flag to determine which statistic to print.
3. Use an `--experiment` flag to signal that we wish to compare the given simulation with a given experiment file.

To make this work, we need to know how to handle command-line arguments in a program.

# Command-Line Arguments

Using the text editor of your choice (*e.g.* Notepad++ or VSCode), save the following in a text file called `sys_version.py`:

```Python
import sys
print('version is', sys.version)
```

The first line imports a library called `sys`, which is short for “system”. It defines values such as `sys.version`, which describes which version of Python we are running. We can run this script from the command line like this:

###### Command Line
```
> python sys_version.py
```

###### Output
```
version is 3.4.3+ (default, Jul 28 2015, 13:17:50)
[GCC 4.9.3]
```
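
The string in `sys.version` is meant for people to read. As a small aside, `sys` also provides `sys.version_info`, which holds the same information as separate numbers that are easier to use in code. A minimal sketch (the helper name `describe_python` is only for illustration):

```Python
import sys


def describe_python():
    # sys.version_info stores the version as numbers: major, minor, micro, ...
    info = sys.version_info
    return 'Python ' + str(info.major) + '.' + str(info.minor) + '.' + str(info.micro)


print(describe_python())
```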

Create another file called `argv_list.py` and save the following text to it.

```Python
import sys
print('sys.argv is', sys.argv)
```

The strange name `argv` stands for “argument values” (the name comes from the C language, where it is usually read as “argument vector”). Whenever Python runs a program, it takes all of the values given on the command line and puts them in the list `sys.argv` so that the program can determine what they were. If we run this program with no arguments:

###### Command Line
```
> python argv_list.py
```

###### Output
```
sys.argv is ['argv_list.py']
```

the only thing in the list is the name of our script as we typed it on the command line, which is always stored in `sys.argv[0]`. If we run it with a few arguments, however:

###### Command Line
```
> python argv_list.py first second third
```

###### Output
```
sys.argv is ['argv_list.py', 'first', 'second', 'third']
```

then Python adds each of those arguments to that magic list.
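
One detail worth knowing: every entry of `sys.argv` is a string, even if it looks like a number on the command line. The sketch below uses a hand-written list in place of the real `sys.argv` (the script name `scale.py` and the helper `double_argument` are made up for illustration) to show the conversion that is needed before doing arithmetic:

```Python
def double_argument(argv):
    # argv[0] is the script name; argv[1] is the first real argument (a string)
    text = argv[1]
    number = float(text)  # convert the string to a number before doing maths
    return 2 * number


# A stand-in for what sys.argv would hold after: python scale.py 21
example_argv = ['scale.py', '21']
print(double_argument(example_argv))
```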

With this in hand, let’s build a version of `luminance_processor.py` that always prints the mean luminance of a single data file over the range -10 mm to 10 mm. The first step is to write a function that outlines our implementation. By convention this function is usually called `main`, though we can call it whatever we want. Let's call this new script `luminance.py`. Rather than retyping code, we copy in the two functions we used in previous lessons, `find_data_cross_section` and `calculate_average_luminance`, which do the actual work.

```Python
import sys
import numpy as np


def find_data_cross_section(simulation_filename):
    simulation = np.loadtxt(fname=simulation_filename, skiprows=52)

    x_simulation = simulation[:, 0]
    y_simulation = simulation[:, 1]
    L_simulation = simulation[:, 2]
    smallest_y = np.amin(abs(y_simulation))
    x_cross_section = x_simulation[y_simulation==smallest_y]
    luminance_cross_section = L_simulation[y_simulation==smallest_y]
    
    return x_cross_section, luminance_cross_section


def calculate_average_luminance(x, luminance):
    boolean_array = np.logical_and(x<=10., x>= -10.)
    average_luminance = np.mean(luminance[boolean_array])
    return average_luminance


def main():
    filename = sys.argv[1]
    x, luminance = find_data_cross_section(filename)
    mean = calculate_average_luminance(x, luminance)
    print('Average luminance for ' + filename + ' = ' + str(mean) + ' Cd/m^2')
```

Here’s a simple test:

```
python ./luminance.py simulation01.txt
```

There is no output because we have defined a function but haven’t actually called it. Let’s add a call to main:

```Python
import sys
import numpy as np


def find_data_cross_section(simulation_filename):
    simulation = np.loadtxt(fname=simulation_filename, skiprows=52)

    x_simulation = simulation[:, 0]
    y_simulation = simulation[:, 1]
    L_simulation = simulation[:, 2]
    smallest_y = np.amin(abs(y_simulation))
    x_cross_section = x_simulation[y_simulation==smallest_y]
    luminance_cross_section = L_simulation[y_simulation==smallest_y]
    
    return x_cross_section, luminance_cross_section


def calculate_average_luminance(x, luminance):
    boolean_array = np.logical_and(x<=10., x>= -10.)
    average_luminance = np.mean(luminance[boolean_array])
    return average_luminance


def main():
    filename = sys.argv[1]
    x, luminance = find_data_cross_section(filename)
    mean = calculate_average_luminance(x, luminance)
    print('Average luminance for ' + filename + ' = ' + str(mean) + ' Cd/m^2')
    
    
if __name__ == '__main__':
    main()
```

and run that:

###### Command Line
```
> python ./luminance.py simulation01.txt
```
###### Output
```
Average luminance for simulation01.txt = 315570.128125 Cd/m^2
```

> **Running vs Importing:** Running a Python script on the command line is very similar to importing that file in Python. The biggest difference is that we don’t expect anything to happen when we import a file, whereas when running a script, we expect to see some output printed to the console.
>
> In order for a Python script to work as expected when imported or when run as a script, we typically put the part of the script that produces output in the following if statement:
> ```Python
> if __name__ == '__main__':
>     main()  # Or whatever function produces output
> ```
> When you import a Python file, `__name__` is set to the name of that file (e.g., when importing readings.py, `__name__` is 'readings'). However, when running a script on the command line, `__name__` is always set to `'__main__'` in that script so that you can determine if the file is being imported or run as a script.
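
To see the difference without creating several files by hand, the sketch below writes a tiny script to a temporary folder and runs it twice with the standard-library function `runpy.run_path`: once under the name `'__main__'` (as the command line would) and once under another name (similar to what an import would do). The file name `readings.py` and the variable `mode` are only illustrative.

```Python
import os
import runpy
import tempfile

# the whole "script": it records how it thinks it is being used
script_text = "mode = 'script' if __name__ == '__main__' else 'imported'\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'readings.py')
    with open(path, 'w') as handle:
        handle.write(script_text)

    # run_path executes the file and returns its global variables as a dictionary
    as_script = runpy.run_path(path, run_name='__main__')
    as_import = runpy.run_path(path, run_name='readings')

print(as_script['mode'])
print(as_import['mode'])
```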


# The "Right" Way to do it

If our programs can take complex parameters or multiple filenames, we should not handle `sys.argv` directly. Instead, we should use Python’s `argparse` library, which handles common cases in a systematic way, and also makes it easy for us to provide sensible error messages for our users. We will not cover this module in this lesson, but you can go to Tshepang Lekhonkhobe’s [argparse tutorial](http://docs.python.org/3/howto/argparse.html) that is part of Python’s Official Documentation.
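
For a flavour of what that looks like, here is a minimal sketch of how the flags used in this lesson might be declared with `argparse`. It is one possible layout, not part of the lesson's `luminance.py`; the function name `build_parser`, the help texts and the file name `experiment.csv` are our own choices.

```Python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description='Report luminance statistics.')

    # exactly one of --mean or --std must be given
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--mean', dest='action', action='store_const', const='mean',
                       help='print the mean luminance')
    group.add_argument('--std', dest='action', action='store_const', const='std',
                       help='print the standard deviation of the luminance')

    # optional flag-value pair
    parser.add_argument('--experiment', metavar='FILE', default=None,
                        help='experiment file to compare against')

    # one or more simulation files
    parser.add_argument('filenames', nargs='+', help='simulation data files')
    return parser


# parse_args normally reads sys.argv[1:]; here we pass a list so the example runs anywhere
args = build_parser().parse_args(['--std', '--experiment', 'experiment.csv',
                                  'simulation01.txt', 'simulation02.txt'])
print(args.action, args.experiment, args.filenames)
```

If the user leaves out both `--mean` and `--std`, or forgets the filenames, `argparse` prints a usage message and exits, so we do not have to write that check ourselves.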

# Handling Multiple Files

The next step is to teach our program how to handle multiple files. 

We want our program to process each file separately, so we need a loop that executes once for each filename. If we specify the files on the command line, the filenames will be in `sys.argv`, but we need to be careful: `sys.argv[0]` will always be the name of our script, rather than the name of a file. We also need to handle an unknown number of filenames, since our program could be run for any number of files.

The solution to both problems is to loop over the contents of `sys.argv[1:]`. The ‘1’ tells Python to start the slice at location 1, so the program’s name is not included; since we have left off the upper bound, the slice runs to the end of the list, and includes all the filenames. Here is our changed program:

```Python
import sys
import numpy as np


def find_data_cross_section(simulation_filename):
    simulation = np.loadtxt(fname=simulation_filename, skiprows=52)

    x_simulation = simulation[:, 0]
    y_simulation = simulation[:, 1]
    L_simulation = simulation[:, 2]
    smallest_y = np.amin(abs(y_simulation))
    x_cross_section = x_simulation[y_simulation==smallest_y]
    luminance_cross_section = L_simulation[y_simulation==smallest_y]
    
    return x_cross_section, luminance_cross_section


def calculate_average_luminance(x, luminance):
    boolean_array = np.logical_and(x<=10., x>= -10.)
    average_luminance = np.mean(luminance[boolean_array])
    return average_luminance


def main():
    filenames = sys.argv[1:]
    for filename in filenames:
        x, luminance = find_data_cross_section(filename)
        mean = calculate_average_luminance(x, luminance)
        print('Average luminance for ' + filename + ' = ' + str(mean) + ' Cd/m^2')
    
    
if __name__ == '__main__':
    main()
```
and here it is in action:

###### Command Line
```
> python ./luminance.py simulation01.txt simulation02.txt
```

###### Output
```
Average luminance for simulation01.txt = 315570.128125 Cd/m^2
Average luminance for simulation02.txt = 67531.74755859375 Cd/m^2
```
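
The slice `sys.argv[1:]` used above can be tried out on any list. In the sketch below a hand-written list stands in for `sys.argv`:

```Python
# what sys.argv would contain for: python luminance.py simulation01.txt simulation02.txt
example_argv = ['luminance.py', 'simulation01.txt', 'simulation02.txt']

filenames = example_argv[1:]   # everything except the script name
print(filenames)

# with no filenames the slice is simply empty, so a loop over it runs zero times
no_files = ['luminance.py'][1:]
print(no_files)
```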


# Handling Command-Line Flags

The next step is to teach our program to pay attention to the `--mean` and `--std` flags. These always appear before the names of the files, so we can do this:

```Python
import sys
import numpy as np


def find_data_cross_section(simulation_filename):
    simulation = np.loadtxt(fname=simulation_filename, skiprows=52)

    x_simulation = simulation[:, 0]
    y_simulation = simulation[:, 1]
    L_simulation = simulation[:, 2]
    smallest_y = np.amin(abs(y_simulation))
    x_cross_section = x_simulation[y_simulation==smallest_y]
    luminance_cross_section = L_simulation[y_simulation==smallest_y]
    
    return x_cross_section, luminance_cross_section


def calculate_average_luminance(x, luminance):
    boolean_array = np.logical_and(x<=10., x>= -10.)
    average_luminance = np.mean(luminance[boolean_array])
    return average_luminance


def calculate_std_luminance(x, luminance):
    boolean_array = np.logical_and(x<=10., x>= -10.)
    std_luminance = np.std(luminance[boolean_array])
    return std_luminance


def main():
    action = sys.argv[1]
    filenames = sys.argv[2:]
    for filename in filenames:
        x, luminance = find_data_cross_section(filename)
        if action == '--mean':
            mean = calculate_average_luminance(x, luminance)
            print('Average luminance for ' + filename + ' = ' + str(mean) + ' Cd/m^2')
        elif action == '--std':
            std = calculate_std_luminance(x, luminance)
            print('Standard deviation on the luminance for ' + filename + ' = ' + str(std) + ' Cd/m^2')
        else:
            # this raises an error, and provides the message given as argument.
            raise TypeError('No valid action supplied (allowed actions are "--mean" or "--std")')

    
if __name__ == '__main__':
    main()
```
This works:

###### Command Line
```
> python ./luminance.py --std simulation01.txt simulation02.txt
```
###### Output
```
Standard deviation on the luminance for simulation01.txt = 25504.704660288367 Cd/m^2
Standard deviation on the luminance for simulation02.txt = 6609.0550659955 Cd/m^2
```

OK, we're nearly there. Our `main()` function is getting a bit messy, and the luminance calculations contain a lot of duplicated code. Let's simplify our code a bit. The following two functions contain very similar code. Is there any way we could combine them?

```Python

# Let's take these two functions and combine them, by using a new argument
def calculate_average_luminance(x, luminance):
    boolean_array = np.logical_and(x<=10., x>= -10.)
    average_luminance = np.mean(luminance[boolean_array])
    return average_luminance


def calculate_std_luminance(x, luminance):
    boolean_array = np.logical_and(x<=10., x>= -10.)
    std_luminance = np.std(luminance[boolean_array])
    return std_luminance

```

As it happens, yes! We can pass the action to the function and depending on the value of `action` we can do one of the two different calculations we need.

```Python
# This new function also incorporates the conditional from the main loop as well
def calculate_luminance_stats(x, luminance, action):
    boolean_array = np.logical_and(x<=10., x>= -10.)
    if action == '--mean':
        statistic = np.mean(luminance[boolean_array])
    elif action == '--std':
        statistic = np.std(luminance[boolean_array])
    else:
        raise TypeError('No valid action supplied (allowed actions are "--mean" or "--std")')
    return statistic

```

The code now looks like this:

```Python
import sys
import numpy as np


def find_data_cross_section(simulation_filename):
    simulation = np.loadtxt(fname=simulation_filename, skiprows=52)

    x_simulation = simulation[:, 0]
    y_simulation = simulation[:, 1]
    L_simulation = simulation[:, 2]
    smallest_y = np.amin(abs(y_simulation))
    x_cross_section = x_simulation[y_simulation==smallest_y]
    luminance_cross_section = L_simulation[y_simulation==smallest_y]
    
    return x_cross_section, luminance_cross_section


def calculate_luminance_stats(x, luminance, action):
    boolean_array = np.logical_and(x<=10., x>= -10.)
    if action == '--mean':
        statistic = np.mean(luminance[boolean_array])
    elif action == '--std':
        statistic = np.std(luminance[boolean_array])
    else:
        raise TypeError('No valid action supplied (allowed actions are "--mean" or "--std")')
    return statistic


def main():
    action = sys.argv[1]
    filenames = sys.argv[2:]
    for filename in filenames:
        x, luminance = find_data_cross_section(filename)
        simulation_statistic = calculate_luminance_stats(x, luminance, action)
        print(action[2:] + ' luminance for ' + filename + ' = ' + str(simulation_statistic) + ' Cd/m^2')

    
if __name__ == '__main__':
    main()
```

This is simpler than before! We can even build the printed message dynamically by slicing `action`.
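
The slicing trick works because a string can be sliced just like a list: dropping the first two characters removes the leading dashes. A small sketch, with a made-up value and a helper name (`describe`) of our own:

```Python
def describe(action, filename, value):
    label = action[2:]  # '--mean' becomes 'mean', '--std' becomes 'std'
    return label + ' luminance for ' + filename + ' = ' + str(value) + ' Cd/m^2'


print(describe('--std', 'simulation01.txt', 2.5))
```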

Our code is a bit tidier now. So finally, we need to incorporate optional comparison with experimental data.

To do this we are going to introduce a "flag - value" pairing. The user must provide `--experiment filename` as a pair for it to work. We shall incorporate this after the first action, and before the filenames.


```Python
import sys
import numpy as np


def find_data_cross_section(simulation_filename):
    simulation = np.loadtxt(fname=simulation_filename, skiprows=52)

    x_simulation = simulation[:, 0]
    y_simulation = simulation[:, 1]
    L_simulation = simulation[:, 2]
    smallest_y = np.amin(abs(y_simulation))
    x_cross_section = x_simulation[y_simulation==smallest_y]
    luminance_cross_section = L_simulation[y_simulation==smallest_y]
    
    return x_cross_section, luminance_cross_section


# we introduce a new function to load the experimental data
def load_experiment_data(experiment_filename):
    experiment = np.loadtxt(fname=experiment_filename, delimiter=',')
    return experiment[:,0], experiment[:,1]


def calculate_luminance_stats(x, luminance, action):
    boolean_array = np.logical_and(x<=10., x>= -10.)
    if action == '--mean':
        statistic = np.mean(luminance[boolean_array])
    elif action == '--std':
        statistic = np.std(luminance[boolean_array])
    else:
        raise TypeError('No valid action supplied (allowed actions are "--mean" or "--std")')
    return statistic


def main():
    action1 = sys.argv[1]
    action2 = sys.argv[2]
    
    if action2 == '--experiment':
        experiment_filename = sys.argv[3]
        x_experiment, luminance_experiment = load_experiment_data(experiment_filename)
        filenames = sys.argv[4:]
    else:
        experiment_filename = ''
        filenames = sys.argv[2:]
        
    for filename in filenames:
        x, luminance = find_data_cross_section(filename)
        simulation_statistic = calculate_luminance_stats(x, luminance, action1)
        print(action1[2:] + ' luminance for ' + filename + ' = ' + str(simulation_statistic) + ' Cd/m^2')
        if bool(experiment_filename):
            experiment_statistic = calculate_luminance_stats(x_experiment, luminance_experiment, action1)
            perc_diff = (experiment_statistic - simulation_statistic)*100/experiment_statistic
            print('Percentage difference with experimental value = ' + str(perc_diff) + ' %')

    
if __name__ == '__main__':
    main()
```

The code checks if `action2` equals `'--experiment'`. If it does then the next value is parsed as the experiment file name, otherwise the rest of the arguments are assumed to be simulation file names.

Then after the statistics have been calculated we use `bool()` to check if an experimental file has been passed. If it has, we do the experimental difference calculation and print the difference.
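
The two ideas described above can be tried in isolation, without any data files. The sketch below is an illustration rather than part of `luminance.py`: the helper names `split_arguments` and `percentage_difference` and the file name `experiment.csv` are our own, and the `len(argv) > 2` check is an extra safeguard that the lesson's `main()` does not have.

```Python
def split_arguments(argv):
    # argv has the same layout as sys.argv:
    # [script, action, (optional) '--experiment', experiment file, simulation files...]
    action = argv[1]
    if len(argv) > 2 and argv[2] == '--experiment':
        experiment_filename = argv[3]
        filenames = argv[4:]
    else:
        experiment_filename = ''
        filenames = argv[2:]
    return action, experiment_filename, filenames


def percentage_difference(experiment_value, simulation_value):
    # same formula as in main(): difference relative to the experimental value
    return (experiment_value - simulation_value) * 100 / experiment_value


print(split_arguments(['luminance.py', '--mean', '--experiment', 'experiment.csv', 'simulation01.txt']))
print(split_arguments(['luminance.py', '--mean', 'simulation01.txt']))
print(percentage_difference(200.0, 150.0))
```

An empty string counts as false in Python, which is why an empty `experiment_filename` can be used to signal that no experiment file was given.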

That’s better. In fact, that’s done: The program now does everything we set out to do.

# Exercise 1* - Arithmetic on the Command Line

Write a command-line program that does addition and subtraction:

###### Command Line
```
> python arith.py add 1 2
```
###### Output
```
3
```

###### Command Line
```
> python arith.py subtract 3 4
```
###### Output
```
-1
```

Create your solution in a text editor, but feel free to use a notebook cell to test things out.

# Exercise 2* - Finding Particular Files

Using the `glob` module introduced earlier, write a simple version of the Unix command `ls` that shows files in the current directory with a particular suffix. A call to this script should look like this:

###### Command Line
```
> python my_ls.py py
```
###### Output
```
left.py
right.py
zero.py
```

Create your solution in a text editor, but feel free to use a notebook cell to test things out.

# Exercise 3 - Adding a Help Message

Separately, modify `luminance.py` so that if no parameters are given (i.e., no action is specified and no filenames are given), it prints a message explaining how it should be used.

Create your solution in a text editor, but feel free to use a notebook cell to test things out.

# Exercise 4 - Adding a Default Action

Separately, modify `luminance.py` so that if no action is given it displays the means of the data.

Create your solution in a text editor, but feel free to use a notebook cell to test things out.

# Exercise 5* - Generate an Error Message

Write a program called `check_arguments.py` that prints the intended usage then exits the program if no arguments are provided. (Hint: You can use `sys.exit()` to exit the program.)

###### Command Line
```
> python check_arguments.py
```
###### Output
```
usage: python check_arguments.py filename.txt
```
###### Command Line
```
> python check_arguments.py filename.txt
```
###### Output
```
Thanks for specifying arguments!
```

Create your solution in a text editor, but feel free to use a notebook cell to test things out.

# Key Points

- The `sys` library connects a Python program to the system it is running on.

- The list `sys.argv` contains the command-line arguments that a program was run with.

This work is derived from work that is Copyright © [Software Carpentry](http://software-carpentry.org/), under the CC-by [license](https://creativecommons.org/licenses/by/4.0/). The text has been paraphrased partially in some locations, with some additional exercises and images included, but the vast majority of the content is derived from the Software Carpentry lesson.