# `Python for academics` : Postprocessing results

by **Kamila Zdybał**

[`https://kamilazdybal.github.io`](https://kamilazdybal.github.io)

In this notebook, we explore various ways in which Python can help us postprocess research results.

<a id=top-page></a>

***

## Table of contents

- [**Running postprocessing scripts**](#running-scripts)
    - [Exercise 1](#running-scripts-ex-1)
    - [Exercise 2](#running-scripts-with-argparse-ex-2)
    - [Exercise 3](#running-scripts-with-argparse-ex-3)
***

In [1]:
import numpy as np

<a id=running-scripts></a>
***

## Running postprocessing scripts

[**Go to the top ↑**](#top-page)

<a id=running-scripts-ex-1></a>
***
### Exercise 1

[**Go to the top ↑**](#top-page)

<a href="https://youtu.be/fUt7Eshf0lU">
  <img src="https://img.shields.io/badge/youtube-firebrick?style=for-the-badge&logo=youtube&logoColor=white" alt="YouTube Badge"/>
</a>

In this exercise, we're preparing a Python script for postprocessing asynchronously arriving data.

We want to postprocess only the cases that have finished running so far, and to postprocess each case only once. At any given moment, the results directory may hold just a subset of all cases, for example:

```
Hydrogen-case-008.csv
Hydrogen-case-015.csv
Hydrogen-case-007.csv
Hydrogen-case-005.csv
Hydrogen-case-013.csv
Hydrogen-case-014.csv
Hydrogen-case-006.csv
Hydrogen-case-009.csv
Hydrogen-case-003.csv
Hydrogen-case-019.csv
```

Generate empty results files:

In [2]:
n_cases = 20

In [3]:
completed_runs = np.random.choice(range(1,n_cases+1), 10, replace=False)

In [5]:
sorted(completed_runs)

[3, 4, 5, 7, 9, 11, 16, 17, 19, 20]

In [6]:
for i in completed_runs:

    # Mode 'x' creates a new empty file and fails if the file already exists:
    with open('../data/Hydrogen-case-' + str(i).zfill(2) + '.csv', 'x'):
        pass

In [7]:
import os

In [13]:
for case in range(1,n_cases+1):

    case_file = 'Hydrogen-case-' + str(case).zfill(2) + '.csv'
    raw_file = '../data/' + case_file
    postprocessed_file = '../data/postprocessed-' + case_file

    if os.path.exists(raw_file):

        if not os.path.exists(postprocessed_file):

            print('Postprocessing case ' + str(case))

            # The actual postprocessing of raw_file goes here; in this
            # exercise, we only save the case number as a placeholder result:

            np.savetxt(postprocessed_file, [case], delimiter=',', fmt='%.16e')

        else:

            print('Case ' + str(case) + ' already postprocessed.')

    else:

        print('Case ' + str(case) + ' not there yet!')

Postprocessing case 1
Postprocessing case 2
Case 3 already postprocessed.
Case 4 already postprocessed.
Case 5 already postprocessed.
Postprocessing case 6
Case 7 already postprocessed.
Postprocessing case 8
Case 9 already postprocessed.
Postprocessing case 10
Case 11 already postprocessed.
Postprocessing case 12
Postprocessing case 13
Postprocessing case 14
Postprocessing case 15
Case 16 already postprocessed.
Case 17 already postprocessed.
Postprocessing case 18
Case 19 already postprocessed.
Case 20 already postprocessed.
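
The loop above checks every case number from 1 to `n_cases`. As an alternative, the pending cases can be discovered directly from the files that exist on disk. The following is a minimal sketch, assuming the same `Hydrogen-case-` and `postprocessed-` naming as above; the helper `find_pending_cases` is a hypothetical name introduced here for illustration, and the demonstration runs in a temporary directory so that `../data` is left untouched:

```python
from pathlib import Path
import tempfile

def find_pending_cases(data_dir, prefix='Hydrogen-case-'):
    """Return raw result files that have no postprocessed counterpart yet."""

    data_dir = Path(data_dir)
    pending = []

    # The pattern only matches file names that start with the prefix,
    # so files named 'postprocessed-...' are not picked up as raw results:
    for raw_file in sorted(data_dir.glob(prefix + '*.csv')):

        postprocessed_file = data_dir / ('postprocessed-' + raw_file.name)

        if not postprocessed_file.exists():
            pending.append(raw_file)

    return pending

# Demonstrate in a throwaway directory:
with tempfile.TemporaryDirectory() as tmp:

    tmp = Path(tmp)

    # Three finished cases, one of which has already been postprocessed:
    for case in [3, 5, 7]:
        (tmp / ('Hydrogen-case-' + str(case).zfill(2) + '.csv')).touch()

    (tmp / 'postprocessed-Hydrogen-case-05.csv').touch()

    pending_names = [f.name for f in find_pending_cases(tmp)]

print(pending_names)
```

This should print `['Hydrogen-case-03.csv', 'Hydrogen-case-07.csv']`. With this approach, the script does not need to know `n_cases` in advance, but it also cannot report which cases are "not there yet".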


In [10]:
remaining_cases = [i for i in range(1,n_cases+1) if i not in completed_runs]

In [11]:
remaining_cases

[1, 2, 6, 8, 10, 12, 13, 14, 15, 18]

In [12]:
for i in remaining_cases:

    # Mode 'x' creates a new empty file and fails if the file already exists:
    with open('../data/Hydrogen-case-' + str(i).zfill(2) + '.csv', 'x'):
        pass

<a id=running-scripts-with-argparse-ex-2></a>
***
### Exercise 2

[**Go to the top ↑**](#top-page)


In this exercise, we're learning to set numeric and string parameters of a Python script from the command line using the `argparse` library.

The naive way to set parameters for your postprocessing script is to hardcode them in the script:

```python
import numpy as np

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
# Case settings
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

# Set parameters:
input_variables = [0, 1, 2, 3, 4]
n_epochs = 1000
learning_rate = 0.001
initializer = 'GlorotUniform'
min_random_seed, max_random_seed = (0,10)

# List of random seeds to loop over:
random_seeds_list = [i for i in range(min_random_seed, max_random_seed)]

# Names of input variables:
input_names = np.array(['X' + str(i+1) for i in range(0,10)])

# Create a tag for this training session:
case_run_name = 'inputs-' + '-'.join(input_names[input_variables]) + '-n-epochs-' + str(n_epochs) + '-lr-' + str(learning_rate) + '-initializer-' + initializer

print(case_run_name)

print()

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
# Neural network training with the current parameters starts here...
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

for random_seed in random_seeds_list:

    print('Training session for random seed ' + str(random_seed) + '...')

    # ...


```

With the `argparse` library, we can instead give each parameter a default value and override it from the command line when executing the script:

```python
import argparse
import numpy as np

def argument_parser():

    parser = argparse.ArgumentParser()
    
    parser.add_argument('--input_variables',
                        type=int,
                        default=[0,1,2,3,4],
                        nargs="+",
                        help='Indices for the input variables')
    
    parser.add_argument('--n_epochs',
                        type=int,
                        default=1000,
                        help='Number of epochs')
    
    parser.add_argument('--learning_rate',
                        type=float,
                        default=0.001,
                        help='Learning rate')
    
    parser.add_argument('--initializer',
                        type=str,
                        default='GlorotUniform',
                        help='Initialization of weights')
    
    parser.add_argument('--random_seeds',
                        type=int,
                        default=[0,10],
                        nargs=2,
                        help='Min (inclusive) and max (exclusive) random seed')
    
    return parser

parser = argument_parser()

args = parser.parse_args()

print(args)

print()

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
# Case settings
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

# Populate parameter values from argument parser:
input_variables = args.input_variables
n_epochs = args.n_epochs
learning_rate = args.learning_rate
initializer = args.initializer
min_random_seed, max_random_seed = args.random_seeds

# List of random seeds to loop over:
random_seeds_list = [i for i in range(min_random_seed, max_random_seed)]

# Names of input variables:
input_names = np.array(['X' + str(i+1) for i in range(0,10)])

# Create a tag for this training session:
case_run_name = 'inputs-' + '-'.join(input_names[input_variables]) + '-n-epochs-' + str(n_epochs) + '-lr-' + str(learning_rate) + '-initializer-' + initializer

print(case_run_name)

print()

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
# Neural network training with the current parameters starts here...
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

for random_seed in random_seeds_list:

    print('Training session for random seed ' + str(random_seed) + '...')

    # ...


```

To run the latter script with custom parameter values, type in the terminal:

```bash
python script.py --input_variables 0 2 4 6 8 --n_epochs 2000 --learning_rate 0.1 --initializer 'RandomUniform' --random_seeds 0 5
```

Running the script with these parameters should print the following in the terminal:

```text
Namespace(input_variables=[0, 2, 4, 6, 8], n_epochs=2000, learning_rate=0.1, initializer='RandomUniform', random_seeds=[0, 5])

inputs-X1-X3-X5-X7-X9-n-epochs-2000-lr-0.1-initializer-RandomUniform

Training session for random seed 0...
Training session for random seed 1...
Training session for random seed 2...
Training session for random seed 3...
Training session for random seed 4...

```
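
Inside a notebook, calling `parser.parse_args()` without arguments reads the command line of the running kernel rather than our own parameters. A convenient way to try a parser without leaving the notebook is to pass the arguments explicitly as a list of strings. The following is a minimal sketch with a reduced parser that keeps only two of the parameters from the script above:

```python
import argparse

parser = argparse.ArgumentParser()

parser.add_argument('--input_variables',
                    type=int,
                    default=[0,1,2,3,4],
                    nargs="+",
                    help='Indices for the input variables')

parser.add_argument('--n_epochs',
                    type=int,
                    default=1000,
                    help='Number of epochs')

# An empty list means "no command line arguments", so all defaults are used:
default_args = parser.parse_args([])

# Each command line token is passed as a separate string:
custom_args = parser.parse_args(['--input_variables', '0', '2', '--n_epochs', '2000'])

print(default_args.n_epochs, default_args.input_variables)
print(custom_args.n_epochs, custom_args.input_variables)
```

This should print `1000 [0, 1, 2, 3, 4]` followed by `2000 [0, 2]`. Note that `argparse` converts each token with `type=int`, so the parsed values are integers and not strings.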

<a id=running-scripts-with-argparse-ex-3></a>
***
### Exercise 3

[**Go to the top ↑**](#top-page)


In this exercise, we're learning to set boolean parameters of a Python script from the command line using the `argparse` library.
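
The following is a minimal sketch of two common ways to handle boolean parameters, not necessarily the solution intended for this exercise. The flag names `--save_results` and `--normalize` are hypothetical and chosen only for illustration. One pitfall worth knowing is that `type=bool` does not work as one might expect, because `bool('False')` evaluates to `True` (any non-empty string is truthy). Instead, we can use `action='store_true'` for a flag that is off by default, or `argparse.BooleanOptionalAction` (available from Python 3.9) for a paired `--flag` / `--no-flag` option:

```python
import argparse

parser = argparse.ArgumentParser()

# Hypothetical flag: False by default, True when --save_results is present:
parser.add_argument('--save_results',
                    action='store_true',
                    help='Save postprocessed results to disk')

# Hypothetical flag: True by default, can be switched off with --no-normalize
# (BooleanOptionalAction requires Python 3.9 or newer):
parser.add_argument('--normalize',
                    action=argparse.BooleanOptionalAction,
                    default=True,
                    help='Normalize the input variables')

# Passing a list of strings lets us try the parser without a terminal:
default_args = parser.parse_args([])
custom_args = parser.parse_args(['--save_results', '--no-normalize'])

print(default_args.save_results, default_args.normalize)
print(custom_args.save_results, custom_args.normalize)
```

This should print `False True` followed by `True False`. From the terminal, the equivalent call would be `python script.py --save_results --no-normalize`.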



***