## Earth Analytics Applications - Week 7 

| Time         | Topic                                                   | 
|--------------|---------------------------------------------------------|
| 12:45-1:30pm | Review of Functions and Modularization          | 
| 1:30-2:00pm  | Activity: Execute Function from Python Script (.py)      | 
| 2:00-2:30pm  | Activity: Expand Code to Add Checks/Tests  |
| 2:30-3:30pm  | Time slots available for Project Check-in Meetings      | 

## Functions - What Are They Again?

A function is a unit of code that executes a specific, outlined task on command, using a set of input parameters to specify how the task is performed.

### input parameter(s) -> function does something -> output result(s)

Useful for organizing and executing generalizable code that you need to run frequently:
* importing data
* aggregating or summarizing data
* calculating indices such as the Normalized Difference Vegetation Index (NDVI)
* plotting data

You can use existing functions from commonly used packages (e.g. `geopandas.read_file()` or `rasterio.open()`) or <a href="https://www.earthdatascience.org/courses/earth-analytics-bootcamp/functions/intro-functions/" target="_blank">write your own functions</a> when a suitable one does not already exist.

## Key Components of Python Functions

1. `def` keyword
2. Function name
3. Input parameters/arguments
4. Docstring
5. `return` statement

In [1]:
# Example of a simple function
def add_five(x):
    """Adds the numeric value 5 to input value
    
    Parameters
    ----------
    x : numeric value (e.g. integer, float)
            
    Returns
    ----------
    input data with values increased by 5
    """
    
    return (x + 5)

In [2]:
# How do we call a Python function again?
add_five(5)

10

Now, check out the more complex <a href="https://github.com/geopandas/geopandas/blob/master/geopandas/io/file.py" target="_blank">geopandas.read_file() function</a> on GitHub and identify each of the key components.

**Do we have to fully understand all of the detailed code in order to use the function?**

Simply put - no. We only need to know the appropriate inputs and outputs, so we can provide the correct file to get back our GeoDataFrame. 

This is one of the primary benefits of writing and using functions!

## Benefits of Using and Writing Functions

1. Reusability (e.g. by yourself or others) 
2. Fewer variables (e.g. temporary variables not needed outside of the function are not stored)
3. Documentation/Reproducibility (e.g. for yourself or others) 
4. Easier updates to code (i.e. update only the function definition)
5. Testing (e.g. include checks and tests on inputs and outputs directly within function)
6. Modularity (i.e. stand-alone units of code that can be executed independently and asynchronously)

Sometimes you may need to write custom functions to:
* complete tasks that do not already have published functions
* combine existing functions into one function for a specific task 

## Generalizing Custom Functions

The best functions complete one specific task but are generalizable for more than one application. 

Example: the goal is a function that calculates the Normalized Difference Vegetation Index (NDVI):

```python
(near_infrared - red) / (near_infrared + red)
```

### How Can We Generalize This For Broader Use?

We know that other indices can be calculated using the same formula:

```python
# Normalized Burn Ratio (NBR)
(near_infrared - shortwave_infrared) / (near_infrared + shortwave_infrared)
```

```python
# Normalized Difference Water Index (NDWI)
(green - near_infrared) / (green + near_infrared)
```

All of these share the same general form: 
```python
(band1 - band2) / (band1 + band2)
```

Even better for conciseness and readability:
```python
(b1 - b2) / (b1 + b2)
```

In [3]:
def norm_diff(b1, b2):
    """Calculate the normalized difference of two arrays of same shape.
    Math will be calculated (b1-b2) / (b1+b2). 
    
    Parameters
    ----------
    b1, b2 : numpy arrays
        Two numpy arrays of same shape.
    
    Returns
    ----------
    n_diff : numpy array
        The element-wise result of (b1-b2) / (b1+b2) calculation. 
    """
    n_diff = (b1 - b2) / (b1 + b2)
        
    return n_diff
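
As a quick illustration of that generality, the sketch below reuses a single `norm_diff` definition for NDVI, NBR and NDWI. The band values (`nir`, `red`, `swir`, `green`) are made-up numbers chosen only for this example, not real reflectance data.

```python
import numpy as np


def norm_diff(b1, b2):
    """Calculate the normalized difference (b1-b2) / (b1+b2)."""
    return (b1 - b2) / (b1 + b2)


# Hypothetical band values for three pixels (illustration only)
nir = np.array([0.50, 0.60, 0.40])
red = np.array([0.10, 0.20, 0.30])
swir = np.array([0.20, 0.30, 0.40])
green = np.array([0.15, 0.20, 0.25])

# The same function serves all three indices; only the inputs change
ndvi = norm_diff(nir, red)
nbr = norm_diff(nir, swir)
ndwi = norm_diff(green, nir)

print("NDVI:", ndvi)
print("NBR: ", nbr)
print("NDWI:", ndwi)
```

Only the order and choice of input bands distinguishes one index from another, which is why the parameter names `b1` and `b2` are kept generic.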

## What About Python Modules and Packages?

Modules are sets of code and functions that provide a suite of related functionality that can be imported into your code environment. 

In `Python`, modules can be imported from: 
* installed packages and libraries (e.g. `earthpy.spatial` where `spatial` is a module of `earthpy`)
* custom `.py` scripts

Users only need to import the module and call functions, rather than copy/paste the function definition into every notebook, script, etc., where it is needed. 

Note that the core modules of many Python packages/libraries provide a set of functions without the need to import additional modules (e.g. `import numpy`, `import geopandas`). 

In [4]:
import numpy as np

In [5]:
# Use the `.array` function to create two example arrays
nir_band = np.array([[6, 7, 8, 9, 10], [16, 17, 18, 19, 20]])
red_band = np.array([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]])

Note that we did not have to name a specific module from `numpy` in order to access the `.array` function. 

What if you wanted to work with masked arrays? Now, you can import the <a href="https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html" target="_blank">ma module</a> from `numpy` using the syntax:

`import package_name.module_name`

In [6]:
# Compare this import statement to previous one for numpy
import numpy.ma as ma

In [7]:
# Now we can call functions from ma
nir_band_masked = ma.masked_where(nir_band <= 8, nir_band)
print(nir_band_masked)

[[-- -- -- 9 10]
 [16 17 18 19 20]]


### Call norm_diff Function

We can call the `norm_diff()` function directly in this notebook because we have defined it in a previously executed cell. 

In [10]:
# Create a variable ndvi by calling norm_diff
# with the nir_band and red_band arrays previously created
ndvi = norm_diff(nir_band, red_band)
print(ndvi)

[[ 0.71428571  0.55555556  0.45454545  0.38461538  0.33333333]
 [ 0.18518519  0.17241379  0.16129032  0.15151515  0.14285714]]


What if you know that you will be using this function a lot, especially now that it is written in a generalized manner? 

It would be repetitive to simply copy and paste this function definition into every notebook in which we want to use it.

How can we create a module from which we can import and call the `norm_diff` function?

## Modularize Code With Scripts

Recall that in `Python`, custom modules are created as `.py` scripts. Thus, we can create our own modules by saving functions into Python scripts (`.py`) and importing them into our notebooks. 

These `.py` scripts can contain many functions that have been grouped together in the same file based on shared characteristics (e.g. all functions that calculate an index stored in `calc_indices.py`). 

### Differences Between Python Scripts and Jupyter Notebooks


| Characteristic | Python Scripts        | Jupyter Notebooks                                                   | 
|--------------|--------------|---------------------------------------------------------|
|File extension | .py | .ipynb         | 
|Created in | Text or Code Editor (e.g. Atom, Sublime, PyCharm) | Jupyter Notebook        | 
|Executed in | Terminal, Jupyter Notebook, other interactive development environments (IDEs) | Jupyter Notebook          | 
|Usage | To automate a task (terminal); to provide functions and classes (as an imported module) | To run code interactively; to organize and visualize reproducible results  | 
|Contains | Functions (e.g. processing, analysis) | Calls to functions, results, visualizations | 
|Best for... | Code that users do not need to interact with (e.g. branch workflows; building blocks that complete specific tasks and/or provide intermediary products/variables)  | Code that you want users to interact with by calling functions and visualizing results (e.g. primary workflow; the story/progression of your workflow) | 

## Activity: Execute Function from Python Script (.py)

### Create the Python Script (.py)

#### 1. Open a text editor (e.g. Atom, Sublime, PyCharm; Jupyter Notebook  even has a built-in text editor!)
* You can access the Jupyter Notebook built-in text editor by selecting `New` > `Text File` in the menu for the Jupyter Dashboard.

#### 2. Create a new file in the same working directory as this notebook.
* This is for ease of import. If the scripts are in other directories, there is <a href="https://stackoverflow.com/questions/34478398/import-local-function-from-a-module-housed-in-another-directory-with-relative-im" target="_blank">additional code needed </a> to provide Jupyter Notebook with the correct path. 

#### 3. Copy and paste your function definition for norm_diff() into this new file. 
* It is common to add a docstring to the top of the .py to explain the purpose of the file
* e.g. `"""A module for calculating normalized indices on arrays"""`

#### 4.  Save the new file with a clear, concise name and a `.py` file extension (e.g. `calc_indices.py`).
* Note: module names cannot contain spaces or dashes, so use underscores to separate words.

### Import .py as Module in Jupyter Notebook

Now that you have a `.py` script (e.g. `calc_indices.py`) in the same working directory as this notebook, you can import this script as a module using the following syntax:

```python
import module_name
```
where the module_name is the name of the file without the extension type (e.g. `calc_indices`).
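
The create-and-import steps above can also be sketched in code. The example below writes a tiny module to a temporary directory and adds that directory to `sys.path` so that it can be imported. The temporary directory and the use of plain floats instead of arrays are assumptions made to keep the sketch self-contained. In the activity you would instead save the file next to the notebook, in which case no `sys.path` change is needed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source code for a small module; normally you would type this in an editor
module_source = '''"""A module for calculating normalized indices on arrays"""


def norm_diff(b1, b2):
    """Calculate the normalized difference (b1-b2) / (b1+b2)."""
    return (b1 - b2) / (b1 + b2)
'''

# Save the module in a scratch directory (hypothetical location)
module_dir = Path(tempfile.mkdtemp())
(module_dir / "calc_indices.py").write_text(module_source)

# Tell Python where to look, then import using the file name without ".py"
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import calc_indices

print(calc_indices.__doc__)
print(calc_indices.norm_diff(b1=6.0, b2=1.0))
```

The string at the top of the file becomes the module docstring, which is what `help()` reports under `NAME`.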

In [11]:
# Add new import for your .py script (saved here as calcsample.py)
# Your .py script should be in the same directory as this notebook
import os
import numpy as np
import calcsample

os.getcwd()

'/Users/shannonwhite/git/volcano-risk-analysis/notebooks'

In [12]:
# Call help on an imported module using the module name
help(calcsample)

Help on module calcsample:

NAME
    calcsample - Calculates differences for several different vegetation indices

FUNCTIONS
    norm_diff(b1, b2)
        Calculate the normalized difference of two arrays of same shape.
        Math will be calculated (b1-b2) / (b1+b2). 
        
        Parameters
        ----------
        b1, b2 : numpy arrays
            Two numpy arrays of same shape.
        
        Returns
        ----------
        n_diff : numpy array
            The element-wise result of (b1-b2) / (b1+b2) calculation.

FILE
    /Users/shannonwhite/git/volcano-risk-analysis/notebooks/calcsample.py




In [13]:
# Call help on function using `module_name.function_name`
help(calcsample.norm_diff)

Help on function norm_diff in module calcsample:

norm_diff(b1, b2)
    Calculate the normalized difference of two arrays of same shape.
    Math will be calculated (b1-b2) / (b1+b2). 
    
    Parameters
    ----------
    b1, b2 : numpy arrays
        Two numpy arrays of same shape.
    
    Returns
    ----------
    n_diff : numpy array
        The element-wise result of (b1-b2) / (b1+b2) calculation.



In [14]:
# Create numpy array inputs for function
nir_band = np.array([[6, 7, 8, 9, 10], [16, 17, 18, 19, 20]])
red_band = np.array([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]])

In [15]:
# Call function from .py script using `module_name.function_name`
ndvi = calcsample.norm_diff(b1=nir_band, b2=red_band)
print(ndvi)

[[ 0.71428571  0.55555556  0.45454545  0.38461538  0.33333333]
 [ 0.18518519  0.17241379  0.16129032  0.15151515  0.14285714]]


Note that we used the syntax: `module_name.function_name` to call the function from the module.

**How is the following code different, and what is it doing?**

In [16]:
ndvi = norm_diff(b1=nir_band, b2=red_band)
print(ndvi)

[[ 0.71428571  0.55555556  0.45454545  0.38461538  0.33333333]
 [ 0.18518519  0.17241379  0.16129032  0.15151515  0.14285714]]


Delete cell 3 where you defined the function in this notebook. Then, select `Kernel` > `Restart & Run All`.

**Does this code below still work?**

In [17]:
ndvi = norm_diff(b1=nir_band, b2=red_band)
print(ndvi)

[[ 0.71428571  0.55555556  0.45454545  0.38461538  0.33333333]
 [ 0.18518519  0.17241379  0.16129032  0.15151515  0.14285714]]


## Activity: Expand Code to Add Checks and Tests

### Why Check Your Code? 

* Check that inputs are of correct type/format (e.g. both arrays are two-dimensional)
* Check that necessary prerequisites have been executed or exist (e.g. a directory named `output`)
* Test assumptions of code (e.g. is it actually doing what you think it is?)
* Identify points of failure (e.g. where is the code failing - input data, processing/analysis, writing out data?)
* Identify something about the function that you did not consider (e.g. function is applicable to additional data types, function needs additional code to handle special circumstances)
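
As one hedged example of checking a prerequisite, the sketch below tests whether an output directory exists before any results are written to it. The helper name `ensure_directory` and the temporary parent folder are hypothetical choices for this illustration.

```python
import os
import tempfile


def ensure_directory(path):
    """Create the directory at path if it is missing.

    Returns True if the directory was created, False if it already existed.
    """
    if os.path.isdir(path):
        return False
    os.makedirs(path)
    return True


# Use a temporary parent folder so the example leaves your project untouched
parent = tempfile.mkdtemp()
output_dir = os.path.join(parent, "output")

print(ensure_directory(output_dir))  # first call creates the directory
print(ensure_directory(output_dir))  # second call finds it already there
```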

## How Can You Check Your Code?

### Conditional Statements

Checking a condition before code is allowed to execute is frequently referred to as "asking permission". 

```python
if condition_1:
    action_1
elif condition_2:
    action_2
else: 
    action_3
```

Thinking about the `norm_diff` function: what is a condition that we might want to check for the code to run successfully? 

Take a look at the equation again, and think about what the intended inputs are for b1 and b2 ("arrays of same shape"). 

```python
(b1 - b2) / (b1 + b2)
```

In [18]:
# Create three dimensional numpy array for testing
nir_band_3d = np.array([[[6, 7, 8, 9, 10], [16, 17, 18, 19, 20]]])
nir_band_3d.ndim

3

In [19]:
# Note that the if/else is used when calling the function
if nir_band_3d.shape == red_band.shape:
    ndvi = calcsample.norm_diff(b1=nir_band_3d, b2=red_band)
    print(ndvi)
else:
    print("Input arrays are not of same shape")

Input arrays are not of same shape
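
The explicit shape check matters because NumPy does not always raise an error when shapes differ. Under NumPy's broadcasting rules, a `(1, 2, 5)` array and a `(2, 5)` array are compatible, so the arithmetic runs and returns a three-dimensional result. A small sketch, recreating the example arrays from above:

```python
import numpy as np

nir_band_3d = np.array([[[6, 7, 8, 9, 10], [16, 17, 18, 19, 20]]])
red_band = np.array([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]])

# The shapes differ: (1, 2, 5) versus (2, 5)
print(nir_band_3d.shape == red_band.shape)

# No error is raised: the 2D array is broadcast against the 3D array
result = (nir_band_3d - red_band) / (nir_band_3d + red_band)
print(result.shape)
```

Without the check, a mismatch like this could pass through the function unnoticed.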


### `Try` and `except`

These statements allow a code block to try to execute first, and then do something else if the code is not executed successfully. This is known as <a href="https://www.w3schools.com/python/python_try_except.asp" target="_blank">asking for forgiveness, rather than permission</a>.

```python
try:
    action_1
except: 
    print("something went wrong; action_1 not executed")
```

In [20]:
# Example of asking for permission
x = 5
y = 0

if y != 0:
    print(x/y)
else:
    print("Division by zero not allowed!")

Division by zero not allowed!


In [21]:
# Instead, ask for forgiveness
try:
    print(x/y)
except:
    print("Division by zero not allowed!")

Division by zero not allowed!


In [22]:
# In this example, try/except is used when calling the function
try:
    ndvi = calc_indices.norm_diff(b1=nir_band_3d, b2=red_band)
    print(ndvi)
except:
    print("Input arrays are not of same shape")
    
print(ndvi.ndim)

Input arrays are not of same shape
2


**Caveat: Do you see anything wrong with `try` and `except` statements?**
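
One way to explore this caveat is to look at which exception was actually raised. A bare `except:` runs for any error, so the printed message may not describe the real problem (for instance, a misspelled module name raises `NameError`, not a shape error). The sketch below uses a hypothetical helper, `describe_failure`, and a deliberately undefined name, `missing_module`, purely for illustration.

```python
def describe_failure(action):
    """Run action() and return the name of the exception raised, if any."""
    try:
        action()
    except Exception as err:
        return type(err).__name__
    return "no error"


# Division by zero
print(describe_failure(lambda: 5 / 0))

# A name that was never imported or defined (hypothetical, on purpose)
print(describe_failure(lambda: missing_module.norm_diff(1, 2)))

# A call that succeeds
print(describe_failure(lambda: 5 / 1))
```

Both failing calls would trigger the same bare `except:` block, even though the underlying problems are unrelated.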

In [23]:
# Another example of try and except
print(os.getcwd())
directory = "test"

# Create folder if it does not exist in working directory
try:
    os.makedirs(directory)
except:
    print("Directory already exists!")

/Users/shannonwhite/git/volcano-risk-analysis/notebooks


**Caveat: How do you know that the failure to make a new directory was due to it already existing?**

### Exception Handling

Also commonly referred to as error handling, exceptions are used to check for specific types of errors that can occur when running code. 

Exception handling is most common in code that you publish as packages or libraries and in applications written for end users. It is still useful to know about, so that you recognize the exception messages raised by others' code.

#### Commonly Used Exceptions

<img src="http://drive.google.com/uc?export=view&id=1tAonGpNy8IXWxgm4iFipVhGijsAwZt6t">

Source: <a href="https://www.datacamp.com/community/tutorials/exception-handling-python" target="_blank">DataCamp</a>

In [24]:
# Same try and except with exception FileExistsError
directory = "test"

try:
    os.makedirs(directory)
except FileExistsError:
    print("Directory already exists!")
    
    # Uncomment line below to simply have except do nothing if no print
    #pass  

Directory already exists!
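
If an existing directory is not a problem for your workflow, `os.makedirs()` also accepts `exist_ok=True`, which skips the error for that one case while still raising other errors (e.g. a permissions problem). A minimal sketch, using a temporary parent folder as an assumption so that nothing is written to your project:

```python
import os
import tempfile

parent = tempfile.mkdtemp()
directory = os.path.join(parent, "test")

os.makedirs(directory, exist_ok=True)  # creates the directory
os.makedirs(directory, exist_ok=True)  # already exists: no exception raised

try:
    os.makedirs(directory)  # default is exist_ok=False
except FileExistsError:
    print("Directory already exists!")
```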


In [25]:
# You could also explore error handling in your function definitions
# Example of ValueError
def norm_diff(b1, b2):
    """Calculate the normalized difference of two arrays of same shape.
    Math will be calculated (b1-b2) / (b1+b2). 
    
    Parameters
    ----------
    b1, b2 : numpy arrays
        Two numpy arrays of same shape.
    
    Returns
    ----------
    n_diff : numpy array
        The element-wise result of (b1-b2) / (b1+b2) calculation. 
    """
    if not (b1.shape == b2.shape):
        raise ValueError("Input arrays should have the same dimensions")

    n_diff = (b1 - b2) / (b1 + b2)

    return n_diff

In [26]:
ndvi = norm_diff(b1=nir_band_3d, b2=red_band)

ValueError: Input arrays should have the same dimensions
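
Code that calls the function can then handle this specific exception instead of relying on a bare `except:`. The following is a sketch under the assumption that you want the workflow to continue with `ndvi` set to `None` when the inputs do not match; that fallback is an illustrative choice, not a recommendation for every workflow.

```python
import numpy as np


def norm_diff(b1, b2):
    """Calculate (b1-b2) / (b1+b2) for two arrays of the same shape."""
    if b1.shape != b2.shape:
        raise ValueError("Input arrays should have the same dimensions")
    return (b1 - b2) / (b1 + b2)


nir_band_3d = np.array([[[6, 7, 8, 9, 10], [16, 17, 18, 19, 20]]])
red_band = np.array([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]])

try:
    ndvi = norm_diff(b1=nir_band_3d, b2=red_band)
except ValueError as err:
    # Only ValueError is handled; any other exception would still surface
    ndvi = None
    print(f"Could not calculate index: {err}")
```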

### Old Fashioned Trial and Error

The code may execute without errors and still not return the output in the most useful format. This may not become clear until you run the function many times with different inputs.

In [27]:
# Create numpy array inputs for function
# Note that we are creating a zero in the denominator with -15 (-15 + 15 = 0)
nir_band = np.array([[6, 7, 8, 9, 10], [16, 17, 18, 19, -15]])
red_band = np.array([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]])

In [29]:
# Produces infinity values due to divide by zero
ndvi = calcsample.norm_diff(b1=nir_band, b2=red_band)
print(ndvi)

[[ 0.71428571  0.55555556  0.45454545  0.38461538  0.33333333]
 [ 0.18518519  0.17241379  0.16129032  0.15151515        -inf]]




What if we would rather have the output be a masked numpy array if there are any infinite values or nan values?

In [30]:
# From earthpy package
import warnings

def normalized_diff(b1, b2):
    """Take two numpy arrays and calculate the normalized difference.
    Math will be calculated (b1-b2) / (b1+b2). The arrays must be of the
    same shape.
    Parameters
    ----------
    b1, b2 : numpy arrays
        Two numpy arrays that will be used to calculate the normalized difference.
        Math will be calculated (b1-b2) / (b1+b2).
    Returns
    ----------
    n_diff : numpy array
        The element-wise result of (b1-b2) / (b1+b2) calculation. Inf values are set
        to nan. Array returned as masked if result includes nan values.
    """
    if not (b1.shape == b2.shape):
        raise ValueError("Both arrays should have the same dimensions")

    n_diff = (b1 - b2) / (b1 + b2)

    # Set inf values to nan and provide custom warning
    if np.isinf(n_diff).any():
        warnings.warn(
            "Divide by zero produced infinity values that will be replaced with nan values",
            Warning)
        n_diff[np.isinf(n_diff)] = np.nan

    # Mask invalid values
    if np.isnan(n_diff).any():
        n_diff = np.ma.masked_invalid(n_diff)

    return n_diff

In [31]:
ndvi = normalized_diff(b1=nir_band, b2=red_band)
print(ndvi)

[[0.7142857142857143 0.5555555555555556 0.45454545454545453
  0.38461538461538464 0.3333333333333333]
 [0.18518518518518517 0.1724137931034483 0.16129032258064516
  0.15151515151515152 --]]
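
One practical consequence of returning a masked array is that summary statistics skip the masked cells. A brief sketch with made-up values (not the NDVI output above):

```python
import numpy as np

# Hypothetical index values, one of which is invalid
values = np.array([0.2, 0.4, np.nan])
masked = np.ma.masked_invalid(values)

print(values.mean())   # nan propagates through the unmasked array
print(masked.mean())   # mean of the two valid values only
print(masked.count())  # number of unmasked values
```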




Last, checking your code is not only about identifying potential issues and problems. Sometimes you find that your code is applicable to more situations than you originally thought.

Recall the `add_five()` function we previously defined. According to the docstring, what are the appropriate inputs?

```python
def add_five(x):
    """Adds the numeric value 5 to input data
    
    Parameters
    ----------
    x : numeric value (e.g. integer, float)
            
    Returns
    ----------
    input data with values increased by 5
    """
    
    return (x + 5)
```

In [32]:
# Works on numpy arrays as well!
print(nir_band)
nir_band_plus = add_five(nir_band)

print(nir_band_plus)

[[  6   7   8   9  10]
 [ 16  17  18  19 -15]]
[[ 11  12  13  14  15]
 [ 21  22  23  24 -10]]


In [33]:
# Update docstring to include numpy arrays
def add_five(x):
    """Adds the numeric value 5 to input data
    
    Parameters
    ----------
    x : numeric value (e.g. integer, float); numpy array
            
    Returns
    ----------
    input data type with values increased by 5
    """
    
    return (x + 5)
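
Once you know which inputs a function should support, one lightweight option is to record those expectations as `assert` statements that can be rerun whenever the function changes. This is only a sketch; the specific test values are arbitrary choices.

```python
def add_five(x):
    """Adds the numeric value 5 to input data"""
    return x + 5


# Each assert raises an AssertionError if its condition is False
assert add_five(5) == 10
assert add_five(-5) == 0
assert add_five(2.5) == 7.5

print("All add_five checks passed")
```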

## Additional Resources

#### Writing Custom Modules
* https://www.digitalocean.com/community/tutorials/how-to-write-modules-in-python-3
* https://www.youtube.com/watch?v=CqvZ3vGoGs0
* https://www.python-course.eu/python3_modules_and_modular_programming.php
    
#### Review of Functions and Modularization
* https://www.oreilly.com/library/view/head-first-python/9781491919521/ch04.html

#### Try and Except
* https://www.w3schools.com/python/python_try_except.asp
* https://www.youtube.com/watch?v=NIWwJbo-9_8

#### Exception Handling
* https://www.datacamp.com/community/tutorials/exception-handling-python
* https://www.python-course.eu/python3_exception_handling.php
* https://www.geeksforgeeks.org/built-exceptions-python/

#### Jupyter Notebook Cell Magic for Scripts
* https://stackoverflow.com/questions/21034373/how-to-load-edit-run-save-text-files-py-into-an-ipython-notebook-cell
* https://nbviewer.jupyter.org/github/ipython/ipython/blob/1.x/examples/notebooks/Cell%20Magics.ipynb

#### Programming for reusability
* https://intermediate-and-advanced-software-carpentry.readthedocs.io/en/latest/structuring-python.html

#### Creating Python Applications and Packages
* https://realpython.com/python-modules-packages/
* https://realpython.com/python-application-layouts/
* https://medium.com/small-things-about-python/lets-talk-about-python-packaging-6d84b81f1bb5

#### Python Programming Styles
* https://blog.newrelic.com/engineering/python-programming-styles/

## Assignment for March 8th, 2019 by noon

This submission consists of updating your GitHub repository with a focus on the code files. 

To update your GitHub repository, you need to add at least one script/notebook in the appropriate directory (e.g. scripts, notebooks). This script/notebook needs to contain:
* at least one custom function (e.g. a function you have defined for your workflow; not imported from an installed package/library; can be a combination of imported functions)
* appropriate documentation for the function (e.g. Python docstring, comments) to indicate purpose, inputs, and outputs
* some documentation at the top of the script/notebook (e.g. Python docstring, comments, Markdown) that explains the purpose of the script/notebook 
* documentation throughout the code (e.g. comments, Markdown) to walk users through the code

More details available on CANVAS.

In [34]:
import vegetation
