# <span style="color:purple">Writing Resilient Code</span>

## Data Skills for Empirical Research

### Winter, 2021

## <span style="color:purple">What can go wrong with your code? </span>

<br>
<br>

<center><img src="../figures/wrong.png" width="70%" style='border:5px solid #000000'/></center>
<center> https://www.xkcd.com </center>

## <span style="color:purple">What can go wrong with your code? </span>
<br>
* Bugs (code crashes, brittle to unexpected inputs)
<br>
<br>
* Code "works", but gives incorrect results
<br>
<br>
* Cannot reliably and automatically generate the same results each time
<br>
<br>
* External resources, like code dependencies and data, change outside your control
<br>
<br>
* Code is slow and/or uses a lot of memory
<br>
<br>
* Code is hard to understand
<br>
<br>
* Code is hard to change

## <span style="color:purple">Ways to make your code resilient </span>

<center><img src="../figures/resilience.jpg" width="25%"/></center>

<br>

* Protect Against System Crashes (save intermediate output, elegant restarts)

* Creating and Reproducing Environments (conda, docker, singularity)

* Error Handling

* Unit Testing

* Version Control with GitHub

## <span style="color:purple">Oh No! A KLC Node Crashed. </span>
<br>
Any suggestions for how to make your code resilient to a node crash?

## <span style="color:purple">Save Intermediate Output </span>
<br>

* Write print statements or logs to a file so you can tell where you left off.

* Save html output one page at a time.

* Save output to a csv file or database as it is collected.
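
A minimal sketch of the last two ideas, assuming results fit in one csv row per item. The names `collect`, `load_done`, `fetch`, and `results.csv` are hypothetical and chosen for illustration; `fetch` stands in for whatever slow step produces each value. Each row is flushed to disk as soon as it is collected, and a rerun skips anything already saved.

```python
import csv
import os
import time


def load_done(path):
    """Return the set of tickers already written to the csv at `path`."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return {row["ticker"] for row in csv.DictReader(f)}


def collect(tickers, fetch, path="results.csv"):
    """Append one row per ticker, skipping tickers saved by an earlier run.

    `fetch` is a hypothetical callable that returns the value for one ticker.
    Returns the list of tickers processed during this call.
    """
    done = load_done(path)
    # Write the header only when the file is new or empty
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    processed = []
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["ticker", "value", "saved_at"])
        if write_header:
            writer.writeheader()
        for tick in tickers:
            if tick in done:
                continue  # already saved before the crash or restart
            writer.writerow(
                {"ticker": tick, "value": fetch(tick), "saved_at": time.strftime("%X")}
            )
            f.flush()  # push the row to disk before moving on
            processed.append(tick)
    return processed
```

If the job dies halfway through, calling `collect` again with the same list resumes at the first ticker that is missing from the file.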

## <span style="color:purple">Save Intermediate Output </span>
sleeper.py (from Week 1)

In [None]:
#######################################
# Download Yahoo Finance Summary Pages
#######################################

# libraries used
import time
import requests

# Input file
tickerList = ["AMZN", "AAPL", "FB"]

# Open each page and save its html, one file per ticker
for tick in tickerList:
    tick = tick.strip()
    url = 'https://finance.yahoo.com/quote/' + tick
    path = tick + '.html'
    time.sleep(3)  # pause between requests
    response = requests.get(url, timeout=30)
    with open(path, "wb") as f:
        f.write(response.content)
    print("At " + time.strftime("%X") + ", we successfully saved " + path + ".")

## <span style="color:purple">Elegant Restarts to Python Code </span>

In [None]:
#!/usr/bin/python
# forever script: relaunch a Python file every time it exits
from subprocess import Popen
import sys

filename = sys.argv[1]
while True:
    print("\nStarting " + filename)
    # run the target with the same interpreter that runs this script
    p = Popen([sys.executable, filename])
    p.wait()

## <span style="color:purple">Create and Launch Shell Script</span>

```bash
$ nano forever

< Enter file contents >
< Control+x to save file >

$ chmod +x forever

$ ./forever <example_file>.py
```
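
A restart loop that never stops will also relaunch a script that finished normally. The sketch below is one possible variant, not part of the original script: it stops on a zero exit code and gives up after a fixed number of attempts. The function name `run_until_success` and its parameters are hypothetical.

```python
import subprocess
import sys
import time


def run_until_success(cmd, max_restarts=5, delay=1.0):
    """Run `cmd`, restarting after a non-zero exit.

    Returns the number of attempts used; raises RuntimeError if every
    attempt fails. `cmd` is a list such as [sys.executable, "script.py"].
    """
    for attempt in range(1, max_restarts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(f"Attempt {attempt} exited with code {result.returncode}; restarting.")
        time.sleep(delay)  # brief pause before the next launch
    raise RuntimeError(f"{cmd!r} still failing after {max_restarts} attempts")
```

For example, `run_until_success([sys.executable, "sleeper.py"])` would relaunch `sleeper.py` up to five times. This only helps if the script can pick up where it left off, which is why it pairs well with saving intermediate output.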

## <span style="color:purple">Creating and Reproducing Environments</span>

<br>
Conda environments are easy and cheap to create and delete.

In [1]:
! conda env list

# conda environments:
#
base                     /Users/willthompson/anaconda3
airflow                  /Users/willthompson/anaconda3/envs/airflow
cafrs                    /Users/willthompson/anaconda3/envs/cafrs
cycling                  /Users/willthompson/anaconda3/envs/cycling
dataproc                 /Users/willthompson/anaconda3/envs/dataproc
edgar-env                /Users/willthompson/anaconda3/envs/edgar-env
gcp                      /Users/willthompson/anaconda3/envs/gcp
interview-env            /Users/willthompson/anaconda3/envs/interview-env
mlearn                   /Users/willthompson/anaconda3/envs/mlearn
nlp-env                  /Users/willthompson/anaconda3/envs/nlp-env
ocr-review               /Users/willthompson/anaconda3/envs/ocr-review
patent                   /Users/willthompson/anaconda3/envs/patent
practical-nlp            /Users/willthompson/anaconda3/envs/practical-nlp
test-env                 /Users/willthompson/anaconda3/envs/test-env
textractor

Notice how many packages there are: each one is an opportunity for something to change and potentially break your code! When you choose a package, look for one with a sizable support community, not a one-off from an undergraduate class project.

In [None]:
! conda list

Tip: export your (pinned) dependencies to a file. You can use this file to re-create your environment reproducibly, anywhere, and any number of times.

In [None]:
! conda env export --from-history

In [None]:
! conda env export --from-history | grep -v "^prefix: " > environment.yml
! sed -i '' 's/workshop-env/test-env/g' environment.yml
! cat environment.yml

In [None]:
! conda env create -f environment.yml

In [None]:
! conda env list

In [None]:
! conda env remove -n test-env
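
Alongside an exported environment file, a script can record the versions it actually ran with, so saved results can be traced back to a specific set of packages. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the function name `snapshot_versions` and the package names in the usage line are illustrative assumptions.

```python
import json
import platform
from importlib import metadata


def snapshot_versions(packages):
    """Map each package name to its installed version, or None if not installed."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in the active environment
    return versions


if __name__ == "__main__":
    # Example package names only; list whatever your project imports
    print(json.dumps(snapshot_versions(["requests", "pandas"]), indent=2))
```

Writing this dictionary next to each batch of output is a cheap way to notice when a rerun used a different environment.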

## <span style="color:purple">Configuring with Containers</span>

<br>
<center><img src="../figures/reproducibility.png" width="100%"/></center>

<center><img src="../figures/docker.png" width="100%"/></center>
<br>
<center> https://docker.com </center>

## <span style="color:purple">Error/Exception Handling</span>
Error handling increases the robustness of your code by guarding against uncontrolled exits.
<br>
<br>
How do you implement error handling in your code? Any examples? 


## <span style="color:purple">Error Handling in Python </span>

<center><img src="../figures/py_error.png" width="85%"/></center>

<br>

## <span style="color:purple">Python Error Handling Example 1</span>

In [23]:
def python_fun(num):
    try:
        print("trying to divide by 0")
        num/0
        print("Infinity and beyond!")
    except ZeroDivisionError:
        print("Can't do that.")
    finally:
        print("Time to clean up this mess")
        
python_fun(0)

trying to divide by 0
Can't do that.
Time to clean up this mess


## <span style="color:purple">Python Error Handling Example 2</span>

In [9]:
# from yahoo_main.py file
import os

coList = ["AAPL", "AMZN","MSFT"]
for co in coList:

    # create directories for each ticker
    #dir_path = os.path.dirname(os.path.realpath(__file__))
    #dir_path = str(dir_path) + "/" + str(co)
    dir_path = "company/" + str(co)
    print(dir_path)

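    try:
        os.mkdir(dir_path)
    except FileExistsError:
        print("Creation of the directory failed. %s already exists" % dir_path)
    except OSError as err:
        # any other failure, e.g. the parent "company" directory is missing
        print("Creation of the directory failed. %s" % err)
    else:
        print("Successfully created the directory %s " % dir_path)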

company/AAPL
Creation of the directory failed. company/AAPL already exists
company/AMZN
Creation of the directory failed. company/AMZN already exists
company/MSFT
Creation of the directory failed. company/MSFT already exists
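
The same try/except pattern can keep a long loop running when a single item fails. The sketch below is an illustration under assumptions, not code from the course files: `run_each` and `action` are hypothetical names, and `action` stands in for any per-item step such as a download or a parse.

```python
def run_each(items, action):
    """Apply `action` to every item, recording failures instead of stopping.

    Returns (successes, failures): a list of (item, result) pairs and a
    dict mapping each failed item to a description of its exception.
    """
    successes = []
    failures = {}
    for item in items:
        try:
            result = action(item)
        except Exception as exc:
            # Record the problem and continue with the next item
            failures[item] = repr(exc)
        else:
            successes.append((item, result))
    return successes, failures


ok, failed = run_each([1, 0, 5], lambda x: 10 / x)
print(ok)      # [(1, 10.0), (5, 2.0)]
print(failed)  # the entry for 0 describes a ZeroDivisionError
```

Catching the broad `Exception` is a deliberate choice here so that the loop survives unexpected errors; the recorded failures should still be reviewed afterwards rather than ignored.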


## <span style="color:purple">Error Handling in R </span>

<center><img src="../figures/r_error.png" width="85%"/></center>

<br>

## <span style="color:purple">Error Handling in R</span>



In [3]:
import rpy2

In [4]:
%load_ext rpy2.ipython

In [10]:
%%R
inputs <- list(1, 2, -5, "oops", 10, 0)
for(input in inputs) {
  print(paste("log of", input, "=", log(input)))
  }

[1] "log of 1 = 0"
[1] "log of 2 = 0.693147180559945"
[1] "log of -5 = NaN"


R[write to console]: Error in log(input) : non-numeric argument to mathematical function

R[write to console]: In addition: 

R[write to console]: In log(input) :
R[write to console]:  NaNs produced




Error in log(input) : non-numeric argument to mathematical function


## <span style="color:purple">Error Handling in R - try</span>



In [11]:
%%R
for(input in inputs) {
  try(print(paste("log of", input, "=", log(input))))
  }

[1] "log of 1 = 0"
[1] "log of 2 = 0.693147180559945"
[1] "log of -5 = NaN"


R[write to console]: Error in log(input) : non-numeric argument to mathematical function

R[write to console]: In addition: 

R[write to console]: In log(input) :
R[write to console]:  NaNs produced



[1] "log of 10 = 2.30258509299405"
[1] "log of 0 = -Inf"


## <span style="color:purple">Error Handling in R - tryCatch</span>



In [13]:
%%R
log_calculator <- function(x){
  tryCatch(
    # [Required] R code(s) to be evaluated
    expr = {
      message(log(x))
      message("Successfully executed log(x).")
    },
    # [Optional] what should run if an error occurred while evaluating expr
    error = function(e){
      message('Caught an error!')
      print(e)
    },
    # [Optional] what should run if a warning occurred while evaluating expr
    warning = function(w){
      message('Caught a warning!')
      print(w)
    },
    # [Optional] what should run before quitting the tryCatch call, regardless of what happens
    finally = {
      message('All done, quitting.')
    }
  )    
}

In [14]:
%%R
log_calculator(10)

R[write to console]: 2.30258509299405

R[write to console]: Successfully executed log(x).

R[write to console]: All done, quitting.



In [15]:
%%R
log_calculator(-10)






R[write to console]: All done, quitting.



In [16]:
%%R
log_calculator("a")

R[write to console]: Caught an error!



<simpleError in log(x): non-numeric argument to mathematical function>


R[write to console]: All done, quitting.



## <span style="color:purple">Unit Testing</span>
<br>
<br>
Testing means writing code (separate from the application code) that invokes the code under test to check whether it produces errors.
<br>
<br>
Testing does not prove that code is correct; it only reports whether the conditions specified in the tests are handled correctly.
<br>
<br>
Unit tests specifically test a single “unit” of the code in isolation: for instance, a single class, function, or module.



## <span style="color:purple">Unit Tests in Python</span>

In [11]:
def test_extract_doclevel_form3_collection(test_form3_collection):
    """
    Validate Form3 extraction code against a random sample of documents
    :param test_form3_collection:
    :return:
    """
    for file in test_form3_collection.glob("*.txt"):
        doc = Form3(file)
        assert doc.filename == file.name
        fields = doc.doc_info
        assert len(fields) == 19
        assert fields["filename"] == file.name
        assert fields["schema_version"] == "X0206"
        assert fields["document_type"] == "3"
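
The test above depends on a project-specific `Form3` class and a fixture that are not shown here. As a self-contained sketch in the same plain-`assert` style, assume a hypothetical `increment` function as the unit under test:

```python
def increment(value):
    """Return value + 1 (a hypothetical unit under test)."""
    return value + 1


def test_single_number():
    assert increment(-1) == 0
    assert increment(0) == 1


def test_rejects_text():
    # Adding 1 to a string should raise TypeError rather than return a value
    try:
        increment("oops")
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError for non-numeric input")
```

Saved in a file whose name starts with `test_`, these functions are discovered and run by `pytest`. In a real test suite, `pytest.raises(TypeError)` is the more idiomatic way to write the second check; the explicit try/except is used here only to keep the sketch free of imports.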


## <span style="color:purple">Unit Tests in R</span>
Create a file named increment.R

In [17]:
%%R
increment <- function(value) {
  value + 1
}

In [19]:
%%R
#install.packages("testthat", repos='http://cran.us.r-project.org', quiet=TRUE)
library(testthat)

In [20]:
%%R
source("../examples/increment.R", chdir = TRUE)

In [21]:
%%R
test_that("single number", {
  expect_equal(increment(-1), 0)
  expect_equal(increment(0), 1)
})

test_that("vectors", {
  expect_equal(increment(c(0,1)), c(1,2))
})

test_that("empty vector", {
  expect_equal(increment(c()), c())
})

test_that("test NA", {
  expect_true(is.na(increment(NA)))
})


Test passed 🎉
Test passed 🎊
── Failure (???): empty vector ─────────────────────────────────────────────────
increment(c()) not equal to c().
Types not compatible: double is not NULL

Test passed 🌈


Inside RStudio
<br>
test_file("../examples/unit_testing.R")

## <span style="color:purple">Version Control using Git and GitHub</span>

<center><img src="../figures/git-workflow.png" width="100%"/></center>