# <span style="color:blue">Bulletproofing Code</span>

## What could go wrong? Unfortunately, lots!

![](images/worst_thing_that_could_happen.png)

## <span style="color:darkorange">Potential Problems: </span>
- Bugs (code crashes, brittle to unexpected inputs)
- Code "works", but gives incorrect results
- Cannot reliably and automatically generate the same results each time
- External resources, like code dependencies and data, change outside your control
- Code is slow and/or uses a lot of memory
- Your code is hard to understand
- Your code is hard to change

## <span style="color:darkorange">Make your code work first before trying to optimize it</span>

![](images/knuth.jpg)

# <span style="color:blue">Example: Extracting Information from Earnings Calls</span>

In [1]:
from kelloggrs.load_data import *
from sqlalchemy import Table, MetaData, inspect
from pathlib import Path
import pandas as pd
import yaml  # imported explicitly rather than relying on the star import above

In [2]:
with open(Path("../tests/config.yaml")) as conf_file:
    config = yaml.load(conf_file, Loader=yaml.FullLoader)
engine = create_database(config)
engine = insert_records(Path(config["data_path"]) / "Tesla", engine)
engine = insert_records(Path(config["data_path"]) / "GM", engine)

In [3]:
inspector = inspect(engine)

# Get table information
print(inspector.get_table_names())

# Get column information
columns = inspector.get_columns('Transcript')
for c in columns:
    print(c)

['Component', 'Transcript']
{'name': 'transcriptid', 'type': BIGINT(), 'nullable': True, 'default': None, 'autoincrement': 'auto', 'primary_key': 1}
{'name': 'keydevid', 'type': BIGINT(), 'nullable': False, 'default': None, 'autoincrement': 'auto', 'primary_key': 0}
{'name': 'companyid', 'type': BIGINT(), 'nullable': False, 'default': None, 'autoincrement': 'auto', 'primary_key': 0}
{'name': 'companyname', 'type': VARCHAR(length=100), 'nullable': False, 'default': None, 'autoincrement': 'auto', 'primary_key': 0}
{'name': 'transcriptcreationdate', 'type': VARCHAR(length=20), 'nullable': False, 'default': None, 'autoincrement': 'auto', 'primary_key': 0}
{'name': 'mostimportantdate', 'type': VARCHAR(length=20), 'nullable': True, 'default': None, 'autoincrement': 'auto', 'primary_key': 0}


result = engine.execute("select * from Component")
for row in result:
    print(row)

In [4]:
transcript_df = pd.read_sql_table("Transcript", con=engine)
component_df = pd.read_sql_table("Component", con=engine)

In [5]:
transcript_df

Unnamed: 0,transcriptid,keydevid,companyid,companyname,transcriptcreationdate,mostimportantdate
0,1579439,587893061,27444752,"Tesla, Inc.",2018-10-26,2018-10-24
1,1528908,574506795,27444752,"Tesla, Inc.",2018-08-09,2018-08-01
2,892475,314636729,27444752,"Tesla, Inc.",2015-11-04,2015-11-03
3,163185,130475124,27444752,"Tesla, Inc.",2011-08-26,2011-08-10
4,1193927,427742222,27444752,"Tesla, Inc.",2017-05-06,2017-05-03
...,...,...,...,...,...,...
132,837822,304527258,61206100,General Motors Company,2015-07-24,2015-07-23
133,1637234,590848330,61206100,General Motors Company,2019-01-18,2019-01-16
134,1807136,632163214,61206100,General Motors Company,2019-08-17,2019-08-13
135,21969,6597551,61206100,General Motors Company,2009-12-10,2009-05-11


In [6]:
component_df

Unnamed: 0,transcriptid,componentid,componenttypename,text,componentorder,personname,companyofperson
0,1579439,62319690,Presenter Speech,Correct. Correct. The company here had a 4-mon...,35,Laurie Shelby,
1,1579439,62319781,Answer,It's the same amount of service. So same cost.,126,Deepak Ahuja,
2,1579439,62319772,Question,"I have one for Deepak and then a follow-up, pl...",117,Toni Sacconaghi,
3,1579439,62319777,Question,"And then to follow up, I was just wondering if...",122,Toni Sacconaghi,
4,1579439,62319666,Presenter Speech,"Yes, exactly. So if you look at there, the tri...",11,Madan Gopal,
...,...,...,...,...,...,...,...
11440,1207869,49473055,Question,How does GM protect its brands when you're tal...,17,Rod Lache,
11441,1207869,49473046,Answer,"Yes, I'd say 2 things, back to my point about ...",8,Peter Kosak,
11442,1207869,49473061,Answer,"Yes, not at all. In fact, I think that a lot o...",23,Peter Kosak,
11443,1207869,49473048,Answer,"Yes. I mean, purely as a matter of control and...",10,Peter Kosak,


# <span style="color:blue">Testing, Error Detection, and Profiling</span>

![](images/165-minor-change.png)

## Testing

In [7]:
result = engine.execute("select count(*) as cnt from Transcript")
row = result.fetchone()
print(f"Number of Transcripts: {row['cnt']}")
assert row["cnt"] == 137

Number of Transcripts: 137


In [8]:
result = engine.execute("select count(*) as cnt from Component")
row = result.fetchone()
print(f"Number of Components: {row['cnt']}")
assert row["cnt"] == 11445

Number of Components: 11445
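The assertions above run only when someone executes the notebook. Since `pytest` is already among the environment's dependencies, the same kind of check can live in a test file that runs automatically. The following is a minimal sketch, not the workshop's actual test suite: `count_rows` and `make_toy_database` are hypothetical helpers, and the in-memory SQLite table with three rows stands in for the real `Transcript` table.

```python
import sqlite3


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    """Return the number of rows in `table`."""
    # Table names cannot be bound as SQL parameters, so pass only trusted names.
    return conn.execute(f"select count(*) from {table}").fetchone()[0]


def make_toy_database() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for the workshop database (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Transcript (transcriptid integer primary key)")
    conn.executemany("insert into Transcript values (?)", [(1,), (2,), (3,)])
    return conn


def test_transcript_count():
    # pytest collects any function whose name starts with `test_`.
    conn = make_toy_database()
    assert count_rows(conn, "Transcript") == 3
```

Saved in a file named something like `test_counts.py` (an assumed name), this runs with `pytest` from the command line, so a change that silently drops records fails loudly.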


![](images/project-structure.png)

## Exception Handling

In [9]:
try:
    print("trying divide by 0")
    100/0
    print("Infinity and beyond!")
except ZeroDivisionError:
    print("Can't do that.")
finally:
    print("Time to clean up this mess")

trying divide by 0
Can't do that.
Time to clean up this mess
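In real code the same pattern usually guards an operation that can fail for reasons outside your control, such as a missing file. The sketch below assumes a hypothetical helper, `read_text_or_default`, and catches only the one exception it expects, so unrelated errors still surface.

```python
from pathlib import Path


def read_text_or_default(path: Path, default: str = "") -> str:
    """Return the file's contents, or `default` if the file does not exist."""
    try:
        return path.read_text()
    except FileNotFoundError:
        # Catch only the failure we expect; anything else still propagates.
        print(f"{path} not found, using default")
        return default


# The file name here is made up for illustration and should not exist.
print(read_text_or_default(Path("no_such_file_for_this_sketch.txt"), "fallback"))
```

Catching a narrow exception type, rather than a bare `except:`, keeps genuine bugs from being hidden behind the fallback.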


## Profiling

In [10]:
import spacy
nlp = spacy.load("en_core_web_md")

In [11]:
answers_df = component_df.loc[component_df['componenttypename'] == "Answer"]
texts = list(answers_df['text'])[:500]

In [12]:
%%time
docs = []
for text in texts:
    docs.append(nlp(text))

CPU times: user 6.39 s, sys: 315 ms, total: 6.71 s
Wall time: 6.71 s


In [13]:
%%time
docs = []
with nlp.disable_pipes('tagger', 'parser'):
    for text in texts:
        docs.append(nlp(text))

CPU times: user 1.84 s, sys: 148 ms, total: 1.99 s
Wall time: 1.99 s
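`%%time` reports the cost of a whole cell, but not which function calls account for it. The standard library's `cProfile` can break the time down per function. This is a minimal sketch using placeholder functions (`slow_step`, `fast_step`, and `pipeline` are hypothetical and stand in for real pipeline stages; they are not part of spaCy).

```python
import cProfile
import io
import pstats


def slow_step(n: int) -> int:
    """Deliberately slow placeholder for an expensive pipeline stage."""
    return sum(i * i for i in range(n))


def fast_step(n: int) -> int:
    """Cheap placeholder stage."""
    return n + 1


def pipeline(n: int) -> int:
    return slow_step(n) + fast_step(n)


profiler = cProfile.Profile()
profiler.enable()
pipeline(200_000)
profiler.disable()

# Sort by cumulative time and keep only the top entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and cumulative time per function, which points to the stage worth optimizing or disabling.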


### Example from: https://realpython.com/numpy-array-programming/

In [14]:
import numpy as np
np.random.seed(444)

In [15]:
x = np.random.choice([False, True], size=100000)
x

array([ True, False,  True, ...,  True, False,  True])

In [16]:
def count_transitions(x) -> int:
    count = 0
    for i, j in zip(x[:-1], x[1:]):
        if j and not i:
            count += 1
    return count

In [17]:
%timeit count_transitions(x)

5.63 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [18]:
%timeit np.count_nonzero(x[:-1] < x[1:])

75.8 µs ± 555 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
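`%timeit` is an IPython magic, so it is not available in a plain Python script. The standard library's `timeit` module does a similar job. The sketch below first checks that a rewritten function returns the same answer as the original loop and only then compares timings. It uses plain Python lists rather than NumPy arrays, and `count_transitions_sum` is a hypothetical rewrite introduced for illustration.

```python
import random
import timeit

random.seed(444)
flags = [random.choice([False, True]) for _ in range(10_000)]


def count_transitions_loop(values) -> int:
    """Reference version: explicit loop, same logic as count_transitions."""
    count = 0
    for i, j in zip(values[:-1], values[1:]):
        if j and not i:
            count += 1
    return count


def count_transitions_sum(values) -> int:
    """Candidate rewrite: count False -> True steps with a generator expression."""
    return sum(1 for i, j in zip(values, values[1:]) if j and not i)


# Check correctness first, then compare speed.
assert count_transitions_loop(flags) == count_transitions_sum(flags)

loop_time = timeit.timeit(lambda: count_transitions_loop(flags), number=20)
sum_time = timeit.timeit(lambda: count_transitions_sum(flags), number=20)
print(f"loop: {loop_time:.4f} s, sum: {sum_time:.4f} s for 20 runs each")
```

Keeping the slow but obviously correct version around as a reference makes it safe to experiment with faster ones.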


In [19]:
%load_ext memory_profiler

In [20]:
%%memit 
import numpy as np
np.count_nonzero(x[:-1] < x[1:])

peak memory: 946.52 MiB, increment: -0.49 MiB
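`%%memit` reports memory for the whole process, which includes everything already loaded in the notebook. To measure only what a specific piece of code allocates, the standard library's `tracemalloc` is one option; note that it tracks allocations made through Python's memory allocator, not total process memory. A minimal sketch with a made-up workload:

```python
import tracemalloc

tracemalloc.start()
squares = [i * i for i in range(100_000)]  # the allocation we want to measure
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1024:.1f} KiB, peak: {peak / 1024:.1f} KiB")
```

`current` is the traced memory still in use when measured, and `peak` is the highest traced value since `tracemalloc.start()`.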


## Configuring your Code Dependencies

Conda environments are cheap to create and easy to delete.

In [21]:
! conda env list

# conda environments:
#
base                     /Users/willthompson/anaconda3
aws                      /Users/willthompson/anaconda3/envs/aws
cafrs                    /Users/willthompson/anaconda3/envs/cafrs
cluedo                   /Users/willthompson/anaconda3/envs/cluedo
corelogic                /Users/willthompson/anaconda3/envs/corelogic
edx                      /Users/willthompson/anaconda3/envs/edx
gcp                      /Users/willthompson/anaconda3/envs/gcp
mturk                    /Users/willthompson/anaconda3/envs/mturk
ocr-review               /Users/willthompson/anaconda3/envs/ocr-review
patent-data              /Users/willthompson/anaconda3/envs/patent-data
pytorch-nlp              /Users/willthompson/anaconda3/envs/pytorch-nlp
textract                 /Users/willthompson/anaconda3/envs/textract
voilatest                /Users/willthompson/anaconda3/envs/voilatest
workshop-env          *  /Users/willthompson/anaconda3/envs/workshop-env



Notice how many packages there are. Each one is an opportunity for something to change and potentially break your code! When you choose a package, look for one with a sizable support community, not a one-off from an undergraduate class project.

In [22]:
! conda list

# packages in environment at /Users/willthompson/anaconda3/envs/workshop-env:
#
# Name                    Version                   Build  Channel
appnope                   0.1.0                 py37_1000    conda-forge
asn1crypto                1.3.0                    py37_0    conda-forge
attrs                     19.3.0                     py_0    conda-forge
backcall                  0.1.0                      py_0    conda-forge
bleach                    3.1.1                      py_0    conda-forge
bzip2                     1.0.8                h0b31af3_2    conda-forge
ca-certificates           2019.11.28           hecc5488_0    conda-forge
cachetools                4.0.0                    pypi_0    pypi
catalogue                 1.0.0                      py_0    conda-forge
certifi                   2019.11.28               py37_0    conda-forge
cffi                      1.13.2           py37h33e799b_0    conda-forge
chardet                   3.0.4                 py37_1003

Tip: export your (pinned) dependencies to a file. You can use this file to re-create your environment reproducibly, anywhere, and any number of times.

In [23]:
! conda env export --from-history

name: workshop-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python==3.7
  - spacy
  - jupyterlab
  - sqlalchemy
  - pytest
  - pyyaml
  - memory_profiler
  - pandas
prefix: /Users/willthompson/anaconda3/envs/workshop-env



In [24]:
! conda env export --from-history | grep -v "^prefix: " > environment.yml
! sed -i '' 's/workshop-env/test-env/g' environment.yml

In [25]:
! conda env create -f environment.yml

Collecting package metadata (repodata.json): done
Solving environment: done
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate test-env
#
# To deactivate an active environment, use
#
#     $ conda deactivate



In [26]:
! conda env list

# conda environments:
#
base                     /Users/willthompson/anaconda3
aws                      /Users/willthompson/anaconda3/envs/aws
cafrs                    /Users/willthompson/anaconda3/envs/cafrs
cluedo                   /Users/willthompson/anaconda3/envs/cluedo
corelogic                /Users/willthompson/anaconda3/envs/corelogic
edx                      /Users/willthompson/anaconda3/envs/edx
gcp                      /Users/willthompson/anaconda3/envs/gcp
mturk                    /Users/willthompson/anaconda3/envs/mturk
ocr-review               /Users/willthompson/anaconda3/envs/ocr-review
patent-data              /Users/willthompson/anaconda3/envs/patent-data
pytorch-nlp              /Users/willthompson/anaconda3/envs/pytorch-nlp
test-env                 /Users/willthompson/anaconda3/envs/test-env
textract                 /Users/willthompson/anaconda3/envs/textract
voilatest                /Users/willthompson/anaconda3/envs/voilatest
workshop-env          *  /Users/willt

In [27]:
! conda env remove -n test-env


Remove all packages in environment /Users/willthompson/anaconda3/envs/test-env:



## Configuring the Entire Environment

![](images/horizontal-logo-monochromatic-white.png)

![](images/container-what-is-container.png)

https://www.docker.com/

In [28]:
! docker build .. -t workshop:latest
# Open terminal to docker image: docker run -i -t workshop:latest /bin/bash

Sending build context to Docker daemon  11.38MB
Step 1/10 : FROM continuumio/miniconda3
 ---> 406f2b43ea59
Step 2/10 : WORKDIR /home/root/work
 ---> Using cache
 ---> b0a37c074537
Step 3/10 : ADD environment.yml /tmp/environment.yml
 ---> Using cache
 ---> 8d02f12f4f46
Step 4/10 : RUN conda update -n base -c defaults conda
 ---> Using cache
 ---> 91bd6194f6ba
Step 5/10 : RUN conda env create -f /tmp/environment.yml
 ---> Using cache
 ---> 53d0844f16d2
Step 6/10 : RUN echo "source activate $(head -1 /tmp/environment.yml | cut -d' ' -f2)" > ~/.bashrc
 ---> Using cache
 ---> 526272c02216
Step 7/10 : ENV PATH /opt/conda/envs/$(head -1 /tmp/environment.yml | cut -d' ' -f2)/bin:$PATH
 ---> Using cache
 ---> 4a3bb0e2421b
Step 8/10 : COPY --chown=root:root ./kelloggrs /home/root/work/kelloggrs
 ---> Using cache
 ---> 6b2634d23808
Step 9/10 : COPY --chown=root:root ./setup.py /home/root/work/setup.py
 ---> Using cache
 ---> 82dceb43d3a6
Step 10/10 : RUN pip install .
 ---> Using cache
 ---> 8e3