This lesson focuses on:

* Handling errors
* Writing tests and logs
* Model drift
* Automated vs. non-automated retraining

#### Catching Errors

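```python
import pandas as pd

def read_data(file_path):
    '''Read a CSV file into a DataFrame, or return None if the file is missing.'''
    try:
        df = pd.read_csv(file_path)
        return df
    except FileNotFoundError:
        # Handle the error instead of crashing: tell the user what went wrong.
        # In production code you would also log the error here.
        print('We were not able to find that file')
        return None
```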


```python
df = read_data('some_path')
# prints: We were not able to find that file
```


```python
# Exercise: Catching Errors
def divide_vals(numerator, denominator):
    '''
    Args:
        numerator: (float) numerator of fraction
        denominator: (float) denominator of fraction

    Returns:
        fraction_val: (float) numerator/denominator, or an error
            message (str) if denominator is zero
    '''
    try:
        fraction_val = numerator / denominator
        return fraction_val
    except ZeroDivisionError:
        return "denominator cannot be zero"


def num_words(text):
    '''
    Args:
        text: (string) string of words

    Returns:
        num_words: (int) number of words in the string, or an error
            message (str) if text is not a string
    '''
    try:
        num_words = len(text.split())
        return num_words
    except AttributeError:
        return "text argument must be a string"
```

```python
print(divide_vals(10, 5))              # 2.0
print(divide_vals(10, 0))              # denominator cannot be zero
print(num_words("this is a string"))   # 4
print(num_words(10))                   # text argument must be a string
```


#### Testing
Testing your code is essential before deployment. It helps you catch errors and
faulty conclusions before they make any major impact. Today, employers are
looking for data scientists with the skills to properly prepare their code for an
industry setting, which includes testing their code.

* Problems that could occur in data science aren’t always easily detectable; you
 might have values being encoded incorrectly, features being used inappropriately,
 or unexpected data breaking assumptions.

* To catch these errors, you have to check the quality and accuracy of your analysis
in addition to the quality of your code. Proper testing is necessary to avoid surprises
and to have confidence in your results.

* Test-driven development (TDD): A development process in which you write tests for tasks
before you even write the code to implement those tasks.

* Unit test: A type of test that covers a “unit” of code—usually a single
function—independently from the rest of the program.
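A minimal sketch of both ideas together, assuming a hypothetical data-quality helper `count_missing` that is not part of the original lesson: following TDD, the tests are written first to describe the desired behaviour, and only then is the function implemented. Each test checks one small unit of code on its own, with no files, databases, or APIs involved.

```python
# Step 1 (TDD): write the tests first. They describe what we want from a
# hypothetical helper, count_missing, before it has been implemented.
def test_count_missing_with_gaps():
    assert count_missing([1.0, None, 3.0, None]) == 2


def test_count_missing_without_gaps():
    assert count_missing([1.0, 2.0]) == 0


def test_count_missing_empty_list():
    assert count_missing([]) == 0


# Step 2: write just enough code to make those unit tests pass.
def count_missing(values):
    """Return how many entries in `values` are None."""
    return sum(1 for value in values if value is None)


# Run the unit tests directly; each one exercises count_missing in isolation.
test_count_missing_with_gaps()
test_count_missing_without_gaps()
test_count_missing_empty_list()
```

Because Python looks up `count_missing` only when a test is called, the tests can be defined before the function exists, which mirrors the TDD workflow.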


#### Unit Testing: Advantages and Disadvantages
The advantage of unit tests is that they are isolated from the rest of your program,
and thus, no dependencies are involved. They don't require access to databases, APIs, or
other external sources of information. However, passing unit tests isn’t always enough
to prove that our program is working successfully. To show that all the parts of our
program work with each other properly, communicating and transferring data between them
correctly, we use integration tests. In this lesson, we'll focus on unit tests; however,
when you start building larger programs, you will want to use integration tests as well.
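To contrast with a unit test, here is a small integration-test sketch. It assumes two hypothetical pipeline steps, `load_prices` (reads a CSV file) and `average` (computes a mean), and checks that data flows correctly from one to the other through a real temporary file, rather than testing each step in isolation.

```python
import csv
import os
import tempfile


def load_prices(path):
    """Hypothetical loader: read the 'price' column of a CSV file as floats."""
    with open(path, newline="") as f:
        return [float(row["price"]) for row in csv.DictReader(f)]


def average(values):
    """Hypothetical analysis step: arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def test_load_and_average_integration():
    # Write a small CSV to a temporary directory, then run both steps end to end.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "prices.csv")
        with open(path, "w", newline="") as f:
            f.write("item,price\napple,1.0\npear,3.0\n")
        assert average(load_prices(path)) == 2.0


test_load_and_average_integration()
```

Unlike the unit tests, this test depends on the file system, so it can fail for reasons outside either function; that is exactly the interaction it is meant to cover.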

Using pytest:
Put your tests in a file whose name starts with `test_` so that pytest recognizes it,
for example `test_nearest.py`. Every test function inside that file must also start
with `test_`.

Summary:

* Create a test file starting with `test_`.
* Define unit test functions that start with `test_` inside the test file.
* Enter `pytest` into your terminal in the directory of your test file, and
it detects these tests for you.
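As an illustration, a test file for the exercise functions above might look like the sketch below. The file name `test_helpers.py` is hypothetical, and the functions are copied inline only so the example is self-contained; in a real project you would import them from the module where they are defined.

```python
# test_helpers.py  (hypothetical name; pytest collects files named test_*.py)

# Copied inline for a self-contained example; normally these would be imported.
def divide_vals(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return "denominator cannot be zero"


def num_words(text):
    try:
        return len(text.split())
    except AttributeError:
        return "text argument must be a string"


# pytest runs every function whose name starts with test_.
def test_divide_vals_regular():
    assert divide_vals(10, 5) == 2.0


def test_divide_vals_zero_denominator():
    assert divide_vals(10, 0) == "denominator cannot be zero"


def test_num_words_counts_words():
    assert num_words("this is a string") == 4


def test_num_words_rejects_non_string():
    assert num_words(10) == "text argument must be a string"
```

Running `pytest` (or `pytest -v` for one line per test) in the same directory should collect and run all four tests; plain `assert` statements are enough, since pytest reports the values involved when an assertion fails.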

#### Logging
Logging is valuable for understanding the events that occur while running your program.
For example, if you run your model overnight, and the results the following morning
are not what you expect, log messages can help you understand more about the context
in which those results occurred.
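A minimal sketch of that overnight scenario using Python's standard `logging` module: messages are written to a file with timestamps and levels, so they can be reviewed the next morning. The file name `overnight_model.log` and the `train_model` function are hypothetical placeholders.

```python
import logging

# Hypothetical log file name; any writable path works.
LOG_FILE = "overnight_model.log"

logging.basicConfig(
    filename=LOG_FILE,
    filemode="w",          # start a fresh log on each run
    level=logging.INFO,    # record INFO and anything more severe
    format="%(asctime)s %(levelname)s %(message)s",
    force=True,            # Python 3.8+: replace any handlers configured earlier
)


def train_model(n_rows):
    """Hypothetical training step that records its context in the log."""
    logging.info("Training started on %d rows.", n_rows)
    if n_rows == 0:
        logging.error("No training data found.")
        return None
    logging.info("Training finished.")
    return "model"


train_model(1000)
train_model(0)
```

Each line in the file starts with a timestamp and a level, so when a morning result looks wrong you can see, for instance, which run reported `No training data found.` and when.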

Tips for Log Messages:

* Be professional and clear
```
Bad: Hmmm... this isn't working???
Bad: idk.... :(
Good: Couldn't parse file.
```
* Be concise and use normal capitalization
```
Bad: Start Product Recommendation Process
Bad: We have completed the steps necessary and will now proceed with the
recommendation process for the records in our product database.
Good: Generating product recommendations.
```
* Choose the appropriate level for logging
```
Debug: Use this level for anything that happens in the program.
Error: Use this level to record any error that occurs.
Info: Use this level to record all actions that are user driven or
system specific, such as regularly scheduled operations.
```
* Provide any useful information
```
Bad: Failed to read location data
Good: Failed to read location data: store_id 8324971
```
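The tips above can be combined in a short sketch: a logger records detailed steps at DEBUG level, routine actions at INFO, and failures at ERROR, and the error message includes the `store_id` that caused it. The logger name, the in-memory `log_stream`, and the `collect_store_locations` function are hypothetical; in practice the handler would usually write to a file or the console.

```python
import io
import logging

log_stream = io.StringIO()  # in-memory stand-in for a log file or console

logger = logging.getLogger("store_locations")  # hypothetical logger name
logger.setLevel(logging.DEBUG)                 # keep DEBUG and above
logger.propagate = False                       # avoid duplicate output via the root logger
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)


def collect_store_locations(store_ids, locations):
    """Hypothetical step: look up a location for each store id."""
    logger.info("Collecting store locations.")  # routine, scheduled action
    found = {}
    for store_id in store_ids:
        logger.debug("Looking up store_id %s", store_id)  # fine-grained detail
        if store_id not in locations:
            # Concise, professional, and includes the failing id.
            logger.error("Failed to read location data: store_id %s", store_id)
            continue
        found[store_id] = locations[store_id]
    return found


collect_store_locations([1, 8324971], {1: "Berlin"})
print(log_stream.getvalue())
```

Raising the logger's level to `logging.INFO` would hide the DEBUG lines while keeping the INFO and ERROR messages, which is one reason to choose levels deliberately.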

#### Model Drift

Model drift occurs when the features or parameters of a model no longer produce the
results you want on production data, that is, when the model's performance degrades.

In these cases, you may need to retrain your model and launch a new version
of it to replace your existing model. This might mean:

* Finding new features
* Tuning your hyper-parameters
* Finding a new model altogether
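One way to notice that retraining may be needed is to keep scoring the model on recent, labelled production data and compare the result with the score measured at deployment. The sketch below assumes a classification model, uses accuracy as the metric, and flags drift when accuracy drops by more than a hypothetical tolerance of 0.05; the function names and threshold are illustrative, not a standard API.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def check_for_drift(baseline_accuracy, recent_true, recent_pred, tolerance=0.05):
    """Compare recent accuracy with the accuracy measured at deployment.

    Returns (drifted, recent_accuracy). `tolerance` is a hypothetical
    threshold: how large a drop we accept before flagging the model.
    """
    recent_accuracy = accuracy(recent_true, recent_pred)
    drifted = baseline_accuracy - recent_accuracy > tolerance
    return drifted, recent_accuracy


# Usage: the model scored 0.90 at deployment; recent labelled data disagrees.
drifted, recent = check_for_drift(0.90, [1, 0, 1, 1], [1, 0, 0, 1])
print(drifted, recent)  # True 0.75
```

A flag like this only says that performance has dropped; deciding whether to add features, tune hyperparameters, or switch models still requires investigation.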

#### Automated vs. Non-Automated Retraining

Automated Retraining
* New models replace the existing one on a fixed timeline (weekly, monthly, or yearly)
* A fraud model is a good example. Fraudsters constantly update their tactics,
so you need to update your models frequently to keep catching them.
* Useful for simple updates

Non-Automated Retraining
* Suited to models whose results are stable over time, so they are unlikely to be updated very often.
* Used for introducing new features or model architectures
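For the automated case, the core decision can be sketched as a date check: retrain once a fixed interval has passed since the last training run. The weekly interval and `train_fn` are hypothetical placeholders; in production this check would typically be triggered by a scheduler such as cron, and `train_fn` would run the real training and deployment pipeline.

```python
from datetime import date, timedelta

# Hypothetical schedule: retrain once a week (e.g. timedelta(days=30) for monthly).
RETRAIN_INTERVAL = timedelta(days=7)


def is_retraining_due(last_trained, today, interval=RETRAIN_INTERVAL):
    """Return True once `interval` has passed since the last training run."""
    return today - last_trained >= interval


def run_scheduled_job(last_trained, today, train_fn):
    """Retrain with `train_fn` if due; return the date of the current model.

    `train_fn` is a placeholder for whatever pipeline fits and deploys the
    new model; it is not part of any particular library.
    """
    if is_retraining_due(last_trained, today):
        train_fn()
        return today
    return last_trained


# Usage with a dummy training function.
trained_on = run_scheduled_job(date(2024, 1, 1), date(2024, 1, 8),
                               lambda: print("Retraining model."))
print(trained_on)  # 2024-01-08
```

Keeping the schedule check separate from the training function makes the check easy to unit test without actually training anything.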


### Key Terms
* Try-except blocks - used to catch and handle errors in code. The `try` block runs
first; if it raises a matching error, the `except` block runs instead of the program crashing.
* Testing - checking that the outcome of your software matches the expected requirements
* Logging - tracking your production code for informational, warning, and error
 catching purposes
* Model drift - the change in inputs and outputs of a machine learning model
over time
* Automated retraining - the automated process of updating production machine
learning models
* Non-automated retraining - a human-centered process of updating production
machine learning models