
# Testing


<div class="slide-title">
    
# Testing

</div>

## Testing your code

<div class="group">
    <div class="text_70">
        

Testing production-grade code is hard, and you can never cover every possible outcome.

There are multiple types of tests. Some rough definitions:
        
→ **Unit Tests**: Testing "small" and encapsulated parts of the
application. Most of the time this means explicitly testing a
function written as part of the package.

→ **Integration Tests / Service Tests**: Testing dependencies
between parts of the software, for example the interaction
with databases, APIs etc.

→ **End-to-End / UI Tests**: Testing the working application starting
from the user's end, for example checking that an interaction in the UI
leads to the expected results.
    </div>
    <div class="images_30">
        <img src="../images/testing/img_p1_1.png">
    </div>
</div>

Notes: The boundary between unit tests and integration tests is fluid.

## Why should we test?

<div class="group">
    <div class="text">
        
**→ Reduce time spent debugging**
        
**→ Reduce technical debt**

**→ Enable refactoring** 
</div>
    
<div class="text">
    
**→ Writing better code**
        
**→ Document what code really does**

**→ Improve the deployment workflow**
    </div>
</div>

**Time:** 

Debugging: Finding errors in your code is very time-consuming, both for you and for the other developers on your team who review your pull requests.

**Better code:**

Tests make you think about your code before you write it. What is the functionality? What parameters should you expect? Why are you writing the code in a specific way? How do you think the code could break?

**Documentation:**

Tests must be kept up to date with code changes, otherwise they fail. This makes them a reliable record of what the code really does.

**Refactor:**

Tests are a safety net for refactoring. If somebody wants to refactor the code at some point without breaking it, tests are needed to be sure that it still serves its intended purpose.

**Deployment:**

When you track how new and refactored code is actually tested, you spare yourself a lot of stress and friction once the code is deployed.



### Finding test cases

<div class="group">
    <div class="text_70">
                
How to come up with tests:
        
→ Think about how your function will be **used**

→ Think about how it could be **misused (edge cases)**

→ And don’t forget to test the **“happy”** cases as well (**“perfect” inputs**)
    </div>
    <div class="images_30">
        <img src="../images/testing/img_p3_2.png">
    </div>
</div>

<div class="alert alert-block alert-info">
<b>Note:</b> 

You should write your tests with all outcomes in mind: Good and bad. You will probably never cover
all edge cases!

</div>
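
As a minimal sketch of these three kinds of test cases, consider a small word-counting function. The function `word_count` and the inputs are made up for illustration; the tests are plain functions with `assert`, which is the style pytest collects.

```python
def word_count(text):
    """Count whitespace-separated words in a string (hypothetical example function)."""
    return len(text.split())


def test_word_count_happy_case():
    # "Perfect" input: a normal sentence, the way the function is meant to be used
    assert word_count("the quick brown fox") == 4


def test_word_count_empty_string():
    # Edge case: there is nothing to count
    assert word_count("") == 0


def test_word_count_irregular_whitespace():
    # Misuse: leading, trailing, and repeated whitespace
    assert word_count("  hello \n  world  ") == 2
```

Each test covers one way the function could be used or misused, so a failure points directly at the case that broke.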

## TDD



Notes: TDD stands for test-driven development; most of the time it is applied to functions.

### Writing tests that check the functionality of your code before writing the actual code

### Test-Driven Development

<div class="group">
    <div class="text">
        
TDD helps you plan out the work ahead.

It can be divided into three phases:

→ **Red Phase** - write the test or tests to validate the functionality

→ **Green Phase** - implement the simplest code that will make the failed test pass

→ **Refactor Phase** - improve the code without changing the functionality
    </div>
    <div class="images">
        <img src="../images/testing/img_p6_2.png">
    </div>
</div>

<div class="alert alert-block alert-info">
<b>Note:</b> 
    
When you start writing code, follow the KISS principle: Keep It Simple, Stupid.

</div>
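
The three phases could look roughly like this for a leap-year check. This is only a sketch: the function `is_leap_year` and the chosen years are illustrative assumptions, and in practice each phase would be a separate step in your editor rather than one file.

```python
# Red phase: the test is written first. Run on its own at this point,
# it fails, because is_leap_year() does not exist yet.
def test_is_leap_year():
    assert is_leap_year(2024) is True
    assert is_leap_year(2023) is False
    assert is_leap_year(1900) is False  # divisible by 100 but not by 400
    assert is_leap_year(2000) is True   # divisible by 400


# Green phase: the simplest code that makes the test pass.
def is_leap_year(year):
    if year % 400 == 0:
        return True
    if year % 100 == 0:
        return False
    if year % 4 == 0:
        return True
    return False


# Refactor phase: same behavior, tidier code. This definition replaces
# the one above, and the unchanged test confirms that nothing broke.
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```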

## Unit Tests


### Testing "small" and encapsulated parts of the application.



Notes: Most of the time, the units are functions.

### Unit Testing

Tests should tell you the expected behavior of the unit. Keep them short and to the point.

The GIVEN, WHEN, THEN structure can help with this:

→ **GIVEN** - what are the initial conditions for the test?

→ **WHEN** - what is occurring that needs to be tested?

→ **THEN** - what is the expected response?


<div class="alert alert-block alert-info">
<b>Note:</b> 

You should prepare your environment for testing (GIVEN), execute the behavior (WHEN), and check that the output meets
expectations (THEN).

</div>
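
A short sketch of a test laid out in this structure. The function `apply_discount` and the numbers are hypothetical, chosen only to make the three steps visible.

```python
def apply_discount(price, percent):
    """Return the price reduced by the given percentage (hypothetical example)."""
    return price * (1 - percent / 100)


def test_apply_discount_reduces_price_by_percentage():
    # GIVEN a price and a discount in percent
    price = 200.0
    percent = 25

    # WHEN the discount is applied
    result = apply_discount(price, percent)

    # THEN the price is reduced by that percentage
    assert result == 150.0
```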

### Unit Testing

<div class="group">
    <div class="text_70">
        
Each piece of behavior should be tested once -- and only once.
        
Why is that?
        
→ If you make a small change to your code base and then twenty tests break, how do you know which functionality is broken?
        
→ When only a single test fails, it's much easier to find the bug.
    </div>
    <div class="images_30">
        <img src="../images/testing/img_p10_1.png">
    </div>
</div>

<div class="alert alert-block alert-info">
<b>Note:</b> 

Write tests for each piece of code, to check if it gives back the expected results and to make it easier
to find mistakes.

</div>


### Unit Testing

Each test must be independent from other tests.

Rules for creating tests:

→ Must be able to run alone

→ The order of the tests should not matter

→ Use descriptive names for testing functions.

<div class="alert alert-block alert-info">
<b>Note:</b> 

These rules imply that each test must start from a fresh dataset and may have to clean up
after itself.

</div>
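
One way to keep tests independent is to let every test build its own data. The sketch below uses a plain helper function, `make_cart`, which is a made-up name; in pytest a fixture could play the same role.

```python
def make_cart():
    """Build a fresh shopping cart for every test (hypothetical example data)."""
    return ["apple", "banana"]


def add_item(cart, item):
    cart.append(item)
    return cart


def remove_item(cart, item):
    cart.remove(item)
    return cart


def test_add_item_appends_to_cart():
    cart = make_cart()  # fresh data, not shared with any other test
    assert add_item(cart, "cherry") == ["apple", "banana", "cherry"]


def test_remove_item_deletes_from_cart():
    cart = make_cart()  # unaffected by whatever other tests did to their cart
    assert remove_item(cart, "apple") == ["banana"]
```

Because neither test touches shared state, each one can run alone and the two can run in either order.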

### Why is it important that tests are independent from each other?

### Unit Test Example
Let's look at an example:


<center>
    <img src="../images/testing/img_p13_1.png" width=800>
</center>

<div class="alert alert-block alert-info">
<b>Note:</b> 

Assertions let you write sanity checks in your code. You can use these to test if certain assumptions
are true or false.

</div>
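
A minimal illustration of both uses of `assert`, as a sanity check inside a function and as the check in a test. The function `mean` here is a hypothetical example, not the function shown on the slide.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # Sanity check: fail early, with a clear message, if the assumption is violated
    assert len(values) > 0, "values must not be empty"
    return sum(values) / len(values)


def test_mean_of_three_numbers():
    # In a test, the assertion states what result we expect
    assert mean([1, 2, 3]) == 2.0
```

If an assertion is false, Python raises an `AssertionError`, which a test runner reports as a failed test.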


Notes: What do you think the name of the function is that we want to test? What do you think is the purpose of the function?

### Unit Test Example

Let's look at an example:

<center>
    <img src="../images/testing/example_unit_test.png" width=1000>
</center>



<div class="alert alert-block alert-info">
<b>Note:</b> 

Based on the test, we can infer what the code looks like, or we can write a function that passes
the test and solves the same problem.

</div>


Notes: What do you think the name of the function is that we want to test? What do you think is the purpose of the function?

### Libraries for testing

<div class="group">
    <div class="text_70">
        
There are a few libraries that we can use:

→ **unittest**: Part of the Python standard library

→ **doctest**: Part of the Python standard library

→ **pytest**: A widely used third-party testing library
    </div>
    <div class="images_30">
        <img src="../images/testing/img_p15_2.png">
        <img src="../images/testing/img_p15_1.png">
    </div>
</div>

<div class="alert alert-block alert-info">
<b>Note:</b> 

In the repo we will show you examples for pytest!

</div>
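
To give a feel for the differences, here is the same check written in the style of each library. The function `add` is a made-up example, and all three variants are put in one file only for comparison.

```python
import unittest


def add(a, b):
    """Return the sum of a and b.

    doctest style: the example below is part of the docstring and is
    checked when you run `python -m doctest your_file.py`.

    >>> add(2, 3)
    5
    """
    return a + b


# unittest style: tests are methods on a TestCase subclass,
# run with `python -m unittest`.
class TestAdd(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)


# pytest style: a plain function with a plain assert,
# run with `python -m pytest`.
def test_add_pytest_style():
    assert add(2, 3) == 5
```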

### Naming your tests

<div class="group">
    <div class="text">
        
pytest discovers tests by file name and function name; keeping them in a `tests` subfolder is a common convention

→ **Subfolder name**: tests (a convention, not a requirement)

→ **File name**: test_something.py (or something_test.py)

→ **Function name**: test_your_function_name()
    </div>
    <div class="images">
        <img src="../images/testing/img_p14_2.png">
        <img src="../images/testing/img_p16_1.png">
    </div>
</div>

<div class="alert alert-block alert-info">
<b>Note:</b> 

You have to import the functions you want to test into your Python test file!

</div>

### Running pytest 

<div class="group">
    <div class="text_70">      
What passed and failed tests look like:
        
`python -m pytest -q tests/test_something.py`
    </div>
    <div class="images_30">
        <img src="../images/testing/img_p17_3.png" width=250>
    </div>
</div>

<center>
    <img src="../images/testing/passed_failed_test.png">
</center>

Notes: -q makes the output quieter; replace it with -s to also see what your code prints, or with -v for more detail per test.

## Integration Test


### Testing dependencies and interactions between parts of the software


### Integration Test

<div class="group">
    <div class="text_70">
        
What does integration testing involve?

Characteristics of creating integration tests:

→ Integrating the various modules of an application

→ Testing their behaviour as a combined, or integrated, unit

→ Verifying that the individual units communicate with each other properly
    </div>
    <div class="images_30">
        <img src="../images/testing/img_p20_1.png" width=300>
    </div>
</div>



<div class="alert alert-block alert-info">
<b>Note:</b> 

To perform integration testing, testers use dummy programs that act as substitutes for any missing
modules and simulate data communications between modules for testing purposes.

</div>
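
A sketch of such a substitute: a fake client stands in for a module that would normally call an external API. The names `FakeWeatherClient` and `describe_weather`, and the canned temperature, are invented for this example.

```python
class FakeWeatherClient:
    """Stand-in for a real API client: returns canned data, makes no network call."""

    def get_temperature(self, city):
        canned = {"Berlin": 21.5}
        return canned.get(city)


def describe_weather(client, city):
    """Format the temperature reported by whatever client is passed in."""
    temperature = client.get_temperature(city)
    if temperature is None:
        return f"No data for {city}"
    return f"{city}: {temperature:.1f} degrees"


def test_describe_weather_formats_client_data():
    assert describe_weather(FakeWeatherClient(), "Berlin") == "Berlin: 21.5 degrees"


def test_describe_weather_handles_missing_city():
    assert describe_weather(FakeWeatherClient(), "Atlantis") == "No data for Atlantis"
```

Passing the client in as an argument is what makes the swap possible: the test checks how the two parts communicate without depending on the real service.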

<center>
    <img src="../images/testing/integration_tests.png" width=1000>
</center>

<div class="group">
    <div class="text">
        
**Unit tests**
        
* Smallest piece of code, or unit, is tested
* Each unit can be logically isolated
* **Individual modules are tested**
    </div>
    <div class="text">
        
**Integration tests**
        
* Checks the functionality of the overall application
* The combined, or integrated, unit is tested
* **Modules are tested as a combined unit**
    </div>
</div>
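
As a small sketch of modules tested as a combined unit, the test below lets one function write to a database and another read from it. It uses an in-memory SQLite database from the standard library so that it leaves nothing behind; the table and function names are assumptions made for this example.

```python
import sqlite3


def save_measurements(conn, values):
    """Write measurements to the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS measurements (value REAL)")
    conn.executemany(
        "INSERT INTO measurements (value) VALUES (?)", [(v,) for v in values]
    )
    conn.commit()


def average_measurement(conn):
    """Read the average measurement back from the database."""
    row = conn.execute("SELECT AVG(value) FROM measurements").fetchone()
    return row[0]


def test_saved_measurements_can_be_averaged():
    # The database exists only for the duration of this test
    conn = sqlite3.connect(":memory:")
    try:
        save_measurements(conn, [1.0, 2.0, 3.0])
        assert average_measurement(conn) == 2.0
    finally:
        conn.close()
```

A unit test would check each function on its own; this test only passes if writing and reading agree on the table layout.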

### Reasons for Integration Testing

<div class="group">
    <div class="text_70">
        

Why integration testing is essential:

→ Integrating different modules into a working application

→ Ensuring that changing requirements are incorporated into the application

→ Eliminating common issues missed during unit testing
    </div>
    <div class="images_30">
        <img src="../images/testing/img_p22_1.png">
    </div>
</div>

<div class="alert alert-block alert-info">
<b>Note:</b> 

Even when each module of the application is unit-tested, some errors may still exist. To identify these
errors and ensure that the modules work well together after integration, integration testing is crucial.

</div>

Notes: 1. When different developers work on different modules, individuals bring their own understanding and logic to the development effort. 2. In many real-time application scenarios, requirements can and do change often. These new requirements may not be unit-tested every time, which may lead to missed defects or missing product features. 3. Some modules that interact with third-party application program interfaces (APIs) need to be tested to ensure they function properly.

## Testing Data


### Why should Data Scientists write tests?

### Testing in Data Science

Is it unit testing or integration testing?

What can we write tests for in Data Science?

→ Code that we turn from notebook cells into Python files

→ Data integrity tests: Do our data transformations introduce any errors in the data?

→ Data quality tests: Does the data in our system meet our needs?

<div class="alert alert-block alert-info">
<b>Note:</b> 

Data tests can be unit tests where we test the functions that we use to transform our data but
quickly become integration tests when we look at the outcome of a collection of transformations.

</div>
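
A minimal sketch of data integrity and data quality tests for a single transformation. To stay self-contained it uses plain Python rows instead of a DataFrame, and the column names, values, and plausible height range are all made up for illustration.

```python
def add_height_in_cm(rows):
    """Add a height_cm column computed from height_m (hypothetical transformation)."""
    return [{**row, "height_cm": row["height_m"] * 100} for row in rows]


# Made-up example data
RAW_ROWS = [
    {"name": "person_a", "height_m": 1.68},
    {"name": "person_b", "height_m": 1.75},
]


def test_transformation_keeps_all_rows():
    # Data integrity: the transformation must not drop or duplicate rows
    assert len(add_height_in_cm(RAW_ROWS)) == len(RAW_ROWS)


def test_transformation_introduces_no_missing_values():
    # Data integrity: every row still has a value in every column
    for row in add_height_in_cm(RAW_ROWS):
        assert all(value is not None for value in row.values())


def test_heights_are_plausible():
    # Data quality: values fall in a range that makes sense for our needs
    for row in add_height_in_cm(RAW_ROWS):
        assert 50 <= row["height_cm"] <= 250
```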

<center>
    <img src="../images/testing/testing_data.png" width=1000>
</center>

<div class="group">
    <div class="text">
        
**Code tests**
        
Test whether your functions, classes, modules, or services do what you want them to.
    </div>
    <div class="text">
**Data tests**
        
Test whether your data is in the right format and your data values are correct.
    </div>
</div>    
         

### Testing in Data Science Example

<div class="group">
    <div class="text">
        
Let's say you want to impute NaN values with the mean:
        
We calculate the mean of a pandas Series and fill the NaN values with it using fillna().
    </div>
    <div class="images">
        <img src="../images/testing/img_p28_1.png">
    </div>
</div>


### Testing in Data Science Example

<div class="group">
    <div class="text">
        
We can now (or before) write a test that checks if the returned series looks as expected:
        
pandas has a built-in module called `testing`, which we can use to compare two pandas DataFrames or two
pandas Series.
    </div>
    <div class="images">
        <img src="../images/testing/img_p29_1.png">
    </div>
</div>
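
In code, the function and its test might look roughly like the sketch below. The function name `impute_with_mean` and the example values are assumptions, not necessarily what the slide images show; `fillna()`, `mean()`, and `pd.testing.assert_series_equal()` are the pandas calls described above.

```python
import pandas as pd


def impute_with_mean(series):
    """Replace NaN values with the mean of the non-missing values."""
    return series.fillna(series.mean())


def test_impute_with_mean_fills_nan():
    # GIVEN a series with one missing value
    given = pd.Series([1.0, float("nan"), 3.0])
    expected = pd.Series([1.0, 2.0, 3.0])

    # WHEN the missing value is imputed
    result = impute_with_mean(given)

    # THEN the gap is filled with the mean of the other values
    pd.testing.assert_series_equal(result, expected)
```

`assert_series_equal` raises an `AssertionError` with a description of the difference when the two Series do not match, which is more informative than comparing them with `==`.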

### Typical Edge Cases in Data Science

What are typical edge cases?

→ NaN and None as input values

→ 0 values and empty strings

→ Minimum and maximum values

→ Numbers that have special meaning in the function (e.g. constants)

→ Inputs of an unexpected type (e.g. int vs. float)

→ Negative numbers

... and all sorts of random combinations of edge cases together
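
A few of these edge cases, sketched as tests for a small numeric helper. The function `safe_log` and its rule of returning NaN for unusable input are assumptions made for this example; your own functions may need to raise an error instead.

```python
import math
import sys


def safe_log(x):
    """Natural logarithm that returns NaN for None, NaN, zero, and negative input."""
    if x is None or math.isnan(x) or x <= 0:
        return math.nan
    return math.log(x)


def test_safe_log_happy_case():
    assert safe_log(1) == 0.0


def test_safe_log_zero():
    assert math.isnan(safe_log(0))


def test_safe_log_negative_number():
    assert math.isnan(safe_log(-5))


def test_safe_log_nan_and_none():
    assert math.isnan(safe_log(math.nan))
    assert math.isnan(safe_log(None))


def test_safe_log_maximum_float():
    # The largest representable float should still give a finite result
    assert math.isfinite(safe_log(sys.float_info.max))
```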



Notes: Keep it simple. Every function should do only one thing: can you describe what it does in one sentence without using “AND” or “OR”? Use only a few input parameters (0-2) and simple inputs (no DataFrame with 1000 rows and 20 columns!). Keep the function code short: 1-10 lines is ideal. If your function ends up being longer, break it down into subfunctions and test them separately.

### Why and when would you use tests in your workflow?

<center>
    <img src="../images/testing/img_p31_1.png" width=800>
</center>