# Software engineering practices part II

In [1]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
from IPython.core.interactiveshell import InteractiveShell 
InteractiveShell.ast_node_interactivity = "all"

Adding robustness with:

- Testing 
- Logging
- Code reviews

# Testing 

Preparing code for an industry setting includes testing it. Two key terms:
- **Test driven development:** A development process where you write tests for tasks before writing the code to implement those tasks. 
- **Unit Test:** A type of test that covers a "unit" of code, usually a single function, independently from the rest of the program.

Resources: 

- [Blog Post](https://www.predictiveanalyticsworld.com/patimes/four-ways-data-science-goes-wrong-and-how-test-driven-data-analysis-can-help/6947/)
- Getting Started Testing: [Slide Deck](https://speakerdeck.com/pycon2014/getting-started-testing-by-ned-batchelder) and [Presentation Video](https://www.youtube.com/watch?v=FxSsnHeWQBY)

#### Unit Tests

- Test functions in a way that is repeatable and automated. 
- Ideally, we'd run a test program that runs all our unit tests and cleanly lets us know which ones failed and which ones succeeded. 
- Advantage: Unit tests are isolated from the rest of the program
- Note that passing unit tests isn't always enough to prove that our program is working successfully. 
- [Integration Testing](https://www.fullstackpython.com/integration-testing.html)

#### Unit Testing Tools 

- [pytest: Installation and getting started](https://docs.pytest.org/en/latest/getting-started.html)

> **Task:** Run `pytest` on the command line to collect and run the tests in all files whose names start with `test_`.

Structure of the tests: 

- Write a function that performs one specific task in one file (`compute_launch.py` here):
```python
def days_until_launch(current_day, launch_day):
    return launch_day - current_day
```
- Add a second file that defines a test function containing an `assert` statement to check the function:

```python
# import function to be tested
from compute_launch import days_until_launch

def test_days_until_launch_4():
    assert days_until_launch(22, 26) == 4
```

- Run `pytest` in the console.
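
The structure above extends naturally to several test functions in one test file. The following is a minimal sketch, assuming a hypothetical file named `test_compute_launch.py`; the extra test names and the chosen inputs are illustrative and not part of the original notes. The function is repeated inside the block only so that the example runs on its own.

```python
# Contents of a hypothetical file named test_compute_launch.py.
# The function is repeated here only so the sketch is self-contained;
# in the two-file layout it would be imported from compute_launch instead.
def days_until_launch(current_day, launch_day):
    return launch_day - current_day


def test_days_until_launch_4():
    assert days_until_launch(22, 26) == 4


def test_days_until_launch_same_day():
    assert days_until_launch(26, 26) == 0


def test_days_until_launch_after_launch():
    # With the implementation above, a launch day in the past gives a negative number.
    assert days_until_launch(27, 26) == -1
```

Each test checks one scenario, so a failure report points directly at the case that broke. Whether a negative result is the desired behavior after the launch day is a design decision; writing the test makes that decision explicit.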

---


# Quiz: Unit test

In [2]:
### Workflow example ###

# valid email
def email_validator(email):
    if email.count("@") != 0 and email.count(".") != 0:
        return True
    else:
        return False
    
# start testing
print(email_validator("tmeiendresch@gmx.de"))
print(email_validator("tmeiendresch@gmx"))
print(email_validator("tmei@endresch@gmx..de")) # <- wait!

# ...Correct function to account for this error...

# valid email v2
def email_validator(email):
    if email.count("@") == 1 and email.count(".") == 1:
        return True
    else:
        return False
    
# start testing v2
print(email_validator("tmeiendresch@gmx.de"))
print(email_validator("tmeiendresch@gmx"))
print(email_validator("tmei@endresch@gmx..de")) # <- now rejected

# ... and so forth.

True
False
True
True
False
False


In [33]:
def email_validator(email):
    if email.count("@") == 1 and email.count(".") == 1:
        return True
    else:
        return False

def test_email_validator():
    # One assert per scenario: a valid address, a missing dot, and duplicated symbols
    assert email_validator("tmeiendresch@gmx.de")
    assert not email_validator("tmeiendresch@gmx")
    assert not email_validator("tmei@endresch@gmx..de")
    print("Ok")

test_email_validator()

Ok


# Test Driven Development and Data Science

**Test Driven Development:** Writing tests before you write the code that's being tested. Your test would fail at first, and you'll know you've finished implementing a task when this test passes. 
- Tests ought to check for all the different scenarios and edge cases you can think of, before you even start to write your function.
- Run the tests to get immediate feedback.
- When refactoring or adding to your code, tests help you rest assured that the rest of your code didn't break while you were making those changes. Tests also help ensure that your function's behavior is repeatable, regardless of external parameters such as hardware and time.
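
The workflow described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical data-cleaning function `fill_missing_with_mean`; the function name, its behavior, and the test cases are assumptions made for this example, not part of the original notes. The tests are written first and describe the edge cases; the implementation follows and is done once all tests pass.

```python
# Step 1: write the tests first. Until the function below exists and is
# correct, calling any of these tests fails.
def test_no_missing_values_unchanged():
    assert fill_missing_with_mean([1.0, 2.0, 3.0]) == [1.0, 2.0, 3.0]


def test_missing_value_replaced_by_mean():
    assert fill_missing_with_mean([1.0, None, 3.0]) == [1.0, 2.0, 3.0]


def test_all_missing_raises():
    try:
        fill_missing_with_mean([None, None])
    except ValueError:
        return
    raise AssertionError("expected a ValueError when every value is missing")


# Step 2: implement the (hypothetical) function until the tests pass.
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known entries."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no known values to compute a mean from")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


# Step 3: run the tests (pytest would collect them automatically from a test_ file).
test_no_missing_values_unchanged()
test_missing_value_replaced_by_mean()
test_all_missing_raises()
```

Deciding up front what should happen when every value is missing is the kind of edge case that is easy to overlook when the function is written before its tests.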

Resources: 

- [Data Science TDD](https://www.linkedin.com/pulse/data-science-test-driven-development-sam-savage/)
- [TDD for Data Science](http://engineering.pivotal.io/post/test-driven-development-for-data-science/)
- [TDD is Essential for Good Data Science Here's Why](https://medium.com/@karijdempsey/test-driven-development-is-essential-for-good-data-science-heres-why-db7975a03a44)
- [Testing your code](http://docs.python-guide.org/en/latest/writing/tests/)

# Logging

> Logging is the process of recording messages to describe events that have occurred while running your software.

Logging is valuable for understanding the events that occur while running your program. For example, if you run your model overnight and see that it's producing ridiculous results the next day, log messages can help you understand the context in which this occurred. Let's learn about the qualities that make a log message effective.



## Log Messages

Tip: Be professional and clear

- Bad: `Hmmm... this isn't working???`
- Bad: `idk.... :(`
- Good: `Couldn't parse file.`

---

Tip: Be concise and use normal capitalization

- Bad: `Start Product Recommendation Process`
- Bad: `We have completed the steps necessary and will now proceed with the recommendation process for the records in our product database.`
- Good: `Generating product recommendations.`

---

Tip: Choose the appropriate level for logging

- *DEBUG*: the level you would use for anything that happens in the program.
- *ERROR*: the level to record any error that occurs.
- *INFO*: the level to record all actions that are user-driven or system-specific, such as regularly scheduled operations.
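
The three levels above map onto Python's standard `logging` module. The following is a minimal sketch, assuming a hypothetical logger name `recommender` and a hypothetical function `parse_prices`; neither appears in the original notes. With the threshold set to `INFO`, the `DEBUG` messages are suppressed while `INFO` and `ERROR` messages are emitted.

```python
import logging

# Emit INFO and above; DEBUG messages are dropped at this threshold.
logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s:%(message)s")
logger = logging.getLogger("recommender")  # hypothetical logger name


def parse_prices(raw_prices):
    """Convert strings to floats, logging each event at a fitting level."""
    # INFO: a user-driven or scheduled action has started.
    logger.info("Parsing prices.")
    prices = []
    for raw in raw_prices:
        # DEBUG: fine-grained detail about anything that happens.
        logger.debug("Parsing value %r", raw)
        try:
            prices.append(float(raw))
        except ValueError:
            # ERROR: something went wrong and should be recorded.
            logger.error("Couldn't parse price: %r", raw)
    return prices


prices = parse_prices(["9.99", "abc", "4.50"])
```

Lowering the threshold to `logging.DEBUG` during development shows the per-value messages without changing any of the logging calls.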

---

Tip: Provide any useful information

- Bad: `Failed to read location data`
- Good: `Failed to read location data: store_id 8324971`
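
One way to include such details is to pass them as arguments to the logging call. This is a minimal sketch, assuming a hypothetical function `read_location_data` that looks up a store in a dictionary; the function, the `locations` mapping, and the logger name are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("locations")  # hypothetical logger name


def read_location_data(store_id, locations):
    """Return the location for store_id, or None if it cannot be read."""
    try:
        return locations[store_id]
    except KeyError:
        # Include the identifier so the failing record can be found later.
        logger.error("Failed to read location data: store_id %s", store_id)
        return None


location = read_location_data(8324971, {1001: "Berlin"})
```

Passing `store_id` as an argument rather than building the string beforehand lets the `logging` module skip the formatting work when the message is filtered out.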