<div class="alert block alert-info alert">

# <center> Scientific Programming in Python
## <center>Karl N. Kirschner<br>Bonn-Rhein-Sieg University of Applied Sciences<br>Sankt Augustin, Germany

# <center> Scientific Programming Practices

<hr style="border:2px solid gray"></hr>


<b>Primary source</b>: [Wilson, G., Aruliah, D.A., Brown, C.T., Hong, N.P.C., Davis, M., Guy, R.T., Haddock, S.H., Huff, K.D., Mitchell, I.M., Plumbley, M.D. and Waugh, B., 2014. Best practices for scientific computing. PLoS biology, 12, 1-7.](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745)

<hr style="border:2px solid gray"></hr>

"Computers are now essential in all branches of science, but most researchers are never taught the equivalent of basic lab skills for research computing. As a result, they ... are often unable to reproduce their own work ... and have no idea how reliable their computational results are." 

Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. "Good enough practices in scientific computing." PLoS computational biology 13, no. 6 (2017).

<hr style="border:2px solid gray"></hr>

# The 7 practices for programming within research settings

<a id='best_practice_people'></a>
## 1. Write programs for <b>people</b>, not computers

<b>Why</b>

1. The reader will only hold a few facts in memory at a time.

<b>How</b>

1. Break the program up into easily understandable chunks (i.e., functions).
2. Use names that are consistent, distinctive and meaningful (e.g., "density" vs. "d").
3. Use a consistent coding style
    - <b>PascalCaseNaming</b>
    - <b>camelCaseNaming</b>
    - <b>snake/pothole_case_naming</b>
    - <b>kebab-case-naming</b>
4. <b>PEP8</b>: "Function names should be lowercase, with words separated by underscores as necessary to improve readability."<br>
    (i.e., <b>pothole_case_naming</b>)

<b>Example</b>

Poorly done:

In [None]:
print('Input velocity:')
l = 299.7e6
v = input()
print(f'{v} m/s is {(float(v)/l)*100:0.1e}%')

Improved upon:

In [None]:
speed_of_light = 2.99792458e8 # m/s

print('What velocity are you interested in (units: m/s)?')
input_velocity = input()

percentage = (float(input_velocity)/speed_of_light)*100

print(f'The input velocity of {input_velocity} m/s is {percentage:0.1e}% of the speed of light.')
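The first "How" point (breaking a program into functions) can be applied to the same calculation. The following is a minimal sketch rather than a definitive version: the function name `percent_of_light_speed`, the upper-case constant name and the example velocities are assumptions made for illustration.

```python
SPEED_OF_LIGHT = 2.99792458e8  # m/s


def percent_of_light_speed(velocity: float) -> float:
    """Return a velocity (in m/s) as a percentage of the speed of light."""
    return (velocity / SPEED_OF_LIGHT) * 100


# Hypothetical example velocities (m/s), chosen only for illustration
for velocity in [3.0e2, 7.8e3, 2.99792458e8]:
    percentage = percent_of_light_speed(velocity)
    print(f'{velocity:0.1e} m/s is {percentage:0.1e}% of the speed of light.')
```

Keeping the calculation in one named function means a reader only has to hold one idea in mind: what goes in, and what comes out.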

<hr style="border:2px solid gray"></hr>

<a id='best_practice_work'></a>
## 2. Have the computer do the work

<b>Why</b>

1. User repetition eventually results in errors, even with those who are careful

<b>How</b>

1. Create code that does the repeating element

2. Create code that allows sequential workflow usage and modulation

3. Save commands to file for future use

4. Ensure reproducibility
    - Everything needed to re-create the output should be clear
    - Standardize the output
    - Version control, even as simple as using a numbering system (e.g., v.0.1)

<b>Example</b>: Given the lengths and widths of two rectangles, compute each area and their total area. Print out all areas.

In [None]:
## Poor Example
area_total = 4.0*3.0 + 0.5*1.5

print(f'Rectangle Area 1: {4.0*3.0}')
print(f'Rectangle Area 2: {0.5*2.5}')
print(f'Total Area: {area_total}')

Why is it poorly done?
1. There is no single statement that will calculate the area of 1 rectangle
2. Not very modular, reusable, or very readable
3. Prone to human error - e.g., the second `print` statement computes 0.5*<font color='red'>2.5</font> instead of 0.5*1.5
4. Not logically planned out - e.g., total area computed first, and then individual areas

In [None]:
## Better Example
def rectangle_area(length: float, width: float) -> float:
    rect_area = length*width
    
    return rect_area


area_1 = rectangle_area(length=4.0, width=3.0)
area_2 = rectangle_area(length=0.5, width=1.5)

area_total = area_1 + area_2

print(f'Rectangle Area 1: {area_1}\n'
      f'Rectangle Area 2: {area_2}\n'
      f'Total Area:       {area_total}')

Why is this better?
1. A function that does <b> one thing </b> (i.e., isolates a single idea) that can be called multiple times (i.e., reusable)
2. Less error-prone, because the calculation is written only once (inside the function)
3. Easy to debug and modify
4. Better logical construction

<hr style="border:2px solid gray"></hr>

<a id='best_practice_concise'></a>
## 3. Don't repeat yourself, or others

<b>Why</b>
1. Repeating code makes things harder to maintain and increases the chances of introducing errors

<b>How</b>
1. DRY - "Don't Repeat Yourself," applying this to both code and data
    - One code representation per entity.
    
          Example 1: physical constants should be defined once
          
          Example 2: input raw data should be assigned to a single variable (i.e., not duplicated)
            
2. Modularize your code (e.g., user-defined functions)


3. Use reliable libraries made by others (Python is very good at this) - don't reinvent the wheel
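The three points above can be sketched together in a few lines. This is an illustrative example only: the radii are hypothetical data, and `math.pi` and `statistics.mean` (both from Python's standard library) stand in for "reliable libraries made by others."

```python
import math
import statistics

# Raw input data assigned once to a single variable (hypothetical radii, in cm)
radii_cm = [1.0, 2.0, 2.5, 3.5]


def circle_area(radius: float) -> float:
    """Return the area of a circle; math.pi is the single source for pi."""
    return math.pi * radius**2


# One modular function is reused for every data point
areas_cm2 = [circle_area(radius) for radius in radii_cm]

# Reuse a well-tested library function rather than writing a new averaging routine
mean_area_cm2 = statistics.mean(areas_cm2)

print(f'Areas (cm^2): {[round(area, 2) for area in areas_cm2]}')
print(f'Mean area (cm^2): {mean_area_cm2:0.2f}')
```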

<hr style="border:2px solid gray"></hr>

<a id='best_practice_incremental'></a>
## 4. Make small incremental (sequential) changes

<b>Why</b>
1. Typically, in scientific programming intended for research, the <b>result is not initially known</b> since each step depends on the previous one.

1. In research-related work, there are often <b>no preset requirements</b> (e.g., from a company). This gives one more <b>flexibility and creativity</b> in the programming and approach.

<b>How</b>
1. Work in small steps (e.g., <b>something doable in 1 hour - increased focus</b>)

1. Work on steps that are sequential (i.e., a <b>logically connected workflow</b>)

1. Have <b>frequent discussions and course corrections</b> (with those who will use the program)

1. Use a <b>version control system</b> (i.e., git) - also ensures reproducibility (i.e., important for science)

1. Use <b>unit tests</b> and <b>internal checks</b> - this helps to control errors and directs your logical thinking

1. Making <b>small changes</b> (especially with unit tests) helps to quickly <b>isolate errors</b>

<b>Take-Home Message</b>
1. Ensures a more logical construction of the code and ideas (i.e., sequential and focused workflows)
    
1. Reduces errors (i.e., highly focused upon a single concept at a time)
    
1. Generates more flexible code (e.g., due to different users communicating different desires)
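A minimal sketch of working in small, sequential steps with an internal check after each one is given below. The function `disk_mass` and its default areal density are hypothetical and serve only to show a second step that builds on a verified first step.

```python
from math import isclose, pi


# Step 1: write one small piece and check it against a known case.
def circle_area(radius: float) -> float:
    """Return the area of a circle."""
    return pi * radius**2


assert isclose(circle_area(1.0), pi), 'Step 1 failed: unit circle area should be pi'


# Step 2: build on the verified step.
# The default areal density is a hypothetical value used only for illustration.
def disk_mass(radius: float, areal_density: float = 0.3) -> float:
    """Return the mass of a flat disk from its radius and its mass per unit area."""
    return circle_area(radius) * areal_density


assert isclose(disk_mass(1.0, areal_density=2.0), 2.0 * pi), 'Step 2 failed'

print(f'Disk mass for a radius of 2.0: {disk_mass(2.0):0.2f}')
```

If the second check fails, the error must lie in the few lines added since the first check passed, which is what makes small changes quick to debug.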

<!-- print(f'Area of circle with radius of 1.0 cm = {3.14*1.0*1.0} cm^2')
print(f'Area of circle with radius of 2.0 cm = {3.14*2.0*2.0} cm^2')
print(f'Area of circle with radius of 2.5 cm= {3.14*2.5*2.5} cm^2')
print(f'Area of circle with radius of 3.5 cm= {3.14*3.5*3.5} cm^2')
print()
print(f'The approximate weight for a tire with an area {3.14*1.0*1.0} cm^2 is {3.14*1.0*1.0*0.3} kg')
print(f'The approximate weight for a tire with an area {3.14*1.0*1.0} cm^2 is {3.14*2.0*2.0*0.3} kg.')
print(f'The approximate weight for a tire with an area {3.14*1.0*1.0} cm^2 is {3.14*2.5*2.5*0.3} kg.')
print(f'The approximate weight for a tire with an area {3.14*1.0*1.0} cm^2 is {3.14*3.5*3.5*0.3} kg') -->

<hr style="border:2px solid gray"></hr>

<a id='best_practice_document'></a>
## 5. Document the design and purpose (not the mechanics)

<b>Why</b>
1. Helps people understand the code - context
    - e.g. "This function computes ..."

2. Helps to maintain continuity (e.g. long-term projects)

<b>How</b>

1. Embed documentation in the code (helps with longevity and changes in people)
    1. <b>docstrings</b> (a.k.a. block quotes) (i.e. text within triple quotes: `'''` or `"""`)
        - Usage:
            - Python3 programs (e.g. <b>my_prog.py</b>): providing context throughout the code
            - Jupyter-notebooks' code cells: when providing context for a user-defined function

        - Results in
            - allowing others to get instructions for usage via (see code cells below)
                - `my_function.__doc__`, or
                - `help(my_function)`

        - Focus upon
            - what the code's purpose/goals are,
            - what input is required (e.g. passed objects/variables), and
            - what output is given

        - However, don't be redundant
            - the code itself should be readable and speakable, thus lessening the need for extensive documentation

        - Documentation generator that can read the code and make a manual (e.g. sphinx-doc: https://www.sphinx-doc.org/en/master/)

    2. <b>In-line comments</b>
        - Usage:
            - Python3 program and Jupyter-notebook code cells

        - If the code is written well (concise, logical, readable), you should not need many in-line comments to explain it.

            <b>Example for not needed</b>:<br>
            `if input <= threshold: #if input is less than or equal to a threshold then do` is readable, and thus this line does not need to be further explained by the in-line comment.

            <b>Example for when one is needed</b>:<br>
            `energy_total = 0.3*alpha + 0.7*beta # The weighting factors come from reference 3.`<br>
            This in-line comment clearly states the published source of the used weighting factors.  

        - Use if you think your very local coding idea (i.e. given on a single line) might be unclear, or if an unaddressed assumption/approximation needs to be explained.

2. Document your thought process and gained insights, and cite sources of information (i.e. existing knowledge)

    1. <b>markdown cells</b>
        - Usage:
            - Jupyter-notebooks: when providing context for a specific project, or to pass on information to others who might use or look at your notebook.

        - Consider markdown usage to be like the communication that occurs in traditional academic lab notebooks.
        
            <b>Example Statement within a Markdown Cell</b>:<br>
            "The above plot shows several significant outliers. Thus, we extracted the outliers and correlated their values to their input features using sklearn's function (website). This led to the identification of the following new categories: ... "

<b>Example</b>:

In [None]:
def rectangle_area(length, width):
    """ Computes a rectangle's area using its length and width.
    
        Args:    length    - the length of the first edge
                 width     - the length of the second edge
                             (must be at a right angle to the first edge)
        Return:  rect_area - the calculated area of a rectangle
    """

    rect_area = length*width

    return rect_area

In [None]:
print(rectangle_area.__doc__)

In [None]:
help(rectangle_area)
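Docstrings and in-line comments can be combined in one function, each doing a different job. The sketch below builds on the weighted-energy line quoted earlier; the function name `total_energy` is hypothetical, and the weights are placeholders that do not come from any actual reference.

```python
def total_energy(alpha: float, beta: float) -> float:
    """Combine two energy contributions into a weighted total.

    Args:
        alpha: first energy contribution
        beta: second energy contribution

    Returns:
        The weighted sum of the two contributions.
    """
    # Hypothetical weights; a real in-line comment would cite their published source.
    energy_total = 0.3*alpha + 0.7*beta

    return energy_total


help(total_energy)
print(total_energy(alpha=10.0, beta=20.0))
```

The docstring states the purpose, inputs and output, while the single in-line comment covers the one thing the code cannot say for itself: where the numbers come from.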

<hr style="border:2px solid gray"></hr>

<a id='best_practice_collaborate'></a>
## 6. Collaborate

<b>Why</b>
1. This is similar to "Peer Review", where other people
    - read your code,
    - use your code in unexpected ways,
    - bring their perspective to your ideas, and
    - provide quality control and feedback.

<b>How</b>
1. Have a central location for the code (e.g., GitHub, Dropbox)
    
2. Sit down and co-code together (or through online servers, e.g., Discord)
    
3. Give it to others and have them run the code

4. Give credit/acknowledgment to everyone who contributed to the ideas and writing

<hr style="border:2px solid gray"></hr>

<a id='best_practice_plan'></a>
## 7. Plan for Mistakes

<b>Why</b> Mistakes will be made, both by you and by the program's users.

<b>How</b>

1. Use `isinstance` statements<br><br>

2. Use `assert` statements (if True ...) to check the program's operation <b>while developing code</b>
    - They stop the program if something is wrong
    - Think of them as executable documentation (i.e. explains what is going on within the code)
    - (However, asserts do have some issues with their usage --- next lecture)<br><br>
    
3. Take a divide-and-conquer approach to coding: simplify the code and problems
    - e.g., user-defined functions<br><br>

4. Turn bugs into test cases (e.g., in a unit test framework)<br><br>
        
5. (Unit tests - automated tests on a single "unit" of code)<br><br>
    - unittest (built-in library)- https://docs.python.org/3/library/unittest.html
    - pytest - https://docs.pytest.org

6. (Integration tests - test if units of code work together)<br><br>
        
7. (Regression tests - the program's behavior doesn't change when the program's details are modified)
    - e.g., the output data is presented and remains the same<br><br>
            
8. (Try-except statements - handle exceptions; https://docs.python.org/3/tutorial/errors.html)<br><br>    
            
<b>Note 1</b>: Tests are often done to see if the code's output matches the researcher's expectations. That means you must have a good understanding of the problem.
    
<b>Note 2</b>: Tests can often initially be done on simpler systems<br><br>

<b>Example</b>: Add an `isinstance` statement. Alternatively, add an `assert` statement to the `circle_area` function (see point #2 above), and then supply the function with a negative radius. 

In [None]:
from math import pi


def circle_area(radius_input: float) -> float:
    ''' Compute the area of a circle

        Input:
            radius_input: radius of a circle (must be a float)
        Return:
            area: area of a circle
    '''
    if not isinstance(radius_input, float):
        raise TypeError(f'The radius_input value (i.e. {radius_input}) was not a float.')

    #assert (radius_input >= 0), f'The radius value (i.e. {radius_input}) must be non-negative.'

    # Use a local name that does not shadow the function's own name
    area = pi*(radius_input**2)

    return area


radii = [1.0, 2.0, -2.5, 3.0]

for radius in radii:
    area = circle_area(radius_input=radius)
    print(f'Area of circle with radius of {radius:0.1f}: {area:0.1e}')

In [None]:
circle_area(radius_input=5)
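The call above stops with a `TypeError`. Point #8 (try-except statements) offers one way to handle such exceptions without halting a whole workflow. The following is a sketch under stated assumptions: the `ValueError` check for negative radii is swapped in here as an alternative to the commented-out `assert`, and the input list is hypothetical.

```python
from math import pi


def circle_area(radius_input: float) -> float:
    """Return the area of a circle; reject non-float or negative radii."""
    if not isinstance(radius_input, float):
        raise TypeError(f'The radius_input value (i.e. {radius_input}) was not a float.')
    # Assumption: a ValueError replaces the assert for the negative-radius case
    if radius_input < 0:
        raise ValueError(f'The radius_input value (i.e. {radius_input}) must be non-negative.')

    return pi*(radius_input**2)


# Hypothetical mixed-quality input: a valid float, a negative value and an integer
radii = [1.0, -2.5, 5]
areas = {}

for radius in radii:
    try:
        areas[radius] = circle_area(radius_input=radius)
    except (TypeError, ValueError) as error:
        # Report the problem and continue with the remaining values
        print(f'Skipping {radius!r}: {error}')

print(areas)
```

Catching only the specific exceptions that are expected keeps unexpected errors visible instead of silently hiding them.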

Unit tests require a bit more knowledge to understand their structure (e.g., class, self), but below is a very simple starting point for those who want to explore incorporating them into their Jupyter notebooks.

In [None]:
import unittest

In [None]:
class Testing(unittest.TestCase):
    def test_string(self):
        a = 'some'
        b = 'some'
        self.assertEqual(a, b)

    def test_boolean(self):
        a = True
        b = True
        self.assertEqual(a, b)


# execute Testing class (and its functions) when the cell is executed
unittest.main(argv=[''], verbosity=2, exit=False)

Applying this to our `circle_area` function above:

In [None]:
class Testing(unittest.TestCase):
    def test_area_correct(self):
        area = circle_area(radius_input=1.0)
        target_area = 3.141592653589793
        
        self.assertEqual(area, target_area)

    def test_area_incorrect(self):
        area = circle_area(radius_input=1.0)
        target_area = 3.1
        
        self.assertEqual(area, target_area)


unittest.main(argv=[''], verbosity=2, exit=False)
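Two refinements may be worth exploring, shown here as a sketch rather than a recommended final design: `assertAlmostEqual` compares floats within a tolerance, and `assertRaises` turns a known failure (the integer radius from earlier) into a test case. The class name `TestCircleArea` is hypothetical, and `circle_area` is redefined so that the cell is self-contained.

```python
import unittest
from math import pi


def circle_area(radius_input: float) -> float:
    """Return the area of a circle (same type check as the function above)."""
    if not isinstance(radius_input, float):
        raise TypeError(f'The radius_input value (i.e. {radius_input}) was not a float.')

    return pi*(radius_input**2)


class TestCircleArea(unittest.TestCase):
    def test_unit_circle(self):
        # assertAlmostEqual tolerates small floating-point round-off
        self.assertAlmostEqual(circle_area(radius_input=1.0), 3.14159265, places=7)

    def test_integer_radius_raises(self):
        # The test passes only if the expected exception is raised
        with self.assertRaises(TypeError):
            circle_area(radius_input=5)


# Load and run only this test class (works in a notebook cell or a script)
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCircleArea)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Loading the test class explicitly avoids re-running other test classes that were defined earlier in the same notebook session.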