# Introduction to Python, Test Driven Development, and Koans

## 2.1 Python

The following has been extracted from [J.R. Johansson's](https://github.com/jrjohansson/scientific-python-lectures) wonderful Python tutorial series.

### The role of computing in science

Science has traditionally been divided into experimental and theoretical disciplines, but during the last several decades computing has emerged as a very important part of science. Scientific computing is often closely related to theory, but it also has many characteristics in common with experimental work. It is therefore often viewed as a new third branch of science. In most fields of science, computational work is an important complement to both experiments and theory, and nowadays a vast majority of both experimental and theoretical papers involve some numerical calculations, simulations or computer modeling.

In experimental and theoretical sciences there are well-established codes of conduct for how results and methods are published and made available to other scientists. For example, in theoretical sciences, derivations, proofs and other results are published in full detail, or made available upon request. Likewise, in experimental sciences, the methods used and the results are published, and all experimental data should be available upon request. It is considered unscientific to withhold crucial details of a theoretical proof or experimental method in a way that would hinder other scientists from replicating and reproducing the results.

In computational sciences there are not yet any well-established guidelines for how source code and generated data should be handled. For example, it is relatively rare that the source code used in simulations for published papers is provided to readers, in contrast to the open nature of experimental and theoretical work. It is also not uncommon for the source code of simulation software to be withheld and considered a competitive advantage (or unnecessary to publish).
However, this issue has recently started to attract increasing attention, and a number of editorials in high-profile journals have called for increased openness in computational sciences. Some prestigious journals, including Science, have even started to require authors to provide readers, upon request, with the source code for simulation software used in publications.

Discussions are also ongoing on how to facilitate distribution of scientific software, for example as supplementary materials to scientific papers.

### References
* [Reproducible Research in Computational Science](http://dx.doi.org/10.1126/science.1213847), Roger D. Peng, Science 334, 1226 (2011).
* [Shining Light into Black Boxes](http://dx.doi.org/10.1126/science.1218263), A. Morin et al., Science 336, 159-160 (2012).
* [The case for open computer programs](http://dx.doi.org/doi:10.1038/nature10836), D.C. Ince, Nature 482, 485 (2012).

### Requirements on scientific computing


*Replication* and *reproducibility* are two of the cornerstones of the scientific method. With respect to numerical work, complying with these concepts has the following practical implications:

* Replication: An author of a scientific paper that involves numerical calculations should be able to rerun the simulations and replicate the results upon request. Other scientists should also be able to perform the same calculations and obtain the same results, given the information about the methods used in a publication.

* Reproducibility: The results obtained from numerical simulations should be reproducible with an independent implementation of the method, or using a different method altogether.

In summary: A sound scientific result should be reproducible, and a sound scientific study should be replicable.

To achieve these goals, we need to:
* Keep a record of exactly which source code, and which version of it, was used to produce the data and figures in published papers.
* Record which versions of external software were used.
* Keep access to the environment that was used.
* Make sure that old code and notes are backed up and kept for future reference.
* Be ready to give additional information about the methods used, and perhaps also the simulation code, to an interested reader who requests it (even years after the paper was published!).
* Ideally, code should be published online, to make it easier for other interested scientists to access it.

### What is Python?

[Python](https://www.python.org) is a modern, general-purpose, object-oriented, high-level programming language.

General characteristics of Python:

* **clean and simple language**: Easy-to-read and intuitive code, easy-to-learn minimalistic syntax, maintainability scales well with size of projects.
* **expressive language**: Fewer lines of code, fewer bugs, easier to maintain.

Technical details:

* **dynamically typed**: No need to define the type of variables, function arguments or return types.
* **automatic memory management**: No need to explicitly allocate and deallocate memory for variables and data arrays. No memory leak bugs.
* **interpreted**: No need to compile the code. The Python interpreter reads and executes the python code directly.
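
As a minimal sketch of what dynamic typing means in practice, the snippet below rebinds one name to objects of different types and defines a function without any type declarations. The names `value` and `describe` are illustrative only and are not part of the original tutorial.

```python
# The same name can be bound to objects of different types over time;
# no type declarations are needed.
value = 42
first_type = type(value).__name__      # name of the type currently bound

value = "forty-two"
second_type = type(value).__name__     # the type follows the object, not the name


def describe(item):
    """Return a short description of any object, whatever its type."""
    return "{!r} is of type {}".format(item, type(item).__name__)


print(first_type, second_type)
print(describe(3.14))
print(describe([1, 2, 3]))
```

The interpreter checks types when the code runs, which is convenient for exploration but means that type errors surface only at execution time.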

Advantages:

* The main advantage is ease of programming, minimizing the time required to develop, debug and maintain the code.
* Well-designed language that encourages many good programming practices:
    * Modular and object-oriented programming, good system for packaging and re-use of code. This often results in more transparent, maintainable and bug-free code.
    * Documentation tightly integrated with the code.
* A large standard library, and a large collection of add-on packages.

Disadvantages:

* Since Python is an interpreted and dynamically typed programming language, the execution of python code can be slow compared to compiled statically typed programming languages, such as C and Fortran.
* Somewhat decentralized, with different environments, packages and documentation spread out in different places. This can make it harder to get started.

### What makes Python suitable for scientific computing?

<img src="images/optimizing-what.png" />

* Python has a strong position in scientific computing:
    * Large community of users, easy to find help and documentation.
* Extensive ecosystem of scientific libraries and environments
    * numpy: http://numpy.scipy.org - Numerical Python
    * scipy: http://www.scipy.org - Scientific Python
    * matplotlib: http://www.matplotlib.org - graphics library
* Great performance due to close integration with time-tested and highly optimized codes written in C and Fortran:
    * blas, atlas blas, lapack, arpack, Intel MKL, ...
* Good support for
    * Parallel processing with processes and threads
    * Interprocess communication (MPI)
    * GPU computing (OpenCL and CUDA)
* Readily available and suitable for use on high-performance computing clusters.
* No license costs, no unnecessary use of research budget.

### The scientific Python software stack

<img src="images/scientific-python-stack.png"/>

### Python Environments


Python is not only a programming language, but often also refers to the standard implementation of the interpreter (technically referred to as CPython) that actually runs the python code on a computer.

There are also many different environments through which the python interpreter can be used. Each environment has different advantages and is suitable for different workflows. One strength of python is that it is versatile and can be used in complementary ways, but it can be confusing for beginners so we will start with a brief survey of python environments that are useful for scientific computing.

### Python Interpreter


The standard way to use the Python programming language is to use the Python interpreter to run Python code. The Python interpreter is a program that reads and executes the Python code in files passed to it as arguments. At the command prompt, the command `python` is used to invoke the Python interpreter.

For example, to run a file `my-program.py` that contains Python code from the command prompt, use:

    $ python my-program.py

We can also start the interpreter by simply typing `python` at the command line, and interactively type Python code into the interpreter.
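
For illustration, here is a sketch of what a small script file such as `my-program.py` might contain. The contents are hypothetical (the original does not show the file); the `main` function and its message are assumptions made for this example.

```python
# Hypothetical contents of my-program.py
import sys


def main(argv):
    """Print a greeting and report how many extra arguments were given."""
    extra = argv[1:]  # argv[0] is the script name itself
    message = "Hello from Python; received {} argument(s)".format(len(extra))
    print(message)
    return message


# This guard runs main() when the file is executed by the interpreter,
# but not when its contents are imported from elsewhere.
if __name__ == "__main__":
    main(sys.argv)
```

Running `python my-program.py a b` would then pass `a` and `b` to the script as extra arguments.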

<img src="images/python_terminal.png" />

### IPython

IPython is an interactive shell that addresses the limitations of the standard Python interpreter, and it is a workhorse for scientific use of Python. It provides an interactive prompt to the Python interpreter with greatly improved user-friendliness.

<img src="images/ipython_terminal.png" />

Some of the many useful features of IPython include:

* Command history, which can be browsed with the up and down arrows on the keyboard.
* Tab auto-completion.
* In-line editing of code.
* Object introspection, and automatic extraction of documentation strings from Python objects like classes and functions.
* Good interaction with the operating system shell.
* Support for multiple parallel back-end processes that can run on computing clusters or cloud services like Amazon EC2.
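
The introspection that IPython offers builds on information that every Python object already carries. The sketch below uses only the standard library's `inspect` module to retrieve a docstring and a call signature; the `area` function is a made-up example. In an IPython session, typing `area?` displays similar information.

```python
import inspect


def area(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height


# Extract the documentation string and the call signature from the object.
doc = inspect.getdoc(area)
signature = str(inspect.signature(area))

print(doc)
print("area" + signature)
```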

### IPython (Jupyter) Notebooks


The [IPython / Jupyter](http://ipython.org) notebook is an HTML-based notebook environment for Python, similar to Mathematica or Maple. It is based on the IPython shell, but provides a cell-based environment with great interactivity, where calculations can be organized and documented in a structured way.

<img src="images/ipython-notebook-screenshot.jpg" />

Although they use a web browser as the graphical interface, IPython notebooks are usually run locally, on the same computer that runs the browser. To start a new IPython notebook session, run the following command:

    $ ipython notebook
    
or

    $ jupyter notebook

from a directory where you want the notebooks to be stored. This will open a new browser window (or a new tab in an existing window) with an index page where existing notebooks are shown and from which new notebooks can be created.

---------------

### It is all about `print`...
<img src="images/python-2-vs-python-31.jpg" />

It is not actually all about the difference between `print` and `print()`, but that might be the change that most casual users will encounter.  Python made a number of changes in the transition from version 2 to version 3.  From my perspective, the former is an old stalwart that has performed well for quite a while, and the latter is a new kid that brings a lot of features that you only know to miss once you have used them.

In the past, a frequent argument focused on libraries from the scientific computing stack only being Python 2 compatible.  This is no longer true. 
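
To make the `print` difference concrete, the sketch below assumes it is run under Python 3. It shows that `print` is an ordinary function that accepts keyword arguments, and that the Python 2 statement form is rejected as a syntax error. The variable names are illustrative.

```python
import io

# In Python 3, print is a function, so it accepts keyword arguments
# such as sep, end and file.
buffer = io.StringIO()
print("a", "b", "c", sep="-", end="!\n", file=buffer)
output = buffer.getvalue()
print(output, end="")

# The Python 2 statement form (print "hello") no longer parses.
# compile() lets us check that without crashing this script.
try:
    compile('print "hello"', "<python2-style>", "exec")
    python2_style_accepted = True
except SyntaxError:
    python2_style_accepted = False

print("Python 2 style accepted:", python2_style_accepted)
```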

### References
* [Picking a Python Version: A Manifesto](https://pythonizame.s3.amazonaws.com/media/Book/picking-python-version-manifesto/file/50d2ac5e-6d4b-11e5-964d-04015fb6ba01.pdf) Mertz, David (2015).
* [What should I learn as a Beginner: Python 2 or Python 3](http://learntocodewith.me/programming/python/python-2-vs-python-3/) Learn to Code with Me (2014). - (Make sure to check the update.)

---------

## 2.2 Test Driven Development

<img src="images/rgr.gif" />
Image from: http://labs.nintex.com/wp-content/uploads/2015/03/tdd_flow.gif

In quite broad strokes, Test Driven Development (TDD) is a development process that focuses on a short development cycle: a test is written first, that test understandably fails (the code has not been written yet), and then the code necessary to make the test pass is written.  Then the code can be refactored, perhaps for generalization or to improve performance.  I take the view that TDD is not a mechanism to ensure that the entirety of your code is bug free.  That is a laudable goal, but one that, I would argue, can result in longer development times and wasteful tests.  I take a more pragmatic approach.  What TDD does provide is a means to ensure that you, the developer, understand what the code is supposed to do.  Writing unit and functional tests provides the opportunity to think about requirements and design.
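
The cycle described above can be sketched with the standard library's `unittest` module. This is only an illustration under assumed requirements: the function `mean` and the test class `TestMean` are hypothetical and were chosen for this example.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Green step: the simplest code that makes the tests below pass.
    # A later refactor step could change this body while the tests
    # keep guarding the expected behavior.
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    # Red step: in TDD these tests are written before mean() exists,
    # so the first run fails.
    def test_mean_of_integers(self):
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_mean_of_single_value(self):
        self.assertEqual(mean([5]), 5)


# Run the tests and keep the result object instead of exiting.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMean)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Writing the two tests first forces a decision about what `mean` should return before any implementation exists, which is the design benefit argued for above.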



---------

## 2.3 What are Koans? <img src="images/zencircle.png"/>

## ko·an
### ˈkōän/

    noun
    plural noun: koans
    a paradoxical anecdote or riddle, used in Zen Buddhism to demonstrate the inadequacy of logical reasoning and to provoke enlightenment.


Within the development community, the concept of Koans as a tool for learning a new programming language has been gaining traction.  I attribute some of this to the growth of Test Driven Development (and [Behavior Driven Development](http://guide.agilealliance.org/guide/bdd.html)).  The first widely available Koans, to my knowledge, emerged from the [Ruby Community](http://rubykoans.com).  The basic concept is simple: learn language syntax, structure, coding style, and commonly used libraries through the act of testing.

## A Koan Walkthrough

Imagine that we have a test file with the following:

```python

import unittest

class FirstKoan(unittest.TestCase):
    
    def test_foo(self):
        self.assertTrue(False)
```

 * The first line, `import unittest`, simply brings (or imports) the testing module `unittest` into the script (or, more specifically, into its namespace).
 * Next a class is defined, `class FirstKoan(unittest.TestCase):`.  We can ignore this for now.
 * The third line, `def test_foo(self):`, is a function signature.  For now, when you see `def test_*(self):`, where the `*` is a wildcard, it indicates that we have a new test.
 * The final line is the actual test.  Here we are asserting that the expression within the parentheses is `True`.
 
The above test will fail.  Intuitively, False is not True.  Programmatically, `True != False`.  We could replace the argument with `type(1) == str` or `float == int`.  Both of these expressions evaluate to `False`, so the test would still fail.
 
The Python documentation describes a number of `assert*` methods [here](https://docs.python.org/3/library/unittest.html#unittest.TestCase.debug).
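
As a brief sketch, here are a few of the `assert*` methods from `unittest.TestCase` in use. The class name `AssertExamples` and the individual checks are illustrative choices, not part of the assignment.

```python
import unittest


class AssertExamples(unittest.TestCase):

    def test_equal(self):
        # Passes when both arguments compare equal.
        self.assertEqual(1 + 1, 2)

    def test_false(self):
        # Passes when the argument is falsy.
        self.assertFalse(type(1) == str)

    def test_in(self):
        # Passes when the first argument is contained in the second.
        self.assertIn("a", "koan")

    def test_is_instance(self):
        # Passes when the object is an instance of the given type.
        self.assertIsInstance(1.0, float)

    def test_raises(self):
        # Passes when the block raises the named exception.
        with self.assertRaises(ZeroDivisionError):
            1 / 0


# Run the tests and keep the result object instead of exiting.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertExamples)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The more specific methods tend to produce more informative failure messages than `assertTrue`, because they report the values that were compared.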

In the spirit of other programming Koans, the above test is designed to fail.  The task is to alter the test so that it passes.  For the above example we could replace `self.assertTrue(False)` with `self.assertTrue(True)`.  We could also be a bit fancier with something like `self.assertTrue(1 == 1)`.

In the next cells, I will utilize a code example to illustrate a Python traceback and walk through the process of completing the assignment for this week.

```python
a = 1
b = 2
assert a == b
```

```
AssertionError:
```

Above is an example of a Python traceback for code that fails to assert that 1 equals 2.  Tracebacks are most easily read from the bottom up - the error that we are trying to fix is often the final line of the traceback and the other lines simply offer context.  In the example, the code fails due to an AssertionError on the line denoted with a `---->`.  Knowing where the code is failing, I can now make the necessary alteration.

```python
a = 1
b = 2
assert a != b  # != means not equal
```

The lack of a traceback tells us that this code passed.
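
Because the final line of a traceback carries the error itself, it can help to give an assertion a message. The sketch below is one way to see this: it captures the traceback as text with the standard `traceback` module and prints only its last line. The message wording is an illustrative choice.

```python
import traceback

a = 1
b = 2

try:
    # The optional second part of an assert becomes the error message.
    assert a == b, "expected a and b to be equal, got {} and {}".format(a, b)
except AssertionError:
    # Capture the traceback as text instead of letting the program stop.
    report = traceback.format_exc()

# Reading from the bottom up: the last line names the error and its message.
last_line = report.strip().splitlines()[-1]
print(last_line)
```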

## 2.4 Assignment 1 (E1) Tutorial

At a high level, to complete the assignment you will:

* Be invited to access assignment 1 (assuming you turned in assignment 0).
* Fork assignment 1 into a repository of your own.
* Open the file `tests.py` in GitHub.
* Edit the file to fix the failing tests.
* Submit a Pull Request with your changes (this will cause automated testing to run and provide you with feedback).

### Fork assignment 1

(Links are to GitHub documentation guides.)

1. To start, [**fork** the repository][forking].
1. [**Clone**][ref-clone] the repository to your computer.
1. Modify the files and [**commit**][ref-commit] changes to complete your
   solution.
1. [**Push**][ref-push]/sync the changes up to GitHub.
1. [Create a **pull request**][pull-request] on the original repository to turn
   in the assignment.

[forking]: https://guides.github.com/activities/forking/
[ref-clone]: http://gitref.org/creating/#clone
[ref-commit]: http://gitref.org/basic/#commit
[ref-push]: http://gitref.org/remotes/#push
[pull-request]: https://help.github.com/articles/creating-a-pull-request

### Open the file `tests.py`

* Now that the repository is forked, click the `tests.py` file.

<img src="images/click_test.png" />

* Click the edit button to be able to edit the file.

<img src="images/edit_test.png" />

* Make the necessary changes to the file and commit the file.

<img src="images/commit_test.png" />

* Submit a pull request.
* Wait for the automated tests to run.  In future weeks we will look at what is going on.  In brief, a service ([Travis CI](http://travis-ci.org/)) is running all of the tests when a pull request is first submitted or when a change is made to an open pull request.  The tests will take ~60 seconds to run.

Here we see an example of a failing test:
<img src="images/failing_pr.png" />

Click on the red `X` (after the commit message, and before a commit number) to open the Travis CI page.  On the Travis page you will see header information that simply documents how a virtual (cloud hosted) test machine was set up.  After that you will see one or more tracebacks.  These tracebacks, like the example above, provide feedback about where the error is occurring.

Here we see an example of all tests passing:
<img src="images/passing_pr.png" />

The green check tells us that all tests are passing.  The green check, just like the red `X` is a clickable link to the build information.  Once your pull request shows a green check you are good to go.  This could take one commit or many - the number of times you tweak the code does not matter.

---------

# Week 2 Deliverables (E1) - Due 1/26/16
For this week make sure that you have completed the following:
    
   
* Fork Assignment 1 to your own GitHub repository.
    * You can access assignment 1 [HERE](https://classroom.github.com/assignment-invitations/320e9ec88ed80a66793b9d534f74b287).
* Use the GitHub editor to edit the file `tests.py` to correct all of the failing tests.
* Submit a pull request to my Assignment 1 repository.