# PyData Calgary July Meetup

## Packaging and Organizing Your Python Code

One of the challenges of using Python and the PyData ecosystem for data analysis is that it doesn't come set up out of the box for a typical data analysis workflow.

When I used Stata, there was a standard method for writing scripts, logging results and creating stored procedures to use later. This is spelled out in all the examples, so there is a shared culture.

With Python, because it is put to such diverse use cases, there is less of this shared culture.

This talk will discuss setting up some of the standard tools in order to have a better foundation to build upon. While Jupyter notebooks are great (I think they are a killer feature of Python), there is a point at which they make it more difficult to organize code.

### Outline

* `conda` environments
* `conda` revisions
* "Packaging" for development
    * `conda develop`
    * `pip install -e`
* logging
* testing
    * `py.test`
    * numpy test suite

### Conda Environments

This presentation is pretty specific to `conda` and Anaconda in general.
Python itself has similar functionality (`virtualenv`, or `venv` in the standard library). The big difference between `conda` and `pip` (the standard Python package manager) is that `conda` is specifically designed to store multi-platform **binary** versions of packages (i.e. they don't need to be compiled). That's what makes it easy to install `numpy` and `scipy` via `conda`.

Conda environments are great. They allow you to set up an isolated environment for a project and install packages into it. Environments can contain specific versions of a package.

This allows you to do things like have a Python 2 and a Python 3 environment on the same machine, or two different versions of pandas.
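Plain Python offers a lighter-weight version of the same isolation through the standard library's `venv` module. As a rough analogue of `conda create` (this is not a conda command), the sketch below builds an empty environment in a temporary directory. The name `demo_env` is made up, and unlike the `conda` example this installs no packages at all, not even `pip`.

```python
import os
import tempfile
import venv

# A throwaway location for the environment (hypothetical name "demo_env").
base = tempfile.mkdtemp()
env_dir = os.path.join(base, "demo_env")

# Create a bare, isolated environment without pip or any other packages.
venv.create(env_dir, with_pip=False)

# Every venv has a pyvenv.cfg marker file at its root...
cfg_path = os.path.join(env_dir, "pyvenv.cfg")
print(os.path.exists(cfg_path))

# ...and its own directory of executables ("bin" on POSIX, "Scripts" on Windows).
bin_dir = os.path.join(env_dir, "Scripts" if os.name == "nt" else "bin")
print(sorted(os.listdir(bin_dir)))
```

The `bin` (or `Scripts`) directory is what gets put at the front of your shell's search path when the environment is activated.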

`conda` does this by storing all the installed packages in one location (within the directory that you installed conda into). Each version of the package is stored separately.

The environments create _links_ to the actual packages, so you will only install a particular version of a package **once**, no matter how many environments it appears in.
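The sketch below illustrates the linking idea with plain files, assuming hard links. That is one way conda can link packages into an environment; depending on the platform and settings it may also copy files or use symbolic links. Two "environment" directories point at a single copy of a file in a shared "package cache". All directory and file names are made up for the example.

```python
import os
import tempfile

root = tempfile.mkdtemp()

# A shared "package cache" holding a single copy of a file.
pkgs = os.path.join(root, "pkgs")
os.makedirs(pkgs)
source = os.path.join(pkgs, "module.py")
with open(source, "w") as f:
    f.write("VERSION = '1.0'\n")

# Two "environments" that each get a hard link to that same file.
linked = []
for env in ("env_a", "env_b"):
    env_dir = os.path.join(root, env)
    os.makedirs(env_dir)
    target = os.path.join(env_dir, "module.py")
    os.link(source, target)  # creates a new name for the file; no data is copied
    linked.append(target)

# All three paths refer to the same inode, so the data exists once on disk.
inodes = {os.stat(p).st_ino for p in [source] + linked}
print(len(inodes))               # 1
print(os.stat(source).st_nlink)  # 3
```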

The `conda create` command will create an environment. You can specify a package to install in the environment.

    !conda create -y --name pydata_demo anaconda

    Fetching package metadata .......
    Solving package specifications: ..........

    Package plan for installation in environment /Users/sechilds/miniconda3/envs/pydata_demo:

    The following NEW packages will be INSTALLED:

        _nb_ext_conf:       0.2.0-py35_0
        alabaster:          0.7.8-py35_0
        anaconda:           4.1.0-np111py35_0
        anaconda-client:    1.4.0-py35_0
        anaconda-navigator: 1.2.1-py35_0
        appnope:            0.1.0-py35_0
        appscript:          1.0.1-py35_0
        argcomplete:        1.0.0-py35_1
        astropy:            1.2.1-np111py35_0
        babel:              2.3.3-py35_0
        backports:          1.0-py35_0
        beautifulsoup4:     4.4.1-py35_0
        bitarray:           0.8.1-py35_0
        bokeh:              0.11.1-py35_0
        boto:               2.40.0-py35_0
        bottleneck:         1.0.0-np111py35_1
        cffi:               1.6.0-py35_0
        chest:              0.2.3-py35_0
        click:

You can get a list of the installed environments.

    !conda info --envs

    # conda environments:
    #
    dhcarpentry              /Users/sechilds/miniconda3/envs/dhcarpentry
    pydata_demo              /Users/sechilds/miniconda3/envs/pydata_demo
    pydata_test           *  /Users/sechilds/miniconda3/envs/pydata_test
    pydata_test2             /Users/sechilds/miniconda3/envs/pydata_test2
    python_testing           /Users/sechilds/miniconda3/envs/python_testing
    root                     /Users/sechilds/miniconda3



It's easy to make a copy of an environment with the `--clone` option:

    !conda create -y --name pydata_demo2 --clone pydata_demo

    Source:      /Users/sechilds/miniconda3/envs/pydata_demo
    Destination: /Users/sechilds/miniconda3/envs/pydata_demo2
    Packages: 168
    Files: 5
    Linking packages ...
    [      COMPLETE      ]|###################################################| 100%
    #
    # To activate this environment, use:
    # $ source activate pydata_demo2
    #
    # To deactivate this environment, use:
    # $ source deactivate
    #


The new environment now shows up in the list (`conda env list` does the same thing as `conda info --envs`):

    !conda env list

    # conda environments:
    #
    dhcarpentry              /Users/sechilds/miniconda3/envs/dhcarpentry
    pydata_demo              /Users/sechilds/miniconda3/envs/pydata_demo
    pydata_demo2             /Users/sechilds/miniconda3/envs/pydata_demo2
    pydata_test           *  /Users/sechilds/miniconda3/envs/pydata_test
    pydata_test2             /Users/sechilds/miniconda3/envs/pydata_test2
    python_testing           /Users/sechilds/miniconda3/envs/python_testing
    root                     /Users/sechilds/miniconda3



Deleting an environment is done with the `conda remove` command and its `--all` flag:

    !conda remove --name pydata_demo2 -y --all

    Package plan for package removal in environment /Users/sechilds/miniconda3/envs/pydata_demo2:

    The following packages will be REMOVED:

        _nb_ext_conf:       0.2.0-py35_0
        alabaster:          0.7.8-py35_0
        anaconda:           4.1.0-np111py35_0
        anaconda-client:    1.4.0-py35_0
        anaconda-navigator: 1.2.1-py35_0
        appnope:            0.1.0-py35_0
        appscript:          1.0.1-py35_0
        argcomplete:        1.0.0-py35_1
        astropy:            1.2.1-np111py35_0
        babel:              2.3.3-py35_0
        backports:          1.0-py35_0
        beautifulsoup4:     4.4.1-py35_0
        bitarray:           0.8.1-py35_0
        bokeh:              0.11.1-py35_0
        boto:               2.40.0-py35_0
        bottleneck:         1.0.0-np111py35_1
        cffi:               1.6.0-py35_0
        chest:              0.2.3-py35_0
        click:              6.6-py35_0
        cloudpickle:        0.2.1-py35_0

Removing the original `pydata_demo` environment works the same way:

    !conda remove --name pydata_demo -y --all

    Package plan for package removal in environment /Users/sechilds/miniconda3/envs/pydata_demo:

    The following packages will be REMOVED:

        _nb_ext_conf:       0.2.0-py35_0
        alabaster:          0.7.8-py35_0
        anaconda:           4.1.0-np111py35_0
        anaconda-client:    1.4.0-py35_0
        anaconda-navigator: 1.2.1-py35_0
        appnope:            0.1.0-py35_0
        appscript:          1.0.1-py35_0
        argcomplete:        1.0.0-py35_1
        astropy:            1.2.1-np111py35_0
        babel:              2.3.3-py35_0
        backports:          1.0-py35_0
        beautifulsoup4:     4.4.1-py35_0
        bitarray:           0.8.1-py35_0
        bokeh:              0.11.1-py35_0
        boto:               2.40.0-py35_0
        bottleneck:         1.0.0-np111py35_1
        cffi:               1.6.0-py35_0
        chest:              0.2.3-py35_0
        click:              6.6-py35_0
        cloudpickle:        0.2.1-py35_0

Both demo environments are now gone:

    !conda env list

    # conda environments:
    #
    dhcarpentry              /Users/sechilds/miniconda3/envs/dhcarpentry
    pydata_test           *  /Users/sechilds/miniconda3/envs/pydata_test
    pydata_test2             /Users/sechilds/miniconda3/envs/pydata_test2
    python_testing           /Users/sechilds/miniconda3/envs/python_testing
    root                     /Users/sechilds/miniconda3



If you want to know which packages are installed in your current environment, try `conda list`:

    !conda list

    # packages in environment at /Users/sechilds/miniconda3/envs/pydata_test:
    #
    _nb_ext_conf              0.2.0                    py35_0
    alabaster                 0.7.8                    py35_0
    anaconda                  4.1.0               np111py35_0
    anaconda-client           1.4.0                    py35_0
    anaconda-navigator        1.2.1                    py35_0
    appnope                   0.1.0                    py35_0
    appscript                 1.0.1                    py35_0
    argcomplete               1.0.0                    py35_1
    astropy                   1.2.1               np111py35_0
    babel                     2.3.3                    py35_0
    backports                 1.0                      py35_0
    beautifulsoup4            4.4.1                    py35_0
    bitarray                  0.8.1                    py35_0
    bokeh                     0.11.1                   py35_0
    boto                      2.40.0                   py35_0
    bottleneck

The important thing to note about packages is that we not only track the version number (the middle column) but also a build string (the last column). The build string encodes the Python version (the `py35` part) and, if applicable, the `numpy` version (the `np111` part).

This is because `conda` packages contain compiled binaries and not just source code, so each build is tied to the Python (and `numpy`) version it was built for.
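To make the naming scheme concrete, here is a small, hypothetical helper (not part of conda) that splits a package string as printed by `conda list --revisions`, such as `astropy-1.2.1-np111py35_0`, into its name, version and build string, then pulls the `numpy` and Python tags out of the build string. It relies on the conda convention that package names may contain dashes but versions and build strings do not.

```python
import re


def parse_conda_dist(dist):
    """Split 'name-version-build' on its last two dashes (hypothetical helper)."""
    name, version, build = dist.rsplit("-", 2)
    # Build strings such as 'np111py35_0' encode the numpy and Python versions.
    np_match = re.search(r"np(\d+)", build)
    py_match = re.search(r"py(\d+)", build)
    return {
        "name": name,
        "version": version,
        "build": build,
        "numpy": np_match.group(1) if np_match else None,
        "python": py_match.group(1) if py_match else None,
    }


print(parse_conda_dist("astropy-1.2.1-np111py35_0"))
print(parse_conda_dist("anaconda-client-1.4.0-py35_0"))
print(parse_conda_dist("curl-7.49.0-0"))  # not a Python package: no py/np tags
```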

If you want to see how the packages installed in an environment have changed over time, try the `--revisions` option:

    !conda list --revisions

    2016-07-04 07:43:34  (rev 0)
        _nb_ext_conf-0.2.0-py35_0
        alabaster-0.7.8-py35_0
        anaconda-4.1.0-np111py35_0
        anaconda-client-1.4.0-py35_0
        anaconda-navigator-1.2.1-py35_0
        appnope-0.1.0-py35_0
        appscript-1.0.1-py35_0
        argcomplete-1.0.0-py35_1
        astropy-1.2.1-np111py35_0
        babel-2.3.3-py35_0
        backports-1.0-py35_0
        beautifulsoup4-4.4.1-py35_0
        bitarray-0.8.1-py35_0
        bokeh-0.11.1-py35_0
        boto-2.40.0-py35_0
        bottleneck-1.0.0-np111py35_1
        cffi-1.6.0-py35_0
        chest-0.2.3-py35_0
        click-6.6-py35_0
        cloudpickle-0.2.1-py35_0
        clyent-1.2.2-py35_0
        colorama-0.3.7-py35_0
        configobj-5.0.6-py35_0
        contextlib2-0.5.3-py35_0
        cryptography-1.4-py35_0
        curl-7.49.0-0
        cycler-0.10.0-py35_0
        cython-0.24-py35_0
        cytoolz-0.8.0-py35_0
        dask-0.10.0-py35_0
        datashape-0.5.2-py35_0
        decorator-4.0.10-py35_0
        dill-0.2.5-py35_0
        docutils-0.12-py35_2
        dynd-python-0.7.2-py35_0
        ent

Installing a package that is already present leaves the environment unchanged:

    !conda install pytest

    Fetching package metadata .......
    Solving package specifications: ..........

    # All requested packages already installed.
    # packages in environment at /Users/sechilds/miniconda3/envs/pydata_test:
    #
    pytest                    2.9.2                    py35_0


The `-r` flag is short for `--revisions`, and `-n` lets you look at an environment other than the active one:

    !conda list -r -n dhcarpentry

    2016-02-18 06:51:07  (rev 0)
        abstract-rendering-0.5.1-np110py35_0
        alabaster-0.7.7-py35_0
        anaconda-2.5.0-np110py35_0
        anaconda-client-1.2.2-py35_0
        appnope-0.1.0-py35_0
        appscript-1.0.1-py35_0
        argcomplete-1.0.0-py35_1
        astropy-1.1.1-np110py35_0
        babel-2.2.0-py35_0
        beautifulsoup4-4.4.1-py35_0
        bitarray-0.8.1-py35_0
        blaze-core-0.9.0-py35_0
        bokeh-0.11.0-py35_0
        boto-2.39.0-py35_0
        bottleneck-1.0.0-np110py35_0
        cffi-1.2.1-py35_0
        clyent-1.2.0-py35_0
        colorama-0.3.6-py35_0
        configobj-5.0.6-py35_0
        cryptography-1.0.2-py35_0
        curl-7.45.0-0
        cycler-0.9.0-py35_0
        cython-0.23.4-py35_1
        cytoolz-0.7.5-py35_0
        datashape-0.5.0-py35_0
        decorator-4.0.6-py35_0
        docutils-0.12-py35_0
        dynd-python-0.7.1-py35_0
        et_xmlfile-1.0.1-py35_0
        fastcache-1.0.2-py35_0
        flask-0.10.1-py35_1
        freetype-2.5.5-0
        futures-3.0.3-py35_0
        greenlet-0.4.9-py35_0
        h5py-2.5.0-np110py35

You always have a record of how each environment was built. You don't need to keep track yourself.

#### `environment.yml` files

Part of `pip`'s functionality is `requirements.txt` files, which list the Python packages required by the current project. The `pip freeze` command will generate such a list of the currently installed packages for you, with their exact version numbers. When you write a `requirements.txt` file yourself, you don't have to pin exact versions.

When you run `pip freeze` inside a `conda` environment, you get a list of **ALL** the installed packages, including the ones `conda` installed, not just those installed through `pip`.
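Each line of `pip freeze` output has the form `name==version`, which makes it easy to process. As a hypothetical example (the helper is not part of `pip`), this sketch turns a few lines copied from the output below into a dictionary, then prints a looser `requirements.txt`-style list that keeps only the package names, for cases where you don't want to pin exact versions.

```python
def parse_freeze(text):
    """Map package name -> pinned version from `pip freeze`-style text (hypothetical helper)."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that isn't a simple 'name==version' pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name] = version
    return pins


freeze_output = """\
astropy==1.2.1
Babel==2.3.3
Bottleneck==1.0.0
Cython==0.24
"""

pins = parse_freeze(freeze_output)
print(pins["Cython"])  # 0.24

# An unpinned requirements list: just the names, one per line, sorted case-insensitively.
print("\n".join(sorted(pins, key=str.lower)))
```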

    !pip freeze

    alabaster==0.7.8
    anaconda-client==1.4.0
    anaconda-navigator==1.2.1
    appnope==0.1.0
    appscript==1.0.1
    argcomplete==1.0.0
    astropy==1.2.1
    Babel==2.3.3
    backports.shutil-get-terminal-size==1.0.0
    beautifulsoup4==4.4.1
    bitarray==0.8.1
    bokeh==0.11.1
    boto==2.40.0
    Bottleneck==1.0.0
    cffi==1.6.0
    chest==0.2.3
    click==6.6
    cloudpickle==0.2.1
    clyent==1.2.2
    colorama==0.3.7
    configobj==5.0.6
    contextlib2==0.5.3
    cryptography==1.4
    cycler==0.10.0
    Cython==0.24
    cytoolz==0.8.0
    dask==0.10.0
    datashape==0.5.2
    decorator==4.0.10
    dill==0.2.5
    docutils==0.12
    dynd==0.7.3.dev1
    et-xmlfile==1.0.1
    fastcache==1.0.2
    Flask==0.11.1
    Flask-Cors==2.1.2
    gevent==1.1.1
    greenlet==0.4.10
    h5py==2.6.0
    HeapDict==1.0.0
    idna==2.1
    imagesize==0.7.1
    ipykernel==4.3.1
    ipython==4.2.0
    ipython-genutils==0.1.0
    ipywidgets==4.1.1
    itsdangerous==0.24
    jdcal==1.2
    jedi==0.9.0
    Jinja2==2.8
    jsonschema==2.5.1
    jupyter==1.0.0
    jupyter-client==4.3.0
    jupyter-console==4.1.1
    jupyter-core==4.1.0
    llvmlite==0.11.0
    lock

In `conda` there is a similar concept, the `environment.yml` file. These files use the YAML format ("YAML Ain't Markup Language", originally "Yet Another Markup Language"). See [http://yaml.org/](http://yaml.org/).

An `environment.yml` file looks like:

    name: test-env
    dependencies:
    - python=3
    - numpy
    - pandas
    - pip
    - pip:
      - selenium

Packages listed under the `pip:` key are installed with `pip` rather than `conda`.

The `conda env export` command prints the specification of an existing environment in this format:

    !conda env export -n dhcarpentry

    name: dhcarpentry
    dependencies:
    - abstract-rendering=0.5.1=np110py35_0
    - alabaster=0.7.7=py35_0
    - anaconda=2.5.0=np110py35_0
    - anaconda-client=1.2.2=py35_0
    - appnope=0.1.0=py35_0
    - appscript=1.0.1=py35_0
    - argcomplete=1.0.0=py35_1
    - astropy=1.1.1=np110py35_0
    - babel=2.2.0=py35_0
    - beautifulsoup4=4.4.1=py35_0
    - bitarray=0.8.1=py35_0
    - blaze-core=0.9.0=py35_0
    - bokeh=0.11.0=py35_0
    - boto=2.39.0=py35_0
    - bottleneck=1.0.0=np110py35_0
    - cffi=1.2.1=py35_0
    - clyent=1.2.0=py35_0
    - colorama=0.3.6=py35_0
    - configobj=5.0.6=py35_0
    - cryptography=1.0.2=py35_0
    - curl=7.45.0=0
    - cycler=0.9.0=py35_0
    - cython=0.23.4=py35_1
    - cytoolz=0.7.5=py35_0
    - datashape=0.5.0=py35_0
    - decorator=4.0.6=py35_0
    - docutils=0.12=py35_0
    - dynd-python=0.7.1=py35_0
    - et_xmlfile=1.0.1=py35_0
    - fastcache=1.0.2=py35_0
    - flask=0.10.1=py35_1
    - freetype=2.5.5=0
    - futures=3.0.3=py35_0
    - greenlet=0.4.9=py35_0
    - h5py=2.5.0=np110py35_4
    - hdf5=1.8.15.1=2
    - idna=2.0=py35_0
    - ipykernel=4.2.2=py35_0

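Because `conda env export` pins every package to an exact version and build string (and includes macOS-only packages such as `appnope` and `appscript`), its output is hard to reuse on another machine or platform. It is usually better to write the `environment.yml` file by hand, listing only the packages your project actually needs.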
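If you do start from `conda env export` output, the trimming step can be scripted. This hypothetical helper (not a conda feature) keeps only the packages you name and drops the exact version and build pins such as `=2.2.0=py35_0`. It assumes the simple `- name=version=build` line format shown above and does not handle nested `pip:` sections.

```python
def trim_export(export_text, keep):
    """Reduce `conda env export` output to unpinned entries for the packages in `keep`."""
    out = []
    for line in export_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            # '- babel=2.2.0=py35_0' -> 'babel'
            name = stripped[2:].split("=", 1)[0]
            if name in keep:
                out.append("- " + name)
        else:
            # Keep every other line as-is, e.g. 'name:' and 'dependencies:'.
            out.append(line)
    return "\n".join(out) + "\n"


exported = """\
name: dhcarpentry
dependencies:
- astropy=1.1.1=np110py35_0
- babel=2.2.0=py35_0
- bokeh=0.11.0=py35_0
- h5py=2.5.0=np110py35_4
"""

print(trim_export(exported, keep={"astropy", "bokeh"}))
```

The result still needs a human check: only you know which packages the project really depends on.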

There is a nice script that will automatically activate (or, if necessary, create) the conda environment named in a project's `environment.yml` whenever you enter that project directory.

[https://github.com/chdoig/conda-auto-env](https://github.com/chdoig/conda-auto-env)

Add this function to your shell startup file (e.g. `~/.bashrc`):

    function conda_auto_env() {
      if [ -e "environment.yml" ]; then
        # Assumes the first line of environment.yml is 'name: <env>'
        ENV=$(head -n 1 environment.yml | cut -f2 -d ' ')
        # Only act if we are not already in that environment
        if [[ $PATH != *$ENV* ]]; then
          # Try to activate it; failure means the environment doesn't exist yet
          if ! source activate "$ENV"; then
            echo "Conda env '$ENV' doesn't exist."
            # Create the environment from environment.yml, then activate it
            conda env create -q
            source activate "$ENV"
          fi
        fi
      fi
    }

Then add that function to your `PROMPT_COMMAND` variable, so that `bash` runs it every time it displays your prompt.

    export PROMPT_COMMAND=conda_auto_env

Environments (and virtualenvs) work by manipulating the way Python and your shell find programs. Ultimately, it boils down to a list of locations to check, in order, for a particular Python module or command-line program. By manipulating this list, you can change which version or set of programs is found.

One key part of this is that you can change these paths for one particular shell session (e.g. one terminal window) without changing them for any other.

You can look at the list of locations Python searches for modules with `sys.path`:

    import sys
    print('\n'.join(sys.path))

    /Users/sechilds/miniconda3/envs/pydata_test/lib/python35.zip
    /Users/sechilds/miniconda3/envs/pydata_test/lib/python3.5
    /Users/sechilds/miniconda3/envs/pydata_test/lib/python3.5/plat-darwin
    /Users/sechilds/miniconda3/envs/pydata_test/lib/python3.5/lib-dynload
    /Users/sechilds/miniconda3/envs/pydata_test/lib/python3.5/site-packages
    /Users/sechilds/miniconda3/envs/pydata_test/lib/python3.5/site-packages/Sphinx-1.4.1-py3.5.egg
    /Users/sechilds/miniconda3/envs/pydata_test/lib/python3.5/site-packages/aeosa
    /Users/sechilds/miniconda3/envs/pydata_test/lib/python3.5/site-packages/setuptools-23.0.0-py3.5.egg
    /Users/sechilds/miniconda3/envs/pydata_test/lib/python3.5/site-packages/IPython/extensions
    /Users/sechilds/.ipython
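Because `sys.path` is searched in order, the same `import` statement can resolve to different files in different environments. The standard library's `importlib.util.find_spec` reports which file Python would load for a module, which is a quick way to check which environment a package is coming from. The module names below are only examples.

```python
import importlib.util
import sys

# Which file would 'import json' load?
spec = importlib.util.find_spec("json")
print(spec.origin)

# A top-level module that can't be found gives None instead of raising.
print(importlib.util.find_spec("no_such_module_for_demo"))

# sys.prefix is the root of the environment the running interpreter belongs to.
print(sys.prefix)
```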


A similar concept exists for your shell: the `$PATH` environment variable.

    import os
    print('\n'.join(os.environ['PATH'].split(os.pathsep)))

    /Users/sechilds/miniconda3/envs/pydata_test/bin
    /Users/sechilds/miniconda3/bin
    /usr/local/var/rbenv/shims
    /usr/local/heroku/bin
    /usr/local/bin
    /usr/local/sbin
    /usr/bin
    /bin
    /usr/sbin
    /sbin
    /opt/X11/bin
    /Users/sechilds/.bin
    /Users/sechilds/bin
    /Users/sechilds/.tcscripts
    /usr/local/stata
    /usr/texbin
    /usr/local/share/npm/bin
    /usr/local/opt/ruby/bin
    /Users/sechilds/gocode//bin
    /Users/sechilds/bin
    /usr/local/stata


The first entry in `$PATH`, the environment's `bin` directory, is why switching to a conda environment changes what your command line does. It is also what makes your environment's command-line scripts accessible from your shell.

When you ask Python to find a module, or ask `bash` to find a program to run, it simply searches through its list until it finds a match. Once it finds one, it uses it and stops looking.
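Here is a minimal sketch of that "first match wins" rule for programs: walk a list of directories in order and return the first executable file with the right name. It is a simplified, POSIX-oriented stand-in for what the shell does (the standard library's `shutil.which` is the real tool), and the `mytool` program and directory names are made up.

```python
import os
import stat
import tempfile


def find_first(program, directories):
    """Return the first executable named `program` found in `directories`, searched in order."""
    for directory in directories:
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate  # stop at the first hit, like the shell does
    return None


# Two fake 'bin' directories that both provide a program called 'mytool'.
root = tempfile.mkdtemp()
env_bin = os.path.join(root, "env", "bin")
system_bin = os.path.join(root, "usr", "bin")
for d in (env_bin, system_bin):
    os.makedirs(d)
    path = os.path.join(d, "mytool")
    with open(path, "w") as f:
        f.write("#!/bin/sh\necho hello\n")
    # Mark the file as executable by its owner.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)

# Whichever directory comes first wins, just as an environment's bin does in $PATH.
print(find_first("mytool", [env_bin, system_bin]) == os.path.join(env_bin, "mytool"))     # True
print(find_first("mytool", [system_bin, env_bin]) == os.path.join(system_bin, "mytool"))  # True
```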

### Developing Your Own Packages

I promised you that this was going to be about creating your own packages and organizing your code. Python is set up to look in the current directory for `.py` files, which are called _modules_. This is useful for moving your code from a notebook to a file, but it's limiting because it forces you to put everything in one directory.

It would be easier to put things in various sub-directories, organized by the part of the project you are working on. But that makes it hard to import those files into your notebooks.

I have a workflow where I test things out using Jupyter notebooks. If I'm using an unfamiliar package or a new dataset, I can quickly run the commands and have the results appear instantly.

But soon I find myself running the same commands over and over again, so I create a function. If it's a useful function, I end up wanting it in several notebooks. I can copy it from one to another, but then I could end up with several diverging versions of it.

The solution is to put it in a separate file and import that into the notebook. But if you get enough of those functions, you are well on your way to having a package.

So what if I want to store my `.py` files separately from my notebooks? That's not particularly easy. While you can use tricks such as **relative imports** or adding a parent directory to `sys.path`, it's still a bit awkward.
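For reference, this is the kind of workaround that development mode replaces: pushing a directory onto `sys.path` by hand before importing from it. The sketch builds a throwaway project with a `src/` folder holding a hypothetical `my_helpers.py`. In a real notebook you would insert a relative path such as `'../src'` instead, and you would have to repeat that line in every notebook.

```python
import os
import sys
import tempfile

# A throwaway project layout: project/src/my_helpers.py and project/notebooks/.
project = tempfile.mkdtemp()
src = os.path.join(project, "src")
os.makedirs(src)
os.makedirs(os.path.join(project, "notebooks"))
with open(os.path.join(src, "my_helpers.py"), "w") as f:
    f.write("def double(x):\n    return 2 * x\n")

# The workaround: put src/ at the front of the module search path...
sys.path.insert(0, src)

# ...after which the module imports like any other.
import my_helpers

print(my_helpers.double(21))  # 42
```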

The solution is what's known as installing a package in "development mode".

#### The Tools: `conda develop` and `pip install -e`

Since we are using `conda`, we'll start with the easiest option: the `conda develop` tool, which comes with the `conda-build` package and doesn't require much setup of your code.

The `pip` version requires a little more setup (such as a `setup.py` file), but it's fully compatible with `conda`.

#### The magic `__init__.py` file

Python uses a simple method for turning an ordinary directory into a package. If you include an `__init__.py` file in the directory, Python will treat it as a package. This means that you can `import` it, along with the modules inside it.

The `__init__.py` file can be empty; its mere presence is enough to make it work.
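A small sketch of the rule: the code below builds a hypothetical `mypackage/` directory containing an empty `__init__.py` and one module, `cleaning.py`, then imports from it. The temporary directory stands in for your project folder, and adding it to `sys.path` stands in for running Python from that folder.

```python
import os
import sys
import tempfile

project = tempfile.mkdtemp()
pkg_dir = os.path.join(project, "mypackage")
os.makedirs(pkg_dir)

# An empty __init__.py is enough to mark mypackage/ as a regular package.
open(os.path.join(pkg_dir, "__init__.py"), "w").close()

# One module inside the package.
with open(os.path.join(pkg_dir, "cleaning.py"), "w") as f:
    f.write("def strip_all(values):\n    return [v.strip() for v in values]\n")

# Stand-in for "running Python from the project folder".
sys.path.insert(0, project)

import mypackage
from mypackage import cleaning

print(cleaning.strip_all(["  a ", "b  "]))       # ['a', 'b']
print(mypackage.__file__.endswith("__init__.py"))  # True
```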