In [125]:
%load_ext watermark
%watermark -a 'Kamran Haider' -u -d -v -p numpy,pandas,matplotlib,scipy,sklearn
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import train_test_split
from sklearn import datasets

Kamran Haider 
last updated: 2018-07-23 

CPython 3.6.5
IPython 6.4.0

numpy 1.14.5
pandas 0.23.3
matplotlib 2.2.2
scipy 1.1.0
sklearn 0.19.2


# Packaging Python Machine Learning Code with Conda

Going from a classifier idea to a package that others can use:
1. Code it up
2. Organize directory 
3. Set up version control (Git)
4. Set up unit testing (pytest)
5. Configure installation (setuptools)
6. Build package (conda-build)
7. Create a distribution channel (Anaconda)
8. Upload to Anaconda

# Fun with notebooks
Here is how you render them as slides:
```bash
jupyter nbconvert ml_packaging_workshop.ipynb --to slides --post serve
```
Be sure to toggle slide show by going to `View->Cell Toolbar->Slideshow` and then assign a type to each cell.

If you are bored with how your notebooks look, try the `jupyter-themes` package.


## A prelude to conda environments
* Better to set them up systematically (put all specifications in a file)
* Create a file `environment.yml`; this is what it should look like:
```yaml
name: workshop
dependencies:
  - python=3.6
  - pytest
  - ...
```
* Create the environment:
```bash
conda env create -f environment.yml
source activate workshop
```
* If you missed a specification, edit your `environment.yml` and update:
```bash
source deactivate workshop
conda env update -f environment.yml
source activate workshop
```


# Step 1: Coding up a Majority Class Classifier

In [126]:
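class MajorityClassClassifier(BaseEstimator):
    """
    A majority class classifier: always predicts the most frequent
    class seen during fit.
    """
    def __init__(self):
        self.classes = None
        self.majority_class = None

    def fit(self, X, y):
        # Count how often each distinct label occurs in the training targets
        classes, counts = np.unique(y, return_counts=True)
        # argmax returns the first maximum, so ties go to the smallest label
        majority_class_index = np.argmax(counts)
        self.classes = classes
        self.majority_class = classes[majority_class_index]
        # Return self so that calls can be chained, as scikit-learn estimators do
        return self

    def predict(self, X):
        # One prediction per row of X, all equal to the majority class.
        # np.full takes its dtype from the label, so non-uint8 labels also work.
        predictions = np.full((len(X), 1), self.majority_class)
        return predictions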

In [127]:
# Let's try this on the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# Build model
model = MajorityClassClassifier()
model.fit(X_train, y_train)
print(model.classes, model.majority_class)
# Test if it works correctly
y_pred = model.predict(X_test)
np.testing.assert_array_equal(np.zeros(y_pred.shape, dtype=np.uint8), y_pred)

[0 1 2] 0
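
A majority class classifier is mostly useful as a baseline: its accuracy on any labelled set equals the share of that set that belongs to the predicted class. The sketch below illustrates this with NumPy only. The helper name `baseline_accuracy` and the toy labels are made up for illustration and are not part of the notebook.

```python
import numpy as np


def baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most frequent training label."""
    classes, counts = np.unique(y_train, return_counts=True)
    majority_class = classes[np.argmax(counts)]
    # Fraction of test labels that match the constant prediction
    return float(np.mean(np.asarray(y_test) == majority_class))


# Toy labels (hypothetical): class 1 is the most frequent in training
y_train_toy = np.array([0, 1, 1, 2, 1])
y_test_toy = np.array([1, 0, 1, 2])
print(baseline_accuracy(y_train_toy, y_test_toy))  # 0.5
```

Any real model should beat this number, otherwise it has learned nothing beyond the class frequencies.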


# Step 2: Set up Package Directory Structure
- `PackageName`
    * `package_name`
    * `devtools`
        - `conda-recipe`
        - `travis-ci`
    * `docs`
    * `scripts`
    * `tests`
    * `scratch`
    * `notebooks`
    

```bash
mkdir AwesomeML
mkdir AwesomeML/awesome_ml
mkdir -p AwesomeML/devtools/conda-recipe
mkdir AwesomeML/devtools/travis-ci
mkdir AwesomeML/docs
mkdir AwesomeML/tests
mkdir AwesomeML/scripts
mkdir AwesomeML/scratch
mkdir AwesomeML/notebooks
touch AwesomeML/README.md
```
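
The same skeleton can also be created from Python, which is handy on systems without `mkdir -p` and `touch`. This is a minimal sketch using only the standard library. The function name `make_skeleton` is an assumption for illustration; the directory names are the ones listed above.

```python
from pathlib import Path

# Sub-directories taken from the layout above
SUBDIRS = [
    "awesome_ml",
    "devtools/conda-recipe",
    "devtools/travis-ci",
    "docs",
    "tests",
    "scripts",
    "scratch",
    "notebooks",
]


def make_skeleton(root):
    """Create the package skeleton under `root` and return the root path."""
    root = Path(root)
    for sub in SUBDIRS:
        # parents=True mirrors `mkdir -p`; exist_ok=True makes re-runs harmless
        (root / sub).mkdir(parents=True, exist_ok=True)
    (root / "README.md").touch()
    return root
```

Calling `make_skeleton("AwesomeML")` reproduces the shell commands. Note that `setuptools.find_packages()` only picks up directories that contain an `__init__.py`, so `awesome_ml/__init__.py` is worth adding once the code moves in.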

# Set up Package Directory Structure (contd.)

* Manually setting up a directory structure can be tedious, but it's good for starters.
* Once you really get into building packages, try:
[`CookieCutter`](https://cookiecutter.readthedocs.io/en/latest/)
* It creates a project directory following a standard template.

# Step 3: Set up Version Control
* Initialize version control and make the first commit
* Go to your GitHub account and create a new repository with the same name
```bash
cd AwesomeML/
git init
git add .
git commit -m "Initial commit"
git remote add origin git@github.com:kamran-haider/AwesomeML.git
git push origin master
```

_Git Quiz: If we commit an empty folder and push it to a repo, it won't show up. What trick can we use to make it appear?_

# Step 4: Set up Testing
* As soon as you start building up your code, start writing unit tests.
* Unit testing will feel tedious to begin with, but it is your great friend.
* To learn more about unit testing, follow this [PyCon tutorial](https://pyvideo.org/pycon-us-2016/michael-tom-wing-christie-wilson-introduction-to-unit-testing-in-python-with-pytest-pycon-2016.html)
* Strategy:
    - We will write a test script and put it in `AwesomeML/tests`.
    - We will follow some conventions: the test script's name will begin with `test_`.
    - Each unit test in this script will also have a name beginning with `test_`.
```bash
touch AwesomeML/tests/test_awesome_ml.py
```

```python
def test_fit_classes():
    """
    Tests if the MajorityClassClassifier model fits properly
    and generates classes from training data.
    """
    model, _ = load_iris_mdoel()
    np.testing.assert_array_equal(model.classes, np.array([0, 1, 2]))

def test_fit_majority_class():
    """
    Tests if the MajorityClassClassifier model fits properly
    and generates majority class from training data.
    """
    model, _ = load_iris_mdoel()
    np.testing.assert_array_equal(model.majority_class, 2)

def test_predict():
    """
    Tests if the MajorityClassClassifier model correctly predicts
    majority class for Iris dataset.
    """
    _, test_predictions = load_iris_mdoel()
    reference_predictions = np.zeros(test_predictions.shape, dtype=np.uint8) + 2
    np.testing.assert_array_equal(test_predictions, reference_predictions)
```
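
The tests above call a helper, `load_iris_mdoel()`, that the slides do not show. A minimal sketch of what such a helper might look like follows, with the name kept exactly as the tests spell it. Everything in it is an assumption: the stand-in class only exists so the block runs on its own (the real test module would import `MajorityClassClassifier` from the package), and the split parameters are copied from the notebook cell in Step 1. With those parameters the notebook reported a majority class of 0, while the tests above expect 2, so the real helper presumably uses a different split.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split


class MajorityClassClassifier:
    """Stand-in for the Step 1 class so this sketch is self-contained."""

    def fit(self, X, y):
        classes, counts = np.unique(y, return_counts=True)
        self.classes = classes
        self.majority_class = classes[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full((len(X), 1), self.majority_class)


def load_iris_mdoel(test_size=0.5, random_state=1):
    """Fit the classifier on an Iris training split.

    Returns the fitted model and its predictions on the held-out split.
    The default split parameters are assumptions, not the author's values.
    """
    iris = datasets.load_iris()
    X_train, X_test, y_train, _ = train_test_split(
        iris.data, iris.target, test_size=test_size, random_state=random_state
    )
    model = MajorityClassClassifier().fit(X_train, y_train)
    return model, model.predict(X_test)
```

Keeping data loading and fitting in one helper means each test stays short and checks a single property of the fitted model.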

# Step 4: Set up Testing (contd.)

A couple of things to note:
* See how each test checks just one piece of functionality.
* A test passes if it runs to completion and fails if an assertion raises an error; the return value is not used.
* The `numpy.testing` module provides assertions for comparing arrays.

How to run tests as you develop code:
* In the package root directory (`AwesomeML/`), run the following command:
```bash
python -m pytest tests/
```
`pytest` automatically picks up all files whose names start with `test_`, and within those files it executes all functions whose names start with `test_`.
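
In practice, pytest decides the outcome of a test from exceptions: a test passes when it runs to completion and fails when an assertion raises. The short sketch below shows both cases with `numpy.testing.assert_array_equal`. The test names and arrays are made up for illustration.

```python
import numpy as np


def test_arrays_match():
    # On success the helper simply returns None and the test runs to the end
    result = np.testing.assert_array_equal(np.array([2, 2]), np.array([2, 2]))
    assert result is None


def test_mismatch_is_reported():
    # A mismatch raises AssertionError, which pytest would report as a failure;
    # the error is caught here only to make that behaviour visible
    try:
        np.testing.assert_array_equal(np.array([2, 2]), np.array([2, 0]))
    except AssertionError:
        raised = True
    else:
        raised = False
    assert raised
```

Both functions follow the `test_` naming convention, so pytest would collect them if this code were saved in a file such as `tests/test_example.py` (a hypothetical file name).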

# Step 5: Configuring Installation
`setup.py`
```python
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="awesome_ml",
    version="0.0.1",
    author="Kamran Haider",
    author_email="kamranhaider.mb@gmail.com",
    description="A Python package to demonstrate packaging",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/kamran-haider/AwesomeML",
    install_requires=[
        'pytest', 'scikit-learn', 'pandas', 'numpy'],
    packages=setuptools.find_packages()
)
```


# Step 5: Configuring Installation (contd.)

At this point, you are already done!

If someone asks for your code, just tell them to do the following:
```bash
git clone your_package.git
cd your_package
python setup.py install
```
Issues:
* What if you make a change to your package and add new functionality?
* What if you were notified of a bug and you want to release a fixed version?

That's where distribution and releases come in!

# Step 6: Conda package building
* `conda` is an environment manager that comes with Anaconda (Anaconda is a Python distribution).
* Within `conda` there are tools for building, installing, updating and removing Python packages.
* `conda-build` allows building packages that can be managed through conda.
* To confuse you guys even more, there is the `pip` package management system too.
* To get some clarification on these terms, read this [article](https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/)!

# Step 6: Conda package building (contd.)

* To build a conda package, you need to build a conda recipe
* Go to `devtools/conda-recipe` and create the following three files:
```bash
cd AwesomeML/devtools/conda-recipe
touch meta.yaml
touch build.sh
touch build.bat
```

`meta.yaml`
```yaml
package:
  name: awesome_ml
  version: 0.0.1


source:
  path: ../../

requirements:
  build:
    - python
    - setuptools

  run:
    - python
    - numpy
    - pandas
    - scikit-learn

test:
  requires:
    - pytest
  source_files:
    - tests
  commands:
    - python -m pytest tests
  imports:
    - awesome_ml

about:
  home: https://github.com/kamran-haider/AwesomeML
  license: MIT
  summary: Python package to demonstrate packaging
```

`build.sh`
```bash
#!/bin/bash
$PYTHON setup.py clean
$PYTHON setup.py install
```
`build.bat`
```MS-DOS
"%PYTHON%" setup.py install
if errorlevel 1 exit 1
```

# Step 6: Conda package building (contd.)

* All these files reside in `devtools/conda-recipe`
* You rarely need to edit these after creating them
* `conda-build` uses configurations from `meta.yaml` and build scripts to create the package.
* Here is the command to build the package (run from package root directory) :
```bash
cd AwesomeML
conda build devtools/conda-recipe
```

# Step 7: Create a distribution channel
* Go to Anaconda.org and sign up for an account.
* An account on anaconda.org can serve as a distribution channel. 
* This would be the home of your packages.
* Once uploaded, users would be able to install your packages by:
```bash
conda install -c your_channel your_package
```
* There are community-led channels such as `conda-forge` that provide more standardized ways of distributing packages.

# Step 8: Upload package to your channel
* The build process produces a lot of output. Among this deluge of information, there is a very useful piece:


```bash
# Automatic uploading is disabled
# If you want to upload package(s) to anaconda.org later, type:
anaconda upload /home/kamran/miniconda3/conda-bld/linux-64/awesome_ml-0.0.1-py36_0.tar.bz2
```

Once the package has been built successfully, log in to your anaconda.org account from the terminal:

```bash
anaconda login
```
* Enter your username and password. Once logged in, use the `anaconda upload` command above to upload.
* Go to your channel on anaconda.org and take a look. You now have a package that others can install using a single command.

# What more can we do?
* In our current pipeline, each time we make a change to our package, we have to redo the build process manually.
* This is where continuous integration (CI) comes in. Continuous integration is a one-stop shop for building, testing and deploying packages, patches and new releases.

* The idea is simple:
    - You and your team maintain a GitHub repo of your package; one or more people are administrators.
    - The team does development in dev branches and you reserve master for releases.
    - Once you are ready for a release, you merge all changes into master.
    - As soon as you make a commit to master, CI takes charge.
    - On a cloud machine, it builds your package for every combination of specs (multiple Python versions, multiple OSes) and runs the tests. Only if the tests pass does it deploy your package to Anaconda.
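
The "build everything, deploy only if all tests pass" logic can be sketched in a few lines of Python. This is only an illustration of the control flow; real CI services take the matrix from a configuration file rather than from code like this. The version and OS lists, and the `run_pipeline`, `build_and_test` and `deploy` names, are hypothetical placeholders.

```python
from itertools import product

# Hypothetical build matrix; the versions and OS names are placeholders
PYTHON_VERSIONS = ["3.5", "3.6"]
OPERATING_SYSTEMS = ["linux", "osx"]


def run_pipeline(build_and_test, deploy):
    """Build and test every (python, os) combination; deploy only if all pass.

    `build_and_test(python, os_name)` must return True on success.
    `deploy()` is called at most once. Both are supplied by the caller.
    """
    results = {
        (py, os_name): build_and_test(py, os_name)
        for py, os_name in product(PYTHON_VERSIONS, OPERATING_SYSTEMS)
    }
    # A single failing combination blocks the release
    if all(results.values()):
        deploy()
    return results
```

The returned dictionary shows which combination failed, which is usually the first thing you look for in a CI log.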

# What more can we do? (contd.)
* What if not all of your users are python programmers?
* What if importing `awesome_ml` is not the only intended usage of your code?
* That's where command-line utilities come in!
* We can create python scripts that act like black-boxes and let your users run those tools directly from the command-line.
* These scripts go into `scripts` directory and we can use a mechanism called `entry_points` to ship command-line tools along with our package.
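
As a rough sketch of the `entry_points` mechanism, the function below could serve as a command-line tool, with the matching `setup.py` declaration shown in the trailing comment. The command name `awesome-ml-majority`, the module path `awesome_ml.cli` and the tool's behaviour are assumptions made up for illustration; only the `console_scripts` entry point group is standard setuptools.

```python
import argparse
from collections import Counter


def main(argv=None):
    """Print the most frequent label among those given on the command line."""
    parser = argparse.ArgumentParser(description="Report the majority class.")
    parser.add_argument("labels", nargs="+", help="class labels, e.g. 0 2 2 1")
    # With argv=None, argparse reads the arguments from sys.argv
    args = parser.parse_args(argv)
    # most_common(1) returns [(label, count)] for the most frequent label
    label, _ = Counter(args.labels).most_common(1)[0]
    print(label)
    return 0


# In setup.py, a console script pointing at this function could be declared as
# (command name and module path are hypothetical):
#
#     entry_points={
#         "console_scripts": ["awesome-ml-majority = awesome_ml.cli:main"],
#     },
```

Under these assumptions, after installation a user could run `awesome-ml-majority 0 2 2 1` in a terminal and get `2` back without writing any Python.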

# Additional Resources

http://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/index.html

https://conda.io/docs/user-guide/tutorials/build-pkgs.html

https://packaging.python.org/

https://pythonhosted.org/an_example_pypi_project/setuptools.html