# Lecture 10: Project 3 - Creating a template package
## November 19, 2019
## Tristan Glatard

### Objectives for today
* Create a Python package for our clustering program
* Publish it on PyPI

### First package: a simple script

A Python package is a piece of software that can be installed using `pip`. A minimal Python package has to contain:
* The software to be packaged
* A `setup.py` file with metadata

An example is available at https://github.com/tgteacher/package-template
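As a rough illustration of the two ingredients listed above, the sketch below writes a minimal package into a temporary directory. The script name `hello.py` and its contents are assumptions made for this example, and the package name `MyPackage` is borrowed from the template; the actual files in the template repository may differ.

```python
import os
import tempfile

# Contents of a minimal setup.py (hypothetical values, for illustration only)
SETUP_PY = '''from setuptools import setup

setup(
    name="MyPackage",
    version="0.1",
    scripts=["hello.py"],
)
'''

# The software to be packaged: here, a one-line script (hypothetical)
SCRIPT = '''#!/usr/bin/env python
print("Hello from MyPackage!")
'''


def write_minimal_package(root):
    """Create the two files a minimal package needs under `root`."""
    paths = {}
    for filename, content in (("setup.py", SETUP_PY), ("hello.py", SCRIPT)):
        path = os.path.join(root, filename)
        with open(path, "w") as f:
            f.write(content)
        paths[filename] = path
    return paths


package_dir = tempfile.mkdtemp()
created = write_minimal_package(package_dir)
print(sorted(os.listdir(package_dir)))  # ['hello.py', 'setup.py']
```

Running `pip install .` inside such a directory should then install the script, in the same way as for the template.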



<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

* Download the example above
* Install the package by running `pip install .` in the directory containing `setup.py`

Packages can also be installed directly from GitHub (pay attention to the `git+` prefix):

In [16]:
pip install git+https://github.com/tgteacher/package-template

Collecting git+https://github.com/tgteacher/package-template
  Cloning https://github.com/tgteacher/package-template to /tmp/pip-req-build-om8bhn3a
Building wheels for collected packages: MyPackage
  Building wheel for MyPackage (setup.py) ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-fn3i1fji/wheels/d2/b2/fd/3ea12fd403f7c575ae942633ab24a8128b0ce7c60946fdc037
Successfully built MyPackage
You are using pip version 19.0.3, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

* Starting from this template, write a `setup.py` file to install the clustering program that you developed in project 2
* Make sure that your package can be installed using `pip`
* Push your package to GitHub and make sure it can be installed from there

### Adding software and data dependencies

Before publishing your project to PyPI, you should add some information to the metadata. Your `setup.py` could look like this:
```
from setuptools import setup

setup(
    name="HelloWorldTG",
    version="0.1",
    scripts=['cluster.py'],
    
    # in case your program needs data files
    data_files = [('.', ['data-1.csv'])],
    
    # Project uses matplotlib and pandas, so ensure that they get
    # installed on the target machine
    install_requires=['matplotlib', 'pandas'],

    # metadata to display on PyPI
    author="Me",
    author_email="me@example.com",
    description="This is an Example Package",
    keywords="hello world example examples",
    url="http://example.com/HelloWorld/",   # project home page, if any
    project_urls={
        "Bug Tracker": "https://bugs.example.com/HelloWorld/",
        "Documentation": "https://docs.example.com/HelloWorld/",
        "Source Code": "https://code.example.com/HelloWorld/",
    },
    classifiers=[
        'License :: OSI Approved :: Python Software Foundation License'
    ]

    # could also include long_description, download_url, etc.
)
```
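Once data files are installed, the program has to find them again. The sketch below assumes a standard `pip` installation, where relative `data_files` directories are resolved against `sys.prefix` (for `--user` installations the base is `site.USER_BASE` instead). The helper `installed_data_path` is a hypothetical name introduced for this example.

```python
import os
import sys


def installed_data_path(filename, subdir="."):
    """Guess where a file declared as data_files=[(subdir, [filename])] lands.

    Assumption: a standard (non --user) installation, where relative
    directories are interpreted relative to sys.prefix.
    """
    return os.path.normpath(os.path.join(sys.prefix, subdir, filename))


path = installed_data_path("data-1.csv")
# Prints the expected location and whether the file is actually there
print(path, os.path.exists(path))
```

If the second value printed is `False` after installation, the file was probably installed under a different prefix, and the path logic should be adapted accordingly.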

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

* Add metadata, software dependencies (matplotlib and pandas) and data to your package
* Make sure that you can still install your package locally (using `pip install .`) and from GitHub

### Pushing to PyPI

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

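* Create an account on PyPI:
https://pypi.org/account/register
* Install twine, Python's package uploader: `pip install twine`
* Create a distribution of your package: `python setup.py bdist_wheel` (if the `bdist_wheel` command is not recognized, run `pip install wheel` first)
* Upload your project to PyPI: `twine upload dist/*` (the `name` in your `setup.py` must not already be taken on PyPI)
* Look at your project on PyPI!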
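Before uploading, it can be worth checking what the distribution contains. A wheel is a zip archive, so the standard `zipfile` module can list its files. The sketch below builds a hand-made stand-in archive so that it runs without a real build; the file names inside it are assumptions, and a wheel produced by `bdist_wheel` will contain more entries.

```python
import os
import tempfile
import zipfile


def list_wheel_contents(wheel_path):
    """Return the sorted list of file names stored in a wheel (zip) archive."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(wheel.namelist())


# Stand-in archive for demonstration only (not a real, installable wheel)
demo_dir = tempfile.mkdtemp()
demo_wheel = os.path.join(demo_dir, "HelloWorldTG-0.1-py3-none-any.whl")
with zipfile.ZipFile(demo_wheel, "w") as archive:
    archive.writestr("HelloWorldTG-0.1.data/scripts/cluster.py",
                     "print('clustering')\n")
    archive.writestr("HelloWorldTG-0.1.dist-info/METADATA",
                     "Name: HelloWorldTG\nVersion: 0.1\n")

for name in list_wheel_contents(demo_wheel):
    print(name)
```

On a real project, the same function could be pointed at the `.whl` file found in `dist/` to confirm that the script and data files were included.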


### Creating a module

Our current package only contains a script that can be executed from the command line. Sometimes, it could be useful to run the clustering algorithm from another Python program. To do that, we will have to wrap our program in a function and declare our package as a module.

You may remember that a function is defined as follows:

In [11]:
def my_function(a):
    print("Hello {}!".format(a))

It can then be called as follows:

In [17]:
my_function("Concordia")

Hello Concordia!


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

* Move your clustering program into a function and test it
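One possible shape for this refactoring is sketched below. The body of `cluster` is a hypothetical stand-in that only counts the rows of a CSV file, since the actual clustering code comes from your project 2; `main` and its usage message are also assumptions. The `if __name__ == "__main__":` guard keeps the file usable as a command-line script while making the function importable.

```python
import csv
import os
import sys


def cluster(data_file):
    """Hypothetical stand-in for the project 2 clustering code."""
    with open(data_file, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    # The real clustering logic would replace this line; the sketch
    # only reports how many non-empty rows were read.
    return len(rows)


def main(argv=None):
    """Command-line entry point: cluster.py DATA_FILE"""
    args = sys.argv[1:] if argv is None else argv
    if len(args) != 1 or not os.path.isfile(args[0]):
        print("usage: cluster.py DATA_FILE")
        return 1
    print("Read {} rows".format(cluster(args[0])))
    return 0


# Runs only when the file is executed as a script, not when it is imported
if __name__ == "__main__":
    main()
```

With this layout, `python cluster.py data-1.csv` still works from the command line, and another program can call `cluster("data-1.csv")` directly.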

To create a module, we have to:
1. Create a sub-directory in our package base directory, named after the package name
2. Create an empty file called `__init__.py` in that directory
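The two steps above can be tried out without touching your project. The sketch below creates the layout in a temporary directory and imports it; the one-line `cluster` function is a placeholder, not the real algorithm. In an actual package, `setup.py` would most likely also need `packages=["HelloWorldTG"]` so that `pip` installs the directory.

```python
import importlib
import os
import sys
import tempfile

# A temporary directory stands in for the package base directory
base_dir = tempfile.mkdtemp()

# Step 1: sub-directory named after the package
module_dir = os.path.join(base_dir, "HelloWorldTG")
os.mkdir(module_dir)

# Step 2: empty __init__.py, which marks the directory as importable
open(os.path.join(module_dir, "__init__.py"), "w").close()

# The code itself lives next to __init__.py (placeholder function)
with open(os.path.join(module_dir, "cluster.py"), "w") as f:
    f.write("def cluster():\n    return 'clustering done'\n")

# Make the base directory visible to the import system, then import
sys.path.insert(0, base_dir)
importlib.invalidate_caches()
cluster_module = importlib.import_module("HelloWorldTG.cluster")
print(cluster_module.cluster())  # clustering done
```

After a real installation with `pip`, the `sys.path` manipulation is unnecessary: the module is found like any other installed package.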


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

* Starting from the example at https://github.com/tgteacher/project-package, create a module for your project.
* Make sure you can import this module and run your code from a Python program, for instance:
    
    

In [None]:
from HelloWorldTG import cluster
cluster.cluster()

### Bonus

The [Shablona](https://github.com/uwescience/shablona) project is a template package for data science projects. In addition to the items we added to our first module, it also contains:

1. Tests
2. Documentation
3. Continuous Integration
4. Licensing

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

1. Inspect the Shablona template and review its main features
2. Create a more complete package for your project using Shablona
