# Lecture 4b - Packaging

## Background

### What

A software package is a collection of software-related components, such as source code, data and documentation, stored on a file system. The collection is organised in a specific way so that the type of each component, and how the components relate to each other, can be understood.

A package also has metadata which describes the package in terms of its functionality and curation. This metadata is often human-readable, but it is usually also designed to be processed by other software. This enables tools to be written for managing common, and often complicated, workflows.

When a packaging structure and its metadata are combined with an associated tool set, the result is usually referred to as a software packaging system.

### Why

Grouping logically connected components in a manner that reflects how they are related to each other facilitates their distribution, installation, **re-use** and maintenance. It also makes it possible to use standardised tools to manage a wide variety of (often complicated) workflows, for example:

- **re-running** software with multiple dependencies across a range of heterogeneous environments
- **re-using** software components in other software
- **running** tests
- **reproducing** studies
- debugging

### How

Software packages are collections of files. As such, they can be created and managed using standard software (terminals, shell scripts, editors, file managers, etc.). However, popular package formats are often supported by extensions to tools such as version control systems and integrated development environments (e.g. RStudio, VS Code).

### When

Best practice suggests that some form of packaging should be adopted at the outset of any software-related project when:

- there are multiple users / stakeholders
- version control is required
- unit tests are used
- software is to be distributed
- software is to be **re-used** 

## The Basics - Python

The following diagram shows the organisation of the files in a small example Python package.

<figure>
<img src="./choH-python-tree.png" style="width:30%">
<figcaption align = "center"> A simple python package</figcaption>
</figure>

As an exercise, we are going to create and use this package from scratch. The function **choH** is defined in a notebook which also shows how it is used. The notebook is provided in the Appendix under the heading **Appendix 1 - choH (python version)**.

### Source code

The functions that the package provides need to be available in files (not a notebook), so the first task is to create the **choH.py** file with the **choH** function in it. Note that this file should import any other modules that the code depends on.
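As a rough guide, a source file intended for a package starts with the imports it needs, followed by the function definitions. The sketch below uses a hypothetical **cumulative_mean** function as a stand-in for **choH** (whose real definition is in Appendix 1), and a standard-library dependency so that it runs anywhere.

```python
# Hypothetical layout of a package source file such as choH.py.
# cumulative_mean is a stand-in only; it is NOT the choH algorithm.
from itertools import accumulate  # dependencies are imported at the top


def cumulative_mean(x):
    """Return the running mean of the values in x as a list of floats."""
    totals = accumulate(float(v) for v in x)
    return [total / (i + 1) for i, total in enumerate(totals)]


if __name__ == "__main__":
    # Only runs when the file is executed directly, not when it is imported.
    print(cumulative_mean([1, 2, 3]))  # [1.0, 1.5, 2.0]
```

The `if __name__ == "__main__":` guard lets the file be tried out on its own without the demonstration code running every time the package is imported.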

#### Exercise

Create a Python script containing the **choH** function.

### Modules

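A Python package has one or more **modules**. Each module is associated with a directory whose name defines the name of the module. (Strictly, Python calls a directory like this a **package** and reserves the term **module** for a single **.py** file.) Primarily, modules contain code, but they can also contain data, documentation and other modules (usually called sub-packages).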

#### **\_\_init\_\_.py**

Each module should have an **\_\_init\_\_.py** file, even if it is empty (many Python tools rely on the presence of this file to determine that the files in the associated directory constitute a module). The **\_\_init\_\_.py** file can contain lots of information, including (but not limited to) documentation, references and module-level global names. Importantly, it is used to load the Python files in this (and/or other) modules, using the same **import** mechanism that is used for importing modules into a Python script.
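The mechanism can be seen end to end with a throwaway package. The sketch below writes a hypothetical package **demo_pkg** to a temporary directory, with an **\_\_init\_\_.py** that uses a relative import to bring a function from **helpers.py** into the package namespace, and then imports the package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk:
#   <tmp>/demo_pkg/__init__.py
#   <tmp>/demo_pkg/helpers.py
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "helpers.py").write_text("def double(x):\n    return 2 * x\n")

# __init__.py pulls the function into the package namespace with a relative
# import (by analogy, choH/__init__.py could contain: from .choH import choH).
(pkg_dir / "__init__.py").write_text("from .helpers import double\n")

sys.path.insert(0, str(root))  # make the temporary directory importable
importlib.invalidate_caches()   # make sure the new files are seen by the importer
import demo_pkg

print(demo_pkg.double(21))          # 42, via the name exposed by __init__.py
print(demo_pkg.helpers.double(21))  # 42, the submodule is also reachable
```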

#### Exercise

- Create a directory (folder) for the **choH** module.
- Add the **choH.py** file to the module.
- Create an **\_\_init\_\_.py** for this module that imports the **choH.py** file.

### Project file

There are a number of ways of specifying the metadata for a Python package. One of these is a **pyproject.toml** file. See here for more information on [toml files](https://toml.io/en/), and here for specific information about the [pyproject.toml file](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/).

Here is a basic template for the **pyproject.toml** file.

```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "<project-name>"
version = "<version>"
maintainers = [
  { name="Daniel Grose", email="dan.grose@lancaster.ac.uk" },
]
authors = [
  { name="Daniel Grose", email="dan.grose@lancaster.ac.uk"},
]
description = "<description>"
readme = "README.md"
requires-python = ">=<minimum-python-version>"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
    "Operating System :: OS Independent",
]
dependencies = [
    "<dependency-1>",
    "<dependency-2>"
]

[project.urls]
homepage = "<website URL>"
```

There are lots of good tools to help you generate and maintain a **pyproject.toml** file. Details for some of these can be found [here](https://realpython.com/python-pyproject-toml/#using-tools-that-leverage-the-pyprojecttoml-file). 

#### Exercise

Create a **pyproject.toml** file for the **choH** package.

### Installation

There are multiple tools to help you install and use Python packages, for example **pip** and **conda**.

#### Install a local package using pip

Assuming that the file structure shown in the figure above is located in **path_to_package**, the following command will install the package:

```bash
python -m pip install <path_to_package>
```
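Before running **pip**, it can be worth checking that **path_to_package** has the expected layout. A minimal sketch, assuming the flat layout described above (a **pyproject.toml** at the top level next to a module directory containing **\_\_init\_\_.py**); **check_python_package** is a hypothetical helper, not part of pip.

```python
import tempfile
from pathlib import Path


def check_python_package(path_to_package, package_name):
    """Return a list of missing files for a simple flat-layout package (empty if none)."""
    root = Path(path_to_package)
    required = [
        root / "pyproject.toml",              # package metadata
        root / package_name / "__init__.py",  # marks the module directory
    ]
    return [f"missing: {p}" for p in required if not p.exists()]


# Try it on a throwaway directory.
demo = Path(tempfile.mkdtemp())
(demo / "choH").mkdir()
(demo / "choH" / "__init__.py").write_text("")
print(check_python_package(demo, "choH"))  # reports the missing pyproject.toml
(demo / "pyproject.toml").write_text("")
print(check_python_package(demo, "choH"))  # []
```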

#### Exercise

Determine the **path_to_package** for your **choH** package and install it using **pip**.

#### Install a package from GitHub using pip

**pip** can install packages from a wide range of sources, one of which is GitHub. Imagine you have set up the following repository for your **choH** package.

<figure>
<img src="./choH-python-github.png" style="width:60%">
<figcaption align = "center"> A simple python package on github</figcaption>
</figure>

The package can be installed from this GitHub repository using:

```bash
python -m pip install 'git+https://github.com/grosed/choH'
```
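pip's VCS support also accepts a branch, tag or commit after an **@** at the end of the URL, which pins exactly what gets installed and helps when reproducing results. A small sketch that builds such requirement strings; **github_requirement** is a hypothetical helper and **v0.1.0** a hypothetical tag.

```python
def github_requirement(owner, repo, ref=None):
    """Build a pip requirement string for installing from a GitHub repository.

    ref may be a branch, tag or commit hash; pip's VCS URL syntax appends it
    after an '@'. Leaving it out installs from the default branch.
    """
    url = f"git+https://github.com/{owner}/{repo}"
    return f"{url}@{ref}" if ref else url


print(github_requirement("grosed", "choH"))            # git+https://github.com/grosed/choH
print(github_requirement("grosed", "choH", "v0.1.0"))  # git+https://github.com/grosed/choH@v0.1.0
```

Either string can be passed to `python -m pip install` in place of the URL above.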

#### Removing a package using pip

Once a package has been installed, it can be removed using pip:

```bash
python -m pip uninstall <package-name>
```
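To check from Python whether a package is currently installed (for example, before and after running **pip uninstall**), the standard library's **importlib.metadata** can be queried. A small sketch; **installed_version** is a hypothetical helper name, and it looks up the distribution name given in **pyproject.toml**, which in general may differ from the name used in **import** statements.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


print(installed_version("choH"))  # None when choH is not installed
```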

### Using the package

Once the package has been installed with **pip**, it can be used in a Python script by importing it.

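```python
import choH
import numpy
import matplotlib.pyplot as plt

# Simulate 2000 observations whose mean shifts from 0 to 0.3 halfway through.
numpy.random.seed(0)
X = [float(x) for x in list(numpy.random.normal(0, 1, 1000)) + list(numpy.random.normal(0.3, 1, 1000))]

plt.plot(X)             # the raw data
plt.plot(choH.choH(X))  # the output of the choH function from the choH package
plt.show()              # needed to display the figure when run as a script
```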

#### Exercise

- Try running the above script on your own system.
- Why does the above example use **choH.choH**?

#### Exercise

- Add a README.md file to your **choH** repository.
- Can you add the above example to your repository?

## The Basics - R

The following diagram shows the organisation of the files in a small example R package.

<figure>
<img src="./choH-R-tree.png" style="width:30%">
<figcaption align = "center"> A simple R package</figcaption>
</figure>

As an exercise, we are going to create and use this package from scratch. The function **choH** is defined in a notebook which also shows how it is used. The notebook is provided in the Appendix under the heading **Appendix 2 - choH (R version)**.

### Source code

The functions that the package provides need to be available in files (not a notebook), so the first task is to create the **choH.R** file with the **choH** function in it. Note that the code does not use the **library** function to load any of its dependencies; instead, dependencies are declared in the **NAMESPACE** and **DESCRIPTION** files.

#### Exercise

Create an R script containing the **choH** function.

#### Exercise

The source code for the package resides in a directory named **R**. Create this directory and add your **choH.R** file to it.

### NAMESPACE

The **NAMESPACE** file imports any dependencies that your **R** code requires and exports the **R** functions you want to expose from your package. Here is a template for a basic **NAMESPACE** file.

```R
import(<package-name>)
importFrom(<package-name>, <function-name>)

export(<function-name>)
export(<object-name>)
```

The file can **import** whole packages using **import**, or selected functions from another package using **importFrom**. There can be multiple **import** directives.

The file can **export** functions and other R objects defined in your package code. There can be multiple **export** directives.
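The directives are written like R function calls, one per line in a simple file, which makes them easy for tools to read. The rough Python sketch below extracts them from a hypothetical **NAMESPACE**; it handles only single-line **import**, **importFrom** and **export** directives without trailing comments (R itself parses **NAMESPACE** as R code), so it is an illustration rather than a substitute for R's own reader.

```python
import re

# Matches simple one-line directives such as  export(f)  or  importFrom(pkg, f)
DIRECTIVE = re.compile(r"^\s*(import|importFrom|export)\s*\((.*)\)\s*$")


def namespace_directives(text):
    """Return (directive, [arguments]) pairs from a simple NAMESPACE text."""
    result = []
    for line in text.splitlines():
        match = DIRECTIVE.match(line)
        if match:
            args = [a.strip() for a in match.group(2).split(",") if a.strip()]
            result.append((match.group(1), args))
    return result


# Hypothetical NAMESPACE contents, for illustration only.
example = """
import(stats)
importFrom(utils, head)
export(my_function)
"""
print(namespace_directives(example))
# [('import', ['stats']), ('importFrom', ['utils', 'head']), ('export', ['my_function'])]
```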


#### Exercise

Create a **NAMESPACE** file for your **choH** package.

### DESCRIPTION

The **DESCRIPTION** file describes the package, including details about its authors, its purpose and its dependencies. It is human-readable, but it can also be processed by the tools available in most common R environments.

Here is a template for the **DESCRIPTION** file. 

```R
Package: <package-name>
Type: Package
Title: <top-level-description-of-package>
Version: <version-number>
Date: <yyyy-mm-dd>
Authors@R: c(person("Daniel","Grose",email="dan.grose@lancaster.ac.uk",role=c("aut","cre")))
Description: <more-detailed-description> 
License: GPL
Imports: <imported-packages>
LinkingTo:
Depends: R (>= 3.5.0)
NeedsCompilation: no
Suggests:
```
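The **DESCRIPTION** file uses the Debian Control File (DCF) format: each field starts with **Field:** and continuation lines are indented. To illustrate that it is machine-readable, here is a simplified Python parser for a single-record file. It is a sketch, not R's own reader (**read.dcf**), and the field values in the example are hypothetical.

```python
def parse_dcf(text):
    """Parse a single-record DCF text (the format of R's DESCRIPTION file) into a dict.

    A line of the form 'Field: value' starts a new field; a line that begins
    with whitespace continues the value of the previous field.
    Simplified sketch: blank lines are skipped rather than treated as record breaks.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and current is not None:
            fields[current] += " " + line.strip()
        else:
            key, _, value = line.partition(":")  # split on the first colon only
            current = key.strip()
            fields[current] = value.strip()
    return fields


# Hypothetical DESCRIPTION contents, for illustration only.
DESCRIPTION = """\
Package: mypackage
Title: An Example Package
Version: 0.1.0
Description: A longer description of the package
    that continues on an indented line.
Imports: stats
Depends: R (>= 3.5.0)
"""

meta = parse_dcf(DESCRIPTION)
print(meta["Package"], meta["Version"])  # mypackage 0.1.0
print(meta["Description"])
```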

#### Exercise

Create a **DESCRIPTION** file for your **choH** package.

### Installing

There are multiple tools to help you install and use R packages. The most fundamental tool is **R** itself.

#### Install a local package using R

Assuming that the file structure shown in the figure above is located in **path_to_package**, the following command will install the package:

```bash
R CMD INSTALL <path_to_package>
```

#### Exercise

Determine the **path_to_package** for your **choH** package and install it using **R**.

#### Install a package from GitHub using R

Several R packages extend base R so that you can install packages directly from a variety of sources, including GitHub.

Imagine you have set up the following repository using your **choH** package.

<figure>
<img src="./choH-R-github.png" style="width:60%">
<figcaption align = "center"> A simple R package on github</figcaption>
</figure>

First, you might need to install **devtools**:

```R
install.packages( "devtools" )
```

#### Exercise

Check that **devtools** is installed in your current R environment. If not, install it.

The package can now be installed from GitHub using the **R** REPL:

```R
library(devtools)
install_github( "grosed/choH" )
```

### Using the package

Once the package is installed, it can be loaded and used in other R scripts with the **library** function. All of the functions you **exported** in the **NAMESPACE** file will become available for use.

```R
library(choH)
library(purrr)

# Simulate 2000 observations whose mean shifts from 0 to 0.3 halfway through
set.seed(0)
X <- c(rnorm(1000, 0, 1), rnorm(1000, 0.3, 1))
X %>% choH %>% as.numeric %>% plot
```

#### Exercise

- Install your **choH** package from github and test the above example.
- The **choH** package imports **purrr**. Why do you think it is also necessary to load **purrr** in the local R script?