# Code testing and Continuous Integration

We are going to automate testing of our code as part of an example continuous integration development workflow. We'll start by installing pytest, then write or modify some code to test, and finish by setting up a GitHub Actions workflow that runs automatically when we push changes to our repo.

## Part 0 Installing pytest

To install pytest and pytest-cov (the coverage plugin):

In [1]:
conda install pytest pytest-cov

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/anaconda3/envs/DSFP

  added / updated specs:
    - pytest
    - pytest-cov


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2022.6.15          |   py38hecd8cb5_0         154 KB
    coverage-6.3.2             |   py38hca72f7f_0         244 KB
    iniconfig-1.1.1            |     pyhd3eb1b0_0           8 KB
    openssl-1.1.1q             |       hca72f7f_0         2.2 MB
    pluggy-1.0.0               |   py38hecd8cb5_1          29 KB
    py-1.11.0                  |     pyhd3eb1b0_0          76 KB
    pytest-7.1.2               |   py38hecd8cb5_0         444 KB
    pytest-cov-3.0.0           |     pyhd3eb1b0_0          22 KB
    toml-0.10.2                |     pyhd3eb1b0_0          20 KB
    tomli-2.0.1                |   py38hecd8cb5_0          

## Part 1 Returning to the SDSS Clustering Example

### 1a) Computing statistics of cluster center separation

Report the minimum, maximum, and average separation between the centers of the clusters you identified in the introduction to software repositories example. scikit-learn's `DBSCAN` stores the indices of its core samples in the `core_sample_indices_` attribute. It does not store cluster centers directly, so take each center to be the mean position of that cluster's members.

You will want this to be done in a modular fashion. First compute the separation distance of the cluster centers. Then write separate functions that return the minimum, maximum, and average.

In [15]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist

from sklearn.cluster import DBSCAN

X = np.load('/Users/kylerocha/Downloads/SDSS_Great_Wall_data.npy')


def distance(X):
    """Return the matrix of pairwise distances between DBSCAN cluster centers."""
    clustering = DBSCAN(eps=7, min_samples=10)

    # Cluster label for every point; DBSCAN labels noise points -1.
    preds = clustering.fit_predict(X)

    # Labels of the core samples only. Core samples always belong to a
    # cluster, so this excludes the noise label.
    predicted_cluster_ind = preds[clustering.core_sample_indices_]

    # Take each cluster center to be the mean position of all its members.
    centers = []
    for cls in np.unique(predicted_cluster_ind):
        locs = np.where(preds == cls)[0]
        cluster_center = np.mean(X[locs], axis=0)
        centers.append(cluster_center)

    centers = np.array(centers)

    # Entry [i, j] is the Euclidean distance between centers i and j.
    distances_between_centers = cdist(centers, centers)
    return distances_between_centers

out = distance(X)

out[4,9]

229.61700842607655

In [14]:
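def average(distances):
    """Return the (minimum, maximum, mean) separation between distinct centers."""
    # The diagonal holds each center's zero distance to itself, so mask it out.
    off_diagonal = distances[distances > 0]
    max_dist = np.max(distances)
    min_dist = np.min(off_diagonal)
    mean_dist = np.mean(off_diagonal)

    return min_dist, max_dist, mean_dist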


average( distance(X) )

(12.419506484558392, 490.0921021378237, 194.32842541526801)
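
The exercise asks for separate functions for the minimum, maximum, and average. A minimal sketch of that split is below. The function names and the toy centers are hypothetical, not part of the original notebook. It reads each pair once from the upper triangle of the matrix instead of masking zeros; because the matrix is symmetric, the mean is the same as the mean over all off-diagonal entries.

```python
import numpy as np


def unique_separations(distances):
    """Return each off-diagonal pair distance once (upper triangle only)."""
    distances = np.asarray(distances, dtype=float)
    rows, cols = np.triu_indices(distances.shape[0], k=1)
    return distances[rows, cols]


def min_separation(distances):
    """Smallest distance between two distinct centers."""
    return unique_separations(distances).min()


def max_separation(distances):
    """Largest distance between two distinct centers."""
    return unique_separations(distances).max()


def mean_separation(distances):
    """Mean distance over all distinct pairs of centers."""
    return unique_separations(distances).mean()


# Hypothetical toy centers on a line, spaced 5, 10 and 5 units apart.
centers = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
toy_distances = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)

print(min_separation(toy_distances),
      max_separation(toy_distances),
      mean_separation(toy_distances))
```

Small single-purpose functions like these can each get their own test, which makes a failing test point at one specific statistic.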

### 1b) Writing a unit test for cluster center separation

A good unit test is:

* Fast
* Standalone
* Repeatable (deterministic)
* Timely (writing the test shouldn't take longer than writing the code)

For each function you wrote in 1a), write a test function.

In [21]:
def test_distance():
    X = np.load('/Users/kylerocha/Downloads/SDSS_Great_Wall_data.npy')

    dis = distance(X)

    # Compare floats with a tolerance; exact equality can fail across platforms.
    assert np.isclose(dis[4,9], 229.61700842607655)

test_distance()

In [22]:
def test_average():
    X = np.load('/Users/kylerocha/Downloads/SDSS_Great_Wall_data.npy')
    avg = average( distance(X) )
    # avg is (min, max, mean); check the mean with a floating-point tolerance.
    assert np.isclose(avg[-1], 194.32842541526801)

test_average()
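
The tests above read a data file from an absolute path that exists on only one machine, and they compare against numbers produced by the code itself. A more standalone alternative, sketched below, builds a tiny synthetic input whose answer is known by construction. `center_distances` is a hypothetical helper, not a function from the notebook; it takes the cluster labels as an argument so the test does not depend on a clustering run.

```python
import numpy as np


def center_distances(points, labels):
    """Hypothetical helper: pairwise distances between per-label mean positions."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    centers = np.array([points[labels == lab].mean(axis=0)
                        for lab in np.unique(labels)])
    return np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)


def test_center_distances():
    # Two small synthetic clusters whose means are exactly (0, 0) and (3, 4),
    # so the distance between the centers must be 5.
    points = [[-1, 0], [1, 0], [0, -1], [0, 1],
              [2, 4], [4, 4], [3, 3], [3, 5]]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]

    dist = center_distances(points, labels)

    assert dist.shape == (2, 2)
    # Compare floats with a tolerance rather than exact equality.
    np.testing.assert_allclose(dist, [[0.0, 5.0], [5.0, 0.0]])


test_center_distances()
```

Because the input lives inside the test, it runs in milliseconds and behaves the same on a laptop and on a CI runner.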

## Part 2 Running unit tests in pytest

### 2a) Structuring the test file

Unfortunately, pytest only collects Python scripts by default, so we need to convert our Jupyter notebooks to Python scripts before running CI tests with GitHub Actions. There are tools to automate this for us, but for now, let's do this by hand.

pytest expects your code to be organized according to the following convention: you should have a `file_name.py` and a `test_file_name.py`. Create each. In `file_name.py`, copy the functions for computing cluster distances and statistics. Then, in `test_file_name.py`, copy your unit tests. Be sure to import the functions from `file_name.py` into the test script.
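
As a rough illustration of that layout, the sketch below writes a minimal `file_name.py` and `test_file_name.py` into a temporary directory. The function `mean_separation` and both file bodies are hypothetical placeholders for your own code. pytest would normally discover and run the test itself; here the test module is imported and called by hand only so the sketch runs without invoking pytest.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of file_name.py: the code under test.
MODULE_SOURCE = '''\
def mean_separation(values):
    """Return the arithmetic mean of a non-empty sequence of separations."""
    return sum(values) / len(values)
'''

# Hypothetical contents of test_file_name.py: it imports from file_name.py.
TEST_SOURCE = '''\
from file_name import mean_separation


def test_mean_separation():
    assert mean_separation([2.0, 4.0]) == 3.0
'''

with tempfile.TemporaryDirectory() as project_dir:
    project = Path(project_dir)
    (project / "file_name.py").write_text(MODULE_SOURCE)
    (project / "test_file_name.py").write_text(TEST_SOURCE)
    created = sorted(path.name for path in project.glob("*.py"))

    # Make the directory importable, then run the test function by hand.
    sys.path.insert(0, project_dir)
    try:
        test_module = importlib.import_module("test_file_name")
        test_module.test_mean_separation()
    finally:
        sys.path.remove(project_dir)

print(created)
```

The `test_` prefix on both the file and the function is what lets pytest find them without any configuration.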

### 2b) Running the unit test and checking coverage

To run the unit tests, type `pytest` at the command line inside your conda virtual environment.

To check the coverage (how much of your code your tests actually execute), type `pytest --cov`.

If your tests do not achieve full coverage of your code, modify your tests accordingly.
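
Incomplete coverage usually means that some branch of the code is never executed by any test. The sketch below uses a hypothetical function with an error branch and adds a second test that triggers it. The names are illustrative only, and the error check is written with plain `try`/`except` so the block runs without importing pytest.

```python
def mean_separation(separations):
    """Return the mean of a sequence of separations (hypothetical example)."""
    if len(separations) == 0:
        # This branch only counts as covered if some test triggers it.
        raise ValueError("need at least one separation")
    return sum(separations) / len(separations)


def test_mean_separation_typical():
    # Covers the normal return path.
    assert mean_separation([2.0, 4.0, 6.0]) == 4.0


def test_mean_separation_empty():
    # Covers the error branch. With pytest available,
    # `with pytest.raises(ValueError):` is the more idiomatic form.
    try:
        mean_separation([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")


test_mean_separation_typical()
test_mean_separation_empty()
```

To see which lines are still uncovered, pytest-cov can report them per file, for example with `pytest --cov=file_name --cov-report=term-missing`.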

If your tests achieve complete coverage and your code passes your tests, move to part 3 below. 

### 2c) Bug fixes

If your code fails any of your tests, fix your code now and repeat until your code passes your tests. 

## Part 3 Automating Unit Tests with GitHub Actions

### Part 3a) Initial GitHub Actions Workflow template

You should find a partially complete GitHub Actions workflow template as a `.yml` file. GitHub helpfully provides many template workflows for different languages and use cases, so most of the time you'll just need to fill in the details of an existing workflow.


### Part 3b) Choosing when the tests run

Fill in when you want the tests to run, so that the workflow runs on a `push` or `pull_request` to your working branches and main.

### Part 3c) Installing dependencies on the runner

Make sure that dependencies are properly installed on the virtual machine ("runner") that will execute your tests. Up to this point, you should have very minimal dependencies, but for yesterday's SDSS clustering project, you may have more complicated ones. If you have a requirements.txt file in your GitHub repository, you can install dependencies with `pip install -r requirements.txt`. There are many ways to produce a requirements file, but I might start by trying `pip freeze > requirements.txt` within your virtual environment.
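
For orientation, a filled-in workflow might look roughly like the sketch below. This is not the template provided with the exercise. The file name `tests.yml`, the workflow and job names, and the Python version are assumptions; adjust them to match your repository. GitHub looks for workflow files in the `.github/workflows/` directory, and the sketch assumes a `requirements.txt` exists at the repository root.

```yaml
# .github/workflows/tests.yml  (hypothetical file name)
name: tests

on:
  push:                 # run on every push, on any branch
  pull_request:
    branches: [main]    # run on pull requests that target main

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"   # assumption: pin this to your environment's version
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run tests with coverage
        run: pytest --cov
```

Tests that load data from an absolute path on your own machine will fail on the runner, since that file does not exist there.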

### Part 3d) Pushing and checking for errors

Now push to your main branch and check for errors. Fix any that occur.

## Part 4 Complete Workflow

In this part, work with your partner to adapt yesterday's example clustering problem to the full collaborative, test-driven git workflow. Open Issues for features you want to include, design tests for those features, implement them, make simultaneous changes to the clustering implementation, push your changes and open pull requests, and automate unit testing of your code. Alternatively, get an early start on a problem you might work on in the hackathon by creating a repository, writing some code, and executing unit tests.