Code quality control
====================

Today we look at some ways we automate quality control in our packages. This includes:

1. Standardizing the format of your code
2. Standardizing compliance with [PEP8](https://peps.python.org/pep-0008/) style
3. Measuring test coverage of the package
4. Ensuring these are satisfied before you commit your code to git.



In [60]:
%%bash
# Clean up any existing files
rm -fr package black-example.py flake-example.py references.bib
pip uninstall -y s23bib

Found existing installation: s23bib 0.0.1
Uninstalling s23bib-0.0.1:
  Successfully uninstalled s23bib-0.0.1


# Package setup for today

We start with a small package that just sorts a bibtex file by year. This package is missing an explicit license and readme file.



In [61]:
%%bash
mkdir -p package/s23bib
cd package
git init

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>


Initialized empty Git repository in /home/jovyan/work/06-code-quality/package/.git/


In [62]:
%%writefile package/setup.py
from setuptools import setup

setup(name='s23bib',
      version='0.0.1',
      description='bibtex utilities',
      maintainer='John Kitchin',
      maintainer_email='jkitchin@andrew.cmu.edu',
      license='MIT',
      packages=['s23bib'],
      scripts=[],
      long_description='''A set of bibtex utilities''')

Writing package/setup.py


In [63]:
%%writefile package/s23bib/utils.py
import bibtexparser

def sort_bibtex(bibfile, ascending=True, inplace=False):
    with open(bibfile) as bf:
        bd = bibtexparser.load(bf)
    entries = bd.entries
    entries.sort(key=lambda entry: int(entry['year']), reverse=not ascending)
    
    if inplace:
        db = bibtexparser.bibdatabase.BibDatabase()
        db.entries = entries
        db.comments = []
        db.strings={}
        db.preambles=[]
        writer = bibtexparser.bwriter.BibTexWriter()
        with open(bibfile, 'w') as bibfile:
            bibfile.write(writer.write(db))
        
    else:
        return entries

Writing package/s23bib/utils.py


In [64]:
%%writefile package/s23bib/__init__.py
from .utils import sort_bibtex

Writing package/s23bib/__init__.py


In [65]:
%%writefile package/s23bib/test_sort.py
import os
import pytest
import bibtexparser
from s23bib import sort_bibtex

bs = '''
@article{kitchin-2018-machin-learn-catal,
  author =	 {John R. Kitchin},
  title =	 {Machine Learning in Catalysis},
  journal =	 {Nature Catalysis},
  volume =	 1,
  number =	 4,
  pages =	 {230-232},
  year =	 2018,
  doi =		 {10.1038/s41929-018-0056-y},
  url =		 {https://doi.org/10.1038/s41929-018-0056-y},
  DATE_ADDED =	 {Sun Mar 3 16:40:42 2019},
}
@article{kitchin-2015-examp-effec,
  author =	 {John R. Kitchin},
  title =	 {Examples of Effective Data Sharing in Scientific Publishing},
  journal =	 {ACS Catalysis},
  volume =	 5,
  number =	 6,
  pages =	 {3894-3899},
  year =	 2015,
  doi =		 {10.1021/acscatal.5b00538},
  url =		 {https://doi.org/10.1021/acscatal.5b00538},
  DATE_ADDED =	 {Fri Jan 18 09:54:51 2019},
}'''

@pytest.fixture()
def setup():
    with open('test.bib', 'w') as f:
        f.write(bs)
    yield "setup"
    os.unlink('test.bib')
    
class TestSort:
    def test_sort(self, setup):
        entries = sort_bibtex('test.bib')
        assert [e['year'] for e in entries] == ['2015', '2018'] 

Writing package/s23bib/test_sort.py


In [66]:
%%writefile references.bib
@article{kitchin-2018-machin-learn-catal,
  author =	 {John R. Kitchin},
  title =	 {Machine Learning in Catalysis},
  journal =	 {Nature Catalysis},
  volume =	 1,
  number =	 4,
  pages =	 {230-232},
  year =	 2018,
  doi =		 {10.1038/s41929-018-0056-y},
  url =		 {https://doi.org/10.1038/s41929-018-0056-y},
  DATE_ADDED =	 {Sun Mar 3 16:40:42 2019},
}
@article{kitchin-2015-examp-effec,
  author =	 {John R. Kitchin},
  title =	 {Examples of Effective Data Sharing in Scientific Publishing},
  journal =	 {ACS Catalysis},
  volume =	 5,
  number =	 6,
  pages =	 {3894-3899},
  year =	 2015,
  doi =		 {10.1021/acscatal.5b00538},
  url =		 {https://doi.org/10.1021/acscatal.5b00538},
  DATE_ADDED =	 {Fri Jan 18 09:54:51 2019},
}



Writing references.bib


## Install and test basic functionalities



In [67]:
!pwd

/home/jovyan/work/06-code-quality


In [68]:
! pip install ./package

Processing ./package
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: s23bib
  Building wheel for s23bib (setup.py) ... [?25ldone
[?25h  Created wheel for s23bib: filename=s23bib-0.0.1-py3-none-any.whl size=2348 sha256=02b27bd9adb5dfc7ed80db5e82612bf50bc8ebca3bdc4b03b48bce347fa104a0
  Stored in directory: /tmp/pip-ephem-wheel-cache-f4k0866z/wheels/69/ce/1c/6603d355c8a3061abccae8e7965eeb2565a6390873b5ad554b
Successfully built s23bib
Installing collected packages: s23bib
Successfully installed s23bib-0.0.1


In [69]:
! pip install bibtexparser



In [70]:
from s23bib import sort_bibtex
sort_bibtex('references.bib', ascending=True)

[{'date_added': 'Fri Jan 18 09:54:51 2019',
  'url': 'https://doi.org/10.1021/acscatal.5b00538',
  'doi': '10.1021/acscatal.5b00538',
  'year': '2015',
  'pages': '3894-3899',
  'number': '6',
  'volume': '5',
  'journal': 'ACS Catalysis',
  'title': 'Examples of Effective Data Sharing in Scientific Publishing',
  'author': 'John R. Kitchin',
  'ENTRYTYPE': 'article',
  'ID': 'kitchin-2015-examp-effec'},
 {'date_added': 'Sun Mar 3 16:40:42 2019',
  'url': 'https://doi.org/10.1038/s41929-018-0056-y',
  'doi': '10.1038/s41929-018-0056-y',
  'year': '2018',
  'pages': '230-232',
  'number': '4',
  'volume': '1',
  'journal': 'Nature Catalysis',
  'title': 'Machine Learning in Catalysis',
  'author': 'John R. Kitchin',
  'ENTRYTYPE': 'article',
  'ID': 'kitchin-2018-machin-learn-catal'}]

In [71]:
sort_bibtex('references.bib', inplace=True)

In [72]:
! cat references.bib

@article{kitchin-2015-examp-effec,
 author = {John R. Kitchin},
 date_added = {Fri Jan 18 09:54:51 2019},
 doi = {10.1021/acscatal.5b00538},
 journal = {ACS Catalysis},
 number = {6},
 pages = {3894-3899},
 title = {Examples of Effective Data Sharing in Scientific Publishing},
 url = {https://doi.org/10.1021/acscatal.5b00538},
 volume = {5},
 year = {2015}
}

@article{kitchin-2018-machin-learn-catal,
 author = {John R. Kitchin},
 date_added = {Sun Mar 3 16:40:42 2019},
 doi = {10.1038/s41929-018-0056-y},
 journal = {Nature Catalysis},
 number = {4},
 pages = {230-232},
 title = {Machine Learning in Catalysis},
 url = {https://doi.org/10.1038/s41929-018-0056-y},
 volume = {1},
 year = {2018}
}


You can see the file has been sorted by year.
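
The parsed entries store every field as a string, which is why `sort_bibtex` converts the year with `int` before sorting. A minimal sketch of that step, using hand-written dictionaries in place of parsed entries (the entries below are made up for illustration):

```python
# Hypothetical entries standing in for what bibtexparser returns;
# note that the years are strings, as in the output above.
entries = [
    {"ID": "b", "year": "2018"},
    {"ID": "a", "year": "2015"},
    {"ID": "c", "year": "987"},
]

ascending = True

# Sorting on the raw strings compares character by character.
by_string = sorted(entries, key=lambda entry: entry["year"])

# Converting to int gives the numeric order that sort_bibtex uses.
by_number = sorted(
    entries, key=lambda entry: int(entry["year"]), reverse=not ascending
)

print([e["year"] for e in by_string])
print([e["year"] for e in by_number])
```

With all four-digit years the two orders agree, so the difference only shows up for unusual data such as the three-digit year here.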



We can also run tests.

In [73]:
! pytest

platform linux -- Python 3.9.13, pytest-7.3.2, pluggy-1.0.0
rootdir: /home/jovyan/work/06-code-quality
plugins: anyio-3.6.1, nbmake-1.4.1
collected 1 item

package/s23bib/test_sort.py .                                            [100%]



# Code formatting

When people use different formatting styles it makes it more difficult to work as a team:

1. The code looks different to different people
2. People waste time changing the format
3. git diffs contain unimportant information

A way to manage this is to use an automatic formatter. One such tool is [black](https://github.com/psf/black). It is strongly opinionated on style.



In [74]:
%%writefile black-example.py
a=4
#how about this for loop
for i in range(a):
    b  =3
    print( a,b)#comment two

b = [1,2,3,4,5,6,7,8,9,0]

def f(x):
    'docstring'
    return ([1,2,3,4,
             5,6,7,8,9,0])

Writing black-example.py


We can see what kinds of changes will be made first.



In [75]:
%%bash
black black-example.py --diff

--- black-example.py	2023-06-14 15:49:40.638298 +0000
+++ black-example.py	2023-06-14 15:49:42.525711 +0000
@@ -1,12 +1,12 @@
-a=4
-#how about this for loop
+a = 4
+# how about this for loop
 for i in range(a):
-    b  =3
-    print( a,b)#comment two
+    b = 3
+    print(a, b)  # comment two
 
-b = [1,2,3,4,5,6,7,8,9,0]
+b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
+
 
 def f(x):
-    'docstring'
-    return ([1,2,3,4,
-             5,6,7,8,9,0])
+    "docstring"
+    return [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]


would reformat black-example.py

All done! ✨ 🍰 ✨
1 file would be reformatted.
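
The output above is a unified diff: lines starting with `-` are removed, and lines starting with `+` replace them. As a rough illustration of how to read it, the standard library's `difflib` produces the same kind of diff from two strings. This is only a sketch of the format, not how black itself computes its output.

```python
import difflib

# The original and reformatted text of two lines from black-example.py.
before = "a=4\n#how about this for loop\n"
after = "a = 4\n# how about this for loop\n"

diff = difflib.unified_diff(
    before.splitlines(keepends=True),
    after.splitlines(keepends=True),
    fromfile="black-example.py",
    tofile="black-example.py",
)
diff_text = "".join(diff)
print(diff_text)
```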


To actually make the changes, we run this command. This modifies the file.



In [76]:
%%bash
black black-example.py

reformatted black-example.py

All done! ✨ 🍰 ✨
1 file reformatted.


You can open the [file](./black-example.py) or see it in the notebook.

In [77]:
!cat black-example.py

a = 4
# how about this for loop
for i in range(a):
    b = 3
    print(a, b)  # comment two

b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]


def f(x):
    "docstring"
    return [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]


It is possible to fine-tune what black does (see https://black.readthedocs.io/en/stable/usage_and_configuration/the_basics.html#configuration-via-a-file). We will not consider this here.

You can run black on all the files in a directory with

    black package




In [78]:
! black package

reformatted /home/jovyan/work/06-code-quality/package/setup.py
reformatted /home/jovyan/work/06-code-quality/package/s23bib/test_sort.py
reformatted /home/jovyan/work/06-code-quality/package/s23bib/utils.py

All done! ✨ 🍰 ✨
3 files reformatted, 1 file left unchanged.


There are also options to only check the files, output diffs, etc. There is even [black-nb](https://pypi.org/project/black-nb/) for notebooks.



In [79]:
! black -h

Usage: black [OPTIONS] SRC ...

  The uncompromising code formatter.

Options:
  -c, --code TEXT                 Format the code passed in as a string.
  -l, --line-length INTEGER       How many characters per line to allow.
                                  [default: 88]
  -t, --target-version [py33|py34|py35|py36|py37|py38|py39|py310|py311]
                                  Python versions that should be supported by
                                  Black's output. By default, Black will try
                                  to infer this from the project metadata in
                                  pyproject.toml. If this does not yield
                                  conclusive results, Black will use per-file
                                  auto-detection.
  --pyi                           Format all input files like typing stubs
                                  regardless of file extension (useful when
                                  piping source on standard input).
  -

# Code style
Style is also important. Python is a little over 30 years old now. Over the last three decades, much has been learned about effective coding style, and it is described in the [PEP8](https://peps.python.org/pep-0008/) style guide. These guidelines are even coded into packages that can analyze your code and alert you to problems: https://flake8.pycqa.org/en/latest/ and https://pylint.pycqa.org/en/latest/.

These packages are complementary and do slightly different things. 

## Let's start with flake8.



In [80]:
%%writefile flake-example.py
a=4
#how about this for loop
for i in range(a):
    b  =3
    print( a,b)#comment two

Writing flake-example.py


In [81]:
! flake8 flake-example.py

flake-example.py:1:1: D100 Missing docstring in public module
flake-example.py:1:2: E225 missing whitespace around operator
flake-example.py:2:1: E265 block comment should start with '# '
flake-example.py:4:6: E221 multiple spaces before operator
flake-example.py:4:9: E225 missing whitespace around operator
flake-example.py:5:11: E201 whitespace after '('
flake-example.py:5:13: E231 missing whitespace after ','
flake-example.py:5:16: E261 at least two spaces before inline comment
flake-example.py:5:16: E262 inline comment should start with '# '


The output tells you all the places you need to fix the code. This has to be done manually. 
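
For reference, one way to edit the file by hand so that it addresses each message listed above might look like this. The module docstring text is made up for illustration.

```python
"""Print a pair of numbers a few times."""

a = 4
# how about this for loop
for i in range(a):
    b = 3
    print(a, b)  # comment two
```

Each change maps to one message: the docstring for D100, spaces around `=` for E225 and E221, `# ` at the start of comments for E265 and E262, and so on.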

You can also check that all your functions are documented. flake8 is extensible: installing the [flake8-docstrings](https://gitlab.com/pycqa/flake8-docstrings) plugin adds docstring checks, and you can then specify a docstring style as an argument.

Here we choose the numpy docstring format, and run it on our package.
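
As a point of reference, a numpy-style docstring has a one-line summary followed by underlined `Parameters` and `Returns` sections. A minimal sketch on a hypothetical helper, `sort_entries`, which is not part of the package:

```python
def sort_entries(entries, ascending=True):
    """Sort bibliography entries by year.

    Parameters
    ----------
    entries : list of dict
        Entries that each have a 'year' key.
    ascending : bool, optional
        Sort from oldest to newest if True (the default).

    Returns
    -------
    list of dict
        A new list with the entries sorted by year.
    """
    return sorted(
        entries, key=lambda entry: int(entry["year"]), reverse=not ascending
    )


print(sort_entries([{"year": "2018"}, {"year": "2015"}]))
```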



In [25]:
! pip install flake8-docstrings

Collecting flake8-docstrings
  Downloading flake8_docstrings-1.7.0-py2.py3-none-any.whl (5.0 kB)
Installing collected packages: flake8-docstrings
Successfully installed flake8-docstrings-1.7.0


In [82]:
! flake8 --docstring-convention numpy package

package/build/lib/s23bib/__init__.py:1:1: D104 Missing docstring in public package
package/build/lib/s23bib/__init__.py:1:1: F401 '.utils.sort_bibtex' imported but unused
package/build/lib/s23bib/test_sort.py:1:1: D100 Missing docstring in public module
package/build/lib/s23bib/test_sort.py:3:1: F401 'bibtexparser' imported but unused
package/build/lib/s23bib/test_sort.py:32:1: E302 expected 2 blank lines, found 1
package/build/lib/s23bib/test_sort.py:33:1: D103 Missing docstring in public function
package/build/lib/s23bib/test_sort.py:38:1: W293 blank line contains whitespace
package/build/lib/s23bib/test_sort.py:39:1: D101 Missing docstring in pu

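You can exclude directories, e.g. package/build, with `--exclude`, and ignore specific error codes, e.g. F401 (imported but unused), with `--extend-ignore`.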



In [83]:
! flake8 --extend-ignore F401 --exclude package/build,package/s23bib/.ipynb_checkpoints  --docstring-convention numpy package

package/s23bib/__init__.py:1:1: D104 Missing docstring in public package
package/s23bib/test_sort.py:1:1: D100 Missing docstring in public module
package/s23bib/test_sort.py:34:1: D103 Missing docstring in public function
package/s23bib/test_sort.py:41:1: D101 Missing docstring in public class
package/s23bib/test_sort.py:42:1: D102 Missing docstring in public method
package/s23bib/utils.py:1:1: D100 Missing docstring in public module
package/s23bib/utils.py:4:1: D103 Missing docstring in public function
package/setup.py:1:1: D100 Missing docstring in public module


**Exercise** Take some time now to fix these issues. Run the cells above until they come out clean.



In [90]:
%run flake8-fix-1.ipynb
! flake8 --extend-ignore F401 --exclude package/build,package/s23bib/.ipynb_checkpoints  --docstring-convention numpy package

Overwriting package/s23bib/__init__.py
Overwriting package/s23bib/utils.py
Overwriting package/s23bib/test_sort.py
Overwriting package/setup.py


## pylint

A *linter* is used to check your code for a wide range of possible problems.


[pylint](https://pylint.pycqa.org/en/latest/) is an alternative to flake8 and often provides complementary information. It is also a tool for checking for errors, coding standards, etc. 




In [91]:
!pylint --ignore build,.ipynb_checkpoints package

************* Module s23bib.test_sort
package/s23bib/test_sort.py:7:0: C0103: Constant name "bs" doesn't conform to UPPER_CASE naming style (invalid-name)
package/s23bib/test_sort.py:37:9: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
package/s23bib/test_sort.py:37:34: C0103: Variable name "f" doesn't conform to snake_case naming style (invalid-name)
package/s23bib/test_sort.py:46:24: W0621: Redefining name 'setup' from outer scope (line 35) (redefined-outer-name)
package/s23bib/test_sort.py:46:24: W0613: Unused argument 'setup' (unused-argument)
package/s23bib/test_sort.py:43:0: R0903: Too few public methods (1/2) (too-few-public-methods)
package/s23bib/test_sort.py:4:0: W0611: Unused import bibtexparser (unused-import)
************* Module s23bib.utils
package/s23bib/utils.py:13:9: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
package/s23bib/utils.py:13:26: C0103: Variable name "bf" doesn't conform to snake_cas

**Exercise** If there are residual issues, fix them so that the package is clean.
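
Several of the messages above have mechanical fixes: module-level constants are named in UPPER_CASE, `open` is given an explicit `encoding`, and one-letter names are spelled out. A short sketch of those three changes, with hypothetical names (`BIBTEX_SOURCE`, `write_test_file`) that are not from the package:

```python
"""Sketch of fixes for some common pylint messages."""
import os
import tempfile

# C0103: module-level constants use UPPER_CASE names.
BIBTEX_SOURCE = "@article{example-2015,\n  year = 2015,\n}\n"


def write_test_file(path):
    """Write the sample BibTeX text to path."""
    # W1514: pass an explicit encoding to open.
    # C0103: use a descriptive name instead of a one-letter name such as f.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(BIBTEX_SOURCE)


# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmpdir:
    bib_path = os.path.join(tmpdir, "test.bib")
    write_test_file(bib_path)
    with open(bib_path, encoding="utf-8") as handle:
        written = handle.read()

print(written == BIBTEX_SOURCE)
```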



# Testing

Our package only has one test right now. We can run it with `pytest`.



In [92]:
! pytest package

platform linux -- Python 3.9.13, pytest-7.3.2, pluggy-1.0.0
rootdir: /home/jovyan/work/06-code-quality/package
plugins: anyio-3.6.1, nbmake-1.4.1
collected 1 item

package/s23bib/test_sort.py .                                            [100%]



# Coverage

Your tests should *cover* as much of your code as possible. This can actually be measured using the [coverage](https://coverage.readthedocs.io/en/7.2.3/) package. There are two steps: running and reporting.



In [31]:
! pip install coverage

Collecting coverage
  Downloading coverage-7.2.7-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (228 kB)
Installing collected packages: coverage
Successfully installed coverage-7.2.7


In [32]:
! coverage run -m pytest package/s23bib

platform linux -- Python 3.9.13, pytest-7.3.2, pluggy-1.0.0
rootdir: /home/jovyan/work/06-code-quality/package
plugins: anyio-3.6.1, nbmake-1.4.1
collected 1 item

package/s23bib/test_sort.py .                                            [100%]



In [33]:
!coverage report --show-missing package/s23bib/*.py

Name                          Stmts   Miss  Cover   Missing
-----------------------------------------------------------
package/s23bib/__init__.py        1      0   100%
package/s23bib/test_sort.py      15      0   100%
package/s23bib/utils.py          16      8    50%   11-18
-----------------------------------------------------------
TOTAL                            32      8    75%


Our test does not cover all of the functionality in our function; it skips the inplace argument branch. It is not necessary to achieve 100% coverage. This is a tool to help you find areas of your code that are under-tested. Uncovered code does not necessarily contain a bug, but it does mean you have not tested it.
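
To cover the missing lines, you would add a test that calls the function with `inplace=True` and then checks the file on disk. The sketch below shows the shape of such a test on a simplified stand-in, `sort_years_file`, which sorts a file of one year per line. It is hypothetical and only mirrors the two branches of `sort_bibtex`.

```python
import os
import tempfile


def sort_years_file(path, ascending=True, inplace=False):
    """Sort a file with one year per line (stand-in for sort_bibtex)."""
    with open(path, encoding="utf-8") as handle:
        years = [line.strip() for line in handle if line.strip()]
    years.sort(key=int, reverse=not ascending)

    if inplace:
        # This is the branch a test with the default arguments never reaches.
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(years) + "\n")
        return None
    return years


def test_sort_inplace():
    """Exercise the inplace branch and check the rewritten file."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "years.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("2018\n2015\n")

        assert sort_years_file(path, inplace=True) is None

        with open(path, encoding="utf-8") as handle:
            assert handle.read() == "2015\n2018\n"


# pytest would discover this function by name; here we call it directly.
test_sort_inplace()
```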



# Automating these

You may find chapter 9 (https://merely-useful.tech/py-rse/automate.html) helpful. It covers make in more depth than I do here.

It is a little tedious to run these each time. There are a few ways you could solve this. One is to simply create a shell script that chains all the commands together:
    
    #!/bin/bash
    black package && flake8 --exclude package/build package && pylint --ignore build package && pytest package

Then you can run one command that will run these, and stop if any single command does not succeed. Put this into a file called run.sh, make it executable, and try it out. Here we use && to only run subsequent commands if the previous command succeeded.
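
The same stop-at-first-failure behavior can be sketched in Python with `subprocess`, which may be convenient if you prefer to keep everything in one language. The function name `run_all` and the `CHECKS` list are made up for this example, and the demonstration uses harmless commands so it does not depend on the tools being installed.

```python
import subprocess
import sys

# The same checks as the shell script, one command per list entry.
CHECKS = [
    ["black", "package"],
    ["flake8", "--exclude", "package/build", "package"],
    ["pylint", "--ignore", "build", "package"],
    ["pytest", "package"],
]


def run_all(commands):
    """Run each command in order; stop at the first non-zero exit status."""
    for command in commands:
        print("running:", " ".join(command))
        status = subprocess.run(command, check=False).returncode
        if status != 0:
            return status
    return 0


# A harmless demonstration: the second command fails,
# so the third one never runs.
demo = [
    [sys.executable, "-c", "print('first')"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    [sys.executable, "-c", "print('never reached')"],
]
demo_status = run_all(demo)
print("exit status:", demo_status)
```

Calling `run_all(CHECKS)` from the directory that contains `package` would run the real tools, assuming they are installed and on your PATH.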




In [34]:
! black package && flake8 --exclude package/build package && pylint --ignore build package && pytest package

All done! ✨ 🍰 ✨
4 files left unchanged.
package/s23bib/__init__.py:1:1: D104 Missing docstring in public package
package/s23bib/__init__.py:1:1: F401 '.utils.sort_bibtex' imported but unused
package/s23bib/test_sort.py:1:1: D100 Missing docstring in public module
package/s23bib/test_sort.py:3:1: F401 'bibtexparser' imported but unused
package/s23bib/test_sort.py:34:1: D103 Missing docstring in public function
package/s23bib/test_sort.py:41:1: D101 Missing docstring in public class
package/s23bib/test_sort.py:42:1: D102 Missing docstring in public method
package/s23bib/utils.py:1:1: D100 Missing docstring in public module
packag

An alternative is to create a [makefile](https://www.gnu.org/software/make/manual/make.html). Make is a GNU program that allows you to create rules that run commands by name. Makefile syntax is whitespace-sensitive: the body of each rule *must* be indented with tabs, not spaces. These commands are run from the same directory as the makefile, so the paths are set accordingly.

Each section starts with a target name, then a list of commands that are indented by a tab. It must be tabs or you will get an error. You don't get a tab in jupyter lab when you press tab... you get 4 spaces. I had to copy the tab from somewhere else...

The all target lists some dependencies by target name. Each of these will be run when you run the all target.
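
If your editor inserts spaces when you press tab, one workaround is to write the makefile from Python, where `\t` is an unambiguous tab character. This is a sketch: the helper name `write_makefile` is made up, the makefile is shortened to two rules, and it is demonstrated on a temporary directory rather than on `package/`.

```python
import os
import tempfile

# Each recipe line starts with \t, which is a real tab character.
MAKEFILE_TEXT = (
    "black:\n"
    "\tblack .\n"
    "\n"
    "flake8:\n"
    "\tflake8 --exclude build .\n"
    "\n"
    "all: black flake8\n"
)


def write_makefile(directory):
    """Write MAKEFILE_TEXT to directory/makefile and return the path."""
    path = os.path.join(directory, "makefile")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(MAKEFILE_TEXT)
    return path


with tempfile.TemporaryDirectory() as tmpdir:
    makefile_path = write_makefile(tmpdir)
    with open(makefile_path, encoding="utf-8") as handle:
        recipe_lines = [line for line in handle if line.startswith("\t")]

# repr shows the leading \t explicitly.
print(recipe_lines)
```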



In [35]:
%%writefile package/makefile
black: 
	black .

flake8:
	flake8 --exclude build .
    
pylint:
	pylint --ignore build .
    
test:
	pytest .

all: black flake8 pylint

Writing package/makefile


In [36]:
%%bash
cd package
make all

black .


All done! ✨ 🍰 ✨
4 files left unchanged.


flake8 --exclude build .
./s23bib/__init__.py:1:1: D104 Missing docstring in public package
./s23bib/__init__.py:1:1: F401 '.utils.sort_bibtex' imported but unused
./s23bib/test_sort.py:1:1: D100 Missing docstring in public module
./s23bib/test_sort.py:3:1: F401 'bibtexparser' imported but unused
./s23bib/test_sort.py:34:1: D103 Missing docstring in public function
./s23bib/test_sort.py:41:1: D101 Missing docstring in public class
./s23bib/test_sort.py:42:1: D102 Missing docstring in public method
./s23bib/utils.py:1:1: D100 Missing docstring in public module
./s23bib/utils.py:4:1: D103 Missing docstring in public function
./setup.py:1:1: D100 Missing docstring in public module


make: *** [makefile:5: flake8] Error 1


CalledProcessError: Command 'b'cd package\nmake all\n'' returned non-zero exit status 2.

You can also run individual targets.



In [37]:
%%bash
cd package
make test

pytest .
platform linux -- Python 3.9.13, pytest-7.3.2, pluggy-1.0.0
rootdir: /home/jovyan/work/06-code-quality/package
plugins: anyio-3.6.1, nbmake-1.4.1
collected 1 item

s23bib/test_sort.py .                                                    [100%]



Note that make will exit if any rule exits with a non-zero status.

Make is complex, and does much more than this. It has many applications in building, compilation and installing software.



# Integration with git

Finally, we can look at how we can integrate all this with git. So far, we have manually run each command, and edited files, then run the commands again. That is a little tedious, and we can leverage some capability in git we have not talked about so far. 

Git has a notion of [hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks). These are programs that git runs when particular events occur. The hooks for a repository live in its `.git/hooks` directory, which starts out with a set of sample scripts.




In [38]:
%%bash
cd package
ls .git/hooks

applypatch-msg.sample
commit-msg.sample
fsmonitor-watchman.sample
post-update.sample
pre-applypatch.sample
pre-commit.sample
pre-merge-commit.sample
prepare-commit-msg.sample
pre-push.sample
pre-rebase.sample
pre-receive.sample
push-to-checkout.sample
update.sample


The one we are interested in is the pre-commit hook. This is a program that runs before a commit is made; the commit only proceeds if the program exits successfully. We can use our makefile for this. To set it up, create .git/hooks/pre-commit as a script that starts with a shebang line (`#!/bin/bash`), and make the file executable (`chmod +x .git/hooks/pre-commit`).

Let's try a trivial hook first.



In [39]:
%%writefile package/.git/hooks/pre-commit
#!/bin/bash
echo "running precommit in `pwd`"
exit 0

Writing package/.git/hooks/pre-commit


In [40]:
%%bash
chmod +x package/.git/hooks/pre-commit

Now this hook runs every time you try to commit, and the commit is aborted if the hook exits with a non-zero status. Once the hook runs our checks, you will not be able to make a commit unless they pass.

Note that the pre-commit hook is run from the root of the git repository, and any paths used must be set accordingly.
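
Writing the hook file and setting its executable bit can also be done from Python. The sketch below assumes a POSIX system; `install_pre_commit_hook` is a made-up helper name, and it is demonstrated on a temporary directory standing in for a repository.

```python
import os
import stat
import tempfile

# A hook that runs the makefile's all target.
HOOK_TEXT = "#!/bin/bash\nmake all\n"


def install_pre_commit_hook(repo_root, text=HOOK_TEXT):
    """Write .git/hooks/pre-commit under repo_root and make it executable."""
    hooks_dir = os.path.join(repo_root, ".git", "hooks")
    os.makedirs(hooks_dir, exist_ok=True)
    hook_path = os.path.join(hooks_dir, "pre-commit")
    with open(hook_path, "w", encoding="utf-8") as handle:
        handle.write(text)
    # Equivalent of chmod +x: add the execute bits to the current mode.
    mode = os.stat(hook_path).st_mode
    os.chmod(hook_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path


with tempfile.TemporaryDirectory() as tmpdir:
    hook = install_pre_commit_hook(tmpdir)
    is_executable = os.access(hook, os.X_OK)
    with open(hook, encoding="utf-8") as handle:
        first_line = handle.readline()

print(is_executable, first_line.strip())
```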



In [43]:
%%bash
git config --global user.email "you@example.com"
git config --global user.name "Your Name"

In [44]:
%%bash
cd package
git add *.py
git commit -m "adding pyfiles"

running precommit in /home/jovyan/work/06-code-quality/package


[master (root-commit) cd62fdb] adding pyfiles
 1 file changed, 13 insertions(+)
 create mode 100644 setup.py


In [45]:
%%writefile package/.git/hooks/pre-commit
#!/bin/bash
make all

Overwriting package/.git/hooks/pre-commit


In [46]:
%%bash
chmod +x package/.git/hooks/pre-commit

In [72]:
%%bash
cd package
git add *.py
git commit -m "adding pyfiles"

black .
All done! ✨ 🍰 ✨
4 files left unchanged.
flake8 --exclude build .


Now you have to fix the errors if you want to be able to commit. This helps ensure you always submit good code. You can also integrate testing into this so you make sure your code doesn't have errors in it.



In [48]:
%%bash 
cd package
git log

commit cd62fdb422fa4e0c5d360800fe280c590219ccc2
Author: Your Name <you@example.com>
Date:   Wed Jun 14 15:42:47 2023 +0000

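    adding pyfiles


You can bypass the hook with `--no-verify`. Here the commit still fails, but only because `git add *.py` matches just `setup.py`, which has not changed since the first commit.
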
In [49]:
%%bash
cd package
git add *.py
git commit --no-verify -m "adding pyfiles"

On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	build/
	makefile
	s23bib.egg-info/
	s23bib/

nothing added to commit but untracked files present (use "git add" to track)


CalledProcessError: Command 'b'cd package\ngit add *.py\ngit commit --no-verify -m "adding pyfiles"\n'' returned non-zero exit status 1.

# pre-commit
There are more sophisticated approaches. [pre-commit](https://pre-commit.com/#intro) is a Python package that manages the scripts run by the pre-commit hook. To set it up, you create a YAML config file like the one below. I find these tricky in general, and usually adapt them from the pre-commit documentation.



In [53]:
! pip install pre-commit

Collecting pre-commit
  Downloading pre_commit-3.3.3-py2.py3-none-any.whl (202 kB)
Collecting cfgv>=2.0.0 (from pre-commit)
  Downloading cfgv-3.3.1-py2.py3-none-any.whl (7.3 kB)
Collecting identify>=1.0.0 (from pre-commit)
  Downloading identify-2.5.24-py2.py3-none-any.whl (98 kB)
Collecting nodeenv>=0.11.1 (from pre-commit)
  Downloading nodeenv-1.8.0-py2.py3-none-any.whl (22 kB)
Collecting virtualenv>=20.10.0 (from pre-commit)
  Downloading virtualenv-20.23.0-py3-none-any.whl (3.3 MB)
Collecting distlib<1,>=0.3.6 (from virtualenv>=20.10.0->pre-commit)
  Downloading distlib-0.3.6-py2.py3-none-any.whl

In [54]:
%%writefile package/.pre-commit-config.yaml
repos:
  -  repo: https://github.com/psf/black
     rev: 23.3.0
     hooks:
     - id: black

  -  repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v2.0.0
     hooks:
     - id: flake8
    
  - repo: local
    hooks:
    - id: pytest-check
      name: pytest-check
      stages: [commit]
      types: [python]
      entry: pytest
      language: system
      pass_filenames: false
      always_run: true

Overwriting package/.pre-commit-config.yaml


In [55]:
%%bash
cd package
pre-commit install

Running in migration mode with existing hooks at .git/hooks/pre-commit.legacy
Use -f to use only pre-commit.
pre-commit installed at .git/hooks/pre-commit


In [56]:
! cat package/.git/hooks/pre-commit

#!/usr/bin/env bash
# File generated by pre-commit: https://pre-commit.com
# ID: 138fd403232d2ddd5efb44317e38bf03

# start templated
INSTALL_PYTHON=/opt/conda/bin/python
ARGS=(hook-impl --config=.pre-commit-config.yaml --hook-type=pre-commit)
# end templated

HERE="$(cd "$(dirname "$0")" && pwd)"
ARGS+=(--hook-dir "$HERE" -- "$@")

if [ -x "$INSTALL_PYTHON" ]; then
    exec "$INSTALL_PYTHON" -mpre_commit "${ARGS[@]}"
elif command -v pre-commit > /dev/null; then
    exec pre-commit "${ARGS[@]}"
else
    echo '`pre-commit` not found.  Did you forget to activate your virtualenv?' 1>&2
    exit 1
fi


We can run the rules manually.



In [57]:
%%bash
cd package
pre-commit run --all-files

[INFO] Initializing environment for https://github.com/psf/black.
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
black....................................................................Passed
Flake8...................................................................Passed
pytest-check.............................................................Passed


The hooks also run automatically on every commit, against the staged files they apply to. Because pre-commit was installed in migration mode, the earlier `make all` hook still runs first.


In [58]:
%%bash
cd package

git add makefile
git commit -m "add makefile"

black .
All done! ✨ 🍰 ✨
4 files left unchanged.
flake8 --exclude build .
./s23bib/__init__.py:1:1: D104 Missing docstring in public package
./s23bib/__init__.py:1:1: F401 '.utils.sort_bibtex' imported but unused
./s23bib/test_sort.py:1:1: D100 Missing docstring in public module
./s23bib/test_sort.py:3:1: F401 'bibtexparser' imported but unused
./s23bib/test_sort.py:34:1: D103 Missing docstring in public function
./s23bib/test_sort.py:41:1: D101 Missing docstring in public class
./s23bib/test_sort.py:42:1: D102 Missing docstring in public method
./s23bib/utils.py:1:1: D100 Missing docstring in public module
./s23bib/utils.py:4:1: D103 Missing docstring in public function
./setup.py:1:1: D100 Missing docstring in public module
make: *** [makefile:5: flake8] Error 1
black................................................(no files to check)Skipped
Flake8...............................................(no files to check)Skipped
pytest-check......................................................

CalledProcessError: Command 'b'cd package\n\ngit add makefile\ngit commit -m "add makefile"\n'' returned non-zero exit status 1.

There are a lot of things you can do with pre-commit (https://pre-commit.com/hooks.html). 



In [4]:
%%bash
cd package

git commit setup.py -m "capitalize something"

# Summary

As your package gets more sophisticated, and more people have to interact with it, it becomes more and more important that you follow some standards in formatting and styling. There are tools to help with auto-formatting, and style checking.

Testing is important to help verify that your package works correctly. There are tools to examine your package, and compute how much of it is covered by the tests.

Finally, we looked at integration of these tools with git via the pre-commit hook to make sure that you only commit high-quality code to the repository. This helps avoid needing multiple commits to fix formatting and style issues, and can be used to make sure your tests pass before you commit.




# In class exercise

Fix all the issues in the package so that you can commit the files, and have a clean package.

Make sure to ignore files like the build directory and .egg-info directory.

