# More on Python packaging

Today we will learn more about packages and using git. Let's start by making a directory where we can do our work, and initialize it as a git repo.

If you haven't read https://merely-useful.tech/py-rse/git-advanced.html, you should do that now.

The next cell simply starts us fresh. You *must* be very careful with `rm -fr`: `r` means to recursively delete the path you specify, and `f` means `force`, which makes the command succeed even when src doesn't exist. You can destroy a lot of work with this command.

In [30]:
%%bash
rm -fr src
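
For comparison, a rough Python counterpart of this cleanup can be sketched with `shutil.rmtree`. The helper name `remove_tree` and its safety check are assumptions made for illustration; they are not part of the workflow above.

```python
import shutil
import tempfile
from pathlib import Path


def remove_tree(path):
    """Recursively delete `path` if it exists; do nothing otherwise."""
    target = Path(path).resolve()
    cwd = Path.cwd().resolve()
    # Hypothetical safety check: refuse to delete the working directory
    # or any directory that contains it.
    if target == cwd or target in cwd.parents:
        raise ValueError(f'refusing to delete {target}')
    # ignore_errors=True suppresses errors, including a missing path,
    # which is loosely similar to the "force" flag of `rm -f`.
    shutil.rmtree(target, ignore_errors=True)


# Try it in a scratch directory rather than on real work.
with tempfile.TemporaryDirectory() as scratch:
    package_dir = Path(scratch) / 'src' / 's23pack'
    package_dir.mkdir(parents=True)
    remove_tree(Path(scratch) / 'src')
    removed = not (Path(scratch) / 'src').exists()
    remove_tree(Path(scratch) / 'src')  # second call finds nothing to delete
```

Like `rm -fr`, this deletes without asking, so the same caution applies.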

Next we use these commands to create a src directory with a package directory in it.



In [31]:
%%bash 
mkdir -p src/s23pack
cd src
git init
git checkout -b main
echo -e "s23 package\n===========" > README.md
git add README.md
git commit README.md -m "Initial readme."
git status

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>


Initialized empty Git repository in /home/jovyan/work/04-more-python-packaging/src/.git/


Switched to a new branch 'main'


[main (root-commit) 46bd896] Initial readme.
 1 file changed, 2 insertions(+)
 create mode 100644 README.md
On branch main
nothing to commit, working tree clean


## Setting up our initial package



The next few cells create several files we talked about last time. We start with the setup.py file. You should edit this cell to replace the maintainer fields with your information. This file references the license and a script we will use as a command.



In [32]:
%%writefile src/setup.py
from setuptools import setup

setup(name='s23pack',
      version='0.0.1',
      description='s23 package',
      maintainer='John Kitchin',
      maintainer_email='jkitchin@cmu.edu',
      license='MIT',
      packages=['s23pack'],
      entry_points={'console_scripts': ['oa = s23pack.main:main']},
      long_description='''A long
      multiline description.''')

Writing src/setup.py
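
The `entry_points` string `'oa = s23pack.main:main'` reads as "create a command named `oa` that calls the function `main` in the module `s23pack.main`". A simplified sketch of how such a string can be resolved is shown here. The helper `resolve_entry_point` is hypothetical and is not how pip itself is implemented, and a standard-library target (`json:dumps`) stands in for our package so the example runs before anything is installed.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'name = package.module:function' and import the function."""
    command, target = (part.strip() for part in spec.split('=', 1))
    module_name, function_name = target.split(':', 1)
    module = importlib.import_module(module_name)
    return command, getattr(module, function_name)


# Stand-in for 'oa = s23pack.main:main', using a module that always exists.
command, function = resolve_entry_point('to-json = json:dumps')
print(command, function([1, 2, 3]))
```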


Next, write the license file.



In [33]:
%%writefile src/LICENSE
Copyright 2023 John Kitchin

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Writing src/LICENSE


The next three cells create the `__init__.py`, `utils.py`, and the script file.



In [34]:
%%writefile src/s23pack/__init__.py
print('loaded s23pack')
from .utils import hello
from .main import openalex_institution

Writing src/s23pack/__init__.py


In [35]:
%%writefile src/s23pack/utils.py
def hello(name):
    print(f'Hi there {name}')

Writing src/s23pack/utils.py


In [36]:
%%writefile src/s23pack/main.py
import click

import requests 
from collections.abc import Iterable 

def openalex_institution(query):
    'query is a list of terms in the query, or a string.'
    if isinstance(query, str):
        query = '+'.join(query.split())

    # We assume it is an iterable of strings.
    elif isinstance(query, Iterable):
        query = '+'.join(query)
        
    url = f'https://api.openalex.org/institutions?search={query}'
    req = requests.get(url)
    data = req.json()

    return [f'{result["display_name"]:50s}{result["works_count"]:10d}{result["cited_by_count"]:10d}'
            for result in data['results']]

@click.command(help='OpenAlex Institutions')
@click.argument('query', nargs=-1)
def main(query):
    print('\n'.join(openalex_institution(query)))

Writing src/s23pack/main.py
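
The two branches in `openalex_institution` exist because the function is called in two ways: the `oa` command passes a tuple of words (that is what `nargs=-1` produces), while Python callers may pass a single string. The sketch here isolates that logic and the fixed-width formatting so they can be tried without a network request. The helper names `build_search_term` and `format_row` are assumptions made for illustration, as is the `TypeError` for other input types, and the sample numbers are copied from the `oa` output shown later in this lesson.

```python
from collections.abc import Iterable


def build_search_term(query):
    """Return the '+'-joined search term for a string or an iterable of strings."""
    if isinstance(query, str):
        return '+'.join(query.split())
    if isinstance(query, Iterable):
        return '+'.join(query)
    raise TypeError('query must be a string or an iterable of strings')


def format_row(name, works_count, cited_by_count):
    """Left-align the name in 50 columns, right-align each count in 10."""
    return f'{name:50s}{works_count:10d}{cited_by_count:10d}'


term_from_string = build_search_term('carnegie mellon')
term_from_tuple = build_search_term(('carnegie', 'mellon'))
row = format_row('Carnegie Mellon University', 111876, 4533093)
print(term_from_string, term_from_tuple)
print(row)
```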


In [37]:
!tree src

src
├── LICENSE
├── README.md
├── s23pack
│   ├── __init__.py
│   ├── main.py
│   └── utils.py
└── setup.py

1 directory, 6 files


In [38]:
import sys
sys.path.insert(0, 'src')

import s23pack

In [39]:
s23pack.hello('Class')

Hi there Class


In [40]:
# Now let's rm src from the path
sys.path.remove('src')
sys.path

['/home/jovyan/work/04-more-python-packaging',
 '/opt/conda/lib/python39.zip',
 '/opt/conda/lib/python3.9',
 '/opt/conda/lib/python3.9/lib-dynload',
 '',
 '/opt/conda/lib/python3.9/site-packages']
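
The effect of editing `sys.path` can be reproduced in isolation. This sketch builds a throwaway package in a temporary directory, makes it importable by inserting that directory into `sys.path`, and then removes the entry again. The package name `demo_pkg_s23` is made up for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a minimal package: a directory containing __init__.py.
    package_dir = Path(tmp) / 'demo_pkg_s23'
    package_dir.mkdir()
    (package_dir / '__init__.py').write_text(
        "def hello(name):\n    return f'Hi there {name}'\n")

    # Python searches the entries of sys.path in order when importing.
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        demo = importlib.import_module('demo_pkg_s23')
        greeting = demo.hello('Class')
    finally:
        # Undo the change so later imports are not affected.
        sys.path.remove(tmp)

print(greeting)
```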

# Installing the package

Let's go ahead and install this. Before we do that, a quick note about installation. There are system software packages, and you typically need elevated privileges to install those. You do not have them here. Instead, Python has a *user* space where you can install packages. In this JupyterHub, you can find it here. Yours may look different because it depends on what you have installed.
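
One way to look up these locations from Python is the standard-library `site` module. This is a minimal sketch; the values it prints depend on your installation, and some older virtual environment tools ship a `site` module without `getsitepackages`.

```python
import site

# Directory used for per-user installs (e.g. `pip install --user`).
user_site = site.getusersitepackages()

# True if that directory is added to sys.path; False or None if it is disabled.
user_site_enabled = site.ENABLE_USER_SITE

# Site-packages directories shared by everyone using this interpreter.
system_sites = site.getsitepackages()

print('user site:', user_site)
print('user site enabled:', user_site_enabled)
print('system sites:', system_sites)
```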



To install our package, we change into the src directory and run `pip install .`, where `.` tells pip to install the package found in the current directory.



In [41]:
! cd src && pip install .

Processing /home/jovyan/work/04-more-python-packaging/src
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: s23pack
  Building wheel for s23pack (setup.py) ... done
  Created wheel for s23pack: filename=s23pack-0.0.1-py3-none-any.whl size=3070 sha256=711527202d33b0230553a9369e34e92cd9ea2967f6c9c5dc32e70fa2d2fcd35b
  Stored in directory: /tmp/pip-ephem-wheel-cache-8tmjtlie/wheels/37/b6/73/45c180b641b7519d80d22856d813192e549b29006c664f31bd
Successfully built s23pack
Installing collected packages: s23pack
Successfully installed s23pack-0.0.1


The installation changed some things. First, it installed some files in your local site-packages directory. You can see there are some new s23pack directories.



Next, there are some changes in the src directory. There is a build directory, and an s23pack.egg-info directory.



In [42]:
!tree src

src
├── build
│   ├── bdist.linux-x86_64
│   └── lib
│       └── s23pack
│           ├── __init__.py
│           ├── main.py
│           └── utils.py
├── LICENSE
├── README.md
├── s23pack
│   ├── __init__.py
│   ├── main.py
│   └── utils.py
├── s23pack.egg-info
│   ├── dependency_links.txt
│   ├── entry_points.txt
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   └── top_level.txt
└── setup.py

6 directories, 14 files


In [43]:
import s23pack

In [44]:
import os
for path in sys.path:
    if os.path.exists(os.path.join(path, 's23pack')):
        print(path)
        break

/opt/conda/lib/python3.9/site-packages
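
An alternative to walking `sys.path` by hand is to ask the import system directly with `importlib.util.find_spec`. This sketch uses the standard-library `json` package so it runs whether or not `s23pack` is installed; the helper name `package_location` is an assumption for illustration.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


json_location = package_location('json')
missing_location = package_location('no_such_package_s23')
print(json_location)
print(missing_location)
```

Calling `package_location('s23pack')` after installation should point into the same site-packages directory found by the loop above.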


In [45]:
s23pack.hello('John')

Hi there John


We have to switch to a terminal to check on our `oa` script. Try it out. I think the reason is that the executable path (`PATH`) in your terminal is different from the one here.



In [46]:
! echo $PATH
! which oa

/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
/opt/conda/bin/oa
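
The same check can be made from Python with `shutil.which`, which searches the directories listed in `PATH` the way the shell does. This is a small sketch; `no-such-command-s23` is a made-up name used to show the not-found case.

```python
import os
import shutil

# The directories searched for commands, in order.
search_dirs = os.environ.get('PATH', '').split(os.pathsep)

# shutil.which returns the path of a command, or None if it is not found.
oa_path = shutil.which('oa')
missing = shutil.which('no-such-command-s23')

print(search_dirs)
print(oa_path)
print(missing)
```

If `oa_path` is `None` in one environment but not in another, the two environments are using different `PATH` values.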


In [47]:
! oa carnegie mellon

loaded s23pack
Carnegie Mellon University                            111876   4533093
Carnegie Mellon University Qatar                         612      8180
Carnegie Mellon University Australia                      81      1672
Carnegie Mellon University Africa                        125       651


## Uninstall your package



In [48]:
! pip uninstall -y s23pack

Found existing installation: s23pack 0.0.1
Uninstalling s23pack-0.0.1:
  Successfully uninstalled s23pack-0.0.1


You can see here that the package is gone from site-packages now.



In [49]:
import os
for path in sys.path:
    if os.path.exists(os.path.join(path, 's23pack')):
        print(path)
        break

And you can see here the executable command is gone too.



In [50]:
! which oa

# Back to git

Before we reinstall, let's take some time to clean up our repo. Let's start with a high level view.



In [51]:
%%bash 
cd src
git status

On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	LICENSE
	build/
	s23pack.egg-info/
	s23pack/
	setup.py

nothing added to commit but untracked files present (use "git add" to track)


In [52]:
!tree src

src
├── build
│   ├── bdist.linux-x86_64
│   └── lib
│       └── s23pack
│           ├── __init__.py
│           ├── main.py
│           └── utils.py
├── LICENSE
├── README.md
├── s23pack
│   ├── __init__.py
│   ├── main.py
│   └── utils.py
├── s23pack.egg-info
│   ├── dependency_links.txt
│   ├── entry_points.txt
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   └── top_level.txt
└── setup.py

6 directories, 14 files


In the src dir, we want to ignore a few things like the whole build dir, and the .egg-info directory. Let's make a .gitignore file first.



In [53]:
%%writefile src/.gitignore
build
*.egg-info
.ipynb_checkpoints
__pycache__

Writing src/.gitignore
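
To get a feel for what these patterns exclude, here is a rough approximation in Python using `fnmatch`. It only mimics the simple case used here, where a pattern without a slash matches a file or directory name at any depth. Git's real rules (leading `/`, `!` negation, `**`) are richer, so treat `is_ignored` as a hypothetical helper and not as a reimplementation of git.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

IGNORE_PATTERNS = ['build', '*.egg-info', '.ipynb_checkpoints', '__pycache__']


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Return True if any component of `path` matches one of the patterns."""
    return any(fnmatchcase(part, pattern)
               for part in PurePosixPath(path).parts
               for pattern in patterns)


candidates = ['build/lib/s23pack/utils.py',
              's23pack.egg-info/PKG-INFO',
              's23pack/__pycache__/utils.cpython-39.pyc',
              's23pack/utils.py',
              'setup.py']
kept = [path for path in candidates if not is_ignored(path)]
print(kept)
```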


In [54]:
%%bash 
cd src
git status

On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.gitignore
	LICENSE
	s23pack/
	setup.py

nothing added to commit but untracked files present (use "git add" to track)


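Now it looks like we can just add everything and get going. We name `.gitignore` explicitly because the shell's `*` does not match file names that start with a dot. After we add the files, we check to see what is staged before we commit. Note we get a warning that files were ignored.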



In [55]:
%%bash 
cd src
git add .gitignore *
git commit .gitignore -m "add ignore files"
git status

The following paths are ignored by one of your .gitignore files:
build
s23pack.egg-info
hint: Use -f if you really want to add them.
hint: Turn this message off by running
hint: "git config advice.addIgnoredFile false"


[main dfb156d] add ignore files
 1 file changed, 4 insertions(+)
 create mode 100644 .gitignore
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   LICENSE
	new file:   s23pack/__init__.py
	new file:   s23pack/main.py
	new file:   s23pack/utils.py
	new file:   setup.py



Now we commit these. 



In [56]:
%%bash 
cd src
git commit -m "First set of files"

[main 20e1b0e] First set of files
 5 files changed, 49 insertions(+)
 create mode 100644 LICENSE
 create mode 100644 s23pack/__init__.py
 create mode 100644 s23pack/main.py
 create mode 100644 s23pack/utils.py
 create mode 100644 setup.py


In [57]:
%%bash
cd src
git status
git log --graph --oneline

On branch main
nothing to commit, working tree clean
* 20e1b0e First set of files
* dfb156d add ignore files
* 46bd896 Initial readme.


Note that each commit has a hash (e.g. 20e1b0e) that we can use later to refer to it, but hashes are hard to remember. Let's go ahead and add a tag to indicate we are at version 0.0.1. Technically this is a *lightweight* tag (https://git-scm.com/book/en/v2/Git-Basics-Tagging).



In [58]:
%%bash 
cd src
git tag v0.0.1

In [59]:
%%bash 
cd src
git status

On branch main
nothing to commit, working tree clean


# Let's catch our breath

1. We set up a small Python package with one executable script (oa), and one function in a utils.py file.
2. We installed it, and checked out what happened, where files were put, and that it worked.
3. We uninstalled it, and checked if things got cleaned up.
4. We put the files under version control, and tagged v0.0.1.

The package is currently uninstalled, and the repo should be clean. We are going to start making some changes now.

The `oa` script is not as reusable as we might like. The function in it does not need to be there. Let's move it to the utils.py file.  This requires us to change several files. In addition to moving the function, we have to move some imports, and modify the `__init__.py` file. Let's go ahead and do that.



In [60]:
%%writefile src/s23pack/utils.py 
import requests 
from collections.abc import Iterable 

def hello(name):
    print(f'Hi there {name}')
    

def openalex_institution(query):
    'query is a list of terms in the query, or a string.'
    if isinstance(query, str):
        query = '+'.join(query.split())

    # We assume it is an iterable of strings.
    elif isinstance(query, Iterable):
        query = '+'.join(query)
        
    url = f'https://api.openalex.org/institutions?search={query}'
    req = requests.get(url)
    data = req.json()

    return [f'{result["display_name"]:50s}{result["works_count"]:10d}{result["cited_by_count"]:10d}'
            for result in data['results']]

Overwriting src/s23pack/utils.py


In [61]:
%%writefile src/s23pack/main.py 
import click
from .utils import openalex_institution

@click.command(help='OpenAlex Institutions')
@click.argument('query', nargs=-1)
def main(query):
    print('\n'.join(openalex_institution(query)))

Overwriting src/s23pack/main.py


In [62]:
%%writefile src/s23pack/__init__.py
from .utils import hello, openalex_institution

Overwriting src/s23pack/__init__.py


# Reinstall the package after making the changes.
You probably need to restart the kernel after this.



In [63]:
! cd src && pip install .

Processing /home/jovyan/work/04-more-python-packaging/src
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: s23pack
  Building wheel for s23pack (setup.py) ... done
  Created wheel for s23pack: filename=s23pack-0.0.1-py3-none-any.whl size=3091 sha256=801701be7a324022b6a0d1b5b2d0d4813c398956adb68a2d125a56659f774ab1
  Stored in directory: /tmp/pip-ephem-wheel-cache-ptfk4xwn/wheels/37/b6/73/45c180b641b7519d80d22856d813192e549b29006c664f31bd
Successfully built s23pack
Installing collected packages: s23pack
Successfully installed s23pack-0.0.1
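
The restart is likely needed because Python caches every imported module in `sys.modules`, so a kernel that already imported the old `s23pack` keeps using it. The sketch here shows the caching with the standard-library `json` module standing in for our package.

```python
import importlib
import json
import sys

# A module that has been imported is cached in sys.modules...
cached = sys.modules['json']

# ...so a second import returns the cached object without re-reading
# the files on disk.
import json as json_again
same_object = json_again is cached

# importlib.reload re-executes a single module's code in place. It does not
# reload that module's submodules, which is one reason a kernel restart is
# the more dependable option for a package.
reloaded = importlib.reload(json)
print(same_object, reloaded is json)
```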


In [64]:
import s23pack
s23pack.hello('John')

Hi there John


In [65]:
# check that our function works.
s23pack.openalex_institution('carnegie+mellon')

['Carnegie Mellon University                            111876   4533093',
 'Carnegie Mellon University Qatar                         612      8180',
 'Carnegie Mellon University Australia                      81      1672',
 'Carnegie Mellon University Africa                        125       651']

In [66]:
# Check that the command still works
! oa carnegie mellon

Carnegie Mellon University                            111876   4533093
Carnegie Mellon University Qatar                         612      8180
Carnegie Mellon University Australia                      81      1672
Carnegie Mellon University Africa                        125       651


## Commit changes to git when everything is working.

You can see there are some new nuisance files (check the git gui) we should ignore. Let's take care of that. You can either edit the .gitignore file, or run this cell.



In [67]:
%%bash
echo -e "*checkpoint*" >> src/.gitignore

In [68]:
%%bash
cd src
git status

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   .gitignore
	modified:   s23pack/__init__.py
	modified:   s23pack/main.py
	modified:   s23pack/utils.py

no changes added to commit (use "git add" and/or "git commit -a")


Now, we can commit the results. It takes a little planning; I commit the .gitignore separately, since it is unrelated to the set of changes we make. Then, all that is left are the remaining files, so we commit them all at once. 



In [69]:
%%bash
cd src
git commit .gitignore -m "ignore checkpoint files"
git commit -am "move openalex_institutions function out of oa into utils.py"

[main e86c420] ignore checkpoint files
 1 file changed, 1 insertion(+)
[main 3651938] move openalex_institutions function out of oa into utils.py
 3 files changed, 12 insertions(+), 35 deletions(-)
 rewrite s23pack/main.py (79%)
 copy s23pack/{main.py => utils.py} (79%)


In [70]:
%%bash
cd src
git status



On branch main
nothing to commit, working tree clean


## Seeing older versions of files
We can see older versions of our files like this:



In [71]:
%%bash
cd src
git show v0.0.1:s23pack/utils.py

def hello(name):
    print(f'Hi there {name}')


Compare that to our current version. `HEAD` points to the commit that is currently checked out, which here is the most recent commit on `main`.



In [72]:
%%bash
cd src
git show HEAD:s23pack/utils.py

import requests 
from collections.abc import Iterable 

def hello(name):
    print(f'Hi there {name}')
    

def openalex_institution(query):
    'query is a list of terms in the query, or a string.'
    if isinstance(query, str):
        query = '+'.join(query.split())

    # We assume it is an iterable of strings.
    elif isinstance(query, Iterable):
        query = '+'.join(query)
        
    url = f'https://api.openalex.org/institutions?search={query}'
    req = requests.get(url)
    data = req.json()

    return [f'{result["display_name"]:50s}{result["works_count"]:10d}{result["cited_by_count"]:10d}'
            for result in data['results']]


In [73]:
%%bash
cd src
git log --oneline

3651938 move openalex_institutions function out of oa into utils.py
e86c420 ignore checkpoint files
20e1b0e First set of files
dfb156d add ignore files
46bd896 Initial readme.


You can also use a hash to indicate which version you want to see.



In [75]:
%%bash
cd src
git show 20e1b0e:s23pack/utils.py

def hello(name):
    print(f'Hi there {name}')


# Summary - take two
We have made our package a little better now. It still has the script, but it also has an importable function you can reuse in other applications, e.g. this notebook. There are a few things that pull this together:

1. setup.py has information about the package and script location for installing it.
2. utils.py has code that is imported in the oa script
3. `__init__.py` makes sure the function is imported and available

Leaving any of those details out makes something stop working.



# Testing

So far we have been testing by hand. That is moderately tedious... Every time we make changes, we have to go through and check if we broke something. We can set up some tests to help us with this.

Here is a simple test we can try.



In [76]:
%%writefile src/test_oa.py
import s23pack

def test_hello():
    assert s23pack.hello('John') == 'Hi there John'

Writing src/test_oa.py


We use [pytest](https://docs.pytest.org/en/7.2.x/contents.html) to run the test. You just run `pytest` at the command line.



In [77]:
%%bash
cd src
pytest

platform linux -- Python 3.9.13, pytest-7.3.2, pluggy-1.0.0
rootdir: /home/jovyan/work/04-more-python-packaging/src
plugins: anyio-3.6.1, nbmake-1.4.1
collected 1 item

test_oa.py F                                                             [100%]

__________________________________ test_hello __________________________________

    def test_hello():
>       assert s23pack.hello('John') == 'Hi there John'
E       AssertionError: assert None == 'Hi there John'
E        +  where None = <function hello at 0x7fc07f8cb430>('John')
E        +    where <function hello at 0x7fc07f8cb430> = s23pack.hello

test_oa.py:4: AssertionError
----------------------------- Captured stdout call -----------------------------
Hi there John
FAILED test_oa.py::test_hello - AssertionError: assert None == 'Hi there John'


CalledProcessError: Command 'b'cd src\npytest\n'' returned non-zero exit status 1.

Oh no! See if you can figure out the problem here. Fix the problem and commit the files to git. Note you will see some new files you should ignore in git.

The problem is that we only printed the result in `utils.hello`, and so the function returns None. You can fix it like this. It is moderately tedious that you have to reinstall the package after doing this. That is avoidable, but we save that for a later lesson.

In [80]:
%%writefile src/s23pack/utils.py 
import requests 
from collections.abc import Iterable 

def hello(name):
    return f'Hi there {name}'
    

def openalex_institution(query):
    'query is a list of terms in the query, or a string.'
    if isinstance(query, str):
        query = '+'.join(query.split())

    # We assume it is an iterable of strings.
    elif isinstance(query, Iterable):
        query = '+'.join(query)
        
    url = f'https://api.openalex.org/institutions?search={query}'
    req = requests.get(url)
    data = req.json()

    return [f'{result["display_name"]:50s}{result["works_count"]:10d}{result["cited_by_count"]:10d}'
            for result in data['results']]

Overwriting src/s23pack/utils.py
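
The difference between printing and returning can be seen in isolation. In this sketch the two function names are made up to put the old and new behavior side by side, and `contextlib.redirect_stdout` captures what the first one prints.

```python
import io
from contextlib import redirect_stdout


def hello_print(name):
    # Old behavior: writes to stdout and implicitly returns None.
    print(f'Hi there {name}')


def hello_return(name):
    # New behavior: hands the string back to the caller.
    return f'Hi there {name}'


buffer = io.StringIO()
with redirect_stdout(buffer):
    printed_result = hello_print('John')
captured = buffer.getvalue()

returned_result = hello_return('John')
```

An `assert ... == 'Hi there John'` can only pass against `returned_result`, because `printed_result` is `None` and the text went to stdout instead.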


In [81]:
%%bash
cd src
pip install . > /dev/null  # silent installation
pytest

platform linux -- Python 3.9.13, pytest-7.3.2, pluggy-1.0.0
rootdir: /home/jovyan/work/04-more-python-packaging/src
plugins: anyio-3.6.1, nbmake-1.4.1
collected 1 item

test_oa.py .                                                             [100%]



Re-read https://merely-useful.tech/py-rse/scripting.html on building Python functions and scripts.

Then, read https://merely-useful.tech/py-rse/packaging.html about Python packages. It is a little more involved than what we have done so far, but you should be in good shape to read about it now. We do not use virtual environments here. I think they add a layer of complexity we don't want now, and there are many complications in using them (mostly in the form of "what virtual environment am I in, and is it active?").

