Python packaging and version control
====================================

When you import a Python library, you are importing code from a module (a single .py file) or a package (a directory of modules). There is some magic that happens when you do this, because Python has to find that code. For example, consider this simple import of a core library. Where is that code?



In [1]:
import os

In [2]:
dir(os)

['CLD_CONTINUED',
 'CLD_DUMPED',
 'CLD_EXITED',
 'CLD_KILLED',
 'CLD_STOPPED',
 'CLD_TRAPPED',
 'DirEntry',
 'EX_CANTCREAT',
 'EX_CONFIG',
 'EX_DATAERR',
 'EX_IOERR',
 'EX_NOHOST',
 'EX_NOINPUT',
 'EX_NOPERM',
 'EX_NOUSER',
 'EX_OK',
 'EX_OSERR',
 'EX_OSFILE',
 'EX_PROTOCOL',
 'EX_SOFTWARE',
 'EX_TEMPFAIL',
 'EX_UNAVAILABLE',
 'EX_USAGE',
 'F_LOCK',
 'F_OK',
 'F_TEST',
 'F_TLOCK',
 'F_ULOCK',
 'GenericAlias',
 'Mapping',
 'MutableMapping',
 'NGROUPS_MAX',
 'O_ACCMODE',
 'O_APPEND',
 'O_ASYNC',
 'O_CLOEXEC',
 'O_CREAT',
 'O_DIRECT',
 'O_DIRECTORY',
 'O_DSYNC',
 'O_EXCL',
 'O_LARGEFILE',
 'O_NDELAY',
 'O_NOATIME',
 'O_NOCTTY',
 'O_NOFOLLOW',
 'O_NONBLOCK',
 'O_RDONLY',
 'O_RDWR',
 'O_RSYNC',
 'O_SYNC',
 'O_TRUNC',
 'O_WRONLY',
 'POSIX_FADV_DONTNEED',
 'POSIX_FADV_NOREUSE',
 'POSIX_FADV_NORMAL',
 'POSIX_FADV_RANDOM',
 'POSIX_FADV_SEQUENTIAL',
 'POSIX_FADV_WILLNEED',
 'POSIX_SPAWN_CLOSE',
 'POSIX_SPAWN_DUP2',
 'POSIX_SPAWN_OPEN',
 'PRIO_PGRP',
 'PRIO_PROCESS',
 'PRIO_USER',
 'P_ALL',
 'P_N

We can find where the code for that library resides using the `__file__` attribute.



In [3]:
os.__file__

'/opt/tljh/user/lib/python3.9/os.py'

In [5]:
! ls /opt/tljh/user/lib/python3.9/

abc.py			     _osx_support.py
aifc.py			     pathlib.py
_aix_support.py		     pdb.py
antigravity.py		     __phello__.foo.py
argparse.py		     pickle.py
ast.py			     pickletools.py
asynchat.py		     pipes.py
asyncio			     pkgutil.py
asyncore.py		     platform.py
base64.py		     plistlib.py
bdb.py			     poplib.py
binhex.py		     posixpath.py
bisect.py		     pprint.py
_bootlocale.py		     profile.py
_bootsubprocess.py	     pstats.py
bz2.py			     pty.py
calendar.py		     _py_abc.py
cgi.py			     __pycache__
cgitb.py		     pyclbr.py
chunk.py		     py_compile.py
cmd.py			     _pydecimal.py
codecs.py		     pydoc_data
codeop.py		     pydoc.py
code.py			     _pyio.py
collections		     queue.py
_collections_abc.py	     quopri.py
colorsys.py		     random.py
_compat_pickle.py	     reprlib.py
compileall.py		     re.py
_compression.py		     rlcompleter.py
concurrent		     runpy.py
config-3.9-x86_64-linux-gnu  sched.py
configparser.py		     secrets.py
contextlib.py		     selectors.py
contextvars.

We can import this file without saying where it is because Python has a list of directories it knows to look in. That list is `sys.path`, in the `sys` module. When you run `import os`, Python searches those directories in order and loads the first match for `os` (here, a file named os.py). The `PYTHONPATH` environment variable, if set, adds directories to this list; the next cell shows it is empty here. Your path may look different from this.



In [6]:
! echo $PYTHONPATH




In [7]:
import sys
sys.path

['/home/jupyter-jkitchin@andrew.cm-11dd7/s24-06643/sse/03-python-packaging',
 '/opt/tljh/user/lib/python39.zip',
 '/opt/tljh/user/lib/python3.9',
 '/opt/tljh/user/lib/python3.9/lib-dynload',
 '',
 '/home/jupyter-jkitchin@andrew.cm-11dd7/.local/lib/python3.9/site-packages',
 '/opt/tljh/user/lib/python3.9/site-packages']

In [9]:
! ls /home/jupyter-jkitchin@andrew.cm-11dd7/.local/lib/python3.9/site-packages

a11y_pygments
absl
absl_py-1.2.0.dist-info
accessible_pygments-0.0.4.dist-info
canvasapi
canvasapi-2.2.0.dist-info
easy-install.pth
etils
etils-0.7.1.dist-info
flake8_docstrings-1.7.0.dist-info
flake8_docstrings.py
importlib_resources
importlib_resources-5.9.0.dist-info
ipywidgets
ipywidgets-8.0.1.dist-info
jupyter_book
jupyter_book-0.15.1.dist-info
jupyter_cache
jupyter_cache-0.6.1.dist-info
jupyterlab-spreadsheet-editor
jupyterlab_spreadsheet_editor-0.6.1.dist-info
linkify_it
linkify_it_py-2.0.0.dist-info
markdown_it
markdown_it_py-2.2.0.dist-info
mdit_py_plugins
mdit_py_plugins-0.3.5.dist-info
mdurl
mdurl-0.1.2.dist-info
networkx
networkx-3.2.1.dist-info
opt_einsum
opt_einsum-3.3.0.dist-info
__pycache__
pycse
pycse-2.2.1.dist-info
pydata_sphinx_theme
pydata_sphinx_theme-0.13.3.dist-info
pydotplus
pydotplus-2.0.2.dist-info
rich
rich-13.3.3.dist-info
s23bib
s23bib-0.0.1.dist-info
s23oa
s23oa-0.0.1.dist-info
s23pack
s23pack-0.0.1.dist-info
s24pack
s24pack-0.0.1.dist-info
sbc.egg-link
s

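The loop below mimics that search: it walks `sys.path` in order and prints the first directory that contains an `sklearn` entry, which agrees with `sklearn.__file__`.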



In [13]:
for path in sys.path:
    if os.path.exists(os.path.join(path, 'sklearn')):
        print(path)
        break

/opt/tljh/user/lib/python3.9/site-packages


In [14]:
import sklearn
sklearn.__file__

'/opt/tljh/user/lib/python3.9/site-packages/sklearn/__init__.py'
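`__file__` is only available after a module has been imported. A minimal sketch using the standard-library function `importlib.util.find_spec`, which asks the import system where a top-level module *would* be loaded from without executing it; `locate` is a hypothetical helper name. For built-in or frozen modules, `origin` is a label such as `'built-in'` rather than a file path.

```python
import importlib.util


def locate(name):
    """Return where the top-level module `name` would be loaded from, or None.

    find_spec searches the same places `import` does (including sys.path),
    but it does not run the module's code.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # nothing on the search path provides this name
    return spec.origin  # a file path, or a label like 'built-in' / 'frozen'


for name in ["json", "os", "no_such_module_xyz"]:
    print(f"{name:20s} -> {locate(name)}")
```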

In [15]:
! env

SHELL=/bin/bash
JUPYTERHUB_ADMIN_ACCESS=1
JUPYTERHUB_API_TOKEN=a8dd8e0e271549629ad7c28fdeac5cc6
JUPYTERHUB_BASE_URL=/
PWD=/home/jupyter-jkitchin@andrew.cm-11dd7/s24-06643/sse/03-python-packaging
LOGNAME=jupyter-jkitchin@andrew.cm-11dd7
JUPYTERHUB_SERVER_NAME=
HOME=/home/jupyter-jkitchin@andrew.cm-11dd7
LANG=en_US.UTF-8
JPY_API_TOKEN=a8dd8e0e271549629ad7c28fdeac5cc6
JUPYTERHUB_SERVICE_PREFIX=/user/jkitchin@andrew.cmu.edu/
JUPYTERHUB_OAUTH_CALLBACK_URL=/user/jkitchin@andrew.cmu.edu/oauth_callback
CLICOLOR=1
INVOCATION_ID=6c6ce578298b4446b3ebf001ffde1b06
RUNTIME_DIRECTORY=/run/jupyter-jkitchin@andrew.cmu.edu
JPY_PARENT_PID=428309
KMP_DUPLICATE_LIB_OK=True
KMP_INIT_AT_FORK=FALSE
TERM=xterm-color
USER=jupyter-jkitchin@andrew.cm-11dd7
GIT_PAGER=cat
KITCHIN=True
SHLVL=0
PAGER=cat
JUPYTERHUB_API_URL=http://127.0.0.1:15001/hub/api
JUPYTERHUB_CLIENT_ID=jupyterhub-user-jkitchin%40andrew.cmu.edu
JUPYTERHUB_HOST=
MPLBACKEND=module://ipykernel.pylab.backend_inline
GIT_PYTHON_REFRESH=quiet
JOURNAL_ST

# Anatomy of a package

A Python package is a collection of files and directories that follow some conventions. It is common to keep the whole set in a single root directory. This isolates the package's files from other files and makes them easy to move later.

In the package root, you need several files:

- [README.md](./package-root/README.md) :: A text file describing the package
- [setup.py](./package-root/setup.py) :: A Python file for installing the package
- [LICENSE](./package-root/LICENSE) :: A file containing the terms of use for your package.

There are a lot of licenses: https://opensource.org/licenses. We will primarily focus on the MIT license.

We put the source for our package in a directory called *testpack* inside the package root.

In [17]:
! rm -fr package-root  # start clean
! mkdir package-root
! mkdir package-root/testpack

In [18]:
%%writefile package-root/README.md
Example package
===============

There is one function: testpack.hello.

Writing package-root/README.md


In [19]:
%%writefile package-root/LICENSE
Copyright 2024 John Kitchin

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Writing package-root/LICENSE


In [20]:
%%writefile package-root/setup.py
from setuptools import setup

setup(name='testpack',
      version='0.0.1',
      description='testpack utilities',
      maintainer='John Kitchin',
      maintainer_email='jkitchin@andrew.cmu.edu',
      license='MIT',
      packages=['testpack'],
      scripts=[],
      long_description='''\
testpack utilities
==================
Handy functions for a project.''')

Writing package-root/setup.py


Inside the testpack directory there must be an `__init__.py` file (this is what makes it a regular package), and optionally additional package source files (.py files). 

Check out [\_\_init__.py](./package-root/testpack/__init__.py). This file runs the first time you import the package in a Python session; later imports reuse the module cached in `sys.modules`. We define a single function in this file that we can use later, and a diagnostic line that prints when the package is loaded.

In [21]:
%%writefile package-root/testpack/__init__.py
print('Loading testpack! Version 1')

def hello(name):
    return f'Hello {name}'

Writing package-root/testpack/__init__.py


In [22]:
! tree package-root

package-root
├── LICENSE
├── README.md
├── setup.py
└── testpack
    └── __init__.py

1 directory, 4 files


We cannot directly import this package yet. Try it:



In [30]:
import testpack
testpack.__file__

'/home/jupyter-jkitchin@andrew.cm-11dd7/s24-06643/sse/03-python-packaging/package-root/testpack/__init__.py'

In a fresh kernel this import fails with `ModuleNotFoundError: No module named 'testpack'`, because testpack is not in any directory on your Python path. (The recorded output above shows a path instead, because in that kernel testpack was already importable.) Usually, we install a package to make it importable, but first we will modify the path manually for development purposes. `sys.path` is just a list of directories, so we can insert or append directories with ordinary list methods. The change lasts only as long as the current kernel. We use a relative path here, which assumes the working directory is the directory containing this notebook; unless you have changed it, it is. If in doubt, use an absolute path.



In [29]:
import sys
sys.path

['/home/jupyter-jkitchin@andrew.cm-11dd7/s24-06643/sse/03-python-packaging',
 '/opt/tljh/user/lib/python39.zip',
 '/opt/tljh/user/lib/python3.9',
 '/opt/tljh/user/lib/python3.9/lib-dynload',
 '',
 '/home/jupyter-jkitchin@andrew.cm-11dd7/.local/lib/python3.9/site-packages',
 '/opt/tljh/user/lib/python3.9/site-packages']

In [31]:
sys.path.insert(0, 'package-root')
print(sys.path)
import testpack

['package-root', '/home/jupyter-jkitchin@andrew.cm-11dd7/s24-06643/sse/03-python-packaging', '/opt/tljh/user/lib/python39.zip', '/opt/tljh/user/lib/python3.9', '/opt/tljh/user/lib/python3.9/lib-dynload', '', '/home/jupyter-jkitchin@andrew.cm-11dd7/.local/lib/python3.9/site-packages', '/opt/tljh/user/lib/python3.9/site-packages']


Now we can call the `hello` function defined in the `__init__.py` file, using dot notation (`testpack.hello`).



In [26]:
testpack.hello('Class')

'Hello Class'

In [32]:
sys.path.remove('package-root')
sys.path

['/home/jupyter-jkitchin@andrew.cm-11dd7/s24-06643/sse/03-python-packaging',
 '/opt/tljh/user/lib/python39.zip',
 '/opt/tljh/user/lib/python3.9',
 '/opt/tljh/user/lib/python3.9/lib-dynload',
 '',
 '/home/jupyter-jkitchin@andrew.cm-11dd7/.local/lib/python3.9/site-packages',
 '/opt/tljh/user/lib/python3.9/site-packages']
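Inserting and later removing the path entry by hand is easy to get wrong, for example if the import raises before you reach the cleanup cell. A minimal sketch of a context manager that always undoes the change; `prepended_to_path` is a hypothetical helper, built on the standard-library `contextlib.contextmanager`.

```python
import sys
from contextlib import contextmanager


@contextmanager
def prepended_to_path(directory):
    """Temporarily put `directory` at the front of sys.path.

    The entry is removed when the with-block exits, even if the block
    raises, so the change does not leak into later cells.
    """
    sys.path.insert(0, directory)
    try:
        yield
    finally:
        sys.path.remove(directory)  # removes the first (i.e. our) occurrence


# In this notebook, usage would look like:
# with prepended_to_path('package-root'):
#     import testpack
```

Note that only the path change is undone: a module imported inside the block stays cached in `sys.modules`.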

In [33]:
! pwd

/home/jupyter-jkitchin@andrew.cm-11dd7/s24-06643/sse/03-python-packaging


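Editing `sys.path` by hand is fine for a quick test, but the usual way to make a package importable is to install it. `pip install -e .` does an *editable* (development) install from the current directory: rather than copying the files into site-packages, it points Python at `package-root`, so edits to the source are picked up in a new kernel (or after a reload) without reinstalling.

In [35]: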
%%bash
cd package-root 
pip install -e .

Defaulting to user installation because normal site-packages is not writeable
Obtaining file:///home/jupyter-jkitchin%40andrew.cm-11dd7/s24-06643/sse/03-python-packaging/package-root
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Installing collected packages: testpack
  Running setup.py develop for testpack
Successfully installed testpack-0.0.1
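An installed distribution can also be queried from Python with the standard-library `importlib.metadata` module (Python 3.8+). A minimal sketch; `installed_version` is a hypothetical helper. After the editable install above, `installed_version("testpack")` should return `'0.0.1'`, the version declared in setup.py.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("testpack"))
```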


In [1]:
import testpack
testpack.__file__

Loading testpack! Version 1


'/home/jupyter-jkitchin@andrew.cm-11dd7/s24-06643/sse/03-python-packaging/package-root/testpack/__init__.py'

In [3]:
! pip uninstall testpack

Found existing installation: testpack 0.0.1
Uninstalling testpack-0.0.1:
  Successfully uninstalled testpack-0.0.1


In [5]:
! rm -r package-root/testpack.egg-info

# Version control

It is tempting to start modifying the package right away. That would probably be a mistake though. What if we do something that breaks it? How would we recover back to a working state? The solution to this problem is called *version control*. It is an essential part of software development. We will use git (https://git-scm.com/doc) for version control. 

With git, we will create a *repository* in our package-root. Then we can *commit* changes we make to files in the repository as we go. If some changes don't work out, we can *revert* them. We can also make *branches* to test ideas out on. 

To get started, we need to tell git about ourselves. Run these commands in a terminal, or in a `%%bash` cell as done here (obviously, change the name and email to yours):

    

In [6]:
%%bash
git config --global user.name "John Kitchin"
git config --global user.email johndoe@example.com

In [10]:
! git status

On branch main
Your branch is ahead of 'origin/main' by 11 commits.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   03-python-packaging.ipynb

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	../00-introduction/sorted-list.dat
	../00-introduction/untitled.txt
	../01-rest-api-openalex/test.py
	../05-python-classes/ris.txt
	../Untitled.ipynb

no changes added to commit (use "git add" and/or "git commit -a")


In [8]:
! git config -l

user.email=johndoe@example.com
user.name=John Kitchin
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
remote.origin.url=https://github.com/jkitchin/s24-06643
remote.origin.fetch=+refs/heads/main:refs/remotes/origin/main
branch.main.remote=origin
branch.main.merge=refs/heads/main


The `git config --global` commands should have created a file called ~/.gitconfig. Check out the contents:

    

In [9]:
! cat ~/.gitconfig 

[user]
	email = johndoe@example.com
	name = John Kitchin


Next, run this command to create a git repository.   

In [11]:
%%bash
cd package-root
git init

Initialized empty Git repository in /home/jupyter-jkitchin@andrew.cm-11dd7/s24-06643/sse/03-python-packaging/package-root/.git/


In [13]:
%%bash
tree -a package-root

package-root
├── .git
│   ├── branches
│   ├── config
│   ├── description
│   ├── HEAD
│   ├── hooks
│   │   ├── applypatch-msg.sample
│   │   ├── commit-msg.sample
│   │   ├── fsmonitor-watchman.sample
│   │   ├── post-update.sample
│   │   ├── pre-applypatch.sample
│   │   ├── pre-commit.sample
│   │   ├── pre-merge-commit.sample
│   │   ├── prepare-commit-msg.sample
│   │   ├── pre-push.sample
│   │   ├── pre-rebase.sample
│   │   ├── pre-receive.sample
│   │   └── update.sample
│   ├── info
│   │   └── exclude
│   ├── objects
│   │   ├── info
│   │   └── pack
│   └── refs
│       ├── heads
│       └── tags
├── .ipynb_checkpoints
│   └── README-checkpoint.md
├── LICENSE
├── README.md
├── setup.py
└── testpack
    ├── __init__.py
    ├── .ipynb_checkpoints
    │   └── __init__-checkpoint.py
    └── __pycache__
        └── __init__.cpython-39.pyc

14 directories, 23 files


    
The `git init` output above shows that git created a new directory called .git in package-root (it is also visible in the tree listing). This is where your git repository is stored. So far, there is nothing in it. Let's check the status.

In [14]:
%%bash
cd package-root
git status

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.ipynb_checkpoints/
	LICENSE
	README.md
	setup.py
	testpack/

nothing added to commit but untracked files present (use "git add" to track)


git is telling us that we are on the master branch and we have several untracked files. Today it is common for the default branch to be called `main` rather than master (https://www.theserverside.com/feature/Why-GitHub-renamed-its-master-branch-to-main). Let's change that. Since there are no commits yet, we simply check out a new branch called main. (With git 2.28 or newer, `git config --global init.defaultBranch main` makes new repositories start on main.)

    git checkout -b main
    



In [15]:
%%bash
cd package-root
git checkout -b main

Switched to a new branch 'main'


Now, we can add files. There are some files we want to ignore. For example, .ipynb_checkpoints does not need to be under version control, and there is a `__pycache__` directory we don't need in the repository. Let us set up a .gitignore file. This goes in the package-root directory. I do it here with the `%%writefile` cell magic, but you can also open an editor and write it directly. Afterwards, running `git status` should not show those files.

`%%writefile` overwrites the file each time it runs; `%%writefile -a` appends to it instead. In the shell, the equivalents are `>` (redirect output into a file, overwriting it) and `>>` (append).



In [16]:
%%writefile package-root/.gitignore
__pycache__
.ipynb_checkpoints

Writing package-root/.gitignore


In [17]:
%%bash
cd package-root
git status

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.gitignore
	LICENSE
	README.md
	setup.py
	testpack/

nothing added to commit but untracked files present (use "git add" to track)
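For a name with no slash, like the two patterns above, git checks the pattern against each component of a path, so ignoring a directory also ignores everything inside it. A minimal sketch of that rule using the standard-library `fnmatch`; `is_ignored` is a hypothetical helper and a deliberate simplification (no `!` negation, no patterns containing `/`). To ask git itself, use `git check-ignore -v <path>`.

```python
import fnmatch
from pathlib import PurePosixPath

# The patterns written to package-root/.gitignore above.
PATTERNS = ["__pycache__", ".ipynb_checkpoints"]


def is_ignored(path, patterns=PATTERNS):
    """Rough approximation of .gitignore matching for slash-free patterns.

    Each pattern is compared against every component of the path, so a
    matching directory name also ignores everything below it.
    """
    parts = PurePosixPath(path).parts
    return any(fnmatch.fnmatchcase(part, pat) for part in parts for pat in patterns)


for p in ["setup.py", "testpack/__init__.py", "testpack/__pycache__/__init__.cpython-39.pyc"]:
    print(p, is_ignored(p))
```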


The next step is to add and commit the files. Since we have set up the .gitignore file, we will take a shortcut this time, and add everything. Then, we commit the files.

    git add *
    git commit -m "First commit"



In [18]:
%%bash
cd package-root
git add *
git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   LICENSE
	new file:   README.md
	new file:   setup.py
	new file:   testpack/__init__.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.gitignore



In [19]:
%%bash
cd package-root
git commit -m "First commit"
git status

[main (root-commit) 33bfc38] First commit
 4 files changed, 29 insertions(+)
 create mode 100644 LICENSE
 create mode 100644 README.md
 create mode 100644 setup.py
 create mode 100644 testpack/__init__.py
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.gitignore

nothing added to commit but untracked files present (use "git add" to track)


In [23]:
! ls * 

03-python-packaging.ipynb

package-root:
LICENSE  README.md  setup.py  testpack


In [24]:
%%bash
cd package-root
git status

On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.gitignore

nothing added to commit but untracked files present (use "git add" to track)


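Note that the wildcard did not match the .gitignore file. The shell, not git, expands `*`, and by default it skips names that start with a dot. (`git add .` would have staged the whole directory, dotfiles included.) We have to add and commit .gitignore separately.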
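Python's `glob` module treats a leading dot the same way bash does, which makes the behaviour easy to check. A minimal sketch that recreates the top level of package-root with empty placeholder files in a temporary directory and compares a `*` match against a full directory listing.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Empty stand-ins for the files at the top of package-root.
    for name in [".gitignore", "LICENSE", "README.md", "setup.py"]:
        open(os.path.join(d, name), "w").close()

    # "*" skips names beginning with a dot, just like the shell's wildcard.
    star = sorted(os.path.basename(p) for p in glob.glob(os.path.join(glob.escape(d), "*")))
    # A plain directory listing includes the dotfile.
    everything = sorted(os.listdir(d))

print("*       ->", star)
print("listdir ->", everything)
```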



In [25]:
%%bash
cd package-root
git add .gitignore
git commit -m "Add the .gitignore file"
git status

[main d60276d] Add the .gitignore file
 1 file changed, 2 insertions(+)
 create mode 100644 .gitignore
On branch main
nothing to commit, working tree clean


Now we have a "clean" repository. All files are added and committed, and `git status` tells us everything is good. We have made two commits so far.



In [26]:
%%bash
cd package-root
git log

commit d60276d67d05e3b2dd552fbaa14c281e065c951d
Author: John Kitchin <johndoe@example.com>
Date:   Wed Mar 20 18:48:57 2024 +0000

    Add the .gitignore file

commit 33bfc388619ea6d316265bdf7f4ca2511de7aedb
Author: John Kitchin <johndoe@example.com>
Date:   Wed Mar 20 18:47:28 2024 +0000

    First commit


In the log, you can see the two commits, and each one is identified by a long hash, e.g. commit 33bfc388619ea6d316265bdf7f4ca2511de7aedb. This is a cryptographic hash (SHA-1 by default) of the commit, which covers the snapshot of the files along with the author, date, message, and parent commit, so it uniquely identifies that commit. We can use it to see what changed, to revert changes, etc. We will return to that later. Now, we are ready to safely make some changes to our package. By safely, I mean we will be able to undo changes, revert changes, see what changes were made, etc.
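Git names every object it stores by such a hash. For the simplest kind of object, a *blob* (the contents of one file), the hash is taken over a short header, `blob <size>` followed by a NUL byte, and then the file's bytes. A minimal sketch of what `git hash-object` computes in a default (SHA-1) repository; `git_blob_hash` is a hypothetical helper. Commit ids are computed the same way, over the commit object instead of a file.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with these bytes."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the empty file
print(git_blob_hash(b"test content\n"))
```

Running `echo 'test content' | git hash-object --stdin` in a terminal should print the same id as the second line.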



In [28]:
%%bash
cd package-root
pip install -e .

Defaulting to user installation because normal site-packages is not writeable
Obtaining file:///home/jupyter-jkitchin%40andrew.cm-11dd7/s24-06643/sse/03-python-packaging/package-root
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Installing collected packages: testpack
  Running setup.py develop for testpack
Successfully installed testpack-0.0.1


In [4]:
import testpack

Loading testpack! Version 1


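If you edit a tracked file and want to throw the edits away, `git checkout <file>` restores it to the last committed version. (Newer versions of git also provide `git restore <file>` for this, as the hints in the `git status` output suggest.) Here we discard an edit to `testpack/__init__.py`:

In [3]: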
%%bash
cd package-root
git checkout testpack/__init__.py
git status

Updated 1 path from the index


On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	testpack.egg-info/

nothing added to commit but untracked files present (use "git add" to track)


# First package modification

There are lots of ways to use git. Here we explore the idea of using a *feature branch*. We have a working package, and we want to add a new feature in a way that minimizes the risk of messing up the current state. The strategy is to make a new branch, do all our work there, and when we are satisfied with it, merge it back into main.

Let's see what we have so far. Our commit history is linear, and the current position is at the HEAD commit on `main`.



In [5]:
! cd package-root; git log --graph --oneline

* d60276d (HEAD -> main) Add the .gitignore file
* 33bfc38 First commit


## A feature branch
We are going to check out a new branch called `feature`.



In [7]:
! pwd

/home/jupyter-jkitchin@andrew.cm-11dd7/s24-06643/sse/03-python-packaging


In [6]:
%%bash
cd package-root
git checkout -b feature
git status
git log --graph --oneline

Switched to a new branch 'feature'


On branch feature
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	testpack.egg-info/

nothing added to commit but untracked files present (use "git add" to track)
* d60276d Add the .gitignore file
* 33bfc38 First commit


Now we can add some new features. Let's add a new function to the `__init__.py` file:

In [8]:
%%writefile -a package-root/testpack/__init__.py

def goodbye(name):
    return f'Goodbye {name}'

Appending to package-root/testpack/__init__.py


After you add that, save the file, and check your git status:

In [11]:
%%bash
cd package-root
git status

On branch feature
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   .gitignore
	modified:   testpack/__init__.py

no changes added to commit (use "git add" and/or "git commit -a")


This is telling us two things:
1. We are on the feature branch
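2. Two files are modified: `testpack/__init__.py` and `.gitignore`. The `.gitignore` change comes from the next cell, which appends `*.egg-info` so that the `testpack.egg-info/` directory created by `pip install -e` is ignored; that cell was run before this status check.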
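The `git status` report is written for people. When a script needs the same information, `git status --porcelain` prints a stable, machine-readable form: one line per path, two status characters (staging area, then working tree), a space, and the path; untracked files are marked `??`. A minimal sketch of a parser, assuming simple paths (renames, printed as `old -> new`, and quoted paths are not handled); `parse_porcelain` and `git_status` are hypothetical helper names.

```python
import subprocess


def parse_porcelain(text):
    """Turn `git status --porcelain` (v1) output into (status, path) pairs."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        # Columns 0-1 are the status codes, column 2 is a space, the rest is the path.
        entries.append((line[:2], line[3:]))
    return entries


def git_status(repo="."):
    """Run git in `repo` and parse its porcelain status (requires git on PATH)."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)


# The state reported above, as porcelain output would show it:
sample = " M .gitignore\n M testpack/__init__.py\n"
print(parse_porcelain(sample))
```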

Now, let's commit these changes. Passing a file name to `git commit` commits just that file, so each change gets its own commit.



In [10]:
%%writefile -a package-root/.gitignore
*.egg-info

Appending to package-root/.gitignore


In [12]:
%%bash
cd package-root
git commit testpack/__init__.py -m "Add a new function"
git commit .gitignore -m "ignore the egg-info"
git log --graph --oneline

[feature 79cf89b] Add a new function
 1 file changed, 3 insertions(+)
[feature db1b581] ignore the egg-info
 1 file changed, 1 insertion(+)
* db1b581 ignore the egg-info
* 79cf89b Add a new function
* d60276d Add the .gitignore file
* 33bfc38 First commit


## Back and forth on branches

Before we go further, let's see that we can go back to the main branch where that addition does not exist, and then come back. First, we see what is in the file right now.



In [13]:
%%bash
cd package-root
git status
cat testpack/__init__.py

On branch feature
nothing to commit, working tree clean
print('Loading testpack! Version 1')

def hello(name):
    return f'Hello {name}'

def goodbye(name):
    return f'Goodbye {name}'


Now, we checkout the main branch. The change we made does not exist there.



In [15]:
%%bash
cd package-root
git checkout main
git status
cat testpack/__init__.py
cat .gitignore

Already on 'main'


On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	testpack.egg-info/

nothing added to commit but untracked files present (use "git add" to track)
print('Loading testpack! Version 1')

def hello(name):
    return f'Hello {name}'
__pycache__
.ipynb_checkpoints


Switching back to the feature branch brings the new function back, along with the `*.egg-info` line in .gitignore.



In [16]:
%%bash
cd package-root
git checkout feature
git status
cat testpack/__init__.py
cat .gitignore

Switched to branch 'feature'


On branch feature
nothing to commit, working tree clean
print('Loading testpack! Version 1')

def hello(name):
    return f'Hello {name}'

def goodbye(name):
    return f'Goodbye {name}'
__pycache__
.ipynb_checkpoints
*.egg-info


In [18]:
! cd package-root; git log --graph --oneline

* db1b581 (HEAD -> feature) ignore the egg-info
* 79cf89b Add a new function
* d60276d (main) Add the .gitignore file
* 33bfc38 First commit


## Add a commit on main

git allows us to have many branches where we can add features, fix bugs, try new implementations, etc. Work can continue on several branches in parallel, independently of each other. For example, let's go back to the main branch to add some detail to the README.



In [19]:
%%bash
cd package-root
git checkout main
echo -e "\n\nThere is one function: testpack.hello." >> README.md
git commit README.md -m "document the function in the package"
git log --graph --oneline

Switched to branch 'main'


[main 2addc8c] document the function in the package
 1 file changed, 3 insertions(+)
* 2addc8c document the function in the package
* d60276d Add the .gitignore file
* 33bfc38 First commit


If we switch back to our feature branch, you will see that this new change does not exist.



In [24]:
%%bash
cd package-root
git checkout feature
cat README.md
git log --graph --oneline

Switched to branch 'feature'


Example package

There is one function: testpack.hello.
* db1b581 ignore the egg-info
* 79cf89b Add a new function
* d60276d Add the .gitignore file
* 33bfc38 First commit


In [26]:
%%bash
cd package-root
git checkout main
cat README.md
 git log --graph --oneline

Already on 'main'


Example package

There is one function: testpack.hello.


There is one function: testpack.hello.
* 2addc8c document the function in the package
* d60276d Add the .gitignore file
* 33bfc38 First commit


## Merge main into the feature branch

Before we continue, we should merge the new change in main into our feature branch. 



In [27]:
%%bash
cd package-root
git checkout feature
git merge main
git log --graph --oneline

Switched to branch 'feature'


Merge made by the 'recursive' strategy.
 README.md | 3 +++
 1 file changed, 3 insertions(+)
*   c8c84e8 Merge branch 'main' into feature
|\  
| * 2addc8c document the function in the package
* | db1b581 ignore the egg-info
* | 79cf89b Add a new function
|/  
* d60276d Add the .gitignore file
* 33bfc38 First commit


Now, we can finish up our feature branch. Add some text to README.md about the new function, then commit the change. In the log further below, this step appears as the `fixed documentation` commit.



In [28]:
%%bash
cd package-root
git status

On branch feature
nothing to commit, working tree clean


Finally, when you are satisfied with the feature branch, go back to the main branch and merge the feature into it. If you are done with the branch, it is good practice to delete it. 



In [29]:
%%bash
cd package-root
git checkout main
git merge feature
git branch --delete feature
git log --graph --oneline

Switched to branch 'main'


Updating 2addc8c..aaa0dc0
Fast-forward
 .gitignore           | 1 +
 README.md            | 2 +-
 testpack/__init__.py | 3 +++
 3 files changed, 5 insertions(+), 1 deletion(-)
Deleted branch feature (was aaa0dc0).
* aaa0dc0 fixed documentation
*   c8c84e8 Merge branch 'main' into feature
|\  
| * 2addc8c document the function in the package
* | db1b581 ignore the egg-info
* | 79cf89b Add a new function
|/  
* d60276d Add the .gitignore file
* 33bfc38 First commit


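Let's take some time to review what this git log shows. You can see there was some branching, with commits on different branches, and the merge commit c8c84e8 where main was merged into the feature branch. The final merge of feature into main did not create a new commit: git reported a *fast-forward*, because main had no commits that were not already on feature, so git simply moved main forward to aaa0dc0.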
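Whether a merge needs a merge commit depends only on ancestry in the commit graph. A minimal sketch of that decision, using the short hashes and parent links from the log above as a small in-memory graph; `is_ancestor` and `merge_kind` are hypothetical names (git's own check is `git merge-base --is-ancestor`).

```python
# Parent links copied from the `git log --graph` output above.
# c8c84e8 is the merge commit, so it has two parents.
PARENTS = {
    "33bfc38": [],
    "d60276d": ["33bfc38"],
    "79cf89b": ["d60276d"],
    "db1b581": ["79cf89b"],
    "2addc8c": ["d60276d"],
    "c8c84e8": ["db1b581", "2addc8c"],
    "aaa0dc0": ["c8c84e8"],
}


def is_ancestor(a, b, parents=PARENTS):
    """True if commit `a` is reachable from commit `b` by following parent links."""
    stack, seen = [b], set()
    while stack:
        commit = stack.pop()
        if commit == a:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def merge_kind(current_tip, other_tip):
    """Describe what merging `other_tip` into the branch at `current_tip` requires."""
    if is_ancestor(other_tip, current_tip):
        return "already up to date"
    if is_ancestor(current_tip, other_tip):
        return "fast-forward"
    return "merge commit"


print(merge_kind("db1b581", "2addc8c"))  # on feature, merging main (first merge)
print(merge_kind("2addc8c", "aaa0dc0"))  # on main, merging feature (final merge)
```

The first call reports `merge commit` and the second `fast-forward`, matching what git printed for the two merges.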



In [30]:
# Check we don't have the branch anymore
! cd package-root; git branch -a

* main


# Try the new python function

We might naively just try it, but it does not work.



In [31]:
testpack.goodbye('John')

AttributeError: module 'testpack' has no attribute 'goodbye'

It is necessary to reload the package (or restart the kernel). Python runs a module's code only on the first import in a session and caches the result in `sys.modules`, so the running Jupyter kernel still holds the version of testpack it imported before `goodbye` existed. `importlib.reload` re-executes the package's `__init__.py`:



In [32]:
import importlib
importlib.reload(testpack)
testpack.goodbye('John')

Loading testpack! Version 1


'Goodbye John'
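A minimal sketch of this import-once behaviour, using a throwaway package in a temporary directory instead of testpack. The package name `demo_pkg` and its `RUNS` counter are hypothetical devices that record how many times `__init__.py` has executed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.modules.pop("demo_pkg", None)  # start from a clean slate if re-run

# Build a throwaway package whose __init__.py counts its own executions.
root = Path(tempfile.mkdtemp())
(root / "demo_pkg").mkdir()
(root / "demo_pkg" / "__init__.py").write_text(
    "RUNS = globals().get('RUNS', 0) + 1\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the import system sees the new files

import demo_pkg
first = demo_pkg.RUNS  # __init__.py ran on the first import

import demo_pkg  # found in sys.modules, so __init__.py does not run again
second = demo_pkg.RUNS

importlib.reload(demo_pkg)  # re-executes __init__.py in the same module object
third = demo_pkg.RUNS

sys.path.remove(str(root))
print(first, second, third)
```

Because `reload` reuses the existing module object, names deleted from the source are not removed from the module; restarting the kernel is the way to get a completely fresh import.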

In [34]:
! cd package-root; git log --graph

* commit aaa0dc051abe98bb4959b794f7fe6dc84b396f5c (HEAD -> main)
| Author: John Kitchin <johndoe@example.com>
| Date:   Wed Mar 20 19:17:57 2024 +0000
| 
|     fixed documentation
|   
*   commit c8c84e88eeb0ef3b34e47096441365a16779b4cf
|\  Merge: db1b581 2addc8c
| | Author: John Kitchin <johndoe@example.com>
| | Date:   Wed Mar 20 19:15:34 2024 +0000
| | 
| |     Merge branch 'main' into feature
| | 
| * commit 2addc8cb2991f792b49aee778c036aacda7a2d7e
| | Author: John Kitchin <johndoe@example.com>
| | Date:   Wed Mar 20 19:13:02 2024 +0000
| | 
| |     document the function in the package
| | 
* | commit db1b58107ad04c57d9edddaf76cfa02cc14089dc
| | Author: John Kitchin <john

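You can also view a file as it was at any earlier commit with `git show <commit>:<path>`. Here is .gitignore as of commit d60276d, before `*.egg-info` was added:

In [40]: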
! cd package-root; git show d60276d6:.gitignore

__pycache__
.ipynb_checkpoints


# Summary

We learned how to:

1. Initialize a git repo
2. Add files and commit them to the repo
3. Edit files and commit changes
4. Create a feature branch
5. Make changes on the feature branch
6. Switch between branches
7. Merge changes in branches
8. Delete a feature branch
9. Look at the commit log

git is an iceberg. You can learn a lot more from the [Pro Git book](https://git-scm.com/book/en/v2) and the [reference manual](https://git-scm.com/docs).

You should also read [https://third-bit.com/py-rse/git-advanced.html](https://third-bit.com/py-rse/git-advanced.html).

Today we learned about using branches to try making a change. The nice thing about branches is if you don't like the change, you can simply delete the branch, or go back to the main branch. If you do like it, then you just merge it in, and get on with your work.

There is still quite a bit to learn about git. We will get into some of these things next time, including dealing with merge conflicts.

