In [None]:
rm -rf repository

# introduction to `git`

`git` – and **v**ersion **c**ontrol **s**ystems (VCS) in general – remember the changes of files in "commits", which contain metadata and a "diff", the changes between two versions of a file.

For two versions of the same file, a set of differences can be computed. This set of differences is called a "diff", and applying a diff to a file is called "patching".
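
As a rough illustration outside of `git`, the classic `diff` and `patch` tools do exactly this. The sketch below assumes both tools are installed; the file names are made up for the example.

```bash
demo=$(mktemp -d)                      # scratch directory, safe to delete
printf 'a\nb\n' > "$demo/old.txt"      # first version
printf 'a\nc\n' > "$demo/new.txt"      # second version

# compute the diff; `diff` exits with status 1 when the files differ
diff -u "$demo/old.txt" "$demo/new.txt" > "$demo/change.diff" || true
cat "$demo/change.diff"

# patch the first version with the diff, turning it into the second version
patch -s "$demo/old.txt" "$demo/change.diff"
cmp "$demo/old.txt" "$demo/new.txt" && echo "old.txt now matches new.txt"
```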

Each commit is also uniquely identified by the commit hash, a checksum (SHA-1 by default) computed from the commit's content and metadata. An example of such a hash is `ded105a62b9d78717f8dc64652e3903190b585dd`.

Since hash values are not easy to remember and type, there are two forms of human-readable labels: tags, or static labels, and branches, or dynamic labels. For example, in the following graph:

```mermaid
gitGraph
    commit
    commit
    branch feature-branch
    checkout main
    commit
    checkout feature-branch
    commit
    commit
    branch feature-branch2
    commit
    commit
    checkout main
    merge feature-branch
    commit tag: "v0.3.1"
    checkout feature-branch2
    merge main
    commit
    checkout main
    commit
```

`main`, `feature-branch` and `feature-branch2` are branches (the white nodes are merge commits with multiple parents), and `v0.3.1` is a tag.
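
The difference between static and dynamic labels can be sketched in a throwaway repository. The tag name `v0.1.0` and the user details are made up for the example, and `git init -b` assumes git 2.28 or newer.

```bash
demo=$(mktemp -d)
git init -q -b main "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"

git -C "$demo" commit -q --allow-empty -m "first"
first=$(git -C "$demo" rev-parse HEAD)     # full hash of the first commit
git -C "$demo" tag v0.1.0                  # static label on the first commit

git -C "$demo" commit -q --allow-empty -m "second"

# the tag still points at the first commit, the branch moved along
tag_hash=$(git -C "$demo" rev-parse v0.1.0)
branch_hash=$(git -C "$demo" rev-parse main)
echo "tag v0.1.0  -> $tag_hash"
echo "branch main -> $branch_hash"
```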

For more extensive explanations see the official [Git Book (website)](https://git-scm.com/book/en/v2).

With all that in mind, let's start by creating a repository:

## creating a repository

Repositories can be created using two methods:
- if we want to create a new repository: `git init`
- if we want to help with a repository that already exists: `git clone`
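
Since the rest of this notebook uses `git init`, here is a minimal sketch of `git clone`. It clones from a local scratch repository so that it works without network access; with a forge, the first argument would be a URL instead.

```bash
demo=$(mktemp -d)
git init -q "$demo/source"
git -C "$demo/source" config user.name "Demo User"
git -C "$demo/source" config user.email "demo@example.com"
git -C "$demo/source" commit -q --allow-empty -m "first commit"

# cloning copies the history and records the source as the remote `origin`
git clone -q "$demo/source" "$demo/copy"
origin_url=$(git -C "$demo/copy" remote get-url origin)
commit_count=$(git -C "$demo/copy" rev-list --count HEAD)
echo "origin: $origin_url, commits: $commit_count"
```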

### repository initialization

In [None]:
mkdir repository
cd repository

In [None]:
git init
# or `git init .`; add `-b main` (git >= 2.28) if your default branch is not `main`, since later cells use that name

We can also do the same thing with:
```bash
git init repository
cd repository
```

Next, we need to configure the repository: since `git` was designed to allow collaboration with other people, we need to tell it our name and email address so it knows who authored what. This information will be used to fill in the author and the (last) committer's information of a commit (we'll see what this is used for in the next section).

To do this, we use the `git config` command.

:::{note}

We're using the `--local` flag for `git config`. This flag, together with `--global`, `--system`, and `-f` / `--file`, controls the configuration file we write to:
- `--local` selects `.git/config`
- `--global` selects `~/.gitconfig`
- `--system` selects `/etc/gitconfig`
- `-f` / `--file` allows specifying a custom location

`--local` is the default when setting configuration values. When reading, `git config` reads all configuration files and merges them (local overrides global, which in turn overrides system).

:::
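
The lookup order can be checked in a scratch repository. This sketch only touches `--local` and `--file` so that it does not modify the real global or system configuration; the names and the path `custom.cfg` are made up.

```bash
demo=$(mktemp -d)
git init -q "$demo/repo"

# --local writes to <repo>/.git/config
git -C "$demo/repo" config --local user.name "Local Name"
# --file writes to an arbitrary file
git config --file "$demo/custom.cfg" user.name "Custom Name"

# reading without a scope flag merges system, global and local; local wins
effective=$(git -C "$demo/repo" config --get user.name)
custom=$(git config --file "$demo/custom.cfg" --get user.name)
echo "effective: $effective, custom file: $custom"

# --show-origin prints the file each value was read from
git -C "$demo/repo" config --list --show-origin
```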

In [None]:
cat .git/config

In [None]:
git config --local --get-regexp 'user.'

In [None]:
git config user.name "The user's name"
git config user.email "user@example.com"

In [None]:
git config --local --get-regexp 'user.'

In [None]:
cat .git/config

## commits

git remembers changes to files (be that creating, modifying, or deleting) in the form of commits. To see the components of a commit, see [this section](#commit-contents).

A newly created repository will not have any commits at all, which we can verify by running `git status`:

In [None]:
git status

Use this every time you're not sure about the state of the repository.

### creating commits and the staging area

git calls the visible directory the "working tree" or "working directory" (abbreviated as "workdir" here), and that's where we make changes using `jupyter lab` or text editors.

We can select changes to commit using `git add`, which adds them to the staging area. This allows us to use multiple calls to `git add` until we're content with the changes to commit. If there's anything we want to take out of the staging area again, we can do so using `git rm --cached` for newly added files, or `git restore --staged` once the repository has at least one commit. Both leave the file in the workdir untouched. Once we're happy with the changes in the staging area, we can commit them using `git commit`.

You can see the relationship between workdir, stage, and commits here:
```mermaid
graph LR
    A((workdir)) -- git add --> B((stage))
    B -- git rm --> A
    B -- git commit --> C((commit))
    C -- git reset --> B
```
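
The walkthrough below never takes anything out of the staging area again, so here is a small sketch of that step in a scratch repository. The file name `notes.txt` is made up; before the first commit `git status` suggests `git rm --cached` for this, afterwards `git restore --staged`.

```bash
demo=$(mktemp -d)
git init -q "$demo"
echo "draft" > "$demo/notes.txt"

git -C "$demo" add notes.txt
staged_before=$(git -C "$demo" diff --staged --name-only)

# remove the file from the staging area only; the workdir copy stays
git -C "$demo" rm -q --cached notes.txt
staged_after=$(git -C "$demo" diff --staged --name-only)

echo "staged before: '$staged_before', staged after: '$staged_after'"
ls "$demo"
```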

Let's start by creating two files:

In [None]:
echo "a" > file1
echo "b" > file2
git status

git tells us that there are 2 untracked files, and helpfully advises us to use `git add` to track them. Let's start with one of the files:

In [None]:
git add file1
git status

We can look at the changes using `git diff`:

In [None]:
git diff --staged  # for changes in the staging area

This is called a "unified diff", a text format that encodes changes between files. The most important bits are:
- we compare the first version (`a/file1`) of `file1` with a second version (`b/file1`)
- `/dev/null` is a marker for "does not exist": `file1` did not exist before
- the hunk header `@@ -0,0 +1 @@` says that the old version had no lines and the new version has one line, which the `+a` line inserts

:::{note}
Running `git diff` without `--staged` (next cell) doesn't print any changes in the workdir, even though `git status` marks `file2` in red. The reason is that `git diff` only shows changes to tracked files, and since `file2` is not tracked yet, `git` doesn't consider it. Files added to the staging area count as "tracked", so modifying `file1` will show up as a workdir change.
:::

In [None]:
git diff  # for changes in the workdir

Instead, let's try changing the tracked `file1`:

In [None]:
echo "c" >> file1
git diff

Let's look again at the state of the repository:

In [None]:
git status

We don't want to keep the new changes to `file1`, so as `git` suggests, let's use `git restore`:

In [None]:
git restore file1

In [None]:
git status

We do want to track `file2`, though, so let's add that, as well:

In [None]:
git add file2
git status

If we're happy, we can commit:
:::{note}
If we don't add `-m <message>`, this would usually open an editor, which doesn't work too well in a notebook.
:::

In [None]:
git commit -m "first commit"

In [None]:
git status

### commit contents

Commits were originally built on emails (people used to mail around diffs), so they consist of:
- the creation time
- the author (the user first creating this changeset) in the form of `User <email-address>`
- the time of last modification
- the committer (the user who last modified the commit) in the form `User <email-address>`
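- the hash values of the parent commits (none for the very first commit, one for ordinary commits, two or more for merge commits)
- the commit message
- the changeset, which we view as a diff (a text representation of the changes); internally, `git` stores a snapshot of all tracked files and computes the diff against the parent on demand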
- a hash of all that information as a unique id (the current commit's id)
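
Most of these components can be inspected directly. The following sketch uses a scratch repository with made-up user details; `git cat-file -p` prints the raw commit object, and the `--format` placeholders pick out single fields.

```bash
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"
git -C "$demo" commit -q --allow-empty -m "demo commit"

# raw commit object: tree, parent(s), author, committer, message
git -C "$demo" cat-file -p HEAD

author=$(git -C "$demo" log -1 --format='%an <%ae>')
committer=$(git -C "$demo" log -1 --format='%cn <%ce>')
parents=$(git -C "$demo" log -1 --format='%P')   # empty for the first commit
echo "author: $author"
echo "committer: $committer"
echo "parents: '$parents'"
```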

#### The commit message

By convention, the commit message consists of:
- a one-line summary of the changes within the commit (the recommendation is to keep that below ~70 characters)
- optionally more text separated from the summary by a blank line
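
Without an editor, one way to follow this convention is to pass `-m` twice: `git` joins the values as separate paragraphs. The messages in this sketch are made up.

```bash
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"

# first -m: summary line, second -m: body (separated by a blank line)
git -C "$demo" commit -q --allow-empty \
    -m "Add placeholder commit" \
    -m "Explain here why the change was needed."

subject=$(git -C "$demo" log -1 --format=%s)
body=$(git -C "$demo" log -1 --format=%b)
echo "summary: $subject"
echo "body:    $body"
```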

## inspecting the history of changes

With at least one commit in the repository, we can look at the history of commits using `git log`:

In [None]:
git log

Let's try creating a few other commits:

In [None]:
for i in {1..10}; do
    echo "$i" > file3
    git add file3
    git commit -m "change ${i} to file3"
done

In [None]:
git status

In [None]:
git log

To see the changes of one commit, we can use `git show`:

In [None]:
git show -p

To see the changes of all commits, we can use `git log -p`:

In [None]:
git log -p
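
In longer histories it helps to narrow the output down. As a sketch (scratch repository, made-up file names), `-n` limits the number of commits, and a path after `--` restricts the log to commits touching that file.

```bash
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"

for name in a b a; do
    echo "edit" >> "$demo/$name.txt"
    git -C "$demo" add "$name.txt"
    git -C "$demo" commit -q -m "edit $name.txt"
done

git -C "$demo" log --oneline -n 2         # only the two most recent commits
git -C "$demo" log --oneline -- a.txt     # only commits that touched a.txt

count_a=$(git -C "$demo" rev-list --count HEAD -- a.txt)
total=$(git -C "$demo" rev-list --count HEAD)
echo "$count_a of $total commits touched a.txt"
```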

## Trying multiple things at the same time: branches

While working on one thing, something else might pop up that needs immediate attention. We might also run out of steam or hit a roadblock on one feature and want to postpone it until that has been resolved, or someone else might be working on the same project.

`git` has a feature called `branch` that allows "branching out" from a commit and keeping multiple chains of commits. For example:
```mermaid
gitGraph
    commit
    commit
    branch feature-branch
    commit
    commit
    commit
    checkout main
    commit
    checkout feature-branch
    commit
    commit
    checkout main
    merge feature-branch
    commit
```
In this case, we start at commit 0 and create commit 1. Then we branch out into "feature-branch" and create three more commits. After that, we go back to the original branch (commonly "main") and create a new commit there, then add two more commits to "feature-branch". Finally, we merge "feature-branch" back into "main" and create a new commit afterwards.

The commands we can use for this are:

- `git branch` to create or interrogate branches
- `git switch` and `git checkout` to switch between branches / commits
- `git merge` to merge branches

### Creating, manipulating or interrogating branches

We can create a new branch using `git branch <name>`:

In [None]:
git status

In [None]:
git log --oneline --graph

In [None]:
git branch feature-branch

In [None]:
git status

To then work on the branch, we have to switch to it:

In [None]:
git switch feature-branch

In [None]:
git status
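
Creating a branch and switching to it can also be done in one step with `git switch -c` (or the older `git checkout -b`). A minimal sketch in a scratch repository, assuming git 2.28 or newer for `git init -b`; the branch name `quick-fix` is made up:

```bash
demo=$(mktemp -d)
git init -q -b main "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"
git -C "$demo" commit -q --allow-empty -m "first commit"

# same as `git branch quick-fix` followed by `git switch quick-fix`
git -C "$demo" switch -q -c quick-fix
current=$(git -C "$demo" branch --show-current)
echo "now on: $current"
```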

We can then add a few more commits:

In [None]:
echo "1" > file1; git add file1; git commit -m "branch commit 1"
echo "2" > file1; git add file1; git commit -m "branch commit 2"

In [None]:
git status

In [None]:
git log --oneline --graph

We can look at the existing branches (the one marked with a `*` is the current branch):

In [None]:
git branch

For more information, add `-v`, and to include remote branches as well, use `-a` (together: `-av`).

`git log` can take a branch (or a commit hash / tag) to display the history of something that is not currently checked out:

In [None]:
git log --oneline --graph feature-branch

To delete branches, use `git branch -d` (or `-D` to avoid `git`'s safety checks):

In [None]:
git branch feature2
git branch
git status
git branch -d feature2

In [None]:
git branch

In [None]:
git branch feature2
git switch feature2
git branch -d feature2  # fails: git refuses to delete the branch that is currently checked out
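
The safety check that `-D` skips is about unmerged work: `-d` refuses to delete a branch whose commits are not reachable from the current branch. A sketch in a scratch repository (branch name `experiment` made up, `git init -b` assumes git 2.28 or newer):

```bash
demo=$(mktemp -d)
git init -q -b main "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"
git -C "$demo" commit -q --allow-empty -m "base"

git -C "$demo" switch -q -c experiment
git -C "$demo" commit -q --allow-empty -m "unmerged work"
git -C "$demo" switch -q main

# -d refuses because "unmerged work" would become unreachable
if git -C "$demo" branch -d experiment 2> /dev/null; then
    result=deleted
else
    result=refused
fi
echo "git branch -d: $result"

# -D deletes the branch regardless
git -C "$demo" branch -q -D experiment
remaining=$(git -C "$demo" branch --format='%(refname:short)')
echo "remaining branches: $remaining"
```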

### Merging branches

The operation of joining branches back together is called "merging". There are three kinds of merges:
- fast-forward merges
- squash merges
- standard merges

The latter two may see merge conflicts we have to resolve before completing the merge.

In [None]:
git switch main
export current_main=$(git rev-parse main)  # remember the current commit of `main` so we can reset to it later
echo "$current_main"

#### Fast-forward merge

For this kind of merge we simply move the current branch to the last commit on the merged branch.

:::{note}
This is only possible if the current branch has no commits that the merged branch lacks, i.e. the two branches only differ by the commits on the merged branch.
:::
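
Whether this condition holds can be checked up front: a fast-forward is possible exactly when the current branch is an ancestor of the branch to merge. The sketch below uses a scratch repository (branch name `feature` made up, `git init -b` assumes git 2.28 or newer) and `--ff-only`, which makes `git merge` fail instead of falling back to a merge commit.

```bash
demo=$(mktemp -d)
git init -q -b main "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"
git -C "$demo" commit -q --allow-empty -m "base"

git -C "$demo" switch -q -c feature
git -C "$demo" commit -q --allow-empty -m "feature work"
git -C "$demo" switch -q main

# exit status 0 means: every commit on main is also on feature
if git -C "$demo" merge-base --is-ancestor main feature; then
    echo "fast-forward possible"
fi

git -C "$demo" merge -q --ff-only feature
main_tip=$(git -C "$demo" rev-parse main)
feature_tip=$(git -C "$demo" rev-parse feature)
echo "main: $main_tip"
echo "feature: $feature_tip"
```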

:::::{grid} 1 1 2 3
::::{card}
---
header: Before the merge
---
:::{mermaid}
gitGraph
    commit id: "0-a1cbbe0"
    branch feature-branch
    commit id: "1-ac3d30a"
    commit id: "2-4db5765"
    commit id: "3-4488b52"
:::
::::
::::{card}
---
header: After the merge
---
:::{mermaid}
gitGraph
    commit id: "0-a1cbbe0"
    commit id: "1-ac3d30a"
    commit id: "2-4db5765"
    commit id: "3-4488b52"
:::
::::
:::::

In [None]:
git merge --ff feature-branch  # `--ff` is the default: fast-forward if possible (`--ff-only` would fail otherwise)

In [None]:
git log --oneline --graph

In [None]:
git reset --hard $current_main

#### Squash merge

For this kind of merge, we combine the commits of the merged branch into a single commit and add it to the current branch.

:::::{grid} 1 1 2 3
::::{card}
---
header: Before the merge
---
:::{mermaid}
gitGraph
    commit id: "0-a1cbbe0"
    branch feature-branch
    commit id: "1-ac3d30a"
    commit id: "2-4db5765"
    commit id: "3-4488b52"
:::
::::
::::{card}
---
header: After the merge
---
:::{mermaid}
gitGraph
    commit id: "0-a1cbbe0"
    commit id: "1-9e45b8f"
:::
::::
:::::

In [None]:
git merge --squash feature-branch
git commit -m "squashed feature"

In [None]:
git log --oneline --graph

In [None]:
git show -p

In [None]:
git reset --hard $current_main

#### Standard merge

If we want to keep the history of commits, and there are commits on the branch we merge into (`main`), we have no choice but to create a special merge commit: a commit with two parents, the last commit of both branches that are involved.

:::::{grid} 1 1 2 3
::::{card}
---
header: Before the merge
---
:::{mermaid}
gitGraph
    commit id: "0-a1cbbe0"
    branch feature-branch
    commit id: "1-ac3d30a"
    commit id: "2-4db5765"
    checkout main
    commit id: "3-4a1aba1"
    commit id: "4-3bcb400"
    checkout feature-branch
    commit id: "5-4488b52"
    checkout main
:::
::::
::::{card}
---
header: After the merge
---
:::{mermaid}
gitGraph
    commit id: "0-a1cbbe0"
    branch feature-branch
    commit id: "1-ac3d30a"
    commit id: "2-4db5765"
    checkout main
    commit id: "3-4a1aba1"
    commit id: "4-3bcb400"
    checkout feature-branch
    commit id: "5-4488b52"
    checkout main
    merge feature-branch
:::
::::
:::::

In this case, the merge commit has the parents `5-4488b52` (the latest commit on `feature-branch`) and `4-3bcb400` (the latest commit on `main`).
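
The two parents can be read back with the `HEAD^1` and `HEAD^2` notation. A sketch in a scratch repository with empty commits (branch name `feature` made up, `git init -b` assumes git 2.28 or newer):

```bash
demo=$(mktemp -d)
git init -q -b main "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"
git -C "$demo" commit -q --allow-empty -m "base"

git -C "$demo" switch -q -c feature
git -C "$demo" commit -q --allow-empty -m "feature work"
git -C "$demo" switch -q main
git -C "$demo" commit -q --allow-empty -m "main work"

main_before=$(git -C "$demo" rev-parse main)
feature_tip=$(git -C "$demo" rev-parse feature)

git -C "$demo" merge -q -m "Merge feature" feature

# first parent: the branch we merged into; second parent: the merged branch
first_parent=$(git -C "$demo" rev-parse "HEAD^1")
second_parent=$(git -C "$demo" rev-parse "HEAD^2")
git -C "$demo" log --oneline --graph
```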

In [None]:
# empty commit for demonstration purposes
git commit --allow-empty -m "empty commit"

In [None]:
git merge feature-branch -m "Merge commit"  # will open an editor without `-m`, which does not work in `jupyterlab`

In [None]:
git log --oneline

In [None]:
git log --oneline --graph

In [None]:
git reset --hard $current_main

### Merge conflicts

If both branches changed the same areas within the files, the merging algorithm won't know how to combine the changes and will create "merge conflict" markers within the files.
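
To see what the markers look like without touching the repository used in this notebook, here is a sketch in a scratch repository. It also shows `git merge --abort`, which backs out of a conflicted merge instead of resolving it. The file and branch names are made up, and `git init -b` assumes git 2.28 or newer.

```bash
demo=$(mktemp -d)
git init -q -b main "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"
echo "base" > "$demo/file"
git -C "$demo" add file
git -C "$demo" commit -q -m "base"

git -C "$demo" switch -q -c feature
echo "feature" > "$demo/file"
git -C "$demo" commit -q -am "feature change"
git -C "$demo" switch -q main
echo "main" > "$demo/file"
git -C "$demo" commit -q -am "main change"

# the merge stops with a non-zero exit status and leaves markers in the file
git -C "$demo" merge feature > /dev/null 2>&1 || conflicted=yes
cat "$demo/file"
marker_count=$(grep -c '^<<<<<<< ' "$demo/file")

# back out and return to the state before the merge
git -C "$demo" merge --abort
after_abort=$(cat "$demo/file")
echo "after abort: $after_abort"
```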

In [None]:
echo "0" > file1; git add file1; git commit -m "commit 0"  # to create a conflict

In [None]:
git merge feature-branch

In [None]:
git status

In [None]:
git diff

In [None]:
# resolve the conflict (edit / remove conflict markers + add to staging area)
echo "-1" > file1; git add file1

In [None]:
git status

In [None]:
# or git merge --continue
git commit -m "merge commit" # same here: avoid opening the editor

In [None]:
git log --oneline --graph

In [None]:
git reset --hard $current_main

## github / gitlab workflow

The two main software forges today, `github` and `gitlab`, promote a workflow where changes are proposed by:
- creating personal (but publicly readable) copies of repositories ("forks")
- pushing a branch containing the proposed changes to the fork
- creating a "request for changes". github calls this Pull Request (PR), while gitlab calls it Merge Request (MR)

For both, we need to
- create the fork
- set up access (ssh, https + token)
- connect the local git repositories to remote copies (git: `remote`), most importantly the main repository and the personal fork
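
Connecting remotes could look like the sketch below. The remote names `upstream` (main repository) and `origin` (personal fork) follow a common convention but are a choice, and the URLs are placeholders. `git remote add` only records the URL, so this runs without network access.

```bash
demo=$(mktemp -d)
git init -q "$demo"

# placeholder URLs: replace organization, user and project with real names
git -C "$demo" remote add upstream git@github.com:example-org/project.git
git -C "$demo" remote add origin git@github.com:example-user/project.git

git -C "$demo" remote -v
upstream_url=$(git -C "$demo" remote get-url upstream)
origin_url=$(git -C "$demo" remote get-url origin)
```

With that in place, a branch would typically be published to the fork with `git push -u origin <branch>` before opening the PR / MR.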

### Setup

#### github
- [create an SSH key](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) (once per machine):
  :::{code-block} bash
  ssh-keygen -t ed25519 -f ~/.ssh/github
  :::
- [register the new SSH key](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account)
- use `git@github.com:<user>/<repository>.git` URLs as remotes

Full tutorial: https://docs.github.com/en/get-started/start-your-journey/hello-world

#### gitlab

:::{note}
Some privately hosted gitlab instances (like https://gitlab.ifremer.fr) don't support authentication with SSH. In that case, we have to go with token authentication.
:::