# Git course

### by Ralph Heinkel
rh@ralph-heinkel.com

<small>Published under [Creative Commons Attribution ShareAlike (CC BY-SA) license](https://creativecommons.org/licenses/by-sa/4.0/).</small>
[<img style="vertical-align: left;" src="Images/cc-by-sa.png">](https://creativecommons.org/licenses/by-sa/4.0/)

# About Ralph Heinkel

### History:
 * Studied Medical Computer Science at University of Heidelberg
 * Diploma thesis at EMBL in structural biology
 * Supercomputing Resource Manager at EMBL
 * IT unit director at Cenix Bioscience GmbH
 
### Current:
 * Freelance biocomputing and IT consultant (GlaxoSmithKline, Oxford University, SAP, ...)


# Overview
 1. Git recap
 2. Github
   - Let's inspect a mature project
   - Your own github learning project
 3. git command line interface (CLI)
 4. Cloning and Pushing (to/from local repo)
 5. Collaborative working

##  Detailed overview - Git recap
- What is a version control system?
- What is git? Distributed version control systems (DVCS)
  - No central server, peer-to-peer instead
  - Full copy of repo on every peer's computer
  - Can be used offline, most operations are local anyway
  - Only patches are transferred when sync'ing with peers
  - Releases usually coordinated on reference repo (e.g. github)
- Structure of git
  - working directory / staging area / git repo
  - three stages (modified / staged / committed)
  - git hash (checksum) for each commit (for everything)


## Detailed overview - Github

Let's inspect a mature project
https://github.com/pallets/click

- General stuff
  - Commits
  - Branches
  - Releases
  - Contributors
  - Forks
- Inspecting files
  - History
  - Looking at diffs
  - Blame / Annotate

## Detailed overview - Github learning project
- create repo, check the 'Create a README'
- change README, make commit, check log
- create a branch, commit multiple README changes
- open a pull request
- inspect pull request, possibly code review
- merge pull request
- look at network graph

## Detailed overview - git CLI
- git clone
- git config
- git log (--decorate)
- git add
- git commit
- git push

## Detailed overview - Collaborative working
- Find a peer (e.g. your neighbour)
    - Owner: clone your repo
    - Peer: fork your neighbour's repo and clone it
- Do further commits:
  - add new files
  - change existing lines in common README.md file
- Push changes, each to his/her own (forked) repo
- Owner and Peer together solve a merge conflict on Github



# 1. Git recap

# What is git?

It is a **Distributed Version Control System DVCS**

## A what? 

## And why would we need this?

# Does this look familiar?

You have a document, work on it for some time, creating multiple versions ...

... and store them like `master_thesis.docx`

... and then like `master_thesis_final.docx`

... and `master_thesis_final_with_corrections.docx`

... and `master_thesis_final_with_more_corrections.docx`

... and `master_thesis_really_final.docx`

... and `master_thesis_really_final2.docx`
...

## Then: time has come for a 

# Version Control System (VCS)

# A Local VCS
<img style="height: 600px;" align="center" src="Images/local-version-control-system.png">

Examples: `rcs`, but also `git`.

<small>Source of image (and all other below as well): https://www.atlassian.com/git/tutorials</small>

# A Centralized VCS
<img style="height: 600px;" align="center" src="Images/centralized-version-control-system.png">

Examples: `cvs`, `subversion`.

# A Distributed VCS
<img style="height: 600px;" align="center" src="Images/distributed-version-control-system.png">

Examples: `git`, `mercurial`

# Some git repository characteristics

- No central server, peer-to-peer instead
- Full copy of the repo on every peer's computer
- Can be used offline, most operations are local anyway
- Only patches are transferred when sync'ing with peers (push)
- Releases usually coordinated on reference repo (e.g. github)

# Git creates snapshots of a 'mini-filesystem'

<img style="height: 400px;" align="center" src="Images/checkins-over-time-2.png">

In order to obtain Version 3 (A1, B, C2), the 'vertical' snapshot can be checked out with:
```
Cmd:         git checkout <hash of version 3>
i.e.         git checkout 142d6bd3b81629a4435f4c9bd425e2d5
or shorter:  git checkout 142d6bd3b    # (if unique)   
or with tag: git checkout Version3
```

### Every commit has a unique hash (a checksum) within the repo!
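
The three ways of addressing a commit shown above can be tried out in a throw-away repo. This is only a sketch: the file name `A`, its contents and the temporary directory are made up for illustration, and the hashes you get will differ from the ones on the slide.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Donald Duck'
git config user.email 'donald@duck-city.com'

# Three commits stand in for Version 1..3 of the picture above.
for n in 1 2 3; do
    echo "content of version $n" > A
    git add A
    git commit -q -m "Version $n"
done
git tag Version3                       # a readable name for the latest commit

full=$(git rev-parse HEAD)             # the full hash of Version 3
short=$(git rev-parse --short HEAD)    # an abbreviated, still unique prefix
echo "full: $full  short: $short"

git checkout -q HEAD~2                 # go back two commits, to Version 1
cat A                                  # prints: content of version 1
git checkout -q "$short"               # forward again via the short hash
git checkout -q Version3               # the tag names the very same commit
cat A                                  # prints: content of version 3
```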

# Let's go to Github

# Tasks on Github
(we go through this step by step in a minute)
1. create repo (accept all defaults)
1. add a README file, make commit
1. check log, check diff
1. create a branch, commit multiple README changes
1. open a pull request
1. inspect pull request, possibly code review
1. merge pull request
1. look at network graph

# Let's work on the first three steps on Github

0. Log in with your own user account (you have one, don't you?)
1. create repo
    - choose a name, keep any other default value
    - by default every new repo contains a 'master' branch (newer Github repos call it 'main'; this course says 'master' throughout).
2. Let's change the master branch in the repo
    - click on 'Create new file'
    - name it README.md, add some content (do you know 'markdown'?), use preview
    - at the bottom add a commit message, and commit. Check log.
    - add more files, check log again
    - change those added files
3. look at logs and diffs

# What is a branch? And how to deal with it ...

- Branches allow you to store variations of the repo content in a separate 'area'
- Can be easily created, merged back, but also deleted
- Useful for creating new or experimental features
- Good practice is to 'code review' them before merging into the main branch

### Branch before merge:
<img style="height: 300px;" align="center" src="Images/branch-before-merge.png">

### Branch after merge (e.g. via pull request):
<img style="height: 300px;" align="center" src="Images/branch-after-merge.png">

# Back to Github

1. Create a branch from master<br/>
   - click on the 'branch:master' button. 
   - a drop down opens
   - type in a name for the new branch, e.g. 'my-feature'
2. Add at least one new file and few commits (also to the README)

Then we take care of merging those changes back into the 'master' branch via a pull request in the next section.

# What is a pull request?

### A pull request is a request to the maintainer of a project to merge a branch into the main repo (into the master branch, in our case).

The branch to be merged can be located
 - within the same repo
 - but more often within a (forked) repo (on Github)

### A pull request 
 - is usually created by the person working on the feature (branch)
 - can - or better: should - be combined with a "code review" by the maintainer

# Create a pull request for your feature branch

- Click on the 'pull requests' tab in the sub menu bar
- Github will immediately propose possible pull requests
- Click on the 'Compare & pull request' button
- Provide some details of your feature to help the maintainer. Optionally provide extended details.
- Click on 'Create pull request'.

## Now switching roles, pretend to be the maintainer
- Check out the contents of the pull request<br/>
  Look at the 'commits' and 'files changed' tabs
- On the 'files changed' tab you can move over the lines with your mouse; a blue 'add' button should appear
- If you click it you'll be offered a small text box for a comment to the author of the feature branch. The author will automatically receive an email about your comment.
- Click on the green 'Review changes' button and decide how to handle this request (accept, reject)

# The git command line interface (CLI)

# First steps:

- Clone the repo -> get the URL from Github!

  `git clone https://github.com/<user>/<repo>`<br/>
  `cd <repo>`
  
- Check which branch you are on:

  `git branch`
  
- Show available branches (the asterisk marks the current one):

  `git branch --all`
  
- Checkout a different branch:

  `git checkout somebranch`
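
If you want to try the branch commands without a Github repo at hand, a scratch repo works just as well. A minimal sketch, assuming a temporary directory and a made-up `README.md`; the `git symbolic-ref` line only makes sure the first branch is called 'master', as in this course.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch 'master'
git config user.name 'Donald Duck'
git config user.email 'donald@duck-city.com'
echo 'hello' > README.md
git add README.md
git commit -q -m 'initial commit'

git branch somebranch        # create a second branch (does not switch to it)
git branch                   # lists both, the asterisk marks 'master'
git checkout -q somebranch   # switch to the other branch
git branch                   # now the asterisk marks 'somebranch'
current=$(git rev-parse --abbrev-ref HEAD)
echo "current branch: $current"
```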

# Anatomy of a local git repo

<img style="height: 400px;" align="center" src="Images/local-operations.png">


# Inspecting and modifying the state of the local repo

Use `git status` to inspect the state of the local repo:
```
$ git status
# On branch master
nothing to commit, working directory clean
```

Add a new file and inspect again:
```
$ echo 'my new D file' > D
$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       D
nothing added to commit but untracked files present (use "git add" to track)
```

### Adding the new file to the staging area
```
$ git add D
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   D
```
A followup `git commit` would now commit D to the repo.

### Unstaging a file
```
$ git reset D
$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       D
nothing added to commit but untracked files present (use "git add" to track)
```
**Important:** The new file keeps its content, but is unstaged.
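
The same round trip (untracked, staged, unstaged again) can be replayed in one go. This sketch uses `git status --porcelain`, the script-friendly short form of `git status`; the scratch repo and the initial file `C` are assumptions for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Donald Duck'
git config user.email 'donald@duck-city.com'
echo 'C3' > C
git add C
git commit -q -m 'add C'

echo 'my new D file' > D
untracked=$(git status --porcelain)   # "?? D": git sees the file but does not track it
git add D
staged=$(git status --porcelain)      # "A  D": added to the staging area
git reset -q D
unstaged=$(git status --porcelain)    # "?? D": untracked again, content untouched

printf '%s\n' "$untracked" "$staged" "$unstaged"
cat D                                 # prints: my new D file
```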

# Changing a committed file...
```
$ echo 'some new content' > C
$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   C
```
# ... and reverting the change:
```
$ git checkout C
$ cat C
C3
```
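
A self-contained version of this change-and-revert cycle, as a sketch in a scratch repo (the initial content `C3` mirrors the slide, everything else is set up just for the example). Note that `git checkout -- C` throws the uncommitted edit away for good.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name 'Donald Duck'
git config user.email 'donald@duck-city.com'
echo 'C3' > C
git add C
git commit -q -m 'Version 5'

echo 'some new content' > C
git status --porcelain    # " M C": modified, but not staged
git diff                  # shows the old line removed and the new one added

git checkout -- C         # discard the change in the working directory
cat C                     # prints: C3
```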

### Before we do an actual commit:
# Intermezzo: Some basic git configuration (1)

Committing a file to git will store
 - the (human readable) name of the committer
 - and her/his email address
 
to the commit. These two items are important to be set prior to the first commit:
```
git config [--global] user.name 'Donald Duck'
git config [--global] user.email 'donald@duck-city.com'
```
Local (i.e. non-global) configurations are stored in `.git/config` of your repo, and are only valid for this specific repo.<br/>
Global configurations are stored in `~/.gitconfig`. This configuration is valid for all repos which do not override this information locally.
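
To see how local settings override global ones without touching your real `~/.gitconfig`, the following sketch redirects `HOME` to a throw-away directory first. The second address `donald@work.example` is a hypothetical value used only for this demonstration.

```shell
set -e
export HOME="$(mktemp -d)"     # sandbox: 'global' now means this temporary directory
unset XDG_CONFIG_HOME
repo=$(mktemp -d) && cd "$repo"
git init -q

git config --global user.name 'Donald Duck'            # goes to ~/.gitconfig
git config --global user.email 'donald@duck-city.com'
git config user.email 'donald@work.example'            # goes to .git/config of this repo

git config --get user.name              # Donald Duck (no local value, global one is used)
git config --get user.email             # donald@work.example (local value wins)
git config --global --get user.email    # donald@duck-city.com
```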


# Committing a change

So, let's make some change to file `C`: Edit it, add or remove lines, and save your changes.

Then commit the change to your **local** repo:
```
git add C
git commit -m 'my intelligent changes to C' 
```
Typing `git log` will show a new entry in your commit history.

**Note:** This change has not yet been transferred to the remote repo on Github.

### Before pushing to github
# Intermezzo: Some basic git configuration (2)

There are two ways to clone a git repo (via https):
 1. `git clone https://github.com/<owner>/<repo>` (no username given)
 2. `git clone https://<user>@github.com/<owner>/<repo>` (with username given)

In order to be able to push, Github needs to know *who* you are.
When pushing you can avoid being asked for a username each time by cloning with variant 2. In this case the username will automatically be provided to Github at each push.

## Setting username afterwards
If you've cloned with variant 1 (without username) you can set it in a separate step:

`git config [--global] credential.https://github.com.username <user>`
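
A quick way to check that the setting ended up where you expect it, sketched in a scratch repo without `--global`. The account name `donald` is a placeholder for your own `<user>`.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

# 'donald' is a hypothetical Github user name
git config credential.https://github.com.username donald

git config --get credential.https://github.com.username   # prints: donald
cat .git/config    # contains a [credential "https://github.com"] section
```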

# Providing your password to Github

Not only does the push command require a username but also your password. The combination of username and password is called *credentials*. Note that since August 2021 Github expects a personal access token in place of the account password for pushes via https; you enter it at the same prompt.

There are multiple ways that git can retrieve your password. One possibility is the shell environment variables `GIT_ASKPASS` or `SSH_ASKPASS`, which name a helper program that supplies the password, e.g. from a password manager. Since we don't have one set up here we need to unset those variables:
```bash
unset GIT_ASKPASS SSH_ASKPASS
```
With those variables unset you will simply be prompted for a password in your shell.

## Caching passwords (for a certain time)
It is nice not having to type your password for every single push, so a cache can be installed with:

`git config [--global] credential.helper 'cache --timeout=900'` (900 seconds is the default timeout)

# Pushing (finally)
After we have now configured everything we are finally ready to push our change to file `C` to Github:
```
git push
# enter your password
```
Wow - we're done ;-)
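
The commit-then-push sequence can also be rehearsed offline. In this sketch a local bare repo (`remote.git`, a made-up name) plays the role of Github, so no credentials are involved; the `seed` repo exists only to give the stand-in remote an initial commit.

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Build the stand-in for the Github repo: a bare repo containing one commit.
git init -q seed
git -C seed config user.name 'Donald Duck'
git -C seed config user.email 'donald@duck-city.com'
echo 'C3' > seed/C
git -C seed add C
git -C seed commit -q -m 'Version 5'
git clone -q --bare seed remote.git

# From here on it is the workflow from the slides.
git clone -q remote.git local
cd local
git config user.name 'Donald Duck'
git config user.email 'donald@duck-city.com'
echo 'some new content' >> C
git add C
git commit -q -m 'my intelligent changes to C'
git push -q                      # no password prompt: the remote is a local path

git log --oneline -n 1           # the newest commit, now also present in remote.git
```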

# Dealing with branches

# Branching, committing
This is more or less the default workflow in day-to-day work, e.g. for a new feature.

Start with creating a new branch from master, then add and/or edit existing files:
```
git checkout -b some-feature
```
This is a shortcut for `git branch some-feature; git checkout some-feature`.<br/>
Now apply some modifications, and commit them (use two different commits, to get a bit of history, and only change file C for now).
```
echo 'C++' > C; git commit -a -m 'changed C to C++'
echo 'C++' >> C ; git commit -a -m 'changed C to C++,C++'
```
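
Put together as a runnable sketch in a scratch repo (the initial commit and the temporary directory are assumptions for the example), the branch ends up two commits ahead of master:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch 'master'
git config user.name 'Donald Duck'
git config user.email 'donald@duck-city.com'
echo 'C3' > C
git add C
git commit -q -m 'Version 5'

git checkout -q -b some-feature           # create the branch and switch to it
echo 'C++' > C;  git commit -q -a -m 'changed C to C++'
echo 'C++' >> C; git commit -q -a -m 'changed C to C++,C++'

git log --oneline master..some-feature    # commits that exist only on some-feature
ahead=$(git rev-list --count master..some-feature)
echo "some-feature is $ahead commits ahead of master"
```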

# Pushing your feature branch to Github
It might be interesting to also push the changes in your local feature branch to a (new) feature branch on Github in order to
- have a backup
- make it available to collaborators.

So let's apply the `push`-command. What happens?

Git complains that it doesn't know where to push to. Naturally we would assume that the branch should be pushed to a remote branch with the same name. <br/>
But git does not work with assumptions. It wants definite instructions!

So be more specific:
```
git push -u origin some-feature
```
Read this as: Push my current branch to a branch called `some-feature` on the remote server, which is called origin by default. And memorize this connection between local and remote branch (this is what `-u`, short for `--set-upstream`, does).
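
What "memorizing the connection" means can be inspected afterwards. The sketch below again uses a local bare repo (`remote.git`, a made-up stand-in for Github) and the lowercase `-u` option, then asks git which remote branch `some-feature` tracks.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q --bare remote.git              # stand-in for the Github repo
git init -q local && cd local
git config user.name 'Donald Duck'
git config user.email 'donald@duck-city.com'
git remote add origin "$work/remote.git"
echo 'C3' > C
git add C
git commit -q -m 'Version 5'

git checkout -q -b some-feature
echo 'C++' > C
git commit -q -a -m 'changed C to C++'

git push -q -u origin some-feature         # push and remember the upstream branch
upstream=$(git rev-parse --abbrev-ref 'some-feature@{upstream}')
echo "some-feature tracks $upstream"

echo 'C++' >> C
git commit -q -a -m 'changed C to C++,C++'
git push -q                                # a plain push is enough from now on
```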


# Current situation is:
<img style="height: 700px;" align="center" src="Images/simple-branch-before-and-after-merge.svg">

# Merging the feature branch into master

Now check out master again, add another change (make sure not to change C for now, but some other file instead), and finally merge the `some-feature` branch into master:
```
git checkout master
echo 'Bxx' > B; git commit -a -m 'changed B to Bxx'
git merge some-feature
```
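
As a complete sketch in a scratch repo: both branches change different files, so git can merge them without help. The `--no-edit` option (an addition for this example) accepts the default merge message instead of opening an editor.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch 'master'
git config user.name 'Donald Duck'
git config user.email 'donald@duck-city.com'
echo 'B' > B
echo 'C3' > C
git add B C
git commit -q -m 'Version 5'

git checkout -q -b some-feature
echo 'C++' > C
git commit -q -a -m 'changed C to C++'

git checkout -q master
echo 'Bxx' > B
git commit -q -a -m 'changed B to Bxx'
git merge -q --no-edit some-feature       # creates a merge commit with two parents

git log --graph --decorate --oneline
cat B C                                   # master now contains both changes
```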

# Look at the log, with the `--graph` option
```
$ git log --graph --decorate --oneline

*   525f62f (HEAD, master) Merged branch 'some-feature'
|\  
* | 8d3270a changed B to Bxx
| * f693afa (some-feature) changed C to C++,C++
| * c38472d changed C to C++
|/  
* 8f211ee modified C
```



# Dealing with merge conflicts

A merge conflict will occur if the same area of one or more files has been changed on different branches.

Let's create an artificial merge conflict:
- From master create a new branch 'myconflict', and check it out.
- Make some modifications to file C in there, and commit them.
- Check out master again and modify the same lines of file C. Commit.

Now try to merge branch 'myconflict' into master: `git merge myconflict`.

What happens? 

Git will report something like this:
```
Auto-merging C
CONFLICT (content): Merge conflict in C
Automatic merge failed; fix conflicts and then commit the result.
```
Typing `cat C` will show the conflicting area of the two versions:
```
<<<<<<< HEAD
C3-xxxxxxxx
=======
C3-0000
>>>>>>> myconflict
```
Lines between `<<<<<<< HEAD` and the line of equal signs come from master; the ones between the equal signs and `>>>>>>> myconflict` come from your 'myconflict' branch. Edit the file and decide manually which change to keep (or combine both). Then remove the three marker lines (`<<<<<<<`, `=======`, `>>>>>>>`) and save the file. Finally run `git add C; git commit -m 'merge my files'` and you're set.

Looking at the graph will show that your merge has been successful:
```
$ git log --graph --decorate --oneline
*   93f4dfc (HEAD -> master) merge my files
|\  
| * 91ff9b3 (myconflict) changed C in myconflict branch
* | 8d08763 changed C in master branch
|/  
* e8b0e83 Version 5
```
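
The whole conflict exercise, from creating the clash to resolving it, fits into one sketch. The resolved line `C3-xxxxxxxx-0000` is just one possible hand-made resolution chosen for this example; in practice you decide what the file should contain.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch 'master'
git config user.name 'Donald Duck'
git config user.email 'donald@duck-city.com'
echo 'C3' > C
git add C
git commit -q -m 'Version 5'

git checkout -q -b myconflict
echo 'C3-0000' > C
git commit -q -a -m 'changed C in myconflict branch'

git checkout -q master
echo 'C3-xxxxxxxx' > C
git commit -q -a -m 'changed C in master branch'

# The merge stops with a non-zero exit code because both branches changed the same line.
git merge myconflict || echo 'merge stopped with a conflict, as expected'
cat C                                     # shows the conflict markers
markers=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' C)
git diff --name-only --diff-filter=U      # lists the files that are still unresolved

echo 'C3-xxxxxxxx-0000' > C               # hypothetical manual resolution
git add C                                 # marks the conflict as resolved
git commit -q -m 'merge my files'
git log --graph --decorate --oneline
```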