# Torvalds' Divine Comedy: paths through the midway of your Git journey

**Note:** The starting point for this tutorial was

- [Version control for fun and profit](http://nbviewer.jupyter.org/github/fperez/reprosw/blob/master/Version%20Control.ipynb) by [Fernando Perez](http://fperez.org/), which in turn was particularly modeled on, and therefore owes a lot to, the materials in:

- ["Git for Scientists: A Tutorial"](http://nyuccl.org/pages/GitTutorial) by John McDonnell 
- Emanuele Olivetti's lecture notes and exercises from the G-Node summer school on [Advanced Scientific Programming in Python](https://python.g-node.org/wiki/schedule).

In particular it reuses the excellent images from the [Pro Git book](http://git-scm.com/book) that John had already selected and downloaded, as well as some of his outline.  But Fernando's version of the tutorial aimed to be 100% reproducible by being executed directly as an IPython notebook.  Many thanks to Fernando, John, and Emanuele for making their materials available online.

This Brandeis fork of Fernando's interactive tutorial, however, is aimed at computer science-savvy people who know the basics of Git, but don't always feel that they know why they're repeating these incantations in the order they're repeating them, or what exactly is going through Git's little state machine when it barfs error messages. This tutorial also assumes little knowledge of unix concepts, and makes a point of mentioning little unix tips as asides.

The key divergence from Fernando's basis is some excruciating detail on the front end, and then the creation of additional common-and/or-scary situations with guidance to resolve them.

We also use the `bash` kernel for Jupyter, rather than explicitly shelling out at every step as was necessary in Fernando's notebook. Install it from https://github.com/takluyver/bash_kernel, or you can try running this cell (I haven't tried this without the bash kernel installed)

In [None]:
%%bash

pip install bash_kernel
python -m bash_kernel.install

## The plan for this tutorial

This tutorial is structured in the following way: we will begin with a detailed account of how git keeps track of what happens when you `add`, `commit`, `merge`, etc.  We will then dive into hands-on work: after a brief interlude into necessary configuration we will discuss 5 "stages of git" with scenarios of increasing sophistication and complexity, introducing the necessary commands for each stage:
            
1. What happens when you `init`, `add`, and `commit`.
2. Single local user, branching
3. Using remotes as a single user
4. Remotes for collaborating in a small team
5. Ways things can go wrong
    
In reality, this tutorial only covers stages 1-5; a sixth stage, collaborating on larger software projects, is left out, since there are many software development-oriented tutorials and documents of very high quality online for it.  But most scientists start working alone with a few files or with a small team, so I feel it's important to first build the key concepts and practices based on problems scientists encounter in their everyday life and without the jargon of the software world.  Once you've become familiar with the stages covered here, the excellent tutorials that exist about collaborating on github on open-source projects should make sense.

## Very high level picture: an overview of key concepts

Have you seen a picture like this before?

![](git-transport.png)

The leftmost part, **workspace**, is actually just another name for "the files on your computer's filesystem" under git's (partial) control. You and git share control of the files within a repository, except that git doesn't even think about the existence of files that you haven't told it about (with `git add`).

One thing that I feel is underemphasized: **workspace is the least important part of this flow** from git's perspective. The state of your files is a separate concern from everything else that the repo deals with. One thing you should know, in order to feel safe in git, is that once you commit a state of the tracked files in your workspace, you can do whatever you want to the files in workspace. As long as you don't mess up the `.git/` directory, you can always get back to where you were when you committed. I think that's the reason why it is grey in the diagram above, and also why the `git diff` commands are grey.
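
To see that safety net in action, here is a minimal sketch. It assumes a throwaway repository in a temporary directory; the file name `data.txt` and the `Demo User` identity are placeholders of mine, not part of this tutorial's `test` repo.

```shell
# Placeholder identity so the commit works even on an unconfigured machine.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Throwaway repo; `git -C <dir>` runs git there without changing directory.
safe_repo=$(mktemp -d)
git -C "$safe_repo" init -q .

echo "precious results" > "$safe_repo/data.txt"
git -C "$safe_repo" add data.txt
git -C "$safe_repo" commit -q -m "Record the precious results"

# Now wreck the workspace: overwrite the file, then delete it outright.
echo "oops" > "$safe_repo/data.txt"
rm "$safe_repo/data.txt"

# Ask git to put the committed version back into the workspace.
git -C "$safe_repo" checkout -- data.txt
restored=$(cat "$safe_repo/data.txt")
echo "$restored"
```

The workspace copy was gone, but nothing under `.git/` was touched, so the committed content comes back intact.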

What's in the `.git/` directory, then? Let's create a new one.

## First things first: git must be configured before first use

The minimal amount of configuration for git to work without pestering you is to tell it who you are:

In [None]:
git config --global user.name "EDIT THIS AND WRITE YOUR FULL NAME HERE"
git config --global user.email "your@email.com"

 And how you will edit text files (it will often ask you to edit messages and other information, and thus wants to know how you like to edit your files):

In [None]:
# Put here your preferred editor. If this is not set, git will honor
# the $EDITOR environment variable
# git config --global core.editor /usr/local/bin/vim

# On Windows Notepad will do in a pinch, I recommend Notepad++ as a free alternative
# On the mac, you can set nano or emacs as a basic option

# And while we're at it, we also turn on the use of color, which is very useful
git config --global color.ui "auto"

Set git to use the credential memory cache so we don't have to retype passwords too frequently. On Linux, you should run the following (note that this requires git version 1.7.10 or newer):

In [None]:
git config --global credential.helper cache
# Set the cache to timeout after 2 hours (setting is in seconds)
git config --global credential.helper 'cache --timeout=7200'

Github offers in its help pages instructions on how to configure the credentials helper for [Mac OSX](https://help.github.com/articles/set-up-git#platform-mac) and [Windows](https://help.github.com/articles/set-up-git#platform-windows).

### `git init`: create an empty repository

In [1]:
rm -rf test

git init test

Initialized empty Git repository in /Users/orion/Google Drive/2017Spring/Seminar/test/.git/


In [2]:
cd test

**Note:** all these cells below are meant to be run from the `test` directory. If you change directories in a cell that you create, subsequent git calls may do something funny; to fix it, `cd` back to the `test` directory.

# What's In An Empty Repo?

Let's look at the aftermath of `git init`:

In [3]:
ls -l

(There is no output.) That's just because the only thing in the directory, `.git`, is hidden: its name starts with a period. You can show hidden files by adding the `-a` flag to `ls`:

In [4]:
ls -la

total 0
drwxr-xr-x   3 orion  staff  102 Mar 10 11:21 .
drwxr-xr-x@ 14 orion  staff  476 Mar 10 11:21 ..
drwxr-xr-x   9 orion  staff  306 Mar 10 11:21 .git


The `-R` flag lists a directory's contents recursively:

In [5]:
ls -lR .git

total 24
-rw-r--r--   1 orion  staff   23 Mar 10 11:21 HEAD
-rw-r--r--   1 orion  staff  137 Mar 10 11:21 config
-rw-r--r--   1 orion  staff   73 Mar 10 11:21 description
drwxr-xr-x  12 orion  staff  408 Mar 10 11:21 hooks
drwxr-xr-x   3 orion  staff  102 Mar 10 11:21 info
drwxr-xr-x   4 orion  staff  136 Mar 10 11:21 objects
drwxr-xr-x   4 orion  staff  136 Mar 10 11:21 refs

.git/hooks:
total 88
-rwxr-xr-x  1 orion  staff   478 Mar 10 11:21 applypatch-msg.sample
-rwxr-xr-x  1 orion  staff   896 Mar 10 11:21 commit-msg.sample
-rwxr-xr-x  1 orion  staff   189 Mar 10 11:21 post-update.sample
-rwxr-xr-x  1 orion  staff   424 Mar 10 11:21 pre-applypatch.sample
-rwxr-xr-x  1 orion  staff  1642 Mar 10 11:21 pre-commit.sample
-rwxr-xr-x  1 orion  staff  1348 Mar 10 11:21 pre-push.sample
-rwxr-xr-x  1 orion  staff  4951 Mar 1

We're going to be looking closely at `.git/objects` for a little while, so let's reiterate how boring and empty it is now:

In [6]:
ls -lR .git/objects

total 0
drwxr-xr-x  2 orion  staff  68 Mar 10 11:21 info
drwxr-xr-x  2 orion  staff  68 Mar 10 11:21 pack

.git/objects/info:

.git/objects/pack:


# Now you start adding files

In [7]:
echo "My first bit of text" > file1.txt
ls -la

total 8
drwxr-xr-x   4 orion  staff  136 Mar 10 11:22 .
drwxr-xr-x@ 14 orion  staff  476 Mar 10 11:21 ..
drwxr-xr-x   9 orion  staff  306 Mar 10 11:21 .git
-rw-r--r--   1 orion  staff   21 Mar 10 11:22 file1.txt


In [8]:
ls -lR .git/objects

total 0
drwxr-xr-x  2 orion  staff  68 Mar 10 11:21 info
drwxr-xr-x  2 orion  staff  68 Mar 10 11:21 pack

.git/objects/info:

.git/objects/pack:


In [9]:
git status

On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	file1.txt

nothing added to commit but untracked files present (use "git add" to track)


So, the first principle, even if it is very hard to trust and internalize: **git doesn't know or care about files you haven't told it about**. It can, however, list files that it is not tracking.

## `git add`: tell git about this new file

In [10]:
ls -l .git
git add file1.txt # typically no output
git status

total 24
-rw-r--r--   1 orion  staff   23 Mar 10 11:21 HEAD
-rw-r--r--   1 orion  staff  137 Mar 10 11:21 config
-rw-r--r--   1 orion  staff   73 Mar 10 11:21 description
drwxr-xr-x  12 orion  staff  408 Mar 10 11:21 hooks
drwxr-xr-x   3 orion  staff  102 Mar 10 11:21 info
drwxr-xr-x   4 orion  staff  136 Mar 10 11:21 objects
drwxr-xr-x   4 orion  staff  136 Mar 10 11:21 refs
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   file1.txt



What happened internally to make this happen?

In [14]:
ls -lR .git/objects

total 0
drwxr-xr-x  3 orion  staff  102 Mar 10 11:22 ce
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 info
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 pack

.git/objects/ce:
total 8
-r--r--r--  1 orion  staff  37 Mar 10 11:22 645c7ac024e8f0ae22e0d6c3c2d5987e5a2223

.git/objects/info:

.git/objects/pack:


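**Notice** that `git add` created a new directory, `ce`, under `.git/objects`, with one file in it. It also created a new plain file directly under `.git` (not shown in this listing): `index`. It contains binary data, which if you really want, you can inspect with `hexdump`: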

In [12]:
hexdump -C .git/index

00000000  44 49 52 43 00 00 00 02  00 00 00 01 58 c2 d2 b4  |DIRC........X...|
00000010  00 00 00 00 58 c2 d2 b4  00 00 00 00 01 00 00 06  |....X...........|
00000020  03 1f 74 5a 00 00 81 a4  00 00 01 f6 00 00 00 14  |..tZ............|
00000030  00 00 00 15 ce 64 5c 7a  c0 24 e8 f0 ae 22 e0 d6  |.....d\z.$..."..|
00000040  c3 c2 d5 98 7e 5a 22 23  00 09 66 69 6c 65 31 2e  |....~Z"#..file1.|
00000050  74 78 74 00 18 d8 a9 fa  0e a8 b3 da 16 0b e6 c7  |txt.............|
00000060  42 96 c9 1a 52 a5 50 76                           |B...R.Pv|
00000068


But the real way to look at it is to use `git ls-files`:

In [13]:
git ls-files --stage

100644 ce645c7ac024e8f0ae22e0d6c3c2d5987e5a2223 0	file1.txt


Notice that the file is stored inside `.git/objects/ce/645c7ac...`, where the first two characters of the full hash are the directory name and the other 38 characters are the filename. (Note, also, that the hexdump output, starting on the fourth line, contains this hash as well, expressed more efficiently: the values `ce 64 5c...` etc. are single bytes each, instead of the two bytes each, one per hex digit, that they take up when the hash is written out as text, as below.)

In [None]:
echo ce645c7ac024e8f0ae22e0d6c3c2d5987e5a2223 | hexdump -C

But I digress. The important thing is this:

In [None]:
ls .git/objects/ce

Notice, more importantly, that I am writing this on a Sunday afternoon, but I know that a file with the contents "My first bit of text" will end up in directory `ce/`, in a file named `645c7ac...`.

### ☞ Why this partitioning scheme, and what does it remind you of? ☜

- Why: efficiency of filesystem access.
- What: a **hash**, i.e. the output of a hashing function. In Git, this is a fingerprint of an object's content; for a commit, that content includes the hash of *its parent*.

In [15]:
echo -ne "blob 21\0My first bit of text\n" | shasum

ce645c7ac024e8f0ae22e0d6c3c2d5987e5a2223  -


**Unix protip**: you can get the length of a Bash string variable `$foo` by doing `${#foo}` . For more human-unreadable tricks like this see http://tldp.org/LDP/abs/html/string-manipulation.html

In [None]:
data="My first bit of text
"
echo -ne "blob ${#data}\0$data" | shasum
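
You don't have to build the `blob <size>\0` header by hand: `git hash-object` does the same computation. A minimal sketch, assuming git's default SHA-1 object format (a repository created with SHA-256 would print a different value); the `sha1` helper is my own wrapper that falls back to `sha1sum` where `shasum` is not installed.

```shell
# Use shasum if present, otherwise fall back to coreutils' sha1sum.
sha1() { if command -v shasum >/dev/null 2>&1; then shasum; else sha1sum; fi; }

data="My first bit of text
"

# By hand: the header "blob <size>" plus a NUL byte, then the content itself.
manual=$(printf 'blob %d\0%s' "${#data}" "$data" | sha1 | awk '{print $1}')

# By git: feed the same bytes on stdin and let git add the header.
from_git=$(printf '%s' "$data" | git hash-object --stdin)

echo "$manual"
echo "$from_git"

# The first two hex digits become the directory, the remaining 38 the filename.
echo ".git/objects/${from_git:0:2}/${from_git:2}"
```

Both commands should print the hash we saw in `git ls-files --stage` above, and the last line reconstructs the path where the blob is stored.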

In [None]:
data1='This is the start of my paper2.'
hash1=`echo -ne "blob ${#data1}\0$data1" | shasum`
echo $hash1

In [None]:
# Our second commit, linked to the first
# (a toy illustration of the idea, not git's actual commit format)
data2='Some more text in my paper...'
meta2='date: 1/21/2020'
# Note we add the parent hash here!
hash2=$(echo -ne "$data2 $meta2 $hash1" | shasum)
echo "$hash2"

And this is pretty much the essence of Git!

Bare files in `objects` are hashed purely on their content, and so are trees, which we'll see below: a tree's hash depends only on the names, modes, and hashes of the entries it lists. Commits are also hashed with their author, date, and parent, so in some of the cells below I will know the hash, but in others I'll have to pull it dynamically.

### `git commit`: permanently record our changes in git's database

We added a file to the index, and Git knows about it now. But looking at this diagram again:

![](git-transport.png)

We're only one step past the `workspace`. We're in the "index", but not in the "local repository". Git knows the file, but hasn't made a record for it in the history of the repo. You know how to see the history of a repo, right?

In [16]:
git log

fatal: your current branch 'master' does not have any commits yet



In general, as you may know by now, you can call `git commit` with a few different options:

- (no options) - commit files that have been staged in the index with `git add` 
- `-a` - automatically stage already-tracked files that have been modified or deleted; ignore anything new
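- `<file...>` - list, by name, specific files you want to commit. By default this commits only the current contents of those files and ignores anything else you have staged; add `-i` (`--include`) to commit them in addition to what is already staged.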
 
Since we've just staged our new file and we don't have any others, we don't have to use any flags — except for one to specify the commit message:

In [17]:
git commit -m "This is our first commit"

[master (root-commit) 0a4a9f0] This is our first commit
 1 file changed, 1 insertion(+)
 create mode 100644 file1.txt


In the commit above, we used the `-m` flag to specify a message at the command line.  If we don't do that, git will open the editor we specified in our configuration above and require that we enter a message.  By default, git refuses to record changes that don't have a message to go along with them (though you can obviously 'cheat' by using a meaningless string, or by passing `--allow-empty-message`: git only tries to facilitate best practices, it's not your nanny).

But what else has happened? First line, beginning `[master (root-commit) BLABLAH]` has a real value in BLABLAH, and I can't guess what it is as I write this, because it is computed based on the moment the commit was made.
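
If the timestamp is the only thing I can't predict, then pinning it should make the commit hash predictable too. Here is a sketch that tests that idea in throwaway repositories. The identity, the `make_commit` helper, and the timestamps are placeholders of mine, and it assumes no commit signing or hooks are configured, since those would change what gets hashed.

```shell
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Hypothetical helper: build a fresh one-commit repo with a fixed date
# and print the resulting commit hash.
make_commit() {
    # $1: "<unix timestamp> <utc offset>", used as author and committer date
    local repo
    repo=$(mktemp -d)
    git -C "$repo" init -q .
    echo "My first bit of text" > "$repo/file1.txt"
    git -C "$repo" add file1.txt
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
        git -C "$repo" commit -q -m "This is our first commit"
    git -C "$repo" rev-parse HEAD
}

same_a=$(make_commit "1489163148 -0500")
same_b=$(make_commit "1489163148 -0500")
one_second_later=$(make_commit "1489163149 -0500")

echo "$same_a"
echo "$same_b"
echo "$one_second_later"
```

Same content, same author, same message, same second: same hash, even in two unrelated directories. Move the clock by one second and the hash changes completely.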

# TO `HEAD`!

I really wanted to tell you about `HEAD` later, but since the flow of this tutorial sort of depends on it, we're going to have to start addressing it now. We had a file with a hash that was predictable because that hash was based solely on the content of the file. Then we committed, and ended up with a hash that is not predictable because the hashed value includes the date on which the commit was made.

In [18]:
ls -l .git

total 40
-rw-r--r--   1 orion  staff   25 Mar 10 11:25 COMMIT_EDITMSG
-rw-r--r--   1 orion  staff   23 Mar 10 11:21 HEAD
-rw-r--r--   1 orion  staff  137 Mar 10 11:21 config
-rw-r--r--   1 orion  staff   73 Mar 10 11:21 description
drwxr-xr-x  12 orion  staff  408 Mar 10 11:21 hooks
-rw-r--r--   1 orion  staff  137 Mar 10 11:25 index
drwxr-xr-x   3 orion  staff  102 Mar 10 11:21 info
drwxr-xr-x   4 orion  staff  136 Mar 10 11:25 logs
drwxr-xr-x   7 orion  staff  238 Mar 10 11:25 objects
drwxr-xr-x   4 orion  staff  136 Mar 10 11:21 refs


In [19]:
cat .git/HEAD

ref: refs/heads/master


In [20]:
cat .git/refs/heads/master

0a4a9f0eef158724ce97874950942bb3586d0935


### We could put that in a shell variable

In [21]:
last_commit=`cat .git/refs/heads/master`
echo $last_commit

0a4a9f0eef158724ce97874950942bb3586d0935


but in practice that is just the default argument to a bunch of git commands.
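
Rather than `cat`-ing files under `.git`, you can ask git to resolve `HEAD` for you with `git rev-parse`. A small sketch in a throwaway repository (identity and file names are placeholders); it assumes a freshly created repo whose refs are still stored as plain files, which is not guaranteed once git packs them.

```shell
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

head_repo=$(mktemp -d)
git -C "$head_repo" init -q .
echo "hello" > "$head_repo/hello.txt"
git -C "$head_repo" add hello.txt
git -C "$head_repo" commit -q -m "first"

# Step 1: which ref does HEAD name? (e.g. refs/heads/master)
ref_name=$(git -C "$head_repo" symbolic-ref HEAD)

# Step 2: read that ref file by hand...
from_file=$(cat "$head_repo/.git/$ref_name")

# ...and compare with what git resolves HEAD to.
from_git=$(git -C "$head_repo" rev-parse HEAD)

echo "$ref_name"
echo "$from_file"
echo "$from_git"
```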

### What objects do we have in `.git/objects`?

In [22]:
ls -lR .git/objects

total 0
drwxr-xr-x  3 orion  staff  102 Mar 10 11:25 0a
drwxr-xr-x  3 orion  staff  102 Mar 10 11:25 a7
drwxr-xr-x  3 orion  staff  102 Mar 10 11:22 ce
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 info
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 pack

.git/objects/0a:
total 8
-r--r--r--  1 orion  staff  139 Mar 10 11:25 4a9f0eef158724ce97874950942bb3586d0935

.git/objects/a7:
total 8
-r--r--r--  1 orion  staff  54 Mar 10 11:25 3ca0c3b54f0b93abf0157f398864b0daf556fd

.git/objects/ce:
total 8
-r--r--r--  1 orion  staff  37 Mar 10 11:22 645c7ac024e8f0ae22e0d6c3c2d5987e5a2223

.git/objects/info:

.git/objects/pack:


We had one thing in `.git/objects` before, and now we have three. We recognize one of them as the `file1.txt` that we've examined; we recognize another one from the commit hash we just got. But there's a third one, too. These are binary (zlib-compressed) files, like the `index` we hexdumped, so there's no point looking at them directly, but we can ask git to decode them and show us what's inside:

In [24]:
git cat-file -p a73ca0

100644 blob ce645c7ac024e8f0ae22e0d6c3c2d5987e5a2223	file1.txt


And that's the source of our third file: this `tree` hash.
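
If you'd rather not guess which object is which, git can label them all at once. A sketch in a throwaway repository with a single commit (placeholder identity); it relies on `git cat-file --batch-check --batch-all-objects`, which needs a reasonably recent git (2.6 or newer, as far as I know).

```shell
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

obj_repo=$(mktemp -d)
git -C "$obj_repo" init -q .
echo "My first bit of text" > "$obj_repo/file1.txt"
git -C "$obj_repo" add file1.txt
git -C "$obj_repo" commit -q -m "This is our first commit"

# One line per object in the database: <hash> <type> <size in bytes>
git -C "$obj_repo" cat-file --batch-check --batch-all-objects

# Just the types, sorted, on one line.
types=$(git -C "$obj_repo" cat-file --batch-check --batch-all-objects \
        | awk '{print $2}' | sort | tr '\n' ' ')
echo "$types"
```

One commit of one file gives exactly one blob, one tree, and one commit.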

## Ok, what are we looking at?

Suppose you needed to implement version control yourself.

The **commit**: *a snapshot of work at a point in time*

<!-- offline: 
![](fig/commit_anatomy.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/commit_anatomy.png">

Credit: ProGit book, by Scott Chacon, CC License.

In [25]:
git cat-file -p $last_commit

tree a73ca0c3b54f0b93abf0157f398864b0daf556fd
author Orion Montoya <orion@mdcclv.com> 1489163148 -0500
committer Orion Montoya <orion@mdcclv.com> 1489163148 -0500

This is our first commit


In [26]:
git cat-file -p a73ca0c3

100644 blob ce645c7ac024e8f0ae22e0d6c3c2d5987e5a2223	file1.txt


In [27]:
git ls-files --stage

100644 ce645c7ac024e8f0ae22e0d6c3c2d5987e5a2223 0	file1.txt


## By the way -- blobs don't care what their filename is, only what their contents are.

Let's make a new file with the same content as the one before:

In [28]:
echo "My first bit of text" > file1a.txt
git add file1a.txt
git commit -m "Copy that file for demo"

[master a6caae9] Copy that file for demo
 1 file changed, 1 insertion(+)
 create mode 100644 file1a.txt


In [29]:
git cat-file -p HEAD

tree 45fa2d70498e73771b0cb31a1d92c99d7047dbe8
parent 0a4a9f0eef158724ce97874950942bb3586d0935
author Orion Montoya <orion@mdcclv.com> 1489163384 -0500
committer Orion Montoya <orion@mdcclv.com> 1489163384 -0500

Copy that file for demo


## ☞ This `tree` object also has a predictable hash, because it is the hash of the following contents:

In [30]:
git cat-file -p 45fa2d7

100644 blob ce645c7ac024e8f0ae22e0d6c3c2d5987e5a2223	file1.txt
100644 blob ce645c7ac024e8f0ae22e0d6c3c2d5987e5a2223	file1a.txt
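
You can check both claims without committing anything. The sketch below stages two identically-filled files in a throwaway repository and asks git for the tree it *would* write; the assumption is that both files get the ordinary `100644` mode, as they do in the listing above.

```shell
tree_repo=$(mktemp -d)
git -C "$tree_repo" init -q .
echo "My first bit of text" > "$tree_repo/file1.txt"
cp "$tree_repo/file1.txt" "$tree_repo/file1a.txt"
git -C "$tree_repo" add file1.txt file1a.txt

# Two index entries...
git -C "$tree_repo" ls-files --stage

# ...but how many distinct blob hashes?
unique_blobs=$(git -C "$tree_repo" ls-files --stage \
               | awk '{print $2}' | sort -u | wc -l | tr -d ' ')
echo "$unique_blobs"

# write-tree turns the index into a tree object and prints its hash.
# No commit, no author, no date involved.
tree_hash=$(git -C "$tree_repo" write-tree)
echo "$tree_hash"
```

The tree hash comes out the same as the `45fa2d7...` tree above even though this is a different repository on a different day, because nothing time-dependent goes into a tree.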


### `git diff`: what have I changed?

Let's do a little bit more work... Again, in practice you'll be editing the files by hand, here we do it via shell commands for the sake of automation (and therefore the reproducibility of this tutorial!)

Before, we created a file with content by echoing to "> filename". That creates the file (or empties it, if it already exists) and writes the text to it. Two arrows (">> filename") will create a missing file too, but if the file already exists they append to the end of it instead.

In [31]:
echo "And now some more text..." >> file1.txt
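
**Unix protip**: the difference between the two arrows is easy to check on a scratch file. A tiny sketch, using a temporary file so nothing in the repo is touched:

```shell
scratch=$(mktemp)

echo "first"  >  "$scratch"   # one arrow: start the file over, then write
echo "second" >> "$scratch"   # two arrows: append to what is there
lines_after_append=$(wc -l < "$scratch" | tr -d ' ')

echo "third"  >  "$scratch"   # one arrow again: earlier lines are gone
lines_after_overwrite=$(wc -l < "$scratch" | tr -d ' ')
remaining=$(cat "$scratch")

echo "$lines_after_append $lines_after_overwrite $remaining"
```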

And now we can ask git what is different:

In [32]:
git diff

diff --git a/file1.txt b/file1.txt
index ce645c7..4baa979 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1 +1,2 @@
 My first bit of text
+And now some more text...


In [33]:
git status

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   file1.txt

no changes added to commit (use "git add" and/or "git commit -a")


In [34]:
ls -lR .git/objects

total 0
drwxr-xr-x  3 orion  staff  102 Mar 10 11:25 0a
drwxr-xr-x  3 orion  staff  102 Mar 10 11:29 45
drwxr-xr-x  3 orion  staff  102 Mar 10 11:29 a6
drwxr-xr-x  3 orion  staff  102 Mar 10 11:25 a7
drwxr-xr-x  3 orion  staff  102 Mar 10 11:22 ce
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 info
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 pack

.git/objects/0a:
total 8
-r--r--r--  1 orion  staff  139 Mar 10 11:25 4a9f0eef158724ce97874950942bb3586d0935

.git/objects/45:
total 8
-r--r--r--  1 orion  staff  59 Mar 10 11:29 fa2d70498e73771b0cb31a1d92c99d7047dbe8

.git/objects/a6:
total 8
-r--r--r--  1 orion  staff  168 Mar 10 11:29 caae9499295f25ddfdf0919f23612aae742e03

.git/objects/a7:
total 8
-r--r--r--  1 orion  staff  54 Mar 10 11:25 3ca0c3b54f0b93abf0157f398864b0daf556fd

.git/objects/ce:
total 8
-r--r--r--  1 orion  staff  37 Mar 10 11:29 645c7ac024e8f0ae22e0d6c3c

We changed a file, and git can describe what changed, but it hasn't made any changes to the `.git/objects` directory yet because we haven't staged anything — we just have local changes in our workspace.

In [35]:
git add file1.txt
git status

On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   file1.txt



In [36]:
ls -lR .git/objects

total 0
drwxr-xr-x  3 orion  staff  102 Mar 10 11:25 0a
drwxr-xr-x  3 orion  staff  102 Mar 10 11:29 45
drwxr-xr-x  3 orion  staff  102 Mar 10 11:31 4b
drwxr-xr-x  3 orion  staff  102 Mar 10 11:29 a6
drwxr-xr-x  3 orion  staff  102 Mar 10 11:25 a7
drwxr-xr-x  3 orion  staff  102 Mar 10 11:22 ce
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 info
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 pack

.git/objects/0a:
total 8
-r--r--r--  1 orion  staff  139 Mar 10 11:25 4a9f0eef158724ce97874950942bb3586d0935

.git/objects/45:
total 8
-r--r--r--  1 orion  staff  59 Mar 10 11:29 fa2d70498e73771b0cb31a1d92c99d7047dbe8

.git/objects/4b:
total 8
-r--r--r--  1 orion  staff  60 Mar 10 11:31 aa979a0af89c9d52506593c7b5390c84a05b70

.git/objects/a6:
total 8
-r--r--r--  1 orion  staff  168 Mar 10 11:29 caae9499295f25ddfdf0919f23612aae742e03

.git/objects/a7:
total 8
-r--

In [37]:
git reset HEAD file1.txt

Unstaged changes after reset:
M	file1.txt


The file has been unstaged, but FWIW the updated blob of it in 4baa979 still exists.
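
Here is a sketch that replays those steps in a throwaway repository (placeholder identity) and then checks whether the blob outlived the unstaging. `git fsck` is included because, as far as I know, it reports objects that nothing references any more as "dangling".

```shell
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

stage_repo=$(mktemp -d)
git -C "$stage_repo" init -q .
echo "My first bit of text" > "$stage_repo/file1.txt"
git -C "$stage_repo" add file1.txt
git -C "$stage_repo" commit -q -m "first"

# Modify and stage: this is the moment the new blob is written.
echo "And now some more text..." >> "$stage_repo/file1.txt"
git -C "$stage_repo" add file1.txt
staged=$(git -C "$stage_repo" ls-files --stage file1.txt | awk '{print $2}')

# Unstage. The index goes back to the committed blob...
git -C "$stage_repo" reset -q HEAD file1.txt

# ...but the staged blob is still in the object database.
kind=$(git -C "$stage_repo" cat-file -t "$staged")
echo "$staged is still a $kind"

# Ask git which objects are no longer referenced by anything.
git -C "$stage_repo" fsck 2>/dev/null
```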

In [38]:
git cat-file -p HEAD

tree 45fa2d70498e73771b0cb31a1d92c99d7047dbe8
parent 0a4a9f0eef158724ce97874950942bb3586d0935
author Orion Montoya <orion@mdcclv.com> 1489163384 -0500
committer Orion Montoya <orion@mdcclv.com> 1489163384 -0500

Copy that file for demo


Before, the header of HEAD was three lines: `tree, author, committer` (and then the commit message). Now it is four: there is an extra line saying what is the `parent` of this commit. Every commit after the first will have this. Back to the implementation diagrams:

A **repository**: a group of *linked* commits

<!-- offline: 
![](files/fig/threecommits.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/threecommits.png" >

Note: these form a Directed Acyclic Graph (DAG), with nodes identified by their *hash*.
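
Since every commit names its parent, you can walk the history yourself with nothing but `git cat-file`. A minimal sketch, assuming a linear history in a throwaway repository (placeholder identity and file name); it follows only the first `parent` line, so it would ignore the second parent of a merge.

```shell
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

dag_repo=$(mktemp -d)
git -C "$dag_repo" init -q .
for n in 1 2 3; do
    echo "line $n" >> "$dag_repo/paper.txt"
    git -C "$dag_repo" add paper.txt
    git -C "$dag_repo" commit -q -m "commit $n"
done

# Start at HEAD and keep following "parent" until a commit has none.
current=$(git -C "$dag_repo" rev-parse HEAD)
walked=0
while [ -n "$current" ]; do
    echo "$current"
    walked=$((walked + 1))
    # Header lines come before the first blank line; stop looking after it.
    current=$(git -C "$dag_repo" cat-file -p "$current" \
              | awk '/^parent /{print $2; exit} /^$/{exit}')
done

# For comparison: git's own walk of the same chain.
listed=$(git -C "$dag_repo" rev-list HEAD | wc -l | tr -d ' ')
echo "$walked commits walked by hand, $listed listed by git"
```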

### `git log`: what has been committed so far

In [39]:
git log

commit a6caae9499295f25ddfdf0919f23612aae742e03
Author: Orion Montoya <orion@mdcclv.com>
Date:   Fri Mar 10 11:29:44 2017 -0500

    Copy that file for demo

commit 0a4a9f0eef158724ce97874950942bb3586d0935
Author: Orion Montoya <orion@mdcclv.com>
Date:   Fri Mar 10 11:25:48 2017 -0500

    This is our first commit


In [40]:
ls -lR .git/objects

total 0
drwxr-xr-x  3 orion  staff  102 Mar 10 11:25 0a
drwxr-xr-x  3 orion  staff  102 Mar 10 11:29 45
drwxr-xr-x  3 orion  staff  102 Mar 10 11:31 4b
drwxr-xr-x  3 orion  staff  102 Mar 10 11:29 a6
drwxr-xr-x  3 orion  staff  102 Mar 10 11:25 a7
drwxr-xr-x  3 orion  staff  102 Mar 10 11:22 ce
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 info
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 pack

.git/objects/0a:
total 8
-r--r--r--  1 orion  staff  139 Mar 10 11:25 4a9f0eef158724ce97874950942bb3586d0935

.git/objects/45:
total 8
-r--r--r--  1 orion  staff  59 Mar 10 11:29 fa2d70498e73771b0cb31a1d92c99d7047dbe8

.git/objects/4b:
total 8
-r--r--r--  1 orion  staff  60 Mar 10 11:31 aa979a0af89c9d52506593c7b5390c84a05b70

.git/objects/a6:
total 8
-r--r--r--  1 orion  staff  168 Mar 10 11:29 caae9499295f25ddfdf0919f23612aae742e03

.git/objects/a7:
total 8
-r--

### The cycle of git virtue: work, commit, work, commit, ...

In [41]:
git commit -a -m"I have made great progress on this critical matter."

[master cd49e05] I have made great progress on this critical matter.
 1 file changed, 1 insertion(+)


In [42]:
ls -lR .git/objects

total 0
drwxr-xr-x  3 orion  staff  102 Mar 10 11:25 0a
drwxr-xr-x  3 orion  staff  102 Mar 10 11:29 45
drwxr-xr-x  3 orion  staff  102 Mar 10 11:31 4b
drwxr-xr-x  3 orion  staff  102 Mar 10 11:29 a6
drwxr-xr-x  3 orion  staff  102 Mar 10 11:25 a7
drwxr-xr-x  3 orion  staff  102 Mar 10 11:33 cd
drwxr-xr-x  3 orion  staff  102 Mar 10 11:22 ce
drwxr-xr-x  3 orion  staff  102 Mar 10 11:33 e1
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 info
drwxr-xr-x  2 orion  staff   68 Mar 10 11:21 pack

.git/objects/0a:
total 8
-r--r--r--  1 orion  staff  139 Mar 10 11:25 4a9f0eef158724ce97874950942bb3586d0935

.git/objects/45:
total 8
-r--r--r--  1 orion  staff  59 Mar 10 11:29 fa2d70498e73771b0cb31a1d92c99d7047dbe8

.git/objects/4b:
total 8
-r--r--r--  1 orion  staff  60 Mar 10 11:33 aa979a0af89c9d52506593c7b5390c84a05b70

.git/obje

In [43]:
cat .git/refs/heads/master

cd49e05ed32b7d1e9ef685fab5f0807959486e9c


In [44]:
git cat-file -p `cat .git/refs/heads/master`

tree e14d35d9509833a16b7814f518c5fe7c04d8fca5
parent a6caae9499295f25ddfdf0919f23612aae742e03
author Orion Montoya <orion@mdcclv.com> 1489163630 -0500
committer Orion Montoya <orion@mdcclv.com> 1489163630 -0500

I have made great progress on this critical matter.


In [45]:
git cat-file -p `git cat-file -p HEAD | head -1 | awk '{print $2}'`

100644 blob 4baa979a0af89c9d52506593c7b5390c84a05b70	file1.txt
100644 blob ce645c7ac024e8f0ae22e0d6c3c2d5987e5a2223	file1a.txt


In [46]:
# this bit never has to change because the content is the same!
git cat-file -p 4baa979

My first bit of text
And now some more text...


### what will we see when we add another file?

In [47]:
echo "breaking things into sub-components" >> file2.txt
git add file2.txt
git commit -m "refactoring"

[master baf007f] refactoring
 1 file changed, 1 insertion(+)
 create mode 100644 file2.txt


In [48]:
git cat-file -p HEAD

tree 5e2831d9384ee8f43bde45bd04909f190902bc57
parent cd49e05ed32b7d1e9ef685fab5f0807959486e9c
author Orion Montoya <orion@mdcclv.com> 1489163725 -0500
committer Orion Montoya <orion@mdcclv.com> 1489163725 -0500

refactoring


In [49]:
git cat-file -p 5e2831d

100644 blob 4baa979a0af89c9d52506593c7b5390c84a05b70	file1.txt
100644 blob ce645c7ac024e8f0ae22e0d6c3c2d5987e5a2223	file1a.txt
100644 blob 2ed01a294f59cd2cda1fd438be46f5939bc7a69c	file2.txt


So again, as we see here:

<!-- offline: 
![](fig/commit_anatomy.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/commit_anatomy.png">

a commit points to a tree, a tree points to a list of blobs (and to other trees, for subdirectories), and blobs are files with particular contents. This is a directed acyclic graph if there ever was one. A branch is just a pointer to the most recent commit in a particular chain of commits; since every commit points back to its parent, that commit serves as the root from which the rest of the chain can be reached.

When you check out a branch, git looks at the tree for it, looks at the blobs in it, and just dumps the contents of those blobs into files inside your workspace directory. Every revision of every file exists inside `.git/objects`; git just puts each thing in its proper place for you when you ask it.

But how, then, do you move from one branch to another? There must be a list of heads of trees somewhere, right? Enter `.git/refs/`.

You know how, with breadth-first search, it doesn't really matter which node you start on? With a connected graph, any node *could* be the root. HEAD is just a reference to a node that is the root of the tree you want to have checked out.
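
To drive home how little there is to a branch, the sketch below creates one by writing a file. This is for illustration only, in a throwaway repository with a placeholder identity and a made-up branch name `handmade`; it assumes git's traditional one-file-per-ref storage, and in real life you would use `git branch` or `git update-ref` instead.

```shell
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

ref_repo=$(mktemp -d)
git -C "$ref_repo" init -q .
echo "hello" > "$ref_repo/hello.txt"
git -C "$ref_repo" add hello.txt
git -C "$ref_repo" commit -q -m "first"

tip=$(git -C "$ref_repo" rev-parse HEAD)

# "Create a branch": 40 hex digits and a newline in a file under refs/heads.
echo "$tip" > "$ref_repo/.git/refs/heads/handmade"

# Git now lists it like any other branch and resolves it to the same commit.
git -C "$ref_repo" branch
handmade_tip=$(git -C "$ref_repo" rev-parse handmade)
echo "$handmade_tip"
```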

In [50]:
cat .git/HEAD

ref: refs/heads/master


In [51]:
cat .git/refs/heads/master

baf007facc76f5e237236854f65fe169d684d25b


In [52]:
git checkout -b test2.0
#git checkout test2.0

Switched to a new branch 'test2.0'


In [53]:
ls -l .git/refs/heads/

total 16
-rw-r--r--  1 orion  staff  41 Mar 10 11:35 master
-rw-r--r--  1 orion  staff  41 Mar 10 11:36 test2.0


In [54]:
cat .git/refs/heads/*

baf007facc76f5e237236854f65fe169d684d25b
baf007facc76f5e237236854f65fe169d684d25b


Right now these two heads are identical. But let's make another commit:

In [55]:
echo "TODO: decide between imperial and metric unit tests" >> file3.txt
echo "notes that I don't want to track" >> notes.txt
git add file3.txt
git commit -am "big-picture vision stuff"

[test2.0 2f91016] big-picture vision stuff
 1 file changed, 1 insertion(+)
 create mode 100644 file3.txt


In [56]:
ls .git/refs/heads/

master  test2.0


In [57]:
cat .git/refs/heads/*

baf007facc76f5e237236854f65fe169d684d25b
2f91016ce3cef3b81629e479d4e9e008d3ce88da


In [58]:
ls -l

total 40
-rw-r--r--  1 orion  staff  47 Mar 10 11:31 file1.txt
-rw-r--r--  1 orion  staff  21 Mar 10 11:29 file1a.txt
-rw-r--r--  1 orion  staff  36 Mar 10 11:35 file2.txt
-rw-r--r--  1 orion  staff  52 Mar 10 11:37 file3.txt
-rw-r--r--  1 orion  staff  33 Mar 10 11:37 notes.txt


In [59]:
git checkout master
ls -l

Switched to branch 'master'
total 32
-rw-r--r--  1 orion  staff  47 Mar 10 11:31 file1.txt
-rw-r--r--  1 orion  staff  21 Mar 10 11:29 file1a.txt
-rw-r--r--  1 orion  staff  36 Mar 10 11:35 file2.txt
-rw-r--r--  1 orion  staff  33 Mar 10 11:37 notes.txt


Notice that:
1. **file3.txt disappeared**. It is tracked in the `test2.0` branch, but it does not exist in `master`, so git cleans it up.
2. **`notes.txt` is still there**. It is not tracked in any branch, so git doesn't try to do anything to it at all.

In [60]:
echo "Boss says let's go with imperial unit tests" >> file3.txt
git add file3.txt
git commit -am "executive decisions"

[master ec30798] executive decisions
 1 file changed, 1 insertion(+)
 create mode 100644 file3.txt


In [61]:
gitk --all

## The meta-point

The meta-point here is that, now that you've had Data Structures and stuff, every piece of software was written by an idiot just like you, with recourse to the same data structures and algorithms as you. Understanding how a system fits together can help you master it.

## now to merge

In [62]:
git branch

* master
  test2.0


In [63]:
git merge test2.0

Auto-merging file3.txt
CONFLICT (add/add): Merge conflict in file3.txt
Automatic merge failed; fix conflicts and then commit the result.



In [64]:
git mergetool

Merging:
file3.txt

Normal merge conflict for 'file3.txt':
  {local}: created file
  {remote}: created file
2017-03-10 11:41:54.396 FileMerge[4282:9975607] Unable to load platform at path /Applications/Xcode.app/Contents/Developer/Platforms/AppleTVOS.platform
2017-03-10 11:41:54.398 FileMerge[4282:9975607] Unable to load platform at path /Applications/Xcode.app/Contents/Developer/Platforms/AppleTVSimulator.platform
2017-03-10 11:41:54.399 FileMerge[4282:9975607] Unable to load platform at path /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform
2017-03-10 11:41:54.400 FileMerge[4282:9975607] Unable to load platform at path /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform
2017-03-10 11:41:54.405 FileMerge[4282:9975607] Unable to load platform at path /Applications/Xcode.app/Contents/Developer/Platforms/WatchOS.platform
2017-03-10 11:41:54.406 FileMerge[4282:9975607] Unable to load platform at path /Applications/Xcode.app/Contents/Developer

In [65]:
git status

On branch master
All conflicts fixed but you are still merging.
  (use "git commit" to conclude merge)

Changes to be committed:

	modified:   file3.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	file3.txt.orig
	notes.txt



In [66]:
git commit -m "resolved organizational conflict"

[master e63b0b9] resolved organizational conflict


## Cherrypick

Actually, at this point I think I can describe `git cherry-pick` to you and you can tell me how it is implemented.

You're in a branch, and you have some commits. You want to pull in changes from another commit or list of commits. You're in the branch you want the changes in. You call `git cherry-pick <commits>`. Now the changes are part of your branch history.

- are the commit hashes the same, or different?
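One way to check your answer is to try it in a scratch repository. The sketch below is only an illustration: the temporary directory, the file names and the `feature` branch are all made up for the demo and have nothing to do with the tutorial repo.

```shell
#!/usr/bin/env bash
set -e

# Scratch repo so we don't disturb the tutorial repo (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base commit"
main_branch=$(git symbolic-ref --short HEAD)   # master or main, depending on your git config

# Make a commit on a side branch...
git checkout -q -b feature
echo "a useful fix" > fix.txt
git add fix.txt
git commit -q -m "useful fix"
picked=$(git rev-parse feature)

# ...let the main branch move on, then cherry-pick the fix onto it.
git checkout -q "$main_branch"
echo "other work" > other.txt
git add other.txt
git commit -q -m "other work"
git cherry-pick "$picked" > /dev/null
copy=$(git rev-parse HEAD)

# Compare the two hashes, then look at who the parent of each commit is.
echo "original: $picked"
echo "copy:     $copy"
git log --oneline --all --graph
```

Run `git cat-file -p` on both hashes afterwards and compare the `tree` and `parent` lines; that should tell you most of what you need to know about the implementation.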

## Reflog

Ok. Remember how your current branch is recorded in `.git/HEAD`, and that points to a place like `refs/heads/master` or `refs/heads/<branchname>`? Well, it turns out git keeps a log of what your HEAD is pointing to every time it changes, and you can look at it.

This is great in various situations where you messed up history. Maybe you did a reset to rollback some commits, but realized you had some good work in the second-to-last commit. Maybe you inadvertently rebased `master` and then inadvertently force-pushed, and the clock is ticking for everyone to get mad at you.
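Here is a minimal sketch of the first scenario, assuming a throwaway repository with three made-up commits: we reset away the last two, then use the reflog entry `HEAD@{1}` (where HEAD was one move ago) to get them back.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repo; file names and commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "work $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done
lost=$(git rev-parse HEAD)

# Oops: throw away the last two commits.
git reset -q --hard HEAD~2

# The commits are gone from `git log`, but the reflog remembers every place HEAD has been.
git reflog

# HEAD@{0} is the reset itself; HEAD@{1} is where HEAD pointed just before it.
git reset -q --hard 'HEAD@{1}'
restored=$(git rev-parse HEAD)
echo "recovered commit: $restored"
```

Note that the reflog is local to your clone and its entries eventually expire, so this is a safety net rather than a backup.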

In [67]:
git reflog

[33me63b0b9[m HEAD@{0}: commit (merge): resolved organizational conflict
[33mec30798[m HEAD@{1}: commit: executive decisions
[33mbaf007f[m HEAD@{2}: checkout: moving from test2.0 to master
[33m2f91016[m HEAD@{3}: commit: big-picture vision stuff
[33mbaf007f[m HEAD@{4}: checkout: moving from master to test2.0
[33mbaf007f[m HEAD@{5}: commit: refactoring
[33mcd49e05[m HEAD@{6}: commit: I have made great progress on this critical matter.
[33ma6caae9[m HEAD@{7}: commit: Copy that file for demo
[33m0a4a9f0[m HEAD@{8}: commit (initial): This is our first commit


## Indexing from HEAD

In [68]:
git cat-file -p HEAD

tree ac98f0ffa6405a9d2be88530b128fa6efbdbd00d
parent e63b0b965b0dd9dae8014c9cecec45cce0758414
author Orion Montoya <orion@mdcclv.com> 1489165170 -0500
committer Orion Montoya <orion@mdcclv.com> 1489165194 -0500

one thing and another


In [69]:
git cat-file -p HEAD~1

tree 8558234198eb9c2b5435e1e0c131bc8be802ab87
parent ec30798af9980aa9b8016275d0e9264cf8a2999c
parent 2f91016ce3cef3b81629e479d4e9e008d3ce88da
author Orion Montoya <orion@mdcclv.com> 1489164230 -0500
committer Orion Montoya <orion@mdcclv.com> 1489164230 -0500

resolved organizational conflict


In [70]:
git cat-file -p HEAD~4

tree e14d35d9509833a16b7814f518c5fe7c04d8fca5
parent a6caae9499295f25ddfdf0919f23612aae742e03
author Orion Montoya <orion@mdcclv.com> 1489163630 -0500
committer Orion Montoya <orion@mdcclv.com> 1489163630 -0500

I have made great progress on this critical matter.


In [71]:
git diff HEAD~3..HEAD

diff --git a/file3.txt b/file3.txt
new file mode 100644
index 0000000..b72e7d4
--- /dev/null
+++ b/file3.txt
@@ -0,0 +1 @@
+bla bla
diff --git a/file4.txt b/file4.txt
new file mode 100644
index 0000000..9797d05
--- /dev/null
+++ b/file4.txt
@@ -0,0 +1 @@
+boo balbl


In [72]:
git diff master..test2.0

diff --git a/file3.txt b/file3.txt
index b4345c3..b72e7d4 100644
--- a/file3.txt
+++ b/file3.txt
@@ -1,2 +1 @@
-Boss says let's go with imperial unit tests
-TODO: decide between imperial and metric unit tests
+bla bla
diff --git a/file4.txt b/file4.txt
new file mode 100644
index 0000000..9797d05
--- /dev/null
+++ b/file4.txt
@@ -0,0 +1 @@
+boo balbl


In [75]:
git diff test2.0..master file*

diff --git a/file3.txt b/file3.txt
index b72e7d4..b4345c3 100644
--- a/file3.txt
+++ b/file3.txt
@@ -1 +1,2 @@
-bla bla
+Boss says let's go with imperial unit tests
+TODO: decide between imperial and metric unit tests
diff --git a/file4.txt b/file4.txt
deleted file mode 100644
index 9797d05..0000000
--- a/file4.txt
+++ /dev/null
@@ -1 +0,0 @@
-boo balbl


## Squashing commits/interactive rebase

**This will happen to you.** I can't tell you how many workplace pissing contests have centered around a new person who doesn't squash their commits. Large, professional projects tend to want commits to be thematically isolated, and almost everybody wants their `master` history free of commits with messages like "wip" or "missed a semicolon". Every team has its own standards, for sure, but knowing how to squash your commits will make you look professional from your first pull request. If you get nothing else from this exercise, I would love it if you walked away knowing that you have the option of tidying up your commit history before anybody else looks at it.

By now we've made quite a few piddling little commits. If we want someone else to review the things we've done, we can squash them into a smaller number of commits to make the review simpler.
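The usual tool for this is `git rebase -i <base>`, which opens an editor where you mark commits as `squash` or `fixup`. Since an editor session doesn't reproduce well in a notebook, here is a sketch of a non-interactive substitute that gets the same end result for the simple "squash everything since the base" case: `git reset --soft` followed by a fresh commit. The repository, file name and commit messages are made up for the demo.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repo; everything in it is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "start" > work.txt
git add work.txt
git commit -q -m "starting point"
base=$(git rev-parse HEAD)

# Three piddling little commits on top of the base.
for msg in "wip" "missed a semicolon" "wip again"; do
  echo "$msg" >> work.txt
  git commit -q -am "$msg"
done
before=$(git rev-list --count HEAD)

# Move the branch label back to the base commit, but leave the index and
# working tree alone, so all three commits' changes are now staged together.
git reset --soft "$base"
git commit -q -m "Add the feature in one tidy commit"
after=$(git rev-list --count HEAD)

echo "commits before: $before, after: $after"
git log --oneline
```

This rewrites history, so only do it to commits that nobody else has pulled yet. The old commits are still reachable through the reflog if you change your mind.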

### staging hunks with gitk

## Stash
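A quick sketch of the basic stash round trip, in a made-up scratch repository: `git stash` tucks away uncommitted changes to tracked files so the working directory is clean, and `git stash pop` brings them back.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repo; the file name and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "committed line" > ideas.txt
git add ideas.txt
git commit -q -m "first commit"

# Start something, then get interrupted before it is ready to commit.
echo "half-finished idea" >> ideas.txt

# Tuck the uncommitted change away; the working directory is clean again.
git stash > /dev/null
clean=$(git status --porcelain)
git stash list

# ...switch branches, fix the urgent thing, come back, and then:
git stash pop > /dev/null
restored=$(tail -n 1 ideas.txt)
echo "back in the working directory: $restored"
```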

## Rebase vs Merge

https://www.quora.com/How-does-Git-merge-work/answer/Anders-Kaseorg

## Bisect

Bisect is a really useful thing that comes up in bigger, more complicated projects. The scenario where it makes sense is as follows: you have some code that works. You work on it some more, and you make multiple changes, over maybe dozens of commits. At some point, you notice that something doesn't work like it used to. You check out your last commit, and you find that it was broken there too. You roll back one more commit, and the thing that used to work is *still* broken. When did it last work? `git bisect` is for this.

You give it a starting "good" commit and an ending "bad" commit, and it walks you through a binary search of the commits in between.
https://git-scm.com/docs/git-bisect
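As a sketch of what that looks like, the script below builds a throwaway repository with ten commits, plants a fake "bug" (a line reading `BUG`, purely a stand-in for a failing test) in the sixth, and lets `git bisect run` do the binary search. The test command given to `git bisect run` must exit 0 for a good commit and non-zero for a bad one.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repo; file name, messages and the BUG marker are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Ten commits; the "bug" sneaks in with commit 6.
for n in 1 2 3 4 5 6 7 8 9 10; do
  echo "change $n" >> code.txt
  if [ "$n" -eq 6 ]; then
    echo "BUG" >> code.txt
  fi
  git add code.txt
  git commit -q -m "commit $n"
done

good=$(git rev-list --max-parents=0 HEAD)   # the very first commit still worked

# Arguments to `bisect start` are: the bad commit first, then the good one.
git bisect start HEAD "$good" > /dev/null

# Our "test": the commit is good if the BUG line is absent.
git bisect run sh -c '! grep -q BUG code.txt' > /dev/null

# When the search finishes, refs/bisect/bad names the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1

echo "first bad commit: $culprit"
```

With ten commits the search needs only three or four checkouts, which is the whole point: the cost grows with the logarithm of the history length.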

## `man git-<whatever>`: the manpages are your friend

This is roughly the amount of knowledge that I have needed in order to be able to confidently read Git manpages. The trick is that even the manpages require insider knowledge to find them: to find out about `git pull`, you have to run `man git-pull` which is a pretty uncommon manpage naming format.

In [76]:
gitk

### `git log` revisited

First, let's see what the log shows us now:

In [None]:
git log

Sometimes it's handy to see a very summarized version of the log:

In [None]:
git log --oneline --topo-order --graph

Git supports *aliases:* new names given to command combinations. Let's make this handy shortlog an alias, so we only have to type `git slog` and see this compact log:

In [None]:
# We create our alias (this saves it in git's permanent configuration file):
git config --global alias.slog "log --oneline --topo-order --graph"

# And now we can use it
git slog

### `git mv` and `rm`: moving and removing files

While `git add` is used to add files to the list git tracks, we must also tell it if we want their names to change or for it to stop tracking them.  In familiar Unix fashion, the `mv` and `rm` git commands do precisely this:

In [None]:
git mv file1.txt file-newname.txt
git status

Note that these changes must be committed too, to become permanent!  In git's world, until something has been committed, it isn't permanently recorded anywhere.

In [None]:
git commit -a -m"I like this new name better"
echo "Let's look at the log again:"
git slog

And `git rm` works in a similar fashion.
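A small sketch, in a scratch repository with made-up file names, of the two flavors you are most likely to want: plain `git rm` deletes the file and stages the deletion, while `git rm --cached` stops tracking the file but leaves it on disk.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repo; the three file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "keep me" > keep.txt
echo "delete me" > doomed.txt
echo "forget me" > local-only.txt
git add .
git commit -q -m "three files"

git rm -q doomed.txt                 # removes the file AND stages the deletion
git rm -q --cached local-only.txt    # stops tracking, but the file stays on disk

git status --short
git commit -q -m "Remove two files from tracking"

tracked=$(git ls-files)
echo "still tracked: $tracked"
ls
```

As with `git mv`, nothing is permanent until the commit. The removed file's old contents also remain available in earlier commits.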

### Exercise

Add a new file `file4.txt`, commit it, make some changes to it, commit them again, and then remove it (and don't forget to commit this last step!).

## Local user, branching

What is a branch?  Simply a *label for the 'current' commit in a sequence of ongoing commits*:

<!-- offline: 
![](files/fig/masterbranch.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/masterbranch.png" >

There can be multiple branches alive at any point in time; the working directory reflects the commit that a special pointer called HEAD points to.  In this example there are two branches, *master* and *testing*, and *testing* is the currently active branch since it's what HEAD points to:

<!-- offline: 
![](files/fig/HEAD_testing.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/HEAD_testing.png" >

Once new commits are made on a branch, HEAD and the branch label move with the new commits:

<!-- offline: 
![](files/fig/branchcommit.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/branchcommit.png" >

This allows the history of both branches to diverge:

<!-- offline: 
![](files/fig/mergescenario.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/mergescenario.png" >

But based on this graph structure, git can compute the necessary information to merge the divergent branches back and continue with a unified line of development:
    
<!-- offline: 
![](files/fig/mergeaftermath.png)
-->

<img src="https://raw.github.com/fperez/reprosw/master/fig/mergeaftermath.png" >
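The "label" idea in these pictures can be checked directly. The sketch below uses a scratch repository (the file name and the `testing` branch are made up to match the figures) and assumes git's default on-disk ref storage, where `.git/HEAD` is a plain text file.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repo; names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first commit"
main_branch=$(git symbolic-ref --short HEAD)   # master or main, depending on your git config

# Creating a branch just creates a second label on the same commit.
git branch testing
a=$(git rev-parse "$main_branch")
b=$(git rev-parse testing)
echo "$main_branch -> $a"
echo "testing -> $b"

# Checking out a branch points HEAD at that label.
git checkout -q testing
head_ref=$(cat .git/HEAD)
echo "HEAD contains: $head_ref"

# A new commit moves only the label that HEAD points to.
echo "more" >> file.txt
git commit -q -am "work on testing"
after_main=$(git rev-parse "$main_branch")
after_testing=$(git rev-parse testing)
echo "$main_branch -> $after_main"
echo "testing -> $after_testing"
```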

Let's now illustrate all of this with a concrete example.  Let's get our bearings first:

In [None]:



git status
ls

We are now going to try two different routes of development: on the `master` branch we will add one file and on the `experiment` branch, which we will create, we will add a different one.  We will then merge the experimental branch into `master`.

In [None]:



git branch experiment
git checkout experiment

In [None]:



echo "Some crazy idea" > experiment.txt
git add experiment.txt
git commit -a -m"Trying something new"
git slog

In [None]:



git checkout master
git slog

In [None]:



echo "All the while, more work goes on in master..." >> file-newname.txt
git commit -a -m"The mainline keeps moving"
git slog

In [None]:
ls

In [None]:
git merge experiment
git slog

## Using remotes as a single user

We are now going to introduce the concept of a *remote repository*: a pointer to another copy of the repository that lives on a different location.  This can be simply a different path on the filesystem or a server on the internet.

For this discussion, we'll be using remotes hosted on the [GitHub.com](http://github.com) service, but you can equally use other services like [BitBucket](http://bitbucket.org) or [Gitorious](http://gitorious.org) as well as host your own.

In [None]:
ls
echo "Let's see if we have any remote repositories here:"
git remote -v

Since the above cell didn't produce any output after the `git remote -v` call, it means we have no remote repositories configured.  We will now proceed to do so.  Once logged into GitHub, go to the [new repository page](https://github.com/new) and make a repository called `test`.  Do **not** check the box that says `Initialize this repository with a README`, since we already have an existing repository here.  That option is useful when you're starting first at Github and don't have a repo made already on a local computer.

We can now follow the instructions from the next page:

In [None]:
git remote add origin https://github.com/mdcclv/test.git
git push -u origin master

Let's see the remote situation again:

In [None]:
git remote -v

We can now [see this repository publicly on github](https://github.com/mdcclv/test).


Let's see how this can be useful for backup and syncing work between two different computers.  I'll simulate a 2nd computer by working in a different directory...

In [None]:
cd ..
# Here I clone my 'test' repo but with a different name, test2, to simulate a 2nd computer
git clone https://github.com/mdcclv/test.git test2
cd test2
pwd
git remote -v

Let's now make some changes in one 'computer' and synchronize them on the second.

In [None]:
cd ../test2  # working on computer #2

echo "More new content on my experiment" >> experiment.txt
git commit -a -m"More work, on machine #2"

Now we put this new work up on the github server so it's available from the internet

In [None]:
cd ../test2

git push

Now let's fetch that work from machine #1:

In [None]:
git pull

### An important aside: conflict management

While git is very good at merging, if two different branches modify the same file in the same location, it simply can't decide which change should prevail.  At that point, human intervention is necessary to make the decision.  Git will help you by marking the location in the file that has a problem, but it's up to you to resolve the conflict.  Let's see how that works by intentionally creating a conflict.

We start by creating a branch and making a change to our experiment file:

In [None]:
git branch trouble
git checkout trouble
echo "This is going to be a problem..." >> experiment.txt
git commit -a -m"Changes in the trouble branch"

And now we go back to the master branch, where we change the *same* file:

In [None]:
git checkout master
echo "More work on the master branch..." >> experiment.txt
git commit -a -m"Mainline work"

So now let's see what happens if we try to merge the `trouble` branch into `master`:

In [None]:
git merge trouble

Let's see what git has put into our file:

In [None]:
cat experiment.txt

At this point, we go into the file with a text editor, decide which changes to keep, and make a new commit that records our decision.  I've now made the edits; in this case I decided that both pieces of text were useful, but integrated them with some changes:

In [None]:
cat experiment.txt

Let's then make our new commit:

In [None]:
git commit -a -m"Completed merge of trouble, fixing conflicts along the way"
git slog

*Note:* While it's a good idea to understand the basics of fixing merge conflicts by hand, in some cases you may find the use of an automated tool useful.  Git supports multiple [merge tools](https://www.kernel.org/pub/software/scm/git/docs/git-mergetool.html): a merge tool is a piece of software that conforms to a basic interface and knows how to merge two files into a new one.  Since these are typically graphical tools, there are various to choose from for the different operating systems, and as long as they obey a basic command structure, git can work with any of them.

## Collaborating on github with a small team

Single remote with shared access: we are going to set up a shared collaboration with one partner (the person sitting next to you).  This will show the basic workflow of collaborating on a project with a small team where everyone has write privileges to the same repository.  

Note for SVN users: this is similar to the classic SVN workflow, with the distinction that commit and push are separate steps.  SVN, having no local repository, commits directly to the shared central resource, so to a first approximation you can think of `svn commit` as being synonymous with `git commit; git push`.

We will have two people, let's call them Alice and Bob, sharing a repository.  Alice will be the owner of the repo and she will give Bob write privileges.  

We begin with a simple synchronization example, much like we just did above, but now between *two people* instead of one person.  Otherwise it's the same:

- Bob clones Alice's repository.
- Bob makes changes to a file and commits them locally.
- Bob pushes his changes to github.
- Alice pulls Bob's changes into her own repository.

Next, we will have both parties make non-conflicting changes each, and commit them locally.  Then both try to push their changes:

- Alice adds a new file, `alice.txt` to the repo and commits.
- Bob adds `bob.txt` and commits.
- Alice pushes to github.
- Bob tries to push to github.  What happens here?

The problem is that Bob's commit was built on an older state of the repository than the one now on github: Alice's commit got there first, so Bob's push is not a simple fast-forward and git refuses it, even though the two changes touch different files.  It forces Bob to first do the merge on his machine, so that if there is a conflict in the merge, Bob deals with the conflict manually (git could try to do the merge on the server, but in that case if there's a conflict, the server repo would be left in a conflicted state without a human to fix things up).  The solution is for Bob to first pull the changes (pull in git is really fetch+merge), and then push again.
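If you don't have a partner handy, the Alice-and-Bob exchange can be rehearsed on one machine. This is only a sketch under some assumptions: a bare repository in a temporary directory stands in for github, the two clones `alice` and `bob` stand in for the two people, and the branch is forced to be called `master` so the script behaves the same regardless of your git defaults.

```shell
#!/usr/bin/env bash
set -e

work=$(mktemp -d)
cd "$work"

# A bare repository on local disk stands in for github (hypothetical setup).
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

# Alice creates the project and publishes it.
git clone -q server.git alice 2> /dev/null
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
echo "shared project" > README.txt
git add README.txt
git commit -q -m "initial commit"
git push -q origin master
cd ..

# Bob clones Alice's repository.
git clone -q server.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
cd ..

# Both commit locally; Alice pushes first.
(cd alice && echo "from alice" > alice.txt && git add alice.txt \
  && git commit -q -m "Add alice.txt" && git push -q origin master)

cd bob
echo "from bob" > bob.txt
git add bob.txt
git commit -q -m "Add bob.txt"

# Bob's push is refused because the server has a commit he doesn't.
if git push -q origin master 2> /dev/null; then
  first_push="accepted"
else
  first_push="rejected"
fi
echo "Bob's first push: $first_push"

# pull = fetch + merge; the merge happens on Bob's machine, then the push goes through.
git pull -q --no-rebase --no-edit origin master
git push -q origin master
git log --oneline --graph
```

The final log should show a merge commit joining Alice's and Bob's work, which is exactly the merge git declined to do on the server.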

## Full-contact github: distributed collaboration with large teams

Multiple remotes and merging based on pull request workflow: this is beyond the scope of this brief tutorial, so we'll simply discuss how it works very briefly, illustrating it with the activity on the [IPython github repository](http://github.com/ipython/ipython).

## Other useful commands

- [show](http://www.kernel.org/pub/software/scm/git/docs/git-show.html)
- [reflog](http://www.kernel.org/pub/software/scm/git/docs/git-reflog.html)
- [rebase](http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html)
- [tag](http://www.kernel.org/pub/software/scm/git/docs/git-tag.html)

## Git resources

### Introductory materials

There are lots of good tutorials and introductions for Git, which you
can easily find yourself; this is just a short list of things I've found
useful.  For a beginner, I would recommend the following 'core' reading list, and
below I mention a few extra resources:

1. The smallest, and in the style of this tutorial: [git - the simple guide](http://rogerdudler.github.com/git-guide)
contains 'just the basics'.  Very quick read.

1.  The concise [Git Reference](http://gitref.org): compact but with
    all the key ideas. If you only read one document, make it this one.

1. In my own experience, the most useful resource was [Understanding Git
Conceptually](http://www.sbf5.com/~cduan/technical/git).
Git has a reputation for being hard to use, but I have found that with a
clear view of what is actually a *very simple* internal design, its
behavior is remarkably consistent, simple and comprehensible.

1.  For more detail, see the start of the excellent [Pro
    Git](http://progit.org/book) online book, or similarly the early
    parts of the [Git community book](http://book.git-scm.com). Pro
    Git's chapters are very short and well illustrated; the community
    book tends to have more detail and has nice screencasts at the end
    of some sections.

If you are really impatient and just want a quick start, this [visual git tutorial](http://www.ralfebert.de/blog/tools/visual_git_tutorial_1)
may be sufficient. It is nicely illustrated with diagrams that show what happens on the filesystem.

For Windows users, [an Illustrated Guide to Git on Windows](http://nathanj.github.com/gitguide/tour.html) is useful in that
it also contains some information about handling SSH (necessary to interface with git hosted on remote servers when collaborating) as well
as screenshots of the Windows interface.

Cheat sheets
:   Two different
    [cheat](http://zrusin.blogspot.com/2007/09/git-cheat-sheet.html)
    [sheets](http://jan-krueger.net/development/git-cheat-sheet-extended-edition)
    in PDF format that can be printed for frequent reference.

### Beyond the basics

At some point, it will pay off to understand how git itself is *built*.  These two documents, written in a similar spirit, 
are probably the most useful descriptions of the Git architecture short of diving into the actual implementation.  They walk you through
how you would go about building a version control system with a little story. By the end you realize that Git's model is almost
an inevitable outcome of the proposed constraints:

* The [Git parable](http://tom.preston-werner.com/2009/05/19/the-git-parable.html) by Tom Preston-Werner.
* [Git foundations](http://matthew-brett.github.com/pydagogue/foundation.html) by Matthew Brett.

[Git ready](http://www.gitready.com)
:   A great website of posts on specific git-related topics, organized
    by difficulty.

[QGit](http://sourceforge.net/projects/qgit/): an excellent Git GUI
:   Git ships by default with gitk and git-gui, a pair of Tk graphical
    clients to browse a repo and to operate in it. I personally have
    found [qgit](http://sourceforge.net/projects/qgit/) to be nicer and
    easier to use. It is available on modern linux distros, and since it
    is based on Qt, it should run on OSX and Windows.

[Git Magic](http://www-cs-students.stanford.edu/~blynn/gitmagic/index.html)
:   Another book-size guide that has useful snippets.

The [learning center](http://learn.github.com) at Github
:   Guides on a number of topics, some specific to github hosting but
    much of it of general value.

A [port](http://cworth.org/hgbook-git/tour) of the Hg book's beginning
:   The [Mercurial book](http://hgbook.red-bean.com) has a reputation
    for clarity, so Carl Worth decided to
    [port](http://cworth.org/hgbook-git/tour) its introductory chapter
    to Git. It's a nicely written intro, which is possible in good
    measure because of how similar the underlying models of Hg and Git
    ultimately are.

[Intermediate tips](http://andyjeffries.co.uk/articles/25-tips-for-intermediate-git-users)
:   A set of tips that contains some very valuable nuggets, once you're
    past the basics.

Finally, if you prefer a video presentation, this 1-hour tutorial prepared by the GitHub educational team will walk you through the entire process:

In [None]:
%%python

from IPython.display import YouTubeVideo
YouTubeVideo('U8GBXvdmHT4')

### A few useful tips for common tasks

#### Better shell support

Adding git branch info to your bash prompt and tab completion for git commands and branches is extremely useful.  I suggest you at least copy:

- [git-completion.bash](https://github.com/git/git/blob/master/contrib/completion/git-completion.bash)
- [git-prompt.sh](https://github.com/git/git/blob/master/contrib/completion/git-prompt.sh)
 
You can then source both of these files in your `~/.bashrc` and then set your prompt (I'll assume you named them as the originals but starting with a `.` at the front of the name):

    source $HOME/.git-completion.bash
    source $HOME/.git-prompt.sh
    PS1='[\u@\h \W$(__git_ps1 " (%s)")]\$ '   # adjust this to your prompt liking

See the comments in both of those files for lots of extra functionality they offer.

#### Embedding Git information in LaTeX documents

(Sent by [Yaroslav Halchenko](http://www.onerussian.com))

I use a Make rule:

    # Helper if interested in providing proper version tag within the manuscript
    revision.tex: ../misc/revision.tex.in ../.git/index
       GITID=$$(git log -1 | grep -e '^commit' -e '^Date:' | sed  -e 's/^[^ ]* *//g' | tr '\n' ' '); \
       echo $$GITID; \
       sed -e "s/GITID/$$GITID/g" $< >| $@

in the top level `Makefile.common` which is included in all
subdirectories which actually contain papers (hence all those
`../.git`). The `revision.tex.in` file is simply:

    % Embed GIT ID revision and date
    \def\revision{GITID}

The corresponding `paper.pdf` depends on `revision.tex` and includes the
line `\input{revision}` to load up the actual revision mark.

#### git export

Git doesn't have a native export command, but this works just fine:

    git archive --prefix=fperez.org/  master | gzip > ~/tmp/source.tgz