# Lab 1: Version Control with Git

**This tutorial is largely based on the repository**:

git@github.com:rdadolf/git-and-github.git

# Table of Contents
* [Lab 1: Version Control with Git](#Lab-1:-Version-Control-with-Git)
	* [Git Basics](#Git-Basics)
	* [Common Tasks in the version control of files.](#Common-Tasks-in-the-version-control-of-files.)
		* [Forking a repository](#Fork)
		* [Cloning a repository](#Clone)
		* [Poking around](#Status-(Poke-around))
		* [Staging changes](#Add-(staging))
		* [Committing](#Commit)
		* [Pushing](#Push)
		* [Remotes](#Remote)
		* [Fetching](#Fetch)
		* [Merging](#Merge)
		* [Branching](#Branch)
		* [Checking out](#Checkout)
	* [Git habits](#Git-habits)


----

# Git Basics

The first thing to understand about git is that the contents of your project are stored in several different states and forms at any given time.

You can think of git as operating on four different areas:

![Git Commands](git_layout.png)

 - The **working directory** is what you're currently looking at. When you use an editor to modify a file, the changes are made to the working directory.
 - The **staging area** is a place to collect a set of changes made to your project. If you have changed three files to fix a bug, you will add all three to the staging area so that you can remember the changes as one historical entity. It is also called the **index**. You move files from the working directory to the index using the command `git add`.
 - The **local repository** is the place where git stores everything you've ever done to your project. Even when you delete a file, a copy is stored in the repo (this is necessary for always being able to undo any change). It's important to note that a local repository doesn't look much at all like your project files or directories. Git has its own way of storing all the information, and if you're curious what it looks like, look in the `.git` directory in the working directory of your project. Files are moved from the index to the local repository via the command `git commit`.
 - When working in a team, every member will be working on their own local repository. An **upstream repository** allows everyone to agree on a single version of history. If two people have made changes on their local repositories, they will combine those changes in the upstream repository. In our case this upstream repository is hosted by github. This upstream repository is also called a **remote** in git parlance. The standard github remote is called the **origin**: it is the repository which is given a web page on github. One usually moves code from local to remote repositories using `git push`, and in the other direction using `git fetch`.

You can think of most git operations as moving code or metadata from one of these areas to another.
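
To make the four areas concrete, here is a minimal offline sketch. It assumes a scratch directory from `mktemp`, a local bare repository standing in for the github remote, and a placeholder identity and file name (`Demo User`, `notes.txt`), none of which come from this lab's repository.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q --bare "$work/upstream.git"   # stand-in for the remote
git -c init.defaultBranch=master init -q "$work/project"
cd "$work/project"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"
git remote add origin "$work/upstream.git"

echo "first draft" > notes.txt                       # 1. edit in the working directory
git add notes.txt                                    # 2. working directory -> index
git commit -q -m "Add notes"                         # 3. index -> local repository
git push -q origin master                            # 4. local repository -> remote

git log --oneline                                    # history in the local repository
git ls-remote "$work/upstream.git"                   # the same commit, now on the remote
```

Each numbered comment corresponds to one arrow between two of the areas described above.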

----

# Common Tasks in the version control of files.

## Fork

Forking a repository is done on github. Go to the url https://github.com/rfarouni/github_tutorial and click the "Fork" button on the upper right side.

## Clone

Now that we have a **fork** of the `rfarouni/github_tutorial` repository, let's **clone** it down to our local machines.

----
`clone`

![clone](git_clone.png)

Cloning a repository does two things: 

1. it takes a repository from somewhere (usually an **upstream repository**) and makes a local copy (your new **local repository**)
2. it creates the most recent copy of all of the files in the project (your new **working directory**). 
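
The two effects of a clone can be checked without network access. This is only a sketch: the `source` repository, the `copy` directory and the identity are made-up stand-ins for the github fork.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q "$work/source"   # stand-in for the upstream repository
cd "$work/source"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"
echo "# Hello" > hello.md
git add hello.md
git commit -q -m "Add hello"

cd "$work"
git clone -q "$work/source" copy                     # clone from a local path instead of github
ls -A copy                                           # .git (local repository) and hello.md (working directory)
git -C copy remote -v                                # the source is remembered under the name "origin"
```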

**Note** If you haven't set up your ssh keys, use *https* instead.

**Note for windows users**

`bash.exe` is installed along with git but isn't in your path. Convert the next cell from Raw NbConvert to code and run it to add the path to bash to your path.

In [2]:
%%bash
cd /tmp
rm -rf github_tutorial #remove if it exists
git clone git@github.com:rfarouni/github_tutorial.git

Cloning into 'Testing'...


In [3]:
%%bash
ls /tmp/github_tutorial

LICENSE
README.md
hello.md


## Status (Poke around)

We have a nice, fresh-smelling repository. Let's look around it.

`log`

Log tells you all the changes that have occurred in this project as of now...

In [4]:
%%bash
cd /tmp/github_tutorial; git log

commit 7ba94f7e4ed9506b0b3eab02e03d516c2b38977e
Author: Rahul Dave <rahuldave@gmail.com>
Date:   Tue Sep 1 22:53:03 2015 -0400

    added a line in readme

commit 3a2909aa790f041f1426e51565674165e52acdc8
Author: Rahul Dave <rahuldave@gmail.com>
Date:   Fri Aug 28 02:44:21 2015 -0400

    Added a test file to demonstrate git features

commit 98bd53e8117d85cf0f7978f6aca22ce90ebc9709
Author: Rahul Dave <rahuldave@gmail.com>
Date:   Fri Aug 28 02:23:32 2015 -0400

    Attributed the test file to A.

commit 5bd2f661fb25953b444bd5e8ba9557456406ece0
Author: Rahul Dave <rahuldave@gmail.com>
Date:   Fri Aug 28 02:21:57 2015 -0400

    Added a test file to demonstrate git features

commit 45c25a4d944ad583a383fb7b35f185750e9b5b9c
Author: Rahul Dave <rahuldave@gmail.com>
Date:   Fri Aug 28 02:19:02 2015 -0400

    Added a test file to demonstrate git features

commit 11961a3e0d50ea2ede1265da4bf586c001e63d7b
Author: Rahul Dave <rahuldave@gmail.com>
Date:   Fri Aug 28 01:55:49 2015 -0400

    Initia

Each of these commits is identified by a SHA-1 hash. Because the hash of a commit covers its contents and its parent commits, it uniquely identifies everything that has happened to this repository up to that point. We shall soon see how to add our own commits. In the meantime, let's see the "status" of our working directory.
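
To see what a commit hash refers to, the following sketch builds a tiny two-commit history and prints the raw commit object. The repository, file name and identity are hypothetical, and the hashes you get will differ from run to run because they depend on the timestamp.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "two" >> file.txt
git commit -q -a -m "Second commit"

git log --oneline                                    # abbreviated hash and subject, newest first
head_hash=$(git rev-parse HEAD)                      # full hash of the newest commit
parent_hash=$(git rev-parse HEAD~1)                  # full hash of its parent
echo "HEAD:   $head_hash"
echo "parent: $parent_hash"
git cat-file -p HEAD                                 # raw commit: tree, parent, author, committer, message
```

The `parent` line in the last output is what chains each commit to the history before it.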

----
`status`

![status](git_status.png)

Status is your window into the current state of your project. It can tell you which files you have changed and which files you currently have in your staging area.

In [5]:
%%bash
cd /tmp/github_tutorial; git status

On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean



"`origin/master`" represents the local copy of the branch that came from the upstream repository (nicknamed "`origin`" in this case). 

**Branches** are different, co-existing versions of your project. They represent a snapshot of the project, by someone, at some particular point in time. In general you will only care about your own branches, and those of the "parent" remotes you forked/cloned from.

You are always working on a given branch in a repository. Typically this is `master`. More on this later. You can find out which branch you are on by typing `git branch`. The starred one is the one you are on.


In [8]:
%%bash
cd /tmp/github_tutorial; git branch

* master


#### Making changes

Ok! Enough poking around. Let's get down to business and change some files in our folder.

Now let's say that we want to change a file in the project. The canonical sequence is "edit–add–commit–push".

In [9]:
%%bash
cd /tmp/github_tutorial
echo '# Hello world' > hello.md
git status

On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   hello.md

no changes added to commit (use "git add" and/or "git commit -a")


We've modified a file in the working directory, but the change hasn't been staged yet.

## Add (staging)

----
`add`

![add](git_add.png)

When you've made a change to a set of files and are ready to create a commit, the first step is to add all of the changed files to the staging area. Add does that. Remember that what you see in the filesystem is your working directory, so the way to see what's in the staging area is with the `status` command. This also means that **if you add something to the staging area and then edit it again, you'll also need to add the file to the staging area again if you want to remember the new changes**.
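
The point in bold is easy to trip over, so here is a small sketch of it. It assumes a throwaway repository and a placeholder identity. `git diff --cached` shows what is staged, and plain `git diff` shows what is not.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"
echo "line 1" > hello.md
git add hello.md
git commit -q -m "Add hello"

echo "line 2" >> hello.md
git add hello.md                                     # stage the first edit
echo "line 3" >> hello.md                            # edit again after staging

git diff --cached                                    # staged: only "line 2"
git diff                                             # not staged: "line 3"
git status --short                                   # "MM hello.md": staged and unstaged changes
```

Committing at this point would record "line 2" only; a second `git add hello.md` is needed to include "line 3".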

In [10]:
%%bash
cd /tmp/github_tutorial
git add hello.md
git status

On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   hello.md



Now our file is in the staging area (Index), waiting to be committed.

I will sometimes simply use `git add .` in the top level of the repository. This adds all new files and changed files to the index, and is particularly useful if I have created multiple new files.

## Commit

----
`commit`

![commit](git_commit.png)

When you're satisfied with the changes you've added to your staging area, you can commit those changes to your local repository with the `commit` command. Those changes will have a permanent record in the repository from now on.

Every commit has two features you should be aware of. The first is a hash. This is a unique identifier for all of the information about that commit, including the code changes, the timestamp, and the author. The second is a commit message. This is text that you can (and should) add to a commit to describe what the changes were.

**Good commit messages are important.**

In [11]:
%%bash
cd /tmp/github_tutorial
git commit -m "Said hello to myself"
git status

[master 15471cd] Said hello to myself
 1 file changed, 1 insertion(+), 1 deletion(-)
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
nothing to commit, working directory clean


The `git commit -m "..."` version is just a way to specify a commit message without opening a text editor (ipython notebook can't handle it). Otherwise you just say `git commit`. You can also say `git commit -a`, which stages every modified or deleted file that git already tracks before committing; a brand-new file still has to be `add`ed first.
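
As a small sketch of how `-a` behaves: it picks up modifications to files git already tracks, while a brand-new file is left alone until it is `add`ed. The file names and identity below are placeholders.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"
echo "v1" > tracked.md
git add tracked.md
git commit -q -m "Add tracked file"

echo "v2" > tracked.md                               # modify a file git already tracks
echo "new" > untracked.md                            # create a file git has never seen

git commit -q -a -m "Update tracked file"            # -a stages tracked modifications only
git status --short                                   # "?? untracked.md": still not committed
```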

Now we see that our branch, "`master`", has one more commit than the "`origin/master`" branch, the local copy of the branch that came from the upstream repository (nicknamed "`origin`" in this case). Let's push the changes.

## Push

`push`

![push](git_push.png)

The `push` command takes the changes you have made to your local repository and attempts to update a remote repository with them. If you're the only person working with both of these (which is how a solo GitHub project would work), then push should always succeed.

In [12]:
%%bash
cd /tmp/github_tutorial
git push
git status

On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean


To git@github.com:rahuldave/Testing.git
   7ba94f7..15471cd  master -> master


## Remote 

If you're working with other people, then it's possible that they have made changes to the remote repository between the time you first cloned it and now. In that case `push` will fail until you have brought their changes into your local repository.
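
Here is a sketch of that failure with two made-up collaborators, `alice` and `bob`, sharing a local bare repository in place of github. The helper `commit_file` and all file names are hypothetical. The last three commands use `fetch` and `merge`, which are covered in the following sections.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"        # placeholder identity
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q --bare "$work/upstream.git"          # the shared remote

# Hypothetical helper: write one file and commit it in the current repository.
commit_file() {
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "Add $1"
}

git -c init.defaultBranch=master init -q "$work/alice"
cd "$work/alice"
git remote add origin "$work/upstream.git"
commit_file README.md "hello"
git push -q -u origin master

git clone -q "$work/upstream.git" "$work/bob"        # Bob starts from the same history
cd "$work/bob"
commit_file bob.md "from bob"
git push -q origin master                            # Bob gets there first

cd "$work/alice"
commit_file alice.md "from alice"
if git push -q origin master 2>/dev/null; then
    push_result="accepted"
else
    push_result="rejected"                           # the remote has a commit Alice lacks
fi
echo "Alice's first push was $push_result"

git fetch -q origin                                  # bring Bob's commit into the local repository
git merge -q -m "Merge origin/master" origin/master  # combine it with Alice's work
git push -q origin master                            # now the push goes through
```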


---

We have seen so far that our repository has one "remote", or upstream repository, which has been identified with the word `origin`, as seen in `.git/config`. We now wish to add another remote, which we shall call `course`, which points to the original repository we forked from. We want to do this to pull in changes, in case something changed there.

In [13]:
%%bash
cd /tmp/github_tutorial
git remote add course git@github.com:rfarouni/github_tutorial.git
cat .git/config

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = git@github.com:rahuldave/Testing.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
[remote "course"]
	url = git@github.com:cs109/Testing.git
	fetch = +refs/heads/*:refs/remotes/course/*


Notice that the `master` branch only tracks the same branch on the `origin` remote. We haven't set up any connection with the `course` remote yet.

Now let's figure out how to get changes from an upstream repository, be it our `origin` upstream that a collaborator has `push`ed to, or another `course` remote to which your memory-addled head TF has posted a change!

## Fetch

----
`fetch`

![fetch](git_fetch.png)

Let's say that you and your collaborator both edited the same line of the same file at the same time in different ways. On your respective machines, you both add and commit your different changes, and your collaborator pushes theirs to the upstream repository. When you run `fetch`, git adds a record of their changes to your local repository *alongside* your own. These are called *branches*, and they represent different, coexisting versions of your project. The `fetch` command adds your collaborator's branch to your local repository, but keeps yours as well.

In [14]:
%%bash
cd /tmp/github_tutorial
git fetch course

From github.com:cs109/Testing
 * [new branch]      master     -> course/master


You can see that a copy of a new remote branch has been made below, by providing the `-avv` argument to `git branch`.

In [15]:
%%bash
cd /tmp/github_tutorial
git branch -avv

* master                15471cd [origin/master] Said hello to myself
  remotes/course/master 7ba94f7 added a line in readme
  remotes/origin/HEAD   -> origin/master
  remotes/origin/master 15471cd Said hello to myself


Indeed, the way git works is by creating copies of remote branches locally. Then it just compares to these "copy" branches to see what changes have been made.
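
The "copy" branches can be watched in action offline. In this sketch a local repository named `course` (a stand-in for the real upstream) gains a commit after we clone it. `fetch` then moves the copy `origin/master`, while our own `master` and working directory stay put. The identity and the `mine` directory are placeholders.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"        # placeholder identity
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q "$work/course"                       # stand-in upstream
cd "$work/course"
echo "v1" > README.md
git add README.md
git commit -q -m "Initial commit"

git clone -q "$work/course" "$work/mine"             # our local repository

echo "v2" > README.md                                # meanwhile, the upstream moves on
git commit -q -a -m "Upstream change"

cd "$work/mine"
before=$(git rev-parse master)
git fetch -q origin                                  # updates origin/master only
git branch -avv                                      # master is now behind origin/master
```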


## Merge

----
`merge`

![merge](git_merge.png)

Having multiple branches is fine, but at some point, you'll want to combine the changes that you've made with those made by others. This is called merging.

There are two general cases when merging two branches: 

1. The two branches are different but the changes are in unrelated places; 
2. The two branches are different and the changes are in the same locations in the same files.

The first scenario is easy. Git will simply apply both sets of changes to the appropriate places and put the resulting files into the staging area for you. Then you can commit the changes and push them back to the upstream repository. Your collaborator does the same, and everyone sees everything.

The second scenario is more complicated. Let's say the two changes set some variable to different values. Git can't know which is the correct value. One solution would be to simply use the more recent change, but this very easily leads to self-inconsistent programs. A more conservative solution, and the one git uses, is to simply leave the decision to the user. When git detects a conflict that it cannot resolve, `merge` fails, and git places a modified version of the offending file in your project directory. **This is important:** the file that git puts into your directory is not actually *either* of the originals. It is a new file that has special markings around the locations that conflicted. We shall not consider this case in this lab, deferring it to a time just before you work with other collaborators on your projects.
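
Although we defer conflict resolution, it may help to see once what the "special markings" look like. The sketch below provokes a conflict on purpose in a throwaway repository. The branch name `theirs`, the file `settings.txt` and the identity are all hypothetical.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"
echo "x = 0" > settings.txt
git add settings.txt
git commit -q -m "Initial value"

git checkout -q -b theirs                            # the collaborator's version
echo "x = 1" > settings.txt
git commit -q -a -m "Set x to 1"

git checkout -q master                               # our version
echo "x = 2" > settings.txt
git commit -q -a -m "Set x to 2"

if git merge theirs >/dev/null 2>&1; then
    merge_result="clean"
else
    merge_result="conflict"                          # git refuses to pick a value for us
fi
echo "merge result: $merge_result"
cat settings.txt                                     # neither original: both values plus markers
```

With git's default conflict style, the file now holds our line between `<<<<<<< HEAD` and `=======`, and the other line between `=======` and `>>>>>>> theirs`.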


In [16]:
%%bash
cd /tmp/github_tutorial
git merge course/master
git status

Already up-to-date.
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean


Git reports "Already up-to-date." because our `master` already contains everything on `course/master`, so there is nothing to merge. The log shows why.

In [17]:
%%bash
cd /tmp/github_tutorial
git log -3

commit 15471cd871b642a45e2477041a6667bec184d7ad
Author: Rahul Dave <rahuldave@gmail.com>
Date:   Wed Sep 2 22:43:29 2015 -0400

    Said hello to myself

commit 7ba94f7e4ed9506b0b3eab02e03d516c2b38977e
Author: Rahul Dave <rahuldave@gmail.com>
Date:   Tue Sep 1 22:53:03 2015 -0400

    added a line in readme

commit 3a2909aa790f041f1426e51565674165e52acdc8
Author: Rahul Dave <rahuldave@gmail.com>
Date:   Fri Aug 28 02:44:21 2015 -0400

    Added a test file to demonstrate git features


Aha: the latest commit on the `course` upstream (`7ba94f7`) is already in our history, directly below our own commit. Had `course` contained commits we did not have, the merge would have brought them in, creating a merge commit if our histories had diverged. If you had edited README.md at the same time and committed locally, you would have been asked to resolve the conflict in the merge (the second case above).

Let's push to the origin now.

In [18]:
%%bash
cd /tmp/github_tutorial
git push
git status

On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean


Everything up-to-date


You can combine a fetch and a merge by simply doing a `git pull`. This will stop with a conflict if you and your collaborator have changed the same parts of the same file (since you will have to merge by hand), but it is a great shortcut when the files worked on are different. I use it all the time on a personal level too, to shift work between two different machines, as long as I am not working on both at the same time. The usual use case is day work on a work computer, and then evening work at home on the laptop. Read the docs if you are interested.
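
To check that a pull really is a fetch followed by a merge, this sketch makes two clones of a stand-in upstream (`course`) and updates one in two steps and the other in one. The directory names and identity are made up for the illustration.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"        # placeholder identity
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q "$work/course"                       # stand-in upstream
cd "$work/course"
echo "v1" > README.md
git add README.md
git commit -q -m "Initial commit"

git clone -q "$work/course" "$work/two-step"
git clone -q "$work/course" "$work/one-step"

echo "v2" > README.md                                # a new commit appears upstream
git commit -q -a -m "Upstream change"

cd "$work/two-step"
git fetch -q origin                                  # step 1: update origin/master
git merge -q origin/master                           # step 2: fast-forward master to it

cd "$work/one-step"
git pull -q origin master                            # both steps at once

git -C "$work/two-step" log --oneline -1             # both clones end on the same commit
git -C "$work/one-step" log --oneline -1
```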

## Branch

As you might have seen by now, everything in git is a branch. We have branches on remote (upstream) repositories, copies of remote branches in our local repository, and branches on local repositories which (so far) track remote branches (or, more precisely, local copies of remote branches).

In [19]:
%%bash
cd /tmp/github_tutorial
git branch -avv

* master                15471cd [origin/master] Said hello to myself
  remotes/course/master 7ba94f7 added a line in readme
  remotes/origin/HEAD   -> origin/master
  remotes/origin/master 15471cd Said hello to myself


And all of these branches are nothing but commit-streams in disguise, as can be seen above. It's a very simple model which leads to a lot of interesting version control patterns.

Since branches are so light-weight, the recommended way of working on software using git is to create a new branch for each new feature you add, test it out, and if good, merge it into master. Then you deploy the software from master. But we have been using branches under the hood. Let's now lift the hood.

----
`branch`

![branch](git_branch.png)

Branches can also be created manually, and they are a useful way of organizing unfinished changes.

The `branch` command has two forms. The first:

`git branch`

simply lists all of the branches in your local repository. If you run it without having created any branches, it will list only one, called `master`. This is the default branch. You have seen the use of `git branch -avv` to show all branches.

The other form creates a branch with a given name:

`git branch my-new-branch`

It's important to note that the new branch is not *active*. If you make changes, they will still apply to the `master` branch, not `my-new-branch`. To change this, you need the next command.
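
A quick sketch of that behaviour, assuming a throwaway repository with a placeholder identity and file: after creating `my-new-branch` we keep committing, and only `master` moves.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"
echo "v1" > notes.md
git add notes.md
git commit -q -m "First commit"

git branch my-new-branch                             # create the branch, but do not switch to it
git branch                                           # the star is still on master

echo "v2" >> notes.md
git commit -q -a -m "Second commit"                  # lands on master, the active branch
git branch -v                                        # my-new-branch still points at the first commit
```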

## Checkout

----
`checkout`

![checkout](git_checkout.png)

Checkout switches the active branch. Since branches can have different changes, `checkout` may make the working directory look very different. For instance, if you have added new files to one branch, and then check another branch out, those files will no longer show up in the directory. They are still stored in the `.git` folder, but since they only exist in the other branch, they cannot be accessed until you check out the original branch.
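
Here is a sketch of files appearing and disappearing as we switch. The branch `feature`, the file names and the identity are placeholders chosen for the illustration.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Add README"

git branch feature
git checkout -q feature                              # make feature the active branch
echo "draft" > feature-only.md
git add feature-only.md
git commit -q -m "Add feature-only file"

git checkout -q master
files_on_master=$(ls)                                # feature-only.md is gone from the directory
echo "on master: $files_on_master"

git checkout -q feature
ls                                                   # and it is back again
```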

You can combine creating a new branch and checking it out with the shortcut:

`git checkout -b my-new-branch`

Ok, so let's try this out on our repository.

In [20]:
%%bash
cd /tmp/github_tutorial
git branch mybranch1

See what branches we have created...

In [21]:
%%bash
cd /tmp/github_tutorial
git branch 

* master
  mybranch1


Jump onto the `mybranch1` branch...

In [22]:
%%bash
cd /tmp/github_tutorial
git checkout mybranch1
git branch

  master
* mybranch1


Switched to branch 'mybranch1'


Notice that it is bootstrapped off the `master` branch and has the same files.

In [23]:
%%bash
cd /tmp/github_tutorial
ls

LICENSE
README.md
hello.md


You could have created this branch using `git checkout -b mybranch1`. Let's see the status here.

In [24]:
%%bash
cd /tmp/github_tutorial
git status

On branch mybranch1
nothing to commit, working directory clean


Let's add a new file. Note that this file gets added on this branch only.

In [25]:
%%bash
cd /tmp/github_tutorial
echo '# Hello world' > world.md
git status

On branch mybranch1
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	world.md

nothing added to commit but untracked files present (use "git add" to track)


We add the file to the index, and then commit the files to the local repository on the `mybranch1` branch.

In [26]:
%%bash
cd /tmp/github_tutorial
git add .
git status


On branch mybranch1
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   world.md



In [27]:
%%bash
cd /tmp/github_tutorial
git commit -m "Added another test file to demonstrate git features" -a
git status

[mybranch1 b673a9d] Added another test file to demonstrate git features
 1 file changed, 1 insertion(+)
 create mode 100644 world.md
On branch mybranch1
nothing to commit, working directory clean


Ok, we have committed. Let's try to push!

In [28]:
%%bash
cd /tmp/github_tutorial
git push

fatal: The current branch mybranch1 has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin mybranch1



Oops, that failed. Why? Git didn't know what to push to on origin, and didn't want to assume we wanted to call the branch `mybranch1` on the remote. We need to tell that to git explicitly, as it tells us to.

In [29]:
%%bash
cd /tmp/github_tutorial
git push --set-upstream origin mybranch1

Branch mybranch1 set up to track remote branch mybranch1 from origin.


To git@github.com:rahuldave/Testing.git
 * [new branch]      mybranch1 -> mybranch1


Aha, now we are set up with both a remote and a local for `mybranch1`.

In [30]:
%%bash
cd /tmp/github_tutorial
git branch -avv

  master                   15471cd [origin/master] Said hello to myself
* mybranch1                b673a9d [origin/mybranch1] Added another test file to demonstrate git features
  remotes/course/master    7ba94f7 added a line in readme
  remotes/origin/HEAD      -> origin/master
  remotes/origin/master    15471cd Said hello to myself
  remotes/origin/mybranch1 b673a9d Added another test file to demonstrate git features
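
The upstream bookkeeping from the last few cells can also be reproduced offline. This sketch assumes a local bare repository in place of github and a placeholder identity. It shows where `--set-upstream` records the link between the local branch and its remote counterpart.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q --bare "$work/upstream.git"          # stand-in for origin
git -c init.defaultBranch=master init -q "$work/project"
cd "$work/project"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"
git remote add origin "$work/upstream.git"
echo "# Hello" > hello.md
git add hello.md
git commit -q -m "Add hello"
git push -q --set-upstream origin master

git checkout -q -b mybranch1
echo "# Hello world" > world.md
git add world.md
git commit -q -m "Add world"

git push -q --set-upstream origin mybranch1          # create the remote branch and remember it
git rev-parse --abbrev-ref 'mybranch1@{upstream}'    # prints the upstream of mybranch1
git config branch.mybranch1.merge                    # the same link, as stored in .git/config
```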


In [31]:
%%bash
cd /tmp/github_tutorial
git checkout master

Your branch is up-to-date with 'origin/master'.


Switched to branch 'master'


### Recovering from a mistake

Now suppose for a second that this `mybranch1` was created by someone else. We wanted to get it down using `fetch` and play. But we called `pull`, which did an automatic merge for us.

In [32]:
%%bash
cd /tmp/github_tutorial
git pull origin mybranch1

Updating 15471cd..b673a9d
Fast-forward
 world.md | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 world.md


From github.com:rahuldave/Testing
 * branch            mybranch1  -> FETCH_HEAD


In [33]:
%%bash
cd /tmp/github_tutorial
git status

On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
nothing to commit, working directory clean


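Oops, that was not what we wanted. We undo it using `git reset --hard origin/master`, which moves `master` back to the commit that `origin/master` points to. Be careful: `--hard` also throws away any uncommitted changes in the working directory and the index.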

In [34]:
%%bash
cd /tmp/github_tutorial
git reset --hard origin/master

HEAD is now at 15471cd Said hello to myself


In [35]:
%%bash
cd /tmp/github_tutorial
git status

On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean
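
The same recovery can be rehearsed safely in a scratch repository. In this sketch a local bare repository plays the part of `origin`, and the unwanted commit is simply made by hand. The file name and identity are placeholders.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q --bare "$work/upstream.git"          # stand-in for origin
git -c init.defaultBranch=master init -q "$work/project"
cd "$work/project"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"
git remote add origin "$work/upstream.git"
echo "# Hello" > hello.md
git add hello.md
git commit -q -m "Add hello"
git push -q --set-upstream origin master

echo "oops" > unwanted.md                            # a commit we will regret
git add unwanted.md
git commit -q -m "Unwanted commit"
git status -sb                                       # master is ahead of origin/master by 1

git reset -q --hard origin/master                    # move master back to where origin/master is
git status -sb                                       # in sync again, unwanted.md is gone
```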


### Changing only one file

The last git command I want to talk about, since it might come in handy in the course, covers the situation in which you don't want to merge an entire branch from the upstream, but just one file from it. There is a direct use case for it. Suppose I've made an error in this lab and want to correct it, so I fix it in the upstream. In the meanwhile you have edited some other files. You don't care to deal manually with my older copies of those files. So you want to take just the one fixed file from the upstream branch. This is how you do it.

First you fetch from the remote (I ought to be doing this with the `course` remote but just want to illustrate it here).

In [36]:
%%bash
cd /tmp/github_tutorial
git fetch origin

In [37]:
%%bash
cd /tmp/github_tutorial
git branch -avv

* master                   15471cd [origin/master] Said hello to myself
  mybranch1                b673a9d [origin/mybranch1] Added another test file to demonstrate git features
  remotes/course/master    7ba94f7 added a line in readme
  remotes/origin/HEAD      -> origin/master
  remotes/origin/master    15471cd Said hello to myself
  remotes/origin/mybranch1 b673a9d Added another test file to demonstrate git features


In [38]:
%%bash
cd /tmp/github_tutorial
git checkout origin/mybranch1 -- world.md

Why does the syntax have `origin/mybranch1`? Remember that multiple remotes may have the same branch...so you want to be specific. So we check out `origin`'s `mybranch1`, and only want `world.md`.
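
The `<branch> -- <file>` form works the same way with a purely local branch, which makes it easy to try out. In this sketch `other.md` is a made-up second file that we deliberately do not take along. The identity is a placeholder, and a local `mybranch1` stands in for `origin/mybranch1`.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Add README"

git checkout -q -b mybranch1
echo "# Hello world" > world.md
echo "something else" > other.md
git add world.md other.md
git commit -q -m "Add two files on mybranch1"

git checkout -q master
git checkout mybranch1 -- world.md                   # take just this one file from the branch
git status --short                                   # "A  world.md": already in the index
```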

In [39]:
%%bash
cd /tmp/github_tutorial
git status

On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   world.md



Note that the file was automatically added to the index. We'll commit and push.

In [40]:
%%bash
cd /tmp/github_tutorial
git commit -m "want world in master" 

[master 877fc2d] want world in master
 1 file changed, 1 insertion(+)
 create mode 100644 world.md


In [41]:
%%bash
cd /tmp/github_tutorial
git status

On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
nothing to commit, working directory clean


In [42]:
%%bash
cd /tmp/github_tutorial
git push

To git@github.com:rahuldave/Testing.git
   15471cd..877fc2d  master -> master


Note that in git versions 2.0 and later, which you must use, `git push` will push the current branch to its upstream branch on the appropriate remote. These pairings can be seen through `git branch -avv` or by looking at the `config` file.

In [43]:
%%bash
cd /tmp/github_tutorial
cat .git/config

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = git@github.com:rahuldave/Testing.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
[remote "course"]
	url = git@github.com:cs109/Testing.git
	fetch = +refs/heads/*:refs/remotes/course/*
[branch "mybranch1"]
	remote = origin
	merge = refs/heads/mybranch1


----

## Git habits

***Commit early, commit often.***

***Commit unrelated changes separately.***

***Do not commit binaries and other temporary files.***

***Ignore files which should not be committed.***

***Always make a branch for new changes.***

***Write good commit messages.***
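
For the habit of ignoring files, a `.gitignore` file at the top of the repository is the usual tool. The sketch below is only an illustration: the patterns (`*.pyc`, `build/`), the file names and the identity are assumptions, not part of this lab's repository.

```shell
set -eu
work=$(mktemp -d)                                    # scratch area for this sketch
git -c init.defaultBranch=master init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"                     # placeholder identity
git config user.email "demo@example.com"

printf '%s\n' '*.pyc' 'build/' > .gitignore          # hypothetical patterns for generated files
mkdir build
touch build/out.bin module.pyc                       # things we never want to commit
echo 'print("hi")' > module.py                       # a real source file

git status --short                                   # only .gitignore and module.py are listed
git add .                                            # "add everything" skips the ignored files
git commit -q -m "Add module and ignore rules"
git ls-files                                         # what actually got committed
```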
