# Version control with Git <a class="tocSkip", name="chap:git"></a>

<h1>Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#What-is-Version-Control?" data-toc-modified-id="What-is-Version-Control?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>What is Version Control?</a></span></li><li><span><a href="#Why-Version-Control?" data-toc-modified-id="Why-Version-Control?-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Why Version Control?</a></span></li><li><span><a href="#git" data-toc-modified-id="git-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>git</a></span></li><li><span><a href="#Your-first-repository" data-toc-modified-id="Your-first-repository-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Your first repository</a></span></li><li><span><a href="#git-commands" data-toc-modified-id="git-commands-5"><span class="toc-item-num">5&nbsp;&nbsp;</span><span>git</span> commands</a></span><ul class="toc-item"><li><span><a href="#git-command-structure-{#ssec:git_comds}" data-toc-modified-id="git-command-structure-{#ssec:git_comds}-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span><span>git</span> command structure {#ssec:git_comds}</a></span></li></ul></li><li><span><a href="#Ignoring-Files" data-toc-modified-id="Ignoring-Files-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Ignoring Files</a></span><ul class="toc-item"><li><span><a href="#Dealing-with-binary-files" data-toc-modified-id="Dealing-with-binary-files-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Dealing with binary files</a></span></li><li><span><a href="#Dealing-with-large-files" data-toc-modified-id="Dealing-with-large-files-6.2"><span class="toc-item-num">6.2&nbsp;&nbsp;</span>Dealing with large files</a></span></li></ul></li><li><span><a href="#Removing-files" data-toc-modified-id="Removing-files-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Removing files</a></span></li><li><span><a href="#Accessing-history-of-the-repository" data-toc-modified-id="Accessing-history-of-the-repository-8"><span 
class="toc-item-num">8&nbsp;&nbsp;</span>Accessing history of the repository</a></span></li><li><span><a href="#Reverting-to-a-previous-version" data-toc-modified-id="Reverting-to-a-previous-version-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Reverting to a previous version</a></span></li><li><span><a href="#Branching" data-toc-modified-id="Branching-10"><span class="toc-item-num">10&nbsp;&nbsp;</span>Branching</a></span></li><li><span><a href="#Running-git-commands-on-a-different-directory" data-toc-modified-id="Running-git-commands-on-a-different-directory-11"><span class="toc-item-num">11&nbsp;&nbsp;</span>Running git commands on a different directory</a></span></li><li><span><a href="#Running-git-commands-on-multiple-repositories-at-once" data-toc-modified-id="Running-git-commands-on-multiple-repositories-at-once-12"><span class="toc-item-num">12&nbsp;&nbsp;</span>Running git commands on multiple repositories at once</a></span><ul class="toc-item"><li><span><a href="#Practicals" data-toc-modified-id="Practicals-12.1"><span class="toc-item-num">12.1&nbsp;&nbsp;</span>Practicals</a></span></li></ul></li><li><span><a href="#Practical-wrap-up" data-toc-modified-id="Practical-wrap-up-13"><span class="toc-item-num">13&nbsp;&nbsp;</span>Practical wrap-up</a></span></li><li><span><a href="#Readings-&amp;-Resources" data-toc-modified-id="Readings-&amp;-Resources-14"><span class="toc-item-num">14&nbsp;&nbsp;</span>Readings &amp; Resources</a></span></li></ul></div>

## What is Version Control?

Version control, also known as revision control or source control, is the management and tracking of changes to documents, computer programs, large web sites, and other collections of information in an automated way.

Any project (collections of files in directories) under version control has changes and additions/deletions to its files and directories recorded and archived over time so that you can recall specific versions later. With version control of biological computing projects, you can:

* record all changes made to a set of files and directories, including text (usually ASCII) data files, so that you can access any previous version of the files

* branch (and merge) new projects

* "roll back" data, code, and documents that are in plain text format (other file formats can also be versioned; see the section on binary files below).

Version control is in fact the technology behind the version history of various word processor and spreadsheet applications (e.g., Google Docs, Sheets, Overleaf).

![A general idea of how version control works.](VC.png){width=".5\textwidth"}


## Why Version Control?

<span>![image](cvs.png){width=".5\textwidth"}\
maktoons.blogspot.com/2009/06/if-dont-use-version-control-system.html</span>

Or here’s another one:
<http://www.phdcomics.com/comics/archive/phd101212s.gif>

## git

We will use git, developed by Linus Torvalds, the “Linu” in Linux. In git, each user stores a complete local copy of the project, including its history and all versions, so you rely much less on a centralized (remote) server. We will use bitbucket.org, which gives you unlimited free private repositories if you register with an academic email! First, install and configure `git`:
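
A minimal sketch of that set-up. The install step assumes a Debian or Ubuntu system and is shown only as a comment because it needs administrator rights; the name and email address are placeholders, so substitute your own:

```shell
# Install (Debian/Ubuntu assumed): sudo apt install git
git --version

# Identify yourself; git records these details in every commit.
# "Your Name" and the email address below are placeholders.
git config --global user.name "Your Name"
git config --global user.email "your.login@imperial.ac.uk"

# List the global settings to check them
git config --global --list
```

The `--global` flag stores the settings once for your user account, so every repository you create later picks them up.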

## Your first repository

Time to bring your `CMEECourseWork` under version control:
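
One way this could look, as a sketch rather than the exact steps used in class. It assumes that `CMEECourseWork` sits in (or is created in) the current directory and that a `README.txt` is the first file you track:

```shell
# Assumption: CMEECourseWork lives in the current directory.
mkdir -p CMEECourseWork
cd CMEECourseWork

git init                                   # creates the hidden .git directory

# Identity for this repository only, in case no global one is configured
git config user.name "Your Name"
git config user.email "your.login@imperial.ac.uk"

echo "My CMEE Coursework Repository" > README.txt
git add README.txt                         # stage the file
git commit -m "Added README file"          # record it in the local repository

git status                                 # should report nothing left to commit
```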

Nothing has been sent to a remote server yet (see [section below](#ssec:git_comds))! So let's go to your git service (bitbucket or github) and set things up:

$\star$ Log in to your bitbucket or github account

Set up your `ssh`-based access

bitbucket:
<https://confluence.atlassian.com/bitbucket/set-up-ssh-for-git-728138079.html>

github: <https://help.github.com/articles/connecting-to-github-with-ssh>

Then create a repository there with the name <span>CMEECourseWork</span>

Then grab the repository URL and use <span>git remote add origin
https...</span>

bitbucket:
<https://confluence.atlassian.com/bitbucket/set-up-a-repository-877174034.html>

github:
<https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/>
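
The remote set-up can be rehearsed offline before you try it on the real service. In the sketch below a local "bare" repository stands in for bitbucket or github (an assumption made purely so that the example runs without a network connection), and all paths are hypothetical:

```shell
# Assumption: a local bare repository plays the role of the remote server.
workdir="$(mktemp -d)"
git init -q --bare "$workdir/remote.git"       # the pretend server

git init -q "$workdir/CMEECourseWork"
cd "$workdir/CMEECourseWork"
git config user.name "Your Name"
git config user.email "your.login@imperial.ac.uk"
echo "My CMEE Coursework Repository" > README.txt
git add README.txt
git commit -q -m "Added README file"

# With a real service, the URL copied from its web page replaces this path
git remote add origin "$workdir/remote.git"
git remote -v                                  # list the configured remotes

# Publish the current branch and remember origin as its upstream
git push -q -u origin HEAD
```

After the first `git push -u`, a plain `git push` or `git pull` in this repository knows which remote branch to talk to.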

You are done. Now let’s learn to use git!

<span>git</span> commands
-------------------------

Here are some basic git commands:

  -------------------------------- --------------------------------------------
  <span>git init</span>            Initialize a new repository
  <span>git clone</span>           Download a repository from a remote server
  <span>git status</span>          Show the current status
  <span>git diff</span>            Show differences between commits
  <span>git blame</span>           Blame somebody for the changes!
  <span>git log</span>             Show commit history
  <span>git commit</span>          Commit changes to current branch
  <span>git branch</span>          Show branches
  <span>git branch name</span>     Create new branch
  <span>git checkout name</span>   Switch to a different commit/branch
  <span>git pull</span>            Download changes from remote repository
  <span>git push</span>            Send changes to remote repository
  -------------------------------- --------------------------------------------
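
To see several of these commands working together, here is a small sketch of one edit, stage and commit cycle. It runs in a throwaway directory so that nothing in your coursework is touched; the file name and its contents are made up for illustration:

```shell
# Scratch repository in a temporary directory
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "your.login@imperial.ac.uk"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Added notes file"

echo "second line" >> notes.txt
git status --short        # " M notes.txt": modified, not yet staged
git diff                  # the unstaged change, line by line

git commit -q -a -m "Added a second line to notes"
git log --oneline         # one line per commit, newest first
git blame notes.txt       # which commit last touched each line
```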

### <span>git</span> command structure {#ssec:git_comds}

Here is a graphical outline of the git command structure. Note that you
need an internet connection only when you <span>push</span>,
<span>fetch</span>, <span>pull</span> or <span>clone</span>; until then
you are only archiving in a local (hidden) repository.

![image](git.png){width=".6\textwidth"}

Keep in mind, the main mantra is, “commit often, comment always”!

![image](git_xkcd.png){width=".6\textwidth"}

Ignoring Files
--------------

You will have some files you don’t want to track (log files, temporary
files, executables, etc). You can ignore entire classes of files with
<span>.gitignore</span> (make sure you are in your
<span>CMEECourseWork</span> directory!):

In [None]:
$ echo -e "*~\n*.tmp" > .gitignore

$ cat .gitignore
*~
*.tmp

$ git add .gitignore

$ touch temporary.tmp

$ git add *
The following paths are ignored by one of your .gitignore 
files:
temporary.tmp
Use -f if you really want to add them.
fatal: no files added

You can also create a global gitignore file that lists rules for files
to be ignored in every Git repository on your computer:
<https://help.github.com/articles/ignoring-files/>
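
A sketch of how a global ignore file might be set up. The file name `~/.gitignore_global` is an arbitrary choice made for this example, and the throwaway repository exists only to check that the rule is picked up:

```shell
# Assumption: ~/.gitignore_global is a file name chosen for this sketch.
printf '*~\n*.tmp\n' >> ~/.gitignore_global
git config --global core.excludesFile ~/.gitignore_global

# Check the rule in a throwaway repository that has no .gitignore of its own
scratch="$(mktemp -d)"
git init -q "$scratch"
touch "$scratch/temporary.tmp"

# check-ignore reports which ignore file and pattern match the given path
git -C "$scratch" check-ignore -v temporary.tmp
```

`git check-ignore -v` is also handy inside your own repository whenever you are unsure why a file is (or is not) being ignored.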

### Dealing with binary files

A binary file is computer-readable but not human-readable; that is, it
cannot be read by opening it in a text viewer. Examples of binary
files include compiled executables, zip files, images, Word documents
and videos. In contrast, text files are stored in a form (usually ASCII)
that is human-readable when opened in a suitable text editor (e.g.,
geany, gedit). Without some git extensions and configurations (coming up
next), binary files cannot be properly version-controlled: git cannot
show meaningful line-by-line differences between versions, and each
version of the entire file is saved <span>*as is*</span> in a hidden
directory in the repository (<span>.git</span>).

However, with some more effort, git can be made to work for binary
formats like \*.docx or image formats such as \*.jpeg, but it is harder
to compare versions; have a look at
<https://git-scm.com/docs/gitattributes> and
<https://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes>[^1]

Also see:
<https://opensource.com/life/16/8/how-manage-binary-blobs-git-part-7>
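
As a rough sketch of what such a configuration can look like, following the approach described in the Git Attributes chapter linked above: `docx2txt` is an external converter that you would need to install separately, and the file names are hypothetical.

```shell
# Scratch repository; report.docx and figure.jpeg are hypothetical names.
cd "$(mktemp -d)"
git init -q

# Route .docx files through a "word" diff driver, and mark .jpeg files
# with the built-in "binary" macro (no text diff or merge is attempted).
printf '*.docx diff=word\n*.jpeg binary\n' > .gitattributes

# Tell git that the "word" driver converts files to text with docx2txt.
# Assumption: docx2txt is installed; it only runs when you ask for a diff.
git config diff.word.textconv docx2txt

# Ask git which diff setting applies to each path
git check-attr diff -- report.docx figure.jpeg
```

With this in place, `git diff` on a Word document compares the extracted text rather than reporting only that the binary files differ.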

### Dealing with large files

Git was designed for version control of workflows and software
projects, <span>*not*</span> large files (say, &gt;100 MB), whether
plain-text or binary. Binary files are particularly problematic because
each version of the file is saved <span>*as is*</span> in `.git`: if
you have a large number of versions, the hidden directory holds the same
number of copies of the file (for example 100 $\times$
&gt;100 MB files!).

In this course at least, you should not try to keep large files
(especially binary files) under version control. You will run into this
problem in the GIS week in particular (where you will have to handle and
store large raster image files)[^2]. We suggest that you include
files larger than some size in your <span>.gitignore</span>. For
example, you can use the following bash command:

In [None]:
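find . -type f -size +100M -not -path "./.git/*" | sed 's|^\.||' >> .gitignore

The `100M` means 100 MiB (roughly 100 MB); you can reset it to whatever you
want. The `sed` step strips the leading `.` from each path, because git
does not match ignore patterns that begin with `./`, and the `-not -path`
test skips git's own hidden directory.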

You may also explore alternatives such as <span>git-annex</span> (e.g.,
see <https://git-annex.branchable.com/>), and <span>git-lfs</span>
(e.g., see <https://www.atlassian.com/git/tutorials/git-lfs>).

Removing files
--------------

To remove a file (i.e. stop version controlling it and delete it from
your working directory) use <span>git rm</span>:

In [None]:
$ echo "Text in a file to remove" > FileToRem.txt

$ git add FileToRem.txt

$ git commit -am "added a new file that we'll remove later"
[master 5df9e96] added a new file that we'll remove later
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 FileToRem.txt

$ git rm FileToRem.txt
rm 'FileToRem.txt'

$ git commit -am "removed the file"
[master b9f0b1a] removed the file
 1 files changed, 0 insertions(+), 1 deletions(-)
 delete mode 100644 FileToRem.txt

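I typically just make all my changes (including deleting files by hand)
and then use <span>git add -A</span>, which stages all additions,
modifications and deletions in one go.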
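
If you want git to stop tracking a file but keep it on disk, `git rm --cached` does that. The sketch below combines it with `git add -A` in a throwaway repository; the file names are hypothetical:

```shell
# Scratch repository with two tracked files
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "your.login@imperial.ac.uk"
echo "keep me on disk" > Results.txt
echo "scratch" > Old.txt
git add Results.txt Old.txt
git commit -q -m "Added two files"

# Stop tracking Results.txt but leave the file in the working directory,
# and ignore it so that "git add -A" does not pick it up again
git rm -q --cached Results.txt
echo "Results.txt" >> .gitignore

# Delete Old.txt by hand, then let "git add -A" stage the deletion
rm Old.txt
git add -A
git status --short
git commit -q -m "Stopped tracking Results.txt, removed Old.txt"

git ls-files              # lists the files git still tracks
```

Without the `.gitignore` entry, `git add -A` would simply stage `Results.txt` again, because it adds untracked files as well.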

Accessing history of the repository
-----------------------------------

To see particular changes introduced, read the repo’s log:

In [None]:
$ git log
commit 08b5c1c78c8181d4606d37594681fdcfca3149ec
Author: Your Name <your.login@imperial.ac.uk>
Date:   Wed Oct 8 16:41:51 2014 -0500

    removed the file

commit 13f701775bce71998abe4dd1c48a4df8ed76c08b
Author: Your Name <your.login@imperial.ac.uk>
Date:   Wed Oct 5 16:41:16 2015 -0500

    added a new file that we'll remove later

commit a228dd3d5b1921ef18c5efd926ef11ca47306ed5
Author: Your Name <your.login@imperial.ac.uk>
Date:   Wed Oct 5 10:03:40 2015 -0500

    Added README file

For a more detailed version that also shows the changes introduced by each commit, add <span>-p</span> at the end (<span>git log -p</span>).
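
A few variations on `git log` that you may find useful, sketched in a throwaway repository with made-up commits:

```shell
# Scratch repository with two commits
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "your.login@imperial.ac.uk"
echo "My CMEE Coursework Repository" > README.txt
git add README.txt
git commit -q -m "Added README file"
echo "Do I like this better?" >> README.txt
git commit -q -a -m "Extended README"

git --no-pager log --oneline               # compact: short hash and message
git --no-pager log -p -1                   # newest commit only, with its diff
git --no-pager log --stat -- README.txt    # commits touching one file, with change counts
```

`--no-pager` only stops git from opening its scrolling viewer, which keeps the output in the terminal for short histories like this one.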

Reverting to a previous version
-------------------------------

If things go horribly wrong with new, uncommitted changes, you can
revert to the previous, “pristine” state (that of the last commit):

In [None]:
$ git reset --hard #Discards all uncommitted changes to tracked files
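
To convince yourself of what `git reset --hard` does before using it on real work, you could try it in a throwaway repository, as in this sketch:

```shell
# Scratch repository with one commit
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "your.login@imperial.ac.uk"
echo "My CMEE Coursework Repository" > README.txt
git add README.txt
git commit -q -m "Added README file"

echo "a change I regret" >> README.txt
git status --short        # " M README.txt": an uncommitted modification

git reset -q --hard       # back to the last commit; the edit is gone for good
git status --short        # prints nothing: the working directory is clean
cat README.txt
```

Note that the discarded edit cannot be recovered, because it was never committed.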

If instead you want to move back in time (temporarily), first find the
“hash” for the commit you want to go back to, and then check it out:

In [None]:
$ git status
# On branch master
nothing to commit (working directory clean)

$ git log
commit c797824c9acbc59767a3931473aa3c53b6834aae
Author: Your Name <your.login@imperial.ac.uk>
Date:   Wed Aug 22 16:59:02 2014 -0500
.
.
.

$ git checkout c79782

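Now you can play around. However, you are in what git calls a “detached
HEAD” state: any commits you make here do not belong to any branch, so
create a new branch if you want to keep them (git plays safe!). To go
back to the future, type <span>git checkout master</span>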
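
The round trip into the past and back can be sketched as follows. The branch name `fromthepast` is hypothetical, and the main branch name is looked up rather than assumed, since newer git installations may call it something other than `master`:

```shell
# Scratch repository with two commits
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "your.login@imperial.ac.uk"
echo "My CMEE Coursework Repository" > README.txt
git add README.txt
git commit -q -m "Added README file"
echo "Do I like this better?" >> README.txt
git commit -q -a -m "Extended README"

main_branch="$(git symbolic-ref --short HEAD)"   # e.g. master
old="$(git rev-parse --short HEAD~1)"            # hash of the previous commit

git checkout -q "$old"                           # move back in time
cat README.txt                                   # the file as it was then

# To keep work started from here, give it a branch name first
git checkout -q -b fromthepast
echo "An idea tried from the old version" >> README.txt
git commit -q -a -m "Experiment starting from an older commit"

git checkout -q "$main_branch"                   # back to the future
git branch
```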

Branching
---------

Imagine you want to try something out, but you’re not sure it will work
well. For example, say you want to rewrite the Introduction of your
paper, using a different angle, or you want to see whether switching to
a library for a piece of code improves speed. What you then need is
branching, which creates a project copy in which you can experiment:

In [None]:
$ git branch anexperiment

$ git branch
  anexperiment
* master

$ git checkout anexperiment 
Switched to branch 'anexperiment'

$ git branch 
* anexperiment
  master

$ echo "Do I like this better?" >> README.txt 

$ git commit -am "Testing experimental branch"
[anexperiment 9f17dc1] Testing experimental branch
 1 files changed, 2 insertions(+), 0 deletions(-)

If you decide to merge the new branch after modifying it:

In [None]:
$ git checkout master

$ git merge anexperiment
Updating 08b5c1c..9f17dc1
Fast-forward
 README.txt |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

$ cat README.txt 
My CMEE 2015-16 Coursework Repository
Do I like this better?

If there are no conflicts (a conflict arises when the same part of a
file was changed both in your branch and in the master in the meantime),
you are done, and you can delete the branch:

In [None]:
$ git branch -d anexperiment
Deleted branch anexperiment (was 9f17dc1).

If instead you are not satisfied with the result, and you want to
abandon the branch:

In [None]:
$ git branch -D anexperiment

When you want to test something out, always branch! Reverting changes,
especially in code, is typically painful. Merging can be tricky,
especially if multiple people have simultaneously worked on a particular
document. In the worst-case scenario, you may want to delete the local
copy and re-clone the remote repository.
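
What a merge conflict looks like, and how to back out of one, can be rehearsed safely in a throwaway repository. This is only a sketch with made-up file contents; the main branch name is looked up rather than assumed:

```shell
# Scratch repository with one commit
cd "$(mktemp -d)"
git init -q
git config user.name "Your Name"
git config user.email "your.login@imperial.ac.uk"
echo "original wording" > README.txt
git add README.txt
git commit -q -m "Added README file"
main_branch="$(git symbolic-ref --short HEAD)"

# Change the same line on two branches
git checkout -q -b anexperiment
echo "experimental wording" > README.txt
git commit -q -a -m "Testing experimental branch"

git checkout -q "$main_branch"
echo "main wording" > README.txt
git commit -q -a -m "Changed wording on the main branch"

# The merge stops because both branches changed the same line
git merge anexperiment || echo "Merge stopped: conflict in README.txt"
git status --short        # "UU README.txt" marks the conflicted file

# Either edit README.txt to resolve the conflict and commit,
# or give up on the merge and return to the state before it:
git merge --abort
```

`git merge --abort` is usually a gentler escape route than deleting the local copy and re-cloning.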

![image](git_xkcd_1.png){width=".4\textwidth"}

Running git commands on a different directory
---------------------------------------------

Since <span>git</span> version 1.8.5, you can run git directly on a
different directory than the current one using absolute or relative
paths. For example, using a relative path, you can do:

In [None]:
git -C ../SomeDir/ status

Running git commands on multiple repositories at once
-----------------------------------------------------

To run <span>git pull</span> in multiple subdirectories (each a separate repository) at once:

In [None]:
$ find . -mindepth 1 -maxdepth 1 -type d -print -exec git -C {} pull \;

Breaking down this command piece by piece,

<span>find .</span> searches the current directory\
<span>-mindepth 1</span> sets the minimum search depth to one
sub-directory (so the current directory itself is skipped)\
<span>-maxdepth 1</span> sets the maximum search depth to one
sub-directory\
<span>-type d</span> finds directories, not files\
<span>-print</span> prints the name of each directory found\
`-exec git -C {} pull \;` runs <span>git pull</span> inside every
directory found (`{}` is replaced by the directory name, and `\;` marks
the end of the command)\
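
The `find` command above runs git in every sub-directory, whether or not it is a repository. An alternative sketch using a loop that skips directories without a `.git` folder is shown below. The workspace layout is hypothetical, and `status --short` is used in place of `pull` so that the example runs offline:

```shell
# Hypothetical layout: two repositories and one plain directory
workspace="$(mktemp -d)"
git init -q "$workspace/RepoA"
git init -q "$workspace/RepoB"
mkdir "$workspace/NotARepo"
cd "$workspace"

checked=0
for dir in */; do
    # Only act on sub-directories that are git repositories
    if [ -d "$dir/.git" ]; then
        echo "== $dir"
        git -C "$dir" status --short     # swap in "pull" for real use
        checked=$((checked + 1))
    fi
done
echo "$checked repositories checked"
```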

### Practicals

1.  The only practical submission for <span>git</span> is the <span>
    .gitignore</span> and overall git repository <span>readme</span> file;
    make sure these are in your coursework repository.

And of course, if you haven’t gotten git with bitbucket going, you
won’t be able to submit any of your practicals anyway!

Practical wrap-up
-----------------

Invite me (s.pawar@imperial.ac.uk) to your <span>CMEECourseWork</span>
repository

The <span>CMEEMasteRepo</span> will contain data and code files for
upcoming practicals

You will clone <span>CMEEMasteRepo</span> using <span>git clone
git@bitbucket.org:mhasoba/cmee2015masterepo.git</span>

You will thereafter <span>git pull</span> inside
<span>CMEEMasteRepo</span> (always use <span>git status</span> first)

<span>cp</span> files from <span>CMEEMasteRepo</span> to your <span>
CMEECourseWork</span> as and when needed; don’t work in the master
repo, as you will lose your work when I next update it!

Readings & Resources
--------------------

There is a wealth of information on <span>git</span> out there - just
google it!

Excellent book on Git: <http://git-scm.com/book>

Also, <https://www.atlassian.com/git/>

A git tutorial: <https://try.github.io>

[^1]: There you will find the following phrase: “...one of the most
    annoying problems known to humanity: version-controlling Microsoft
    Word documents.” LOL!

[^2]: None of the computing weeks' assessments will require you to use
    such large files anyway.