# Introduction to Version Control and Git

**The following content is adapted (and greatly condensed) from the Git course maintained by** [Software Carpentry](http://software-carpentry.org)

Version control systems start with a base version of the document and
then save just the changes you made at each step of the way. You can
think of it as a tape: if you rewind the tape and start at the base
document, then you can play back each change and end up with your
latest version.

![Changes Are Saved Sequentially](figures/play_changes.png)

Once you think of changes as separate from the document itself, you
can then think about "playing back" different sets of changes onto the
base document and getting different versions of the document. For
example, two users can make independent sets of changes based on the
same document.

![Different Versions Can be Saved](figures/versions.png)

If there aren't conflicts, you can even play two sets of changes onto the same base document.

![Multiple Versions Can be Merged](figures/merge.png)
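As a minimal sketch of playing two independent sets of changes onto one base document, the example uses `git merge-file`, which performs a three-way merge on plain files. The file names and contents are hypothetical, chosen only for illustration; the two edits touch different lines, so there is no conflict.

```shell
set -e
# Hypothetical scratch directory and file contents, for illustration only.
work="$(mktemp -d)"
cd "$work"

# The base document.
printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > base.txt

# Two users make independent changes to copies of the same base.
printf 'ALPHA\nbeta\ngamma\ndelta\nepsilon\n' > user1.txt   # edits the first line
printf 'alpha\nbeta\ngamma\ndelta\nEPSILON\n' > user2.txt   # edits the last line

# Three-way merge: -p writes the result to stdout instead of overwriting user1.txt.
git merge-file -p user1.txt base.txt user2.txt > merged.txt
cat merged.txt
```

The merged file contains both edits. Had both users changed the same line, `git merge-file` would have reported a conflict instead.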

A version control system is a tool that keeps track of these changes for us and
helps us version and merge our files. It allows you to
decide which changes make up the next version, called a
[commit]({{ page.root }}/reference/#commit), and keeps useful metadata about them. The
complete history of commits for a particular project and their metadata make up
a [repository]({{ page.root }}/reference/#repository). Repositories can be kept in sync
across different computers, facilitating collaboration among different people.

# Versioning edits with Git

##### Questions:
- "How do I record changes in Git?"
- "How do I record notes about what changes I made and why?"

##### Objectives:
- "Go through the modify-add-commit cycle for one or more files."
- "Explain where information is stored at each stage of the Git commit workflow."

##### Keypoints:
- "Files can be stored in a project's working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded)."
- "`git add` puts files in the staging area."
- "`git commit` saves the staged content as a new commit in the local repository."
- "Always write a log message when committing changes."
- "View previous commits using the `git log` command."

##  Repository setup

For a quick refresher on Git, we will use a repository that we have all already contributed to:

In [2]:
git clone https://github.com/leej3/nimh_repro_wrkshpAug2017.git
cd nimh_repro_wrkshpAug2017
git status

Cloning into 'nimh_repro_wrkshpAug2017'...
remote: Counting objects: 123, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 123 (delta 4), reused 10 (delta 1), pack-reused 109
Receiving objects: ...

## The Git life-cycle

> ##### The staging area helps to keep track of different changes
> 
> If you think of Git as taking snapshots of changes over the life of a
> project, "git add" specifies *what* will go in a snapshot (putting things in
> the staging area), and "git commit" then *actually takes* the snapshot, and
> makes a permanent record of it (as a commit). If you don't have anything
> staged when you type "git commit", Git will prompt you to use "git commit -a"
> or "git commit --all", which is kind of like gathering *everyone* for the
> picture! However, it's almost always better to explicitly add things to the
> staging area, because `-a` can sweep in changes you forgot you made. Try to
> stage things manually, or you might find yourself searching for "git undo
> commit" more than you would like!

> ![](figures/git_local_overview.png)



We will go through the typical Git life cycle:
+ making a change
+ adding our change to the staging area
+ committing the staged changes

To start, let us all edit `participant.txt` and add our favourite animal noise.

Once we've done that, we stage our change:
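A minimal sketch of the edit-then-stage step, assuming a throwaway repository in a temporary directory rather than the workshop repository. The identity, file contents and commit message are placeholders.

```shell
set -e
# Hypothetical sandbox repository, not the workshop repository.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop Participant"      # placeholder identity
git config user.email "participant@example.com"  # placeholder identity

# Start from a committed file so that the edit shows up as a modification.
echo "cat: meow" > participant.txt
git add participant.txt
git commit -q -m "Add participant list"

# Make a change in the working directory, then stage it.
echo "dog: woof" >> participant.txt
git add participant.txt

# The change is now in the staging area, waiting for the next commit.
git status --short
staged="$(git diff --staged --name-only)"
echo "staged: $staged"
```

At this point nothing new has been recorded in the repository's history; the change is only queued for the next commit.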

Having staged the file, we can now commit our change, providing a useful commit message:
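A minimal sketch of the commit step, again assuming a throwaway repository. The identity, file contents and commit message are illustrative, not taken from the workshop.

```shell
set -e
# Hypothetical sandbox repository, not the workshop repository.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop Participant"      # placeholder identity
git config user.email "participant@example.com"  # placeholder identity

echo "cow: moo" > participant.txt
git add participant.txt

# Commit the staged change. Git prints a summary whose first line shows
# the branch name and the commit's short identifier in square brackets.
git commit -m "Add cow noise to participant list"

# The same short identifier can be looked up afterwards.
short_id="$(git rev-parse --short HEAD)"
echo "short identifier: $short_id"
```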

When we run "git commit", Git takes everything we have told it to save by using
"git add" and stores a copy permanently inside the special `.git` directory.
This permanent copy is called a commit (or
revision), and its short identifier is
the alphanumeric string within the square brackets on the first line of the output that "git commit" prints.

We use the `-m` flag (for "message") to record a short, descriptive, and
specific comment that will help us remember later on what we did and why. If we
just run "git commit" without the `-m` option, Git will launch
whatever editor we have configured as `core.editor` so that we can write a
longer message.
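A minimal sketch of setting and checking `core.editor`, assuming a throwaway repository. The editor `nano -w` is only an illustrative choice.

```shell
set -e
# Hypothetical sandbox repository.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Set the editor for this repository only; "nano -w" is an example value.
git config core.editor "nano -w"

# Read the setting back.
editor="$(git config --get core.editor)"
echo "core.editor is: $editor"
```

Adding `--global` to the `git config` command would apply the setting to every repository for the current user instead of just this one.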

[Good commit messages](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html) start with a brief (<50 characters) summary of
changes made in the commit.  If you want to go into more detail, add
a blank line between the summary line and your additional notes.
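A minimal sketch of a message with a summary line and additional notes, written without opening an editor. When `-m` is given more than once, Git joins the values as separate paragraphs, which produces the blank line after the summary. The repository, identity and message text are placeholders.

```shell
set -e
# Hypothetical sandbox repository.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop Participant"      # placeholder identity
git config user.email "participant@example.com"  # placeholder identity

echo "sheep: baa" > participant.txt
git add participant.txt

# First -m: the brief summary. Second -m: the additional notes.
git commit -q \
  -m "Add sheep noise to participant list" \
  -m "Each participant adds one line to practise the modify-add-commit cycle."

# %s is the summary line, %b is the body that follows the blank line.
subject="$(git log -1 --format=%s)"
body="$(git log -1 --format=%b)"
echo "summary (${#subject} characters): $subject"
echo "body: $body"
```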

Now when we run "git status" we see:

In [17]:
git status

On branch master
nothing to commit, working directory clean


Our file is now once more in an unmodified
state. When we look at our repository's history we can observe our commit. For
this, we use "git log" (we can include the "-p" flag to view the actual changes we made to the file):

"git log" lists all commits  made to a repository in reverse chronological
order. The listing for each commit includes the commit's full identifier (which
starts with the same characters as the short identifier printed by the `git
commit` command earlier), the commit's author, when it was created, and the log
message Git was given when the commit was created.
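A minimal sketch of inspecting history with "git log", assuming a throwaway repository with two placeholder commits. `--no-pager` is used only so that the output prints straight to the terminal.

```shell
set -e
# Hypothetical sandbox repository with two illustrative commits.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop Participant"      # placeholder identity
git config user.email "participant@example.com"  # placeholder identity

echo "cat: meow" > participant.txt
git add participant.txt
git commit -q -m "Add cat noise"

echo "dog: woof" >> participant.txt
git add participant.txt
git commit -q -m "Add dog noise"

# Full listing: identifier, author, date and message, newest commit first.
git --no-pager log

# One line per commit: short identifier and summary.
git --no-pager log --oneline

# -p also shows the changes a commit made; -1 limits the output to the latest commit.
git --no-pager log -p -1

newest="$(git log -1 --format=%s)"
full_id="$(git rev-parse HEAD)"
short_id="$(git rev-parse --short HEAD)"
echo "newest commit: $newest"
echo "$short_id is a prefix of $full_id"
```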

## Other useful commands for working with a local Git repository

We have covered the most basic commands for working with Git in our own repository. Many other commands make common tasks more convenient:

![](figures/git_local_repository.png)

Modified from [blog.podrezo.com](http://blog.podrezo.com/git-introduction-for-cvssvntfs-users/)
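As a hedged sketch of a few such commands (this selection is an assumption, not a transcription of the figure), the example compares, unstages and discards a change in a throwaway repository with placeholder contents.

```shell
set -e
# Hypothetical sandbox repository.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Workshop Participant"      # placeholder identity
git config user.email "participant@example.com"  # placeholder identity

echo "cat: meow" > participant.txt
git add participant.txt
git commit -q -m "Add cat noise"

# Edit the file, then compare the working directory with the staging area.
echo "dog: woof" >> participant.txt
git --no-pager diff

# Stage the edit, then compare the staging area with the last commit.
git add participant.txt
git --no-pager diff --staged

# Take the edit back out of the staging area (it stays in the working directory).
git reset -q HEAD -- participant.txt

# Discard the edit in the working directory, restoring the committed version.
git checkout -- participant.txt

git status --short
```

The last command prints nothing, because the working directory matches the last commit again.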

# Collaboration with Git and GitHub

##### Questions:
- "How do I share my changes with others on the web?"
- "How can I use version control to collaborate with other people?"
- "What do I do when my changes conflict with someone else's?"

##### Objectives:
- "Explain what remote repositories are and why they are useful."
- "Push to or pull from a remote repository."
- "Clone a remote repository."
- "Collaborate  by forking a repository and submitting a pull request."
- "Explain what conflicts are and when they can occur."
- "Resolve conflicts resulting from a merge."

##### Keypoints:
- "A local Git repository can be connected to one or more remote repositories."
- "Use the HTTPS protocol to connect to remote repositories until you have learned how to set up SSH."
- "`git push` copies changes from a local repository to a remote repository."
- "`git pull` copies changes from a remote repository to a local repository."
- "`git clone` copies a remote repository to create a local repository with a remote called `origin` automatically set up."
- "Conflicts occur when two or more people change the same file(s) at the same time."
- "The version control system does not allow people to overwrite each other's changes blindly, but highlights conflicts so that they can be resolved."

<img src="figures/git_remote_intro.png" alt="Drawing" style="width: 800px;"/>


##  Repositories on GitHub

Version control really comes into its own when we begin to collaborate with
other people.  We already have most of the machinery we need to do this; the
only thing missing is to copy changes from one repository to another.

Systems like Git allow us to move work between any two repositories.  In
practice, though, it's easiest to use one copy as a central hub, and to keep it
on the web rather than on someone's laptop.  Most programmers use hosting
services like [GitHub](http://github.com), [BitBucket](http://bitbucket.org) or
[GitLab](http://gitlab.com/) to hold those master copies.
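A minimal sketch of using one copy as a central hub, assuming a bare repository on the local disk stands in for a hosted repository so that the example runs without a network connection. The names `seed`, `hub.git`, `alice` and `bob`, and the identity, are hypothetical.

```shell
set -e
# Placeholder identity for every commit in this sketch.
export GIT_AUTHOR_NAME="Workshop Participant" GIT_AUTHOR_EMAIL="participant@example.com"
export GIT_COMMITTER_NAME="Workshop Participant" GIT_COMMITTER_EMAIL="participant@example.com"

work="$(mktemp -d)"
cd "$work"

# An existing project with one commit.
git init -q seed
echo "cat: meow" > seed/participant.txt
git -C seed add participant.txt
git -C seed commit -q -m "Start participant list"

# A bare copy plays the role of the hub (a local stand-in for a hosting service).
git clone -q --bare seed hub.git

# Two people clone the hub; each clone gets a remote called "origin".
git clone -q hub.git alice
git clone -q hub.git bob

# Alice records a change and pushes it to the hub.
echo "dog: woof" >> alice/participant.txt
git -C alice add participant.txt
git -C alice commit -q -m "Add dog noise"
git -C alice push -q origin HEAD

# Bob pulls the change from the hub into his own repository.
git -C bob pull -q --ff-only
cat bob/participant.txt
```

With a hosting service, only the URL passed to `git clone` changes; the push and pull steps stay the same.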

> ##### HTTPS vs. SSH
> 
> We use HTTPS here because it does not require additional configuration.  After
> the workshop you may want to set up SSH access, which is a bit more secure, by
> following one of the great tutorials from
> [GitHub](https://help.github.com/articles/generating-ssh-keys),
> [Atlassian/BitBucket](https://confluence.atlassian.com/display/BITBUCKET/Set+up+SSH+for+Git)
> and [GitLab](https://about.gitlab.com/2014/03/04/add-ssh-key-screencast/)
> (this one has a screencast).

![](figures/git-operations.png)

## Working with more than one remote repository

![](figures/git_remote_collaboration.png)

### Overview of collaboration workflow

1. The collaborator forks the owner's repository.
1. The collaborator makes an edit.
1. The collaborator submits a pull request for this edit.
1. The owner reviews and accepts this edit to merge the changes into their original repository.
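The collaborator's side of these steps can be sketched with two remotes, assuming local bare repositories stand in for the owner's repository and the fork on GitHub. The names `owner.git`, `fork.git`, `upstream` and `add-animal-noise`, and the identity, are hypothetical. Forking, opening the pull request and merging it happen on the GitHub website and are not part of the sketch.

```shell
set -e
# Placeholder identity for every commit in this sketch.
export GIT_AUTHOR_NAME="Workshop Participant" GIT_AUTHOR_EMAIL="participant@example.com"
export GIT_COMMITTER_NAME="Workshop Participant" GIT_COMMITTER_EMAIL="participant@example.com"

work="$(mktemp -d)"
cd "$work"

# Local stand-ins: owner.git for the owner's repository, fork.git for the fork.
git init -q seed
echo "cat: meow" > seed/participant.txt
git -C seed add participant.txt
git -C seed commit -q -m "Start participant list"
git clone -q --bare seed owner.git
git clone -q --bare owner.git fork.git   # step 1: the "fork"

# The collaborator clones the fork, so "origin" points at the fork.
git clone -q fork.git collaborator
cd collaborator

# A second remote, here called "upstream", points at the owner's repository.
git remote add upstream "$work/owner.git"
git remote -v

# Step 2: make the edit on its own branch.
git checkout -q -b add-animal-noise
echo "dog: woof" >> participant.txt
git add participant.txt
git commit -q -m "Add dog noise"

# Step 3: push the branch to the fork; the pull request is then opened on GitHub.
git push -q origin add-animal-noise

# Fetch the owner's latest commits to stay in sync with the original repository.
git fetch -q upstream
```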