# Introduction to Version Control and Git

#### Questions:
- "What is version control and why should I use it?"
- "How do I get set up to use Git?"
- "Where does Git store information?"

#### Objectives:
- "Understand the benefits of an automated version control system."
- "Understand the basics of how Git works."
- "Configure `git` the first time it is used on a computer."
- "Understand the meaning of the `--global` configuration flag."
- "Create a local Git repository."

#### Keypoints:
- "Version control is like an unlimited 'undo'."
- "Version control also allows many people to work in parallel."
- "All changes tracked by git are stored in the hidden repository '.git'."
- "Use `git config` to configure a user name, email address, editor, and other preferences once per machine."
- "`git init` initializes a repository."
- "`git status` shows the status of a repository."

We'll start by exploring how version control can be used
to keep track of what one person did and when.
Even if you aren't collaborating with other people,
automated version control is much better than this situation:

[![Piled Higher and Deeper by Jorge Cham, http://www.phdcomics.com/comics/archive_print.php?comicid=1531](../fig/fig/phd101212s.png)](http://www.phdcomics.com)

"Piled Higher and Deeper" by Jorge Cham, http://www.phdcomics.com

We've all been in this situation before: it seems ridiculous to have
multiple nearly-identical versions of the same document. Some word
processors let us deal with this a little better, such as Microsoft
Word's "Track Changes" or Google Docs' [version
history](https://support.google.com/docs/answer/190843?hl=en).

Version control systems start with a base version of the document and
then save just the changes you made at each step of the way. You can
think of it as a tape: if you rewind the tape and start at the base
document, then you can play back each change and end up with your
latest version.

![Changes Are Saved Sequentially](../fig/fig/play-changes.svg)

Once you think of changes as separate from the document itself, you
can then think about "playing back" different sets of changes onto the
base document and getting different versions of the document. For
example, two users can make independent sets of changes based on the
same document.

![Different Versions Can be Saved](../fig/fig/versions.svg)

If there aren't conflicts, you can even play two sets of changes onto the same base document.

![Multiple Versions Can be Merged](../fig/fig/merge.svg)
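
The tape analogy can be sketched in a few lines of Python. This is a toy model, not how Git stores data (Git itself records snapshots of whole files and works out differences when asked). It uses the standard library's `difflib.SequenceMatcher` to record only the lines that differ between two versions, replays those recorded changes onto the base version, and merges two change sets when they touch different lines. The function names (`make_change`, `apply_change`, `merge_changes`) and the sample documents are hypothetical, and Git's real merge logic is more careful than this simple overlap check.

```python
from difflib import SequenceMatcher


def make_change(old, new):
    """Record only the edits that turn `old` into `new` (both lists of lines)."""
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in SequenceMatcher(a=old, b=new).get_opcodes()
        if tag != "equal"
    ]


def apply_change(doc, edits):
    """Play a recorded change back onto `doc`; edits must be sorted by position."""
    result = list(doc)
    # Work from the end so earlier line numbers stay valid.
    for start, end, replacement in reversed(edits):
        result[start:end] = replacement
    return result


def merge_changes(base, edits_a, edits_b):
    """Apply two independent changes to the same base; refuse overlapping edits."""
    combined = sorted(edits_a + edits_b, key=lambda edit: (edit[0], edit[1]))
    for (start1, end1, _), (start2, _, _) in zip(combined, combined[1:]):
        if start2 < end1 or start1 == start2:
            raise ValueError(f"conflict near line {start2}")
    return apply_change(base, combined)


# The "tape": a base version plus the change recorded at each step.
base = ["Title", "Intro", "Methods"]
v1 = ["Title", "Better intro", "Methods"]
v2 = ["Title", "Better intro", "Methods", "Results"]
tape = [make_change(base, v1), make_change(v1, v2)]

doc = base
for edits in tape:
    doc = apply_change(doc, edits)
print(doc)  # ['Title', 'Better intro', 'Methods', 'Results']

# Two people change different parts of the same base version.
alice = make_change(base, ["Title", "Intro", "Methods", "Results"])
bob = make_change(base, ["New title", "Intro", "Methods"])
print(merge_changes(base, alice, bob))  # ['New title', 'Intro', 'Methods', 'Results']
```

If both people had edited the title line, `merge_changes` would raise a `ValueError`, which is the toy equivalent of a merge conflict.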

A version control system is a tool that keeps track of these changes for us and
helps us version and merge our files. It allows you to
decide which changes make up the next version, called a
[commit]({{ page.root }}/reference/#commit), and keeps useful metadata about them. The
complete history of commits for a particular project and their metadata make up
a [repository]({{ page.root }}/reference/#repository). Repositories can be kept in sync
across different computers facilitating collaboration among different people.

> #### The Long History of Version Control Systems
>
> Automated version control systems are nothing new.
> Tools like RCS, CVS, or Subversion have been around since the early 1980s and are used by many large companies.
> However, many of these are now considered legacy systems because of limitations in their capabilities.
> In particular, the more modern systems, such as Git and [Mercurial](http://swcarpentry.github.io/hg-novice/),
> are *distributed*, meaning that they do not need a centralized server to host the repository.
> These modern systems also include powerful merging tools that make it possible for multiple authors to work within
> the same files concurrently.
{: .callout}

> #### Paper Writing
>
> *   Imagine you drafted an excellent paragraph for a paper you are writing, but later ruin it. How would you retrieve
>     the *excellent* version of that paragraph? Is it even possible?
>
> *   Imagine you have 5 co-authors. How would you manage the changes and comments they make to your paper?
>     If you use LibreOffice Writer or Microsoft Word, what happens if you accept changes made using the
>     `Track Changes` option? Do you have a history of those changes?
{: .challenge}

## Setting up Git on our computer
When we use Git on a new computer for the first time, we need to configure a
[few things](http://swcarpentry.github.io/git-novice/02-setup/). Below are a
few examples of configurations we will set as we get started with Git:

*   our name and email address,
*   to colorize our output,
*   what our preferred text editor is,
*   and that we want to use these settings globally (i.e. for every project)

On a command line, Git commands are written as `git verb`,
where `verb` is what we actually want to do. So we should type:

```bash
git config --global color.ui "auto"

# Fill out your own details for the three commands below and remove the pound sign
# git config --global user.name "our name"
# git config --global core.editor "atom --wait"
# git config --global user.email "our email address"
```


This user name and email will be associated with your subsequent Git activity,
which means that any changes pushed to [GitHub](http://github.com/),
[BitBucket](http://bitbucket.org/), [GitLab](http://gitlab.com/) or another Git
host server in a later lesson will include this information. If you are
concerned about privacy, please review GitHub's
[instructions](https://help.github.com/articles/keeping-your-email-address-private/)
 for keeping your email address private.


The four commands above only need to be run once: the `--global` flag tells Git
to use these settings for every project, in your user account, on this computer.

You can check your settings at any time with `git config --list`.

You can reconfigure these settings whenever you wish.
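
Git actually reads settings from several layers: a system-wide file (written by `git config --system`), your per-user file (`--global`), and the repository's own `.git/config` (written by `git config` inside a repository with no scope flag, or explicitly with `--local`). When a key is set in more than one layer, the most specific one wins, so a single repository can override your global email address, for example. The toy model below mimics that lookup order with Python's `collections.ChainMap`; the layer contents are made-up examples, and real Git has further details (extra scopes, include files, multi-valued keys) that this sketch ignores. Running `git config --list --show-origin` shows which file each real value comes from.

```python
from collections import ChainMap

# Hypothetical contents of each configuration layer.
system_config = {"color.ui": "auto"}                # git config --system ...
global_config = {                                   # git config --global ...
    "user.name": "our name",
    "user.email": "our email address",
    "core.editor": "atom --wait",
}
local_config = {"user.email": "work@example.org"}   # git config ... (inside one repository)

# ChainMap looks keys up left to right, so the most specific layer goes first.
effective = ChainMap(local_config, global_config, system_config)

print(effective["user.email"])   # work@example.org (the repository's own setting wins)
print(effective["core.editor"])  # atom --wait (from the global layer)
print(effective["color.ui"])     # auto (from the system layer)
```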

> #### Proxy
>
> In some networks you need to use a
> [proxy](https://en.wikipedia.org/wiki/Proxy_server). If this is the case, you
> may also need to tell Git about the proxy:
>
> ```bash
> git config --global http.proxy proxy-url
> git config --global https.proxy proxy-url
> ```
>
> To disable the proxy, use
>
> ```bash
> git config --global --unset http.proxy
> git config --global --unset https.proxy
> ```
{: .callout}



> #### Git Help and Manual
>
> If you forget how a Git command works, use `-h` to print a short summary of its options,
> or `--help` to open its full manual page:
>
> ```bash
> git config -h
> git config --help
> ```
{: .callout}


## Setting up a project that we will track with git

To start using Git we must first create a Git repository for a given project.
We'll first set up a basic project that we can start to track using Git.

The project setup consists of:

1. Creating a project directory.
2. Changing our present directory to be this project directory.
3. Adding a Python script to this project directory.


#### 1.

```bash
mkdir reproducibility_workshop
```

#### 2.

```bash
cd reproducibility_workshop
```

#### 3.
This step can be completed by running the code below in bash, or by opening a text editor, adding the code below (excluding the first and last lines), and saving the file to the project directory as `generate_figure.py`.

```bash
cat << 'EOF' > generate_figure.py
# coding: utf-8
import matplotlib as mpl
mpl.use('Agg')
import seaborn

tips = seaborn.load_dataset('tips')
seaborn_plot = seaborn.pairplot(tips)
seaborn_plot.savefig('tips_pairplot.png')
EOF
```

## Initializing a Git repository

Now that we are in the project directory `reproducibility_workshop` and have a Python script that will generate a plot for us, we can begin to use Git to track changes.

To turn this project directory into a Git repository we use the `git init` command:

```bash
git init
```

```
Initialized empty Git repository in /gpfs/gsfs2/users/DSST/piday_course/reproducibility_workshop/.git/
```


```bash
ls
```

```
generate_figure.py
```


It appears that nothing has changed. But if we add the `-a` flag to show all
files, including hidden ones, we can see that Git has created a hidden directory
called `.git`:

```bash
ls -a
```

```
.  ..  generate_figure.py  .git
```


Git stores information about the project in this special sub-directory. If we
ever delete it, we will lose the project's history.
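
Git finds this directory by looking for `.git` in the current directory and then in each parent directory in turn, which is why Git commands also work from any sub-directory of the project. A simplified sketch of that search in Python follows; the function name `find_git_dir` is hypothetical, and real Git also respects the `GIT_DIR` environment variable, accepts a `.git` *file* that points elsewhere, and by default stops at filesystem boundaries.

```python
import tempfile
from pathlib import Path


def find_git_dir(start):
    """Return the nearest `.git` directory at or above `start`, or None."""
    current = Path(start).resolve()
    for directory in [current, *current.parents]:
        candidate = directory / ".git"
        if candidate.is_dir():
            return candidate
    return None


# Demo in a throwaway tree that stands in for reproducibility_workshop.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "reproducibility_workshop"
    (project / ".git").mkdir(parents=True)
    (project / "figures").mkdir()
    found = find_git_dir(project / "figures")
    expected = project.resolve() / ".git"
    print(found == expected)  # True: found from a sub-directory
```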

We can check that everything is set up correctly by asking Git to tell us the
status of our project. It should display the following text:

```bash
git status
```

```
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	generate_figure.py

nothing added to commit but untracked files present (use "git add" to track)
```


## Versioning edits with Git

#### Questions:
- "How do I record changes in Git?"
- "How do I record notes about what changes I made and why?"

#### Objectives:
- "Go through the modify-add-commit cycle for one or more files."
- "Explain where information is stored at each stage of the Git commit workflow."

#### Keypoints:
- "Files can be stored in a project's working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded)."
- "`git add` puts files in the staging area."
- "`git commit` saves the staged content as a new commit in the local repository."
- "Always write a log message when committing changes."
- "View previous commits using the `git log` command."

As we saw previously, the status of our Git repository shows that we have
an untracked file. The first step in tracking a file in Git is to add it to the Git staging area.

![The file lifecycle in git](../fig/git_add.png)
Modified figure from git-scm.com

In order to add a file to the Git staging area we use "git add":

```bash
git add generate_figure.py
```

We check how this changed the way Git sees our current project with the "git status" command once again:

```bash
git status
```

```
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   generate_figure.py
```



Git now knows that it's supposed to keep track of 'generate_figure.py', but
it hasn't recorded these changes permanently in its repository yet. To
permanently store the current state of the generate_figure.py file in the
Git repository we need to commit the changes that are staged. We use the `git
commit` command for this:

```bash
git commit -m "add script to generate figure"
```

```
[master (root-commit) 085fc56] add script to generate figure
 1 file changed, 8 insertions(+)
 create mode 100644 generate_figure.py
```


> ## The staging area helps to keep track of different changes
> 
> If you think of Git as taking snapshots of changes over the life of a
> project, "git add" specifies *what* will go in a snapshot (putting things in
> the staging area), and "git commit" then *actually takes* the snapshot, and
> makes a permanent record of it (as a commit). If you don't have anything
> staged when you type "git commit", Git will prompt you to use "git commit -a"
> or "git commit --all", which is kind of like gathering *everyone* for the
> picture! However, it's almost always better to explicitly add things to the
> staging area, because you might commit changes you forgot you made. Try to
> stage things manually, or you might find yourself searching for "git undo
> commit" more than you would like!

When we run "git commit", Git takes everything we have told it to save by using
"git add" and stores a copy permanently inside the special `.git` directory.
This permanent copy is called a [commit]({{ page.root }}/reference/#commit) (or
[revision]({{ page.root }}/reference/#revision)) and its short identifier is
an alpha-numeric string within the square brackets on the first line of the output above.
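
The three places a file's content can live (the working directory, the staging area, and the repository's history) can be sketched as a small Python class. This is a toy model built around a hypothetical `ToyRepo` class, not Git's internals: real Git keeps a complete snapshot in its staging area (the *index*) and stores commits as objects inside `.git`, whereas here the staging area only holds the pending changes.

```python
class ToyRepo:
    """Toy model of Git's working directory, staging area, and commit history."""

    def __init__(self):
        self.working = {}   # filename -> content we see and edit
        self.staged = {}    # filename -> content queued for the next commit
        self.commits = []   # (message, snapshot) pairs, oldest first

    def head_snapshot(self):
        """Contents recorded by the most recent commit (empty before the first)."""
        return self.commits[-1][1] if self.commits else {}

    def add(self, name):
        """Like `git add`: copy the current content into the staging area."""
        self.staged[name] = self.working[name]

    def commit(self, message):
        """Like `git commit`: record staged content permanently, then clear staging."""
        if not self.staged:
            raise RuntimeError("nothing added to commit")
        snapshot = {**self.head_snapshot(), **self.staged}
        self.commits.append((message, snapshot))
        self.staged = {}

    def status(self):
        """Classify files roughly the way `git status` does."""
        head = self.head_snapshot()
        staged = sorted(n for n, c in self.staged.items() if c != head.get(n))
        modified, untracked = [], []
        for name, content in sorted(self.working.items()):
            if name not in head and name not in self.staged:
                untracked.append(name)
            elif content != self.staged.get(name, head.get(name)):
                modified.append(name)
        return {"staged": staged, "modified": modified, "untracked": untracked}


repo = ToyRepo()
repo.working["generate_figure.py"] = "import seaborn\n"
print(repo.status())  # {'staged': [], 'modified': [], 'untracked': ['generate_figure.py']}
repo.add("generate_figure.py")
print(repo.status())  # {'staged': ['generate_figure.py'], 'modified': [], 'untracked': []}
repo.commit("add script to generate figure")
print(repo.status())  # {'staged': [], 'modified': [], 'untracked': []}
```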

We use the `-m` flag (for "message") to record a short, descriptive, and
specific comment that will help us remember later on what we did and why. If we
just run "git commit" without the `-m` option, Git will launch `atom` (or
whatever other editor we configured as `core.editor`) so that we can write a
longer message.

[Good commit messages][commit-messages] start with a brief (<50 characters) summary of
changes made in the commit.  If you want to go into more detail, add
a blank line between the summary line and your additional notes.
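
This convention can be checked mechanically. The helper below is hypothetical (Git itself does not enforce any of these rules); it flags a summary line longer than 50 characters and a missing blank line between the summary and further detail.

```python
def check_commit_message(message, max_summary=50):
    """Return a list of style problems; an empty list means the message looks fine."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["message has no summary line"]
    problems = []
    if len(lines[0]) > max_summary:
        problems.append(f"summary is {len(lines[0])} characters (limit {max_summary})")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank before the details")
    return problems


print(check_commit_message("add script to generate figure"))  # []
print(check_commit_message("add script\nit makes a pairplot"))
# ['second line should be blank before the details']
```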

Now when we run "git status" we see:

```bash
git status
```

```
On branch master
nothing to commit, working directory clean
```


Not only is `generate_figure.py` now tracked, it is also
no longer part of the output of "git status": it is now in the unmodified
state. When we look at our repository's history we can see our commit. For
this, we use "git log":

```bash
git log
```

```
commit 085fc5646fe9ad20abf294300babc464edf35fa1
Author: our name <our email address>
Date:   Thu Apr 27 16:22:05 2017 -0400

    add script to generate figure
```


"git log" lists all commits made to a repository in reverse chronological
order. The listing for each commit includes the commit's full identifier (which
starts with the same characters as the short identifier printed by the `git
commit` command earlier), the commit's author, when it was created, and the log
message Git was given when the commit was created.

## Where Are My Changes?
At this point there has been no obvious change to the filesystem:

```bash
ls
```

```
generate_figure.py
```


There are no obvious changes observed in the project directory because Git
saves information about files' history in the special `.git` directory
mentioned earlier so that our filesystem doesn't become cluttered (and so that
we can't accidentally edit or delete an old version).
  

## The Git Lifecycle

We have now seen the different states that files typically inhabit as Git
tracks them. The default file state is unmodified. Any time we make a change to
any of our files tracked by Git we will observe that they are listed as
modified. We must stage and then commit such changes to return the files to
their unmodified state.

The cycle of making changes to files, staging these changes, and then
committing them is continually repeated and our project continues to develop
with each file being represented in the Git repository as a combination of
committed changes. We will start working through such a cycle now by making
another edit to our analysis script. For now we'll just add a comment to
document the fact that we are using data from the Open Neuroimaging
Laboratory at http://openneu.ro.

![The file lifecycle in git](../fig/git_workflow.png)
Figure from git-scm.com

> ## Editing our script file
> If it is not already open in our text editor (Atom), we can open it now using the IPython `%edit` magic:
>
> ```python
> %edit generate_figure.py
> ```



Once we have finished editing our script we should observe something like the following:

```bash
cat generate_figure.py
```

At this point we will see that Git now views this as a modified file:

```bash
git status
```

We previously used "git add" to add an untracked file to the staging area. This
time we will use it to add a modified file to the staging area.

```bash
git add generate_figure.py
```

Finally, to complete the Git lifecycle for this change set, we commit our staged changes:

```bash
git commit -m "add comment about Open Neuroimaging Laboratory"
```

We can use the "git status" and "git log" commands to confirm that an
additional commit is stored in the Git repository and that no staged or unstaged
changes exist for `generate_figure.py`.

## What if I don't want Git to track some of my changes?
There are many reasons we might want Git to overlook certain files or
sub-directories in our project.

One such case is if our data contains personally identifiable information
(PII). Git helps us to share our code, but we can't do this if we have
added PII to the Git repository. To guard against this, we can set aside a
directory for such data (or perhaps even code) and tell Git never to track it.
Let's create such a directory and add our dataset to it so that we don't
accidentally include things we don't want to share.

```python
from pathlib import Path

Path('data_not_in_repo').mkdir()
```


The `metasearch` directory is itself a Git repository, and we definitely don't want to track it here. Tracking one repository inside another is sometimes something we might want to do, but in our case it is best to make sure that the `metasearch` directory remains untracked. Moving this directory with the `pathlib` library is a little cumbersome, so we shall use the IPython `%mv` magic instead:

```python
%mv metasearch data_not_in_repo
```


Now, to make sure that Git does not track this directory, we add its name to a
file called `.gitignore` in our current directory:

```python
from pathlib import Path

# Note: write_text replaces any existing .gitignore contents
Path('.gitignore').write_text('data_not_in_repo\n')
```

Now when we check the status, we no longer see the `metasearch` directory
listed as untracked. Furthermore, Git will refuse to add any of the files in
this directory to the repository unless we force it with `git add -f`.
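
How Git decides whether a path is ignored can be sketched roughly as follows. This is a deliberately simplified, hypothetical model: real `.gitignore` rules also support negation with `!`, patterns anchored with `/`, `**`, and directory-only patterns ending in `/`. It only covers the common case used here, where a bare name such as `data_not_in_repo` matches that name at any depth and therefore hides everything inside it. The extra `*.png` pattern is an illustrative assumption, not part of the lesson's `.gitignore`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def read_patterns(text):
    """Keep non-blank lines that are not comments, as in a .gitignore file."""
    return [line.rstrip() for line in text.splitlines()
            if line.strip() and not line.startswith("#")]


def is_ignored(path, patterns):
    """Simplified rule: a pattern with no slash matches any component of the path."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        pattern = pattern.rstrip("/")  # treat "dir/" like "dir" in this toy
        if "/" in pattern:
            continue  # anchored patterns are not handled by this sketch
        if any(fnmatchcase(part, pattern) for part in parts):
            return True
    return False


gitignore_text = "# data we must not publish\ndata_not_in_repo\n*.png\n"
patterns = read_patterns(gitignore_text)
print(is_ignored("data_not_in_repo/metasearch/README.md", patterns))  # True
print(is_ignored("tips_pairplot.png", patterns))                       # True
print(is_ignored("generate_figure.py", patterns))                      # False
```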


  

> ## Editing and Staging
>
> We have made a `data_not_in_repo` directory and moved the `metasearch`
> directory into it. We want to stage these commands for a subsequent commit.
> Fill in the blanks in the code below to achieve this:
>
> ```
> %hist -n -g data_not_in_repo
> %hist -n -g metasearch
> %save -a generate_figure.py ____
> git add ____
> ```
{: .challenge}

> ## Committing
>
> What commit message should we use for the changes we staged in the last
> question? Think about it, and then choose one of the following for a subsequent commit:
>
> 1. "Using pathlib"
> 2. "Create a data_not_in_repo directory to avoid tracking some files and move the metasearch directory here"
> 3. "Make and add to data_not_in_repo"
> 4. "Change metasearch.py"
>
> > ## Solution
> >
> > Answer 1 is not descriptive enough. Answer 2 is too long, and this wasn't a
> > particularly extensive change. While answer 4 could be considered useful in
> > some contexts, answer 3 strikes the best balance between being
> > concise and descriptive.
> {: .solution}
{: .challenge}


[commit-messages]: http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html

# Keeping track of changes with Git

#### Questions:
- "How can I identify old versions of files?"
- "How do I review my changes?"
- "How can I recover old versions of files?"

#### Objectives:
- "Explain what the HEAD of a repository is and how to use it."
- "Identify and use Git commit numbers."
- "Compare various versions of tracked files."
- "Restore old versions of files."

#### Keypoints:
- "`git diff` displays differences between commits."
- "`git checkout` recovers old versions of files."

*   As we saw in the previous lesson, we can refer to commits by their
    identifiers.
*   The most recent commit (the one our working directory is based on) can also be referred to by the identifier
    `HEAD`.
*   Earlier commits can be referenced by adding the `~` symbol and a number that counts back from `HEAD`:
    * the most recent commit is `HEAD`, the one before it is `HEAD~1` (pronounced "HEAD minus 1"), and so on.
*   The `git diff` command allows us to observe differences between different versions of our files.
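
Under the hood, each commit records the ID of its parent, so `HEAD~N` means "start at `HEAD` and step back through parents N times" (`HEAD~` on its own means `HEAD~1`), and `git log` is essentially the same walk, printed newest first. The toy model below uses made-up short IDs and hypothetical function names; real IDs are 40 hex digits, and for a merge commit, which has several parents, `~` follows the first parent (the sketch ignores merges).

```python
# Hypothetical history: commit ID -> (parent ID, log message).
commits = {
    "a1": (None, "add script to generate figure"),
    "b2": ("a1", "add comment about Open Neuroimaging Laboratory"),
    "c3": ("b2", "Make and add to data_not_in_repo"),
}
head = "c3"


def resolve(rev, commits, head):
    """Turn 'HEAD' or 'HEAD~N' into a commit ID by following parents N times."""
    name, tilde, steps = rev.partition("~")
    if name != "HEAD":
        raise ValueError(f"unsupported revision: {rev}")
    count = int(steps) if steps else (1 if tilde else 0)
    commit = head
    for _ in range(count):
        commit = commits[commit][0]
        if commit is None:
            raise ValueError(f"{rev}: the history is not that long")
    return commit


def log(commits, head):
    """Yield (ID, message) pairs newest first by walking parents, like `git log`."""
    commit = head
    while commit is not None:
        parent, message = commits[commit]
        yield commit, message
        commit = parent


print(resolve("HEAD~1", commits, head))        # b2
print([cid for cid, _ in log(commits, head)])  # ['c3', 'b2', 'a1']
```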

## View differences between files

To see the difference between our modified files and our last commit, we can use:

```bash
git diff HEAD
```

We can also restrict the comparison to a single file:

```bash
git diff HEAD generate_figure.py
```

We can also compare two earlier revisions (this requires at least four commits in the history):

```bash
git diff HEAD~3 HEAD~2 generate_figure.py
```
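
`git diff` prints its output in the *unified diff* format: the `---` and `+++` lines name the old and new versions (Git prefixes them with `a/` and `b/`), each `@@` line gives the line ranges covered by a hunk, and lines starting with `-` or `+` were removed or added. Python's standard `difflib.unified_diff` produces the same basic format, so it offers a rough preview of what to expect. The added comment line is a hypothetical stand-in for the edit described earlier, and Git's own output also includes an extra `diff --git` header and an `index` line.

```python
from difflib import unified_diff

old_version = [
    "# coding: utf-8\n",
    "import matplotlib as mpl\n",
    "mpl.use('Agg')\n",
]
new_version = [
    "# coding: utf-8\n",
    "# Data from the Open Neuroimaging Laboratory (http://openneu.ro)\n",
    "import matplotlib as mpl\n",
    "mpl.use('Agg')\n",
]

patch = list(unified_diff(old_version, new_version,
                          fromfile="a/generate_figure.py",
                          tofile="b/generate_figure.py"))
print("".join(patch), end="")
# --- a/generate_figure.py
# +++ b/generate_figure.py
# @@ -1,3 +1,4 @@
#  # coding: utf-8
# +# Data from the Open Neuroimaging Laboratory (http://openneu.ro)
#  import matplotlib as mpl
#  mpl.use('Agg')
```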

We can also refer to commits using those long strings of digits and letters
that `git log` displays. These are unique IDs for the changes, and "unique"
really does mean unique: every change to any set of files on any computer has a
unique 40-character identifier. Our first commit was given the ID
XXX, so let's try this:

```bash
git diff XXX
```

That's the right answer, but typing out random 40-character strings is
annoying, so Git lets us use just the first few characters:

```bash
git diff X
```
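
Those IDs are not random: each one is the SHA-1 hash of the object it names. A commit's ID is the hash of the commit object (which records the snapshot's top-level tree, the parent commit, the author, the date and the message), so changing any of these gives a different ID. The simplest case to reproduce is a file's content (a *blob*), whose ID is the SHA-1 of the header `blob <size>` plus a NUL byte, followed by the bytes themselves; `git hash-object <file>` prints the same value. The sketch below computes that with `hashlib` and mimics how an abbreviated ID is expanded: it must match exactly one object, and Git will not accept fewer than four hex digits. The function names are hypothetical, and repositories created with the newer SHA-256 object format use longer IDs.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """ID that Git (SHA-1 object format) gives to a file containing `content`."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def expand_prefix(prefix, object_ids, min_length=4):
    """Expand an abbreviated ID: long enough, and matching exactly one object."""
    if len(prefix) < min_length:
        raise ValueError(f"{prefix!r} is too short")
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]


ids = [blob_id(b""), blob_id(b"test content\n")]
print(ids[0])  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (an empty file)
print(ids[1])  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(expand_prefix("d670", ids) == ids[1])  # True
```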

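Finally, a convenient way to search all versions of a file at once for a particular string is to use the `-p` flag for the `git log` command, which prints each commit's changes along with its log message. In IPython we can capture that output and filter it:

```python
log = !git log -p
[line for line in log if 'clone' in line]
```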

## Working with the history

All right! So we can save changes to files and see what we've changed. Now how
can we restore older versions of things? Let's suppose we accidentally
overwrite our script with an unrelated line from our IPython history:

```python
%save generate_figure.py 15  # type y to confirm overwriting the file
```

`git status` now tells us that the file has been changed,
but those changes haven't been staged:

```bash
git status
```

We can put things back the way they were by using `git checkout`:

```bash
git checkout HEAD generate_figure.py
```

As you might guess from its name, `git checkout` checks out (i.e., restores) an
old version of a file. In this case, we're telling Git that we want to recover
the version of the file recorded in `HEAD`, which is the last saved commit. If
we want to go back even further, we can use a commit identifier instead:

```bash
git checkout HEAD~3 generate_figure.py
```
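
Conceptually, `git checkout <commit> <file>` looks the file up in the snapshot recorded by that commit and copies it back into the working directory; it also puts that version into the staging area, so `git status` will then list it under "Changes to be committed". A toy model with hypothetical commit IDs, file contents, and a hypothetical `checkout_file` helper:

```python
# Hypothetical snapshots: commit ID -> {filename: content at that commit}.
snapshots = {
    "a1": {"generate_figure.py": "import seaborn\n"},
    "b2": {"generate_figure.py": "# Data from the Open Neuroimaging Laboratory\nimport seaborn\n"},
}
working = {"generate_figure.py": "oops\n"}  # our accidentally overwritten file
staged = dict(snapshots["b2"])              # staging area matches HEAD (b2)


def checkout_file(commit_id, name):
    """Copy one file's recorded content back into the working directory and staging area."""
    content = snapshots[commit_id][name]
    working[name] = content
    staged[name] = content


checkout_file("b2", "generate_figure.py")  # like `git checkout HEAD generate_figure.py`
print(working["generate_figure.py"] == snapshots["b2"]["generate_figure.py"])  # True
checkout_file("a1", "generate_figure.py")  # like `git checkout HEAD~1 generate_figure.py`
print(staged["generate_figure.py"], end="")  # import seaborn
```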

> ## Don't Lose Your HEAD
>
> Above we used
>
> ```bash
> $ git checkout HEAD~3 generate_figure.py
> ```
>
> to revert `generate_figure.py` to an earlier state. If you forget to
> specify the file in that command, Git will tell you that "You are in
> 'detached HEAD' state." In this state, you shouldn't make any changes. You
> can fix this by reattaching your HEAD using `git checkout master`.
{: .callout}

It's important to remember that we must use the commit number that identifies
the state of the repository *before* the change we're trying to undo. A common
mistake is to use the number of the commit in which we made the change we're
trying to get rid of. Commit messages written in the imperative mood help with this.

For example, suppose we started working on a new feature that did not work out
and caused lots of bugs, and we want to go back to a version of our file from
before we started. The commit with the message "Start adding excellent new
feature" is the first one to avoid: we should jump one step further back in our
history, to the commit before it.

So, to put it all together, here's how Git works in cartoon form:

![How Git works: a cartoon (http://figshare.com/articles/How_Git_works_a_cartoon/1328266)](../fig/fig/git_staging.svg)

## Other useful points to note

* Remember to keep building up your history with the Git lifecycle. When you
  finally decide you really need Git to recover some code from many months
  ago, you'll be grateful for your diligence.
* [Learngitbranching](http://learngitbranching.js.org) is a great place to learn more advanced manipulation in Git.
* Many editors have plugins that add Git functionality. Once you are
  comfortable with the basics of Git, these can really improve the experience of
  using it. Frequently, though, the best way to use the more obscure commands is to go
  back to the command line: often the only straightforward
  solution to a problem you are having is to type an incantation to Git at
  the command line.

> ## Simplifying the Common Case
>
> If you read the output of `git status` carefully,
> you'll see that it includes this hint:
>
> ```
> (use "git checkout -- <file>..." to discard changes in working directory)
> ```
>
> As it says,
> `git checkout` without a version identifier restores files to the state recorded in the staging area,
> which is the same as the state saved in `HEAD` unless you have staged further changes.
> The double dash `--` is needed to separate the names of the files being recovered
> from the command itself:
> without it,
> Git could mistake the name of the file for a commit or branch name.
{: .callout}

The fact that files can be reverted one by one
tends to change the way people organize their work.
If everything is in one large document,
it's hard (but not impossible) to undo changes to the introduction
without also undoing changes made later to the conclusion.
If the introduction and conclusion are stored in separate files,
on the other hand,
moving backward and forward in time becomes much easier.

> ## Recovering Older Versions of a File
>
> Jennifer has made changes to the Python script that she has been working on for weeks, and the
> modifications she made this morning "broke" the script and it no longer runs. She has spent
> ~ 1hr trying to fix it, with no luck...
>
> Luckily, she has been keeping track of her project's versions using Git! Which commands below will
> let her recover the last committed version of her Python script called
> `data_cruncher.py`?
>
> 1. `$ git checkout HEAD`
>
> 2. `$ git checkout HEAD data_cruncher.py`
>
> 3. `$ git checkout HEAD~1 data_cruncher.py`
>
> 4. `$ git checkout <unique ID of last commit> data_cruncher.py`
>
> 5. Both 2 and 4
{: .challenge}

> ## Reverting a Commit
>
> Jennifer is collaborating on her Python script with her colleagues and
> realises her last commit to the group repository is wrong and wants to
> undo it.  Jennifer needs to undo correctly so everyone in the group
> repository gets the correct change.  `git revert [wrong commit ID]`
> will make a new commit that undoes Jennifer's previous wrong
> commit. Therefore `git revert` is different from `git checkout [commit
> ID]` because `checkout` is for local changes not committed to the
> group repository. Below are the right steps and explanations for
> Jennifer to use `git revert`. What is the missing command?
>
> 1. ________ # Look at the git history of the project to find the commit ID
>
> 2. Copy the ID (the first few characters of the ID, e.g. 0b1d055).
>
> 3. `git revert [commit ID]`
>
> 4. Type in the new commit message.
>
> 5. Save and close
{: .challenge}

> ## Understanding Workflow and History
>
> What is the output of `cat venus.txt` at the end of this set of commands?
>
> ```bash
> $ cd planets
> $ nano venus.txt # input the following text: Venus is beautiful and full of love
> $ git add venus.txt
> $ nano venus.txt # add the following text: Venus is too hot to be suitable as a base
> $ git commit -m "comments on Venus as an unsuitable base"
> $ git checkout HEAD venus.txt
> $ cat venus.txt # this will print the contents of venus.txt to the screen
> ```
>
> 1. `Venus is too hot to be suitable as a base`
>
> 2. `Venus is beautiful and full of love`
>
> 3. Both lines:
>
>    ```
>    Venus is beautiful and full of love
>    Venus is too hot to be suitable as a base
>    ```
>
> 4. Error because you have changed venus.txt without committing the changes
{: .challenge}

> ## Checking Understanding of `git diff`
>
> Consider this command: `git diff HEAD~3 mars.txt`. What do you predict this command
> will do if you execute it? What happens when you do execute it? Why?
>
> Try another command, `git diff [ID] mars.txt`, where [ID] is replaced with
> the unique identifier for your most recent commit. What do you think will happen,
> and what does happen?
{: .challenge}

> ## Getting Rid of Staged Changes
>
> `git checkout` can be used to restore a previous commit when unstaged changes have
> been made, but will it also work for changes that have been staged but not committed?
> Make a change to `mars.txt`, add that change, and use `git checkout` to see if
> you can remove your change.
{: .challenge}

> ## Explore and Summarize Histories
>
> Exploring history is an important part of Git; often it is a challenge to find
> the right commit ID, especially if the commit is from several months ago.
>
> Imagine the `planets` project has more than 50 files.
> You would like to find a commit in which specific text in `mars.txt` was modified.
> When you type `git log`, a very long list appears.
> How can you narrow down the search?
>
> Recall that the `git diff` command allows us to explore one specific file,
> e.g. `git diff mars.txt`. We can apply a similar idea here:
>
> ```bash
> $ git log mars.txt
> ```
>
> Unfortunately some of these commit messages are very ambiguous, e.g. `update files`.
> How can you search through these files?
>
> Both `git diff` and `git log` are very useful, and they summarize different parts of the history for you.
> Is it possible to combine both? Let's try the following:
>
> ```bash
> $ git log --patch mars.txt
> ```
>
> You should get a long list of output, and you should be able to see both commit messages and the differences between each commit.
>
> Question: What does the following command do?
>
> ```bash
> $ git log --patch HEAD~3 HEAD~1 *.txt
> ```
{: .challenge}

# Collaboration with Git and GitHub

#### Questions:
- "How do I share my changes with others on the web?"
- "How can I use version control to collaborate with other people?"
- "What do I do when my changes conflict with someone else's?"

#### Objectives:
- "Explain what remote repositories are and why they are useful."
- "Push to or pull from a remote repository."
- "Clone a remote repository."
- "Collaborate pushing to a common repository."
- "Explain what conflicts are and when they can occur."
- "Resolve conflicts resulting from a merge."

#### Keypoints:
- "A local Git repository can be connected to one or more remote repositories."
- "Use the HTTPS protocol to connect to remote repositories until you have learned how to set up SSH."
- "`git push` copies changes from a local repository to a remote repository."
- "`git pull` copies changes from a remote repository to a local repository."
- "`git clone` copies a remote repository to create a local repository with a remote called `origin` automatically set up."
- "Conflicts occur when two or more people change the same file(s) at the same time."
- "The version control system does not allow people to overwrite each other's changes blindly, but highlights conflicts so that they can be resolved."

Version control really comes into its own when we begin to collaborate with
other people.  We already have most of the machinery we need to do this; the
only thing missing is to copy changes from one repository to another.

Systems like Git allow us to move work between any two repositories.  In
practice, though, it's easiest to use one copy as a central hub, and to keep it
on the web rather than on someone's laptop.  Most programmers use hosting
services like [GitHub](http://github.com), [BitBucket](http://bitbucket.org) or
[GitLab](http://gitlab.com/) to hold those master copies; we'll explore the pros
and cons of this in the final section of this lesson.

Let's start by sharing the changes we've made to our current project with the
world. Log in to GitHub, then click on the icon in the top right corner to
create a new repository called `repro_course`:

* Name your repository "repro_course" and then click "Create Repository".

* This effectively makes a directory with a `.git` repository in it.

* As soon as the repository is created, GitHub displays a page with a URL and some
  information on how to configure your local repository.

Our local repository still contains our earlier work on `generate_figure.py`, but the
remote repository on GitHub doesn't contain any files yet.

The next step is to connect the two repositories.  We do this by making the
GitHub repository a [remote]({{ page.root }}/reference/#remote) for the local repository.
The home page of the repository on GitHub includes the string we need to
identify it.

Click on the 'HTTPS' link to change the [protocol]({{ page.root }}/reference/#protocol) from
SSH to HTTPS.

> ## HTTPS vs. SSH
> 
> We use HTTPS here because it does not require additional configuration.  After
> the workshop you may want to set up SSH access, which is a bit more secure, by
> following one of the great tutorials from
> [GitHub](https://help.github.com/articles/generating-ssh-keys),
> [Atlassian/BitBucket](https://confluence.atlassian.com/display/BITBUCKET/Set+up+SSH+for+Git)
> and [GitLab](https://about.gitlab.com/2014/03/04/add-ssh-key-screencast/)
> (this one has a screencast).

In [None]:
Copy that URL from the browser, go into the local `repro_course` repository, and run
this command:

In [None]:
git remote add origin https://github.com/github-name/repro_course.git

In [None]:
Make sure to use the URL for your repository: the only
difference should be your username instead of `github-name`.

We can check that the command has worked by running `git remote -v`:

In [None]:
git remote -v

In [None]:
The name `origin` is a local nickname for your remote repository: we could use
something else if we wanted to, but `origin` is by far the most common choice.
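The sketch below shows what `git remote add` records. It uses a local bare repository (the hypothetical `github_stand_in.git`) in place of the GitHub URL, so it runs without an account or network access:

```shell
# Scratch directory; a local bare repository stands in for GitHub,
# so no account or network is needed. All names are illustrative.
WORK=$(mktemp -d)
cd "$WORK"
git init --bare --quiet github_stand_in.git
git init --quiet repro_course
cd repro_course

# Register the stand-in under the conventional nickname "origin".
git remote add origin "$WORK/github_stand_in.git"

# One line for fetching and one for pushing, both with the same URL.
git remote -v

# The nickname is simply an entry in this repository's .git/config file.
git config --get remote.origin.url
cat .git/config
```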

Once the nickname `origin` is set up, this command will push the changes from
our local repository to the repository on GitHub:

In [None]:
git push origin master

In [None]:
> ## Proxy
> 
> If the network you are connected to uses a proxy, there is a chance that your
> last command failed with "Could not resolve hostname" as the error message. To
> solve this issue, you need to tell Git about the proxy (the `http.proxy`
> setting also applies to HTTPS remotes):
> 
> ```
> git config --global http.proxy http://user:password@proxy.url
> ```
> 
> When you connect to another network that doesn't use a proxy, you will need to
> tell Git to disable the proxy using:
> 
> ```
> git config --global --unset http.proxy
> ```
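
To check which proxy Git is currently using, run `git config --global --get http.proxy`. It prints the configured value, or nothing if no proxy is set. The sketch below walks through the set, check, and unset cycle against a throwaway file passed with `--file`. Using a throwaway file is an assumption of the sketch, made so that your real global configuration stays untouched:

```shell
# Throwaway file instead of your real ~/.gitconfig; --file targets it directly.
CFG=$(mktemp)

# Same placeholder proxy URL as above.
git config --file "$CFG" http.proxy http://user:password@proxy.url

# Prints the configured proxy.
git config --file "$CFG" --get http.proxy

# Remove the setting, as when moving to a network without a proxy.
git config --file "$CFG" --unset http.proxy

# --get now prints nothing and exits with status 1.
git config --file "$CFG" --get http.proxy || echo "no proxy configured"
```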


In [None]:
> ## Password Managers
> 
> If your operating system has a password manager configured, `git push` will
> try to use it when it needs your username and password.  For example, this
> is the default behavior for Git Bash on Windows. If you want to type your
> username and password at the terminal instead of using a password manager,
> type:
> 
> ```
> unset SSH_ASKPASS
> ```
> 
> in the terminal, before you run `git push`.  Despite the name, [git uses
> `SSH_ASKPASS` for all credential
> entry](http://git-scm.com/docs/gitcredentials#_requesting_credentials), so
> you may want to unset `SSH_ASKPASS` whether you are using git via SSH or
> https.
> 
> You may also want to add `unset SSH_ASKPASS` at the end of your `~/.bashrc`
> to make git default to using the terminal for usernames and passwords.

In [None]:
> ## The '-u' Flag
> 
> You may see a `-u` option used with `git push` in some documentation.  It is
> related to concepts we cover in our intermediate lesson, and can safely be
> ignored for now.

In [None]:
We can pull changes from the remote repository to the local one as well:

In [None]:
git pull origin master

In [None]:
Pulling has no effect in this case because the two repositories are already
synchronized.  If someone else had pushed some changes to the repository on
GitHub, though, this command would download them to our local repository.
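The sketch below runs through the push-then-pull round trip. It makes a few assumptions:

* A local bare repository (`hub.git`, a hypothetical name) stands in for GitHub.
* The branch is called `master`, as elsewhere in this lesson. Running `git branch --show-current` tells you the name of yours.
* The identity set with `git config` is a placeholder. It is stored only in the scratch repository, not with `--global`.

```shell
# Scratch directory; hub.git is a local stand-in for the GitHub repository.
WORK=$(mktemp -d)
cd "$WORK"
git init --bare --quiet hub.git
git -C hub.git symbolic-ref HEAD refs/heads/master

# A local repository with one commit on "master".
git init --quiet repro_course
cd repro_course
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Owner"          # placeholder, local only
git config user.email "owner@example.com"     # placeholder, local only
echo 'print("first figure")' > generate_figure.py
git add generate_figure.py
git commit --quiet -m "Add figure script"

git remote add origin "$WORK/hub.git"
git push --quiet origin master     # upload local commits to the remote
git pull --quiet origin master     # nothing new to download: already in sync

# Both sides now point at the same commit.
git rev-parse master
git -C "$WORK/hub.git" rev-parse master
```

The two `rev-parse` lines print the same commit ID, which is what "synchronized" means here.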

In [None]:
## Collaboration with git

In [None]:
For the next step, get into pairs. One person will be the "Owner" and the other
will be the "Collaborator". The goal is for the Collaborator to add changes to
the Owner's repository. We will switch roles at the end, so both people will
play both Owner and Collaborator.

In [None]:
> ## Practicing By Yourself
> 
> If you're working through this lesson on your own, you can carry on by opening
> a second terminal window.
> This window will represent your partner, working on another computer. You
> won't need to give anyone access on GitHub, because both 'partners' are you.


The Owner needs to give the Collaborator access.
On GitHub, click the settings button on the right,
then select Collaborators, and enter your partner's username.

![Adding Collaborators on GitHub](../fig/fig/github-add-collaborators.png)

In [None]:
To accept access to the Owner's repo, the Collaborator
needs to go to [https://github.com/notifications](https://github.com/notifications).
Once there she can accept access to the Owner's repo.

Next, the Collaborator needs to download a copy of the Owner's repository to
her machine. This is called "cloning a repo". We'll clone it to a directory
called `github_collaboration` in our home directory (replacing `username` with
the Owner's username):

In [None]:
%cd
%mkdir github_collaboration
git clone https://github.com/username/repro_course.git ~/github_collaboration
%cd github_collaboration

In [None]:
Open generate_figure.py in an editor and add a comment. Stage and commit
the comment.

In [None]:
git add generate_figure.py
git commit -m "test the powers of collaboration"

In [None]:
Then push the change to the *Owner's repository* on GitHub:

In [None]:
git push origin master

In [None]:
Note that we didn't have to create a remote called `origin`: Git uses this
name by default when we clone a repository.  (This is why `origin` was a
sensible choice earlier when we were setting up remotes by hand.)
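The sketch below shows the automatic `origin` in action. It clones a local bare repository instead of a GitHub URL. The hypothetical `owner_hub.git` stands in for the Owner's GitHub repository:

```shell
# Scratch directory; a local bare repository stands in for the Owner's
# GitHub repository. All names and paths are illustrative.
WORK=$(mktemp -d)
cd "$WORK"
git init --bare --quiet owner_hub.git
git -C owner_hub.git symbolic-ref HEAD refs/heads/master

# Setup only: the Owner publishes one commit to the stand-in.
git init --quiet owner
git -C owner symbolic-ref HEAD refs/heads/master
git -C owner config user.name "Owner"
git -C owner config user.email "owner@example.com"
echo 'print("figure")' > owner/generate_figure.py
git -C owner add generate_figure.py
git -C owner commit --quiet -m "Add figure script"
git -C owner push --quiet "$WORK/owner_hub.git" master

# The Collaborator clones; no "git remote add" is needed afterwards.
git clone --quiet "$WORK/owner_hub.git" github_collaboration

# The clone already has a remote named "origin" pointing at the source.
git -C github_collaboration remote -v
```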

Take a look at the Owner's repository on GitHub now (you may need to refresh
your browser). You should be able to see the new commit made by the
Collaborator.

To download the Collaborator's changes from GitHub, the Owner now enters:

In [None]:
git pull origin master

In [None]:
Now the three repositories (Owner's local, Collaborator's local, and Owner's on
GitHub) are back in sync.
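The whole Owner/Collaborator exchange can be rehearsed on one machine, which also suits the "Practicing By Yourself" setup. The minimal sketch below assumes a local bare repository (`hub.git`) in place of GitHub. It also uses placeholder identities configured per repository:

```shell
# Scratch directory; hub.git stands in for the Owner's GitHub repository.
# All names, paths and identities are illustrative.
WORK=$(mktemp -d)
cd "$WORK"
git init --bare --quiet hub.git
git -C hub.git symbolic-ref HEAD refs/heads/master

# Owner: first commit, then connect to the hub and push.
git init --quiet owner
git -C owner symbolic-ref HEAD refs/heads/master
git -C owner config user.name "Owner"
git -C owner config user.email "owner@example.com"
echo 'print("figure")' > owner/generate_figure.py
git -C owner add generate_figure.py
git -C owner commit --quiet -m "Add figure script"
git -C owner remote add origin "$WORK/hub.git"
git -C owner push --quiet origin master

# Collaborator: clone, add a comment, stage, commit, push.
git clone --quiet "$WORK/hub.git" collaborator
git -C collaborator config user.name "Collaborator"
git -C collaborator config user.email "collaborator@example.com"
echo '# test the powers of collaboration' >> collaborator/generate_figure.py
git -C collaborator add generate_figure.py
git -C collaborator commit --quiet -m "test the powers of collaboration"
git -C collaborator push --quiet origin master

# Owner: download the Collaborator's commit.
git -C owner pull --quiet origin master

# All three repositories now report the same commit.
for repo in owner collaborator hub.git; do
    echo "$repo: $(git -C "$repo" rev-parse master)"
done
```

The loop prints three identical commit IDs, one per repository.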

In [None]:
> ## A Basic Collaborative Workflow
> 
> In practice, it is good to be sure that you have an updated version of the
> repository you are collaborating on, so you should `git pull` before making
> your changes. The basic collaborative workflow would be:
> 
> * update your local repo with `git pull origin master`,
> * make your changes and stage them with `git add`,
> * commit your changes with `git commit -m`, and
> * upload the changes to GitHub with `git push origin master`
> 
> It is better to make many commits with smaller changes rather than
> one commit with massive changes: small commits are easier to
> read and review.

In [None]:
## Dealing with conflict

In [None]:
As soon as people can work in parallel, it's likely someone's going to step on someone
else's toes.  This will even happen with a single person: if we are working on
a piece of software on both our laptop and a server in the lab, we could make
different changes to each copy.  Version control helps us manage these
[conflicts]({{ page.root }}/reference/#conflicts) by giving us tools to
[resolve]({{ page.root }}/reference/#resolve) overlapping changes.

In [None]:
To see how we can resolve conflicts, we must first create one.  The file
`generate_figure.py` currently looks like this in both partners' copies of our `repro_course`
repository:

In [None]:
%less generate_figure.py

In [None]:
* Let's add a line to one partner's copy only:

In [None]:
%edit generate_figure.py
%less generate_figure.py

In [None]:
and then push the change to GitHub:

In [None]:
git add generate_figure.py
git commit -m "Adding a line in our home copy"

In [None]:
git push origin master

In [None]:
Now let's have the other partner
make a different change to their copy
*without* updating from GitHub:

In [None]:
%edit generate_figure.py
%less generate_figure.py

In [None]:
We can commit the change locally:

In [None]:
git add generate_figure.py
git commit -m "Adding a line in the second local copy"

In [None]:
but Git won't let us push it to GitHub:

In [None]:
git push origin master

In [None]:
![The Conflicting Changes](../fig/fig/conflict.svg)

In [None]:
Git rejects the push because the repository on GitHub now contains a commit that
our local copy doesn't have, and accepting the push would trample on that work.
What we have to do is pull the changes from GitHub,
[merge]({{ page.root }}/reference/#merge) them into the copy we're currently
working in, and then push that. Let's start by pulling:

In [None]:
git pull origin master

In [None]:
`git pull` tells us there's a conflict, and marks that conflict in the affected
file.

Our change—the one in `HEAD`—is preceded by `<<<<<<<`. Git has then inserted
`=======` as a separator between the conflicting changes and marked the end of
the content downloaded from GitHub with `>>>>>>>`. (The string of letters and
digits after that marker identifies the commit we've just downloaded.)
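The conflict can be reproduced without GitHub. This sketch assumes a local bare repository (`hub.git`) standing in for GitHub. Two hypothetical clones, `first` and `second`, play the two partners. Recent Git versions refuse to pull diverged histories until told how to reconcile them, so each clone sets `pull.rebase` to `false` to get the merge behavior described here:

```shell
# Scratch directory; hub.git stands in for GitHub. Names are illustrative.
WORK=$(mktemp -d)
cd "$WORK"
git init --bare --quiet hub.git
git -C hub.git symbolic-ref HEAD refs/heads/master

# Shared starting point: one commit with a one-line file.
git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed config user.name "Seed"
git -C seed config user.email "seed@example.com"
echo 'print("figure")' > seed/generate_figure.py
git -C seed add generate_figure.py
git -C seed commit --quiet -m "Add figure script"
git -C seed push --quiet "$WORK/hub.git" master

# Two partners, each with their own clone and placeholder identity.
for who in first second; do
    git clone --quiet "$WORK/hub.git" "$who"
    git -C "$who" config user.name "$who partner"
    git -C "$who" config user.email "$who@example.com"
    git -C "$who" config pull.rebase false   # pull merges, as in the lesson
done

# First partner appends a line and pushes.
echo '# line added in our home copy' >> first/generate_figure.py
git -C first add generate_figure.py
git -C first commit --quiet -m "Adding a line in our home copy"
git -C first push --quiet origin master

# Second partner appends a different line without pulling first.
echo '# line added in the second local copy' >> second/generate_figure.py
git -C second add generate_figure.py
git -C second commit --quiet -m "Adding a line in the second local copy"
git -C second push --quiet origin master || echo "push rejected, as expected"

# Pulling now stops with a conflict (non-zero exit status).
git -C second pull --quiet origin master || echo "pull reported a conflict"
cat second/generate_figure.py
```

The final `cat` shows two lines, each framed by the markers:

* The second partner's line sits between `<<<<<<< HEAD` and `=======`.
* The first partner's line sits between `=======` and `>>>>>>>`.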

It is now up to us to edit this file to remove these markers and reconcile the
changes. We can do anything we want: keep the change made in the local
repository, keep the change made in the remote repository, write something new
to replace both, or get rid of the change entirely. Let's replace both with a
comment stating that we resolved our first of many git conflicts.

In [None]:
%less generate_figure.py

In [None]:
To finish merging, we add `generate_figure.py` to the changes being made by
the merge and then commit:

In [None]:
git add generate_figure.py
git status

In [None]:
git commit -m "Merging changes from GitHub"

In [None]:
Now we can push our changes to GitHub:

In [None]:
git push origin master

In [None]:
Git keeps track of what we've merged with what, so we don't have to fix things
by hand again when the collaborator who made the first change pulls again:

In [None]:
git pull origin master

In [None]:
We get the merged file:

In [None]:
%less generate_figure.py

In [None]:
We don't need to merge again because Git knows someone has already done that.
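The full cycle can also be sketched locally: conflict, resolution, and the other partner's pull. This sketch makes three assumptions:

* A local bare repository (`hub.git`) stands in for GitHub.
* Two hypothetical clones, `first` and `second`, act as the partners.
* Each clone sets `pull.rebase` to `false`, so that pulling merges.

```shell
# Scratch directory; hub.git stands in for GitHub, and the clones "first"
# and "second" play the two partners. All names are illustrative.
WORK=$(mktemp -d)
cd "$WORK"
git init --bare --quiet hub.git
git -C hub.git symbolic-ref HEAD refs/heads/master

# Shared starting point: one commit pushed to the stand-in.
git init --quiet seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed config user.name "Seed"
git -C seed config user.email "seed@example.com"
echo 'print("figure")' > seed/generate_figure.py
git -C seed add generate_figure.py
git -C seed commit --quiet -m "Add figure script"
git -C seed push --quiet "$WORK/hub.git" master

for who in first second; do
    git clone --quiet "$WORK/hub.git" "$who"
    git -C "$who" config user.name "$who partner"
    git -C "$who" config user.email "$who@example.com"
    git -C "$who" config pull.rebase false   # pull merges, as in the lesson
done

# Recreate the conflict (-a stages the modified, already-tracked file).
echo '# line added in our home copy' >> first/generate_figure.py
git -C first commit --quiet -am "Adding a line in our home copy"
git -C first push --quiet origin master
echo '# line added in the second local copy' >> second/generate_figure.py
git -C second commit --quiet -am "Adding a line in the second local copy"
git -C second pull --quiet origin master || true   # stops with a conflict

# Resolve: replace both conflicting lines (and the markers) with one new line.
cat > second/generate_figure.py <<'EOF'
print("figure")
# resolved our first of many git conflicts
EOF
git -C second add generate_figure.py     # mark the file as resolved
git -C second commit --quiet -m "Merging changes from GitHub"
git -C second push --quiet origin master

# The first partner pulls: Git fast-forwards, with no second merge to do.
git -C first pull --quiet origin master
git -C first log --oneline --graph
```

In the `git log --graph` output, the merge commit has two parents. The first partner's final pull only moves its branch forward to that commit.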

Version control's ability to merge conflicting changes is another reason users
tend to divide their programs and papers into multiple files instead of storing
everything in one large file. There's another benefit too: whenever there are
repeated conflicts in a particular file, the version control system is
essentially trying to tell its users that they ought to clarify who's
responsible for what, or find a way to divide the work up differently.

In [None]:
> ## GitHub GUI
> 
> Browse to your `repro_course` repository on GitHub.
> Under the Code tab, find and click on the text that says "XX commits" (where "XX" is some number).
> Hover over, and click on, the three buttons to the right of each commit.
> What information can you gather/explore from these buttons?
> How would you get that same information in the shell?
{: .challenge}

In [None]:
> ## GitHub Timestamp
> 
> Create a remote repository on GitHub.  Push the contents of your local
> repository to the remote.  Make changes to your local repository and push
> these changes. Go to the repo you just created on GitHub and check the
> [timestamps]({{ page.root }}/reference/#timestamp) of the files.  How does GitHub record
> times, and why?
{: .challenge}

In [None]:
> ## Push vs. Commit
> 
> In this lesson, we introduced the "git push" command.
> How is "git push" different from "git commit"?
{: .challenge}

In [None]:
> ## Fixing Remote Settings
> 
> It happens quite often in practice that you make a typo in the
> remote URL. This exercise is about how to fix this kind of issue.
> First start by adding a remote with an invalid URL:
> 
> ```
> git remote add broken https://github.com/this/url/is/invalid
> ```
> 
> Do you get an error when adding the remote? Can you think of a
> command that would make it obvious that your remote URL was not
> valid? Can you figure out how to fix the URL (tip: use `git remote
> -h`)? Don't forget to clean up and remove this remote once you are
> done with this exercise.
{: .challenge}

In [None]:
> ## GitHub License and README files
> 
> In this section we learned about creating a remote repository on GitHub, but when you initialized your
> GitHub repo, you didn't add a README.md or a license file. If you had, what do you think would have happened when
> you tried to link your local and remote repositories?
{: .challenge}

In [None]:
> ## Switch Roles and Repeat
> 
> Switch roles and repeat the whole process.
{: .challenge}

In [None]:
> ## Review Changes
> 
> The Owner pushes commits to the repository without giving any information
> to the Collaborator. How can the Collaborator find out what has changed from
> the command line? And on GitHub?
{: .challenge}

In [None]:
> ## Comment Changes in GitHub
> 
> The Collaborator has some questions about one line changed by the Owner and
> has some suggestions to propose.
> 
> With GitHub, it is possible to comment on the diff of a commit. When you hover
> over a line of code, a blue comment icon appears that opens a comment window.
> 
> The Collaborator posts their comments and suggestions using the GitHub interface.
{: .challenge}

In [None]:
> ## Version History, Backup, and Version Control
> 
> Some backup software can keep a history of the versions of your files. It also
> allows you to recover specific versions. How is this functionality different from version control?
> What are some of the benefits of using version control, Git and GitHub?
{: .challenge}

In [None]:
> ## Solving Conflicts that You Create
> 
> Clone the repository created by your instructor.
> Add a new file to it,
> and modify an existing file (your instructor will tell you which one).
> When asked by your instructor,
> pull her changes from the repository to create a conflict,
> then resolve it.
{: .challenge}

In [None]:
> ## Conflicts on Non-textual files
> 
> What does Git do
> when there is a conflict in an image or some other non-textual file
> that is stored in version control?
{: .challenge}

> ## A Typical Work Session
> 
> You sit down at your computer to work on a shared project that is tracked in a
> remote Git repository. During your work session, you take the following
> actions, but not in this order:
> 
> - *Make changes* by appending the number `100` to a text file `numbers.txt`
> - *Update remote* repository to match the local repository
> - *Celebrate* your success with beer(s)
> - *Update local* repository to match the remote repository
> - *Stage changes* to be committed
> - *Commit changes* to the local repository
> 
> In what order should you perform these actions to minimize the chances of
> conflicts? Put the actions above in order in the *action* column of the table
> below. When you have the order right, see if you can write the corresponding
> commands in the *command* column. A few steps are populated to get you
> started.
> 
> |order|action . . . . . . . . . . |command . . . . . . . . . . |
> |-----|---------------------------|----------------------------|
> |1    |                           |                            |
> |2    |                           | `echo 100 >> numbers.txt`  |
> |3    |                           |                            |
> |4    |                           |                            |
> |5    |                           |                            |
> |6    | Celebrate!                | `AFK`                      |
{: .challenge}