# Intro to Git and GitHub

### Humna Awan
#### Society of Physics Students, Rutgers University
#### April 16, 2019

# Git: the tool you didn't know you needed

## Git
Version control software.

## GitHub
Cloud-based service for version control via Git.

#### Sources of this material:
This tutorial is adapted from a tutorial presented by Jake Vanderplas in the [LSST Data Science Fellowship Program](https://github.com/LSSTC-DSFP/LSSTC-DSFP-Sessions/tree/master/Session1 ).

## Let's cover some basics before dealing with Git and GitHub directly

## What is Version Control?

We all implement it, e.g.,
* paper_draft_v1.pdf
* paper_draft_v2.pdf
* paper_draft_v2a.pdf
* paper_draft_v2a_1.pdf
...


and we all know the perils of that kind of file-handling.

## What is Version Control?

#### From Wikipedia:
“Revision control, also known as version control, source control
or software configuration management (SCM), is the
**management of changes to documents, programs, and other information stored as computer files.**”

#### Reproducibility?

* Tracking and recreating every step of your work
* In the software world: it's called *Version Control*!

What do (good) version control tools give you?

* Peace of mind (backups)
* Freedom (exploratory branching)
* Collaboration (synchronization)


## Git is an enabling technology: use version control for everything
* Code management
* Custom style files and macros
* Collaborative writing (group projects, proposals)
* Everyday research
* Personal website history tracking

## The plan for this tutorial

- Overview of Git key concepts

- Hands-on work with Git, primarily using GitHub as a remote

## Git usage

- 5 "stages" of using Git:
            
    1. Local, single-user, linear workflow
    2. Single local user, branching
    3. Using remotes as a single user
    4. Remotes for collaborating in a small team
    5. Full-contact GitHub: distributed collaboration with large teams
        

## My advice

- Use GitHub as a remote, _always_ (unless you want to set up your own remote). 
    - Works with whichever workflow you choose (linear or with branching)
    - Allows for small and large collaborations


## High level picture: overview of key concepts

The **commit**: *a snapshot of work at a point in time*

![](files/images/commit_anatomy.png)




Credit: ProGit book, by Scott Chacon, CC License.

## High level picture: overview of key concepts

A **repository**: a group of *linked* commits

![](files/images/threecommits.png)




Note: these form a Directed Acyclic Graph (DAG), with nodes identified by their *hash*.
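To make the DAG idea concrete, here is a toy sketch in Python. The commit ids and messages are made up for illustration (real ids are hashes), and `reachable` is a hypothetical helper, not part of git. Each commit records only its parents, so a branch's history is simply everything reachable by following parent links from its latest commit:

```python
# Hypothetical commit ids and messages; each commit lists only its parent ids.
commits = {
    "c1": ("This is our first commit", []),
    "c2": ("I have made great progress", ["c1"]),
    "c3": ("Trying something new", ["c2"]),
    "c4": ("The mainline keeps moving", ["c2"]),
    "c5": ("Merge branch 'experiment'", ["c4", "c3"]),  # a merge has two parents
}

def reachable(commits, tip):
    """Return the ids of every commit reachable from `tip` by following parents."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid][1])
    return seen

print(sorted(reachable(commits, "c4")))  # ['c1', 'c2', 'c4']
print(sorted(reachable(commits, "c5")))  # ['c1', 'c2', 'c3', 'c4', 'c5']
```

Because edges only point from a commit to its parents, and a parent must exist before its child's hash can be computed, the graph can never contain a cycle.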

## High level picture: overview of key concepts

A **hash**: a fingerprint of the content of each commit *and its parent*

In [None]:
import hashlib

# Our first commit
data1 = 'This is the start of my paper.'
meta1 = 'date: 1/1/12'
commit1 = '%s%s'%(data1, meta1)
hash1 = hashlib.sha1(commit1.encode('utf-8')).hexdigest()

print('Hash:', hash1)

In [None]:
# Our second commit, linked to the first
data2 = 'Added content to my paper ...'
meta2 = 'date: 1/2/12'
# Note we add the parent hash here!
commit2 = '%s%s%s'%(data2, meta2, hash1)
hash2 = hashlib.sha1(commit2.encode('utf-8')).hexdigest()

print('Hash:', hash2)

And this is pretty much the essence of Git!
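The cells above only mimic git. As far as I know, for a single file git hashes a short header (the word `blob`, the content size in bytes, and a NUL byte) followed by the file's raw bytes, using SHA-1 in a default repository; commits are hashed in a similar way over a text that includes their parent hashes. A minimal sketch, where `git_blob_hash` is a hypothetical helper name:

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Hash file content the way git names a blob object (SHA-1 repositories):
    header = b"blob " + size in bytes + a NUL byte, followed by the raw content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# the empty file has a well-known blob id in every SHA-1 git repository
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
# content written by `echo "My first bit of text" > file1.txt` (note the newline)
print(git_blob_hash(b"My first bit of text\n"))
```

You can compare the output with `git hash-object <file>` run on a file containing exactly the same bytes.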

## Let's Get To It


#### First, let's get you a copy of this notebook
- Please go to [my GitHub repository containing this tutorial](https://github.com/humnaawan/git-tutorial ).
- Click on the `clone or download` button.
- Click on the `Download as zip` button.
- Open the IPython notebook in the folder you just downloaded.

## Git and GitHub: Access

Let's make sure that we have the necessary tools at hand:

* [Install git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git ) on your laptop if you don't already have it.
    - If you don't know if you have git, run `git --version` in your terminal. If it runs, then you're good to go.
    
    
* [Sign up for a GitHub account](https://github.com/ ) if you don't already have it.
    - You might want to use your `.edu` email, since it can get you student access to unlimited private repositories (more soon).

## Git and GitHub: Access

* [Set up SSH connection to your GitHub account](https://help.github.com/en/articles/connecting-to-github-with-ssh )

First check whether you have a public SSH key (when running this in your terminal, run it without the ! at the beginning):

In [None]:
!ls -al ~/.ssh | grep .pub

- If you don't have a public SSH key, please [generate one and add it to the ssh-agent](https://help.github.com/en/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent )

Then, [add the SSH key to your GitHub account](https://help.github.com/en/articles/adding-a-new-ssh-key-to-your-github-account )

## Configuring Git

The minimal amount of configuration for git to work without pestering you is to tell it who you are.  All the commands here modify the ``.gitconfig`` file in your home
directory.

Let's first check the contents of the config file (again, when running this in your terminal, run it without the ! at the beginning):

In [None]:
!cat ~/.gitconfig

If the `.gitconfig` file doesn't have your info, please modify it (again, when running these in your terminal, run without the ! at the beginning):

In [None]:
%%bash
# uncomment the next two lines (with your own name and email) to run them from this notebook
#git config --global user.name "John Doe"
#git config --global user.email "johndoe@uw.edu"

Check the config file again to be sure (again, without the ! at the beginning if running in your terminal):

In [None]:
!cat ~/.gitconfig

GitHub's help pages include instructions on how to configure the credential helper for [Mac OSX](https://help.github.com/articles/set-up-git#platform-mac) and [Windows](https://help.github.com/articles/set-up-git#platform-windows).

## 1. Local, single-user, linear workflow
Simply type `git` to see a full list of all the 'core' commands.  We'll now go through most of these via small practical exercises:

In [None]:
!git

### `git init`: create an empty repository

In [None]:
%%bash
rm -rf test
git init test

**Note:** if you're running the upcoming cells in the terminal, you need to cd into the `test` folder only once.

Since we are putting all of them here in a single notebook for the purposes of the tutorial, they will all be prepended with the first two lines:

    %%bash
    cd test

that tell IPython to do that each time. But you should ignore those two lines and type the rest of each cell yourself in your terminal.

Let's look at what git did:

In [None]:
%%bash
cd test

ls

In [None]:
%%bash
cd test

ls -la

In [None]:
%%bash
cd test

ls -l .git

Now let's create our first file in the test directory. Normally you'd do this by hand with a text editor; here we do it programmatically:

In [None]:
%%bash
cd test

echo "My first bit of text" > file1.txt

In [None]:
%%bash
cd test

ls -al

### `git status`: see what git notices

In [None]:
%%bash
cd test

git status

### `git add`: tell git about our new file

In [None]:
%%bash
cd test

git add file1.txt

Let's check the `status` again:

In [None]:
%%bash
cd test

git status

### `git commit`:  record our changes in git's database

In [None]:
%%bash
cd test

git commit -m "This is our first commit"

In the commit above, we used the `-m` flag to specify a message at the command line.

If we don't do that, git will open a text editor (your default editor, or the one you set with `git config --global core.editor`) and require that we enter a message.

By default, git refuses to record changes that don't have a message to go along with them (though you can obviously 'cheat' by using a meaningless string: git only tries to facilitate best practices, it's not your nanny).

Let's check the `status` again:

In [None]:
%%bash
cd test

git status

### `git log`: what has been committed so far

In [None]:
%%bash
cd test

git log

### `git diff`: what have I changed?
Let's do a little bit more work. Again, in practice you'll be editing the files by hand; here we do it via shell commands for the sake of automation (and therefore the reproducibility of this tutorial!)

In [None]:
%%bash
cd test

echo "And now some more text..." >> file1.txt

And now we can ask git what is different:

In [None]:
%%bash
cd test

git diff
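`git diff` reports changes in the *unified diff* format: lines starting with `-` were removed, lines starting with `+` were added, and `@@` lines give the line ranges involved. As a rough illustration only (git computes diffs with its own algorithms and adds extra header lines such as `diff --git` and `index`), Python's standard `difflib` produces the same basic format for the change we just made:

```python
import difflib

# file1.txt before and after the `echo ... >> file1.txt` step above
before = ["My first bit of text"]
after = ["My first bit of text", "And now some more text..."]

diff = list(difflib.unified_diff(before, after,
                                 fromfile="a/file1.txt", tofile="b/file1.txt",
                                 lineterm=""))
print("\n".join(diff))
# --- a/file1.txt
# +++ b/file1.txt
# @@ -1 +1,2 @@
#  My first bit of text
# +And now some more text...
```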

Let's check the `status` again:

In [None]:
%%bash
cd test

git status

### The cycle of git virtue: work, add, commit, work, add, commit, ...

- Work
- Check status (and diff for more details)
- Add file(s)
- Commit staged file(s)
- Repeat
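This cycle works because git keeps three separate places: your working files, the *staging area* (index) that `git add` fills, and the commits made from it. The hypothetical `ToyRepo` class below is a simplified sketch of that idea, not git's real data model; in particular, it shows that a commit records what was staged, not whatever happens to be in your working file at that moment:

```python
class ToyRepo:
    """Hypothetical model of working files, the staging area, and commits."""

    def __init__(self):
        self.working = {}   # filename -> content currently on disk
        self.index = {}     # filename -> content staged with add()
        self.commits = []   # list of (message, copy of the index)

    def add(self, name):
        # stage the file exactly as it is right now
        self.index[name] = self.working[name]

    def commit(self, message):
        # like git's default behaviour, refuse an empty message
        if not message:
            raise ValueError("empty commit message")
        self.commits.append((message, dict(self.index)))

    def unstaged(self):
        # files whose on-disk content differs from the staged content
        return sorted(n for n, c in self.working.items() if self.index.get(n) != c)


repo = ToyRepo()
repo.working["file1.txt"] = "My first bit of text\n"
repo.add("file1.txt")
repo.commit("This is our first commit")

repo.working["file1.txt"] += "And now some more text...\n"    # work
print(repo.unstaged())                                         # ['file1.txt']
repo.add("file1.txt")                                          # add
repo.working["file1.txt"] += "A later edit, not yet added\n"   # more work
repo.commit("I have made great progress")                      # commit
print(repr(repo.commits[-1][1]["file1.txt"]))  # the later edit is not in this snapshot
```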

In [None]:
%%bash
cd test

git add 'file1.txt'
git commit -m "I have made great progress on this critical matter."

### `git log` revisited
First, let's see what the log shows us now:

In [None]:
%%bash
cd test

git log

Sometimes it's handy to see a very summarized version of the log:

In [None]:
%%bash
cd test

git log --oneline --topo-order --graph

### Defining an alias

Git supports *aliases:* new names given to command combinations. Let's make this handy shortlog an alias, so we only have to type `git slog` and see this compact log:

In [None]:
%%bash
cd test

# We create our alias (this saves it in git's permanent configuration file):
git config --global alias.slog "log --oneline --topo-order --graph"

# And now we can use it
git slog

### `git mv` and `rm`: moving and removing files
While `git add` is used to add files to the list git tracks, we must also tell it if we want their names to change or for it to stop tracking them. In familiar Unix fashion, the `mv` and `rm` git commands do precisely this:

In [None]:
%%bash
cd test

git mv file1.txt file-newname.txt
git status

Note that these changes must be committed too, to become permanent! In git's world, until something has been committed, it isn't permanently recorded anywhere.

In [None]:
%%bash
cd test

git add 'file-newname.txt'
git commit -m "I like this new name better"
echo "Let's look at the log again:"
git slog

And `git rm` works in a similar fashion.

### Exercise
Add a new file `file2.txt`, add and commit it, make some changes to it, add and commit them again, and then remove it (and don't forget to add/commit this last step!).

## 2. Single local user, branching
What is a branch?  Simply a *label for the 'current' commit in a sequence of ongoing commits*:

![](files/images/masterbranch.png)

### Multiple Branches
There can be multiple branches alive at any point in time. A special pointer called HEAD marks the branch you currently have checked out, and the working directory reflects the commit that branch points to. In this example there are two branches, *master* and *testing*, and *testing* is the currently active branch since it's what HEAD points to:

![](files/images/HEAD_testing.png)

Once new commits are made on a branch, HEAD and the branch label move with the new commits:

![](files/images/branchcommit.png)

This allows the history of both branches to diverge:

![](files/images/mergescenario.png)

But based on this graph structure, git can compute the necessary information to merge the divergent branches back and continue with a unified line of development:
    
![](files/images/mergeaftermath.png)
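The label-and-pointer picture above can be sketched in a few lines of Python. This is a toy model with a hypothetical `ToyRefs` class and made-up commit ids (no file contents are tracked). It shows that creating or checking out a branch only moves labels, that committing advances whichever label HEAD names, and how the *merge base* (the most recent commit both branches share) can be found from the graph alone. Git uses the merge base as the common reference for its three-way merge; the `merge_bases` rule here mirrors the "best common ancestor" definition used by `git merge-base`, though git's implementation is far more efficient:

```python
from collections import deque

class ToyRefs:
    """Hypothetical model: branches are labels on commits; HEAD names one branch."""

    def __init__(self, root):
        self.parents = {root: []}          # commit id -> list of parent ids
        self.branches = {"master": root}   # branch name -> commit id
        self.head = "master"               # name of the checked-out branch

    def branch(self, name):
        self.branches[name] = self.branches[self.head]  # new label, same commit

    def checkout(self, name):
        self.head = name                   # only HEAD moves; no commit is made

    def commit(self, cid):
        self.parents[cid] = [self.branches[self.head]]  # parent = current tip
        self.branches[self.head] = cid     # the active label follows the commit

    def merge(self, other, cid):
        # Unlike git, this never fast-forwards: it always records a merge commit
        self.parents[cid] = [self.branches[self.head], self.branches[other]]
        self.branches[self.head] = cid

    def ancestors(self, tip):
        """Every commit reachable from tip, including tip itself."""
        seen, queue = {tip}, deque([tip])
        while queue:
            for p in self.parents[queue.popleft()]:
                if p not in seen:
                    seen.add(p)
                    queue.append(p)
        return seen

    def merge_bases(self, a, b):
        """Common ancestors of a and b that no other common ancestor descends from."""
        common = self.ancestors(a) & self.ancestors(b)
        return sorted(c for c in common
                      if not any(c != d and c in self.ancestors(d) for d in common))


refs = ToyRefs("c0")
refs.commit("c1")
refs.commit("c2")
refs.branch("testing")
refs.checkout("testing")
refs.commit("c3")                      # testing moves to c3; master stays at c2
refs.checkout("master")
refs.commit("c4")                      # master moves to c4: histories have diverged
print(refs.branches)                   # {'master': 'c4', 'testing': 'c3'}
print(refs.merge_bases("c4", "c3"))    # ['c2']
refs.merge("testing", "c5")            # merge commit with parents c4 and c3
print(refs.parents["c5"])              # ['c4', 'c3']
```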

### Branching Example

Let's now illustrate all of this with a concrete example.  Let's get our bearings first:

In [None]:
%%bash
cd test

git status
ls

We are now going to try two different routes of development: on the `master` branch we will add one file and on the `experiment` branch, which we will create, we will add a different one.  We will then merge the experimental branch into `master`.

Let's first see what branch(es) exist already:

In [None]:
%%bash
cd test

git branch --list

The `*` marks the branch you're currently on.

### Let's create a new branch and see what changes

In [None]:
%%bash
cd test

git branch experiment
git checkout experiment

In [None]:
%%bash
cd test

git status
ls

Notice how we are no longer on the `master` branch.

We can also check the branch list again

In [None]:
%%bash
cd test

git branch --list

### Let's now add content while we are on the `experiment` branch, then add and commit it

In [None]:
%%bash
cd test

# add file
echo "Some crazy idea" > experiment.txt

# add and commit
git add 'experiment.txt'
git commit -m "Trying something new"

# let's see the log
git slog

Let's see what files exist:

In [None]:
%%bash
cd test

ls

### Let's now go back to `master`

In [None]:
%%bash
cd test

git checkout master

# let's see the log
git slog

And add to it:

In [None]:
%%bash
cd test

# append to an existing file
echo "All the while, more work goes on in master..." >> file-newname.txt

# add and commit
git add 'file-newname.txt'
git commit -m "The mainline keeps moving"

# check the log
git slog

Let's see what files exist:

In [None]:
%%bash
cd test

ls

### Let's `merge` the two branches now so that all the work is in `master`

In [None]:
%%bash
cd test

git merge experiment
git slog

Once you're done with a branch, you can delete it (e.g. by running `git branch -d <branch-name>`)

## 3. Using remotes as a single user

Let's now connect our local repository to a *remote repository*: a pointer to another copy of the repository that lives in a different location (e.g. on GitHub).

### ``git remote``: view/modify remote repositories

In [None]:
%%bash
cd test

ls
echo "Let's see if we have any remote repositories here:"
git remote -v

Since the above cell didn't produce any output after the `git remote -v` call, it means we have no remote repositories configured.

### Configuring a remote
Go to the [new repository page](https://github.com/new) and make a repository called `test` (stick with the default of a Public repository).

Do **not** check the box that says `Initialize this repository with a README`, since we already have an existing repository locally. That option is useful when you start on GitHub and don't already have a repo on your local computer.


We can now follow the instructions shown on the repo page and add the remote to our local repo:

In [None]:
%%bash
cd test

# replace humnaawan with your own GitHub username
git remote add origin git@github.com:humnaawan/test.git

Let's see the remote situation again:

In [None]:
%%bash
cd test

git remote -v

### Pushing changes to a remote repository

Now push the ``master`` branch to the remote named ``origin``:

In [None]:
%%bash
cd test

git push origin master

We can now [see this repository publicly on GitHub](https://github.com/humnaawan/test).

### Using Git to Sync Work

Let's see how this can be useful for backup and syncing work between two different computers.  I'll simulate a 2nd computer by working in a different directory...

In [None]:
%%bash

# here I clone my 'test' repo (use your own username) under a different name, test2, to simulate a 2nd computer
git clone https://github.com/humnaawan/test.git test2
cd test2
pwd
git remote -v

Let's now make some changes in one 'computer' and synchronize them on the second.

In [None]:
%%bash
cd test2  # working on computer #2

echo "More new content on my experiment" >> experiment.txt
git add 'experiment.txt'
git commit -m "More work, on machine #2"

Now we put this new work up on the GitHub server so it's available from the internet:

In [None]:
%%bash
cd test2

git push origin master

### Now let's fetch that work from machine #1:

In [None]:
%%bash
cd test

git pull origin master

## An important aside: conflict management
While git is very good at merging, if two different branches modify the same file in the same location, it simply can't decide which change should prevail.  At that point, human intervention is necessary to make the decision.  Git will help you by marking the location in the file that has a problem, but it's up to you to resolve the conflict.  Let's see how that works by intentionally creating a conflict.
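Conceptually, git compares each side of the merge against their common ancestor: if only one side changed a region, that change wins; if both changed it differently, git writes both versions between conflict markers. The sketch below is a deliberately simplified, whole-file version of that rule. The function name `merge3` and its arguments are hypothetical, and git's real merge works hunk by hunk; here it is applied to the two lines we are about to append to `experiment.txt` on different branches:

```python
def merge3(base, ours, theirs, ours_label="HEAD", theirs_label="trouble"):
    """Toy whole-file three-way merge of lists of lines.

    Returns (merged_lines, had_conflict)."""
    if ours == theirs or theirs == base:
        return ours, False            # same change on both sides, or only we changed
    if ours == base:
        return theirs, False          # only they changed
    # Both sides changed: keep the lines they share at the start and end,
    # and wrap the differing middle in git-style conflict markers.
    p = 0
    while p < min(len(ours), len(theirs)) and ours[p] == theirs[p]:
        p += 1
    s = 0
    while (s < min(len(ours), len(theirs)) - p
           and ours[len(ours) - 1 - s] == theirs[len(theirs) - 1 - s]):
        s += 1
    merged = (ours[:p]
              + [f"<<<<<<< {ours_label}"] + ours[p:len(ours) - s]
              + ["======="] + theirs[p:len(theirs) - s]
              + [f">>>>>>> {theirs_label}"]
              + ours[len(ours) - s:])
    return merged, True


base = ["Some crazy idea", "More new content on my experiment"]
ours = base + ["More work on the master branch..."]     # master's change
theirs = base + ["This is going to be a problem..."]    # trouble's change
merged, conflict = merge3(base, ours, theirs)
print(conflict)                # True
print("\n".join(merged))
```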

We start by creating a branch and making a change to our experiment file:

In [None]:
%%bash
cd test

# create a new branch
git branch trouble
git checkout trouble

# append to the experiment file
echo "This is going to be a problem..." >> experiment.txt

# add and commit
git add 'experiment.txt'
git commit -m "Changes in the trouble branch"

And now we go back to the master branch, where we change the *same* file:

In [None]:
%%bash
cd test

git checkout master

echo "More work on the master branch..." >> experiment.txt

git add 'experiment.txt'
git commit -m "Mainline work"

In [None]:
%%bash
cd test

git status

### The conflict...

So now let's see what happens if we try to merge the `trouble` branch into `master`:

In [None]:
%%bash
cd test

git merge trouble

In [None]:
%%bash
cd test

git status

### Let's see what git has put into our file:

In [None]:
%%bash
cd test

cat experiment.txt

At this point, we open the file in a text editor, decide which changes to keep, and make a new commit that records our decision. I've now made the edits by hand; in this case I decided that both pieces of text were useful and integrated them with some changes:

In [None]:
%%bash
cd test

cat experiment.txt

Let's then make our new commit:

In [None]:
%%bash
cd test

git add 'experiment.txt'
git commit -m "Completed merge of trouble, fixing conflicts along the way"

# let's check the log now
git slog
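When resolving conflicts by hand, it's easy to leave a marker line behind before running `git add` (`git diff --check` can also warn about leftover markers). Here is a small sketch of such a check in Python; `leftover_markers` is a hypothetical helper, it only recognises the default seven-character markers, and it would also flag a legitimate line consisting of exactly `=======` (such as a Markdown heading underline):

```python
import re

# Default conflict markers: exactly seven characters, then a space or end of line.
CONFLICT_MARKER = re.compile(r"^(<{7}|={7}|>{7})( |$)")

def leftover_markers(text):
    """Return the 1-based line numbers that look like unresolved conflict markers."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if CONFLICT_MARKER.match(line)]

unresolved = ("Some crazy idea\n"
              "<<<<<<< HEAD\n"
              "More work on the master branch...\n"
              "=======\n"
              "This is going to be a problem...\n"
              ">>>>>>> trouble\n")
print(leftover_markers(unresolved))           # [2, 4, 6]
print(leftover_markers("Some crazy idea\n"))  # []
```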

### Merge Tools

*Note:* While it's a good idea to understand the basics of fixing merge conflicts by hand, in some cases you may find the use of an automated tool useful.

Git supports multiple [merge tools](https://www.kernel.org/pub/software/scm/git/docs/git-mergetool.html): a merge tool is a piece of software that conforms to a basic interface and knows how to merge two files into a new one. Since these are typically graphical tools, there are several to choose from for the different operating systems, and as long as they obey a basic command structure, git can work with any of them.

## 4. Collaborating on github with a small team

Here we will set up a shared collaboration with one partner -- choose someone sitting next to you.  We will have two people, let's call them P1 and P2.

### 1. Synchronization

We begin with a simple synchronization example.  Working together, follow these steps:

#### Creating a new repository
- P1: create a new repository on github called ``p1-test``
- P1: create a file ``README.md`` in your `p1-test` repository, add/commit it, and push it to the remote.
- P1: on github, go to the settings for ``p1-test``, and add your partner (P2) to the list of collaborators

#### Cloning your partner's repository
- P2: clone P1's ``p1-test`` repository using ``git clone [url]``
- P2: make changes to the ``README.md`` file and add/commit them locally.
- P2: push the changes to GitHub.
- P1: pull P2's changes into the local repository.

Now P1 and P2 should both have the same ``README.md`` file on their own computer. Repeat with the roles swapped.

### 2. Dealing with conflicts

Next, we will have both parties make non-conflicting changes each, and commit them locally.  Then both try to push their changes:

- P1: create & add/commit a new file, ``p1.txt`` to the local repo
- P2: create & add/commit a new file, ``p2.txt`` to the local repo
- P1: push the latest commit to github
- P2: try to push to github.  What happens?

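Git rejects P2's push: the remote `master` now contains P1's commit, which P2's local history doesn't have, and a plain `git push` only succeeds if it moves the remote branch forward without discarding anything. Since the two commits touch different files, there is no content conflict; the two histories have simply diverged.

P2 must do

`git pull origin master`

which fetches P1's commit and merges it into P2's local branch (newer versions of git may first ask you to choose how to reconcile the branches; `git pull --no-rebase origin master` selects the merge described here). If both partners had edited the same lines, P2 would now resolve the conflict manually, as in the earlier conflict example. Then P2 pushes again.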
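The rejected push comes down to one rule: without `--force`, the remote branch may only move to a commit whose history already contains the remote's current tip (`git merge-base --is-ancestor` checks exactly this relationship). A toy sketch with hypothetical commit ids, where `is_ancestor` and `push_allowed` are illustrative helpers rather than anything in git:

```python
def is_ancestor(parents, older, newer):
    """True if commit `older` is reachable from `newer` via parent links."""
    stack, seen = [newer], set()
    while stack:
        cid = stack.pop()
        if cid == older:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return False

def push_allowed(parents, remote_tip, local_tip):
    # a non-forced push must fast-forward: the remote tip must be in local history
    return is_ancestor(parents, remote_tip, local_tip)

# hypothetical ids: both partners start from the shared README commit
parents = {"readme": [], "add-p1": ["readme"], "add-p2": ["readme"]}
print(push_allowed(parents, "add-p1", "add-p2"))   # False: P2 lacks P1's commit
parents["merge"] = ["add-p2", "add-p1"]            # the merge commit `git pull` creates
print(push_allowed(parents, "add-p1", "merge"))    # True
```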

## 5. Full-contact GitHub: distributed collaboration with large teams

On large teams, you don't always want all contributors to have access to the main repository.  So how do you move forward?  Using Pull Requests.

Again, we'll do this as an exercise with your partner, this time by having P1 *fork* P2's repository:

1. **P1:** go to P2's GitHub page for the `p2-test` repo, and click the *fork* button. You now have your own remote version of the repository, at a URL like ``https://github.com/P1/p2-test.git``

2. **P1:** use ``git clone [url]`` to get a local version of *your fork* on your own computer.

3. **P1:** use ``git remote add upstream [url]`` to add a pointer to P2's remote (the original)

4. **P1:** type ``git remote -v``, and you should see both your own fork (called ``origin``) and P2's original repository (called ``upstream``)

5. **P1:** create a new branch called ``p1_changes`` and check it out (``git checkout -b p1_changes`` does both)

6. **P1:** add a file called ``p1.txt``, add/commit it, and use ``git push origin p1_changes`` to push to the remote.

7. **P1:** reload the GitHub page for *your own* fork: there should now be a button that says "compare and pull request". Click it and fill out the form.

8. **P2:** go to your notifications page (the bell icon in the upper-right corner of GitHub) and you should see a notification of P1's pull request. Check the diff, add some comments, and merge the changes.

9. **P1:** on your computer, check out the master branch, and update it from P2's repository with ``git pull upstream master``

Congratulations!  You're now a collaborator!

This is how virtually all open source collaboration proceeds on GitHub!

## Further resources

- See Jake Vanderplas's [tutorial notebook](https://github.com/jakevdp/git-intro/blob/master/git-intro.ipynb ) for additional material/references; I've adapted the notebook for this tutorial.
- Phil Marshall's [gettingStarted repo](https://github.com/drphilmarshall/GettingStarted ); it has a very helpful Q&A and a video tutorial covering some of the basics.