# Intro to Git and GitHub


$~~~~~~~~~~~~~~~~~~~~~~$

#### DESI Pre-Meeting, Session III
#### Friday, June 11, 2021

$~~~~~~~~~~~~~~~~~~~~~~$

_Tutorial developed by Humna Awan; adapted from [a tutorial presented by Jake Vanderplas in the LSST Data Science Fellowship Program](https://github.com/jakevdp/git-intro/blob/master/git-intro.ipynb) as well as materials from [Phil Marshall's gettingStarted repo](https://github.com/drphilmarshall/GettingStarted )._

## Git: the tool you didn't know you needed

Let's cover some basics before dealing with Git and GitHub directly.

## What is Version Control?

We all implement it, e.g.,
* paper_draft_v1.pdf
* paper_draft_v2.pdf
* paper_draft_v2a.pdf
* paper_draft_v2a_1.pdf
...


$~~~~~~~~~~~~~~~~~~~~~~$

and we all know the perils of that kind of file-handling.

## What is Version Control?

#### From Wikipedia:
“.. management of changes to documents, programs, and other information stored as computer files.”

#### Reproducibility?

* Tracking and recreating every step of your work
* In the software world: it's called *Version Control*!

#### What do (good) version control tools give you?

* Peace of mind (backups)
* Freedom (exploratory branching)
* Collaboration (synchronization)

## Git is version control software

It preserves the various states of your work, and it works for basically anything, including
* Code management
* Custom style files and macros
* Collaborative writing (group projects, proposals)
* Everyday research
* Personal website history tracking

## GitHub is a cloud-based service for version control via Git.



## The plan for this tutorial

- [A quick overview of Git key concepts](#overview)
- [Getting set up with Git/GitHub](#github)
- [A (very) simple exercise to get you going](#exercise)
- [Short intro to `desihub`](#desihub)
- [Further resources](#resources)

<a id="overview"></a>

## High level picture: overview of key concepts

Non-git workflow for, say, a *single* Word document
- Add text
- Save it (preserving the snapshot of the work at this time)
- Add more text
- Save it (preserving the snapshot of the work at this new time _while overwriting the old version_)

In a version controlled framework
- Add text
- **Commit** it (preserving the snapshot of the work at this time)
- Add more text
- **Commit** it (preserving the snapshot of the work at this new time)

Then, a **repository** is a group of *linked* commits. You can easily see what has changed between commits, revert from one to another, etc.

![](files/images/threecommits.png)




Credit: ProGit book, by Scott Chacon, CC License.
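To make the picture above concrete, here is a minimal sketch of a repository holding two linked commits. The directory name `snapshots`, the file `notes.txt`, and the identity values are placeholders chosen for illustration; everything happens in a throwaway temporary directory.

```shell
set -e
# Work in a throwaway directory so nothing on your machine is touched.
workdir=$(mktemp -d)
cd "$workdir"
git init -q snapshots
cd snapshots
# Identity for this repository only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "First snapshot"

echo "second draft" >> notes.txt
git add notes.txt
git commit -q -m "Second snapshot"

# Each commit records its parent, which is what links the snapshots together.
git log --oneline
# Show what changed between the previous snapshot and the current one.
git diff HEAD~1 HEAD
# Count the commits reachable from the current one.
commit_count=$(git rev-list --count HEAD)
echo "commits: $commit_count"
```

Here `HEAD` refers to the current commit and `HEAD~1` to its parent, so the `git diff` line compares exactly the two snapshots made above.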

The git repository is generally hosted on a **remote**, which syncs changes across the copies of the repository that are connected to it.

When you make local changes, you **push** them to the remote.

When there are changes others have made that you want to get in your local copy, you **pull** them from the remote.
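The push/pull idea can be tried without any network access. In the sketch below, a local bare repository (`hub.git`) stands in for the remote, and two clones (`alice` and `bob`) stand in for two collaborators. All of these names are made up for illustration; on GitHub the remote would be a URL instead of a local path.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# A bare repository plays the role of the remote.
git init -q --bare hub.git

# Two clones stand in for two collaborators (or two computers).
git clone -q hub.git alice
git clone -q hub.git bob

# Alice commits locally, then pushes to the remote.
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "hello from alice" > shared.txt
git add shared.txt
git commit -q -m "Add shared.txt"
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on your setup
git push -q origin "$branch"

# Bob pulls Alice's work from the remote into his copy.
cd ../bob
git pull -q origin "$branch"
cat shared.txt
```

Git may warn that an empty repository was cloned; that is expected here, since the remote has no commits until Alice pushes.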

$~~~~~~~~~~~~~~~~~$

Disclaimer: a lot of new terms here - but don't panic! You'll get a feel for some of these concepts once we go through the exercise.

$~~~~~~~~~~~~~~~~~$

First things first though ... 

<a id="github"></a>

## Git and GitHub: Access

Let's make sure that we have the necessary tools at hand for the exercise coming up:

* [Install git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git ) on your laptop if you don't already have it.
    - If you don't know if you have git, run `git --version` in your terminal. If it runs, then you're good to go.
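If you prefer a check that also works inside a script, one possible sketch is below. The variable name `git_found` is just an illustrative choice.

```shell
# command -v prints the path of a program if the shell can find it.
if command -v git >/dev/null 2>&1; then
    git_found=yes
    echo "git is installed: $(git --version)"
else
    git_found=no
    echo "git was not found on your PATH; see the install link above"
fi
```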

If there's additional time, we will work on getting set up on GitHub etc. For that, you'd need to

* [Sign up for a GitHub account](https://github.com/ ) if you don't already have it.
    - You might want to use your `.edu` email, since that gives you easier access to unlimited private repositories and other perks available via the [student developer pack](https://education.github.com/pack).

<a id="exercise"></a>

## Simple Exercise

Each of us will create a local git repository, add a simple file, commit our local changes, make changes to the same file, and commit them.

To get started,

### Open your terminal (where you will type in the commands) or download this notebook at [https://tinyurl.com/git-desi](https://tinyurl.com/git-desi) to run things.

Let's first see what options git offers.

In [None]:
!git

We'll go through some of them here.

### `git init`: create an empty repository

In [None]:
%%bash
# remove repo if existing; needed for re-runs of this cell
rm -rf test

# initiate repo
git init test

**Note:** if you're running the upcoming cells in the terminal, you need to cd into the `test` folder only once.

Since we are putting all of them here in a single notebook for the purposes of the tutorial, they will all be prepended with the first two lines:

    %%bash
    cd test

that tell IPython to do that each time. But you should ignore those two lines and type the rest of each cell yourself in your terminal.

Move into the repo and see what git did.

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# list out all the contents
ls

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# list out the contents
ls -la

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# list out the contents
ls -l .git

### Create a new file

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# create a file named file1.txt
echo "My first bit of text." > file1.txt

Let's look at the folder now.

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# list out the content
ls -al

### `git status`: see what git notices

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# check status
git status

Note that you are on a **branch** named `master` (there's an effort underway to move away from the name `master` to `main`; see more [here](https://github.com/github/renaming)).
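If you want to follow the newer naming in your own repository, one way to do it is to rename the branch after the first commit. The sketch below assumes a scratch repository called `rename-demo` (a made-up name) so that your `test` repo is left alone.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q rename-demo
cd rename-demo
# Identity for this repository only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > file1.txt
git add file1.txt
git commit -q -m "Initial commit"

echo "before: $(git symbolic-ref --short HEAD)"
# -M renames the current branch, overwriting the target name if it already exists.
git branch -M main
current_branch=$(git symbolic-ref --short HEAD)
echo "after:  $current_branch"
```

The rest of this tutorial keeps using `master`, so there is no need to rename anything to follow along.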

### `git add`: tell git about your new file

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# add file1.txt
git add file1.txt

Let's check the `status` again.

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# check status
git status

Note how the file is no longer untracked - it is **staged** to be committed.
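The same transition can be seen in compact form with `git status --porcelain`, a script-friendly variant of the short status output. In this sketch (run in a scratch repository named `status-demo`, an illustrative name), `??` marks an untracked file and `A` marks a newly staged one.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q status-demo
cd status-demo

echo "My first bit of text." > file1.txt

# Before `git add`: git sees the file but does not track it.
before=$(git status --porcelain)
git add file1.txt
# After `git add`: the file is staged for the next commit.
after=$(git status --porcelain)

echo "before add: $before"
echo "after add:  $after"
```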

### `git commit`:  record our changes in git's database

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# commit file1.txt
git commit -m "Our first commit - added a file."

In the commit above, we used the `-m` flag to specify a message at the command line.

If we don't do that, git will open a text editor (the one set in your git configuration, or else your system default) and require that we enter a message.

By default, git refuses to record changes that don't have a message to go along with them: an empty message aborts the commit (though you can obviously 'cheat' by using a meaningless string: git only tries to facilitate best practices, it's not your nanny).
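A small experiment shows this behavior. The sketch below uses a scratch repository (`message-demo`, a made-up name) and checks whether git accepts an empty message; it then uses the explicit `--allow-empty-message` override, shown only to illustrate that the refusal is a default rather than a hard limit.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q message-demo
cd message-demo
# Identity for this repository only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "some text" > file1.txt
git add file1.txt

# With an empty message git aborts; the file stays staged and nothing is recorded.
if git commit -q -m "" 2>/dev/null; then
    empty_allowed=yes
else
    empty_allowed=no
fi
echo "empty message accepted: $empty_allowed"

# The override exists, but a meaningful message is almost always the better choice.
git commit -q --allow-empty-message -m ""
commit_count=$(git rev-list --count HEAD)
echo "commits recorded: $commit_count"
```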

**Note: commit messages should be short and informative (for your future self and others)!**

Let's check the `status` again.

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# check status
git status

### `git log`: what has been committed so far

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# check the log
git log

_You should see your first commit!_

### `git diff`:  see changes to committed files

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# let's add some more text to the file
echo "And now some more text..." >> file1.txt

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# see what has changed in the repo
git diff

Let's check git status again.

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# check status
git status

Let's commit the new changes.

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# add and commit changes to file1.txt
git add 'file1.txt'
git commit -m "Added more text."

Let's look at the git log again.

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# check the log
git log

### Let's recap what we've learned so far ...

The cycle of git virtue: work, add, commit, work, add, commit, ...
- Work
- Check status (and diff for more details)
- Add new/changed file(s)
- Commit staged file(s)
- Repeat

Commands seen so far:
- `git init`
- `git status`
- `git add`
- `git commit`
- `git log`
- `git diff`

You'll need a few more to really get going but we don't have time to practice them:
- `git clone <github-url>`: clone a git repo
- `git remote -v`: list all the remotes
- `git remote add <remote-name-you-assign> <github-url>`: add remote (e.g. a GitHub repo)
- `git push`: push our changes to the remote
- `git pull`: pull changes from the remote and integrate them into your working files. (If you don't want to have a direct merge, you can try `git fetch` which will only download new data from remote).

There are other handy commands like `git diff --staged`, `git commit --amend`, `git stash`, etc. that you can investigate too.
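As a starting point for investigating those three commands, here is a short sketch run in a scratch repository. The repository name `extras-demo`, the file contents, and the commit messages are all illustrative.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q extras-demo
cd extras-demo
# Identity for this repository only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "line one" > file1.txt
git add file1.txt
git commit -q -m "Add file1"

# 1. git diff --staged: show what the next commit would contain.
echo "line two" >> file1.txt
git add file1.txt
unstaged=$(git diff)           # empty, because the change is already staged
staged=$(git diff --staged)    # contains the added line
git commit -q -m "Add line too"

# 2. git commit --amend: replace the most recent commit (here, to fix its message).
git commit -q --amend -m "Add line two"
last_message=$(git log -1 --format=%s)

# 3. git stash: shelve uncommitted edits, then bring them back.
echo "half-finished idea" >> file1.txt
git stash -q
while_stashed=$(grep -c "half-finished" file1.txt || true)
git stash pop -q
after_pop=$(grep -c "half-finished" file1.txt || true)

echo "last commit message: $last_message"
echo "matches while stashed: $while_stashed, after pop: $after_pop"
```

Amending rewrites the last commit, so it is safest to use it only on commits that have not been pushed yet.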

$~~~~~~~~~~~~~~~~~~~$

For practice with progressively more difficult exercises, follow e.g. [this tutorial](https://github.com/humnaawan/git-tutorial/blob/master/git-intro.ipynb) and references therein.

<a id="desihub"></a>

## desihub

DESI members are a part of the GitHub DESI organization: https://github.com/desihub

(If you are not a part of the GitHub organization yet, please send your GitHub handle to Stephen Bailey.)

Some notes:
- Non-developer members _cannot_ push directly to repositories, so you'll employ the following workflow:
    - **Fork** the repo you want to contribute to. This will essentially give you a copy of the repo that you can directly change.
    - Make changes. Commit them. Push them to your fork as the remote.
    - Issue a **Pull Request (PR)** to tell the repo's developers about your changes.
    - The developers will review changes, possibly do a code review, and then **merge** them into the main repo.


**We do NOT have time to demonstrate this workflow here** but you can practice it on [Phil Marshall's gettingStarted repo](https://github.com/drphilmarshall/GettingStarted#seriouslylost) (where instructions are provided on how exactly to get to issuing a PR).

- As a non-developer, you should be able to create repos under `desihub` and work in them as a developer (meaning that you can push directly to the `master` branch).
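The command-line half of the fork workflow can be rehearsed locally. The sketch below is only a simulation: two bare repositories (`upstream.git` and `fork.git`, both made-up names) stand in for the main repo and your fork, and the branch name `fix-typo` is illustrative. Forking and opening the PR itself happen on the GitHub website and are not shown.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the main repository, seeded with one commit.
git init -q --bare upstream.git
git clone -q upstream.git seed
cd seed
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "project readme" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q origin "$(git symbolic-ref --short HEAD)"
cd ..

# Forking makes a server-side copy; a bare clone imitates that here.
git clone -q --bare upstream.git fork.git

# Clone your fork, and keep a second remote pointing at the original repo.
git clone -q fork.git work
cd work
git config user.name "Contributor"
git config user.email "contributor@example.com"
git remote add upstream "$workdir/upstream.git"

# Do the work on its own branch, then push that branch to the fork.
git checkout -q -b fix-typo
echo "a small fix" >> README.md
git commit -q -am "Fix typo in README"
git push -q origin fix-typo

# The fork has the new branch; upstream does not until a PR is merged.
fork_branch=$(git ls-remote --heads origin fix-typo)
upstream_branch=$(git ls-remote --heads upstream fix-typo)
git remote -v
```

Keeping the original repo as a remote named `upstream` is a common convention, not a requirement; it makes it easy to fetch the developers' latest changes into your fork later.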

<a id="resources"></a>

## Further resources

- See [Jake Vanderplas's tutorial notebook](https://github.com/jakevdp/git-intro/blob/master/git-intro.ipynb ) for additional material/references.
- [Phil Marshall's gettingStarted repo](https://github.com/drphilmarshall/GettingStarted ); there's very helpful Q/A on the main page + a video tutorial of some of the basics. You can practice issuing a PR here, or on [Humna Awan's tutorial repo](https://github.com/humnaawan/git-tutorial).


p.s. additional things to learn about: creating and working in branches, creating GitHub issues and linking them to commits, conflict resolution. The resources linked above should help with all these!

$~~~~~~~~~~~~~~~$

Whew - that was a lot! Questions?

## Additional todos if there's time left

(adapted from the links in [further resources above](#resources))
- [Setting up SSH connection to your GitHub](#ssh)
- [Configuring Git](#gitconfig)
- [Exercise: adding a GitHub remote](#add-remote)
- [Exercise: syncing work via Git](#sync)
- [Exercise: conflict management](#conflict)

<a id="ssh"></a>

## Git and GitHub: Access

* [Set up SSH connection to your GitHub account](https://help.github.com/en/articles/connecting-to-github-with-ssh )

First check if you have a public SSH key (when running this in your terminal, run without the ! at the beginning).

In [None]:
!ls -al ~/.ssh | grep -F .pub

- If you don't have a public SSH key, please [generate one and add it to the ssh agent](https://help.github.com/en/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent )

Then, [add the SSH key to your GitHub account](https://help.github.com/en/articles/adding-a-new-ssh-key-to-your-github-account )

<a id="gitconfig"></a>

## Configuring Git

The minimal amount of configuration for git to work without pestering you is to tell it who you are.  All the commands here modify the ``.gitconfig`` file in your home directory.

Let's first check the contents of the config file (again, when running this in your terminal, run without the ! at the beginning):

In [None]:
!cat ~/.gitconfig

If the `.gitconfig` file doesn't have your info, please modify it (again, when running these in your terminal, run without the ! at the beginning):

In [None]:
%%bash
# the next two lines are commented out so that running this notebook doesn't overwrite your settings;
# to set your identity, uncomment them and substitute your own name and email
#git config --global user.name "John Doe"
#git config --global user.email "johndoe@uw.edu"

Check the config file again to be sure (again, without the ! at the beginning if running in your terminal):

In [None]:
!cat ~/.gitconfig

GitHub's help pages offer instructions on how to configure the credentials helper for [Mac OSX](https://help.github.com/articles/set-up-git#platform-mac) and [Windows](https://help.github.com/articles/set-up-git#platform-windows).
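If you would like to experiment with `git config` before touching your real settings, the sketch below redirects git to a scratch home directory. The helper `scratch_git`, the repository name `project`, and the name and email values are all made up for illustration. It also shows that a setting made inside a repository (without `--global`) takes precedence there over the global one.

```shell
set -e
workdir=$(mktemp -d)

# Run git with HOME pointed at a scratch directory, so your real ~/.gitconfig is never touched.
scratch_git() {
    env HOME="$workdir" XDG_CONFIG_HOME="$workdir/.config" git "$@"
}

# --global writes to the .gitconfig file in the (scratch) home directory.
scratch_git config --global user.name "Jane Doe"
scratch_git config --global user.email "jane@example.edu"
cat "$workdir/.gitconfig"

# Without --global, the setting is stored in the repository's own .git/config.
scratch_git init -q "$workdir/project"
cd "$workdir/project"
scratch_git config user.email "jane@collab.example.org"

global_email=$(scratch_git config --global --get user.email)
effective_email=$(scratch_git config --get user.email)
echo "global email:         $global_email"
echo "email used in project: $effective_email"
```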

<a id="add-remote"></a>

## Using remotes as a single user

Let's now connect our local repository to a *remote repository*: a pointer to another copy of the repository that lives in a different location (e.g. on GitHub).

### ``git remote``: view/modify remote repositories

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# list out what we have in the folder
ls

# check the remotes
echo "Let's see if we have any remote repositories here:"
git remote -v

Since the above cell didn't produce any output after the `git remote -v` call, it means we have no remote repositories configured.

### Configuring a remote
Go to the [new repository page](https://github.com/new) and make a repository called `test` (you can stick with the default of a Public repository).

Do **not** check the box that says `Initialize this repository with a README`, since we already have an existing repository locally.  That option is useful when you're starting on GitHub and don't yet have a repo on a local computer.



We can now follow the instructions mentioned in the repo page, and add the remote to our local repo:

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# update the url link to include your github username
git remote add origin git@github.com:<your-github-username>/test.git

Let's see the remote situation again:

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# check remotes again
git remote -v

$~~~~~~~~~~~~~$

A few things to note here:
- The remote is named `origin`, which is the conventional name for the main remote (and the name `git clone` assigns by default). This name will play a role when you push/pull from _this_ remote.
- The labels in parentheses, `(fetch)` and `(push)`, show which URL is used for each operation. Since you have full access to the repo, you can both push and pull (pull builds on fetch, as mentioned above).

### Pushing changes to a remote repository

Now push the ``master`` branch to the remote named ``origin``.

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

git status

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# push local changes to the remote named origin
### NOTE: if git needs to prompt for credentials (e.g. an SSH key passphrase), the following will fail
##        in the notebook, so please run it in your terminal after cd'ing into test
git push origin master

You should now be able to see your local content in the public repo on GitHub.
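A useful variation is `git push -u`, which also records the remote branch as the *upstream* of your local branch, so that later a plain `git push` or `git pull` knows where to go. The sketch below tries this offline, with a local bare repository (`remote.git`, a made-up name) standing in for the empty GitHub repo.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# A local bare repository stands in for the empty GitHub repo.
git init -q --bare remote.git

git init -q test
cd test
# Identity for this repository only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "My first bit of text." > file1.txt
git add file1.txt
git commit -q -m "Our first commit - added a file."

# Register the remote under the name origin and list it.
git remote add origin "$workdir/remote.git"
git remote -v

branch=$(git symbolic-ref --short HEAD)   # master or main, depending on your setup
# -u sets origin/<branch> as the upstream of the local branch while pushing.
git push -q -u origin "$branch"

upstream=$(git rev-parse --abbrev-ref '@{u}')
echo "local branch $branch tracks $upstream"
```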

<a id="sync"></a>

### Using Git to Sync Work

Let's see how this can be useful for backup and syncing work between two different computers.  We'll simulate a 2nd computer by working in a different directory...

In [None]:
%%bash

# here we clone our 'test' repo but with a different name, test2, to simulate a 2nd computer
git clone https://github.com/<your-github-username>/test.git test2
# move into the new clone
cd test2

# print out the path
pwd

# list the remote
git remote -v

Let's now make some changes in one 'computer' and synchronize them on the second.

In [None]:
%%bash
# move into the 2nd repo; not applicable if working in a terminal and already in test2 folder
cd test2  # working on computer #2

# create a new file and commit it
echo "More new content on my experiment." >> experiment.txt
git add 'experiment.txt'
git commit -m "More work, on machine #2."

Now we put our new work up on the GitHub server so it's available from the internet.

In [None]:
%%bash
# move into the 2nd repo; not applicable if working in a terminal and already in test2 folder
cd test2

# push the local changes to the master branch on the remote named origin
### NOTE: if git needs to prompt for credentials (e.g. a password or an SSH key passphrase), the following
##        will fail in the notebook, so please run it in your terminal after cd'ing into test2
git push origin master

### Now let's pull that work from machine #1:

In [None]:
%%bash
# move into the first repo
cd test

# pull from the master branch of the remote named origin
### NOTE: if git needs to prompt for credentials (e.g. an SSH key passphrase), the following will fail
##        in the notebook, so please run it in your terminal after cd'ing into test
git pull origin master
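If you would rather look at incoming work before merging it, `git fetch` followed by an explicit merge gives you that intermediate step. The sketch below is an offline rehearsal: `hub.git` stands in for GitHub, and `one` and `two` stand in for the two computers (all made-up names).

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q --bare hub.git

# Computer #1 creates the first commit and pushes it.
git clone -q hub.git one
cd one
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first result" > experiment.txt
git add experiment.txt
git commit -q -m "Start experiment"
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on your setup
git push -q origin "$branch"
cd ..

# Computer #2 clones the repository at this point.
git clone -q hub.git two

# Computer #1 does more work and pushes again.
cd one
echo "second result" >> experiment.txt
git commit -q -am "More work, on machine #1."
git push -q origin "$branch"

# Computer #2 downloads the new commit without changing its working files...
cd ../two
git fetch -q origin
incoming=$(git log --oneline "HEAD..origin/$branch")
lines_before=$(grep -c "" experiment.txt)
echo "incoming commits:"
echo "$incoming"

# ...and only then integrates it, refusing anything but a simple fast-forward.
git merge -q --ff-only "origin/$branch"
lines_after=$(grep -c "" experiment.txt)
echo "lines before merge: $lines_before, after: $lines_after"
```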

<a id="conflict"></a>

## Conflict management
While git is very good at merging, if two different branches modify the same file in the same location, it simply can't decide which change should prevail.  At that point, human intervention is necessary to make the decision.  Git will help you by marking the location in the file that has a problem, but it's up to you to resolve the conflict.  Let's see how that works by intentionally creating a conflict.

We start by **creating a branch** and making a change to our experiment file:

In [None]:
%%bash
# move into the first repo; not applicable if working in a terminal and already in test folder
cd test

# create a new branch; name it trouble
git branch trouble
# move to the new branch
git checkout trouble

# add a line to the experiment file
echo "This is going to be a problem..." >> experiment.txt

# add and commit to this branch
git add 'experiment.txt'
git commit -m "Changes in the trouble branch."

And now we go back to the master branch, where we change the *same* file:

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# move to the master branch
git checkout master

# add to the file and commit changes
echo "More work on the master branch..." >> experiment.txt

git add 'experiment.txt'
git commit -m "Added content in master."

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# check status
git status

### The conflict...

So now let's see what happens if we try to merge the `trouble` branch into `master`:

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# try merging the branch trouble into the branch master
### NOTE: running the following will throw an error but it's okay. the main point is that
##        you should see a message mentioning that git has noticed a conflict
git merge trouble

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# get status
git status

### Let's see what git has put into our file:

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# let's look into the file
cat experiment.txt

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# git diff should show the changes more clearly; helpful when the file in conflict is long
# and you just want to know the conflict
git diff

At this point, since we have a short file, we can just open it in a text editor (e.g. via your terminal), decide which changes to keep, and make a new commit that records our decision. For simplicity, you could keep all the changes and add a note about how the conflict was resolved. Don't forget to remove the conflict markers `<<<<<<<`, `=======`, and `>>>>>>>`.
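For reference, the whole conflict-and-resolve cycle can be scripted end to end in a scratch repository. In the sketch below the repository name `conflict-demo` and the file contents are illustrative, and the "editing by hand" step is imitated by rewriting the file with `printf`.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q conflict-demo
cd conflict-demo
# Identity for this repository only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "shared line" > experiment.txt
git add experiment.txt
git commit -q -m "Start experiment"
base_branch=$(git symbolic-ref --short HEAD)

# Two branches each append a different line at the same place in the file.
git checkout -q -b trouble
echo "This is going to be a problem..." >> experiment.txt
git commit -q -am "Changes in the trouble branch."
git checkout -q "$base_branch"
echo "More work on the base branch..." >> experiment.txt
git commit -q -am "Added content on the base branch."

# The merge stops with a non-zero exit status and leaves markers in the file.
if git merge trouble >/dev/null 2>&1; then merge_ok=yes; else merge_ok=no; fi
marker_lines=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' experiment.txt)
cat experiment.txt

# Resolve: keep both lines, drop the markers, then add and commit.
printf '%s\n' "shared line" "More work on the base branch..." \
    "This is going to be a problem..." > experiment.txt
git add experiment.txt
git commit -q -m "Completed merge of trouble, keeping both changes."

# A merge commit has two parents, one from each branch.
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
echo "merge succeeded automatically: $merge_ok; parents of new commit: $parent_count"
```

If you get stuck halfway through a real merge, `git merge --abort` returns the repository to its state before the merge started.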

Let's look at our cleaned, conflict-free file:

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# look at the new file content
cat experiment.txt

Let's then make our new commit:

In [None]:
%%bash
# move into the repo; not applicable if working in a terminal and already in test folder
cd test

# add and commit the file
git add 'experiment.txt'
git commit -m "Completed merge of trouble, fixing conflicts along the way."

# let's check the log now; note: the additional markers help demonstrate the workflow
git log --oneline --topo-order --graph

**Note: you can add an alias in your gitconfig so that e.g. `git slog` would run `git log --oneline --topo-order --graph`.** You can do this by running `git config --global alias.slog "log --oneline --topo-order --graph"`.
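To try the alias without changing your global configuration, you can first set it for a single repository by leaving out `--global`. The sketch below does this in a scratch repository (`alias-demo`, an illustrative name) and compares the alias with the full command.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q alias-demo
cd alias-demo
# Identity for this repository only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "some text" > file1.txt
git add file1.txt
git commit -q -m "First commit"

# Without --global, the alias is stored only in this repository's .git/config.
git config alias.slog "log --oneline --topo-order --graph"

via_alias=$(git slog)
direct=$(git log --oneline --topo-order --graph)
echo "$via_alias"
```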

### Merge Tools

*Note:* While it's a good idea to understand the basics of fixing merge conflicts by hand, in some cases you may find the use of an automated tool useful.

Git supports multiple [merge tools](https://www.kernel.org/pub/software/scm/git/docs/git-mergetool.html): a merge tool is a piece of software that conforms to a basic interface and knows how to merge two files into a new one.  Since these are typically graphical tools, there are several to choose from on each operating system, and as long as they obey a basic command structure, git can work with any of them.