# Learning Git

I was recently asked how I would introduce git to new data scientists. 

It's been so long since I was a noob at it, and I really only picked up commands and understood them as needed, so I decided to think it through from the basics. Below are some broad steps; I'll try to write posts with links to resources on each of them.

A key part of understanding git is that there are multiple valid interpretations of what is happening when you use it, the so-called Rashomon effect [[1]](http://www2.math.uu.se/~thulin/mm/breiman.pdf). Also bear in mind that:

* a repository is a container: a hidden `.git` folder inside the project, sometimes referred to as the commit tree
* there are many effective workflows for using git
* there are multiple ways to achieve a desired outcome
* git has several features that aren't explicitly related to one another
* only some features are needed for a given workflow
* some workflows rely on features (such as pull requests) that are not part of git itself but are provided by hosting services such as GitHub and Bitbucket
* git has ~200 commands that act on the repository, and about 40 of those cover 99% of use cases
* ~15 of those 40 commands have 4 or 5 regularly used flags

Perhaps the best description of git is "it's a type of version control system for a project with features that enable collaboration between individual contributors" [[2]](https://www.atlassian.com/git).

In the end, the best way to learn git is simply to use it.

## A Version Control System

A version control system (VCS) lets you make checkpoints on a project, recording the state of its files and sub-folders as it evolves over time. Some VCSs store each checkpoint as the difference from the previous one; git instead stores a snapshot of the whole project at each checkpoint, reusing the stored copy of any file that hasn't changed. Any VCS has commands to navigate between checkpoints: going to a checkpoint changes the working directory (the files and sub-folders) to the state recorded in it.
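
A minimal Python sketch of checkpointing, assuming the project can be modelled as a dict from file path to contents. `ToyVCS`, `checkpoint` and `restore` are hypothetical names for illustration, not git commands. Each checkpoint keeps a full copy of the project, which is closer to git's snapshot model than to a diff-based store.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # file path -> file contents (toy model of a project)


class ToyVCS:
    """Hypothetical toy VCS: records checkpoints and restores them."""

    def __init__(self) -> None:
        self.working: Snapshot = {}            # the files you edit
        self.checkpoints: List[Snapshot] = []  # recorded states, oldest first

    def checkpoint(self) -> int:
        """Record the current state and return the checkpoint's index."""
        self.checkpoints.append(dict(self.working))  # copy, so later edits don't leak in
        return len(self.checkpoints) - 1

    def restore(self, index: int) -> None:
        """Change the working files to the state recorded at `index`."""
        self.working = dict(self.checkpoints[index])


vcs = ToyVCS()
vcs.working["analysis.py"] = "print('v1')"
first = vcs.checkpoint()
vcs.working["analysis.py"] = "print('v2')"
vcs.working["notes.md"] = "todo"
second = vcs.checkpoint()

vcs.restore(first)
print(vcs.working)  # {'analysis.py': "print('v1')"}
```

Restoring `first` also makes `notes.md` disappear, because it did not exist at that checkpoint.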

Git is a VCS: it records checkpoints as commits and lets you navigate between them. Each commit points back to its parent commit, and following those pointers leads all the way to the root commit, the first checkpoint. An ordinary commit has exactly one parent, a merge commit has two (or more), and the root commit has none. Because of merges, the commits form a directed acyclic graph (DAG) rather than a strict tree, although it is often loosely called the commit tree or simply the repository.
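
To make the structure concrete, here is a toy Python model. The `Commit` class and the short ids are hypothetical (real commit ids are hashes, and real commits also record a snapshot, author and date). Each commit stores only the ids of its parents, so following first parents from any commit leads back to the root.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Commit:
    """Toy commit: an id, a message, and the ids of its parent commits."""
    id: str
    message: str
    parents: List[str] = field(default_factory=list)


# root <- a <- b and root <- a <- c, joined again by the merge commit m.
commits: Dict[str, Commit] = {}
for c in [
    Commit("root", "initial commit"),             # the root commit: no parents
    Commit("a", "load data", ["root"]),
    Commit("b", "clean data", ["a"]),
    Commit("c", "try a model", ["a"]),
    Commit("m", "merge model work", ["b", "c"]),  # merge commit: two parents
]:
    commits[c.id] = c


def first_parent_chain(commit_id: str) -> List[str]:
    """Follow first parents back until a commit with no parents (the root)."""
    chain = [commit_id]
    while commits[chain[-1]].parents:
        chain.append(commits[chain[-1]].parents[0])
    return chain


print(first_parent_chain("m"))    # ['m', 'b', 'a', 'root']
print(len(commits["m"].parents))  # 2
```

Because `b` and `c` split from `a` and rejoin at `m`, there are two paths from `m` to `a`, which is why the history is a DAG rather than a tree.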

Making a checkpoint is a two-step procedure. First, changes are added to the staging area (also called the index) with `git add`. Second, once the staging area holds a small, coherent piece of work, that unit of work is recorded as a commit in the repository with `git commit`. The idea is that staging lets you build a commit from changes to specific files, leaving out changes that aren't relevant to it. In practice, many people find staging an extra step and simply commit every file modified or created since the last commit.
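
The two steps can be sketched in Python, assuming a hypothetical `ToyRepo` with dicts for the working files, the staging area and the commits. It only illustrates the idea: a commit records exactly what was staged, and an edited but unstaged file stays out of it.

```python
from typing import Dict, List

Files = Dict[str, str]  # path -> contents


class ToyRepo:
    """Hypothetical model of working files, the staging area, and commits."""

    def __init__(self) -> None:
        self.working: Files = {}        # files as they are on disk
        self.index: Files = {}          # staging area: what the next commit will record
        self.commits: List[Files] = []  # recorded snapshots, oldest first

    def add(self, path: str) -> None:
        """Stage the current contents of one file (compare `git add <path>`)."""
        self.index[path] = self.working[path]

    def commit(self) -> Files:
        """Record exactly what is staged; unstaged edits are left out."""
        snapshot = dict(self.index)
        self.commits.append(snapshot)
        return snapshot


repo = ToyRepo()
repo.working["model.py"] = "fit()"
repo.working["scratch.py"] = "experiments"  # edited, but not part of this unit of work
repo.add("model.py")
print(repo.commit())  # {'model.py': 'fit()'}
```

As in git, the staging area is not emptied by a commit: it still equals the last snapshot, so files you don't re-add keep their committed contents. Deleting files is left out of this sketch.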

Either way, besides commands for navigating commits, we need commands to add files to and remove them from the staging area. Uncommitted changes can also be stashed (`git stash`) to a local store, which cleans the working directory so that you can navigate to other commits and bring the changes back later.
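
A simplified Python model of stashing, using a hypothetical `ToyWorkdir` class. Stashes are kept on a stack, like git's, but `stash_pop` here simply swaps the saved files back in, whereas git applies the stashed changes on top of whatever is currently checked out.

```python
from typing import Dict, List

Files = Dict[str, str]  # path -> contents


class ToyWorkdir:
    """Hypothetical model of stashing uncommitted edits."""

    def __init__(self, committed: Files) -> None:
        self.committed = dict(committed)  # state of the last commit
        self.working = dict(committed)    # files on disk
        self.stashes: List[Files] = []    # a stack: the newest stash is last

    def stash(self) -> None:
        """Save the working files, then reset them to the last commit."""
        self.stashes.append(dict(self.working))
        self.working = dict(self.committed)

    def stash_pop(self) -> None:
        """Bring back the newest stash (simplified: replaces the working files)."""
        self.working = self.stashes.pop()


wd = ToyWorkdir({"report.md": "draft 1"})
wd.working["report.md"] = "draft 2, half finished"
wd.stash()
print(wd.working)  # {'report.md': 'draft 1'}
wd.stash_pop()
print(wd.working)  # {'report.md': 'draft 2, half finished'}
```

Between `stash()` and `stash_pop()` the working files match the last commit, which is the clean state that makes it safe to move to another commit.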

## The Repository Is a Commit Graph

As mentioned, these commits form a graph in the repository. The commits are nodes, and each points to its parent commits. Navigating to (checking out) a commit changes the files in the working directory to the state recorded in that commit.
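
A rough Python sketch of walking this graph, assuming a hypothetical history stored as a dict from commit id to parent ids. Starting from one commit and following parent pointers visits every commit it is built on, which is roughly the set that `git log` lists (git's ordering rules are not reproduced here).

```python
from collections import deque
from typing import Dict, List

# Hypothetical history: commit id -> parent ids (the merge commit m has two).
parents: Dict[str, List[str]] = {
    "root": [],
    "a": ["root"],
    "b": ["a"],
    "c": ["a"],
    "m": ["b", "c"],
}


def ancestors(start: str) -> List[str]:
    """Breadth-first walk over parent pointers; each commit is visited once."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        order.append(commit)
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order


print(ancestors("m"))  # ['m', 'b', 'c', 'a', 'root']
print(ancestors("c"))  # ['c', 'a', 'root']
```

Commit `b` is not reachable from `c`, because pointers only lead backwards; checking out `c` gives a project state without `b`'s changes.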


## Branches: Pointers To Nodes

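A branch is just a movable pointer to a commit, normally the tip of a line of development. HEAD is a reference to what is currently checked out: when you are on a branch, HEAD points to that branch, and making a new commit automatically moves the branch pointer forward to the new commit.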
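
A small Python model of branches as pointers, using a hypothetical `ToyRefs` class with made-up ids `c0`, `c1`, and so on. Creating a branch copies a pointer, and committing moves only the branch that HEAD names.

```python
from typing import Dict, List


class ToyRefs:
    """Hypothetical model of branches as movable pointers, with HEAD on a branch."""

    def __init__(self) -> None:
        self.parents: Dict[str, List[str]] = {"c0": []}  # commit id -> parent ids
        self.branches: Dict[str, str] = {"main": "c0"}   # branch name -> commit id
        self.head = "main"                               # HEAD names the current branch
        self._count = 0

    def commit(self) -> str:
        """Make a commit on top of the current branch tip and move the branch to it."""
        self._count += 1
        new_id = f"c{self._count}"
        self.parents[new_id] = [self.branches[self.head]]
        self.branches[self.head] = new_id  # the branch pointer moves automatically
        return new_id

    def create_branch(self, name: str) -> None:
        """A new branch is just another pointer to the current commit."""
        self.branches[name] = self.branches[self.head]

    def switch(self, name: str) -> None:
        """Point HEAD at another branch (git would also update the working files)."""
        self.head = name


refs = ToyRefs()
refs.commit()                  # main -> c1
refs.create_branch("feature")  # feature -> c1 as well
refs.switch("feature")
refs.commit()                  # feature -> c2, main stays at c1
print(refs.branches)           # {'main': 'c1', 'feature': 'c2'}
```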

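A detached HEAD state occurs when you check out a commit directly rather than a branch, so HEAD points at the commit itself and new commits are not recorded by any branch. It is also possible to move a branch's tip to a different commit by forcing it, for example with `git branch -f` or `git reset`.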
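
Extending the pointer idea, a hypothetical `ToyHead` class in Python where HEAD names either a branch or, when detached, a commit id. `force_branch` plays the role of forcing a branch onto another commit; it is loosely similar to `git branch -f`, which git refuses for the branch you currently have checked out.

```python
from typing import Dict, List, Tuple


class ToyHead:
    """Hypothetical model: HEAD names either a branch or, when detached, a commit."""

    def __init__(self) -> None:
        self.parents: Dict[str, List[str]] = {"c0": [], "c1": ["c0"], "c2": ["c1"]}
        self.branches: Dict[str, str] = {"main": "c2"}
        self.head: Tuple[str, str] = ("branch", "main")  # or ("commit", id) when detached

    def current_commit(self) -> str:
        kind, target = self.head
        return self.branches[target] if kind == "branch" else target

    def checkout_commit(self, commit_id: str) -> None:
        """Check out a commit directly: HEAD is now detached."""
        self.head = ("commit", commit_id)

    def commit(self, new_id: str) -> None:
        """Add a commit on top of HEAD and move whatever HEAD names."""
        self.parents[new_id] = [self.current_commit()]
        kind, target = self.head
        if kind == "branch":
            self.branches[target] = new_id  # on a branch: the branch moves forward
        else:
            self.head = ("commit", new_id)  # detached: only HEAD moves, no branch records it

    def force_branch(self, name: str, commit_id: str) -> None:
        """Point a branch at any commit, replacing its old tip."""
        self.branches[name] = commit_id


repo = ToyHead()
repo.checkout_commit("c1")       # detached HEAD at c1
repo.commit("x1")                # x1's parent is c1, but no branch points to it
print(repo.branches)             # {'main': 'c2'}
repo.force_branch("main", "x1")  # force main's tip onto the detached work
print(repo.branches)             # {'main': 'x1'}
```

After the forced move, `c2` is no longer the tip of any branch, which is why forcing should be done with care.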

## Combining Branches

![Repo in Folder](https://i.stack.imgur.com/nWYnQ.png)

To understand combining branches deeply, we need to understand git's storage locations and data model. As a user, however, the commands are often enough.

There are three main ways to combine branches:

* merge 
* rebase 
* pull (by default, a fetch followed by a merge of a remote branch)

### Fetch

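When a local repository is linked to a remote repository, it keeps remote-tracking branches (such as `origin/main`) that record where the remote's branches were the last time they were fetched. `git fetch` downloads any new commits and moves these remote-tracking branches to the tips of the remote's branches; it does not change your own local branches or working files.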
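
A minimal Python sketch of what a fetch does to the refs, assuming refs are plain dicts and using a hypothetical `fetch` function (downloading the commit objects themselves is left out). Only the `<remote>/<branch>` entries change; the local `main` stays where it was.

```python
from typing import Dict


def fetch(remote_branches: Dict[str, str], remote_name: str,
          local_refs: Dict[str, str]) -> None:
    """Copy the remote's branch tips into remote-tracking refs like 'origin/main'."""
    for branch, tip in remote_branches.items():
        local_refs[f"{remote_name}/{branch}"] = tip  # local branches are never touched


remote = {"main": "r7"}                      # the remote's main has moved on to r7
local = {"main": "r5", "origin/main": "r5"}  # where things stood at the last fetch
fetch(remote, "origin", local)
print(local)  # {'main': 'r5', 'origin/main': 'r7'}
```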

### Merge

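A merge combines two branches by adding a new merge commit to the current branch whose parents are the tip commits of both branches. If the current branch has no new commits of its own, git can instead fast-forward, simply moving the branch pointer to the other tip. If both branches changed the same lines differently, git reports a conflict and the user has to choose how to resolve it.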
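
Git's default merge decides what to keep by comparing each side against their common ancestor (the merge base). A simplified Python sketch of that three-way comparison follows; `three_way_merge` is a hypothetical helper, and it compares whole files where git works line by line.

```python
from typing import Dict, List, Optional, Tuple

Files = Dict[str, str]  # path -> contents


def three_way_merge(base: Files, ours: Files, theirs: Files) -> Tuple[Files, List[str]]:
    """Toy whole-file three-way merge. Returns (merged files, conflicting paths).

    A file missing from a side counts as None (deleted or never created there).
    """
    merged: Files = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(path)
        o: Optional[str] = ours.get(path)
        t: Optional[str] = theirs.get(path)
        if o == t:
            result = o  # both sides agree, or neither changed it
        elif o == b:
            result = t  # only their side changed it
        elif t == b:
            result = o  # only our side changed it
        else:
            conflicts.append(path)  # both changed it differently: the user must choose
            # Leave both versions in the file, imitating git's conflict markers.
            result = f"<<<<<<< ours\n{o or ''}\n=======\n{t or ''}\n>>>>>>> theirs\n"
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"clean.py": "v1", "model.py": "v1"}
ours = {"clean.py": "v2", "model.py": "v1"}                      # we changed clean.py
theirs = {"clean.py": "v1", "model.py": "v2", "plot.py": "new"}  # they changed model.py
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'clean.py': 'v2', 'model.py': 'v2', 'plot.py': 'new'}
print(conflicts)  # []

_, clash = three_way_merge({"a.py": "v1"}, {"a.py": "ours"}, {"a.py": "theirs"})
print(clash)      # ['a.py']
```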

Once the merge is complete, the merge commit is the new tip of the branch that was merged into. Depending on the workflow, the branch that was merged from can then be deleted.
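
On the commit graph, merging can be sketched in Python with a hypothetical parent map and branch dict. If the current tip is an ancestor of the other tip, the branch pointer simply moves forward (a fast-forward); otherwise a new commit with two parents becomes the tip. Ids such as `M` are made up; git would use commit hashes.

```python
from typing import Dict, List

# Hypothetical history: commit id -> parent ids, plus branch pointers.
parents: Dict[str, List[str]] = {"c0": [], "c1": ["c0"], "f1": ["c1"], "m1": ["c1"]}
branches: Dict[str, str] = {"main": "m1", "feature": "f1"}


def is_ancestor(a: str, b: str) -> bool:
    """True if commit a is reachable from b by following parents (or a == b)."""
    stack, seen = [b], set()
    while stack:
        c = stack.pop()
        if c == a:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False


def merge(into: str, source: str, merge_id: str) -> str:
    """Merge branch `source` into branch `into`; return the new tip of `into`."""
    ours, theirs = branches[into], branches[source]
    if is_ancestor(theirs, ours):
        return ours                         # already merged: nothing to do
    if is_ancestor(ours, theirs):
        branches[into] = theirs             # fast-forward: just move the pointer
    else:
        parents[merge_id] = [ours, theirs]  # merge commit with two parents
        branches[into] = merge_id
    return branches[into]


print(merge("main", "feature", "M"))  # M
print(parents["M"])                   # ['m1', 'f1']
del branches["feature"]               # optional: delete the merged-from branch

parents["h1"] = ["M"]                 # a later hotfix built on top of M
branches["hotfix"] = "h1"
print(merge("main", "hotfix", "unused"))  # h1 (fast-forward, no merge commit)
```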


### Pull

A pull is a fetch from a remote repository's branch followed by a merge into the current branch. If the merge hits conflicts, the user is asked to resolve them or abort the merge (`git merge --abort`).
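
A toy Python sketch of pull as "fetch, then merge", assuming refs and history are plain dicts. `fetch`, `pull` and the generated merge id are hypothetical stand-ins, and conflicts are not modelled.

```python
from typing import Dict, List

# Hypothetical state: commit id -> parents, local refs, and the remote's branches.
parents: Dict[str, List[str]] = {"c0": [], "c1": ["c0"], "r2": ["c1"]}
local_refs: Dict[str, str] = {"main": "c1", "origin/main": "c1"}
remote_branches: Dict[str, str] = {"main": "r2"}  # someone else pushed r2


def fetch(remote: str) -> None:
    """Update remote-tracking refs such as 'origin/main' to the remote's tips."""
    for branch, tip in remote_branches.items():
        local_refs[f"{remote}/{branch}"] = tip


def is_ancestor(a: str, b: str) -> bool:
    """True if a is reachable from b by following parent pointers."""
    stack = [b]
    while stack:
        c = stack.pop()
        if c == a:
            return True
        stack.extend(parents[c])
    return False


def pull(remote: str, branch: str) -> str:
    """Fetch, then merge the remote-tracking branch into the local branch."""
    fetch(remote)
    ours, theirs = local_refs[branch], local_refs[f"{remote}/{branch}"]
    if is_ancestor(ours, theirs):
        local_refs[branch] = theirs  # fast-forward
    elif not is_ancestor(theirs, ours):
        merge_id = f"merge-{ours}-{theirs}"  # hypothetical id; git uses a hash
        parents[merge_id] = [ours, theirs]
        local_refs[branch] = merge_id
    return local_refs[branch]


print(pull("origin", "main"))  # r2
print(local_refs)              # {'main': 'r2', 'origin/main': 'r2'}
```

Here the local `main` had no commits of its own, so the merge step is a fast-forward; had both sides moved on, `pull` would create a two-parent commit instead.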

### Rebase And Reset 





## Commonly Used Git Commands 


## Remote Repository

## Remote Branches

## Collaborative Git: GitHub

## Workflows

### Centralized Workflow

### Feature Branch Workflow

### Forking Workflow

### Git Flow Workflow

### GitHub Workflow

### GitLab Workflow

### One Workflow