# Prerequisites

To follow along, please do the following before we start:

- [Install JupyterLab](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html)
- [Install git for your platform](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
- [Create a GitHub account](https://github.com/)

# Git & GitHub for Data Scientists

## Topics

- [Git](https://git-scm.com/book/en/v2)
- [GitHub](https://docs.github.com/en)
- [ReviewNB](https://github.com/marketplace/review-notebook-app)
- [nbdime](https://nbdime.readthedocs.io/en/latest/)

# Questions to be Answered

## Conceptual
- What is Git?
- What is GitHub?
- What is version control?
- What's a repository?
- What's a commit?
- What's a fork?
- What's a branch?
- What's a diff?
- What's a conflict?
- What's a pull request?

## Practical
- How do I push changes for others to see?
- How do I merge the changes others have made?
- How do I notify others that I have made changes?
- How can I get an overview of the repository?
- What's the difference between branches and forks and which one should I use?

# What is GitHub?

GitHub is a Git repository hosting platform. 
- Provides a central place to store your source code.
- Enables collaboration with others through *pull requests*.
- **[Also, it renders notebooks!](https://github.com/DataCircles/traffic_collisions_viz_team/blob/master/notebooks/EDA_traffic_collision_workshop.ipynb)**

# Getting Started to Follow Along

1. Start Jupyter or JupyterLab:
```sh
jupyter lab
```
1. Open a terminal to run commands.

# Example

Niwako and her team are collaborating on a Covid-19 dataset.

![](git-shad-niwako-repo-setup.svg)

# Demo - Commit & Push

1. Create a new repo at https://github.com/new.
1. Clone the repo locally.
```sh
git clone <repo> github-jupyter-covid
```
1. Create a new notebook file.
   - Make a plot using data from [here](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports_us).
1. Stage, commit, and push the file back to GitHub.
```sh
git add .
git commit -am "<commit message>"
git push
```
1. Take a look at the Notebook on GitHub.

# What is Git?

- A distributed version control system.

## What's a version control system?

- Software that keeps track of the history of files in a repository.

# What's a repo (repository)?

![](git-repository.svg)

- The place where all the files in your project are stored,
- Along with every version of those files that were committed,
- Including files in other branches.
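A minimal sketch of the idea, assuming `git` is installed: it builds a throwaway repository in a temporary directory, commits two versions of a file, and reads the older version back from history. The file name, commit messages, and user details are placeholders made up for illustration.

```sh
set -e
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"   # placeholder identity

# Commit a first version of a file.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Commit a second version of the same file.
echo "second draft" > notes.txt
git commit -q -am "Revise notes"

# The repository keeps both versions: list the history, then read the old one.
git log --oneline
commit_count="$(git rev-list --count HEAD)"
first_version="$(git show HEAD~1:notes.txt)"
echo "$commit_count commits; the first version was: $first_version"
```

The working directory only shows the latest version, but every committed version stays retrievable from the repository.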

# What's a branch?

![](git-branches-merge.svg)

- A separate line of commits off the main branch.
- When done, we *merge* the branch:
  - Either back into the main branch,
  - Or into the child branch.
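The branch-and-merge cycle above can be sketched in a throwaway repository. This is an illustration under assumptions: the branch name `per-capita`, the file name, and the commit messages are made up, and `--no-ff` is used only so that the merge shows up as its own commit in the graph.

```sh
set -e
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"   # placeholder identity

# One commit on the main branch.
echo "cases" > analysis.txt
git add analysis.txt
git commit -q -m "Start analysis"

# Create a branch and commit on it; master is unaffected.
git checkout -q -b per-capita
echo "cases per capita" >> analysis.txt
git commit -q -am "Add per capita metric"

# Switch back to the main branch and merge the branch into it.
git checkout -q master
git merge -q --no-ff -m "Merge per-capita into master" per-capita

git log --graph --oneline --all
total_commits="$(git rev-list --count HEAD)"
```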

# How does it work?

![](git-distributed-repos.svg)

- GitHub hosts your team's repository.
- Your fork of the team's repo on GitHub is a **full copy** of the team's repo along with the full history of every file.
- Your local clone of your fork is also a **full copy**, with the same history as the repos on GitHub.

# What are Pull Requests?

- A request for others to review your changes before they are merged into the main branch.

## What's a diff?

- A view of just the changes that were made, which makes the reviewer's life easier.

[![](diff.png)](https://github.com/music-markdown/music-markdown/pull/87/files)
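The same kind of diff can be produced locally with `git diff`. A small sketch, assuming a throwaway repository; the CSV contents are made-up placeholder values, not real data.

```sh
set -e
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"   # placeholder identity

# Commit a small file (placeholder values).
printf 'name,value\nalpha,1\nbeta,2\n' > data.csv
git add data.csv
git commit -q -m "Add data"

# Change one row, then ask Git what differs from the last commit.
printf 'name,value\nalpha,1\nbeta,3\n' > data.csv
diff_output="$(git diff -- data.csv)"
echo "$diff_output"
```

Only the changed row appears with `-` (old) and `+` (new) markers; the unchanged rows are shown as context.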


# Revisiting our Example

![](git-shad-niwako-repo-pr.svg)

# Demo - Fork & Pull Request

1. Fork the [github-jupyter/covid](https://github.com/github-jupyter/covid) repo.
   - This is Shad's forked repo.
1. Clone the repo locally.
```sh
git clone <repo> shadanan-covid
```
1. Update the covid notebook: add a chart dividing confirmed cases by population.
   - Use state population data from [Kaggle](https://www.kaggle.com/lucasvictor/us-state-populations-2018/data?select=State+Populations.csv) or from the original [source](https://worldpopulationreview.com/states).
1. Use GitHub to create a pull request against upstream.
1. Observe the state of your repo with:
```sh
git log --graph --all
```

# Demo - ReviewNB

- Notebooks are [JSON](https://en.wikipedia.org/wiki/JSON) - they don't diff well.

1. Install [ReviewNB](https://github.com/marketplace/review-notebook-app).
1. Use ReviewNB to view the diff and make a comment.
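To see why plain diffs are noisy for notebooks, here is a rough sketch that commits a hand-written, minimal notebook file and then simulates re-running its only cell by bumping `execution_count`. The notebook contents and file names are assumptions for illustration; a real notebook also carries outputs and metadata, which add further noise.

```sh
set -e
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"   # placeholder identity

# A minimal, hand-written notebook with a single code cell.
cat > demo.ipynb <<'EOF'
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": ["print('hello')"]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}
EOF
git add demo.ipynb
git commit -q -m "Add notebook"

# Simulate re-running the cell: the code is unchanged, only the counter moves.
sed 's/"execution_count": 1/"execution_count": 2/' demo.ipynb > rerun.ipynb
mv rerun.ipynb demo.ipynb

# Git still reports a change, even though no code was edited.
git diff -- demo.ipynb
changed_lines="$(git diff --numstat -- demo.ipynb | cut -f1)"
echo "lines changed without touching any code: $changed_lines"
```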

# Synchronizing Changes

![](git-synchronization.svg)

## `git fetch`

![](git-synchronization-1.svg)

## `git pull` (on master)

![](git-synchronization-2.svg)

## `git push` (on branch)

![](git-synchronization-3.svg)
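The three commands above can be tried without GitHub. In this sketch a local bare repository (`hub.git`) stands in for the hosted repo, and two clones stand in for Niwako's and Shad's machines. The directory names, file name, and email addresses are placeholders chosen for illustration.

```sh
set -e
cd "$(mktemp -d)"

# Seed a repo with one commit, then make a bare copy to act as the shared remote.
git -c init.defaultBranch=master init -q seed
git -C seed config user.name "Demo User"
git -C seed config user.email "demo@example.com"
echo "cases" > seed/notebook.txt
git -C seed add notebook.txt
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed hub.git

# Two collaborators each clone the shared repo.
git clone -q hub.git niwako
git clone -q hub.git shad

# Niwako commits locally, then pushes so others can see the change.
git -C niwako config user.name "Niwako"
git -C niwako config user.email "niwako@example.com"
echo "positivity rate" >> niwako/notebook.txt
git -C niwako commit -q -am "Add positivity rate"
git -C niwako push -q origin master

# Shad fetches: origin/master moves forward, his own master does not.
git -C shad fetch -q origin
behind_after_fetch="$(git -C shad rev-list --count master..origin/master)"

# Shad pulls: his master catches up with origin/master.
git -C shad pull -q --ff-only origin master
behind_after_pull="$(git -C shad rev-list --count master..origin/master)"

echo "behind after fetch: $behind_after_fetch, after pull: $behind_after_pull"
```

`git fetch` only updates the remote-tracking branch, which is why Shad's `master` is still one commit behind until he pulls.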

# What's a conflict?

![](git-branches-conflict.svg)

- A change that cannot be automatically reconciled.
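A conflict is easy to reproduce in a throwaway repository: change the same line on two branches, then try to merge. This is a sketch under assumptions; the branch name, file name, and file contents are made up for illustration.

```sh
set -e
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"   # placeholder identity

echo "metric: cases" > plan.txt
git add plan.txt
git commit -q -m "Add plan"

# One branch rewrites the line one way...
git checkout -q -b per-capita
echo "metric: cases per capita" > plan.txt
git commit -q -am "Use per capita"

# ...while master rewrites the same line another way.
git checkout -q master
echo "metric: positivity rate" > plan.txt
git commit -q -am "Use positivity rate"

# Git cannot pick a winner, so the merge stops and reports a conflict.
if git merge per-capita >/dev/null 2>&1; then
  merge_result="clean"
else
  merge_result="conflict"
  conflicted_files="$(git diff --name-only --diff-filter=U)"
  echo "conflicted files: $conflicted_files"
  git merge --abort   # return to the state before the merge
fi
echo "merge result: $merge_result"
```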

# Dealing with Conflicts

- Jupyter notebooks aren't easy to merge because they are JSON docs.
- ***The best way to deal with conflicts is to avoid them.***
  - Don't work on the same notebook at the same time as someone else.
  - Have everyone on your team work in their own notebook, named with their initials.
  - Have a single person be in charge of merging the final changes into the source-of-truth notebook.

# But sometimes, you end up in a situation where you have no other choice. 

Let's see what options we have...

# Demo - Simulating a Conflict

- For demonstration purposes, we need to simulate a conflict.
- Niwako will add Positivity Rate to the notebook before we have merged Shad's per capita change into upstream.
- Because Niwako owns her repo, she pushes to master without doing a pull request.

# Revisiting our Example

![](git-shad-niwako-repo-conflict.svg)

# Demo Cont'd - nbdime (***N***ote***B***ook ***DI***ff & ***ME***rge)

1. Add `github-jupyter/covid` to our remotes.
```sh
git remote add upstream git@github.com:github-jupyter/covid.git
```
1. Use `git fetch upstream` to get Niwako's changes.
1. Use `git log --graph --all` to view the state of all the repos.
1. Try merging `upstream/master`:
```sh
git merge upstream/master
```
   - Observe that the notebook is now broken.
1. Abort the merge.
```sh
git merge --abort
```
1. [Install nbdime](https://nbdime.readthedocs.io/en/latest/installing.html).
```sh
pip3 install --upgrade nbdime
nbdime extensions --enable
```
1. [Enable nbdime for the current repo](https://nbdime.readthedocs.io/en/latest/vcs.html)
```sh
nbdime config-git --enable
```
1. Run the merge again.
```sh
git merge upstream/master
```
1. Use nbdime's merge tool:
```sh
git mergetool --tool nbdime -- *.ipynb
```

# Git Cheat Sheet

- Git Config
  - Global: `~/.gitconfig`
  - Repo: `.git/config`
- Status
```sh
git status
```
- Log / DAG
```sh
git log --graph --oneline --all
```
- Committing:
```sh
git add [file]
git commit
```
- [GitHub's Cheat Sheet](https://github.github.com/training-kit/downloads/github-git-cheat-sheet.pdf)

# Summary

![](git-github-summary.svg)

# Bonus Demo - black

- An opinionated Python code formatter.
- Useful when collaborating because it eliminates code style discussions in reviews.
- Installation instructions are [here](https://jupyterlab-code-formatter.readthedocs.io/en/latest/index.html).

# Configure `git dag`

Run:

```sh
git log --graph --all
```

*Or, for something really special, put the following into your `~/.gitconfig` file:*

```
[alias]
	dag = log --graph --abbrev-commit --decorate --date=relative --format=format:'%C(bold blue)%h%C(reset) -%C(auto)%d%C(reset) %C(bold white)%s%C(reset)%n          %C(dim white)%an%C(reset) <%ae> -%C(reset) %C(cyan)%aD%C(reset) %C(green)(%ar)%C(reset)' --all
```

Now you can run:

```sh
git dag
```