# Pre-requisites

To follow along, please do the following before we start:

- [Install JupyterLab](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html)
- [Install git for your platform](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
- [Create a GitHub account](https://github.com/)

# Git & GitHub for Data Scientists

## Topics

- [Git](https://git-scm.com/book/en/v2)
- [GitHub](https://docs.github.com/en)
- [ReviewNB](https://github.com/marketplace/review-notebook-app)
- [nbdime](https://nbdime.readthedocs.io/en/latest/)

# Goal - Understand This Picture

![](git-github-summary.svg)

# Questions to be Answered

## Conceptual
- What is Git?
- What is GitHub?
- What is version control?
- What's a repository?
- What's a commit?
- What's a fork?
- What's a branch?
- What's a diff?
- What's a conflict?
- What's a pull request?

## Practical
- How do I push changes for others to see?
- How do I merge the changes others have made?
- How do I notify others that I have made changes?
- How can I get an overview of the repository?
- What's the difference between branches and forks and which one should I use?

# What is GitHub?

GitHub is a Git repository hosting platform. 
- Provides a central place to store your source code.
- Enables collaboration with others through *pull requests*.

# Why GitHub? 

I honestly don't know.

*It's such a pain...*

## Good for:
- Plain text files
- Source code (e.g. Python, Java, JavaScript, C/C++, HTML, etc.)

## Not so good for:
- Basically, anything else, including:
  - Images
  - Video
  - Jupyter notebooks

**[But, it renders notebooks!](https://github.com/DataCircles/traffic_collisions_viz_team/blob/master/notebooks/EDA_traffic_collision_workshop.ipynb)**

# Demo - Commit & Push

1. Create a new repo at https://github.com/new.
1. Clone the repo locally.
    ```sh
    git clone <repo> github-jupyter-covid
    ```
1. Create a new notebook file.
   - Make a plot using data from [here](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports_us).
1. Stage, commit, and push the file back to GitHub.
    ```sh
    git add .
    git commit -am "<commit message>"
    git push
    ```
1. Take a look at the notebook on GitHub.

# What is Git?

- A distributed version control system.

# What's a version control system?

- Software that keeps track of the history of files in a repository.

# What's a repository?

![](git-repository.svg)

- An ordered collection of snapshots of files.
- Each snapshot is called a commit.
- Each commit is a point in time whose state can be restored.
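
A minimal sketch of "commits are restorable snapshots", run in a throwaway directory. The file name `notes.txt` and the demo identity are placeholders chosen for illustration.

```sh
# Work in a throwaway directory so nothing else on your machine is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"

# First snapshot.
echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
first=$(git rev-parse HEAD)               # remember this commit's ID

# Second snapshot.
echo "version 2" > notes.txt
git commit -q -am "Update notes"

# The history lists both commits, newest first.
git log --oneline

# Restore the file as it was in the first snapshot.
git checkout -q "$first" -- notes.txt
cat notes.txt
```

The last command should print the contents from the first commit, even though a later commit replaced them.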

# What makes it distributed?

![](git-distributed-repos.svg)

- The repository isn't just stored on a central server like GitHub.
- Your locally cloned repo is a **full copy** of the one on GitHub along with the full history of every file.
- Forks are also full copies of a repository.
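
One way to see the "full copy" claim for yourself, sketched with two local directories instead of GitHub. The directory and file names are made up for the example.

```sh
# Build a small "original" repository with two commits.
original=$(mktemp -d)
cd "$original"
git init -q
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"
echo "a" > data.txt
git add data.txt
git commit -q -m "First commit"
echo "b" >> data.txt
git commit -q -am "Second commit"

# Clone it somewhere else on disk.
elsewhere=$(mktemp -d)
git clone -q "$original" "$elsewhere/copy"

# The clone carries the whole history, not just the latest files.
cd "$elsewhere/copy"
git log --oneline
```

The log in the clone should show the same two commits as the original.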

# What's a branch?

![](git-branches-merge.svg)

- A separate line of commits off the main branch.
- When done, we *merge*:
  - Either the branch back into the main branch.
  - Or the main branch into the child branch.

# Demo - Branching & Merging Locally

1. Initialize a repo.
1. Create a README file and commit it.
1. Make a branch and commit something.
1. Switch to master and commit something.
1. Observe the change with `git log` or `git dag`
1. Merge from master into branch.
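
The steps above can be sketched as the following script. The branch name `feature`, the file names, and the demo identity are assumptions for illustration; the plain `git log --graph` form is used so the `git dag` alias isn't required.

```sh
# 1. Initialize a repo in a throwaway directory.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master, whatever the local default is
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"

# 2. Create a README file and commit it.
echo "# Demo" > README.md
git add README.md
git commit -q -m "Add README"

# 3. Make a branch and commit something.
git checkout -q -b feature
echo "from feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature file"

# 4. Switch to master and commit something.
git checkout -q master
echo "from master" > master.txt
git add master.txt
git commit -q -m "Add master file"

# 5. Observe the two diverging lines of commits.
git log --graph --oneline --all

# 6. Merge from master into the branch.
git checkout -q feature
git merge -q --no-edit master
git log --graph --oneline --all
```

After the merge, the `feature` branch contains both files and the graph shows the two lines joining in a merge commit.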

## Follow-up

1. View changes at previous commits.

# Configure `git dag`

Run:

```sh
git log --graph --oneline --all
```

*Or, for something really special, put the following into your ~/.gitconfig file:*

```
[alias]
	dag = log --graph --abbrev-commit --decorate --date=relative --format=format:'%C(bold blue)%h%C(reset) -%C(auto)%d%C(reset) %C(bold white)%s%C(reset)%n          %C(dim white)%an%C(reset) <%ae> -%C(reset) %C(cyan)%aD%C(reset) %C(green)(%ar)%C(reset)' --all
```

Now you can run:

```sh
git dag
```

# What are Pull Requests?

- A request for others to review your changes before they are merged into the main branch.

## What's a diff?

- Reveals just the changes that were made, which makes the reviewer's life easier.

[![](diff.png)](https://github.com/music-markdown/music-markdown/pull/87/files)
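
The same idea on the command line, as a small sketch: `git diff` between two commits shows only the lines that changed. The file `fruit.txt` and its contents are invented for the example.

```sh
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"

# First version of the file.
printf 'apples\nbananas\ncherries\n' > fruit.txt
git add fruit.txt
git commit -q -m "Add fruit list"

# Second version: only the middle line changes.
printf 'apples\nblueberries\ncherries\n' > fruit.txt
git commit -q -am "Swap bananas for blueberries"

# Compare the previous commit with the current one.
git diff HEAD~1 HEAD
```

Removed lines are prefixed with `-` and added lines with `+`; unchanged lines appear only as context.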


# Demo - Fork & Pull Request

1. Fork the github-jupyter/covid repo.
1. Clone the repo locally.
    ```sh
    git clone <repo> shadanan-covid
    ```
1. Update the covid notebook -- add a chart dividing confirmed cases by population.
   - Use state population data from [here](https://www.kaggle.com/lucasvictor/us-state-populations-2018/data?select=State+Populations.csv).
1. Use GitHub to create a pull request against upstream.

# Demo - ReviewNB

- Notebooks are [JSON](https://en.wikipedia.org/wiki/JSON) - they don't diff well.
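
A rough illustration of why, using a hand-written, minimal notebook file (not one produced by the demo). Re-running a cell changes the stored execution counter, so a line-based diff reports a change even though the code is identical.

```sh
work=$(mktemp -d)
cd "$work"

# A tiny notebook with one code cell and its saved output.
cat > before.ipynb <<'EOF'
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "42\n"
     ]
    }
   ],
   "source": [
    "print(6 * 7)"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}
EOF

# Simulate re-running the cell: same code, new execution counter.
sed 's/"execution_count": 1/"execution_count": 2/' before.ipynb > after.ipynb

# diff exits non-zero when files differ, so don't treat that as a failure.
diff -u before.ipynb after.ipynb || true
```

The diff is all JSON plumbing; none of it is a change to the code a reviewer cares about.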

1. Install [ReviewNB](https://github.com/marketplace/review-notebook-app).
1. Use ReviewNB to view the diff and make a comment.
1. Merge the changes.

# Synchronizing Changes

![](git-synchronization.svg)

## `git fetch`

![](git-synchronization-1.svg)

## `git pull` (on master)

![](git-synchronization-2.svg)

## `git push` (on branch)

![](git-synchronization-3.svg)
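
The three commands can be tried without GitHub. In this sketch a local bare repository (`hub.git`) stands in for GitHub, and `alice` and `bob` are two clones; all of these names, the branch `per-capita`, and the file `report.txt` are assumptions for illustration.

```sh
work=$(mktemp -d)
cd "$work"

# Alice starts a repository with one commit on master.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master   # call the first branch master, whatever the local default is
git config user.name "Alice"              # placeholder identity
git config user.email "alice@example.com"
echo "v1" > report.txt
git add report.txt
git commit -q -m "First report"
cd "$work"

# hub.git plays the role of GitHub; Alice registers it as her remote.
git clone -q --bare alice hub.git
git -C alice remote add origin "$work/hub.git"

# Bob clones, commits on master, and pushes.
git clone -q hub.git bob
cd bob
git config user.name "Bob"                # placeholder identity
git config user.email "bob@example.com"
echo "v2" > report.txt
git commit -q -am "Revise report"
git push -q origin master
cd "$work/alice"

# git fetch: download Bob's commit, but leave Alice's files alone.
git fetch -q origin
cat report.txt                            # still the old contents
git log --oneline master..origin/master   # the commit waiting to be merged

# git pull (on master): fetch, then merge into the current branch.
git pull -q --ff-only origin master
cat report.txt                            # now Bob's version

# git push (on branch): publish a new branch for others to see.
git checkout -q -b per-capita
echo "v3" > report.txt
git commit -q -am "Add per capita numbers"
git push -q origin per-capita
git --git-dir="$work/hub.git" branch      # hub now lists master and per-capita
```

`--ff-only` is used here so the pull never has to create a merge commit; it simply moves `master` forward to match the remote.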

# Git's Fundamental Concept

- Git is a set of tools for managing commits and synchronizing them between repositories.

# What's a conflict?

![](git-branches-conflict.svg)

- A change that cannot be automatically reconciled.
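
A small way to provoke one on purpose, sketched with a plain text file rather than a notebook: two branches change the same line, so Git cannot pick a winner. The branch name `feature` and the file `stats.txt` are invented for the example.

```sh
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master, whatever the local default is
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"

# Common starting point.
echo "cases: 100" > stats.txt
git add stats.txt
git commit -q -m "Add stats"

# One branch changes the line...
git checkout -q -b feature
echo "cases: 150" > stats.txt
git commit -q -am "Update stats on feature"

# ...and master changes the same line differently.
git checkout -q master
echo "cases: 200" > stats.txt
git commit -q -am "Update stats on master"

# The merge stops and asks for help.
git merge feature || echo "Automatic merge failed"
conflicts=$(git diff --name-only --diff-filter=U)   # files left unmerged
echo "Conflicted files: $conflicts"
cat stats.txt                                       # shows <<<<<<< / ======= / >>>>>>> markers

# Back out and return to the state before the merge.
git merge --abort
```

Between the markers, Git shows both competing versions of the line; resolving the conflict means editing the file to the version you want and committing.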

# Dealing with Conflicts

- Jupyter notebooks aren't easy to merge because they are JSON docs.
- ***The best way to deal with conflicts is to avoid them.***
  - Don't work on the same notebook at the same time as someone else.
  - Have everyone on your team make changes in a separate notebook named with their initials.
  - Have a single person be in charge of merging the final changes into the source of truth notebook.

# NO! Jupyter Notebooks and Data Science are too easy. I insist on merging notebooks the ***hard way!***

*Or you end up in a situation where you have no other choice...*

# Demo - Merge Conflict

1. We need to simulate a conflict.
1. Create a branch in upstream that doesn't have our Per Capita change.
1. Add Positivity Rate to the notebook.
1. Make a Pull Request

# Demo Cont'd - nbdime (***N***ote***B***ook ***DI***ff & ***ME***rge)

1. Try merging origin/master.
    ```sh
    git merge origin/master
    ```
   - Observe that the notebook is now broken.
1. Abort the merge.
    ```sh
    git merge --abort
    ```
1. [Install nbdime](https://nbdime.readthedocs.io/en/latest/installing.html).
    ```sh
    pip3 install --upgrade nbdime
    nbdime extensions --enable
    ```
1. [Enable nbdime for the current repo](https://nbdime.readthedocs.io/en/latest/vcs.html).
    ```sh
    nbdime config-git --enable
    ```
1. Run the merge again.
    ```sh
    git merge origin/master
    ```
1. Use nbdime's merge tool:
    ```sh
    git mergetool --tool nbdime -- *.ipynb
    ```

# Summary

![](git-github-summary.svg)

# Bonus Demo - black

- An opinionated Python code formatter.
- Useful when collaborating because it eliminates code style discussions in reviews.
- Installation instructions are [here](https://jupyterlab-code-formatter.readthedocs.io/en/latest/index.html).