# Git & GitHub

## Resources

> These two are everything you will probably ever need with `git` (maybe some StackOverflow sometimes... :) )

- [Pro Git Book](https://git-scm.com/book/en/v2) - one of the best resources about git (also source for some content seen here)
- [GitHub documentation](https://docs.github.com/en) - same as above, but for GitHub

## git

> `git` is a distributed __version control system__ (VCS) and de facto standard for code collaboration right now

It is a program you run from the command line, which we know a little bit about by now!

### Version Control System

> __System__ which records changes to file(s) over time, so you can always access any previous code __version__ 

![](images/distributed_vcs.png)

> Distributed means that each node (client) __mirrors the full repository and its history__

## git features

Other (currently less popular) version control systems __store information as a list of changes made to each file__, see below:

![](images/delta_based_vcs.png)

__`git` records "snapshots" of the code, known as commits__; after each commit it takes a "picture" of the whole system and stores reference to it:

![](images/git_snapshots.png)

Other important traits:
- __All of the operations are performed locally__:
    - Obtain the repository from some central server
    - Perform local changes and save them (__those are not reflected anywhere besides your computer until you `push` them to the central server__)
    - Once again: all of the changes are local!
- __`git` only adds data!__
    - If you remove a file, this operation could be seen as: "add file deletion"
    - __You will not lose data if you commit your changes frequently__ (so don't worry about removing if the changes are under VCS)
    - You ESPECIALLY will not lose it if you push your changes to some central server (yes, we are getting towards GitHub slowly)

### State of any file in `git`

A file under `git`'s version control can be in one of three states (taken directly from the Pro Git Book, marvelous work once again):
- __Modified__ means that you have changed the file but have not committed it to your database yet
- __Staged__ means that you have marked a modified file in its current version to go into your next commit snapshot
- __Committed__ means that the data is safely stored in your local database.

## Hands-on experience with git

>  Git is something that you learn by using within the context of projects. __Don't get too attached to what we are doing right now (understand what's going on though). At the end of this notebook you will see a full workflow incorporating `git` and `github`__.

Let's see how to:
- `init`ialize our project with `git` using `git init`
- Create a file (__a brand-new file starts out untracked; once `git` tracks it, any edit puts it in the modified state__)
- Check status of `git` using `git status`
- Add the file using `git add FILE` (__our file will be in staged state__)
- Commit changes using `git commit -m "YOUR_GIT_COMMIT_MESSAGE"` (__our file will be in the committed (saved) state with a message describing the changes__)
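
The steps above can be sketched as one short terminal session. It runs in a throwaway directory; the file name `hello.py`, the commit message and the `Demo User` identity are placeholders chosen for illustration, not part of the course material.

```bash
# Work in a throwaway directory so nothing on your machine is touched
project_dir=$(mktemp -d)
cd "$project_dir"

git init -q                               # create an empty repository
git config user.name "Demo User"          # placeholder identity, local to this repo
git config user.email "demo@example.com"

echo "print('hello')" > hello.py          # new file, not yet known to git
git status --short                        # shows: ?? hello.py

git add hello.py                          # staged for the next commit
git status --short                        # shows: A  hello.py

git commit -q -m "Add hello script"       # committed to the local database
git log --oneline                         # one line per commit
```

Running `git status` between the steps is a cheap way to see which state each file is in.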

## Tips
- You can use `git add -A` to stage every change in the repository, or `git add .` to stage everything in your current folder and below (a single `.` refers to the current folder, whilst `..` refers to the parent folder)
- __First letter of the commit message should be capitalised__
- Use the imperative mood in commit messages: write "Add", "Change" etc. instead of "Added", "Changed"
- A commit message provides semantic meaning about what change was applied to the repository. __Don't use something like: `Fix on line 22`__ (we can already see that when looking at the commit). Instead: `Fix connection timeouts` is way better.
- Use `git commit -m "YOUR_MESSAGE_HERE"` frequently: each commit should represent a __small__ step in the right direction
    - e.g. `git commit -m 'small change 1'` ... `git commit -m 'small change 12'` rather than `git commit -m 'completed entire project'`
- __to say that one more time... commits should be small__ 

> __One semantic change, one commit rule__
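
The difference between `git add .` and `git add -A` mentioned in the tips only shows up when you are inside a subfolder. A minimal sketch, assuming a reasonably recent `git` (2.x) and using made-up file names:

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir src
echo "top level" > top.txt
echo "inner" > src/inner.txt

cd src
git add .                                     # stages only what lives under src/
staged_dot=$(git diff --cached --name-only)
echo "$staged_dot"                            # src/inner.txt

git add -A                                    # stages every change in the repository
staged_all=$(git diff --cached --name-only)
echo "$staged_all"                            # src/inner.txt and top.txt
```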

## Reverting changes

> If you accidentally add too many files and commit out of rush, you can easily revert your changes

For that, we can use `git reset` command!

- `git reset HEAD~` (`HEAD` is typed literally, it's not a placeholder here) - reverts the last `git commit` and unstages its files (reverts `git add`), so you have to `git add` them again; __nothing else happens to your files, so don't worry, it WILL NOT delete them!__
- `git reset [FILE]` - reverts `git add`; if `FILE` is specified, unstages that file; without any arguments, unstages everything
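
A minimal sketch of both forms of `git reset`, run in a throwaway repository. The file names and the `Demo User` identity are illustrative placeholders.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity, local to this repo
git config user.email "demo@example.com"

echo "notes" > notes.txt
git add notes.txt && git commit -q -m "Add notes"

echo "more" >> notes.txt
echo "scratch" > scratch.txt
git add -A
git commit -q -m "Update notes"           # oops: scratch.txt went in as well

git reset -q HEAD~                        # undo the last commit and unstage its files
git status --short                        # notes.txt is modified, scratch.txt untracked

git add -A
git reset -q scratch.txt                  # unstage only scratch.txt
git commit -q -m "Update notes"           # this time without scratch.txt
git status --short                        # shows: ?? scratch.txt
```

Both files stay on disk the whole time; only the commit history and the staging area change.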

## Small exercise

Add the following (or maybe a little different, __those are yours after all__) aliases to your `.bashrc`:
- `git add` -> `ga`
- `git commit` -> `gc`
- `git reset HEAD~` -> `grc`
- `git reset` -> `gr`
- `git status` -> `gst`
- `git log --oneline` -> `glog` (single line log of all commits in this repository)

## GitHub... is not the same thing as Git!

> In this section you will see a good central server where you can store your code: GitHub

__Please create a new account on GitHub on your own__

> __All projects you create during this course should be started here (by creating a project repository)!__

## Creating a repository (repo)

- Click on the "+" sign on the right and select "Create New Repository"

![](images/github_top_bar.png)

You will be taken to repository creation page with some options. 

- __Public vs Private__: determine whether everyone can see (fork and contribute to) this project or just you (you can add more collaborators to a private repo later)
- __Initializing repository__:
    - __Always add `.gitignore` at this step for specific language (see below)!__
    - It is advised to add some form of license (MIT is often the license of choice, but you should check out possibilities, see assessments)
    - You can also create `README.md`, but you might also add it later after cloning the project

## .gitignore

> `.gitignore` is a file which prevents adding language-related "junk" (files which are result of running and not necessary for the project) to git

- During this course you should always use `python`'s `.gitignore`
- Add specific files or path patterns (glob patterns such as `*.pyc`, or a folder name like `data`) to this file, one on each line
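
A small sketch of how ignore patterns behave. The three patterns are examples only (the Python template offered by GitHub is much longer), and the file names are made up.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q

# One pattern per line; these entries are examples only
cat > .gitignore <<'EOF'
__pycache__/
*.pyc
data/
EOF

mkdir -p data __pycache__
echo "x,y" > data/raw.csv
echo "cache" > __pycache__/main.cpython-311.pyc
echo "print('hi')" > main.py

git status --short                  # lists only .gitignore and main.py
git check-ignore -v data/raw.csv    # prints the pattern that matched this path
```

`git check-ignore -v` is handy when a file is unexpectedly missing from `git status`.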

## Cloning a repository

> Click on the green `Code` button and copy the link in the HTTPS tab (__you should use SSH once you have set it up on your own!__)

![](images/github_repo_page.png)

You can clone your repository using `git clone` command:

```bash
git clone LINK
```

This will create a local version of the code in a folder with the same name as the repo.

### GitHub tips

- __GitHub blocks files larger than 100 MB and recommends keeping repositories small (ideally under 1 GB)!__
- __Store your large files somewhere else (like AWS's S3)__
- __Use `git lfs` (https://git-lfs.github.com/) for larger files which do not change often!__

## Branches

One of the power features of `git` is __branches__:

> Branches are __movable pointers to commits__; you can think of them as separate paths in code development (which you can later `merge` with another branch)

- By default, `git init` creates a branch called `master` (newer setups, including GitHub, often name it `main`; this is configurable via `init.defaultBranch`). __You should always keep it as your main branch!__
- `HEAD` (which we have seen previously) is a pointer to where in the commit history we currently are

![](images/git_branch_pointers.png)

__Using branches one can__:
- Work on new features separately from other features and developers (separation is good!)
- Make the whole thing more structured and easy to follow
- __Not pollute the `master` branch__ with untested/experimental/work in progress (WIP) code

> Use branches __ALL THE TIME__

## Working with branches

> `git branch NAME_OF_BRANCH` is a command responsible for handling branches

Let's see what happens after we issue `git branch testing`:

![](images/git_branch_testing_created.png)

Few things to notice:
- __We are still on the `master` branch, as indicated by `HEAD`!__
- The new branch is merely a pointer to the same commit

In order to switch to this branch, we can issue `git checkout` command:

```bash
git checkout testing
```

> Tip: `git checkout -b NAME_OF_BRANCH` creates the new branch and checks it out in one command

Now that we are on the `testing` branch (`HEAD` points to it), we can do the usual operations like `git add` and `git commit` on it and end up with a result like this:

![](images/git_branch_testing_commited.png)


We can move back to `master` by simply issuing `git checkout master`. Things to note:
- __The files in your working directory will go back to how they were on `master`!__
- __This doesn't mean your files are lost. They are just committed on another branch!__


Let's commit on this one also, which leaves us with the following (divergent) branch structure:

![](images/git_branch_divergent.png)
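
The divergent history described above can be reproduced in a throwaway repository. This is a sketch with made-up file names; the main branch name is read from `git` instead of being hard-coded, because it may be `master` or `main` depending on your configuration.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity, local to this repo
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt && git commit -q -m "Add app file"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # master or main

git branch testing                # new pointer to the same commit; HEAD does not move
git checkout -q testing           # now HEAD points to testing
echo "experiment" > test.txt
git add test.txt && git commit -q -m "Add experiment"

git checkout -q "$main_branch"    # test.txt disappears from the working directory
ls                                # app.txt
echo "v2" >> app.txt
git commit -q -am "Update app file"

git log --oneline --graph --all   # shows the two branches diverging
```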

### Tips

- You can use `git checkout -b NAME_OF_THE_BRANCH` to create branch from the current one and change to it immediately
- __Pull all changes from the remote repository before creating branch with new feature!__ (this will minimize the risk of merge conflicts)

## Pushing changes to remote

After committing, we can push our changes to a remote repository (an external server __like GitHub!__).

```bash
git push -u origin BRANCH_YOU_ARE_ON
```

> Usually (always in our course) you will push to your GitHub repository!
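
To try pushing without touching GitHub, a local bare repository can stand in for the remote. This is only a sketch: the temporary directories, the branch name `add-readme` and the `Demo User` identity are assumptions made for the example.

```bash
remote_dir=$(mktemp -d)                   # stand-in for the GitHub repository
git init -q --bare "$remote_dir"

work_dir=$(mktemp -d)
git clone -q "$remote_dir" "$work_dir"    # cloning sets up the `origin` remote
cd "$work_dir"
git config user.name "Demo User"          # placeholder identity, local to this repo
git config user.email "demo@example.com"

git checkout -q -b add-readme
echo "# Demo" > README.md
git add README.md && git commit -q -m "Add README"

git push -q -u origin add-readme          # -u remembers origin/add-readme as upstream
git branch -vv                            # shows which remote branch is tracked
```

Thanks to `-u`, later pushes from this branch need only a plain `git push`.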

When you open your remote GitHub repository you will see something like this:

![](images/push_to_github.png)

Now, after you click on green button, you will make a...

## Pull Request

> Pull request (PR) means we are asking repository/project owner (or anyone with appropriate status) to __merge__ changes located on our branch __upstream__ (to the main branch, almost always `master`)

![](images/pull_request_github.png)

At this point (don't worry, you can also do it after PR) you can (amongst other things):
- assign someone to review your work (__do it all the time if you're cooperating on the project__)
- assign someone to work with you
- give appropriate label (everyone knows at a quick glance to which part of the project your change is related to)

> __Most important is the green button for creating a PR__

## Merging

> Merging PR is when we incorporate changes into some other branch (often main branch `master`)

![](images/merging_github.png)

There are three ways to merge (you can see them by clicking on the arrow next to "Merge pull request"):

### Create a merge commit

This one does the following:

- Takes all commits (__there might be multiple in one Pull Request!__)
- Creates new commit in the history
- Pushes this new commit onto branch we want to merge on (`master`)

Equivalent in command line:

```bash
git checkout master
git merge --no-ff <feature_branch>
```

#### Pros

- Works well when there are __A LOT__ of commits as the whole history is a little more readable
- __Use for large Pull Requests__

#### Cons:

- Commit is pretty generic "merged pull request #43"
- Hard to know what this commit was about when looking at history
- One has to go over to this commit to see individual ones

### Squash and merge

This one does the following:

- Takes all commits (__there might be multiple in one Pull Request!__)
- Creates new commit in the history __which you can name__ (header you are using)
- Pushes this new commit onto branch we want to merge on (`master`)

Equivalent in command line:

```bash
git merge --squash <feature_branch>
git commit
```

#### Pros

- Readable commit name in the main branch
- __Works well if you have a lot of random `commits` like "Fix feature A", "Fix feature A this time for sure" etc.__ as it squashes them into one sensible chunk
- __Use when you are pushing small feature/fix!__

#### Cons:

- Individual commits are lost (only the squashed commit appears in the main branch's history)
- Works poorly on large Pull Requests (with a lot of stuff happening and changing)

### Rebase and merge (encouraged)

This one does the following:

- Takes all commits (__there might be multiple in one Pull Request!__)
- Applies them one after another onto main branch

Equivalent in command line:

```bash
git checkout <feature_branch>
git rebase <target_branch>
git checkout <target_branch>
git merge --ff-only <feature_branch>
```

#### Pros

- Keeps all details in history
- Use when you have well created commits (and not too many of them)
- __Use when you are pushing small feature/fix!__

#### Cons:

- Might inflate history
- Might be __too detailed__ for some
- It is not clear that these commits come from another branch
- Works poorly on large Pull Requests (with a lot of stuff happening and changing)
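
To compare the three strategies side by side, the sketch below builds the same small history three times and applies one strategy to each copy using the local command-line equivalents. It only approximates what the GitHub buttons do; the helper `make_repo`, the branch name `feature` and all file names are invented for illustration.

```bash
# Hypothetical helper: a fresh repo with 2 commits on the main branch
# and 2 commits on a diverged `feature` branch.
make_repo () {
  local dir
  dir=$(mktemp -d)
  cd "$dir"
  git init -q
  git config user.name "Demo User"          # placeholder identity
  git config user.email "demo@example.com"
  echo base > base.txt && git add base.txt && git commit -q -m "Add base file"
  git checkout -q -b feature
  echo one > feature.txt && git add feature.txt && git commit -q -m "Add feature file"
  echo two >> feature.txt && git commit -q -am "Extend feature file"
  git checkout -q -                         # back to the main branch
  echo main > main.txt && git add main.txt && git commit -q -m "Add main file"
}

# 1. Create a merge commit
make_repo
git merge -q --no-ff -m "Merge branch 'feature'" feature
merge_total=$(git rev-list --count HEAD)
merge_merges=$(git rev-list --count --merges HEAD)

# 2. Squash and merge
make_repo
git merge -q --squash feature               # stages the combined changes, no commit yet
git commit -q -m "Add feature"
squash_total=$(git rev-list --count HEAD)

# 3. Rebase and merge
make_repo
main_branch=$(git rev-parse --abbrev-ref HEAD)
git checkout -q feature
git rebase -q "$main_branch"                # replay feature commits on top of main
git checkout -q "$main_branch"
git merge -q --ff-only feature              # main simply moves forward
rebase_total=$(git rev-list --count HEAD)
rebase_merges=$(git rev-list --count --merges HEAD)

echo "merge commit: $merge_total commits, $merge_merges merge commit(s)"   # 5 and 1
echo "squash:       $squash_total commits"                                 # 3
echo "rebase:       $rebase_total commits, $rebase_merges merge commit(s)" # 4 and 0
```

The counts make the trade-off visible: a merge commit keeps everything plus one extra commit, squashing collapses the feature into a single commit, and rebasing keeps each feature commit in a linear history.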

## Small exercise

Add the following (or maybe a little different, __they are yours after all__) aliases to your command line:
- `git clone` -> `gcl`
- `git branch` -> `gb`
- `git checkout -b` -> `gbco`
- `git checkout master` -> `gcom`
- `git push -u origin $(git rev-parse --abbrev-ref HEAD)` -> `gp` - pushes changes from the branch you are on to a branch of the same name in the remote repository
- `git pull` -> `gl`

## Exercise

> __Set up workflow for working with AiCore's Course!__

> __Everyone should share their screen during setup!__

> You may skip some steps if you already did them previously

- Create GitHub account
- Setup GitHub with SSH (see [here](https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh))
- Create a new __private__, empty repository (without any initialization like README, license etc.)
- Go to your command line and set up `git` with your name and the e-mail of your GitHub account (more info [here](https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup) if you're curious):
    - `git config --global user.name "Your Name"`
    - `git config --global user.email "your@email.com"`
- Make a bare clone of the course repository: `git clone --bare git@github.com:AI-Core/AiCourse.git` (notice the `ssh` way of cloning)
- Change directory to the cloned repository (`AiCourse.git`) and run: `git push --mirror https://github.com/YOUR_USERNAME/YOUR_REPO`
- __Now you can safely delete the local AiCourse clone and clone your mirror instead__

Now, you can work in this repo (and update it to keep with upstream (provided content and updates from AiCore)).
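
The mirroring steps can be rehearsed offline before running them against GitHub. In this sketch local directories stand in for both the course repository and your own empty repository; every path, file name and the `Demo User` identity is a placeholder.

```bash
upstream=$(mktemp -d)     # stand-in for the course repository
mine=$(mktemp -d)         # stand-in for your empty private repository
workspace=$(mktemp -d)

# Fake course repository with a single commit
git init -q "$upstream"
git -C "$upstream" config user.name "Demo User"
git -C "$upstream" config user.email "demo@example.com"
echo "lesson" > "$upstream/lesson.md"
git -C "$upstream" add lesson.md
git -C "$upstream" commit -q -m "Add lesson"

git init -q --bare "$mine"

cd "$workspace"
git clone -q --bare "$upstream" course.git   # bare clone: history only, no working files
cd course.git
git push -q --mirror "$mine"                 # copy every branch and tag to your repository
cd ..
rm -rf course.git                            # the temporary bare clone is no longer needed
git clone -q "$mine" my-course               # regular clone of your own copy
ls my-course                                 # lesson.md
```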

## Workflow with AiCourse

> Below are the steps outlined one has to do __BEFORE AND AFTER EVERY NEW LESSON__

- `git remote update` (updates with our changes)
- `git checkout -b LESSON_NAME`
- Do any updates/changes in the notebooks (your own notes somewhere in the notebook or anything else)
- `git add CHANGED_FILES` - add files/notes/done exercises from the lesson
- `git commit -m "Add notes from lesson LESSON_NAME"` - you can customize that one, though we encourage this way of commit.

## Exercise

It is easy to forget some of the steps above, so you should create two `bash` functions to do the heavy lifting and put them where they are `source`d, like your aliases:

```bash
function_name () {
  commands
}
```

Each function should take a single argument (you can read it with `"$1"`, which would be a string in both cases)
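
As a reminder of the syntax, here is a minimal sketch of a function that reads its first argument. The function `describe_lesson` is a hypothetical example and is deliberately not one of the two functions you are asked to write.

```bash
# Hypothetical helper, only here to show how "$1" is used
describe_lesson () {
  local lesson_name="$1"                  # first argument passed to the function
  echo "Starting lesson: ${lesson_name}"
}

describe_lesson "git-basics"              # prints: Starting lesson: git-basics
```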

### Functions

- first one should be called `aicore_start` and do the steps __before running `jupyter notebook` (without it)__
- second one should be called `aicore_end` and do the steps __after you close `jupyter notebook`__ (adding files, committing and pushing)

## Challenges

### Assessment

> Most of the questions about `git` shown below have definitive answers in [Git Book](https://git-scm.com/book/en/v2)!

- What are checksums?
- How does git use checksums (see [here](https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F))?
- What is `tagging` in git (see [here](https://git-scm.com/book/en/v2/Git-Basics-Tagging))?
- What do the following commands do (and what could be their usage):
    - `git stash`
    - `git blame`
- What is project forking on GitHub?
- Check out popular licenses (this page makes it easier: https://choosealicense.com/)

> __Please set up the ones shown below, it will make your life easier__

- GitHub with verified commits using PGP (see [here](https://docs.github.com/en/github/authenticating-to-github/about-commit-signature-verification))

### Non-assessment

- Check out [GitHub CLI](https://github.com/cli/cli) and improve above workflow according to your preferences
- Check out [alias-tips](https://github.com/djui/alias-tips) (provided you have `zsh` shell!). We are humans after all, and this reminder will __drastically help you__ in actually using those aliases
- Check out [this article](https://chris.beams.io/posts/git-commit/) about writing good commit messages