## Extra topics: Code Versioning with Git


fisa (@fisadev, fisadev@gmail.com)

https://github.com/fisadev/python-basic-course

# Class 5 overview

Topic: git / code versioning

- Why do we need a versioning tool?
- Main features most versioning tools provide
- Why Git in particular
- Installing and initial config
- Creating a repo (init)
    - Repo vs working copy.
- Adding versions (commit)
    - Git commit, git status, git diff
- Reviewing older versions (log, show, gitg, blame)
- Branches and merge
    - Conflicts
    - Tags
- Working with other repos (clone, pull, push)
- Having a central repo, Gitlab and alternatives
- Common workflows
- GitHub, impact on open source
    - Fork, pull requests

## Why use versioning tools?

"A time machine + parallel universes"

A system that allows you to answer these kinds of questions:

- Who did this? Why? When? What other stuff did they modify at the same time?
- What changed between today and the previous week? And between version 1.0.3 and 1.2.4?
- When did this feature start failing? Which change broke it?
- What changes did this feature introduce?

And also allows for these things to happen:

- **Several people** working on the same project (even files!) without losing code or manually merging it
- Several **parallel versions**, experiments, features in progress, with ability to track, compare and merge them


## Warning

Versioning tools, and Git especially, are **scary** at first sight. They will look complicated.

But believe me, it gets simple with practice, almost like riding a bike.

And it **is super useful**. Once you start, you can't go back to non-versioning.

Today it's a must-have. Not using versioning is like a big company not using any form of organized accounting system or books.

## How? Main features of most versioning systems

- Some way of taking **"snapshots"** of our code (commits)
- Some way of having a **"history"** of connected snapshots
- Some way of having **parallel** histories
- Some way of **merging** parallel histories
- Some way of **sharing** histories between computers

![](./versioning_general_view.svg)

## Why Git in particular

- It's super **widely used**, the most common tool. Lots of info and **integration** with other tools (editors, shells, continuous integration, etc).
- It's **distributed** (some others too)
    - No central repo required, although one is still usually used
    - No connection with a server required to work, you connect only to share commits (faster, works offline, etc)
    - Every repo is a full repo: if one dies (even the "server"), nothing is lost
- It's great at **merging**, better than most

## Installing and initial config

1. Linux installation is super simple. Windows not so much, but easy enough. Here (on Debian/Ubuntu) we are installing not just git, but also gitg, a graphical history viewer, which is useful.

`sudo apt install git gitg`

2. Configure your user

`git config --global user.email "fisadev@gmail.com"`

`git config --global user.name "Juan Pedro Fisanotti"`


This is only done one time, when you install it. You can change it later, if you need to.

## Creating an empty repo

This is rarely done by hand. You only need it to create a new repo from scratch on your machine (other options later on).

1. Go to the root folder of your project

`cd /home/fisa/devel/my_super_project`

2. Create your repo there

`git init`

This will create a `.git` hidden folder right there. **That folder** is your repo! (no servers running in the background, strange folders in weird places, etc. Just that folder)
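A quick way to convince yourself that the repo is "just that folder". This is only a sketch: the scratch folder from `mktemp` stands in for your real project folder.

```shell
# Scratch folder standing in for the root folder of your project
cd "$(mktemp -d)"

git init      # creates the repo
ls -a         # the listing now includes the hidden .git folder
ls .git       # git's own data lives in here: HEAD, config, objects, refs, ...
```

Delete the `.git` folder and the history is gone, while your normal files stay exactly as they were.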

## Repo vs working copy

- **Working copy**: the normal code files in your project
- **Repo**: the `.git` folder, which stores all the information of known versions

![](./repo_vs_working_copy.svg)

## Creating snapshots (commits)

(Always working inside your project folder)

1. Add the files you want to include in the new snapshot (code: yes, configs: yes, data: no, temp files: no, compiled files: no, etc)

`git add run.py settings.py submodule_x/web.py submodule_x/utils.py`

2. Create the commit, describing the changes you are committing

`git commit -m "Created the repo with the first working version of my project"`
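The two steps above, run from start to finish in a throwaway repo. The file names, their contents and the identity in `git config` are placeholders made up for this sketch.

```shell
# Throwaway repo so nothing real is touched
cd "$(mktemp -d)"
git init --quiet
git config user.email "you@example.com"   # placeholder identity, stored only in this repo
git config user.name "Your Name"

# Two tiny example files (hypothetical content)
echo 'print("hello")' > run.py
echo 'DEBUG = True' > settings.py

# 1. Add the files you want in the snapshot
git add run.py settings.py

# 2. Create the commit
git commit -m "First working version of my project"

git log --oneline   # one line per commit: short hash + message
```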

## Pending changes

Now do some changes to the files, and create some new files too.

You can check the status of your working copy with `git status`.

It shows:

- Files that have changes that aren't in the repo
- New files that aren't in the repo
- Files that are in the repo and no longer in the working copy

And with `git diff`, you get a detail of the changes!
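A small sketch of what that looks like, again in a throwaway repo with made-up files (`run.py`, `notes.txt`). `--short` is just a more compact form of the `git status` output.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.email "you@example.com"   # placeholder identity
git config user.name "Your Name"
echo 'print("hello")' > run.py
git add run.py
git commit --quiet -m "First version"

# Now the pending changes:
echo 'print("hello again")' >> run.py   # change a file that is in the repo
echo 'some notes' > notes.txt            # create a file that isn't in the repo

git status --short   # " M run.py" (modified) and "?? notes.txt" (new, untracked)
git diff             # line by line detail of the change in run.py
```

Note that `git diff` only talks about `run.py`: `notes.txt` is not known to the repo yet, so there is nothing to compare it against.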

## More on commits

You can keep adding and committing files.

Each commit has:

- a **hash** id
- the message
- the author (from config)
- the date it was created
- the files at this version (the "snapshot" of your code)
- a reference to its parent commit (previous version from which it was done)
- and some extra metadata


## More on commits

Some useful tips:

- Commit often, don't do huge commits with lots of different things inside. Do small commits. That means more detailed history and messages :)
- You must add the files you want to commit every time (you add them to the staging area. It's like adding them to a "draft" commit, before committing).
- You can add just **some** files, no need to add everything
- Write meaningful messages, this is **super** important so the history is actually useful.
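To see the staging area in action, here is a sketch where two files change but only one goes into the commit. The files `a.py` and `b.py` are invented for the example.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.email "you@example.com"   # placeholder identity
git config user.name "Your Name"
echo 'a = 1' > a.py
echo 'b = 1' > b.py
git add a.py b.py
git commit --quiet -m "Add a.py and b.py"

# Change both files...
echo 'a = 2' >> a.py
echo 'b = 2' >> b.py

# ...but add only one of them to the "draft" commit
git add a.py
git diff --staged --name-only   # what will go into the commit: a.py
git diff --name-only            # what is still pending: b.py

git commit -m "Update a.py"
git status --short              # " M b.py": still modified, still not committed
```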


![](./initial_commits.svg)

## Reviewing older versions

To navigate and inspect older versions, you have three main tools:

- `git log`: shows the list of commits, from newer to older (showing hash, message, and some basic data)
- `git show HASH_OF_A_COMMIT`: shows the details of a specific commit (you don't need the full hash, the first characters, for example the first 8, are enough as long as they are unambiguous)
- graphical tools, like `gitg`, to easily see everything together

And to check for authors and versions of the lines of a file, you have one:

- `git blame A_FILE`
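The three command line tools, tried on a tiny two-commit history. The file `calc.py` and the commit messages are made up for this sketch.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.email "you@example.com"   # placeholder identity
git config user.name "Your Name"

echo 'x = 1' > calc.py
git add calc.py
git commit --quiet -m "Add calc.py"

echo 'y = x + 1' >> calc.py
git add calc.py
git commit --quiet -m "Compute y"

git log --oneline                            # newest first: "Compute y", then "Add calc.py"

short_hash=$(git rev-parse --short=8 HEAD)   # first 8 chars of the last commit's hash
git show "$short_hash"                       # message, author, date and diff of that commit

git blame calc.py                            # for each line: which commit and author last touched it
```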

## Branches and merge

Up until now, there was a single main history, with its last version. It has a name: `master` (newer Git setups and hosting sites often call it `main` instead).

Master is a **branch**.

Branches are pointers to a specific commit, but a little more than that: git knows which branch you are currently working in, and each time you create a commit, the **branch automatically moves** and points to that last commit you just created. Like this:

![](./single_branch_1.svg)

![](./single_branch_2.svg)

![](./single_branch_3.svg)

## Branches and merge

But you can create as many branches as you want, and move to work in any of them (remember: you usually are "working in" a specific branch).

Create and move to a new branch with this:

`git checkout -b feature_x master`

The first parameter (`feature_x`) is the name of the branch you want to create. The second (`master`) is the name of the branch from which your new branch will start.

That checkout results in this:

![](./new_branch_1.svg)
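You can check the "pointer" idea from the command line. A sketch in a throwaway repo; the `-c init.defaultBranch=master` part is only there to make sure the first branch is called `master`, like in these notes.

```shell
cd "$(mktemp -d)"
git -c init.defaultBranch=master init --quiet
git config user.email "you@example.com"   # placeholder identity
git config user.name "Your Name"
echo 'base' > base.txt
git add base.txt
git commit --quiet -m "First commit"

git checkout -b feature_x master   # create feature_x from master, and move to it

git branch                         # lists master and feature_x, the * marks the active one
git rev-parse master feature_x     # the same hash twice: both point to the same commit
```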

## Branches and merge

At this point, you are working on `feature_x`, it's the active branch.

If you commit, it will move, but master won't!

![](./new_branch_2.svg)

## Branches and merge

At this point, something worth noting: branches aren't like tree branches. A branch is just a pointer to a specific commit, not the entire "branch".

Worst name ever.

## Branches and merge

And you can move between branches all you want, with `git checkout`, like this:

`git checkout NAME_OF_THE_BRANCH`

Each time you move from branch to branch, something important happens: your **working copy** is updated with the files at the last version of that branch!

*Tip: if you have pending changes, sometimes it won't let you move until you commit them. It makes sense, and there are ways to jump over that, but the easiest one is to commit your changes.*

For example, let's do this:

`git checkout master`

And then do a couple of commits.

This is now our history:

![](./new_branch_3.svg)
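A sketch that reproduces this kind of history, and shows the working copy changing when you move between branches. File names and messages are invented; `-c init.defaultBranch=master` only makes sure the first branch is called `master`.

```shell
cd "$(mktemp -d)"
git -c init.defaultBranch=master init --quiet
git config user.email "you@example.com"   # placeholder identity
git config user.name "Your Name"
echo 'base' > base.txt
git add base.txt
git commit --quiet -m "First commit"

# One commit in feature_x
git checkout --quiet -b feature_x master
echo 'feature' > feature.txt
git add feature.txt
git commit --quiet -m "Start feature x"

# Back to master: the working copy is updated, feature.txt is not there
git checkout --quiet master
ls                                  # only base.txt

# One commit in master
echo 'fix' > fix.txt
git add fix.txt
git commit --quiet -m "Fix something in master"

git log --oneline --graph --all     # both histories, splitting after "First commit"
```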

## Branches and merge

Master is usually the "official" history, with stuff that is finished and works well.

Why create alternative histories?

- To avoid adding broken stuff to master, breaking the code for you or everyone else
- To work on different features at the same time
- To work on real stuff vs experiments that might get discarded
- Different people working at the same time in different stuff (more on that later)
- many more...

Branches are **cheap**. Want to start working on something? Do it in a branch. Works? Great! Merge it to master (more on that in a second). Doesn't work? Just leave it there, it's not part of the "official" version, and anyone can see it if they want to help/continue/etc.

## Branches and merge

Ok, we have finished feature_x, we want those changes in master. We **merge** them:

1. Go to master

`git checkout master`

2. Merge the changes from feature_x:

`git merge feature_x`

Git will analyze master and feature_x, and port all the code changes that are in feature_x and aren't yet in master. This is powerful magic.

The resulting history will look like this:

![](./merge.svg)
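A sketch of a merge where both branches changed the same file, but in different places. The file `notes.txt` and its content are made up; `--no-edit` just accepts the default merge message instead of opening an editor.

```shell
cd "$(mktemp -d)"
git -c init.defaultBranch=master init --quiet   # make sure the first branch is called master
git config user.email "you@example.com"         # placeholder identity
git config user.name "Your Name"
printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# feature_x edits the last line
git checkout --quiet -b feature_x master
printf 'line 1\nline 2\nline 3\nline 4\nline 5 (feature_x)\n' > notes.txt
git add notes.txt
git commit --quiet -m "Edit last line in feature_x"

# master edits the first line
git checkout --quiet master
printf 'line 1 (master)\nline 2\nline 3\nline 4\nline 5\n' > notes.txt
git add notes.txt
git commit --quiet -m "Edit first line in master"

# Merge: git combines both edits in the same file
git merge --no-edit feature_x
cat notes.txt                 # has "line 1 (master)" and "line 5 (feature_x)"
git log --oneline --graph     # the merge commit has two parents
```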

## Branches and merge

Master has now:

- All the old stuff that was already in master
- All the new stuff that was changed in feature_x

It's super powerful! It even merges files which had changes in both branches, combining changes in the same final file.


... if there weren't any **CONFLICTS**

## Conflicts


Conflicts usually happen when two branches made changes to the same lines in the same files. Example:

"A commit in master, not present in feature_x, edited `x = y + 1` to `x = y + 2`"

"A commit in feature_x, not present in master, edited `x = y + 1` to `x = y + 1 + z`"

Both changes are valid, but conflict with each other. Git can't guess the correct end result: so it asks us to do it.

**Just changes to the same file WON'T conflict**. They must change the same lines (or lines right next to each other), in different ways.

## Conflicts

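Conflict resolution is somewhat simple:

1. Open each conflicted file (`git status` lists them). Git marks every conflicting region with `<<<<<<<`, `=======` and `>>>>>>>` lines, showing both versions.

2. Edit the file, leave it fixed (with the markers removed), and save it: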


```
    ...
    x = y + 2 + z
    ...
```

This is how we want the file to be after the merge.

## Conflicts

Conflict resolution is somewhat simple:

3. Add the conflicted files, and commit:

`git add file_with_conflicts_1.py file_with_conflicts_2.py  ...`

`git commit -m "Merge with conflicts solved"`

This will finally create the merge commit, with the solved conflicts as we wanted them.

![](./merge.svg)
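The whole conflict story, from creating it to solving it, as a sketch in a throwaway repo. The file `calc.py` and its first two lines are invented; the `x = ...` lines are the ones from the example above. The file is "edited" with `printf` here only so the sketch can run by itself, normally you would use your editor.

```shell
cd "$(mktemp -d)"
git -c init.defaultBranch=master init --quiet   # make sure the first branch is called master
git config user.email "you@example.com"         # placeholder identity
git config user.name "Your Name"
printf 'y = 1\nz = 5\nx = y + 1\n' > calc.py
git add calc.py
git commit --quiet -m "Add calc.py"

# feature_x changes the last line one way...
git checkout --quiet -b feature_x master
printf 'y = 1\nz = 5\nx = y + 1 + z\n' > calc.py
git add calc.py
git commit --quiet -m "Use z in x"

# ...and master changes the same line in a different way
git checkout --quiet master
printf 'y = 1\nz = 5\nx = y + 2\n' > calc.py
git add calc.py
git commit --quiet -m "Add 2 instead of 1"

# The merge stops and reports the conflict (non-zero exit status, hence "|| true")
git merge feature_x || true
git status --short    # "UU calc.py": changed in both branches, not merged
cat calc.py           # both versions, between <<<<<<<, ======= and >>>>>>> markers

# Leave the file as we want it, then add and commit
printf 'y = 1\nz = 5\nx = y + 2 + z\n' > calc.py
git add calc.py
git commit -m "Merge with conflicts solved"
```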

## Conflicts

Avoiding conflicts:

- Do small branches, merge as soon as possible. Remember: branches are cheap.
- Don't spend weeks without merging and sharing changes with others.
- They will still happen, but smaller conflicts are **way easier** to solve than 100s of lines of conflicts.

## Tags

Now something easier for a break :)

Tags are names for specific commits. Similar to branches, but they don't move at all.

Think of them as "human readable ids".

Usually used to tag releases ("1.0", "1.1.3", "2.0beta", ...)

## Tags

Add a tag to the current commit like this:

`git tag -a "1.0.3" -m "Release 1.0.3, best version ever"`

You can also add a tag to a past commit:

`git tag -a "1.0.0" HASH_OF_OLDER_COMMIT -m "Release 1.0.0 (oops, forgot to tag it last month)"`

Tags are shown in `gitg` and `git log`, but you can also list them with `git tag -n9`. And of course they can be deleted.
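A sketch with all of that together: tagging the current commit, tagging an older one, listing and deleting. The file, the messages and the `oops` tag are made up for the example.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.email "you@example.com"   # placeholder identity (also recorded in the tags)
git config user.name "Your Name"

echo 'v1' > app.txt
git add app.txt
git commit --quiet -m "First release"
first=$(git rev-parse HEAD)               # remember the hash of this older commit

echo 'v2' > app.txt
git add app.txt
git commit --quiet -m "Second release"

git tag -a "1.0.3" -m "Release 1.0.3"             # tag the current commit
git tag -a "1.0.0" -m "Release 1.0.0" "$first"    # tag a past commit

git tag -a "oops" -m "Tagged by mistake"
git tag -d "oops"                                 # and delete it

git tag -n9                    # lists 1.0.0 and 1.0.3, with their messages
git show --no-patch "1.0.0"    # tag details, plus the commit it points to
```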


## Tags

And again, they don't move with commits. They will always point to the same commit.

![](./tags.svg)

## Working with other people (remote repos)

Up until now, all our commits and branches lived in our personal repo.

How do we share them with others?

Our repo can have a list of "remotes": repos in other computers, with which it can exchange commits and data.

And there are two main actions you can do with them:

- **Push** commits to them, that you have and they don't have
- **Pull** commits from them, that they have and you don't

There's also another action, that just fetches info from the remote repo, but doesn't apply any commits to our local branches: **Fetch**.

![](./push_pull.svg)

## Working with other people (remote repos)

You could add every other computer from the team as a remote, manually, with `git remote add NAME URL`, and coordinate push/pulls with them. But that's insane, not the usual way.

The usual way: **a server has a "central" repo**, and everyone adds that repo as the remote called `origin`. Then everyone pushes to and pulls from that repo.

There's even a shortcut to start working from a remote repo: `git clone URL`, which creates a local repo, adds the remote repo as `origin`, and pulls all its commits.

## Working with other people (remote repos)

This way, when a new team member appears, they just do:

`git clone URL_OF_SERVER`

And done! They have a repo with all the commits to start working.

## Working with other people (remote repos)

And how do we push and pull? We push from a local branch to a branch in the remote repo, and we pull from a branch in the remote repo into a local branch.

For example: we pull changes from origin's master branch, to our local master branch, so we get the latest stable changes.

Or: we push changes from our local feature_x branch, to origin's feature_x remote branch, to share the progress of our feature with someone else.

`git pull REMOTE_NAME BRANCH_NAME`

`git push REMOTE_NAME BRANCH_NAME`

And finally, we can fetch remote data without applying it to any local branch, with:

`git fetch`

This is usually done so our local git knows which branches and other stuff the remote has, and we didn't know yet.
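You can try push and pull without any server. In this sketch a "bare" repo in a local folder plays the role of the central repo, and two folders (`alice` and `bob`, invented names) play the role of two team members. With a real server only the URL would change.

```shell
work=$(mktemp -d)
cd "$work"

# The "central" repo: a bare repo (only the repo data, no working copy).
# "-c init.defaultBranch=master" makes sure the main branch is called master.
git -c init.defaultBranch=master init --quiet --bare central.git

# Alice has a local repo, and adds the central one as the remote called origin
git -c init.defaultBranch=master init --quiet alice
cd alice
git config user.email "alice@example.com"   # placeholder identity
git config user.name "Alice"
git remote add origin "$work/central.git"
echo 'version 1' > app.txt
git add app.txt
git commit --quiet -m "First version"
git push --quiet origin master              # share her master with origin
cd ..

# Bob starts by cloning the central repo
git clone --quiet "$work/central.git" bob

# Alice pushes one more commit
cd alice
echo 'version 2' > app.txt
git add app.txt
git commit --quiet -m "Second version"
git push --quiet origin master
cd ..

# Bob pulls it into his local master
cd bob
git pull --quiet origin master
cat app.txt          # version 2
git log --oneline    # both of Alice's commits
```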

## Working with other people (remote repos)

What if we want to push feature_x, but the server doesn't have a branch called feature_x? Git will create it in the remote repo automatically when we do `git push origin feature_x`.


What if we want to have a branch feature_y that someone else pushed, but we don't have a local feature_y branch? Simple: update the remote data first (with `git fetch`), and then checkout a branch with the same name. Git will know that it has to create a local branch that "matches" the remote branch:

`git fetch`

`git checkout feature_y`
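Both situations in one sketch, using a local bare repo as the "central" repo again. The `alice` and `bob` folders, the file names and the messages are all invented for the example.

```shell
work=$(mktemp -d)
cd "$work"

# Central repo, plus Alice's repo with a first commit pushed to master
git -c init.defaultBranch=master init --quiet --bare central.git
git -c init.defaultBranch=master init --quiet alice
cd alice
git config user.email "alice@example.com"   # placeholder identity
git config user.name "Alice"
git remote add origin "$work/central.git"
echo 'version 1' > app.txt
git add app.txt
git commit --quiet -m "First version"
git push --quiet origin master
cd ..

# Bob clones at this point: he only knows about master
git clone --quiet "$work/central.git" bob

# Alice pushes a branch that doesn't exist yet in the central repo
cd alice
git checkout --quiet -b feature_y master
echo 'work in progress' > y_notes.txt
git add y_notes.txt
git commit --quiet -m "Start feature y"
git push --quiet origin feature_y           # the remote branch is created by the push
cd ..

# Bob updates his remote data, and checks out the branch
cd bob
git fetch --quiet
git branch -r               # now lists origin/feature_y too
git checkout feature_y      # creates a local feature_y that follows origin/feature_y
cat y_notes.txt             # work in progress
```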

## Working with other people (remote repos)

Example:

![](./remote_1.svg)

![](./remote_2.svg)

![](./remote_3.svg)

![](./remote_4.svg)

![](./remote_5.svg)

![](./remote_6.svg)

![](./remote_7.svg)

![](./remote_8.svg)

![](./remote_9.svg)

## Working with other people (remote repos)

Quite an important tip here: always do a `pull` before making any changes to master, so you are sure you are working with the latest version of master.

![](./remote_10.svg)

![](./remote_11.svg)

## Working with other people (remote repos)

So, to summarize:

**To start working on something new**:

Create a branch from the latest version of master.

`git checkout master`

`git pull origin master`

`git checkout -b my_feature master`

**To share your not-ready changes so others can help/check/etc**:

Push your branch to the server.

(be sure to be in your branch)

`git push origin my_feature`

**To check a branch from someone else**:

Fetch remote data, and checkout their branch in your repo.

`git fetch`

`git checkout their_feature`

**To get changes someone pushed to your branch**:

Just pull from the remote.

(be sure to be in your branch)

`git pull origin my_feature`

**To finally merge your changes to master and immediately share them to the remote master**:

Go to master, update it (pull). Then merge your changes, then share them (push).

`git checkout master`

`git pull origin master`

`git merge my_feature`

(solve conflicts if any)

`git push origin master`

## Having a central repo

So, we need a central repo in a server. How do we do that?

Currently, in my opinion, [GitLab](https://about.gitlab.com/) is by far the best option:

- It's open source
- It's easy and nice to use
- Fairly easy to install and maintain (better if using docker)
- Has all the important features and more (including great continuous integration)

They provide online hosting for repos, free for open source and paid for companies. But you can just download and install your own GitLab in your servers for free, without any link to their servers. They just provide options if you don't want to host it yourself.

## Common workflows

I showed you the most basic version, but there are others. And you can define your own, but I suggest sticking to the known solutions.

The workflows mostly dictate what branch is used for what (example: master for stable versions) and how branches are created and merged (like using rebase, something we didn't see).

"Gitflow" is a widespread one. Complex, but quite useful.

## GitHub, forks

Finally: what is GitHub?


"Just" a website that allows you to host your central repo there. It's free for public repos (usually open source software). Private repos used to require a paid plan, though nowadays they are also available for free, with paid plans for extra features.

But the website isn't open source itself, and it does not provide a free version that you can run on your own servers (only a paid enterprise edition). It's a service: you push your repo (code) there, but it's hosted on their servers.

And it's also a **social network**. Most open source projects are there (millions!). Most developers know how to use it. Nowadays, a good GitHub account is even better than a CV*. It shows how you program, what tools you use and how, etc.

(\*for those who are lucky enough to be able to contribute to open source, on their own time or at work. That can be unfair to those who can't.)

## GitHub, forks

People fork a lot in GitHub... what's a Fork?

A fork is just a copy of a repo, but owned by a different person. For example: Django is owned by the Django Software Foundation (their own accounts, etc). But I have my own "fork", my full copy of Django's repo, in which I can make changes without asking anyone.

How does that help to contribute? Why isn't that chaos?

People will use the official repo. And people will do the changes they want to propose in their own forks. When they think their changes are ready, they create a "Pull Request": basically, a ticket that asks the official repo to pull some specific changes from the fork, into the official repo.

If the official people accept the pull request, the changes are pulled into the official repo, for everyone to enjoy.

If the official people don't want your changes, they simply reject the pull request. You can still use your own fork with the changes made, and even share with other people if they want them. They just won't be in the "official" version.

## Slightly more advanced things to research:

- `git log` has filters, sorting, etc
- `git show` can also be used with tags, branches, and has expressions for stuff like "show the current commit - 3", etc
- `git add -p` can add only parts of changed files
- `git checkout` can be used to checkout any version (tag, hash, branch, etc)
- `git stash` can be helpful to store changes that you don't want to commit right now
- `git rebase` is a very different way of integrating changes from one branch into another (an alternative to merging)
- hooks: a way of automatically running code on each commit, push, pull, etc