# Version Control Systems (but mostly _git_)

<div align="center">
  <img src="./img/git-platforms.png"></img>
</div>

### What is the motivation behind their creation?
Programmers need a platform where they can store their code, organize projects, and distribute and update source code easily. Tracking code changes (and thus software versions), as well as the speed and transparency of all these operations, are also desired features and essential aspects of software development. Software that fulfills these criteria is referred to as a _"version-control system (VCS)"_ or a _"source-control management (SCM) system"_. Deploying such software on a remote server that is connected to the internet covers everything detailed above. If you write any code during your career, you'll almost certainly need software like this.

Development on systems like these is fairly straightforward. The main code base is stored on a server, where every developer of the project has access to it. Developers download this code base to their own machines and keep working on it on their own. There is, however, a serious problem that has to be addressed in any workflow like this (and it's probably not even limited to programming): the developers of a project are constantly changing the same code base in parallel. These "branching" or parallel development states are called _development branches_ or _forks_ of a software. Developers working on different branches often modify the same parts of the code, or develop parts in parallel that are incompatible with each other.

To address the resulting _code conflicts_ or _merge conflicts_, you need a system that can handle, or at least _makes it possible to handle_, the errors that emerge when multiple people work on the same thing simultaneously. The "systems" or software capable of doing so are the version control systems.
<div align="center">
  <img src="./img/branches.png" style="height: 560px;"></img>
</div>
<div align="center">
  <font size="2">Visualization of the development branches created by a single developer in a project.<br>Source of image: <a href="https://medium.com/@kamil.mowinski/trunk-based-development-f0366e838890">https://medium.com/@kamil.mowinski/trunk-based-development-f0366e838890</a></font>
</div>
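
To make the idea of a merge conflict less abstract, here is a minimal sketch that provokes one on purpose in a throwaway directory. The branch name `feature`, the file `greeting.txt` and the demo identity are all made up for the illustration, and it assumes that _git_ is already installed.

```bash
# Minimal sketch: two branches edit the same line, so the merge conflicts.
# The scratch directory, the branch "feature" and greeting.txt are made-up names.
demo="$(mktemp -d)"
cd "$demo" || exit 1
git init -q .
git config user.name "Demo User"          # identity stored only in this demo repo
git config user.email "demo@example.com"

echo "Hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
base="$(git rev-parse --abbrev-ref HEAD)"  # "main" or "master", depending on your setup

git checkout -q -b feature
echo "Hello from the feature branch" > greeting.txt
git commit -q -am "Change greeting on feature"

git checkout -q "$base"
echo "Hello from the base branch" > greeting.txt
git commit -q -am "Change greeting on $base"

# Both branches changed line 1, so git stops and marks the conflict inside the file.
git merge feature || echo "merge stopped: edit greeting.txt, then commit"
cat greeting.txt
```

After the failed merge, `greeting.txt` contains both versions between `<<<<<<<`, `=======` and `>>>>>>>` markers. Resolving the conflict means editing the file into the state you want and committing the result.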


### Comically short history of _git_
From 2002 until 2005, the development of Linux took place on a version control system called [BitKeeper](https://www.bitkeeper.org/). In 2005 BitKeeper stopped providing free versions of its client, forcing its users to either pay for a BitKeeper license or switch to another version-control system. In response, Linus Torvalds, the creator of Linux, started the development of a new system called _git_. His goal was to create a version-control system that suits all the needs of Linux development and provides features that did not exist in freely available systems. Oh, and of course it had to be faster than any version-control system available at that time. Two weeks after he started the project, the very first version made its public debut. Since then, _git_ has become the most popular version control system of our time.

<div align="center">
  <img src="./img/git-popularity2.png" style="height: 700px;"></img>
</div>
<div align="center">
  <font size="2">Popularity of <i>git</i> vs other once popular VCS between 2004-2023.<br>Source of image: <a href="https://trends.google.com/trends/explore?date=all&q=%2Fm%2F05vqwg,%2Fm%2F012ct9,%2Fm%2F02rvgkm,%2Fm%2F09d6g,%2Fm%2F08441_">Google Trends</a></font>
</div>

### GitHub, GitLab etc.
Free-of-charge online hosting providers for code development, which build on _git_ and add their own functionality on top of it, are essential in modern programming. On sites like these, users can register for free and access code through the websites themselves or via a command line interface (CLI). These websites usually provide additional features and storage for paying users, but the most important features are (again, usually) available for free. Since GitHub is the most popular online platform for this, this guide will focus on that in particular.

The motivation behind teaching _git_ and GitHub is that these two are deeply intertwined with the modern programming community, just like eg. StackOverflow. If you're working as a "programmer", or doing any kind of work that is related to programming, you're bound to use _git_ and GitHub for most of your career. The overwhelming majority of important (open source) software development takes place on GitHub, and sometimes on other websites that you can interact with using _git_ (eg. GitLab, Bitbucket etc.). If someone wants to work on any project like this, they obviously have to use _git_ and GitHub. Companies also mostly use _git_ in their private networks to manage their code base. GitHub also serves well as a _portfolio_ for someone who wants to make a career with their programming skills: it's a perfect platform for showcasing your (hopefully better) code to the world. If you're working on some code from more than one computer, then _git_ and GitHub are probably the easiest way to exchange code updates between them. Even if someone works on a single computer, GitHub can serve as a free backup storage for their projects.

---

# Setup _git_ and GitHub

## I.  Creating and setting up a GitHub profile

1. Go to the [GitHub website](https://github.com/).
2. Sign up for an account with a non-cringe username (please) and with an existing email address.
3. Activate your email address etc...
4. That's it, you're done!

## II.  Installing _git_
While there is some GUI software for _git_ on Windows, it's a command line utility at its core. That's why I'm using _git_ this way here. Here's a short guide on how to install _git_ on Windows, MacOS or Linux.

1. Install the _git_ software
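    1. In the case of Windows, we usually install _Git Bash_, a terminal emulator for Windows that is designed for the use of _git_  
        >Install Git Bash from the [git downloads page](https://git-scm.com/downloads).
    2. On Linux, MacOS or on Windows Subsystem for Linux (WSL) we only need to install the _git_ software package using an available package manager. The exact command depends on the package manager, a few common ones are:
        > ```bash
        > $ sudo apt install git    # Debian, Ubuntu (also the usual choice on WSL)
        > $ sudo yum install git    # Fedora, RHEL, CentOS
        > $ sudo pacman -S git      # Arch Linux
        > $ brew install git        # MacOS with Homebrew
        > ```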

2. Set up _git_ with your "credentials". These are usually your username and/or your email address. In case of a closed project (eg. at a company), where the code is stored on a private server, your "credentials" are usually just your username on that specific server. In case you're using _git_ to manage code stored on GitHub/GitLab/etc., these are the username and email address that you registered with on these sites. Registering these "credentials" on a local device can be done by modifying the configuration of your _git_ installation from Git Bash (in case of Windows) or simply from the terminal (in case of Linux or WSL):
```bash
$ git config --global user.name  "github-username"
$ git config --global user.email "email@of-your-github.com"
```
3. _(Optional)_ Other configuration options and first time installation tips can be found on the [git website](https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup).

4. **Set up your SSH connection and config.** Since August 2021, GitHub no longer accepts your account password for _git_ operations: you authenticate either with a personal access token (over HTTPS) or with an SSH key. This guide uses SSH keys. If you're using _git_ for the first time, you probably don't have any SSH keys set up yet. But don't worry, we'll get to that later.
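
To double-check what step 2 actually stores, the following sketch runs the same two `git config` commands and reads the values back. It deliberately points `HOME` at a made-up scratch directory (`demo_home`), so it does not touch a real `~/.gitconfig`. On your own machine you would simply drop the `HOME=...` prefix.

```bash
# Sketch: run the config commands against a throwaway HOME and read the values back.
# demo_home is a made-up scratch directory, your real ~/.gitconfig is not touched.
demo_home="$(mktemp -d)"

HOME="$demo_home" git config --global user.name  "github-username"
HOME="$demo_home" git config --global user.email "email@of-your-github.com"

# --get prints a single value, --list prints everything stored at this level.
HOME="$demo_home" git config --global --get user.name
HOME="$demo_home" git config --global --get user.email
HOME="$demo_home" git config --global --list
```

The `--global` level applies to every repository of your user account. A value set without `--global` inside a repository only applies to that repository and takes precedence over the global one.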

## III.  Setting up an SSH key for GitHub
### III.1.  Security over the internet
When you're connecting to a website over the internet, you're actually connecting to a _physical (web)server_, which stores the data of the website that is shown on your computer screen. Most parts of this process are handled by your browser. There are a lot of cases however, when you need to connect to a server from a command line/terminal/shell. (Just an fyi: these three terms mean [completely different](https://www.geeksforgeeks.org/difference-between-terminal-console-shell-and-command-line/) things, however they are often used interchangeably.) In these cases, you need to use a _secure protocol_ to connect to the server.

Nowadays, when you're connecting to a website from your browser, you're doing it in a secure way via the so-called _Hypertext Transfer Protocol Secure_ (HTTPS). Just like that, you can connect to servers from your command line/terminal/shell via eg. the _Secure Shell Protocol_ (SSH). These _secure protocols_ ensure that any information exchanged between the server and the client (you) is encrypted. Of course, there were times when secure connections did not exist, neither in browsers nor in command line interfaces. Now they are an essential standard if you're using computers in any way.

### III.2.  SSH and asymmetric key-pairs
Like every "protocol" in computer science and technology (just like HTTPS above), SSH is just a "theoretical model" or "design", which first has to be implemented by actual software that can run on a computer. The most widely used implementation of SSH is called _OpenSSH_. Nowadays OpenSSH is not a single program, but rather a collection of programs merged into a so-called _software suite_.

The goal of SSH is to let its users authenticate themselves securely when they're trying to connect to a server _somehow_. In practice it's mostly used when a user wants to connect to a server via the command line. Its software implementation, OpenSSH, provides several [authentication methods](https://www.golinuxcloud.com/openssh-authentication-methods-sshd-config/#OpenSSH_Authentication_Methods) (6 in total), like using a password or an asymmetric key-pair. In the majority of use-cases, key-pairs are the preferred authentication method. Besides giving us the comfort of not needing to type our passwords while still connecting securely to a machine<a href=#ft1>$^{[1]}$</a>, asymmetric key-pairs are also considered to be a much more secure authentication method than passwords.

In OpenSSH, this key-pair authentication method is based on _asymmetric cryptography_ or _public-key cryptography_. This means that the authentication method uses a pair of keys: a _public key_ and a _private key_. The former is stored on the server's side, while the latter is on the client's (your) side. The purpose of the private key (held by you) is to authenticate yourself to the server (it's basically like showing your ID card). On the other side, the public key (on the server) is used to check whether your private key (your digital ID card) is valid or not. Of course, the story is much more complex than that. _Asymmetric cryptography_ itself refers to the practice of encrypting and decrypting messages using a unique, private-public key-pair, but that's a completely different topic for a completely different lecture.

<div align="center">
  <img src="./img/ssh-key-pair.png" style="width: 600px;"></img>
</div>
<div align="center">
  <font size="2">Asymmetric key-pair files (both public and private key) in the <code>.ssh</code> directory on my computer.</font>
</div>
<br>

Asymmetric cryptography is designed in a way that as long as you keep your private key a secret (so readable by you and only you), no potential impersonator can pretend to be you and gaslight the server into believing it. Since both the public key and the private key are simple text files stored on a computer, it is possible for someone to actually steal your private key (eg. copy the file after breaking into your account). To mitigate this problem, you can protect the private key with an optional _passphrase_ (a password essentially), which you need to type every time you try to connect to a server and authenticate yourself using the private key. Even if someone gets hold of your private key file, it's still useless without the (hopefully secure) passphrase.
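
The "private key proves, public key checks" idea can be tried out in a few lines. This is only an illustration of the principle, not what OpenSSH literally does during a login. It assumes that the `openssl` command line tool is installed, and all file names are made up.

```bash
# Illustration only: the real SSH handshake is more involved than this.
# Assumes the openssl CLI is installed; all file names are made up.
work="$(mktemp -d)"

# 1. Create a key-pair: the private key stays with "you", the public key goes to the "server".
openssl genrsa -out "$work/private.pem" 2048 2>/dev/null
openssl rsa -in "$work/private.pem" -pubout -out "$work/public.pem" 2>/dev/null

# 2. The "server" sends a challenge and "you" sign it with the private key.
printf 'random challenge from the server' > "$work/challenge.txt"
openssl dgst -sha256 -sign "$work/private.pem" \
  -out "$work/challenge.sig" "$work/challenge.txt"

# 3. The "server" checks the signature using only the public key.
openssl dgst -sha256 -verify "$work/public.pem" \
  -signature "$work/challenge.sig" "$work/challenge.txt"

# 4. The same signature does not fit any other message.
printf 'tampered challenge' > "$work/tampered.txt"
openssl dgst -sha256 -verify "$work/public.pem" \
  -signature "$work/challenge.sig" "$work/tampered.txt" \
  || echo "signature rejected"
```

The point to take away is that step 3 never needs the private key. The server can confirm that you hold it without ever seeing it.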

OpenSSH also provides some quality of life tools that help you connect to a server even more easily, without any security trade-off. We'll talk about these a bit later too.

#### Footnotes
<font size="2"><p id=ft1>$[1]$: In the case of OpenSSH, this is because you only need to choose a single authentication method out of the 6 available ones. If you go with key-pairs, you don't need to use a second authentication method in addition, like typing your password.</p></font>

#### Notes to clear up any confusion
- SSH : A computer protocol. It's a "design" of connecting computers securely over an otherwise unsecured network.
- OpenSSH : The most popular software implementation of the SSH protocol, that's why I'm speaking about only this in this tutorial. Nowadays this is a collection of software under the name "OpenSSH" providing lots of other features too besides the actual implementation of SSH.
- Asymmetric cryptography/public-key cryptography: A cryptographic method/system, which is widely used in computer authentication and which is based on using "asymmetric key-pairs" (also called "public and private key-pairs"). It's the cryptographic method behind the key-pair authentication of OpenSSH.

### III.3.  Creating an SSH key-pair to connect your GitHub and a single device
All steps detailed below are similar on Windows, Linux and MacOS. While Windows users need to execute the commands below in Git Bash, Linux and MacOS users can use their regular command line interface (or "terminal" to sound less ridiculous). Also, since Windows 10, you can use the [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10) (WSL) to run Linux commands in a Linux environment on your Windows machine. This is a great way to get started with Linux if you're not familiar with it yet and don't want to install a virtual machine or dual-boot your computer with Linux.

#### III.3.1.  Add an SSH key-pair to GitHub **(tl;dr)**
GitHub has a self-explanatory and easy to follow tutorial about how to add an SSH key to your account. The procedure consists of 2 (+1 optional) steps:
1. Generate an SSH key-pair with your GitHub credentials for your device and register your private key in your OpenSSH setup. ([Link to tutorial](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent))
2. Add the public key to your GitHub account and test your key-pair. ([Link to tutorial on adding an SSH key to your GitHub account](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account) and [Link to tutorial on how to test your SSH key](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/testing-your-ssh-connection).)
3. _(Optional)_ Setup your SSH config. (Configuration tutorial is presented below.)  
    Note: While you can add as many public keys to your GitHub account as you'd like, you should still generate a new key-pair for every device you want to use GitHub with via an SSH connection. _(Theoretically you can use the same key-pair on multiple devices, but that's not a road you want to go down ever, trust me. Even if you'd make it work, you'd be more of a security liability than anything else.)_

#### III.3.2.  Add an SSH key-pair to GitHub **(verbose)**
A detailed and verbose tutorial is presented here just for the sake of clarity.
##### **1.  Generating an SSH key-pair**
1. Open Git Bash on Windows or any terminal on Linux.
2. Create a `.ssh` directory in your `~` (home) directory, give it the necessary permissions and `cd` into it. In Git Bash on Windows, `~` normally points to `C:\Users\your-username`, while in WSL it is a separate Linux home directory (`/home/your-username`). Either way, it's the directory where you're (usually) placed when you start up your terminal. Even if that's not the case, you can get there using `cd ~`, though that's not even necessary for any of the commands here if you run them as-is.
> ```bash
> $ mkdir -p ~/.ssh && chmod 700 ~/.ssh && cd ~/.ssh
> ```

(Just like the `.ssh` directory, the public and private keys, the SSH `config` file and other related files and folders need to be given the appropriate permissions for them to work as intended. A comprehensive list can be found in [this topic](https://superuser.com/questions/215504/permissions-on-private-key-in-ssh-folder). Here `700` stands for "can be read, written and executed by only you")
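
If the octal numbers look cryptic: each digit is the sum of read (4), write (2) and execute (1), and the three digits stand for the owner, the group and everyone else, in this order. The small helper below (a made-up function, `describe_mode`, not part of _git_ or OpenSSH) translates a mode into the `rwx` notation printed by `ls -l`.

```bash
# Sketch: turn an octal mode such as 700 into the rwx notation that `ls -l` shows.
# describe_mode is a made-up helper name, not part of git or OpenSSH.
describe_mode() {
  local mode="$1" out="" digit i
  for (( i = 0; i < ${#mode}; i++ )); do
    digit="${mode:i:1}"
    # Each digit is a sum of read (4), write (2) and execute (1).
    if (( digit & 4 )); then out+="r"; else out+="-"; fi
    if (( digit & 2 )); then out+="w"; else out+="-"; fi
    if (( digit & 1 )); then out+="x"; else out+="-"; fi
  done
  printf '%s\n' "$out"
}

describe_mode 700   # rwx------  owner: everything, group and others: nothing
describe_mode 600   # rw-------  owner: read and write only
describe_mode 644   # rw-r--r--  owner: read and write, everyone else: read only
```

You can compare the result with what `ls -ld ~/.ssh` prints for your own directory.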

3. Use the `ssh-keygen` program from OpenSSH to generate the key-pair. Several different cryptographic algorithms (like RSA or EdDSA etc.) can be used for key-pair generation, and most of them are implemented in OpenSSH. It's maybe not obvious, but some of these algorithms are more secure than others. Generally it's advised to always use the safest available algorithm for key generation. As of March, 2023, the best practice is to use the so-called `Ed25519` digital signature scheme.

    Now, how to use `ssh-keygen` to generate a key-pair? Besides telling `ssh-keygen` via the `-t` command line argument which algorithm you want to use, you also pass a comment with `-C`, which is stored in the public key and helps you identify it later. By convention, this is the email address that you've registered with on GitHub. The command should look like this, where you obviously substitute the dummy email address with the real one linked to your GitHub account:
> ```bash
> $ ssh-keygen -t ed25519 -C "your@email.com"
> ```

4. After hitting enter, `ssh-keygen` will ask you 3 questions. The first one is the following:
    1. **Enter a filename to save the key in** (If the prompt is left empty, the default value is `~/.ssh/id_{name-of-the-algorithm}`)

        This question asks you where you want to save the key-pair and what you want to name the $2$ created files (public and private key). On a technical note, here you're actually asked to give an absolute OR a relative path to the files you want to create. Since we want to create them in the `.ssh` directory (and because we've already `cd`-ed into it in the previous step), both `~/.ssh/file-name` and `file-name` are correct answers here. (In the first case the key-pair is generated with the file name `file-name` under the `~/.ssh/` directory, while in the second case it's generated inside the current working directory, so, again, in `~/.ssh/`, since we're there currently.)
   
        It is a good practice though to name your key-pairs according to what you want to use them for (which is GitHub in this case). For this first question, you can simply type something like `~/.ssh/id_github` or just `id_github`.
    2. **Enter a passphrase!**
    3. **Enter this passphrase again!**
    
   As it was already discussed, a passphrase is used to give an additional layer of protection to your key-pair (it's a password for the key-pair basically). Whether you want to give a passphrase to your key-pair or not is up to you. If you don't, just press enter two times to skip both questions. If you want to specify a passphrase, then keep in mind, that you have to type it every time, when you want to use your key-pair. For example, when you want to clone a repository from GitHub, you have to type in the passphrase every time you want to do so. This is a bit annoying, but it's a good practice to use a passphrase for your key-pair, since it makes it harder for an attacker to use your key-pair to access your GitHub account.

5. After you've completed the previous steps, two messages about the successful creation of the `id_github` private key and the `id_github.pub` public key files will be displayed. Below that, the public key's SHA256 fingerprint and its randomart will also be printed. The fingerprint is a short string computed from the public key with the SHA256 hashing algorithm (so the same key always gives the same fingerprint), and it's used to identify and compare keys more easily. Similarly, the randomart is an ASCII art image derived from the same key that serves the same purpose, but for humans! A human can recognize a randomart image much more easily than a long character chain and can tell whether it's the same as the one generated for the key-pair.

   After you've finished with this setup, your terminal should look something like this:
<div align="center">
  <img src="./img/ssh-keygen.png"></img>
</div>

6. Register your key-pair in OpenSSH so that your computer is aware of its existence. This is done with another program called `ssh-agent`, which has to be started **in the background**. That's slightly trickier than starting it in the terminal itself. Once the agent is running, `ssh-add` hands your private key over to it:
> ```bash
> $ eval "$(ssh-agent -s)"
> $ ssh-add ~/.ssh/id_github
> ```

7. To ensure that the generated public and private keys have the correct permissions, set them manually now:
> ```bash
> $ chmod 600 ~/.ssh/id_github && chmod 644 ~/.ssh/id_github.pub
> ```

(Here `600` stands for "can be read and written only by you", while `644` stands for "can be read and written by only you and can be only read by anyone else")
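
For reference, steps 3 to 7 can also be run without the interactive questions. The sketch below generates a throwaway key-pair in a made-up scratch directory (`key_dir`) with an empty passphrase, which is fine for trying things out but not something I'd recommend for a real key. For a real key you would use `~/.ssh` and leave out `-N ""` so that `ssh-keygen` asks for a passphrase.

```bash
# Sketch: generate a throwaway key-pair without prompts and inspect it.
# key_dir is a made-up scratch directory; a real key would live in ~/.ssh.
key_dir="$(mktemp -d)"
chmod 700 "$key_dir"

# -f sets the file name, -N sets the passphrase ("" means none, demo only).
ssh-keygen -q -t ed25519 -C "your@email.com" -f "$key_dir/id_github" -N ""

chmod 600 "$key_dir/id_github"
chmod 644 "$key_dir/id_github.pub"

# -l prints the fingerprint of a key file, adding -v prints the randomart too.
ssh-keygen -l -f "$key_dir/id_github.pub"
ssh-keygen -l -v -f "$key_dir/id_github.pub"

ls -l "$key_dir"
```

Running `ssh-keygen -l -f` on the same file again always prints the same fingerprint, which is what makes it useful for comparing keys.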

##### **2.  Adding your public key to your GitHub account**
After you've registered to GitHub, activated your email address and logged in, you have total control over your GitHub account. Now you can add the public key generated in the steps above as follows:

1. Visit [github.com](https://github.com/) and log in if you did not already.
2. Click on your profile image in the upper right corner of your homepage and go to **Settings**. 
<div align="center">
  <img src="./img/github-ssh-1.png" style="height: 360px;"></img>
</div>
3. On the left hand side of the page, click the **SSH and GPG keys** option.
<div align="center">
  <img src="./img/github-ssh-2.png" style="height: 360px;"></img>
</div>
4. Click on the big, green **New SSH key** button (you can't miss it).
<div align="center">
  <img src="./img/github-ssh-3.png" style="height: 180px;"></img>
</div>
5. Copy and paste the contents of the `~/.ssh/id_github.pub` file into the large text box. You can give an arbitrary name to it to identify the key easily.
<div align="center">
  <img src="./img/github-ssh-4.png" style="height: 360px;"></img>
</div>
6. Click on the green **Add SSH key** button and you're done!
7. Test your connection to GitHub **by opening a new Git Bash/terminal** and trying to `ssh` into GitHub. The very first time you connect to any server, OpenSSH will ask whether you trust this connection or not. In case of this test, you should type `yes` and press enter. If everything goes well, you'll see a message starting with `Hi (USERNAME)!`. This indicates that the SSH connection between your device and GitHub works and you're good to go.
> ```bash
> $ ssh -T git@github.com
> #The authenticity of host 'github.com (IP ADDRESS)' can't be established.
> #RSA key fingerprint is SHA256:(PUBLIC KEY FINGERPRINT).
> #Are you sure you want to continue connecting (yes/no)? YES
> #Hi (USERNAME)! You've successfully authenticated, but GitHub does not
> #provide shell access.
> ```

   Just some clarification for this command above. If you look it up, you'll find that the `-T` flag of `ssh` means that "it disables pseudo-tty allocation". Okay, what the hell does it mean? It consists of two parts: 
   - First of all, "TTY" simply means "terminal" and you can come across this acronym in many places if you're working with computers and command lines. It originates from the word _teletypewriter_, the "father" of physical computer terminals and the "grandfather" of terminal emulators, so of any "terminal" you run on your computer. Some well-known examples of the latter are eg. _PuTTY_ on Windows or _Terminal_ (previously called _Windows Terminal_, renamed in February, 2022). _Terminal_ should also not be confused with _cmd.exe_ (the default command-line interpreter of Windows) or _PowerShell_.
   - Second, the description of the `-T` flag in "normal language" means that it disables sending a terminal start-up request to the remote machine (which is otherwise sent by default). It's a very common and important practice to always pass this `-T` flag to the `ssh` command when you're doing an SSH connection test. The reason for this is that a large majority of remote servers that people use with `ssh` for whatever reason reject access to a remote terminal. If that's the case, then any `ssh` test without the `-T` flag will be unsuccessful, the remote server will reject our `ssh` request and we'll be confused why our perfect `ssh` setup did not work.
   
   In case of GitHub this is actually unnecessary, because while it tells us that "...but GitHub does not provide shell access.", it's still configured to handle the commands of careless users appropriately. Still, following good practices is always **very much advised**, because life won't be so kind to us when it comes to other servers.
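
One detail worth knowing if you ever want to script this test: as far as I can tell, `ssh -T git@github.com` finishes with exit status 1 even when the login works (GitHub closes the session right after greeting you), so a script has to look at the message instead of the exit code. A minimal sketch, with a made-up helper name (`github_login_ok`) and a canned message so that it runs without network access:

```bash
# Sketch: decide from the printed message whether GitHub accepted the key.
# github_login_ok is a made-up helper name.
github_login_ok() {
  # $1: the text printed by `ssh -T git@github.com`
  case "$1" in
    *"successfully authenticated"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Real use (needs network access and a registered key):
#   output="$(ssh -T git@github.com 2>&1)"
output="Hi (USERNAME)! You've successfully authenticated, but GitHub does not provide shell access."

if github_login_ok "$output"; then
  echo "SSH key accepted by GitHub"
else
  echo "SSH login failed: $output"
fi
```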

#### III.3.3. _(Optional)_ Configure and test your SSH settings

##### **1.  A short introduction**
Something that I would have loved to be aware of when I started using SSH is the SSH configuration file. It has two features that are essential quality of life services for any SSH enjoyer:
1. If you're using numerous servers (just like people who work with computers do every day), it can be tiresome to type full-length SSH commands every time you're trying to connect to a server. These can be shortened to single aliases using a correctly set up SSH `config` file.
2. The `ssh-agent` software that registers and handles your key-pairs does not remember your keys across reboots. This means that it needs some encouragement every time you turn on your computer for it to work properly. To avoid re-adding keys to the `ssh-agent` every time you restart your device, you can define keys in the `config` file and configure them to be added to the agent automatically the first time they're used (this is what the `AddKeysToAgent` option is for).
<div align="center">
  <img src="./img/ssh-config.png" style="height: 360px;"></img>
</div>

The `config` file can be created (`touch`) under the `~/.ssh/` directory and then given the necessary permissions (`chmod 600`), simply by
> ```bash
> $ touch ~/.ssh/config && chmod 600 ~/.ssh/config
> ```

This file can then be edited with any text editor. Its most important feature is that it collects every credential of a server that you would otherwise have to type explicitly into the command line every time you want to connect. Instead of typing all this info, you can define an _alias_ for any _host_ ("server" in other words) inside the `config` file. When you want to connect to a server now, you only have to specify the alias. Everything else is handled by OpenSSH.

The `config` file has a straightforward syntax:
```
Host alias_1
  Option_1 value_1
  Option_2 value_2
  ...
  
Host alias_2
  Option_1 value_1
  Option_2 value_2
  ...
.
.
.
```

The necessary `Options` that should be specified for any host are the `HostName` (IP or domain name of the server), `User` (your username on the server) and `IdentityFile` (path to the corresponding private key file). A comprehensive list with detailed descriptions of the possible SSH config options can be found on the [official docs page](https://man.openbsd.org/ssh_config).


##### **2.  An example for an arbitrary server**
Let's consider the case where you're trying to connect to one of the main servers ("head node") at the Wigner Research Centre for Physics. The server-side SSH software is listening on port $2222$, so besides your credentials and the address of the server, you have to specify that too. The full command to access an account would look like this:

> ```bash
> $ ssh -p 2222 username@opteron.gpu.wigner.mta.hu
> ```

That's not really convenient to type 54 times per day. Let's create an entry for it in the `config` file and even attach a private key to it that was previously configured for this server and my device. The entry would look like this:

```
Host wig-whatever
  HostName opteron.gpu.wigner.mta.hu
  Port 2222
  User username-goes-here
  IdentityFile ~/.ssh/wigner-example
```
The first line specifies the arbitrary alias (`wig-whatever`) assigned to a specific _host_ machine. The other entries below that specify the credentials and configurations regarding _how to connect to the host machine_. Now if I want to connect to the server, all I need to type is the following:

> ```bash
> $ ssh wig-whatever
> ```

Much easier, right? The same can be set up for GitHub to ensure that your SSH key-pair is working under any use-case. Since the communication method is for some reason hard-coded in case of _git_ and GitHub, the whole entry should exactly look like this:

```
Host github.com
  HostName github.com
  User git
  PreferredAuthentications publickey
  IdentityFile ~/.ssh/whatever-you-call-your-github-private-key
```

While both the alias and the `HostName` should be specified as `github.com`, you can still name your private key as you would like.
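
If you're unsure whether an entry does what you think it does, `ssh -G` prints the settings that OpenSSH would use for a given alias without connecting anywhere (this needs a reasonably recent OpenSSH, 6.8 or newer as far as I know). The sketch below writes the example entry from above into a made-up scratch file and passes it with `-F`, so that a real `~/.ssh/config` stays untouched.

```bash
# Sketch: write the example entry to a scratch file and ask ssh how it resolves the alias.
# ssh -G only prints the effective settings, it does not connect anywhere.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
Host wig-whatever
  HostName opteron.gpu.wigner.mta.hu
  Port 2222
  User username-goes-here
  IdentityFile ~/.ssh/wigner-example
EOF
chmod 600 "$cfg"

# -F points ssh at the scratch file instead of ~/.ssh/config.
ssh -G -F "$cfg" wig-whatever | grep -E '^(hostname|port|user) '
```

With your real config in place, `ssh -G github.com` (without `-F`) shows in the same way which `identityfile` is going to be offered to GitHub.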

# Using _git_ and GitHub

## I. Basic structure of GitHub
GitHub consists of so-called _code repositories_. A code repository (or simply _repository_ or _repo_ for short) is like a "folder" and any user can create an arbitrary number of them. These "folders" or repos are used to organize and isolate individual projects or cohesive lists of files from each other. Repositories can be set either public or private:
- "**Public repo**" means that anyone can download the contents of the repository and see what's inside it.
- "**Private repo**" means that the only users who can see the repository are those selected by you.
<div align="center">
  <img src="./img/github-repos-1.png" style="height: 400px;"></img>
</div>

Inside a repo you can find the files themselves, as well as some information about the repository and the code base (eg. a _readme_ with a long description at the bottom of the page, a short description, the list of contributors, the proportion of programming languages used in the project and others). 
<div align="center">
  <img src="./img/github-repos-2.png" style="height: 550px;"></img>
</div>

## II. Interacting with GitHub using _git_

The majority of interactions of users with their GitHub repositories are limited to a handful of the most basic _git_ commands. This means that learning how to use _git_ and GitHub takes approximately 5-10 minutes for a complete beginner. Substantially more to also get comfortable with them, but it's just a matter of practice.

The most important interactions of a user with GitHub are the following:
1. **Creating a new repository**.
2. **Downloading a repository** to a _local machine_ (i.e. your computer).
3. **Updating the repository on GitHub**: Uploading the new or modified files to the repository **on GitHub**.
4. **Updating the repository on the local machine**: Downloading new or modified files that exist on GitHub, but not on the local machine.

Other common interactions with a repository:

5. Restoring a previous version of the repo.
6. Exploring differences (due to modification) between the code base in the online repository and on the local machine.
7. Creating and working on different so-called _branches_ of the same code base (won't be detailed in this lecture note)

Every _git_ command starts with `git`, followed by a "subcommand", which specifies what the command will do. Eg. `git clone` downloads (or clones) a repository to your machine (or to a new location/directory), while `git pull` downloads the changes from an online repository to your already existing local one. More on the important commands in the next sections.
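
To see how these subcommands fit together before involving GitHub at all, here is a small sketch in which a local "bare" repository plays the role of the server, so nothing leaves your machine. The directory names (`remote.git`, `laptop`, `desktop`), the file name and the demo identity are all made up.

```bash
# Sketch: a local bare repository plays the role of GitHub.
# remote.git, laptop and desktop are made-up directory names for the demo.
demo="$(mktemp -d)"
git init -q --bare "$demo/remote.git"

# "Download" the (still empty) repository onto the first machine.
git clone -q "$demo/remote.git" "$demo/laptop"
git -C "$demo/laptop" config user.name "Demo User"
git -C "$demo/laptop" config user.email "demo@example.com"

# Create a file, record it in a commit and upload the commit.
echo "first note" > "$demo/laptop/notes.txt"
git -C "$demo/laptop" add notes.txt
git -C "$demo/laptop" commit -q -m "Add notes"
git -C "$demo/laptop" push -q origin HEAD

# A second machine clones the repository and gets the file.
git clone -q "$demo/remote.git" "$demo/desktop"

# The laptop uploads another change ...
echo "second note" >> "$demo/laptop/notes.txt"
git -C "$demo/laptop" commit -q -am "Extend notes"
git -C "$demo/laptop" push -q origin HEAD

# ... and the desktop downloads it into its already existing copy.
git -C "$demo/desktop" pull -q
cat "$demo/desktop/notes.txt"
```

With GitHub the only difference is the address: instead of a local path you would clone something like `git@github.com:your-username/your-repo.git`.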

### 1. Creating a new repository on GitHub
There are multiple ways to create a new repository using _git_, but since we're using GitHub and not a private server, probably the easiest way is to let GitHub create it and set it up for us. E.g. on your homepage you can click the "+" sign in the upper right corner of the page and then click on the **New repository** option:
<div align="center">
  <img src="./img/github-new-repo-1.png" style="height: 280px;"></img>
</div>

This will open a new page, where you can configure all basic settings of your new repository. For clarity, I'll discuss it in two parts. The first part consists of the quite obvious settings. Here you can give a unique name to your repository and optionally a very short description that will be shown on the right hand side of the page when people open your repo on GitHub. Here you can also set the visibility of your repository.

GitHub gives you an idea of what repository names on GitHub look like by convention (only lowercase letters, with words separated by a `-` symbol). Here I've used their tip and also set the visibility of this repository to private:
<div align="center">
  <img src="./img/github-new-repo-2.png" style="height: 350px;"></img>
</div>

The second part consists of the non-trivial settings and options. The page asks whether you want to initialize this new repository with a README and/or a _gitignore_ and/or a license file. If you do not select any of these and press the green "Create repository" button, GitHub will show you a new page. On this page GitHub explains that it is advised to create every repository with all of the above, and shows you a tutorial on how to add them right away, either automatically or from the command line.

Okay, what is the purpose of these files and why do we need them at all?
- `README.md` : This file is the primary documentation of the repository. Here you can summarize what your code is all about, how to use it etc., anything you'd like to tell someone about your code in particular. Some good examples of READMEs from serious projects can be found e.g. [here](https://github.com/deepmind/alphafold/) or [here](https://github.com/tensorflow/tensorflow). **A repository should always be initialized with a README file**. So at least this checkbox should always be ticked.
- `.gitignore` : _(Optional, but recommended)_ Tells _git_ which files or folders to ignore inside the repository. You can specify both file names and file extensions here with a very basic syntax. Every file that is created locally on a machine, but matches an entry in the `.gitignore`, will not be uploaded to the online GitHub repository when the user tells _git_ that "okay, refresh and update the online repository with my changes and modifications". It's useful for ignoring temporary or cache files, or large data files. **The best practice is to upload only those files that are necessary for the project and are not automatically generated.** If you're working with e.g. Jupyter Notebooks, C/C++ or TeX/LaTeX, **unnecessary files will be generated in every case**. You don't want to see them in your repository, so it's advised to ignore them using `.gitignore`. To lend us a helping hand, GitHub offers lots of pre-built `.gitignore` files that you can select from a drop-down menu during repository creation.
- `LICENSE` : _(Optional, but recommended)_ A specific license can be chosen for any project and automatically generated with your credentials for that specific repo. If you just collect your homework in a repo it doesn't matter, but if you're developing something more serious (even during your studies), then it's nice to have. Usually the MIT license is recommended for smaller projects, and you can select it from a drop-down menu during repository creation on GitHub.
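
To get a feeling for what `.gitignore` does, here is a minimal sketch that you can run in a throwaway folder. The ignore rules and the file names (`report.tex`, `build/` etc.) are made up for this demo and are not taken from any of GitHub's pre-built templates:

```bash
# Sandbox demo: everything happens in a temporary directory.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q .

# Hypothetical ignore rules for a LaTeX project: two extensions and one folder.
printf '%s\n' '*.aux' '*.log' 'build/' > .gitignore

# One file worth keeping and a few automatically generated ones.
mkdir build
touch report.tex report.aux report.log build/report.pdf

# Stage everything; git skips whatever matches a rule in .gitignore.
git add .

# List what was actually staged.
staged="$(git diff --staged --name-only)"
echo "$staged"
```

Only `.gitignore` itself and `report.tex` should show up in the list. If you're ever unsure whether a file is ignored, `git check-ignore -v <file>` prints the rule that matches it.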

In this screenshot I've initialized the new repository with a README file, added a pre-built `.gitignore` for TeX/LaTeX files and added the GNU GPLv3 license just for the sake of example:
<div align="center">
  <img src="./img/github-new-repo-3.png" style="height: 350px;"></img>
</div>

If everything was successfully configured in the previous screen and you press the "Create repository" button, GitHub will redirect you to your new repository, where you should see something like this:
<div align="center">
  <img src="./img/github-new-repo-4.png" style="height: 400px;"></img>
</div>

([Succotash](https://en.wikipedia.org/wiki/Succotash) is apparently a dish of North American origin. Its main ingredients are sweet corn and beans. Thank you GitHub for the fantastic name recommendation, very cool, very swag, very poggers, I like it.)

### 2. Downloading a repository to a local machine
You can download (or "clone") any public repository from GitHub, GitLab, Bitbucket etc. with the `git clone` command. All these storage provider websites use an almost identical layout for repositories, so I'll showcase the "cloning" process using only GitHub.

If you open a public repository (or a private repo that you have access to) in your browser, then above the box that shows the list of files in the repository, you'll see multiple buttons in the upper right corner. By clicking on the "Code" button, a pop-up will come up and list your options on how you can download the contents of this repository:
<div align="center">
  <img src="./img/git-clone-1.png" style="height: 400px;"></img>
</div>

Choose one of the options that work from the command line (so HTTPS, SSH or GitHub CLI). Other providers usually offer only the HTTPS and SSH options, but GitHub also has "GitHub CLI", a separate command-line tool (`gh`) that is used similarly to `git`, but is designed specifically for GitHub. If you're interested in it, you can read more about it [here](https://cli.github.com/).

For now we'll use the SSH option for 2 reasons:
1. We already established a working SSH connection between GitHub and our machine in the steps above. With SSH keys you never have to send a password to GitHub, and it's also easier to use.
2. As it was already mentioned, GitHub stopped accepting account passwords for _git_ operations over HTTPS (this was announced in 2020 and has been enforced since August 2021). So if you want to use HTTPS, you have to generate a personal access token (PAT) and use that instead of your password. This is a bit more complicated and much more inconvenient than SSH, so I won't cover it here.

Copying and pasting the path of the repository after a `git clone` command will create a folder with the same name as the repo itself in your current working directory and download the contents of the repository into that folder:
<div align="center">
  <img src="./img/git-clone-2.png" style="height: 400px;"></img>
</div>

(I'm keeping all the local versions of my GitHub repositories inside a folder named `GitHub` that resides in my home directory, that's why I've `cd`-ed into it.) Now you're ready to start working on the code base locally on your machine!
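
If you'd like to try cloning without touching a real GitHub repo, here is a small offline sketch. It assumes that a local "bare" repository (`succotash.git`, a name I've picked just for this demo) can play the role of the online repo. With GitHub you'd paste the SSH path from the "Code" button in its place:

```bash
# Sandbox demo: everything happens in a temporary directory.
sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for a repository hosted on GitHub: a local "bare" repo.
git init -q --bare succotash.git

# Default behaviour: the new folder is named after the repo (without ".git").
# git warns that the cloned repository is empty, which is expected here.
git clone -q succotash.git

# Optional second argument: choose the name of the folder yourself.
git clone -q succotash.git my-copy

# Both folders exist now and both remember where they were cloned from.
ls -d succotash my-copy
git -C succotash remote get-url origin
```

The last command prints the address that _git_ stored under the name `origin`. This is what later `git pull` and `git push` calls talk to by default.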

### 3-4. The Four Horsemen of _git_: `pull`, `add`, `commit`, `push`

#### The basics
As it was already mentioned, the majority of _git_ commands that developers usually execute are the most basic ones. To better understand what the basic _git_ commands do, one has to understand the inner structure of the "stages" of _git_. The image below shows these $3+1$ stages with the corresponding commands that can be used to move back-and-forth between them.
<div align="center">
  <img src="./img/git-stages-bg.png" style="height: 400px;"></img>
</div>
<div align="center">
  <font size="2">The "stages" in <i>git</i> and some related <i>git</i> commands.<br>Source of image: <a href="https://medium.com/@nmpegetis/git-how-to-start-code-changes-commit-and-push-changes-when-working-in-a-team-dbc6da3cd34c">https://medium.com/@nmpegetis/git-how-to-start-code-changes-commit-and-push-changes-when-working-in-a-team-dbc6da3cd34c</a></font>
</div>
<br>

The four most important commands of _git_ help you navigate between the four stages of _git_. Both the meaning of these stages and the usage of the commands are easiest to understand by discussing them together:
1. `git pull` **(Updating local)** : Downloads all updates (file changes) from an existing online repository and applies them to a local clone of the same repo. (In the simplest case, when you have no new local snapshots of your own, _git_ just moves your local branch forward to the newest downloaded snapshot. This is referred to as a _fast-forward_.)
2. `git add` **(Tracking local changes)** : Adds files to the "staging" area (or simply "stages files"). The staging area serves the purpose of a "checkpoint". It makes it possible to track file changes without any irreversible consequences. Files added to the staging area can be restored to the state in which they were staged if any modification happens to them afterwards.
3. `git commit` **(Creating snapshot)** : Creates a permanent and finalized snapshot of the staged changes in the local repository. _Git_ works on a snapshot basis. "Snapshots" are those states of a project that are saved and kept in the repository history. During development you can go back-and-forth between these snapshots to revert the code base back to some previous state.
4. `git push` **(Updating remote)** : Updates the online repository with the snapshots created in the local repository.
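
The whole cycle above can be tried out offline with the sketch below. It assumes that a local "bare" repository stands in for GitHub, and the folder names (`remote.git`, `lab`, `laptop`), the file `data.txt` and the demo identity are all placeholders that I've made up for this example:

```bash
# Sandbox demo: everything happens in a temporary directory.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q --bare remote.git       # stands in for the repository on GitHub
git clone -q remote.git lab         # first local clone (still empty)

cd "$sandbox/lab"
git config user.name "Demo User"    # placeholder identity, needed by commit
git config user.email "demo@example.com"

echo "run 1" > data.txt             # workspace: a new file appears
git add data.txt                    # workspace -> staging area
git commit -q -m "Add first measurement"   # staging area -> local repo
git push -q -u origin HEAD          # local repo -> online repo (-u remembers the target)

cd "$sandbox"
git clone -q remote.git laptop      # second machine clones the repo

cd "$sandbox/lab"
echo "run 2" >> data.txt            # more work happens in the first clone
git add data.txt
git commit -q -m "Add second measurement"
git push -q

cd "$sandbox/laptop"
git pull -q                         # online repo -> second local clone
cat data.txt                        # now contains both runs
```

Note that the first `git push` names its target explicitly, because a clone of a completely empty repository has nothing to track yet. In a repo created on GitHub with a README this is normally not needed and a plain `git push` is enough.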

#### Some necessary notes on the Four Horsemen
While `git pull` and `git push` work well on their own by default, `git add` and `git commit` do not. Both of them have many optional flags and arguments, but also a few that you'll need practically every time (only listing the necessary ones here):
- `git add [<pathspec>...]` : You have to explicitly define which files to add to the staging area. In simple projects you're good to go with the command
> ```bash
> $ git add .
> ```

   which tells _git_ to add **every modified or new file** to the staging area **from the current working directory and every subdirectory below it**. (This means that you have to execute this command from the main directory of the project to really add all modified files in the whole repository to the staging area.) Of course, sometimes you only want to add specific files to the staging area, not all of them. In that case, list those files explicitly instead of `.`, and in any case **always use `git add` very carefully!**
- `git commit -m "<msg>"` : You have to add a _commit message_ enclosed in `""` quotation marks after the `-m` flag when using `git commit` (without `-m`, _git_ opens a text editor and asks for the message there). The purpose of this message is to summarize in a **compact** (in 5-6 words total) and **meaningful** way the changes in the committed snapshot, compared to the previous one. A short and meaningful commit would look something like this:
> ```bash
> $ git commit -m "Mark ChatRender#render as ApiStatus.Override"
> ```

   or
> ```bash
> $ git commit -m "Deployed unit tests for cgr.RNG module"
> ```

   or
> ```bash
> $ git commit -m "uploaded 2nd homework and presentation"
> ```

   The emphasis is on the word _meaningful_. Of course, no one has the energy to write a perfect commit message for every single commit throughout their whole career. But you still have to try, because it makes your repository look professional, and it makes it easier for you and anyone else to follow how the code base evolved without having to look at the actual code changes. As a bad example, here are some commit messages that are not really helpful in any way:
<div align="center">
  <img src="./img/commit-m-1.png" style="width: 1000px;"></img>
</div>
<div align="center">
  <img src="./img/commit-m-2.png" style="width: 1000px;"></img>
</div>
<div align="center">
  <img src="./img/commit-m-3.png" style="width: 1000px;"></img>
</div>
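
Coming back to the warning about `git add`: the sketch below shows what "adding only specific files" looks like in practice. It's a sandbox demo under my own assumptions, so the file names (`analysis.py`, `notes.txt`) and the demo identity are invented for illustration:

```bash
# Sandbox demo: everything happens in a temporary directory.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q .
git config user.name "Demo User"    # placeholder identity, needed by commit
git config user.email "demo@example.com"

# First snapshot, so that there is something to compare against.
echo "v1" > analysis.py
git add analysis.py
git commit -q -m "Add analysis script"

# Two unrelated changes pile up in the workspace.
echo "v2" > analysis.py
echo "scratch" > notes.txt

# Stage only the file that belongs in the next snapshot ...
git add analysis.py

# ... and check the result: the first column of the short status is the
# staging area, the second one is the workspace ("??" means untracked).
git status --short

# The snapshot contains analysis.py only; notes.txt stays local.
git commit -q -m "Update analysis script to v2"
git status --short
```

Running `git add .` instead of `git add analysis.py` here would have put the scratch notes into the snapshot as well, which is exactly the kind of accident the warning is about.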

### 5-6. Other helpful commands and tricks

#### Tracking file changes made in your local repository
- Using `git status`:
> ```bash
> $ git status
> ```

   This command displays the "status" of the repository, which means it will show you which files have been modified, which of them reside in the staging area right now, and whether there are local snapshots waiting to be pushed to the online repo.

- Using `git diff`:
    - Basic aesthetics:
    > ```bash
    > $ git diff [--stat]
    > ```

      The command `git diff` displays the exact, line-by-line changes made to the files in the repository:
      <div align="center">
        <img src="./img/git-diff-1.png" style="width: 1000px;"></img>
      </div>

      Adding the `--stat` flag makes it display only the names of the changed files and the number of changed rows in each of those files:
      <div align="center">
        <img src="./img/git-diff-2.png" style="width: 1000px;"></img>
      </div>
    - Check modifications between workspace and staging area:
    > ```bash
    > $ git diff
    > ```

    - Check modifications between workspace and local repo:
    > ```bash
    > $ git diff HEAD
    > ```
    
    - Check modifications between staging area and local repo:
    > ```bash
    > $ git diff --staged
    > ```
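
The difference between the three `git diff` variants is easiest to see when a file has one staged and one unstaged change at the same time. Here is a sandbox sketch of that situation, where the file name `notes.txt` and the demo identity are placeholders of mine:

```bash
# Sandbox demo: everything happens in a temporary directory.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q .
git config user.name "Demo User"    # placeholder identity, needed by commit
git config user.email "demo@example.com"

printf 'line 1\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"        # local repo now contains "line 1"

printf 'line 2\n' >> notes.txt      # first change ...
git add notes.txt                   # ... goes to the staging area
printf 'line 3\n' >> notes.txt      # second change stays in the workspace

echo "--- workspace vs staging area ---"
git diff                            # reports only "line 3" as added

echo "--- staging area vs local repo ---"
git diff --staged                   # reports only "line 2" as added

echo "--- workspace vs local repo ---"
git diff HEAD                       # reports both "line 2" and "line 3"
```

Each added line shows up with a leading `+` in the output, so you can directly see which comparison picks up which change.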

#### Tracking file changes made in the online repository

- Check changes in an online repository without overwriting local files (as `git pull` would do):
> ```bash
> $ git fetch && git diff HEAD @{u}
> ```

   The command `git fetch` downloads the new snapshots that were pushed to the online repository, but it doesn't touch your workspace or your local branch. Here `@{u}` is _git_'s shorthand for the online counterpart (the "upstream") of your current branch, so `git diff HEAD @{u}` shows the exact differences between your latest local snapshot and the freshly fetched online one, without overwriting any local files by accident.

- List snapshots in a repository in a nice way
> ```bash
> $ git log --pretty=oneline --graph --decorate --all
> ```

   Snapshots are checkpoints of the development process that you can revert the state of your code base to. `git log` can display all commits and commit messages in a clear and readable way.
<div align="center">
  <img src="./img/git-log.png" style="width: 1000px;"></img>
</div>
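
To see that fetching really leaves your files alone, here is an offline sketch with two clones of the same repository. It assumes a local "bare" repo in place of GitHub, and the names `alice`, `bob`, `remote.git` and `data.txt` are invented for the demo. To compare against the fetched state it names the upstream branch explicitly with `@{u}`, _git_'s shorthand for the online counterpart of the current branch:

```bash
# Sandbox demo: everything happens in a temporary directory.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q --bare remote.git       # stands in for the repository on GitHub
git clone -q remote.git alice

cd "$sandbox/alice"
git config user.name "Alice"        # placeholder identity, needed by commit
git config user.email "alice@example.com"
echo "v1" > data.txt
git add data.txt
git commit -q -m "Add data"
git push -q -u origin HEAD

cd "$sandbox"
git clone -q remote.git bob         # second clone, in sync for now

cd "$sandbox/alice"
echo "v2" > data.txt                # Alice changes the file and pushes it
git add data.txt
git commit -q -m "Update data"
git push -q

cd "$sandbox/bob"
git fetch -q                        # download the new snapshot, nothing else
git diff HEAD '@{u}'                # local snapshot vs fetched online snapshot
git log --pretty=oneline --graph --decorate --all
cat data.txt                        # the workspace is unchanged: still "v1"
```

In the log, the local branch and its `origin/...` counterpart point to different commits until `git pull` is executed. The quotes around `@{u}` are only there to be safe in shells that treat braces specially.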

#### Reverting accidental commands
A huge benefit of _git_ and GitHub is that nothing is really irreversible. Or at least for 30 days after an accident... Even accidental and complete file deletion can be reverted. Of course, the complexity of commands grows as the accident becomes more and more severe. Eg. while it was quite stressful, I was able to restore one of my repositories after accidentally purging all of the files in my local repository and in the online repo as well.

I won't give any specific example here, because **these commands should be approached with enormous caution**. For every little accident, you can find thorough and detailed descriptions of how to restore your repository to the exact state you want. But you can really f*** this up, if you're not careful enough.

## Short example of using git and GitHub
1. Imagine you have a repository on GitHub that is managed by only you. You're doing some measurements in a lab at the university and you want to upload your datafiles from the lab computer to your GitHub (because of course, what else would a sane person do in this situation). You hack the lab computer, install _git_, set up an SSH key and download your "ELTE-physlab57" repository to the computer via SSH using the command

```bash
$ git clone git@github.com:username/ELTE-physlab57.git
```

2. You put the files into the downloaded folder and then you upload all new and modified files in it back to GitHub using the following commands:

```bash
$ cd ELTE-physlab57
$ git add .
$ git commit -m "added datafiles from lab computer"
$ git push
```

3. You go home and 6 days later you start working on your lab report, because you have to hand it in before 23:59. (You only have 6 hours until that.) (POV: you're a university student.) First you want to download the datafiles from GitHub to your own computer to start working with them. You already have your repository cloned to your machine, so you just `cd` into its folder and download the datafiles with

```bash
$ cd ELTE-physlab57
$ git pull
```

You're done.