# Git

[Git](https://git-scm.com/) is an open source version control system, originally authored by Linus Torvalds in 2005. Git was created for the development of the Linux kernel, but quickly spread to other organisations as well. Although Git can version any kind of file, it is typically used by software engineers to version code.

Version control systems (VCSs) let you track changes to files and collaborate on them with others. Some of the primary features of VCSs are:
- Tracking how files have changed over time,
- Being able to revert files to their state at a specific point in time,
- Allowing multiple people to collaborate on the same set of files,
- Gracefully handling situations where multiple people make simultaneous changes to the same files.

Although Git is definitely not the only VCS available, it is by far the most dominant one. In the 2022 [StackOverflow developer survey](https://survey.stackoverflow.co/2022/#section-version-control-version-control-systems) over 93% of respondents stated that they use Git as their primary version control system. In comparison, the second most popular VCS was [SVN](https://en.wikipedia.org/wiki/Apache_Subversion) at 5%.

Some companies, like Facebook, choose to use non-Git version control systems like Mercurial. More about the Facebook case can be read [here](https://engineering.fb.com/2014/01/07/core-infra/scaling-mercurial-at-facebook/) and [here](https://github.com/facebook/sapling). One of the reasons companies might choose an alternative VCS is the enormous scale of their codebases.

## Git overview



Git versions the data by storing full snapshots of it at various stages.

Many other VCSs instead track only the changes made to the original files rather than full snapshots. With that approach, the latest state of a file is reconstructed by taking the original file and applying every change ever made to it. This difference typically does not affect day-to-day Git workflows, but keeping the snapshot model in mind helps to explain some aspects of how Git works, e.g. detached heads, forced changes and so on.
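To make the snapshot idea a little more concrete, here is a minimal sketch that can be run in a throwaway directory. The file names, the demo identity and the temporary directory are illustrative assumptions, not part of any real project. It creates two commits and shows that each commit lists every tracked file, while the unchanged file simply points at the same stored object in both snapshots.

```shell
# Throwaway identity so the demo commits work on an unconfigured machine
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Work in a temporary, disposable repository
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# First commit: two files
echo "first version" > a.txt
echo "never changes" > b.txt
git add a.txt b.txt
git commit -q -m "Add a.txt and b.txt"

# Second commit: only a.txt changes
echo "second version" > a.txt
git add a.txt
git commit -q -m "Update a.txt"

# Each commit lists every tracked file, not just the one that changed
git ls-tree HEAD~1
git ls-tree HEAD

# Look up the object ID each snapshot records for each file
old_a="$(git rev-parse HEAD~1:a.txt)"
new_a="$(git rev-parse HEAD:a.txt)"
old_b="$(git rev-parse HEAD~1:b.txt)"
new_b="$(git rev-parse HEAD:b.txt)"
echo "a.txt: $old_a -> $new_a"
echo "b.txt: $old_b -> $new_b"
```

The ID printed for `b.txt` should be identical in both commits, while the one for `a.txt` differs. So a "full snapshot" does not have to mean that unchanged content is stored twice.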

Git has three main states that files can be in: *modified*, *staged* and *committed*.

> - Modified means that you have changed the file but have not committed it to your database yet.
> - Staged means that you have marked a modified file in its current version to go into your next commit snapshot.
> - Committed means that the data is safely stored in your local database.

More on this on [git-scm.com](https://git-scm.com/book/en/v2/Getting-Started-What-is-Git%3F).
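The three states can be observed directly. The sketch below is only an illustration: `notes.txt`, the demo identity and the temporary directory are made-up names. It uses `git status --porcelain`, a script-friendly variant of `git status`, to capture how one file is reported in each state.

```shell
# Throwaway identity so the demo commit works on an unconfigured machine
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Committed: nothing differs from the last commit, so the status is empty
committed_state="$(git status --porcelain)"

# Modified: the file differs from the last commit but is not staged yet
echo "more" >> notes.txt
modified_state="$(git status --porcelain)"

# Staged: the change is marked to go into the next commit
git add notes.txt
staged_state="$(git status --porcelain)"

echo "committed: '$committed_state'"
echo "modified:  '$modified_state'"
echo "staged:    '$staged_state'"
```

In this output format the first column describes the staging area and the second the working directory, so an `M` in the second column means *modified* and an `M` in the first column means *staged*.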

Once the code is *committed* locally, it can be pushed to a remote repository to make it available to other people as well. Remote repositories need to be hosted somewhere, and this is where Git platforms like [github.com](https://github.com), [gitlab.com](https://gitlab.com) and others come into play. These platforms let you create repositories and collaborate on them with other people for free.

Once the code is published (*pushed*) to a remote repository, the changes become immediately available for everyone else to *pull*.

## Basic Git commands

Git is primarily a command-line tool. There are various GUIs that wrap Git and might make some use cases easier, e.g. [SourceTree](https://www.sourcetreeapp.com/), [Git Extensions](https://gitextensions.github.io/) or various VS Code extensions.

The command-line interface (CLI) is the simplest (though not necessarily the easiest) way to work with Git, so we are going to focus on CLI commands.

The commands are covered here in simplified form and only at the surface level; all of them are much more nuanced than this overview suggests. The goal is to provide the minimal information needed to start working with Git and syncing changes with a remote repository. The official docs at [git-scm.com/docs](https://git-scm.com/docs) describe these commands in full detail.

#### `clone`

Clones a remote repository, i.e. makes a copy of that repository on your local machine.

Whenever you go to some Git platform, you can usually find a URL for cloning, which looks something like this: `https://github.com/smagurauskas/software-engineering.git`. Depending on the authentication scheme used, these URLs might look completely different.

In this example, the full command to clone this repository would look like this: `git clone https://github.com/smagurauskas/software-engineering.git`.
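For experimenting without network access, `clone` also accepts a path on the local disk instead of a URL. The sketch below assumes a made-up, empty repository called `remote.git` as a stand-in for a hosted one, and a hypothetical folder name `my-copy`.

```shell
work="$(mktemp -d)"
cd "$work"

# Stand-in for a hosted repository: a bare repository on the local disk
git init -q --bare remote.git

# Clone it; the optional second argument picks the local folder name
# (Git may warn that the repository is empty, which is expected here)
git clone -q remote.git my-copy

# The clone remembers where it came from under the name "origin"
cd my-copy
origin_url="$(git remote get-url origin)"
echo "origin points at: $origin_url"
```

The name `origin` is the default name Git gives to the repository you cloned from, which is why it shows up in the `pull` and `push` examples.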

#### `status`

Displays the status of the current working directory: the current branch and which files are staged or modified.

Usage: `git status`.

#### `add`

Stages files to be committed. Only staged files are included in the `commit`. If you have edited multiple files locally but want to commit only some of the changes, `add`ing just those files lets you pick exactly what goes into the commit. Staged files are typically displayed in green text by the `status` command.

- Adding a single file: `git add path/to/file.txt`.
- Adding all files in the current directory (including subdirectories): `git add .`.
- Adding all the files with the `.cs` extension: `git add *.cs`.
- Adding the entire working tree: `git add -A`.
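As a small illustration of selective staging, the sketch below creates two files but stages and commits only one of them. The file names `ready.txt` and `draft.txt`, the demo identity and the temporary directory are assumptions made for the example.

```shell
# Throwaway identity so the demo commit works on an unconfigured machine
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Two new files, but only one of them is ready to be committed
echo "finished work" > ready.txt
echo "still experimenting" > draft.txt

# Stage just the finished file
git add ready.txt

# List what the next commit would contain
staged_files="$(git diff --cached --name-only)"
echo "staged: $staged_files"

git commit -q -m "Add ready.txt"

# draft.txt was left out of the commit and is still untracked ("??")
leftover="$(git status --porcelain)"
echo "left over: $leftover"
```

`git diff --cached --name-only` is one way to double-check the staged set before committing; `git status` shows the same information in a more verbose form.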

#### `commit`

Commits the staged files. A commit records a revision of all the staged changes together with a message describing what those changes are about. A commit must be created before the changes can be pushed to a remote repository. If needed, the code can later be reverted to the state it was in at a specific commit.

Usage: `git commit -m "Message indicating what was done"`. How to write a good message is a much-discussed topic, which will be reviewed a bit later, but the general guideline is that the message should be relatively short and should unambiguously describe the changes included in the commit.

Alternatively, you can simply type `git commit`, which prompts Git to open an editor where the message can be specified together with a longer accompanying description.
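A short message and a longer description can also be supplied without opening an editor, by passing `-m` more than once. The sketch below uses an invented `settings.conf` file and invented message text; it then reads the subject and body back from the log to show how Git stored them.

```shell
# Throwaway identity so the demo commit works on an unconfigured machine
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

echo "timeout=30" > settings.conf
git add settings.conf

# The first -m becomes the subject line, the second a separate body paragraph
git commit -q -m "Add default timeout setting" \
              -m "Requests were hanging forever without one."

# Read the two parts back from the most recent commit
subject="$(git log -1 --format=%s)"
body="$(git log -1 --format=%b)"
echo "subject: $subject"
echo "body:    $body"
```

Keeping the subject short and moving the reasoning into the body tends to keep one-line views such as `git log --oneline` readable.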

#### `pull`

Pulls changes from a remote repository. If the incoming changes are compatible with the local changes, they are applied automatically. If there are incompatibilities (called *conflicts* in this context), you will be prompted to resolve them manually.

Usage: `git pull {repository} {branch}`. Typically it looks something like this: `git pull origin main`.

You can pull from the remote branch that your current branch is synced with, or from any branch within its ancestry line. For example, if I branched my `working_branch` from the `main` branch, it is valid for me to pull changes from both `main` and `working_branch` into my local `working_branch`.
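The simplest pull scenario can be tried out offline. In the sketch below a bare repository on disk plays the role of the hosted remote, and two clones named `alice` and `bob` stand in for two collaborators. All of these names, the file contents and the demo identity are assumptions made for illustration.

```shell
# Throwaway identity so the demo commits work on an unconfigured machine
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work="$(mktemp -d)"
cd "$work"

# Seed repository with one commit, then publish it as a bare "remote"
git init -q seed
cd seed
echo "v1" > readme.txt
git add readme.txt
git commit -q -m "Add readme"
branch="$(git rev-parse --abbrev-ref HEAD)"  # main or master, depending on configuration
cd ..
git clone -q --bare seed remote.git

# Two collaborators clone the same remote
git clone -q remote.git alice
git clone -q remote.git bob

# Alice publishes a new commit
cd alice
echo "v2" >> readme.txt
git add readme.txt
git commit -q -m "Extend readme"
git push -q origin "$branch"
cd ..

# Bob's clone is now one commit behind; pulling brings it up to date
cd bob
before="$(git rev-list --count HEAD)"
git pull -q origin "$branch"
after="$(git rev-list --count HEAD)"
echo "commits before pull: $before, after pull: $after"
```

Because Bob had no local commits of his own, nothing needs to be merged by hand here: his branch simply moves forward to Alice's commit.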

#### `push`

Pushes committed changes to the remote repository. In order to push, the local branch must be *synced* with the head of the remote branch, or the push will be rejected. In the best case this means that the remote changes only have to be pulled first; in more complicated cases, conflicts will also have to be resolved.

Usage: `git push {repository name} {branch name}`. Typically it looks like this: `git push origin my_branch`.
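The rejection described above can be reproduced locally. The sketch below is a hedged illustration: a bare repository on disk acts as the remote, the clones `alice` and `bob` are hypothetical collaborators, and `--no-rebase --no-edit` is just one possible way to tell `pull` to merge without opening an editor.

```shell
# Throwaway identity so the demo commits work on an unconfigured machine
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work="$(mktemp -d)"
cd "$work"

# Seed repository with one commit, then publish it as a bare "remote"
git init -q seed
cd seed
echo "v1" > readme.txt
git add readme.txt
git commit -q -m "Add readme"
branch="$(git rev-parse --abbrev-ref HEAD)"  # main or master, depending on configuration
cd ..
git clone -q --bare seed remote.git
git clone -q remote.git alice
git clone -q remote.git bob

# Alice pushes first
cd alice
echo "from alice" > alice.txt
git add alice.txt
git commit -q -m "Add alice.txt"
git push -q origin "$branch"
cd ..

# Bob commits without having Alice's change, so his push is rejected
cd bob
echo "from bob" > bob.txt
git add bob.txt
git commit -q -m "Add bob.txt"
if git push -q origin "$branch" 2>/dev/null; then
    first_push="accepted"
else
    first_push="rejected"
fi

# Pulling merges Alice's commit into Bob's branch; then the push goes through
git pull -q --no-rebase --no-edit origin "$branch"
if git push -q origin "$branch"; then
    second_push="accepted"
else
    second_push="rejected"
fi
echo "first push: $first_push, second push: $second_push"
```

The two collaborators touched different files, so the pull merges cleanly. Had they edited the same lines, the pull would have stopped and asked Bob to resolve the conflict first.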

#### Working with branches

- `git branch branch_name` - creates a new branch from your current branch.
- `git switch branch_name` - changes your current working branch.

Both steps can also be done with a single shorthand command: `git checkout -b branch_name` (or `git switch -c branch_name`).

[More on branching (and merging)](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging).
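The branch commands can be tried safely in a scratch repository. In the sketch below the branch names `feature_a`, `feature_b` and `feature_c`, the demo identity and the temporary directory are made up for illustration, and a reasonably recent Git is assumed, since `git switch` and `git branch --show-current` are not available in old versions.

```shell
# Throwaway identity so the demo commit works on an unconfigured machine
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# A branch needs at least one commit to start from
echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit"
start_branch="$(git branch --show-current)"

# Create a branch, then switch to it (two separate steps)
git branch feature_a
git switch -q feature_a
on_a="$(git branch --show-current)"

# Create and switch in one step; both spellings have the same effect
git switch -q -c feature_b
git checkout -q -b feature_c
on_c="$(git branch --show-current)"

# Go back to where we started and list all local branches
git switch -q "$start_branch"
back_on="$(git branch --show-current)"
git branch --list
```

Creating a branch does not copy any files; it only adds a new name pointing at the current commit, which is why these commands finish instantly.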

#### What a sample workflow could look like

```
# Clone the remote repository and move into the newly created directory
git clone {repository URL}
cd {repository directory}

# Create a branch where the work will happen
git checkout -b my_work

# Write sample junk
echo "aaa" >> aaa.txt

# Stage the newly created file
git add aaa.txt

# Commit the staged file
git commit -m "Wrote something into a file"

# Push the changes to remote
git push origin my_work

# Pull request can now be created to propose merging changes from my_work branch into main branch
```

## Writing good commit messages

How to write them?

### Squash your commits

Squashing

## Pull requests

How to review them?

## Additional reading

