# Introduction to Git and GitHub

## Version Control System (VCS)

A Version Control System (VCS) is a software tool that tracks changes to files over time, allowing you to revert to previous versions if needed. When you add a file to the VCS, a base version of it (as it is at the moment you add it) is stored, and from then on your changes are recorded whenever you choose to save them.

You can think of it as a recording of your progress. The VCS lets you rewind to previous versions of your documents, see the changes you made in the past, keep different versions of the same document, etc.

The VCS lets us decide which changes go into the next version, and keeps useful metadata about them. The complete history of a particular project, together with its metadata, makes up a **repository**. Repositories can be kept in sync across different computers, facilitating collaboration among different people.

## Git


Git is the most popular distributed VCS. It is commonly used for both open source and commercial software development. Let's install it and set it up!

To see if you have already installed it, open a terminal and run:

```bash
git --version
> git version 2.43.0
```

The output should look like the above. If you have not installed it yet, follow the instructions for your OS (Windows, Linux or macOS):

https://github.com/git-guides/install-git

The link contains installation instructions for all three operating systems.


Now, let's add the initial configuration. For that, you need to tell Git your name and an email address. Keep in mind that this email is the one we will later use for the project and for GitHub. You can use your institutional email, or a personal one if you plan to do projects outside of work. Both your name and your email will be associated with your Git activity.

Run the following commands, with your name and email:

```bash
git config --global user.name "Your Name"
git config --global user.email "yourname@gmail.com"
```

We will also change the name of the default branch (we will talk about branches later):

```bash
git config --global init.defaultBranch main
```

To see everything that has been added to the configuration file, run:
```bash
git config --list --global
```

You can add more things to the configuration file, such as your preferred text or code editor, display settings, command aliases, etc. For now, this is enough.
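As a rough sketch of those extra settings, the commands below set an editor and a command alias. To keep the example harmless, it writes to a scratch file through `--file` instead of your real global configuration; the alias name `st` and the choice of `nano` are only illustrative assumptions.

```bash
# Scratch file standing in for the real global configuration (assumption: we do not want to modify it here)
scratch_config="$(mktemp)"

git config --file "$scratch_config" core.editor "nano"   # editor Git opens when it needs you to write text
git config --file "$scratch_config" alias.st "status"    # would let you type "git st" instead of "git status"

# Show what was written to the scratch file
git config --file "$scratch_config" --list
```

Replacing `--file "$scratch_config"` with `--global` would apply the same settings to your actual configuration.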

To finish, you can get a list of the available commands with:

```bash
git help
```

### Creating a repository

When we have a project whose changes we want to track, we can use Git. For that, we create a local Git repository, which is a place where Git stores the versions of our files. To our eyes, it is nothing more than a special kind of folder/directory. In the terminal, go to the folder where you want to create the repository:

```bash
cd /your/chosen/folder
```

Create a directory where your repository will be and move inside it:

```bash
mkdir intro_to_github_dir
cd intro_to_github_dir
```

Then, we tell Git to make it a repository:

```bash
git init
```

And run:

```bash
ls -a     # This command is to list everything in the directory, including hidden files
```

You will see that a `.git` directory is now there. This is a special subdirectory where Git stores all the information about the project.

To finish, run:

```bash
git status
```

**IMPORTANT**: there is no need to initialize a git repo inside each of the subdirectories in your project. Git will keep track of everything below the root directory, including all files inside subdirectories.
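A small experiment can make this concrete. The sketch below works in a throwaway directory and uses hypothetical names (`src/utils`, `notes.txt`); it initializes a single repository at the root and checks that a file two levels down is already visible to Git.

```bash
# Throwaway directory, so the experiment does not touch a real project
demo="$(mktemp -d)"
cd "$demo"
git init -q                          # one repository, at the root only

mkdir -p src/utils                   # hypothetical nested subdirectories
echo "some notes" > src/utils/notes.txt

# The root repository already reports the nested file as untracked
git status --short --untracked-files=all
```

No `git init` was run inside `src` or `src/utils`, yet the file shows up in the status of the root repository.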

### Tracking changes

For the purpose of seeing how we can track changes in the repository, create a .txt file, write whatever you want in the first line, and save it in the repository. In Linux and MacOS, you can do it by running:

```bash
nano file_name.txt
```

This will open the nano editor. Write something in the first line and save it (the command to save it should appear on screen). Run:
```bash
ls
```
This command lists all visible files in the current directory. You should see your newly created file. With:
```bash
cat file_name.txt
```
The contents of the file are printed on the terminal.

Now, run the command we saw before: `git status`.
What is the output?

To start tracking the changes in the file, run:

```bash
git add file_name.txt
```
Once again, run `git status`. What is the output?

The file is now staged, so Git knows about it, but we have not yet recorded a first version of it in the repository. To do that, we need to do a **commit**.

**IMPORTANT**: A commit is like a checkpoint. Once you add something to your code and consider the new feature finished, you do a commit. Git will have a list of all commits done in the history of the repo, and it allows you to go back and forth along this list.

**GOOD PRACTICES**: it is always better to do smaller commits (small changes to the code and commit) than bigger ones (big changes to the code and commit). This way, it is easier to keep track of the changes done in the code, and what was done in each commit.

To do a commit, run:

```bash
git commit -m "Created file_name.txt"
```

The `-m` flag is used to record a short, descriptive message about the changes made between the previous commit and the current one. It is useful to later remember what we did and why. Run `git status` again. What is the output?

Now run:

```bash
git log
```
To exit, just press 'q'.

Modify the .txt file again. Add a couple lines to it. Then, run `git status` and the following command:

```bash
git diff
```
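This command shows the differences between the staging area and the current version of the document; since nothing is staged yet, that is the same as comparing against the last commit. Run `git add file_name.txt` and run `git diff` again: this time it prints nothing, because the changes are now staged. To see the differences in a document you have already added, run: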

```bash
git diff --staged
```

This will look for the differences between the documents in the staging area and their versions in the previous commit. Run a new commit with:

```bash
git commit -m "Descriptive message"
```
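The difference between `git diff` and `git diff --staged` can be checked with a short experiment. This is only a sketch in a throwaway repository, with a demo identity set locally so the commit works; `--name-only` is used so that just the file names are printed instead of the full diff.

```bash
demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "Demo User"            # local identity, only for this throwaway repo
git config user.email "demo@example.com"

echo "line 1" > file_name.txt
git add file_name.txt
git commit -q -m "Created file_name.txt"

echo "line 2" >> file_name.txt              # modify the file after the commit

echo "Unstaged before add: $(git diff --name-only)"
git add file_name.txt
echo "Unstaged after add:  $(git diff --name-only)"
echo "Staged after add:    $(git diff --staged --name-only)"
```

Before `git add`, the change is reported by plain `git diff`; afterwards it is reported only by `git diff --staged`.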

**IMPORTANT**: always, before a commit, run `git add` to the new files you want to track and the modified files. This command adds the files to the **staging area** where the snapshot (commit) will be taken.
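To illustrate that only staged files end up in the snapshot, here is a minimal sketch with two hypothetical files, `notes.txt` and `todo.txt`, in a throwaway repository. Both are modified, but only one is added before the commit.

```bash
demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "Demo User"            # local identity, only for this throwaway repo
git config user.email "demo@example.com"

echo "first" > notes.txt
echo "first" > todo.txt
git add notes.txt todo.txt
git commit -q -m "Add notes.txt and todo.txt"

# Modify both files, but stage only one of them
echo "second" >> notes.txt
echo "second" >> todo.txt
git add notes.txt
git commit -q -m "Update notes.txt"

# todo.txt was not staged, so its change is not part of the commit
git status --short
```

After the second commit, `git status` still lists `todo.txt` as modified, because its change never entered the staging area.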


### Exploring history

When we run `git log`, we can see a string of alphanumeric characters for each commit. This is the commit identifier, and we can use it to refer to a specific commit when exploring the history of our repository. We can also refer to the most recent commit as `HEAD`.

```bash
git diff file_name.txt
git diff HEAD file_name.txt
```

As long as nothing is staged, these two commands give the same output, since both compare the current file against the most recent commit. However, we can run:

```bash
git diff HEAD~1 file_name.txt
git diff HEAD~2 file_name.txt
```

`HEAD~1` refers to the commit before the most recent one, `HEAD~2` to the one before that, and so on, so these commands compare the current file against older commits. You can also refer to a specific commit with the first characters of its identifier:

```bash
git diff f22b25e file_name.txt
```
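The two ways of naming a commit can be compared side by side. The sketch below builds a throwaway repository with three commits and then refers to the oldest one both as `HEAD~2` and by its abbreviated identifier, which is looked up with `git rev-parse --short` because the actual value differs on every machine.

```bash
demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "Demo User"            # local identity, only for this throwaway repo
git config user.email "demo@example.com"

# Three commits, each adding one line to the same file
for n in 1 2 3; do
  echo "line $n" >> file_name.txt
  git add file_name.txt
  git commit -q -m "Add line $n"
done

git log --oneline                              # newest commit first, with abbreviated identifiers
oldest_id="$(git rev-parse --short HEAD~2)"    # abbreviated identifier of the first commit

# Both commands name the same commit, so they print the same diff
git diff HEAD~2 file_name.txt
git diff "$oldest_id" file_name.txt
```

Both diffs show lines 2 and 3 as additions, since those are the lines written after the first commit.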

To restore a version of a document from a previous commit, run:
```bash
git restore -s f22b25e file_name.txt
```
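To discard changes you have not staged yet, `git restore file_name.txt` is enough: it brings the file back to its last staged version, which is the one in the most recent commit if nothing has been staged since.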
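Both forms of `git restore` can be tried in a throwaway repository. In this sketch, which assumes Git 2.23 or newer (the release that introduced `git restore`), the file has two committed versions; the first is brought back with `-s`, and the change is then discarded again.

```bash
demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "Demo User"            # local identity, only for this throwaway repo
git config user.email "demo@example.com"

echo "version one" > file_name.txt
git add file_name.txt
git commit -q -m "First version"

echo "second version of the file" > file_name.txt
git add file_name.txt
git commit -q -m "Second version"

git restore -s HEAD~1 file_name.txt         # bring back the content from the first commit
restored="$(cat file_name.txt)"
echo "After restore -s HEAD~1: $restored"
git status --short                          # the file shows as modified: nothing was committed

git restore file_name.txt                   # discard that change again
echo "After plain restore: $(cat file_name.txt)"
```

Note that restoring an old version only changes the file in your working directory; to keep it, you still need to `git add` and `git commit` it.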

### .gitignore

In every project, there are some documents or directories we do not want to track. We can list them in a file called `.gitignore`, placed in our repo. This way, they will not appear when we run `git status`, which makes it easier to follow the files we do care about. To create this file, run:

```bash
nano .gitignore
```

Inside it, you can specify the files to ignore in several ways. For example:

```
*.png
*.pkl
raw_data/
```

This will ignore all files that end with .png and .pkl, and everything inside the raw_data/ directory.
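The effect of these patterns can be verified in a throwaway repository. The sketch below writes the same three patterns, creates a few empty files with hypothetical names, and then asks Git which ones it ignores; `git check-ignore -v` also reports which pattern matched.

```bash
demo="$(mktemp -d)"
cd "$demo"
git init -q

# Same patterns as in the example above
printf '%s\n' '*.png' '*.pkl' 'raw_data/' > .gitignore

# Hypothetical project files
mkdir raw_data
touch plot.png model.pkl raw_data/input.csv analysis.py

# Only .gitignore itself and analysis.py are reported as untracked
git status --short --untracked-files=all

# Ask which pattern is responsible for ignoring each file
git check-ignore -v plot.png raw_data/input.csv
```

Keep in mind that `.gitignore` itself is a regular file: it is usually added and committed like any other, so that collaborators ignore the same files.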

## GitHub

A VCS is really useful for collaborating with other people. The only thing we are missing is a way to copy changes from one repository to another. In practice, the easiest thing to do is to use one copy as a central hub, and to keep it on the web rather than on someone's PC. For that purpose, we will use [GitHub](https://www.github.com).

Now, we will create an account and create our first remote repository.

### Connect local to remote repository

Now that you have your GitHub account and your first repository, the first thing we need is a way for your computer and GitHub to communicate securely.

#### SSH setup

For that, let's create an SSH key that your computer will use to communicate with GitHub. First, check whether you already have one. From the terminal, execute the following command:

```bash
ls -al ~/.ssh
```

If you have previously used SSH keys, a list of those keys should appear. Otherwise, an error will appear indicating that the directory does not exist.

Let's create a new SSH key pair:

```bash
ssh-keygen -t ed25519 -C "yourname@gmail.com"

# Some systems do not have support for the Ed25519 algorithm. In that case, use:
ssh-keygen -t rsa -b 4096 -C "yourname@gmail.com"
```

You can set a passphrase (it is just a password) for the SSH key. Be careful, though: there is no "reset my password" option, so if you forget the passphrase you will have to generate a new key. Now, run again:

```bash
ls -al ~/.ssh
```
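If you want to see what key generation produces without touching `~/.ssh`, the following sketch creates a key pair in a scratch directory. The file name `id_ed25519_demo` is hypothetical, and the empty passphrase set with `-N ""` is an assumption made only so the demo runs without prompting; a real key should normally have one.

```bash
key_dir="$(mktemp -d)"    # scratch directory, so ~/.ssh is left untouched

# -f chooses the output file, -N "" sets an empty passphrase (acceptable only for this demo)
ssh-keygen -q -t ed25519 -C "yourname@gmail.com" -f "$key_dir/id_ed25519_demo" -N ""

ls "$key_dir"                                      # the private key and the matching .pub file
ssh-keygen -l -f "$key_dir/id_ed25519_demo.pub"    # fingerprint of the public key
```

The file without an extension is the private key and must never be shared; the `.pub` file is the public key, the one that is given to GitHub.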

Now, we have to let GitHub know what our SSH public key is. You can get your public key by running:

```bash
cat ~/.ssh/id_ed25519.pub   # or ~/.ssh/id_rsa.pub if you generated an RSA key
```
The whole output is the key. Copy it, then go to GitHub and add it under Settings > SSH and GPG keys > New SSH key. Once that is done, test the connection with:

```bash
ssh -T git@github.com
```


#### Connect local to remote

Go to the repository page, and copy the SSH URL shown in the quick setup section. Then, run the following, replacing `repo_url` with that URL:

```bash
git remote add origin repo_url
```
We check if it has worked by running:
```bash
git remote -v
```

Then, finally, you can **push** your local work to the remote repository in GitHub with:

```bash
git push -u origin main   # -u links your local main to origin/main, so that a plain "git push" or "git pull" works later
```
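The same steps can be rehearsed offline. In the sketch below, a local bare repository stands in for the GitHub repository (an assumption made so the example needs no network or account), and `git init -b main` assumes Git 2.28 or newer. The `-u` flag on the push records `origin/main` as the upstream of the local `main` branch.

```bash
base="$(mktemp -d)"
git init -q --bare "$base/remote.git"       # local stand-in for the GitHub repository

git init -q -b main "$base/local"           # -b needs Git 2.28 or newer
cd "$base/local"
git config user.name "Demo User"            # local identity, only for this throwaway repo
git config user.email "demo@example.com"
echo "hello" > file_name.txt
git add file_name.txt
git commit -q -m "Created file_name.txt"

git remote add origin "$base/remote.git"    # with GitHub, this would be the SSH URL
git remote -v                               # lists origin twice: once for fetch, once for push
git push -q -u origin main                  # -u also records origin/main as the upstream of main

# Show which remote branch the current branch is linked to
git rev-parse --abbrev-ref --symbolic-full-name '@{upstream}'
```

Once an upstream is recorded, `git push` and `git pull` without arguments know which remote branch to use.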

### Collaborating

To collaborate in a GitHub repository, you need to add Collaborators. Go to the GitHub repository you created in the last section, and open the Settings Tab. Inside, go to Collaborators, and here you can add collaborators by indicating their username or email in GitHub.

#### Clone

When there is a remote repository that you do not have locally yet (it is on GitHub, but not on your computer), you can **clone** it. That is, you create a local copy of it that you can work on.

You will be able to clone a GitHub repository in two circumstances:
- If the repository is public.
- If the repository is private and you are a collaborator.

To clone a repo, run:

```bash
git clone ssh_url local/path
```

You can get the `ssh_url` from the repo's main page, and `local/path` is where on your computer you want to clone the repository.

#### Pull and push

These are the two operations you are going to use the most, so it is important to use them well.

**Pull**: this command copies the changes that are in the remote repository that you have not yet pulled to your local copy. This means, if a collaborator has made some changes and uploaded them to the remote repo, you have to do a pull to get them locally. To do it, just run from your local repo:

```bash
git pull
```

**Push**: when you do some commits locally, you have to make them appear in the remote repository. To do that, you push your commits. You can do several commits without a push, and push them all together at once. You only need to run from your local repo:

```bash
git push
```

**GOOD PRACTICES**: whenever you start a coding session, it is always good practice to do a pull first. This way, you make sure to get all the changes made in your absence before you start making any more changes.
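The clone, push and pull cycle described above can be simulated on a single computer. This is a sketch under stated assumptions: a local bare repository plays the role of GitHub, the two collaborators "Alice" and "Bob" are hypothetical, and `git init -b main` requires Git 2.28 or newer.

```bash
base="$(mktemp -d)"
git init -q --bare -b main "$base/remote.git"   # stand-in for the GitHub repository

# Alice creates the project and publishes it
git init -q -b main "$base/alice"
cd "$base/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "line from Alice" > file_name.txt
git add file_name.txt
git commit -q -m "Created file_name.txt"
git remote add origin "$base/remote.git"
git push -q -u origin main

# Bob clones the repository, commits a change and pushes it
git clone -q "$base/remote.git" "$base/bob"
cd "$base/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
echo "line from Bob" >> file_name.txt
git add file_name.txt
git commit -q -m "Add a line from Bob"
git push -q

# Alice starts her next session with a pull and receives Bob's commit
cd "$base/alice"
git pull -q
cat file_name.txt
```

After the pull, Alice's copy of the file contains both lines, and her history matches Bob's.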

#### Conflicts and merges

When you are making changes in a repository together with other people, you will inevitably end up changing the same file, even the same lines of that file. What happens then?
![image](https://swcarpentry.github.io/git-novice/fig/conflict.svg)

**Conflict**: it happens when there are two versions of the same file, one that you have locally and one in the remote repository, and their commit histories have diverged. Git will not let you push, and you will have to pull. When you do, something like this will appear in your terminal:

```bash
CONFLICT (content): Merge conflict in file_name.txt
Automatic merge failed; fix conflicts and then commit the result.
```

If you see the contents of the files where there is a conflict, there will be some lines that look like this:

```bash
<<<<<<< HEAD
* These are the changes that are in MY version.
=======
* These are the changes that were in the remote repository.
>>>>>>> dabb4c8c450e8475aee9b14b4383acc99f42af1d
```

**Merge**: you need to resolve the conflicts yourself, that is, *merge* the changes. Talk to the colleague who made the other changes (sometimes you do not need to, since the code is clear enough to understand what to keep), decide on a final version of the file, and remove the marker lines (`<<<<<<<`, `=======` and `>>>>>>>`). Then stage the file with `git add`, commit the result, and you will be able to push again.
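The whole sequence, from the rejected push to the resolved merge, can be reproduced locally. The sketch below assumes a local bare repository in place of GitHub, two hypothetical users, and Git 2.28 or newer for `git init -b`. It passes `--no-rebase` to `git pull` so that Git merges the two histories, since recent Git versions otherwise ask how diverging branches should be reconciled.

```bash
base="$(mktemp -d)"
git init -q --bare -b main "$base/remote.git"   # stand-in for the GitHub repository

# I create the file and publish it
git init -q -b main "$base/me"
cd "$base/me"
git config user.name "Me"
git config user.email "me@example.com"
echo "original line" > file_name.txt
git add file_name.txt
git commit -q -m "Created file_name.txt"
git remote add origin "$base/remote.git"
git push -q -u origin main

# My colleague clones, changes the line and pushes first
git clone -q "$base/remote.git" "$base/colleague"
cd "$base/colleague"
git config user.name "Colleague"
git config user.email "colleague@example.com"
echo "line changed by my colleague" > file_name.txt
git commit -q -am "Change the line (colleague)"
git push -q

# I change the same line without pulling first
cd "$base/me"
echo "line changed by me" > file_name.txt
git commit -q -am "Change the line (me)"
git push -q 2>/dev/null || echo "Push rejected: the remote has commits I do not have"
git pull --no-rebase || echo "Pull stopped because of a merge conflict"

conflicted="$(git diff --name-only --diff-filter=U)"   # files with unresolved conflicts
cat file_name.txt                                      # shows the <<<<<<<, ======= and >>>>>>> markers

# Resolve: write the agreed final content, then stage, commit and push
echo "line agreed by both of us" > file_name.txt
git add file_name.txt
git commit -q -m "Merge remote changes into file_name.txt"
git push -q
```

In a real project the resolution step is done in an editor rather than by overwriting the file, but the commands that follow it are the same.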

#### Branches

![image](https://wac-cdn.atlassian.com/dam/jcr:a905ddfd-973a-452a-a4ae-f1dd65430027/01%20Git%20branch.svg?cdnVersion=2653)

In the image above, each of the different points is a commit. Normally, what you have is just the line of blue points, with your version being the last commit of the main branch. It usually is a stable version of your code, so when you want to add something to your code, it is better not to touch it.

That is why we have **branches**. To create a new branch, run:

```bash
git branch branch_name
```

When you create a new branch, it starts out pointing to the same commit you were on:

![image](https://wac-cdn.atlassian.com/dam/jcr:387f080e-19b8-43ab-a7a3-0921ffd7298a/03%20Creating%20branches.svg?cdnVersion=2653)

To move to the new branch, run:

```bash
git checkout branch_name
```

Then, you can start using the standard git commands to commit the changes in your code, and they will be separated from the main branch.

Other useful commands:

```bash
git branch -d branch_name  # Safe delete: Git refuses if the branch has unmerged changes
git branch -D branch_name  # Force delete of branch
git branch  # Show a list of all branches
```
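The following sketch puts these branch commands together in a throwaway repository. The branch name `new_feature` is hypothetical, `git init -b main` assumes Git 2.28 or newer, and `git merge` is used here only to show when the safe delete starts to succeed.

```bash
demo="$(mktemp -d)"
git init -q -b main "$demo"
cd "$demo"
git config user.name "Demo User"            # local identity, only for this throwaway repo
git config user.email "demo@example.com"

echo "stable" > file_name.txt
git add file_name.txt
git commit -q -m "Stable version"

git branch new_feature                      # create the branch
git checkout -q new_feature                 # move to it
echo "experimental" >> file_name.txt
git commit -q -am "Try a new feature"

git checkout -q main
cat file_name.txt                           # main still has only the stable line

# Safe delete refuses while new_feature has commits that main lacks
git branch -d new_feature 2>/dev/null || echo "Not deleted: new_feature has unmerged commits"

git merge -q new_feature                    # bring the commits of new_feature into main
git branch -d new_feature                   # now the safe delete succeeds
git branch                                  # only main is left
```

The commit made on `new_feature` did not affect `main` until the merge, which is what makes branches a safe place to experiment.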

#### Pull request

A pull request is a way to propose changes you've made in a Git branch to be merged into another branch, usually the one in production (main). It is a way to review the code before it becomes part of the main project.

Once you are done with the changes you wanted to make in your own separate branch, push that branch to GitHub, go to the repository's main page, and create a pull request. You select the target branch (usually main) and your source branch (the one you have been working on). Then you submit it, and whoever is in charge of the main branch can see the changes you made in the code, offer some feedback and decide whether the code is ready to be integrated.
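A pull request can only be opened for a branch that exists on the remote, so the local branch has to be pushed first. The sketch below shows that step, assuming a local bare repository as a stand-in for GitHub, a hypothetical branch called `my_feature`, and Git 2.28 or newer; the pull request itself is then created on the GitHub website.

```bash
base="$(mktemp -d)"
git init -q --bare -b main "$base/remote.git"   # stand-in for the GitHub repository

git init -q -b main "$base/local"
cd "$base/local"
git config user.name "Demo User"            # local identity, only for this throwaway repo
git config user.email "demo@example.com"
echo "stable" > file_name.txt
git add file_name.txt
git commit -q -m "Stable version"
git remote add origin "$base/remote.git"
git push -q -u origin main

git checkout -q -b my_feature               # create the branch and move to it in one step
echo "new feature" >> file_name.txt
git commit -q -am "Add my feature"
git push -q -u origin my_feature            # publish the branch so it can be chosen in a pull request

git ls-remote --heads origin                # the remote now has both main and my_feature
```

On GitHub, `my_feature` would now appear in the branch list and could be selected as the source branch of the pull request.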

#### Fork

Forking is a different type of workflow. We fork when we are neither owners nor collaborators of a repository but still want to contribute to it.

**Fork**: a personal copy of someone else's repository on GitHub. It allows you to experiment and make changes without affecting the original project. It is a way of customizing the project for your own use and, if you want others to benefit from your changes, you can propose them to the original repository via a pull request.

Let's go to the [firstcontributions](https://github.com/firstcontributions/first-contributions) GitHub repository and fork it!

----------------------
This tutorial is based on the ones from [Software Carpentry](https://swcarpentry.github.io/git-novice/index.html) and [Atlassian](https://www.atlassian.com/git/glossary#commands).