# Introduction to GitHub

Hello there! If you're new to data engineering, or are just looking for a way to manage and collaborate on code, you've come to the right place. In this guide, I'll introduce you to `GitHub`, a platform that's essential for any data engineer (or anyone working with code, for that matter!). From setting up Git and cloning your first repository to making your contributions count with verified commits, I've got it all covered.

## What is GitHub?

`GitHub` is a web-based platform for hosting Git repositories and collaborating on code. It lets you work on projects with others while keeping track of every change made to the code. It's built on top of Git, a distributed version control system created by Linus Torvalds (yes, the creator of Linux!). GitHub adds a web interface on top of Git, along with collaboration features such as bug tracking, feature requests, task management, and a wiki for every project.


<hr style="background: linear-gradient(to right, #f00, #00f); height: 5px; border: none;" />


## Setting Up GitHub

**Step 1: Create a GitHub Account**

Before you start using GitHub, you need to have an account. Head over to <a href="https://github.com" target="_blank">GitHub</a> and sign up for a new account if you don't have one.

**Step 2: Install Git**

GitHub is built on Git, so you need to have Git installed on your machine. Go to the <a href="https://git-scm.com/downloads" target="_blank">Git downloads page</a> and install Git for your operating system.
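Once the installer finishes, a quick way to confirm that Git is available from your terminal is to ask for its version. The exact version number you see depends on what you installed:

```shell
# Print the installed Git version (output looks like "git version 2.x.y")
git --version
```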

**Step 3: Configure Git**

Now that you have Git installed, open a terminal (or command prompt on Windows), and configure Git with your name and email address. These will be used to identify you as the author when you make commits.

<pre>
<code class="shell">
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
</code>
</pre>
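To double-check what Git will record, you can read a setting back by passing just its name. The sketch below tries this in a throwaway repository (created with `mktemp -d`, so its location is arbitrary). Leaving out `--global` there scopes the values to that one repository, and repository-level settings take precedence over global ones, which can be handy if you use a different email for some projects. The name and email are placeholders:

```shell
set -e
# Create a throwaway repository to experiment in
cd "$(mktemp -d)"
git init -q

# Without --global, these settings apply only to this repository
git config user.name "Your Name"
git config user.email "your.email@example.com"

# Read the values back (repository settings win over --global ones)
git config user.name
git config user.email
```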

### Cloning the Training Repository

Let’s say you've found a training repository on GitHub that you'd like to use. The first thing you should do is create a copy of that repository on your local machine, and this is known as ‘cloning’.

Execute the following command in your terminal, replacing the URL with the repository's URL:
<pre>
<code class="shell">
git clone <a href="https://github.com/bcg-x-engineering/bcg-x-DE-training.git">https://github.com/bcg-x-engineering/bcg-x-DE-training.git</a>
</code>
</pre>

Now you've got a local copy of the repository!

<img src="https://i.ibb.co/gb2WVRX/Screenshot-2023-06-11-at-8-00-01-PM.png" alt="Screenshot-2023-06-11-at-8-00-01-PM" border="0" height = "400" width = "400">
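If you'd like to see what cloning sets up without touching the training repository, here's a minimal sketch that clones a small local repository instead, so it runs offline. The names `upstream` and `my-copy` are hypothetical stand-ins:

```shell
set -e
# Work in a throwaway directory so nothing real is touched
cd "$(mktemp -d)"

# A tiny local repository stands in for the GitHub repo
git init -q -b main upstream
git -C upstream -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial commit"

# Clone it: Git creates a new directory containing the full history
git clone -q upstream my-copy
cd my-copy

# The clone records its source as a remote named "origin"
git remote -v
git log --oneline
```

The `origin` remote created here is the same name you'll see in commands like `git push origin ...` later in this guide.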

### Creating a Branch

Branches are used to develop features isolated from each other. The main (default) branch usually contains the stable code, and other branches are used to develop new features.

Create a new branch by executing:

<pre>
<code class="shell">
git checkout -b my-new-branch
</code>
</pre>

### Making Your First Commit

Now that you're on your branch, let's make some changes to the files. Once you've made your changes, you'll want to 'commit' them. This is like taking a snapshot of your files at this point in time.

First, you need to stage the files that you want to commit. To do this, run:
<pre>
<code class="shell">
git add filename
</code>
</pre>
Or, to stage all the files that have been modified, you can use:
<pre>
<code class="shell">
git add .
</code>
</pre>
Now that your files are staged, you can make the commit with:
<pre>
<code class="shell">
git commit -m "A descriptive message about the changes"
</code>
</pre>
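Here's the whole add-and-commit cycle as a minimal sketch in a throwaway repository, with `git status --short` in between so you can watch the file move from untracked to staged. The file name `hello.py` and the identity values are placeholders:

```shell
set -e
# Throwaway repository with a placeholder identity
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# Make a change: create a new file
echo "print('hello')" > hello.py

# "??" marks an untracked file; after `git add` it shows as "A" (added)
git status --short
git add hello.py
git status --short

# Record the staged snapshot with a message
git commit -q -m "Add hello script"
git log --oneline
```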

### Pushing Changes to GitHub

After committing your changes locally, you need to send them to GitHub. You can do this with the 'push' command:

<pre>
<code class="shell">
git push origin my-new-branch
</code>
</pre>

This sends your changes to the GitHub repository under your new branch.
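To try pushing without a GitHub repository, you can use a local bare repository as the remote. This is a sketch under that assumption (the paths are hypothetical). It also uses the `-u` flag, which records the remote branch as your branch's upstream so that later a plain `git push` or `git pull` knows where to go:

```shell
set -e
root=$(mktemp -d)
cd "$root"

# A bare repository stands in for the GitHub remote
git init -q --bare remote.git

# A local repository with an initial commit on main
git init -q -b main work && cd work
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$root/remote.git"
git commit -q --allow-empty -m "Initial commit"

# Commit some work on a new branch
git checkout -q -b my-new-branch
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Send the branch to the remote and set it as the upstream
git push -q -u origin my-new-branch

# The remote now has the branch
git ls-remote --heads origin
```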

### Creating a Pull Request

Once you've pushed your branch to GitHub, you can ask for your changes to be merged into the main branch. This is done through a Pull Request (PR).

<ul>
<li>    
Go to the GitHub page for the repository.
</li>
<li>
    Click on <a href="https://github.com/bcg-x-engineering/bcg-x-DE-training/pulls">'Pull requests'.</a>
</li>
<li>
Click on <a href="https://github.com/bcg-x-engineering/bcg-x-DE-training/compare">'New Pull Request'.</a>
</li>
<li>
Select the branch you want to merge into (the "base" branch, usually `main`) and the branch you want to merge from (the "compare" branch, your new branch).
</li>
<li>
Add a title and a description for your PR.
</li>
<li>
Click 'Create Pull Request'.
</li>
</ul>    

### Making Verified Commits

Verified commits add an extra layer of security: GitHub marks a commit as "Verified" when it has been signed with a key registered to your account, which shows that the commit really came from you. One common way to do this is to sign your commits using GPG.

<ul>
    <li>
    First, you need to generate a new GPG key. <a href="https://docs.github.com/en/authentication/managing-commit-signature-verification/generating-a-new-gpg-key">Follow these instructions.</a>
    </li>
    <li>
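    Add the GPG key to your GitHub account (under <b>Settings &gt; SSH and GPG keys</b>). You can print the public key to paste there with <code>gpg --armor --export YOUR_GPG_KEY_ID</code>.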
    </li>
    <li>
    Tell Git about your signing key with:
    <pre>
    <code class="shell">
    git config --global user.signingkey YOUR_GPG_KEY_ID
    </code>
    </pre>
    </li>
    <li>
    Now use the <code>-S</code> option with <code>git commit</code> to sign your commits:
    <pre>
    <code class="shell">
    git commit -S -m "Your commit message"
    </code>
    </pre>
    </li>
</ul>


<hr style="background: linear-gradient(to right, #f00, #00f); height: 5px; border: none;" />


## Managing Branches

### Branching in Git

Branching in Git is a powerful feature that allows you to work on different versions of a project at the same time. A branch is a lightweight, movable pointer to a commit, so creating one doesn't copy your files: it gives you an independent line of development that you can change without affecting the main branch.

<b>Creating a branch</b>

<code class="shell">git branch &lt;branch_name&gt;</code>

<b>Switching to a Branch</b>

<code class="shell">git checkout &lt;branch_name&gt;</code>

Alternatively, you can create a branch and switch to it in one step:

<code class="shell">git checkout -b &lt;branch_name&gt;</code>
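The commands above can be tried out in a throwaway repository, as in this sketch (branch names are hypothetical). Newer Git versions also offer `git switch <branch_name>` and `git switch -c <branch_name>` as more focused equivalents of the two `checkout` forms:

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"

# Two steps: create the branch, then switch to it
git branch feature-a
git checkout -q feature-a

# One step: create and switch at once
git checkout -q -b feature-b

# List local branches; the asterisk marks the current one
git branch
```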

## Stashing Changes

Sometimes you're in the middle of some changes but need to switch branches. Git carries uncommitted changes over when you switch (or refuses to switch if they would conflict), so they can end up mixed in with the other branch's work. To set them aside cleanly, you can use `git stash`.

**Stashing changes**

<code class="shell">git stash</code>

This command takes your modified tracked files and staged changes, saves them on a stack of unfinished changes that you can reapply at any time, and reverts your working directory to match the last commit.

**Reapplying stashed changes**

<code class="shell">git stash apply</code>

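This command reapplies your most recent stash. The stash stays on the stack; use `git stash pop` instead if you want to apply it and remove it in one step.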

If you have several stashes, `git stash list` shows them, and you can choose which one to apply by passing its name to `apply`:

<code class="shell">git stash apply stash@{2}</code>
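Here's a minimal sketch of the stash round trip in a throwaway repository; the file name and contents are placeholders:

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "Add app.txt"

# Start some work that isn't ready to commit
echo "v2 (work in progress)" > app.txt

# Shelve it: the working tree goes back to the last commit
git stash
cat app.txt          # prints: v1

# Each stash gets an entry like stash@{0}
git stash list

# Bring the work back (the entry stays in the list)
git stash apply
cat app.txt          # prints: v2 (work in progress)
```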


<hr style="background: linear-gradient(to right, #f00, #00f); height: 5px; border: none;" />


## Advanced Git Commands

### Git Rebase

Rebasing is the process of moving or combining a sequence of commits to a new base commit. Like `merge`, it integrates changes from one branch into another. But instead of creating a merge commit that ties the two histories together, it replays your commits one by one on top of the branch you're rebasing onto, so the history reads as if your changes happened sequentially after it.

**Executing a rebase**

<code class="shell">git rebase &lt;base&gt;</code>

Here, `<base>` is often `main`, or whichever branch you want to base your changes on.
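To see what replaying means in practice, this sketch builds a tiny history in a throwaway repository where `main` moves ahead after `feature` branches off, then rebases `feature` onto `main`. All names are hypothetical:

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt && git commit -q -m "Base commit"

# Branch off and do some work
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "Add feature"

# Meanwhile, main moves ahead
git checkout -q main
echo "fix" > fix.txt
git add fix.txt && git commit -q -m "Fix on main"

# Replay the feature commit on top of the new main
git checkout -q feature
git rebase -q main

# History is now linear: Base commit, then Fix on main, then Add feature
git log --oneline
```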

### Difference between `git rebase main` and `git rebase origin/main`

**`git rebase main`**

This command moves the commits of the current branch on top of the `main` branch. This is useful when you've done work on your current branch and `main` has been updated with more recent commits. You're effectively saying "I want my changes to be based on the latest `main`, not the old `main` I originally branched from".

<code class="shell">git rebase main</code>

**`git rebase origin/main`**

This command is very similar to the previous one, but it operates on the `main` branch of the `origin` remote. This command is useful when other people have made changes to the `main` branch in the remote repository (`origin`) and you want to base your changes on the latest version of `main` in `origin`.

<code class="shell">git rebase origin/main</code>

Note that `origin/main` only reflects the remote as of your last fetch, so it's important to fetch the latest changes from `origin` before running `git rebase origin/main`; otherwise, you might be rebasing onto out-of-date information.

<pre>
<code class="shell">
git fetch origin
git rebase origin/main
</code>
</pre>
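The difference matters most when someone else has pushed to `main`. The sketch below simulates that locally, with a bare repository playing `origin` and two clones playing you and a teammate (all paths and names are hypothetical). Note how `origin/main` stays stale until the fetch:

```shell
set -e
root=$(mktemp -d)
cd "$root"

# A bare repository plays "origin"; a seed repo gives it a first commit
git init -q --bare -b main origin.git
git init -q -b main seed
git -C seed -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial commit"
git -C seed push -q "$root/origin.git" main

# You and a teammate each clone it (placeholder identities)
for who in me teammate; do
  git clone -q origin.git "$who"
  git -C "$who" config user.name "Demo $who"
  git -C "$who" config user.email "$who@example.com"
done

# The teammate pushes a new commit to main
cd "$root/teammate"
echo "theirs" > theirs.txt
git add theirs.txt && git commit -q -m "Teammate change"
git push -q origin main

# Meanwhile, you commit on a feature branch
cd "$root/me"
git checkout -q -b feature
echo "mine" > mine.txt
git add mine.txt && git commit -q -m "My change"

# Your origin/main is stale: it still points at "Initial commit"
git log --oneline -1 origin/main

# Fetch to update origin/main, then rebase onto it
git fetch -q origin
git rebase -q origin/main
git log --oneline    # My change, Teammate change, Initial commit (newest first)
```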


### Git Cherry-Pick

If you need changes introduced in a commit on another branch, but don't want to merge the entire branch, you can use `cherry-pick`.

**Cherry-picking a commit**

<code class="shell">git cherry-pick &lt;commit&gt;</code>

Here, `<commit>` is the commit hash you want to pick.

### Git Tag

Tags are refs that point to specific points in Git history. They're generally used to mark a release version (for example, `v1.4`).

**Creating an annotated tag**

<code class="shell">git tag -a v1.4 -m "my version 1.4"</code>

**Sharing tags**

By default, `git push` doesn't transfer tags to the remote repository. You can push a single tag with `git push origin <tagname>`, or push all of your tags with `git push origin --tags`.
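Putting the two together, here's a sketch that creates the annotated tag from above and pushes it, using a local bare repository as a stand-in for the remote (paths are hypothetical):

```shell
set -e
root=$(mktemp -d)
cd "$root"

# A bare repository stands in for the remote
git init -q --bare remote.git
git init -q -b main demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$root/remote.git"
git commit -q --allow-empty -m "Release 1.4"

# Create an annotated tag and inspect it
git tag -a v1.4 -m "my version 1.4"
git tag -n                   # lists tags with the first line of their message
git cat-file -t v1.4         # prints: tag (annotated tags are their own objects)

# Push the branch, then the tag explicitly
git push -q origin main
git push -q origin v1.4
git ls-remote --tags origin
```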


### Viewing the Commit History with `git log`

You can view the commit history with the `git log` command. It lists the most recent commits first, and each entry shows the commit hash, the author, the date, and the commit message.
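A few common ways to call it, shown in a throwaway repository with two placeholder commits (`--oneline`, `-1`, and `--format` are standard `git log` options):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "First commit"
git commit -q --allow-empty -m "Second commit"

# Full listing: hash, author, date, and message for each commit
git log

# One line per commit: short hash and subject, newest first
git log --oneline

# Only the latest commit, with a custom format (author and subject)
git log -1 --format='%an: %s'     # prints: Demo User: Second commit
```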


### Viewing Specific Changes with `git diff`

Use `git diff` to view changes between commits, between a commit and the working tree, and so on. Run without any arguments, it shows the changes you've made but haven't staged yet:

<code class="shell">git diff</code>

To compare the tips of two branches, name both of them:

<code class="shell">git diff &lt;branch1&gt;..&lt;branch2&gt;</code>


### Forcefully Synchronizing with a Remote Branch

Sometimes, you might want to forcefully overwrite your current local branch to match a remote branch. You can do this with the `git reset --hard <remote>/<branch>` command.

<b>Fetching the latest commits</b>

<code class="shell">git fetch &lt;remote&gt;</code>

<b>Resetting the branch</b>

<code class="shell">git reset --hard &lt;remote&gt;/&lt;branch&gt;</code>

<b>Warning</b>: This operation discards any uncommitted changes to tracked files, as well as any commits on your current branch that aren't on the remote branch. Untracked files are left in place. Make sure to back up any changes you want to keep (for example, by committing them and creating a backup branch) before running this command.
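The sketch below walks through the whole sequence against a local bare repository standing in for the remote (paths are hypothetical). It creates a backup branch first, so the local commit stays reachable after the reset; the uncommitted edit, however, is gone for good:

```shell
set -e
root=$(mktemp -d)
cd "$root"

# A bare repository stands in for the remote, seeded with one commit
git init -q --bare -b main remote.git
git init -q -b main seed
git -C seed -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Remote commit"
git -C seed push -q "$root/remote.git" main

git clone -q remote.git work && cd work
git config user.name "Demo User"
git config user.email "demo@example.com"

# Diverge locally: one commit plus one uncommitted edit
echo "draft" > draft.txt
git add draft.txt && git commit -q -m "Local experiment"
echo "more" >> draft.txt

# Back up the local commit so it stays reachable after the reset
git branch backup-local-work

# Make the current branch match origin/main exactly
git fetch -q origin
git reset -q --hard origin/main
git log --oneline    # only "Remote commit" remains
```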


### Pushing to a Specific Branch with `git push origin HEAD:<branch_name>`

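If you want to push your current branch to a branch with a different name in the remote repository, use `git push origin HEAD:<branch_name>`. The part before the colon is what to push, and the part after it is the remote branch to create or update.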

`HEAD` is a reference to the last commit in the currently checked-out branch.
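As a minimal sketch, with a local bare repository standing in for GitHub (names are hypothetical), this pushes a local branch called `local-experiment` to a remote branch called `feature-review`:

```shell
set -e
root=$(mktemp -d)
cd "$root"

# A bare repository stands in for GitHub
git init -q --bare remote.git
git init -q -b main demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$root/remote.git"
git commit -q --allow-empty -m "Initial commit"

# The local branch name differs from the one we want on the remote
git checkout -q -b local-experiment
git commit -q --allow-empty -m "Try something"

# Push whatever HEAD points to into a remote branch named feature-review
git push -q origin HEAD:feature-review
git ls-remote --heads origin
```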

<hr style="background: linear-gradient(to right, #f00, #00f); height: 5px; border: none;" />


## Remote Refs and the HEAD

**Remote Refs**

Remote references or "refs" are pointers to the state of branches in your remote repositories. They take the form `refs/remotes/<remote>/<branch>`. For example, if you have a remote named `origin` and a branch named `main`, you'll have a remote ref pointing to the last known state of that branch at `refs/remotes/origin/main`.

**HEAD**

HEAD is a reference to the currently checked-out branch, which in turn points to that branch's latest commit. You can think of HEAD as "where you are right now". When you switch branches with `git checkout`, HEAD changes to point to the new branch, and therefore to its tip.
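To poke at these refs yourself, here's a sketch that clones a throwaway local repository and inspects them (names are hypothetical). `git for-each-ref` lists refs, `git symbolic-ref HEAD` shows which branch HEAD names, and `git rev-parse` resolves a ref to a commit hash:

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main upstream
git -C upstream -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial commit"
git clone -q upstream clone && cd clone

# Remote refs live under refs/remotes/<remote>/
git for-each-ref --format='%(refname)' refs/remotes/origin

# HEAD names the current branch...
git symbolic-ref HEAD          # prints: refs/heads/main
# ...and resolves to the commit at that branch's tip
git rev-parse HEAD

# Switching branches repoints HEAD
git checkout -q -b other
git symbolic-ref HEAD          # prints: refs/heads/other
```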

### Difference between `git rebase origin` and `git rebase origin/main`

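`origin` is the name of your remote repository, not a branch, but `git rebase origin` still works in most clones: Git resolves a bare remote name through `refs/remotes/origin/HEAD`, which points to the remote's default branch and is set up by `git clone`. So if `origin/HEAD` points to `origin/main`, the two commands do the same thing. Spelling out `origin/main` is still clearer, and it avoids surprises when `origin/HEAD` isn't set (which can happen for remotes added with `git remote add`), in which case `git rebase origin` fails. When you run `git rebase origin/main`, Git rebases your current branch onto `origin/main`.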
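Git resolves a bare remote name such as `origin` through `refs/remotes/origin/HEAD` when that ref exists. As a sketch, you can check this in a throwaway clone (names are hypothetical) by resolving both spellings and comparing the hashes:

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main upstream
git -C upstream -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial commit"
git clone -q upstream clone && cd clone

# git clone records the remote's default branch here
git symbolic-ref refs/remotes/origin/HEAD   # prints: refs/remotes/origin/main

# Both spellings resolve to the same commit
git rev-parse origin
git rev-parse origin/main
```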

## Git Fetch

`git fetch` downloads new commits and branches from a remote repository and updates your remote-tracking refs (such as `origin/main`). It doesn't change your working directory or your local branches, so it's a safe way to see what has changed on the remote before you merge or rebase.

<code class="shell">git fetch origin</code>

You can also use the `-p` or `--prune` option to remove any remote-tracking references (refs) to branches that no longer exist on the remote.

<code class="shell">git fetch --prune origin</code>
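Here's a minimal sketch of pruning in action, using a throwaway local repository as the remote (names are hypothetical). It assumes the `fetch.prune` setting isn't enabled in your Git config; if it is, the plain fetch prunes too:

```shell
set -e
root=$(mktemp -d)
cd "$root"

# A throwaway repository plays the remote, with an extra branch
git init -q -b main upstream
git -C upstream -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial commit"
git -C upstream branch old-feature
git clone -q upstream clone

# The branch is deleted on the remote side
git -C upstream branch -q -D old-feature

cd clone
# A plain fetch leaves the stale remote-tracking ref in place
git fetch -q origin
git branch -r        # still lists origin/old-feature

# --prune removes remote-tracking refs whose branch no longer exists
git fetch -q --prune origin
git branch -r        # origin/old-feature is gone
```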

## Git Add

The `git add` command adds changes from the working directory to the staging area. It tells Git that you want to include updates to a particular file in the next commit.

<code class="shell">git add &lt;file&gt;</code>

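1. To stage all new, modified, and deleted files, you can use `.` or `-A`. The difference is scope: `git add .` covers the current directory and everything below it, while `git add -A` covers the entire working tree no matter where you run it: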

    <code class="shell">git add .</code>

    or 

    <code class="shell">git add -A</code>
    

2. To interactively stage files, or stage portions of files, you can use `-i` or `--interactive`:

    <code class="shell">git add -i</code>
    

3. Interactive Staging with `git add -p`

    Interactive staging lets you selectively stage portions of the changes in your files. The `-p` option in `git add -p` stands for `--patch`. It starts an interactive session in which Git shows each hunk (a block of changed lines) and asks what you want to do with it, for example `y` to stage it or `n` to skip it.

    <code class="shell">git add -p</code>
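The difference in scope between `git add .` and `git add -A` is easiest to see from inside a subdirectory. Here's a minimal sketch in a throwaway repository (file names are hypothetical):

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
mkdir src
echo "a" > top.txt
echo "b" > src/module.txt
git add -A && git commit -q -m "Initial files"

# Modify one file at the top level and one inside src/
echo "a2" > top.txt
echo "b2" > src/module.txt

# From inside src/, `git add .` stages only what's under src/
cd src
git add .
git status --short    # module.txt is staged; ../top.txt is still unstaged

# `git add -A` stages changes across the whole working tree
git add -A
git status --short    # both files are now staged
```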