# Class 12 - 8.6.20

# Git and More Python Libraries

## Git

When we started using git, the basic intro I gave came down essentially to this:

![Git, from xkcd 1597](extra_material/git.png)

I did try to explain a little about why and how we use it, but I essentially forced it upon you and made you memorize a few commands so that it would work and your HW would be submitted correctly. Today we'll try to dive a bit deeper and understand the "how" of git. This is because git is an immensely useful tool for our type of work, and it will be even more useful when we start hacking together in a few weeks' time.

### Why git?

Version control software lets us:
1. Write and manage the history of our codebase.
2. Work on changes for our code without modifying the currently working copy.
3. Collaborate with others easily.

Reason #1 is important enough on its own, but the other two are what really make git shine above the rest of the options out there, and they're also what made it so popular in the open source community.

There are a [variety](https://www.daolf.com/posts/git-series-part-1/) [of](https://missing.csail.mit.edu/2020/version-control/) [tutorials](https://rachelcarmena.github.io/2018/12/12/how-to-teach-git.html) [on](https://smusamashah.github.io/explain-git-in-simple-words) [git](https://git-scm.com/doc). Some are also [interactive](https://learngitbranching.js.org/). The one below is aimed at working with VSCode's integrated tools, and should also help you collaborate during the hackathon.

### The model

![Git composition](extra_material/git_lifecycle.png)

In a version-controlled project we have these three basic components:
1. Repository - the name for the entire collection of version-controlled files. We can have a local repository, i.e. one on our computer, a remote one, i.e. one in the cloud (like GitHub provides) or both. A repository contains both the standard files we're working on, and a compressed copy of them inside the `.git` folder.
2. Working tree - the name given to the current state of the version-controlled folder, which might differ from the version that was previously captured and recorded. When the working tree is "clean" it means that there's no difference between the latest "snapshot" of the codebase, which resides inside the `.git` folder, and the copy currently visible to the user.
3. Index - an intermediate "staging" environment, the only place from which code can be inserted into the repository. This allows us to change many things, but only commit parts of them into the repository itself.

The main idea behind git is to capture complete snapshots of the directory we're tracking. Yes, the snapshots include every tracked file as it was in that directory at that point in time, and they're complete, i.e. "decoding" them allows us to view the directory as it once was.
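As a rough illustration of how the three components interact, here is a toy Python model. It is only a sketch: the dictionaries, the `add` and `commit` functions and the file names are all made up for this example, and real git stores compressed objects inside `.git` rather than Python dicts.

```python
# Toy model only: this mimics the idea of git's three components,
# not the way git actually stores its data.
working_tree = {"analysis.py": "print('v1')", "notes.txt": "draft"}
index = {}        # the staging area: what the next snapshot will contain
repository = []   # committed snapshots, oldest first


def add(filename):
    """Stage the current content of one file (the idea behind `git add`)."""
    index[filename] = working_tree[filename]


def commit(message):
    """Record a complete snapshot of everything currently in the index."""
    repository.append({"message": message, "files": dict(index)})


add("analysis.py")
commit("Add analysis script")

# Change the file, stage it again and take a second snapshot.
working_tree["analysis.py"] = "print('v2')"
add("analysis.py")
commit("Update analysis script")

# Each snapshot is complete on its own, so the old version is still there.
print(repository[0]["files"])
print(repository[1]["files"])
```

Note that `notes.txt` was changed in the working tree but never staged, so it doesn't appear in either snapshot. This is the "commit only parts" behavior that the index makes possible.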

### Setting up a repository

Any directory can be a git directory if it contains a `.git` folder with the proper subfolders (automatically generated, of course). The easiest way to create such a folder is to use the `init` command. If you open a folder which isn't version controlled, VSCode will automatically offer to create a new git repository in it:

![Initialize a repo](extra_material/git_init.png)

If your directory is already version controlled, simply opening it in VSCode should work. If it's not working, make sure you used the "Open Folder..." command to open it in VSCode. The folder containing the `.git` folder is the "top level" of your repository, and git will automatically notice any changes in the subfolders of that folder. It's usually not recommended to have several git projects inside a git-tracked folder. This means that if you have a "my_code_projects" folder which contains several other git-tracked folders, like HW assignments and other unrelated stuff, make sure that "my_code_projects" isn't a git folder in itself, i.e. you shouldn't have a `.git` folder in it. It's technically possible to have "independent" git repositories inside others (they're called submodules) but it's usually not the right way to organize your projects.

To conclude, once git is set up, the parent directory, as well as its sub-directories, will be tracked as long as there's a `.git` folder in it. To remove `git` tracking from that specific folder, all you have to do is to remove the `.git` subfolder.
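To make the "top level" idea concrete, here is a small sketch that walks up from any folder until it finds one containing `.git`. The function name `find_repo_root` and the `hw01` folder are made up for this example, and the empty `.git` folder created below is only a stand-in for a real one.

```python
from pathlib import Path
import tempfile


def find_repo_root(start):
    """Return the closest folder at or above `start` that contains `.git`, or None."""
    start = Path(start).resolve()
    for folder in [start, *start.parents]:
        if (folder / ".git").is_dir():
            return folder
    return None


# Demo on a throwaway folder. A real `.git` folder is created by `git init`
# and has more inside it; an empty one is enough for this lookup.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve() / "hw01"
    (project / ".git").mkdir(parents=True)
    (project / "src" / "utils").mkdir(parents=True)

    found = find_repo_root(project / "src" / "utils")
    found_is_project = found == project
    print(found_is_project)
```

Running `find_repo_root` on your "my_code_projects" folder should return `None` if you followed the advice above, while running it inside each HW folder should return that HW folder.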

### Commits and trees

The act of committing a file is perhaps the most fundamental one in git, so let's break it down. When a file in a git-tracked folder has different contents from what the git repository has in it (remember that it has copies of everything) the file will be recognized by git as a "changed" file. In git terms, it means that the copy of the file in the "working tree" is different from what's in the repository. Of course, if we haven't committed anything, and only initialized the repository, __all__ files will have a different working copy from what git has in its repository, since git doesn't recognize any of them. Let's see what it looks like in VSCode:

![Working copy](extra_material/git_workcopy.png)

As you see, there are two options for this situation - either the file is completely new and doesn't exist in the repository, and so it's marked as "U" in git ("Untracked") or it already existed there, but the current content is different from the one recorded in the repository ("M" for "Modified").
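The two labels can be sketched as a simple comparison between the working tree and the latest snapshot. This is a toy illustration, not git's actual implementation: the function `label_changes` and the file names are invented for the example, and deleted files are ignored to keep it short.

```python
def label_changes(working_tree, last_snapshot):
    """Label each changed file as "U" (untracked) or "M" (modified)."""
    labels = {}
    for name, content in working_tree.items():
        if name not in last_snapshot:
            labels[name] = "U"   # the repository has never seen this file
        elif content != last_snapshot[name]:
            labels[name] = "M"   # known file, but its content has changed
    return labels


last_snapshot = {"main.py": "print('hello')", "README.md": "# HW 1"}
working_tree = {
    "main.py": "print('hello, world')",   # edited since the last commit
    "README.md": "# HW 1",                # unchanged, so it gets no label
    "plots.py": "import matplotlib",      # brand new
}

labels = label_changes(working_tree, last_snapshot)
print(labels)  # {'main.py': 'M', 'plots.py': 'U'}
```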

From the chart above we see that we need to add the files to the index, before committing them to the repository. Adding to the index, also known as "staging", is done using the "git add" command, represented in VSCode with a "+" sign. We can either add all files to the index, or add one at a time.

![Add all](extra_material/git_add_all.png)
![Add one](extra_material/git_add_one.png)

The reason we might not add everything at once is that commits are used to convey messages, so if a file doesn't fit the message of the current commit, we can leave it out and commit it separately later.
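For those curious about what the "+" button does behind the scenes, here is a hedged sketch that drives the real `git` command line from Python. It assumes git is installed and on your `PATH`; the helper `git(...)`, the file names, the commit message and the placeholder user name and email are all made up for this example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command inside `cwd` and return its printed output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


committed_files = None
remaining = None
if shutil.which("git"):  # skip the demo when git isn't installed
    with tempfile.TemporaryDirectory() as repo:
        git("init", cwd=repo)
        # Placeholder identity, so committing also works on a fresh machine.
        git("config", "user.name", "Example Student", cwd=repo)
        git("config", "user.email", "student@example.com", cwd=repo)

        Path(repo, "load_data.py").write_text("print('loading')\n")
        Path(repo, "plot.py").write_text("print('plotting')\n")

        # Stage and commit only the file that matches the commit message.
        git("add", "load_data.py", cwd=repo)
        git("commit", "-m", "Add data loading script", cwd=repo)

        committed_files = git("ls-files", cwd=repo).split()
        remaining = git("status", "--short", cwd=repo)
        print(committed_files)
        print(remaining)
```

After the commit, `plot.py` is still waiting in the working tree as an untracked file, ready to go into its own commit with a message that describes it.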