# Analytics snippet: Version control, because the Recycle Bin doesn't count

In [19]:
import os

## What is version control and why should I bother?

Version control, also known as revision control or source control, is the management of changes to programs, files, documents and other collections of information.

As actuaries, we use modelling techniques and analyses that come in many forms and vary in size depending on the task at hand. For example, a simple exercise of analysing loss ratios for a given period and product could consist only of a small R script or SQL query. On the other hand, something like a functional reserving model could easily be a one-gigabyte Excel workbook.

You, the reader, probably already practise version control in one way or another in the course of your work or studies. The most primitive and clunky (but useful) method is the "Save As" button, which saves a snapshot of the working file under a new name and leaves you free to keep modifying the open document. Does this look familiar or remind you of your college submissions? <br>

INSERT IMAGE HERE
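
The "Save As" habit can be sketched in a few lines of Python. This is only an illustration of manual versioning: the helper name `save_as_snapshot`, the `versions` folder and the `_v001` naming scheme are assumptions made up for this example, not part of any tool discussed in this article.

```python
import shutil
import tempfile
from pathlib import Path


def save_as_snapshot(file_path, snapshot_dir="versions"):
    """Copy file_path into snapshot_dir under a numbered name (a manual "Save As")."""
    source = Path(file_path)
    target_dir = source.parent / snapshot_dir
    target_dir.mkdir(exist_ok=True)
    # Number the new copy one higher than the snapshots already present.
    existing = list(target_dir.glob(f"{source.stem}_v*{source.suffix}"))
    target = target_dir / f"{source.stem}_v{len(existing) + 1:03d}{source.suffix}"
    shutil.copy2(source, target)  # copy the contents and the file metadata
    return target


# Demonstrate on a throwaway file so that nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    working_file = Path(tmp) / "model.py"
    working_file.write_text("alpha = 0.1\n")
    first = save_as_snapshot(working_file)

    working_file.write_text("alpha = 0.2\n")  # keep editing the working copy
    second = save_as_snapshot(working_file)

    snapshot_names = sorted(p.name for p in (Path(tmp) / "versions").iterdir())
    first_contents = first.read_text()
    print(snapshot_names)
```

It works, but every snapshot is a full copy, and nothing records who changed what or why. Those are the gaps a version control system fills.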

## What can you expect from this article?

This article will attempt to:
- Cover several basic practices for version control for scripts
- Go through a simple Git process avoiding the command line as much as possible
- Provide useful resources for more complicated scenarios in Git
- Discuss version control for general documents and databases
- Provide a list of possible tools available

## GitHub Example

Let's demonstrate some version control practices with a simple but somewhat realistic example using one of the most popular platforms, GitHub. By the way, it is pronounced with a hard G, as in _gift_, not _Jit-hub_. <br>
Put simply, Git is an open-source system designed to handle version control for both small and large projects, and GitHub is a hosting service for projects that use Git. <br>
Imagine working on a task that requires your team to come up with a reasonable prediction function from a historical dataset. Unsurprisingly, a folder that holds the whole project might contain files such as:
- The required input data
- A Python script which runs a model and produces the coefficients for prediction

Note that in most cases models and tasks are a little more complex, which translates into a more complicated file structure, perhaps with modularized scripts. Nevertheless, something as simple as the example we have will suffice to demonstrate a good use case for version control.

In [5]:
os.listdir("./example_project")

['data.csv', 'model.py']

To start off, go to the [GitHub website](https://github.com/) and create an account if you have not already done so.

In [20]:
%%html
<img src="img/github_homepage.png" width="600" height="600">

Signing up for a GitHub account is free, with paid plans also available. Besides being a version control tool, GitHub also serves as a platform for collaboration and project management across a variety of disciplines. <br>
After creating a free account, we can start by creating our first repository, or "repo" for short. As the name suggests, a repository is simply a place where you can store your documents, and you have a choice (as of January 2019) of making its contents public or private.

In [21]:
%%html
<img src="img/create_repo.png" width="600" height="600">

In [22]:
%%html
<img src="img/empty_repo.png" width="600" height="600">

We now have an empty (apart from the README.md) online storage space which we can put our files into! There are generally two ways to use the features available on GitHub:
- Git CLI (command-line interface)
- GitHub Desktop GUI (graphical user interface) <br>

In this article, we will try to avoid using the CLI (although it is my preferred method) as it can be daunting for users who are new to Git and the concept of a version control system. Just as an example, creating a repository using the Git CLI would require the user to install Git from [here](https://git-scm.com/downloads), and the steps shown above correspond to the following terminal/command prompt commands.


In [30]:
%%html
<img src="img/create_repo_cli.png" width="400" height="400">
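
For readers curious about what such a command sequence looks like in text form, here is a sketch of a typical one, built as Python lists so that each step can be annotated. It is an assumption based on the usual Git workflow rather than a transcription of the screenshot. The helper name, the placeholder URL and the `README.md` file are illustrative, and `master` is used as the branch name to match the rest of this article.

```python
import shlex


def new_repo_commands(remote_url, branch="master"):
    """List the Git commands that start a repository locally and publish it."""
    return [
        ["git", "init"],                                 # turn the folder into a repository
        ["git", "add", "README.md"],                     # stage the README (assumed to exist)
        ["git", "commit", "-m", "first commit"],         # record the staged snapshot
        ["git", "remote", "add", "origin", remote_url],  # link to the online repository
        ["git", "push", "-u", "origin", branch],         # upload and track the branch
    ]


# Placeholder URL: substitute your own account and repository name.
commands = new_repo_commands("https://github.com/your-username/your-repo.git")
for command in commands:
    print(shlex.join(command))
```

Each list could be passed to `subprocess.run` to execute it inside the project folder. Printing the commands instead keeps the example free of side effects.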

Okay, enough of that scary stuff; let's go with option number 2, the GitHub Desktop GUI. It is an application which can be downloaded from [here](https://desktop.github.com/), and it lets you use GitHub's features without having to open your web browser every single time. Before we move on, let's take some time to go through the general mechanics of the Git system. <br><br>


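In short, work in Git moves through three steps. _Add_ (or stage) selects the changed files that should go into the next snapshot. _Commit_ records that snapshot, together with a short message, in the repository on your own machine. _Push_ uploads your new commits to the online repository so that others can see them. <br>
A _branch_ is a separate line of commits, which lets you try out changes without touching the main version (called `master` in this example) until you are ready to merge them back.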

Now that we have installed GitHub Desktop, opening the application should bring us to an interface that looks something like this.

In [32]:
%%html
<img src="img/github_desktop_home.png" width="400" height="400">

Remember, we created an online storage space earlier on the GitHub website, but we still need a folder on our local machine which we can use to sync our documents. Think of it as dropping some files into a local Dropbox or Google Drive folder to sync them to your online storage. If we then click on the "Repository" tab in the top left corner, we will see an "add" option.

In [34]:
%%html
<img src="img/clone_repo.png" width="400" height="400">

In [35]:
%%html
<img src="img/clone_repo_2.png" width="400" height="400">

Here, you can just copy the URL of your repository (either from the address bar at the top of the web page or from the green "Clone or download" button) into the first field. For the local path, choose an empty folder which will become your new "local version" of the GitHub storage; any changes made there can be synced to the online repository.

In [36]:
%%html
<img src="img/local_repo.png" width="400" height="400">

After cloning a local version of the repository, it is finally time to sync our project files. We can do this by simply adding some files into the local path. Let's add our two files to the local folder and see what happens:
- data.csv
- model.py

In [37]:
%%html
<img src="img/git_status_local.png" width="400" height="400">

We can see that the GitHub application automatically detected the changes within the folder (the addition of two new files in this case)! After checking that those are indeed the files we want to sync, we simply type a summary and description of the update in the bottom left of the window and hit the "commit to master" button. (There is a set of industry best practices for the format of commit messages and descriptions, but we will not go into that in this article.) <br>
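
To give a feel for what "detecting changes" means, here is a simplified sketch that compares two snapshots of a folder by hashing file contents. It illustrates the idea only and is not how Git or GitHub Desktop is implemented. The function names and the dummy file contents are made up for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(folder):
    """Map each file's relative path to a SHA-256 hash of its contents."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare(before, after):
    """Classify files as added, deleted or modified between two snapshots."""
    return {
        "added": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "modified": sorted(
            name for name in set(before) & set(after) if before[name] != after[name]
        ),
    }


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "README.md").write_text("# example_project\n")
    before = snapshot(repo)

    # Drop the two project files (with dummy contents) into the folder.
    (repo / "data.csv").write_text("period,loss_ratio\n")
    (repo / "model.py").write_text("print('fit model')\n")

    changes = compare(before, snapshot(repo))
    print(changes)
```

Editing an existing file would change its hash, so it would show up under `modified` in the same way.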

After committing our changes/additions, we just need to hit the "push to origin" button and, just like that, our files will be synced to the online repository. Let's go back to the GitHub website to make sure that the files were indeed synced.
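
For the curious, the two buttons map roughly onto the Git commands `git add`, `git commit` and `git push`. The sketch below runs them from Python against a throwaway folder, with a local bare repository standing in for GitHub so that nothing is uploaded anywhere. The folder names, file contents, author identity and commit message are all made up for illustration, and the block does nothing if Git is not installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in cwd and return its standard output."""
    base = [
        "git",
        "-c", "user.name=Example Author",       # throwaway identity for the demo
        "-c", "user.email=author@example.com",
        "-c", "commit.gpgsign=false",           # avoid signing prompts in the demo
    ]
    result = subprocess.run(
        base + list(args), cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


synced_files = None  # stays None when git is not installed

if shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp) / "remote.git"  # stands in for the GitHub repository
        local = Path(tmp) / "local"        # stands in for the cloned local folder
        remote.mkdir()
        git("init", "--bare", cwd=remote)
        git("clone", str(remote), str(local), cwd=tmp)

        (local / "data.csv").write_text("period,loss_ratio\n")   # dummy content
        (local / "model.py").write_text("print('fit model')\n")  # dummy content

        git("add", "data.csv", "model.py", cwd=local)  # select the files to include
        git("commit", "-m", "Add input data and model script", cwd=local)  # the commit button
        branch = git("rev-parse", "--abbrev-ref", "HEAD", cwd=local).strip()
        git("push", "origin", branch, cwd=local)       # the push button

        # Ask the stand-in remote which files it now holds.
        synced_files = git("ls-tree", "-r", "--name-only", branch, cwd=remote).split()

print(synced_files)
```

The commit only exists on the local machine until the push; that separation is why the application shows two buttons rather than one.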

In [38]:
%%html
<img src="img/github_synced.png" width="400" height="400">

If, down the road, we make adjustments to any of the files in our local folder, the GitHub application will pick up these changes in the same way as shown above, and we would go through the same steps to sync our files online. So, you might be asking, where does the version control aspect of this tool come in?
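
Part of the answer is that every commit is a snapshot which can be compared with any other. The standard-library `difflib` module gives a rough feel for the line-by-line comparison that Git tools display. The two versions of `model.py` below are invented for illustration, and the output format is `difflib`'s own rather than a copy of what GitHub shows.

```python
import difflib

# Two hypothetical versions of model.py, as they might look in consecutive commits.
version_1 = """\
import pandas as pd

data = pd.read_csv("data.csv")
learning_rate = 0.1
"""
version_2 = """\
import pandas as pd

data = pd.read_csv("data.csv")
learning_rate = 0.05
"""

# Compare the versions line by line; unchanged lines are kept as context.
diff_lines = list(
    difflib.unified_diff(
        version_1.splitlines(),
        version_2.splitlines(),
        fromfile="model.py (previous commit)",
        tofile="model.py (latest commit)",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

Lines starting with `-` were removed and lines starting with `+` were added, so the single changed parameter stands out without having to open both files side by side.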

## Other tools available

- Atom IDE