# Analytics snippet: Version control, because the Recycle Bin doesn't count

## What is version control and why should I bother?

Version control, also known as revision control or source control, is the management of changes to programs, files, documents and other collections of information.

As actuaries, our models and analyses come in many forms and vary in size depending on the task at hand. For example, a simple exercise analysing loss ratios for a given period and product might consist of nothing more than a small R script or SQL query. At the other extreme, a fully functional reserving model could easily be a one-gigabyte Excel workbook. Whatever their size, all of these go through many revisions, often by several people, and keeping track of what changed, when and by whom is exactly the job of version control.

The rest of the article walks through a simple example of a small team of 3 actuaries using GitHub, a hosting service built on the Git version control system, to version-control their scripts in their day-to-day work.

## FOO Insurance Ltd

It is the beginning of a new financial year for FOO Insurance Ltd, and its products are due for a pricing review, which is handled by the actuarial team consisting of:

- Esther, Manager
- Jimmy, Analyst
- Michelle, Analyst

The first step of this project is to review the age curve independently and produce a model, which will then be compared with the one currently used in production. After being briefed on the relevant details and timelines, the team spends the next week discussing and iterating through various parameters and hypotheses, and finally arrives at an agreed-upon model!

<img src="img/messy_files.png" alt="Folder cluttered with many versions of the model files" width="400" height="400">

Okay, so which one exactly is the final model here? Does this look vaguely familiar from your old college submissions, or from some of the drives in your office? If so, you probably already practise version control in one way or another. The most primitive and clunky (but useful) method is the "Save As" button: the current state of the working file is saved as a separate copy, and you carry on modifying the open document. Do that enough times and you end up with something like the folder shown above.

Note that these files are the product of a small team of 3 actuaries working on a simple and straightforward project. You can imagine how much more convoluted things could get with a bigger team and a real analytics task.

The goal of this article, then, is to give the reader a quick and practical way to go from the screenshot above to this:

INSERT GITHUB REPO HERE WITH 1 FILE

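Would you believe me if I told you that the file (or files) in the second screenshot contains as much historical information as the files in the first, and more? This does, however, only work well for scripts and other small text files. Git's tools for comparing and merging versions work line by line on plain text, so large binary files such as Excel workbooks cannot be compared or merged meaningfully and quickly bloat a repository. For such files, the article will recommend some alternative tools but will not go into the details of their implementation and mechanics.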
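To make that claim concrete, here is a minimal sketch using the Git command line. It assumes Git is installed; the temporary folder, file contents, identity and commit messages are all made up for illustration. Three successive versions of a single `model.r` are committed, and every one of them stays retrievable even though the folder only ever holds one file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway folder, so nothing else on your machine is touched
DEMO=$(mktemp -d)
cd "$DEMO"
git init --quiet

# Placeholder identity; Git refuses to commit without one
git config user.name "Esther"
git config user.email "esther@example.com"

# Instead of model_v1.r, model_v2.r, model_final.r, ...
# keep ONE file and record each version as a commit
echo "# model.r, draft 1" > model.r
git add model.r
git commit --quiet -m "First draft of age curve model"

echo "# model.r, draft 2" > model.r
git commit --quiet -am "Second draft after team review"

echo "# model.r, agreed final version" > model.r
git commit --quiet -am "Agreed final model"

ls                        # the folder holds a single file: model.r
git log --oneline         # ...but the history holds all three versions
git show HEAD~2:model.r   # print the first draft without touching model.r
```

Each commit is a complete snapshot with a message saying what changed, which is what replaces the trail of "Save As" copies.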

#### Administrative and Security Issues

Let's rewind a week, to before the actuarial team breaks away to work on the model. The first step towards a more sustainable file structure is for the team (or Esther) to consult FOO's IT/Tech department. There is a good chance that IT already uses GitHub (possibly GitHub Enterprise) for its day-to-day work and can provide some guidance. If not, the team must consult the appropriate authorities within the company to make sure that GitHub is an approved platform and to go over the risk of security breaches. Do note that, since October 2018, GitHub has been officially owned by Microsoft. That should give upper management some sense of security, right?

#### Registering and Installing

After settling all the administrative issues, the actuarial team can now register for GitHub. To start off, each of them goes to the [GitHub website](https://github.com/) and creates an account, or signs in if they already have one.

<img src="img/github_homepage.png" alt="GitHub homepage" width="600" height="600">

Assuming FOO Insurance Ltd does not have a corporate GitHub Enterprise account, a personal GitHub account is free to sign up for, with paid plans available. Besides hosting version-controlled files, GitHub also serves as a platform for collaboration and project management across a variety of disciplines and departments. Note that since January 2019, all GitHub accounts, including free ones, can create private repositories, which the team will need. (Otherwise they might as well e-mail their pricing models to their competitors.)

#### Repository

Great! Now that all 3 of them have GitHub accounts, the next step is to create a "location" to store all the relevant project files. In the world of Git, this is called a repository, or "repo" for short, and repositories can generally be grouped into 2 main categories:

- Local
- Remote
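The two kinds of repository can be seen side by side in a short command-line sketch (again assuming Git is installed). To keep it runnable offline, a "bare" repository in a temporary folder stands in for the remote that GitHub would host; the folder names are hypothetical. Jimmy and Michelle each clone it, which gives each of them their own local repository, and both local copies point back to the same remote, which Git names `origin` by default.

```shell
#!/usr/bin/env bash
set -euo pipefail

DEMO=$(mktemp -d)
cd "$DEMO"

# Stand-in for the remote repository on GitHub: a bare repository
# stores history only and has no working files of its own
git init --quiet --bare age-curve-remote.git

# Each analyst clones the remote into a local repository on their own machine
# (Git may warn that the repository is empty, which is expected here)
git clone --quiet age-curve-remote.git jimmy-local
git clone --quiet age-curve-remote.git michelle-local

# A local repository is an ordinary folder plus a hidden .git folder holding its history
ls -A jimmy-local

# Both local repositories record the same remote under the name "origin"
git -C jimmy-local remote get-url origin
git -C michelle-local remote get-url origin
```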

A local repository resides on your own machine, much like a folder on your local drive, whereas a remote repository is the "online" copy, hosted on GitHub, that everyone on the team sees. Esther now creates (initializes) a remote repository with the following steps so the team can start working on the model.

<img src="img/create_repo.png" alt="Creating a new repository on GitHub" width="600" height="600">

<img src="img/empty_repo.png" alt="The newly created, empty repository on GitHub" width="600" height="600">

We now have an empty remote repository into which the team can put all their project files! There are generally 2 ways of working with a GitHub repository from your own computer:

- Git CLI (Command-line interface)
- GitHub Desktop GUI (Graphical User Interface)

In this article, we will try to avoid the CLI (although it is my preferred method), as it can be daunting for users who are new to Git and to the concept of version control. Just as an example, creating a repository using the Git CLI requires the user to install Git from [here](https://git-scm.com/downloads), and the steps shown above correspond to the following terminal/command prompt commands:


<img src="img/create_repo_cli.png" alt="Creating a repository with the Git CLI" width="400" height="400">
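The exact commands in the screenshot may differ, but a typical command-line sequence is sketched below. One caveat: plain Git cannot create the repository on GitHub's servers; that part happens on the website (or with `gh repo create` from GitHub's separate tool, GitHub CLI). What the Git CLI does is turn a folder into a local repository and link it to that remote. The folder name and the URL are placeholders, not the team's real ones.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Check that Git is installed (prints the installed version)
git --version

# Placeholder URL of the remote created on GitHub; substitute your own
REMOTE_URL="https://github.com/your-username/age-curve.git"

# Hypothetical project folder for the age curve work
PROJECT="$(mktemp -d)/age-curve"
mkdir -p "$PROJECT"
cd "$PROJECT"

# Turn the folder into a local repository (creates the hidden .git folder)
git init --quiet

# Register the GitHub repository as this local repository's remote, named "origin"
git remote add origin "$REMOTE_URL"
git remote -v
```

Nothing is sent to GitHub at this point; `git remote add` only records the address for later pushes.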

Okay, enough of that scary stuff; let's go with option 2, the GitHub Desktop GUI. It is a desktop application, downloadable from [here](https://desktop.github.com/), that lets the user work with their repositories and GitHub's features without having to open a web browser every single time. Once all of them have installed the application, they should see a screen that looks similar to this:


<img src="img/github_desktop_home.png" alt="GitHub Desktop home screen" width="400" height="400">

Okay, let's say Esther (Manager) has created an initial script for the age curve in R called "model.r", which uses simple linear regression to model the relationship between policyholders' age and the average claims incurred. The contents of model.r look something like this:

<img src="img/initial_model.png" alt="Initial contents of model.r" width="400" height="400">
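For reference, a simple linear regression of this kind can be written as below. The notation is ours rather than taken from model.r: $x_i$ is the age in the $i$-th row of the dataset, $y_i$ the corresponding average claims incurred, and $\varepsilon_i$ an error term. The coefficients are estimated by ordinary least squares, the standard way of fitting this model (and what R's `lm()` does).

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
$$

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
$$

Here $\hat{\beta}_0$ is the intercept and $\hat{\beta}_1$ the change in average claims per additional year of age: the two coefficients such a script prints.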

Nothing fancy: all the script does is read the dataset in, fit a simple linear regression model and print the estimated coefficients. Esther now wants both Jimmy and Michelle to work on improving it, and can make that possible by uploading the current script to the remote repository she created.

INSERT STEPS
- clone
- add
- commit
- push
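The four steps listed above map onto four Git commands, sketched end to end below (assuming Git is installed). So that it runs without a GitHub account, a bare repository in a temporary folder stands in for the GitHub remote; in practice Esther would clone her repository's GitHub URL instead, and `git push` would ask her to authenticate. The identity, the branch name `main` and the file contents are placeholders. In GitHub Desktop, the same steps are carried out through the application's clone, commit and push buttons rather than typed commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

DEMO=$(mktemp -d)
cd "$DEMO"

# Stand-in for the empty remote Esther created on GitHub,
# with its default branch named "main"
git init --quiet --bare age-curve-remote.git
git -C age-curve-remote.git symbolic-ref HEAD refs/heads/main

# 1. clone: copy the remote into a local repository
git clone --quiet age-curve-remote.git esther-local
cd esther-local
git config user.name "Esther"                # placeholder identity
git config user.email "esther@example.com"

# Esther's initial script (placeholder contents)
echo "# initial age curve model" > model.r

# 2. add: stage model.r so it is included in the next commit
git add model.r

# 3. commit: record a snapshot in the LOCAL history, with a message
git commit --quiet -m "Add initial age curve model"

# 4. push: send local commits to the remote, on a branch named "main"
git branch -M main
git push --quiet -u origin main

# The remote now holds the same commit as Esther's local repository
git ls-remote --heads origin
```

Until step 4 everything happens on Esther's machine only; Jimmy and Michelle see `model.r` only once it has been pushed.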