## Crash Course in Git

### Git: the first step toward reproducible experiments

Our first focus is how to build reproducible experiments. Experimentation is a part of the ML lifecycle that rarely exists in regular software engineering. In software engineering we mostly know what we want to build and how to build it. In ML, we have to go through several rounds of experimentation to find the set of features, algorithms, and hyperparameters that produces the best model. The experiment is a loop of sadness: we form a hypothesis, test it, and evaluate how it performs. If it meets the requirements, we are done; if not, we repeat with the next best educated guess.

In the ML world, the result of an experiment is a machine learning model. A model is the product of a marriage between code and data, together with a set of hyperparameters, executed on a particular piece of hardware.
To make an experiment reproducible, we need to be able to version, or take snapshots of, every component used to produce the model.
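
As a rough sketch of what such a snapshot could contain, the script below records a code version, a fingerprint of the data, and the hyperparameters in one manifest file. It relies on Git, which the rest of this tutorial introduces. The file names (`data.csv`, `params.txt`, `manifest.txt`) and the manifest layout are assumptions made for illustration, and hardware details are left out.

```shell
set -eu

# Throwaway directory standing in for an experiment folder (hypothetical layout).
workdir="$(mktemp -d)"
cd "$workdir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "print('train')" > training.py
echo "x,y" > data.csv                      # stand-in for the training data
echo "learning_rate=0.01" > params.txt     # stand-in for the hyperparameters

git add training.py params.txt
git commit -q -m "Snapshot training code and hyperparameters"

# Record what produced the model: code version, data fingerprint, hyperparameters.
{
    echo "code_commit=$(git rev-parse HEAD)"
    echo "data_hash=$(git hash-object data.csv)"
    cat params.txt
} > manifest.txt

cat manifest.txt
```

Here `git hash-object` only computes a content hash for the data file; the data itself is not stored in the repository.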

First we start with versioning the code. Git is one of the most powerful and popular source control technologies. It was created by Linus Torvalds, the creator of Linux, to manage the development of the Linux kernel. Since then it has gained so much popularity that it is now by far the most widely used technology for versioning software projects. Git is free to use in your development environment. Several commercial Git-based hosting services, such as GitHub, Azure DevOps, GitLab, and Bitbucket, also provide a free tier to host your code.

Now let's dive into the Git commands!

### Instructions for this tutorial

To complete this tutorial, you need to open VS Code, a file explorer, and a command prompt.

In [1]:
%%bash

ls -ah

.
..
.DS_Store
.git
.gitignore
.ipynb_checkpoints
Git_Tutorial.ipynb
inference.py
training.py
util.py


In [2]:
%%bash

# Initial setup for Git: replace the email and name below with your own
git config --global user.email "girish.sureshkumar@gmail.com"
git config --global user.name "ramyagirish"

Initializing the Git environment on your local computer

In [3]:
%%bash

git init

Reinitialized existing Git repository in /Users/ramyagirish/like_me/git_tutorial/.git/


In [4]:
%%bash

ls -ah

.
..
.DS_Store
.git
.ipynb_checkpoints
Git_Tutorial.ipynb


In [5]:
%%bash

cat .git/HEAD

ref: refs/heads/master


In [6]:
%%bash

git status

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.DS_Store
	.ipynb_checkpoints/
	Git_Tutorial.ipynb

nothing added to commit but untracked files present (use "git add" to track)


### Git Ignore

Create a `.gitignore` file. It tells Git which untracked files to leave out of `git status` and `git add`. Files that are already tracked are not affected by it.

In [9]:
%%bash

echo "/.ipynb_checkpoints" > .gitignore
echo "Git_Tutorial.ipynb" >> .gitignore
echo ".DS_Store" >> .gitignore
echo "*.csv" >> .gitignore
cat .gitignore

/.ipynb_checkpoints
Git_Tutorial.ipynb
.DS_Store
*.csv
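
To check which pattern, if any, matches a given path, `git check-ignore` can help. The sketch below rebuilds the same `.gitignore` in a throwaway repository; the paths `data/train.csv` and `training.py` are only examples.

```shell
set -eu

demo="$(mktemp -d)"
cd "$demo"
git init -q

# Same patterns as the tutorial's .gitignore.
printf '%s\n' "/.ipynb_checkpoints" "Git_Tutorial.ipynb" ".DS_Store" "*.csv" > .gitignore

# Example files to test the patterns against.
mkdir -p data
touch data/train.csv training.py

# -v prints the .gitignore file, line number, and pattern that matched.
git check-ignore -v data/train.csv

# check-ignore exits with a non-zero status when no pattern matches.
if git check-ignore -q training.py; then
    echo "training.py is ignored"
else
    echo "training.py is not ignored"
fi
```

Because `*.csv` contains no slash, it matches CSV files in any directory, while the leading slash in `/.ipynb_checkpoints` anchors that pattern to the repository root.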


In [10]:
%%bash

git status

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.gitignore

nothing added to commit but untracked files present (use "git add" to track)


In [11]:
%%bash

echo "print('this is the training code!')" > training.py

In [12]:
%%bash

git status

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.gitignore
	training.py

nothing added to commit but untracked files present (use "git add" to track)


In [13]:
%%bash

git add training.py

# to add all files:
# git add .

In [14]:
%%bash

git status

On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   training.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.gitignore



In [15]:
%%bash

git commit -m "First commit!"

[master (root-commit) 0f18fe1] First commit!
 1 file changed, 1 insertion(+)
 create mode 100644 training.py


In [16]:
%%bash

git status

On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.gitignore

nothing added to commit but untracked files present (use "git add" to track)


In [17]:
%%bash

echo "print('this is the inference code!')" > inference.py

In [18]:
%%bash

git add .
git commit -m "adding inference logic!"

[master 42a9e1a] adding inference logic!
 2 files changed, 5 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 inference.py


In [19]:
%%bash 

git status

On branch master
nothing to commit, working tree clean


In [20]:
%%bash

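# Note: the step form {1..5..1} needs bash 4 or newer. On older versions
# (such as the bash 3.2 shipped with macOS) it is not expanded, so the loop
# runs once with the literal string, which is what the output below shows.
for value in {1..5..1}
do
    echo "print('Some random changes $value')" >> training.py
    git add training.py
    git commit -m "Commit for random change $value"
done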


[master 4de60fb] Commit for random change {1..5..1}
 1 file changed, 1 insertion(+)
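
The output above shows a single commit whose message contains the literal text `{1..5..1}`, most likely because the shell behind the notebook did not expand the stepped brace range. A more portable way to write the same loop, sketched here in a throwaway repository, is to generate the numbers with `seq`:

```shell
set -eu

demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# seq produces 1 2 3 4 5 regardless of the shell version.
for value in $(seq 1 5)
do
    echo "print('Some random changes $value')" >> training.py
    git add training.py
    git commit -q -m "Commit for random change $value"
done

# One commit per iteration; the hashes will differ on every run.
git log --oneline
```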


In [21]:
%%bash

for value in {1..5..1}
do
    echo "print('Some random changes $value')" >> training.py
done

In [22]:
%%bash

git log training.py

commit 4de60fb5b0eb33662d353c05927defb75fe9fdf6
Author: ramyagirish <girish.sureshkumar@gmail.com>
Date:   Thu Oct 17 16:02:32 2019 -0400

    Commit for random change {1..5..1}

commit 0f18fe175c9f6457702919267fd9a43e9f8a153d
Author: ramyagirish <girish.sureshkumar@gmail.com>
Date:   Thu Oct 17 16:00:15 2019 -0400

    First commit!


In [23]:
%%bash

git log --follow -p training.py

commit 4de60fb5b0eb33662d353c05927defb75fe9fdf6
Author: ramyagirish <girish.sureshkumar@gmail.com>
Date:   Thu Oct 17 16:02:32 2019 -0400

    Commit for random change {1..5..1}

diff --git a/training.py b/training.py
index 1d9e550..fa6b947 100644
--- a/training.py
+++ b/training.py
@@ -1 +1,2 @@
 print('this is the training code!')
+print('Some random changes {1..5..1}')

commit 0f18fe175c9f6457702919267fd9a43e9f8a153d
Author: ramyagirish <girish.sureshkumar@gmail.com>
Date:   Thu Oct 17 16:00:15 2019 -0400

    First commit!

diff --git a/training.py b/training.py
new file mode 100644
index 0000000..1d9e550
--- /dev/null
+++ b/training.py
@@ -0,0 +1 @@
+print('this is the training code!')


In [24]:
%%bash

cat .git/HEAD
echo "--------------------------"
git branch -a


ref: refs/heads/master
--------------------------
* master


### Link a Remote Repository to our Local Repository

In [25]:
%%bash

git remote add upstream https://github.com/ramyagirish/MLOps_workshop.git

In [26]:
%%bash
git push -u upstream master

Branch 'master' set up to track remote branch 'master' from 'upstream'.


To https://github.com/ramyagirish/MLOps_workshop.git
 * [new branch]      master -> master
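
The same steps can be tried without a GitHub account. In the sketch below a bare repository on the local disk stands in for the hosted remote (an assumption for illustration); the remote name `upstream` and the branch name `master` follow the tutorial.

```shell
set -eu

work="$(mktemp -d)"

# A bare repository on disk plays the role of the hosted remote.
git init -q --bare "$work/remote.git"

git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master    # use the tutorial's branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "print('this is the training code!')" > training.py
git add training.py
git commit -q -m "First commit!"

# Link the remote, then push and record it as the branch's upstream.
git remote add upstream "$work/remote.git"
git push -q -u upstream master

git remote -v
git rev-parse --abbrev-ref "master@{upstream}"
```

After `git push -u`, a plain `git push` or `git pull` on `master` knows which remote branch to use.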


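### GitHub Issues

Plan your project using GitHub Issues (alternatively you can use Azure DevOps Boards, Jira, ...). Reference an issue from a commit message with `#<number>`, and close it with a keyword such as `Resolved #1` once the change reaches the default branch, for example through a pull request.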

In [27]:
%%bash

git status

On branch master
Your branch is up to date with 'upstream/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   training.py

no changes added to commit (use "git add" and/or "git commit -a")


In [31]:
%%bash

git add .

In [32]:
%%bash

git commit -m "testing the issues #1"

[master 2b2d152] testing the issues #1
 1 file changed, 4 insertions(+)


In [33]:
%%bash

git push

To https://github.com/ramyagirish/MLOps_workshop.git
   4de60fb..2b2d152  master -> master


In [34]:
%%bash

git status

On branch master
Your branch is up to date with 'upstream/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   training.py

no changes added to commit (use "git add" and/or "git commit -a")


In [37]:
%%bash

git add . 

In [38]:
%%bash

git commit -m "Resolved #1"

[master ae10834] Resolved #1
 1 file changed, 3 insertions(+), 1 deletion(-)


In [39]:
%%bash

git push

To https://github.com/ramyagirish/MLOps_workshop.git
   2b2d152..ae10834  master -> master


### Branches

For every new change or feature, create a new branch, also known as a feature branch.

In [4]:
%%bash
echo "-- Listing branches --"

git branch -a

-- Listing branches --
* hyp-tuning-bo
  hyperparameter-tuning
  master
  remotes/upstream/hyp-tuning-bo
  remotes/upstream/hyperparameter-tuning
  remotes/upstream/master


In [41]:
%%bash

git branch hyperparameter-tuning

In [42]:
%%bash
git status

On branch master
Your branch is up to date with 'upstream/master'.

nothing to commit, working tree clean


In [43]:
%%bash
git checkout hyperparameter-tuning

Switched to branch 'hyperparameter-tuning'


In [44]:
%%bash
git status

On branch hyperparameter-tuning
nothing to commit, working tree clean


In [45]:
%%bash

echo "print('Added the Hyperparameter Tuning to the training code')" >> training.py
# echo "print('Added the Hyperparameter Tuning to the training code')" > inference.py

In [46]:
%%bash

echo "print('This is the util function second time')" > util.py
git add .

In [47]:
%%bash

git commit -m 'Added the hyperparameter tuning'

[hyperparameter-tuning aa72745] Added the hyperparameter tuning
 2 files changed, 2 insertions(+), 1 deletion(-)
 create mode 100644 util.py


In [48]:
%%bash

git checkout master

Your branch is up to date with 'upstream/master'.


Switched to branch 'master'


In [49]:
%%bash

git checkout hyperparameter-tuning

Switched to branch 'hyperparameter-tuning'


In [50]:
%%bash

git push 

fatal: No configured push destination.
Either specify the URL from the command-line or configure a remote repository using

    git remote add <name> <url>

and then push using the remote name

    git push <name>



CalledProcessError: Command 'b'\ngit push \n'' returned non-zero exit status 128.
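
The push above most likely failed because the new branch has no upstream configured, so `git push` fell back to a remote named `origin`, which does not exist in this repository (the remote here is called `upstream`). One way to check whether a branch tracks a remote branch before pushing is sketched below, in a throwaway repository:

```shell
set -eu

demo="$(mktemp -d)"
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "print('this is the training code!')" > training.py
git add training.py
git commit -q -m "First commit!"
git checkout -q -b hyperparameter-tuning

# "@{upstream}" resolves only when the current branch tracks a remote branch.
if upstream_ref="$(git rev-parse --abbrev-ref "@{upstream}" 2>/dev/null)"; then
    echo "tracking $upstream_ref"
else
    echo "no upstream configured for $(git rev-parse --abbrev-ref HEAD)"
fi
```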

### Push the branch to the remote and make it a tracking branch

In [51]:
%%bash

git push --set-upstream upstream hyperparameter-tuning

Branch 'hyperparameter-tuning' set up to track remote branch 'hyperparameter-tuning' from 'upstream'.


remote: 
remote: Create a pull request for 'hyperparameter-tuning' on GitHub by visiting:        
remote:      https://github.com/ramyagirish/MLOps_workshop/pull/new/hyperparameter-tuning        
remote: 
To https://github.com/ramyagirish/MLOps_workshop.git
 * [new branch]      hyperparameter-tuning -> hyperparameter-tuning


### Merge the NEW branch into the MASTER branch

In [52]:
%%bash

git checkout master

Your branch is up to date with 'upstream/master'.


Switched to branch 'master'


In [53]:
%%bash

git merge --no-ff hyperparameter-tuning

Merge made by the 'recursive' strategy.
 training.py | 2 +-
 util.py     | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)
 create mode 100644 util.py
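
When `master` has not moved since the branch was created, a plain `git merge` would simply fast-forward. The `--no-ff` flag forces a merge commit, which keeps the feature branch visible in the history. A minimal sketch in a throwaway repository, where the branch name `feature` is only an example:

```shell
set -eu

demo="$(mktemp -d)"
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "print('base')" > training.py
git add training.py
git commit -q -m "Base commit"

git checkout -q -b feature
echo "print('feature work')" >> training.py
git commit -q -am "Feature work"

git checkout -q master
# master has no new commits, so without --no-ff this would fast-forward.
git merge -q --no-ff -m "Merge branch 'feature'" feature

git log --oneline --graph

# A merge commit lists two parents: its own hash plus two parent hashes.
git rev-list --parents -n 1 HEAD | wc -w
```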


In [54]:
%%bash

git push

To https://github.com/ramyagirish/MLOps_workshop.git
   ae10834..1564e0e  master -> master


In [55]:
%%bash

git checkout -b hyp-tuning-bo

Switched to a new branch 'hyp-tuning-bo'


In [56]:
%%bash

echo ""  >> training.py
echo "print('Hyperparameter Tuning Bayesian Optimization')" >> training.py
echo ""  >> training.py
for value in {1..5..1}
do
    echo "print('Some random changes $value')" >> training.py
done

In [57]:
%%bash

echo "" >> training.py
echo "print('Hyperparameter Tuning Bayesian Optimization - Completed')" >> training.py

In [58]:
%%bash

git add .
git commit -m "Updgraded HT with Bayesian Optimization"

[hyp-tuning-bo d9b4a0f] Updgraded HT with Bayesian Optimization
 1 file changed, 6 insertions(+)


In [59]:
%%bash

git push --set-upstream upstream hyp-tuning-bo

Branch 'hyp-tuning-bo' set up to track remote branch 'hyp-tuning-bo' from 'upstream'.


remote: 
remote: Create a pull request for 'hyp-tuning-bo' on GitHub by visiting:        
remote:      https://github.com/ramyagirish/MLOps_workshop/pull/new/hyp-tuning-bo        
remote: 
To https://github.com/ramyagirish/MLOps_workshop.git
 * [new branch]      hyp-tuning-bo -> hyp-tuning-bo


In [60]:
%%bash

cat .git/config

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "upstream"]
	url = https://github.com/ramyagirish/MLOps_workshop.git
	fetch = +refs/heads/*:refs/remotes/upstream/*
[branch "master"]
	remote = upstream
	merge = refs/heads/master
[branch "hyperparameter-tuning"]
	remote = upstream
	merge = refs/heads/hyperparameter-tuning
[branch "hyp-tuning-bo"]
	remote = upstream
	merge = refs/heads/hyp-tuning-bo


### Resolve Conflicts
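
A conflict happens when two branches change the same lines and Git cannot decide which version to keep. The sketch below provokes one in a throwaway repository and then resolves it by hand; the branch name `feature` and the file contents are made up for illustration.

```shell
set -eu

demo="$(mktemp -d)"
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "print('original line')" > training.py
git add training.py
git commit -q -m "Base commit"

git checkout -q -b feature
echo "print('feature version')" > training.py
git commit -q -am "Change on feature"

git checkout -q master
echo "print('master version')" > training.py
git commit -q -am "Change on master"

# Both branches rewrote the same line, so the merge stops with a conflict.
if ! git merge feature >/dev/null 2>&1; then
    echo "Conflict in:"
    git diff --name-only --diff-filter=U
fi

# Resolve by writing the content you want to keep, then stage and commit.
echo "print('merged version')" > training.py
git add training.py
git commit -q -m "Merge branch 'feature' and resolve conflict"
```

In a real project you would open the conflicted file, choose between the sections marked with `<<<<<<<`, `=======`, and `>>>>>>>`, and remove the markers before staging.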

### Create branch from another branch

In [None]:
%%bash

git checkout -b hyp-tuning-bo-ext hyp-tuning-bo

In [None]:
%%bash

echo "" >> training.py
echo "print('Hyperparameter Tuning Bayesian Optimization EXTENDED - Completed')" >> training.py

In [None]:
%%bash

git add .
git commit -m "Extended HT with Bayesian Optimization Completed"

In [None]:
%%bash

git push --set-upstream upstream hyp-tuning-bo-ext

### Pull Request

A pull request proposes merging a feature branch into `master` or another target branch:

Master/Another Branch <============= Feature Branch

### Git Graph

In [64]:
%%bash

git log --oneline --graph

* d9b4a0f Updgraded HT with Bayesian Optimization
*   1564e0e Merge branch 'hyperparameter-tuning'
|\  
| * aa72745 Added the hyperparameter tuning
|/  
* ae10834 Resolved #1
* 2b2d152 testing the issues #1
* 4de60fb Commit for random change {1..5..1}
* 42a9e1a adding inference logic!
* 0f18fe1 First commit!


### Git Graph VSCode Extension

In [61]:
%%bash
git log --oneline

d9b4a0f Updgraded HT with Bayesian Optimization
1564e0e Merge branch 'hyperparameter-tuning'
aa72745 Added the hyperparameter tuning
ae10834 Resolved #1
2b2d152 testing the issues #1
4de60fb Commit for random change {1..5..1}
42a9e1a adding inference logic!
0f18fe1 First commit!


In [62]:
%%bash
git show hyperparameter-tuning

commit aa72745210342d3daac0e8b94cb614cd004d8827
Author: ramyagirish <girish.sureshkumar@gmail.com>
Date:   Thu Oct 17 20:02:52 2019 -0400

    Added the hyperparameter tuning

diff --git a/training.py b/training.py
index b6f5c84..f243b55 100644
--- a/training.py
+++ b/training.py
@@ -5,4 +5,4 @@ print('Some random changes {1..5..1}')
 
 print('testing issues')
 
-print('closing the issues')
\ No newline at end of file
+print('closing the issues')print('Added the Hyperparameter Tuning to the training code')
diff --git a/util.py b/util.py
new file mode 100644
index 0000000..a2f0d45
--- /dev/null
+++ b/util.py
@@ -0,0 +1 @@
+print('This is the util function second time')


In [63]:
%%bash
git log -1

commit d9b4a0f413bcdbdb846a75f2203eb78c30d7bc7d
Author: ramyagirish <girish.sureshkumar@gmail.com>
Date:   Thu Oct 17 20:19:34 2019 -0400

    Updgraded HT with Bayesian Optimization


### Create a Continuous Integration (CI) Pipeline

* Add requirements.txt
* Add a simple unit test
* Create an Azure DevOps project (instructions are in the Deck)
* Create a DevOps pipeline (instructions are in the Deck)

In [None]:
%%bash

echo "" >> requirements.txt

In [None]:
%%bash

mkdir tests

In [None]:
%%bash

echo "def test_example6():" > tests/my_unit_tests.py
echo "   assert 3 == 3" >> tests/my_unit_tests.py

### Homework

* Create a repo with Master, Release, and Development branches.
* Create a separate build pipeline for pull requests into each of these branches.
* Add the Azure DevOps pipeline badge to the GitHub repo.
* Add a new failing test (e.g. `assert 1 == 2`) to observe how the pipeline fails.