# What will you learn?

* Important tools and processes for working on a project
* A general idea of cloud services
* CI/CD pipelines
* Basic Docker, Kubernetes and containerization

***To begin with***:

# Version Control

Version control lets multiple developers work on a project simultaneously, or lets one person work on a project from different computers. Each person has their own copy to work on. In addition, version control keeps the history of a project, like a series of backups, so an earlier version can be restored after a crash or a mistake. There are two types of version control, **centralized** and **distributed**. The main difference is that in centralized version control each user only gets a working copy of the files while the history lives on a central server, whereas in distributed version control each user also gets their own copy of the whole repository, including its history. We will work through some examples from a distributed version control system (Git).

* git init (Initializes a Git repository in the current directory)

**A typical workflow would go like this:**

* git pull (Fetches the latest changes from the remote repo and merges them into your current branch)
* git add . (Stages all new and modified files in the current directory for the next commit)
* git commit -m "comment on the update" (Commits the staged changes to the local repo with a message so that other contributors can see what you did, like a checkpoint)
* git push (Pushes your local commits to the remote repo, for example on GitHub)

**If working in your own branch**:

* git merge "branch name" (Merges the changes from the named branch into your current branch)

**When in doubt:**

* git status (lists staged, modified and untracked files)
* git diff (shows the line-by-line changes that are not yet staged)
* git log (shows the commit history)

**Want to go back to an older version?**

* git checkout "commit hash"

**When you want to work separately:**

* git branch "new branch name"

**To get your own copy of a repository that is on GitHub:**

* git clone "repo URL"

![GitHub flow](https://gblobscdn.gitbook.com/assets%2F-M15KrJzoMvhbv4NcO9o%2F-M4D_zaea7Lgc9yTKd8L%2F-M4Da-C19U9kv9CRce8Z%2Fgithub-flow.png?alt=media)

# Test-Driven Development

## What is test-driven development?

Test-driven development is a software development approach in which software requirements are converted to test cases before the code is written. The test cases specify what the code should do, and one is created for each piece of functionality. Development is then tracked by running the tests repeatedly: if a test fails, new code is written until it passes.
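
A minimal sketch of one test-first cycle is shown below. The requirement ("scale a list of numbers to the range 0 to 1") and the function name `normalize` are hypothetical, chosen only for illustration. The test is written first and states what the code should do; the implementation is the least code that makes it pass.

```python
# Step 1 (red): write the test before the implementation exists.
# The requirement and the name `normalize` are hypothetical examples.
def test_normalize():
    assert normalize([2, 4, 6]) == [0.0, 0.5, 1.0]
    # Edge case taken from the requirement: identical values all map to 0.0.
    assert normalize([3, 3]) == [0.0, 0.0]


# Step 2 (green): write just enough code to make the test pass.
def normalize(values):
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lowest) / span for v in values]


# Step 3: run the test again; it finishes silently once the requirement is met.
test_normalize()
```

Running `test_normalize()` before step 2 would raise a `NameError`, which is the failing state that drives the next piece of code.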

## What are software requirements?

A software requirement is a description of what the system should do; it is “a property that must be exhibited by something in order to solve some problem in the real world”. Requirements reflect the needs of different people at various levels of the organisation. There are two types of system requirements: functional (what the system should do) and non-functional (how the system should do it, for example how fast or how reliably).
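
As a rough illustration, both kinds of requirement can be turned into test cases. In the sketch below, the function `sort_scores` and the one-second time budget are assumptions made up for the example, not requirements from any real system.

```python
import time


def sort_scores(scores):
    """Return the scores in descending order (hypothetical system behaviour)."""
    return sorted(scores, reverse=True)


def test_functional_requirement():
    # Functional: what the system should do.
    assert sort_scores([3, 9, 1]) == [9, 3, 1]


def test_non_functional_requirement():
    # Non-functional: how the system should do it.
    # The 1-second budget for 10,000 scores is an assumed figure.
    scores = list(range(10_000))
    start = time.perf_counter()
    sort_scores(scores)
    elapsed = time.perf_counter() - start
    assert elapsed < 1.0


test_functional_requirement()
test_non_functional_requirement()
```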


# How to apply test-driven development?

![img-pyramid-d.png](https://test.io/wp-content/uploads/2018/11/img-pyramid-d.png)

# Unit Testing

![unit_testing.jpg](https://www.tutorialspoint.com/software_testing_dictionary/images/unit_testing.jpg)

## What is Unit Testing?

Unit testing is a software testing technique in which individual units (small chunks of software) are tested in isolation to check that every part of the software works as planned.

* Helps detect bugs at an early stage.
* Reduces the cost of later tests.
* Helps with refactoring and making further changes.

Each test states the expected behaviour of a unit, and the expected result is worked out independently of the code under test. For example, if you have a multiplication function, the unit test computes the product of the parameters manually or with a library function and checks that the function returns the same result. Several frameworks help with unit testing, such as pytest and unittest.

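```python
def addition(x, y):
    """Return the sum of x and y."""
    return x + y


# Compare the function's result with a value worked out by hand.
if addition(4, 4) == 8:
    print("Correct")
else:
    print("Wrong!")
```

    Correct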
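
The multiplication check described above can also be sketched with the standard-library `unittest` framework. The function `multiply` and the test names are illustrative assumptions. The suite is run programmatically so the example also works inside a notebook.

```python
import unittest


def multiply(x, y):
    """Unit under test (hypothetical example function)."""
    return x * y


class TestMultiply(unittest.TestCase):
    def test_matches_manual_result(self):
        # Expected value worked out by hand.
        self.assertEqual(multiply(3, 4), 12)

    def test_matches_repeated_addition(self):
        # Expected value computed independently of multiply().
        self.assertEqual(multiply(5, 3), 5 + 5 + 5)

    def test_zero(self):
        self.assertEqual(multiply(7, 0), 0)


# Load and run the test case without calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMultiply)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

`result.wasSuccessful()` reports whether every test passed, and `result.testsRun` gives the number of tests that were executed.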


## How is this going to help me as a data scientist?

Well, you might want to write functions to make your analysis code more reusable and readable, and unit tests let you check that those functions keep returning what you expect.

```python
import pandas as pd


def load_data():
    """Build a small example frame with a "class" column and five feature columns."""
    data = pd.DataFrame(
        [[1, 5, 8, 2, 0, 5],
         [6, 3, 5, 0, 6, 8],
         [1, 2, 5, 3, 1, 7],
         [4, 7, 3, 4, 1, 4],
         [1, 7, 5, 3, 2, 1],
         [1, 4, 6, 2, 3, 4]],
        columns=["class"] + [f"feature_{i}" for i in range(1, 6)],
    )
    return data


def groupby_mean(df, column):
    """Return the mean of `column` for each class as a {class: mean} dict."""
    return df.groupby("class")[column].mean().to_dict()


# Unit test: compare the function's output with means worked out by hand.
data = load_data()
expected = {1: 4.5, 4: 7.0, 6: 3.0}
actual = groupby_mean(data, "feature_1")
assert actual == expected
```

# Coverage

* Code coverage measures how much of the code is exercised by the tests. Unit tests check individual units of the software; code coverage checks how much of the software those tests actually run. This helps reveal which parts of the software are not being tested, which can lead to writing and running more tests.
* Test coverage measures how many of the features are tested and how impactful the tests are. It can help identify costly tests that add little value.


There are several code coverage techniques, such as:
* Statement Coverage
* Decision Coverage
* Branch Coverage
* Function Coverage

These techniques can be exercised manually or using tools.

## Statement Coverage

Statement coverage focuses on the internal structure of the code. It is a testing technique that measures the share of statements in the source code that the tests execute. The main goal is to execute every statement in the source code at least once.

**How to calculate it?**


                Statement Coverage = Number of Executed Statements / Number of Possible Statements


With statement coverage we can detect **unused statements, unused branches, and missing statements**.
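
The ratio above can be measured by hand with Python's built-in `sys.settrace` hook, as in the rough sketch below. The function `classify`, the helper `executed_statements` and the hard-coded `STATEMENTS` set are assumptions made for this example; real projects would normally rely on a dedicated coverage tool.

```python
import sys


def classify(n):
    if n < 0:                 # statement at offset 1
        return "negative"     # statement at offset 2
    return "non-negative"     # statement at offset 3


# Line offsets (relative to the `def` line) of the three statements above.
STATEMENTS = {1, 2, 3}


def executed_statements(func, *args):
    """Run func(*args) and return the offsets of the statements that executed."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Record "line" events that belong to the function under test.
        if event == "line" and frame.f_code is code:
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return hit & STATEMENTS


one_test = executed_statements(classify, 5)
two_tests = one_test | executed_statements(classify, -1)

print(len(one_test) / len(STATEMENTS))   # 2 of 3 statements executed
print(len(two_tests) / len(STATEMENTS))  # 3 of 3 statements executed
```

With only the input `5`, the `return "negative"` statement never runs; adding the input `-1` brings statement coverage to 100%.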

## Decision Coverage

Decision coverage checks that every possible outcome of each decision point (for example, both the true and the false outcome of an `if`) is executed at least once, which helps confirm that all reachable code is executed.

**How to calculate it?**


                Decision Coverage = Number of Exercised Decision Outcomes / Number of Possible Decision Outcomes
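
A small hand-instrumented sketch of this calculation follows. The function `shipping_cost`, the helper `decide` and the labels are hypothetical; the helper only records which outcome each decision produced and passes the condition through unchanged.

```python
outcomes_seen = set()


def decide(label, condition):
    """Record the outcome of a decision point, then pass it through unchanged."""
    outcomes_seen.add((label, bool(condition)))
    return condition


def shipping_cost(weight, express):
    cost = 5
    if decide("heavy", weight > 10):
        cost += 10
    if decide("express", express):
        cost *= 2
    return cost


# Two decision points with two outcomes each.
ALL_OUTCOMES = {("heavy", True), ("heavy", False),
                ("express", True), ("express", False)}

# This single call executes every statement, yet only the True outcomes.
shipping_cost(20, True)
coverage_one_test = len(outcomes_seen) / len(ALL_OUTCOMES)

# A second call exercises the False outcomes as well.
shipping_cost(2, False)
coverage_two_tests = len(outcomes_seen) / len(ALL_OUTCOMES)

print(coverage_one_test, coverage_two_tests)  # 0.5 1.0
```

The first call alone would already give 100% statement coverage, which is why decision coverage is the stricter measure of the two.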

## Branch Coverage

The goal of branch coverage is to check if every possible branch is executed at least once.

**How to calculate it?**


                Branch Coverage = Number of Executed Branches / Number of Possible Branches
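
As a rough sketch, the branches of an `if`/`elif`/`else` chain can be marked by hand to see which ones a set of test inputs reaches. The function `grade`, its thresholds and the branch names are assumptions made up for this example.

```python
branches_hit = set()


def grade(score):
    # Each branch records its own name when it runs.
    if score >= 90:
        branches_hit.add("A")
        return "A"
    elif score >= 50:
        branches_hit.add("pass")
        return "pass"
    else:
        branches_hit.add("fail")
        return "fail"


ALL_BRANCHES = {"A", "pass", "fail"}

# These two test inputs never reach the else branch.
for score in (95, 60):
    grade(score)

branch_coverage = len(branches_hit) / len(ALL_BRANCHES)
missing = ALL_BRANCHES - branches_hit

print(branch_coverage)  # 2 of 3 branches executed
print(missing)          # {'fail'}
```

The `missing` set points directly at the test that still needs to be written, here an input below 50.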

## Function Coverage

This type of coverage checks if all possible functions are called at least once.

**How to calculate it?**


                Function Coverage = Number of Executed Functions / Number of Possible Functions
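
One way to sketch this calculation by hand is a decorator that registers every function and records which ones get called. The decorator `tracked` and the three small statistics functions are hypothetical names used only for illustration.

```python
import functools

defined = set()
called = set()


def tracked(func):
    """Register func as defined and record every call to it."""
    defined.add(func.__name__)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        called.add(func.__name__)
        return func(*args, **kwargs)

    return wrapper


@tracked
def mean(values):
    return sum(values) / len(values)


@tracked
def value_range(values):
    return max(values) - min(values)


@tracked
def midpoint(values):
    return (max(values) + min(values)) / 2


# The "test suite" below never calls midpoint.
assert mean([1, 2, 3]) == 2
assert value_range([1, 5, 3]) == 4

function_coverage = len(called) / len(defined)

print(function_coverage)         # 2 of 3 functions called
print(sorted(defined - called))  # ['midpoint']
```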