# Tech skills
- where are we?
    - soft skills
    - **tech skills**
    - preprocessing
    - common data science use cases
    - projects
- Goal: present a few common tools and best practices for getting the work done.

- **git**
    - self service
    - cloud
    - collaboration

- code editors

- virtual environments (sharing a project's library dependencies within a team)

- python
    - scientific stack
    - basic code quality best practices

# Check
Do you know at least 80% of these?
- git commit, merge, pull, push, fetch, checkout, branch, HEAD
- working directory, stage, remote, origin, clone

Then see you next time.

# Tech skills part 1: Version Control

## Motivation
What do we want to solve here?

### One source of data

![image.png](pics/data_lol1.png)

### Robust distribution of code

![image.png](pics/data_lol2.png)

### Graceful handling of simultaneous changes to one file

![image.png](pics/data_lol3.png)

### Simultaneous changes, part 2

![image.png](pics/data_lol4.png)

### Graceful handling of conflicts between updates

![image.png](pics/data_lol5.png)

### A clear history of snapshots and one canonical version of the truth

![image.png](pics/data_lol6.png)

### Solution?!

![image.png](pics/data_lol7.png)

### Real Solution?

## Version control system

There are many version control systems (Git, SVN, Mercurial, even Google Docs), but Git is currently the most widely used, so that's the one we'll learn.

Version control gives us

- **local backups** of files, and makes it easy to test and **prototype** (branching out)

- backups saved **online**, accessible from multiple places

- **collaboration** on code

# Version control in Git part 1: Self service
How to properly back up and prototype with text files.

## How to install

- Mac OS
    - with `brew install git` (requires Homebrew: https://brew.sh/)
    - or by installing the Xcode Command Line Tools (typing `git --version` in the terminal offers to install them)
    - or from http://git-scm.com/download/mac
- Windows
    - by installing from https://gitforwindows.org
    - (or through a full install of https://cmder.net/)
    - (or through the Windows Subsystem for Linux)
- Linux
    - http://git-scm.com/book/en/Getting-Started-Installing-Git
    
(partially taken from https://rogerdudler.github.io/git-guide/)
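After installing, it is worth checking that git is on your PATH and telling it who you are: every commit records an author name and email, and git may refuse to commit until they are set. A minimal sketch; the name and email below are placeholders to replace with your own:

```sh
# Check that git is installed and reachable from the terminal
git --version

# One-time setup: the identity stored in every commit you make.
# "Your Name" and "you@example.com" are placeholders.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Print the stored values back to confirm
git config --global user.name
git config --global user.email
```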

## Basic Terminology

### Repository (aka "repo")
- represents a project that will be versioned
- is a branching history of snapshots

### Commit
- a snapshot of the state of the repo

### Branch
- an alternative version (line of history) of the repository

### HEAD
- the currently active (checked-out) snapshot
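The four terms can be seen in a throwaway repository. A minimal sketch, assuming git is installed; the temporary folder, the file `notes.txt`, the branch name `experiment` and the identity are made up for illustration:

```sh
# The repo: a fresh repository in a temporary folder
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity, only for this repo
git config user.email "demo@example.com"

# A commit: a snapshot of the files at this moment
echo "hello" > notes.txt
git add notes.txt
git commit -q -m "First snapshot"

# A branch: an alternative line of history, here called "experiment"
git branch experiment

# HEAD: the snapshot that is currently checked out
git rev-parse --abbrev-ref HEAD   # name of the current branch
git log --oneline --decorate      # "HEAD -> <branch>" marks the active commit
git branch                        # lists branches, * marks the current one
```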

## Example

![image.png](pics/intro-git-tree/master-commit1-editor.png)

![image.png](pics/intro-git-tree/master-commit2-editor.png)

![image.png](pics/intro-git-tree/master-commit3-editor.png)

![image.png](pics/intro-git-tree/master-commit4-editor.png)

![image.png](pics/intro-git-tree/master-commit5-editor.png)

![image.png](pics/intro-git-tree/master-commit6-editor.png)

![image.png](pics/intro-git-tree/master-commit7-editor.png)

![image.png](pics/intro-git-tree/master-commit8-editor.png)

![image.png](pics/intro-git-tree/master-commit9-editor.png)

![image.png](pics/intro-git-tree/master-commit10-editor.png)

### Merge

- joins one branch into another
- git tries to merge automatically

```sh
22:36 presentation_git_repo on master
➔ git merge address
Auto-merging letter.txt
Merge made by the 'recursive' strategy.
 letter.txt | 11 +++++++++++
 1 file changed, 11 insertions(+)
```
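The merge above can be reproduced in a throwaway repo. A minimal sketch: the branch `address` and the file `letter.txt` follow the example, but the letter's content is invented. The two branches change different parts of the file, so git combines them without help. Newer git versions (2.34 and later) report the 'ort' strategy instead of 'recursive'.

```sh
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# Common starting point for both branches
printf 'Dear Anna,\nthanks for the invitation.\nBest regards\n' > letter.txt
git add letter.txt
git commit -q -m "Write letter"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on setup

# Branch "address": put an address line at the top
git checkout -q -b address
{ echo "Main Street 1"; cat letter.txt; } > letter.new && mv letter.new letter.txt
git commit -q -am "Add address"

# Main branch: add a signature at the bottom
git checkout -q "$main_branch"
echo "Your friend" >> letter.txt
git commit -q -am "Sign letter"

# Join "address" into the current branch; --no-edit accepts the default message
git merge --no-edit address
cat letter.txt
```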

## Three stages of git life

"From idea to immutable snapshot"
- Three places where changes to files may live in git.

### Working directory

"Your playground"

- the current status of the files on your system
- modify files as much as you want; nothing is stored in git yet
- good for prototyping, experimentation

<img src="pics/git-stages/working-directory.jpg" alt="sandbox" width="400"/>

### Stage (aka Index)
"Putting things together"

- Created by "adding" (aka "staging") some changes.
- Contains selected changes, from selected files, ready to be committed.
- Allows for grouping related things together in commits

<img src="pics/git-stages/staging.jpg" alt="disassembled engine" width="400"/>

### Repository
"Bookkeeping"

- Changes from the stage are added to the repository by committing
- Permanent, immutable (sort of) store of records

<img src="pics/git-stages/repo.jpg" alt="ancient chronicles" width="400"/>

### Why bother with staging at all?
- Makes git fast (the index caches file information, so git can quickly detect what changed)
- Lets you preview the changes to be committed
- Lets you group related changes together
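To illustrate grouping: suppose one editing session touched two unrelated files. Staging lets you record them as two focused commits instead of one mixed one. A minimal sketch; the file names and commit messages are invented:

```sh
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# One editing session that touched two unrelated files
echo "import numpy as np" > analysis.py
echo "# My project" > README.md
git status --short                        # both files show up as untracked

# First group: only the code
git add analysis.py
git diff --staged --stat                  # preview exactly what will be committed
git commit -q -m "Add analysis script"

# Second group: only the documentation
git add README.md
git commit -q -m "Add README"

git log --oneline
```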

### Traversing between stages

![image.png](pics/git-stages.png)

(actually the arrows can also go the other way, with a little bit of git-fu)
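The arrows map to a handful of commands. A minimal sketch that moves one change forward into the stage and then back again; it uses `git reset` and `git checkout --`, which work in all common git versions (git 2.23+ also offers `git restore --staged` and `git restore` for the same two backward steps). The file name and content are invented:

```sh
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Working directory: edit the file; git notices but stores nothing
echo "second line" >> notes.txt
git diff                       # working directory vs. stage

# Working directory -> stage
git add notes.txt
git diff --staged              # stage vs. last commit

# Backwards, stage -> working directory: unstage, the edit stays in the file
git reset -q HEAD -- notes.txt

# Backwards, discard the edit and restore the last committed version
git checkout -- notes.txt
cat notes.txt                  # only the first line is left
```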

## Hands on: part 1

In the command line, go to the desired folder and type:

```sh
git init
```

This creates an empty git repository.

Type `git status`:
```sh
> git status
On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)
```

```sh
touch my_project.py
```

```sh
> git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	my_project.py

nothing added to commit but untracked files present (use "git add" to track)
```

Note that git helps when a command is incomplete:
```sh
git add
```

```sh
Nothing specified, nothing added.
Maybe you wanted to say 'git add .'?
```

```sh
git add my_project.py
```

(use `git add -p` to stage changes in an existing file piece by piece; git asks about each change separately)

```sh
> git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   my_project.py
```

```sh
git commit -m "Add file with my new cool project"
```

You can ask for help:
```sh
git commit --help
```

```sh
GIT-COMMIT(1)                       Git Manual                       GIT-COMMIT(1)



NAME
       git-commit - Record changes to the repository

SYNOPSIS
       git commit [-a | --interactive | --patch] [-s] [-v] [-u<mode>] [--amend]
                  [--dry-run] [(-c | -C | --fixup | --squash) <commit>]
                  [-F <file> | -m <msg>] [--reset-author] [--allow-empty]
...
```

`tldr` gives shorter, example-based summaries (https://tldr.ostera.io/git-commit, installation: https://tldr.sh/#installation):
```sh
tldr git commit
```


```sh
Commit files to the repository.
- Commit staged files to the repository with a message:
    git commit -m message

- Auto stage all modified files and commit with a message:
    git commit -a -m message

- Replace the last commit with currently staged changes:
    git commit --amend

- Commit only specific (already staged) files:
    git commit path/to/my/file1 path/to/my/file2
```

The status is clean again:
```sh
> git status
On branch master
nothing to commit, working tree clean
```

Let's update the file, add it to the stage and commit the changes. After editing `my_project.py`:

```sh
git add my_project.py
git commit -m "Update file"
```

Look at the history:
```sh
git log
```

```sh
commit 1fbb5c60d09d385d15205d8e1208c7634ef16e8f (HEAD -> master)
Author: Viktor
Date:   Tue Oct 1 23:34:49 2019 +0200

    Update file

commit 577f2d6206b45a59c1ddc33e2220ac4e1aef1648
Author: Viktor
Date:   Tue Oct 1 23:32:52 2019 +0200

    Add file with my new cool project
```

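Move the state of the project to the previous commit (take the hash from `git log`; a unique prefix such as `577f2d6` also works). Git then reports a "detached HEAD", because you are no longer on a branch:
```sh
git checkout 577f2d6206b45a59c1ddc33e2220ac4e1aef1648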
```

The previous commit can also be referenced relative to HEAD:
```sh
git checkout HEAD^
```

Move back to the last commit in the branch:
```sh
git checkout master
```
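The whole walkthrough, condensed into one script for an empty folder. A minimal sketch: the file content and identity are placeholders, and it uses `HEAD^` instead of a hash because hashes differ on every machine. Checking out a commit directly leaves you in "detached HEAD" state, which is why the last step returns to the branch.

```sh
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

touch my_project.py
git add my_project.py
git commit -q -m "Add file with my new cool project"
branch=$(git rev-parse --abbrev-ref HEAD)  # master or main, depending on your setup

echo 'print("hello")' >> my_project.py
git add my_project.py
git commit -q -m "Update file"

git log --oneline                # two commits, newest first

git checkout -q HEAD^            # previous snapshot: the file is empty again
wc -c < my_project.py

git checkout -q "$branch"        # back to the latest commit of the branch
cat my_project.py
```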

## Your turn
- Initialize a repo in the folder with your project.
- Add files to the stage and commit them in reasonable groups.
- Check out a previous commit and go back.

# Version control part 2: Cloud
How to talk to an online repository

## Terminology part 2
- **Remote** (a copy of the repository hosted elsewhere, usually online)
- **Remote providers** (Github, Gitlab)
- **Clone** (download a remote repository)
- **Fetch** (download new information from the remote without changing your files)
- **Pull** (fetch, then merge into the current branch)
- **Push** (upload your commits to the remote)
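All of these can be tried without a Github account, because git also accepts a plain folder as a remote. A minimal sketch, assuming a local "bare" repository (`git init --bare`) stands in for Github; the folder names `alice` and `bob`, the identities and the file are invented:

```sh
base=$(mktemp -d)
cd "$base"

# The "remote": a bare repository (history only, no working files)
git init -q --bare remote.git

# Clone: Alice downloads the (still empty) remote and commits to it
git clone -q remote.git alice
cd alice
git config user.name "Alice"                  # placeholder identity
git config user.email "alice@example.com"
echo "v1" > data.txt
git add data.txt
git commit -q -m "Add data"
branch=$(git rev-parse --abbrev-ref HEAD)

# Push: upload Alice's commits to the remote
git push -q origin "$branch"

# Bob clones the remote and receives Alice's work
cd "$base"
git clone -q remote.git bob
git -C bob config user.name "Bob"             # placeholder identity
git -C bob config user.email "bob@example.com"

# Alice makes another change and pushes it
cd "$base/alice"
echo "v2" > data.txt
git commit -q -am "Update data"
git push -q origin "$branch"

# Fetch: Bob learns about the new commit, but his files stay unchanged
cd "$base/bob"
git fetch -q
cat data.txt                                  # still the old version
git log --oneline HEAD.."origin/$branch"      # commits Bob does not have yet

# Pull: fetch and merge into Bob's current branch
git pull -q
cat data.txt                                  # now the new version
```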

### Traversing between stages v2

![image.png](pics/git-flowchart-with-remote.png)

## Adding your local repo to Github
https://help.github.com/en/articles/adding-an-existing-project-to-github-using-the-command-line

![image.png](pics/create-repo1.png)

![image.png](pics/create-repo2b.png)

![image.png](pics/create-repo3.png)

```sh
git remote add origin https://github.com/username/my_repo.git
# replace master with your branch name (e.g. main) if it differs
git push --set-upstream origin master
```
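To practice these two commands before creating a Github repo, a local bare repository can play Github's role. A minimal sketch; the local path stands in for `https://github.com/username/my_repo.git`, and the branch name is read from the repo instead of assuming `master`:

```sh
base=$(mktemp -d)
git init -q --bare "$base/my_repo.git"       # stand-in for the new, empty Github repo

# An existing local project with one commit
mkdir "$base/project"
cd "$base/project"
git init -q
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "print('hi')" > main.py
git add main.py
git commit -q -m "Initial commit"
branch=$(git rev-parse --abbrev-ref HEAD)

# Register the remote under the conventional name "origin"
git remote add origin "$base/my_repo.git"
git remote -v

# First push; --set-upstream lets later plain "git push" / "git pull" use origin
git push -q --set-upstream origin "$branch"
git status -sb                               # first line: <branch>...origin/<branch>
```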

## Downloading someone else's repo

![image.png](pics/clone-repo1.png)

```sh
# git clone creates the my_local_repo folder itself
git clone https://github.com/username/my_repo.git my_local_repo
```

### Github live demo 1

https://github.com/tensorflow/tensorflow
- README.md
- commits, with diffs
- branches
- git clone

## Your turn
- One person from the team creates a repo on their Github account.
- They add their local repo to it as a remote and push.
- The other team members clone this remote repo.

# Version control part 3: Collaboration
How to work on text files as a team.

## Terminology part 3
- Github:
    - Issues
    - Pull requests
- Merge conflicts

### Github Issues

- Feature request and bug tracker
- Discuss the issue, then open a pull request for it.

![image.png](pics/github-issues.png)

### Github projects

- Like Trello, but for Github issues

![image.png](pics/github-project.png)

### Pull request
- a process for merging feature branches into the master branch
- lets other team members review the changes first

https://github.com/tensorflow/tensorflow/pull/31132
(check descriptions, files changed, reviewers, comments, conflicts)
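On the command line, a pull request starts as a feature branch pushed to the remote; the request itself is then opened on Github's website. A minimal sketch with a local bare repo standing in for Github; the branch `feature/better-model` and the file are invented:

```sh
base=$(mktemp -d)
git init -q --bare "$base/remote.git"        # stand-in for the Github repo
git clone -q "$base/remote.git" "$base/work"
cd "$base/work"
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "v1" > model.py
git add model.py
git commit -q -m "Initial commit"
main_branch=$(git rev-parse --abbrev-ref HEAD)
git push -q origin "$main_branch"

# Work on the feature in its own branch instead of committing to master
git checkout -q -b feature/better-model
echo "v2" > model.py
git commit -q -am "Improve model"

# Publish the branch; a pull request would ask to merge it into master
git push -q -u origin feature/better-model

# What the pull request would contain: commits on the branch, not yet on master
git log --oneline "origin/$main_branch..feature/better-model"
```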

### Merge conflict
- Git cannot automatically combine the changes from two branches (typically when both changed the same lines)
- Conflicts need to be resolved manually

```sh
> git merge indent
Auto-merging letter.txt
CONFLICT (content): Merge conflict in letter.txt
Automatic merge failed; fix conflicts and then commit the result.
```

![image.png](pics/merge-conflict.png)
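A conflict like the one above can be reproduced and resolved in a throwaway repo. A minimal sketch: the branch `indent` and the file `letter.txt` follow the example, the content is invented. Both branches change the same line, so git stops and writes conflict markers into the file; resolving means editing the file to the intended version, then `git add` and `git commit`.

```sh
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "Dear Anna," > letter.txt
git add letter.txt
git commit -q -m "Start letter"
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Branch "indent" changes the line one way...
git checkout -q -b indent
echo "    Dear Anna," > letter.txt
git commit -q -am "Indent greeting"

# ...and the main branch changes the same line differently
git checkout -q "$main_branch"
echo "Dear Ms. Novak," > letter.txt
git commit -q -am "Formal greeting"

# The merge stops with a conflict and a non-zero exit code
git merge indent || echo "merge stopped: resolve the conflict"
cat letter.txt           # both versions between <<<<<<<, ======= and >>>>>>> markers

# Resolve by hand: write the wanted version, mark it resolved, finish the merge
echo "    Dear Ms. Novak," > letter.txt
git add letter.txt
git commit -q --no-edit
```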

## GUIs
- GitKraken
- Sourcetree
- Sublime Merge
- built into IDEs (e.g. VS Code, PyCharm)

# Demo

## Further reading

- https://agripongit.vincenttunru.com/: a very nice visual tutorial on git branching and merging
- [gitignore: how to ignore files in project we don't want to ever commit](https://medium.com/@haydar_ai/learning-how-to-git-ignoring-files-and-folders-using-gitignore-177556afdbe3)
- working with large files in git
  - [limits of files sizes in github](https://help.github.com/en/articles/working-with-large-files)
  - [git lfs: versioning large files properly](https://help.github.com/en/articles/versioning-large-files)
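For a data science project, a `.gitignore` typically keeps data files, secrets and generated files out of the repo. A minimal sketch; the patterns are examples to adjust to your project:

```sh
repo=$(mktemp -d)
cd "$repo"
git init -q

# Files matching these patterns are not offered by "git status" or "git add ."
cat > .gitignore <<'EOF'
# large or private data
data/
*.csv
# Python caches and virtual environments
__pycache__/
.venv/
# secrets
.env
EOF

mkdir data
echo "a,b" > data/raw.csv
echo "SECRET=123" > .env
echo "print('hi')" > main.py

git status --short --untracked-files=all   # lists only .gitignore and main.py
git check-ignore -v data/raw.csv .env       # shows which pattern ignores each file
```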