# Git version control system

## Why do we care?
- Do you recall receiving or sharing files via email with names like **file_rev01.doc, file_rev02.doc, file_final.doc, file_final1.doc or file_final_final.doc**?
- Or making copies of projects/folders like **project_20aug2018**, **project_02sep2018**, etc. on your computer?

## So what? What are the issues here?

- Most of the content is the same across copies; the change (delta) between versions is quite small.
    + This wastes storage space.
- After a month or a year, it is hard to tell what differs between the files or folders.
    + What is the timeline of changes? What was done, and why?

Let's go back to adding suffixes (like \_01, \_02, ...). One thing we need to ask is: why did we do this?

+ Maybe \_01, \_02 form some kind of implicit timeline.
    - One issue is that we seldom keep metadata about the different versions of a file, so going back to a particular version is very hard.
+ Parallel changes are not advisable. Who will resolve conflicts (**merge conflicts**) in words, lines, etc.?
    - For example, you are working with someone and you have to take turns updating a document to produce the final version.
+ If we do agree on parallel changes to the documents, someone has to take care of merging.

During my student life, I worked with people in this fashion. It was time-consuming and unproductive.

__Clearly we need a system (version control or source control) to handle the above issues.__

A version control (or source control) system:
- Allows keeping track of the history of files
- Allows creating different versions of files

<center> <h1> Version control </h1> </center>

# Types of version control systems
- Centralized
- Distributed

**A centralized version control system has a _single point of failure_.**


<center> <h1> Distributed version control system </h1>
    <img src= "./figures/git_dist_arch.png" height= "500" width = "700">  </img>
</center>



Graphviz DOT code for the figure above:

```dot
digraph G {
  "remote repo" -> "local repo1";
  "remote repo" -> "local repo2";
  "remote repo" -> "local repo3";
}
```

In a distributed version control system:

- Each user has a copy of the repository.
    + Copying is called **cloning.**
    + The resulting copy is called a **clone.**
    
**_Git is a popular distributed version control system._**

# Some terminology


**Repository: _place where the versions of files are kept_**. A repository can be cloned using the git tool (command: __git clone repository url__)

For example, you can clone the current course repository with:

<font size="8"><b> git clone https://github.com/psnegi/data_science_tools1.git </b> </font>


After cloning, one has the complete repository with all of its history.



Let's try cloning the course repository.

__Do you remember how to run commands in a Jupyter notebook?__

In [1]:
# write git cloning command here


In [2]:
! git clone  https://github.com/psnegi/data_science_tools1.git

Cloning into 'data_science_tools1'...
remote: Enumerating objects: 163, done.
remote: Counting objects: 100% (163/163), done.
remote: Compressing objects: 100% (98/98), done.
remote: Total 163 (delta 62), reused 142 (delta 44), pack-reused 0
Receiving objects: 100% (163/163), 3.86 MiB | 5.22 MiB/s, done.
Resolving deltas: 100% (62/62), done.
Checking connectivity... done.


**Let's verify that the course folder exists in the local file system**

In [3]:
# write directory listing command here and see the output

## Working Tree
The collection of files in a local repository is called the working tree.

A file in the working tree can be in one of the following states:


<table style="width:100%" >
    
  <tr>
    <th> <font size = "6"> State </font> </th>
    <th> <font size = "6">Description </font></th> 

  </tr>
  <tr>
    <td> <font size = "6">untracked </font></td>
        <td> <font size = "6">file not tracked by git </font></td>
  </tr>
  <tr>
    <td><font size = "6">tracked </font></td>
    <td><font size = "6">committed and not staged </font></td>
  </tr>
  <tr>
    <td><font size = "6">staged </font></td>
    <td><font size = "6"> to be included in the next commit </font></td>
  </tr>
  <tr>
    <td><font size = "6">dirty/modified </font></td>
    <td><font size = "6"> file has changes that are not staged </font></td>
  </tr>
</table>



## But wait. How does one create a remote repository on GitHub?
Click the + icon at the top right of www.github.com.

<img src = "./figures/create_remote_rep01.png"> </img>

Give it a name (like **analysis_of_word_bank** in the following image) -> select private or public -> check "Initialize this repository with a README" (optional) -> add a license and .gitignore (optional) -> **then click on Create repository**
<img src= "./figures/create_remote_rep02.png" height = "800" width = "800"> </img>

Follow one of the options to connect a new or existing local repository.

<img src= "./figures/create_remote_rep03.png" height = "1000" width = "1000"> </img>

<center> <h1> Demo for creating remote repository on github.com </h1> </center>

**We now have a local and a remote repository.**

The next natural things to understand are:

- How to add a file?
- How to synchronize files between local and remote?

# Adding to a Git repository
Do the following two steps to make changes persist in your local repository.

**git add**: adds the selected changes to the staging area

**git commit -m "commit message"**: commits the staged changes into the Git repository

**git status**: checks the status of local files (useful before and after each step).

<img src = "./figures/git_add_repo.png"> </img>

Graphviz DOT code for the figure above:

```dot
digraph G {
   rankdir="LR";
  "working tree" -> "staging area";
  "staging area" -> "Git repository";
}
```
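To make the flow from working tree to staging area to repository concrete, here is a minimal toy model in Python. It is only a sketch for intuition: the class `ToyRepo` and its methods are hypothetical names invented for this illustration, and real Git stores its index and objects quite differently (for instance, Git's staging area is not emptied after a commit).

```python
# Toy model of Git's three areas. This is NOT how Git is implemented;
# ToyRepo and its method names are made up for illustration.
class ToyRepo:
    def __init__(self):
        self.working_tree = {}   # file name -> current content "on disk"
        self.staging_area = {}   # file name -> content staged for the next commit
        self.commits = []        # list of (message, snapshot dict)

    def write(self, name, content):
        """Create or modify a file in the working tree."""
        self.working_tree[name] = content

    def add(self, name):
        """Like `git add`: copy the file's current content into the staging area."""
        self.staging_area[name] = self.working_tree[name]

    def commit(self, message):
        """Like `git commit -m`: record the staged changes on top of the last snapshot."""
        last = self.commits[-1][1] if self.commits else {}
        snapshot = {**last, **self.staging_area}
        self.commits.append((message, snapshot))
        self.staging_area = {}   # simplification: start the next commit from empty

    def status(self, name):
        """Rough analogue of `git status` for a single file."""
        last = self.commits[-1][1] if self.commits else {}
        if name in self.staging_area:
            return "staged"
        if name not in last:
            return "untracked"
        return "tracked" if last[name] == self.working_tree[name] else "modified"


repo = ToyRepo()
repo.write("README.md", "hello")
print(repo.status("README.md"))   # untracked
repo.add("README.md")
print(repo.status("README.md"))   # staged
repo.commit("first commit")
print(repo.status("README.md"))   # tracked
repo.write("README.md", "hello again")
print(repo.status("README.md"))   # modified
```

The four printed states correspond to the rows of the file-state table in the Working Tree section.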

# File adding demo
- Create the files
    + echo "This repository contains world bank loan analysis" > README.md
    + echo "print(list([i**2 for i in range(10)]))" > square.py
- git status
- git add README.md square.py
- git commit -m "world bank loan lending analysis"
    + It generates a unique 40-character hash (a SHA-1 hash) that serves as the commit id. It is like timestamping the repository: we can go back to this state of the repository if we know the commit id.
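To get a feel for where these 40-character ids come from, the sketch below computes the SHA-1 id that Git assigns to a file's content (a "blob" object): the hash of a small header, `blob <size>\0`, followed by the bytes of the file. A commit id is produced by the same scheme, applied to the commit object (with a `commit` header) rather than to a single file. The helper name `git_blob_id` is an assumption made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id Git assigns to this file content (a blob object)."""
    # Git hashes a header "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Content of the README.md created in the demo (echo appends a newline).
blob_id = git_blob_id(b"This repository contains world bank loan analysis\n")
print(blob_id)
print(len(blob_id))  # 40 hexadecimal characters
```

Running `git hash-object README.md` on the same file should print the same id, assuming the file holds exactly those bytes.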



**Run git show to see the last commit id.**

<center> <h1> Are README.md and square.py in the remote repository? </h1></center>

<center> <h1>How to synchronize to the remote repository? </h1></center>

<center> <b>git push origin master</b> is used to synchronize the local repository to the remote repository.</center>

- origin refers to the repository you cloned from

- master refers to the local branch (which is master right now)

<center> <b> Check the remote repository on GitHub </b> </center>

# Why do we need the branch concept?
Why do we care about branches?
Note that the master branch generally contains production code.


- Branches allow you to work on different versions of your files.
- By switching between branches, one can work on different features.



__*You can also work on your clone of master. Finish the assigned features/work and, after testing, push the code to remote master (production).*__

__*Run git branch to see all the branches in your repository.*__

__*Actually, pushing to remote in a real production environment is a bit more involved:*__

- Run some unit and integration tests for the assigned feature.
- If they succeed, push to remote master.
- Run the whole integration suite in the deployment pipeline.
    + Some stages may mock external dependencies.
- Promote to production.
    + Manual or continuous.


To work on a new feature/issue, you create a branch. Let's say square.py was supposed to print the squares of the numbers up to 10. Clearly it prints only up to 9, so we need to fix this. Let's say this fix is assigned to us.

- We need to create a branch. Use **git checkout -b branch_name**. It will create the new branch and switch you to it. **Run git branch to confirm this**.
- Modify the code, then run **git add square.py** followed by **git commit -m "fixed range issue in square"**.
- Note that the branch is local. You may push the branch to the remote.
    + git push origin branch_name

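The code change itself might look like the sketch below. It assumes "up to 10" means 0 through 10 inclusive, so `range(10)` becomes `range(11)`; if the intent were 1 through 10, `range(1, 11)` would be the fix instead.

```python
# square.py before the fix: range(10) yields 0..9, so the last square is 9**2.
before = [i**2 for i in range(10)]

# One possible fix (assumption: 0..10 inclusive is wanted): range(11) yields 0..10.
after = [i**2 for i in range(11)]

print(before[-1])  # 81
print(after[-1])   # 100
```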
## How to get this fix into the local master branch (production) and push it to remote?

- Switch to master (**git checkout master**) and run **git merge branch_name**.
- **git pull origin master**: fetches and merges any changes that remote master has.
    + This may result in merge conflicts.
- **git push origin master**
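To see why a pull or merge can end in a conflict, here is a deliberately simplified sketch of a three-way merge decision for a single file. The function name `merge_file` is hypothetical, and the model compares whole file contents; real Git compares at the level of lines and changed regions, so it can often combine edits that this toy version would flag as a conflict.

```python
def merge_file(base, ours, theirs):
    """Toy three-way merge of one file. Returns (merged_content, conflict_flag).

    base   -- content at the common ancestor commit
    ours   -- content on the local branch
    theirs -- content on the branch being merged in
    """
    if ours == theirs:
        return ours, False      # both sides agree (or neither changed)
    if ours == base:
        return theirs, False    # only the other side changed: take their version
    if theirs == base:
        return ours, False      # only we changed: keep our version
    return None, True           # both changed differently: a human must resolve


print(merge_file("range(10)", "range(11)", "range(10)"))     # ('range(11)', False)
print(merge_file("range(10)", "range(11)", "range(1, 11)"))  # (None, True)
```

The second call mirrors the situation where you and a collaborator both fixed square.py in different ways before merging.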

# Resources
## Books and guides
- [Pro Git book](https://git-scm.com/book/en/v2) by Scott Chacon and Ben Straub
- [GitHub Guides](https://guides.github.com/)