
# <span style="color:rgb(106,127,16)">Working with git</span>



**Git** is a distributed version control system (a program often used independently of python). A version control system tracks the history of changes in projects with many files, including data files and code, which many people access simultaneously. Git facilitates identifying the changes made, fetching revisions from a cloud repository in git format, and pushing revisions to the cloud.


[**GitHub**](https://github.com) is a cloud server that specializes in serving data in the form of `git` repositories. Many other such cloud services exist, such as Atlassian's [**BitBucket**](https://bitbucket.org).


Here we describe how to access the course materials from a remote git repository.  We will also use this opportunity to introduce some object-oriented terminology like classes, objects, constructor, data members, and methods.


## Course materials in GitHub

We shall use `git` for organizing materials in this course. All course content is located in this GitHub repository: 

https://github.com/jayggg/mth271content

It is updated as new materials are developed.

- All students should know how to fetch materials from this repository. Running this notebook (see below) will pull the updated data.


- If you want to know more about git, there are many resources online, such as 
the [Git Handbook](https://guides.github.com/introduction/git-handbook/). The most common way to fetch materials from a remote repository is using `git`'s command line tools, but for our purposes, the following python code will suffice. 


## Working with git within python 

We shall use the python package `gitpython` to work with `git`. We use the same approach that we briefly glimpsed in [01 Overview of tools applied to the COVID-19 case](01_Overview_of_tools_applied_to_COVID-19_example.ipynb) to pull the most up-to-date data on COVID-19 from another GitHub repository. The [documentation of `gitpython`](https://gitpython.readthedocs.io/en/stable/) contains a lot of information on how to use its facilities. The main facility is the **class** called `Repo`, which it uses to represent git repositories.

In [1]:
from git import Repo

Python is an object-oriented language. Everything in the workspace is an **object.** An object is an instance of a class. The definition and features of the class `Repo` were imported into this workspace by the above line of code. A class has **members**, which could be **data members** (some objects in its memory layout), or function members, called **methods**, which provide functionalities of the class.   
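To make this terminology concrete before turning to `Repo`, here is a small illustration using a built-in class rather than anything from `gitpython`. The choice of a complex number is only for illustration: it happens to have both data members and a method that are easy to inspect.

```python
z = 3 + 4j                  # z is an object (an instance) of the built-in class complex

print(type(z))              # the class of the object: <class 'complex'>
print(z.real, z.imag)       # data members, accessed with a dot: 3.0 4.0
print(z.conjugate())        # a method, called with a dot and parentheses: (3-4j)
```

Note that data members are accessed without parentheses, while methods are called with parentheses.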

You can query the functionalities of `Repo` using `help`. We will use the method `clone_from` below, so pay particular attention to its documentation in the output:

In [2]:
help(Repo)

Help on class Repo in module git.repo.base:

class Repo(builtins.object)
 |  Repo(path=None, odbt=<class 'git.db.GitCmdObjectDB'>, search_parent_directories=False, expand_vars=True)
 |  
 |  Represents a git repository and allows you to query references,
 |  gather commit information, generate diffs, create and clone repositories query
 |  the log.
 |  
 |  The following attributes are worth using:
 |  
 |  'working_dir' is the working directory of the git command, which is the working tree
 |  directory if available or the .git directory in case of bare repositories
 |  
 |  'working_tree_dir' is the working tree directory, but will raise AssertionError
 |  if we are a bare repository.
 |  
 |  'git_dir' is the .git repository directory, which is always set.
 |  
 |  Methods defined here:
 |  
 |  __del__(self)
 |  
 |  __enter__(self)
 |  
 |  __eq__(self, rhs)
 |      Return self==value.
 |  
 |  __exit__(self, exc_type, exc_value, traceback)
 |  
 |  __hash__(self)
 |      Return h

Classes have a special method called the **constructor**, which you would find listed in the output above as `__init__`. This method is called when you type in `Repo(...)` with the arguments allowed in `__init__`. Below, we will initialize a `Repo` object using our course repository.
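As a minimal sketch of how a constructor works, consider the toy class below. The class `Folder` and its method `describe` are hypothetical names made up for this illustration; they are not part of `gitpython`.

```python
class Folder:
    """A toy class (hypothetical, for illustration only)."""

    def __init__(self, path):
        # The constructor: it runs whenever Folder(...) is typed.
        self.path = path            # store the argument in a data member

    def describe(self):
        # A method: a function member that can use the object's data.
        return 'Folder at ' + self.path


f = Folder('/Users/Jay/tmpdir')     # invokes Folder.__init__
print(f.path)                       # /Users/Jay/tmpdir
print(f.describe())                 # Folder at /Users/Jay/tmpdir
```

Typing `Folder('/Users/Jay/tmpdir')` creates a new object and passes the string to `__init__`, in the same way that `Repo(...)` passes its arguments to the constructor of `Repo`.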

To begin, let us identify and specify a location on your computer where you want the course materials to reside. Please put your folder name into a string variable `coursefolder`:

In [3]:
coursefolder = '/Users/Jay/tmpdir/'

Python provides a module `os` to perform operating system dependent tasks in a portable (platform-independent) way. If you did not give the *full* name of the folder, `os` can attempt to produce it as follows:

In [4]:
import os
os.path.abspath(coursefolder)

'/Users/Jay/tmpdir'

Please double-check that  the output is  what you expected: if not,  please go back and revise `coursefolder` before proceeding.
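To see what `os.path.abspath` does when the folder name is not a full name, the following sketch applies it to a relative name. The name `'tmpdir'` is just an example; the printed full name depends on the folder from which python is running.

```python
import os

relative = 'tmpdir'                   # example of a name that is not a full path
full = os.path.abspath(relative)      # prepend the current working folder

print(full)                           # output depends on where python is running
print(os.path.isabs(relative))        # False
print(os.path.isabs(full))            # True
```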

## Fetch course materials

We proceed to clone the course materials from GitHub. By default, they will be stored in a subfolder of `coursefolder` called `mth271content`, which is the name of the git repository.

In [5]:
mth271 = os.path.join(os.path.abspath(coursefolder), 'mth271content')
mth271

'/Users/Jay/tmpdir/mth271content'

The value of the string variable `mth271` output above is where your copy of the course materials from GitHub will reside. You can check if this folder already exists using another facility from `os`: 

In [6]:
os.path.isdir(mth271)

True

The first time you run this file, this folder is new and therefore will not exist. When you run it after you have executed the following code at least once, you would already have cloned the repository, so the folder will exist. 
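The two situations just described can be mimicked without touching GitHub. The sketch below uses a temporary scratch folder from the standard module `tempfile`, and `os.mkdir` merely stands in for the cloning step; both are assumptions made for this illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    target = os.path.join(scratch, 'mth271content')
    before = os.path.isdir(target)    # folder has not been created yet
    os.mkdir(target)                  # stand-in for cloning the repository
    after = os.path.isdir(target)     # now the folder exists

print(before, after)                  # False True
```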

The code below uses the conditionals `if` and `else` (which you should have learnt before coming to class today) to check if the folder exists:
- If it does not exist, a new local copy of the GitHub repository is **cloned** into your local hard drive.
- If it exists, then only the **differences** (or updates) between your local copy and the remote repository are fetched, so that your local copy is up to date with the remote.
    

In [7]:
if os.path.isdir(mth271):       # if repo exists, pull newest data 
    repo = Repo(mth271) 
    repo.remotes.origin.pull()
else:                           # otherwise, clone from remote
    repo = Repo.clone_from('https://github.com/jayggg/mth271content', mth271)


- Here `repo` is an **object** of **class** `Repo`. 
- `Repo(mth271)` invokes the constructor, namely the `__init__` method.
- `Repo.clone_from(...)` calls the `clone_from(...)` method.

Now you have the updated course materials on your computer in a local folder. The object `repo` stores information about this folder, which you gave to the constructor in the string variable `mth271`, in a **data member** called `working_dir`. You can access any data member of an object in memory just like you access a method, using a dot `.` followed by the member name. Here is an example:

In [8]:
repo.working_dir

'/Users/Jay/tmpdir/mth271content'

Note how the `Repo` object was either initialized with `mth271` (if that folder exists) or set to clone a remote repository at a URL. 


A copy of the few relevant lines needed for updating course materials is included in the file `update_course.py` within the `pyfiles` folder. You should move this file outside of the repository and save it after changing the value of the string `coursefolder` to your specific local folder name. Then any time you need to update the course materials, you just need to run `update_course.py` using python on the command line.
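For orientation, here is one way such a script could be organized, assembled from the cells of this notebook. It is only a sketch: the actual contents of `update_course.py` in the repository may differ, and the function names `local_repo_path` and `update_course` are made up for this illustration.

```python
import os


def local_repo_path(coursefolder, name='mth271content'):
    """Return the full name of the folder holding the local copy."""
    return os.path.join(os.path.abspath(coursefolder), name)


def update_course(coursefolder,
                  url='https://github.com/jayggg/mth271content'):
    """Clone the course repository, or pull updates if it already exists."""
    # Imported here (an assumption of this sketch) so that the helper
    # above can be tried even where gitpython is not installed.
    from git import Repo

    mth271 = local_repo_path(coursefolder)
    if os.path.isdir(mth271):          # if repo exists, pull newest data
        repo = Repo(mth271)
        repo.remotes.origin.pull()
    else:                              # otherwise, clone from remote
        repo = Repo.clone_from(url, mth271)
    return repo
```

With these definitions, adding a final line such as `update_course('/Users/Jay/tmpdir/')` (with your own folder name in place of this one) would fetch or update the materials each time the file is run.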




<hr>




<span style="color:rgb(106,127,16); font-size:8pt">These materials were created by</span> [<span style="color:rgb(106,127,16); font-size:8pt">Jay Gopalakrishnan</span>](http://web.pdx.edu/~gjay/) <span style="color:rgb(106,127,16); font-size:8pt">for a sophomore course (MTH 271) offered during the Spring 2020 quarter at Portland State University, and are made available under the</span> [<span style="color:rgb(106,127,16) ; font-size:8pt">CC-BY-SA license</span>](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
 