#GIT for Colab:  Google Drive -- COLAB -- GitHub Integration

**Important Tip:**  

Do not store your OAuth Token in any file that could be uploaded to GitHub.  GitHub will detect that you have done this and will revoke your OAuth Token immediately (you can create a new one, however).
</br></br>
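
One way to guard against this is to check, before pushing, that the token file does not sit inside the repo folder.  The following is only a minimal sketch; the function name `token_is_outside_repo` is my own invention and is not used elsewhere in this notebook.

```python
from pathlib import Path

def token_is_outside_repo(token_file, repo_dir):
    """Return True when token_file does not live anywhere inside repo_dir."""
    token_path = Path(token_file).resolve()
    repo_path = Path(repo_dir).resolve()
    # A file is "inside" the repo if the repo folder is one of its parent folders.
    return repo_path not in token_path.parents

# Paths follow the layout used later in this notebook
print(token_is_outside_repo(
    "/content/drive/My Drive/Colab Notebooks/GitHub_Token.txt",
    "/content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final/Kag"))
```

If this prints `False`, move the token file somewhere else on your Google Drive before you run `git add`.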

**Public Repositories on GitHub:**  

The code in this notebook is presently set up for connection with GitHub **public** repositories.  StackExchange can give you more information on how to connect securely with a **private** GitHub repository using **ssh**, for example.
</br></br>

**Miscellaneous Collection of Code Blocks Present In this IPynb:**  

You most likely will not want to run all code cells in this notebook at any given time.  Select the blocks of code relevant to your needs, and do not run the others.
</br></br>

**The following code blocks assist with:**

1. (Required on setup; only done once) GitHub 'OAuth Token'-enabled connection between Google Drive and GitHub

2. (Required) Mounting Google Drive so Colab has programmatic access to your Google Drive files

3. (Optional) Clone a GitHub remote repository to a fresh and clean Google Drive location

4. (Optional) git pull code to update your local Google Drive repo

5. (Optional) git add/commit/push code to send updated files to remote GitHub repo

6. (Optional) collection of some useful git commands to check status and/or debug


##REQUIRED (once only):  Get your GitHub OAuth Token

You will want a GitHub OAuth Token to provide you access to push any commits to the GitHub remote repo.  

If you do not already have one, go to [github.com/settings/tokens](https://github.com/settings/tokens/new) and create a new *personal access token* for "Google Drive Repo".  You may ignore the advanced options for the scope of the OAuth token and simply enable the first checkbox for "repo" full control of private repositories.
</br></br>
**Save your OAuth Token as a single line in a text file named "*GitHub_Token.txt*" that is stored in your default Colab directory**
</br></br>
Once you have received and saved your token as directed, you should not have to do it again for this repo, and can enjoy many pushes and pulls for the remainder of time.

**Tips and Notes:**

At the link above, GitHub will provide you with an OAuth Token that authenticates your use of the GitHub API, so that you can push commits to the GitHub remote repo. (A token is not required if you just want to download files from a public repo.)

The token will resemble a 40-character combination of numbers and letters, and will be unique to you (i.e., not unique to the repo).  **DO NOT share this token with others**, even if you are in a "committed relationship."  These "relationships" have a way of failing quite often, and then your "partner" can trash years of your valuable labor.  ;)

Your **default Colab directory** is created when you first set up Colab.  If you use Google's standard settings, this directory will be titled "Colab Notebooks" and it will reside at the top level of your Google Drive.  Once you mount your Google Drive in Colab, Colab will recognize this as a directory inside "*/content/drive/My Drive*" 

##REQUIRED:  Enter your personal info and run the setup code in this section

Adjust the values in the code below to match your own environment.

If Colab disconnects this notebook's runtime, then when you reconnect, you may or may not have to run each of the code blocks in this section below.  (It doesn't hurt to run them all, but you may not have to.)
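
A quick way to tell whether the setup needs to be run again is to test whether anything is mounted at the Drive mountpoint.  This is a rough sketch that assumes Colab exposes Google Drive as a real mount at "/content/drive"; the helper name `drive_is_mounted` is hypothetical.

```python
import os

def drive_is_mounted(mountpoint="/content/drive"):
    """Return True if a filesystem is currently mounted at mountpoint."""
    return os.path.ismount(mountpoint)

if not drive_is_mounted():
    print("Drive is not mounted; rerun the setup cells in this section.")
```

Note that a fresh runtime also forgets all Python variables, so the first setup cell has to be rerun in that case even if the check above is never reached.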

In [0]:
##################################################################
# Required Code Cell - Must Run This To Enable Any Future Actions
#    (and adjust text for variables in your own environment)
##################################################################

OAUTH_TOKEN_FILENAME = 'GitHub_Token.txt'
COLAB_GDRIVE_MOUNTPOINT = '/content/drive'  # leave this unchanged unless you know something
COLAB_DEFAULT_DIR = 'My Drive/Colab Notebooks'  # leave this unchanged unless you explicitly created a different default Colab directory
GDRIVE_PATH_TO_LOCAL_REPO = 'NRUHSE_2_Kaggle_Coursera/final'  # this is the directory (relative to Colab Default) in which you will have cloned the remote GitHub repo
GIT_REPO_MASTER = 'Kag'  # Name of the repository on GitHub (also the name of the folder that git clone creates)
GIT_USERNAME = 'migai'
GIT_USER_EMAIL = "gaidis@alum.mit.edu"

from pathlib import Path
import os

GDRIVE_HOME = Path(COLAB_GDRIVE_MOUNTPOINT)                 # "/content/drive"
COLAB_HOME = GDRIVE_HOME / COLAB_DEFAULT_DIR                # "/content/drive/My Drive/Colab Notebooks"
TOKEN_FILE = COLAB_HOME / OAUTH_TOKEN_FILENAME              # "/content/drive/My Drive/Colab Notebooks/GitHub_Token.txt"
GDRIVE_CLONE_PATH = COLAB_HOME / GDRIVE_PATH_TO_LOCAL_REPO  # "/content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final"
GDRIVE_REPO_PATH = GDRIVE_CLONE_PATH / GIT_REPO_MASTER      # "/content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final/Kag"

In [2]:
##################################################################
# Required Code Cell - Must Run This To Enable Any Future Actions
##################################################################

# This code will mount your personal Google Drive in Colab at "/content/drive"
#   You will be presented an input textbox for your authorization code to give Colab access to your Google Drive
#     To obtain this code, click the lengthy "accounts.google.com" link above the input textbox, and allow use of your Google account
#     Then copy the resulting lengthy passcode provided by Google on the new browser tab, and paste it into
#     the input textbox, and then enter it.  Colab crunches for a few seconds, and then should return a message that your drive is mounted.
from google.colab import drive
drive.mount(COLAB_GDRIVE_MOUNTPOINT)

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
##################################################################
# Required Code Cell - Must Run This To Enable Any Future Actions
##################################################################

# Rather than explicitly entering your GitHub OAuth token, placing it in a file
#   outside your repo (but still on your Google Drive) allows you to put this IPynb
#   in your repo.  GitHub won't deauthorize your token.
# We do, however, have to mount the Google Drive in Colab before we can 
#   have Colab read in the OAuth token (thus, drive.mount() in previous cell)

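GIT_TOKEN = Path(TOKEN_FILE).read_text().strip()  # strip() drops the trailing newline that most editors add
# Build the URL as a plain string:  pathlib would collapse "https://" into "https:/"
GITHUB_REPO_PATH = f"https://{GIT_TOKEN}@github.com/{GIT_USERNAME}/{GIT_REPO_MASTER}.git"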
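
The remote URL carries the token as the "user" part of an HTTPS address, so it is worth building it as a plain string and masking it before printing.  The sketch below uses a made-up placeholder token; the helper names `build_remote_url` and `mask_token` are hypothetical.

```python
from pathlib import PurePosixPath

def build_remote_url(token, username, repo_name):
    """Assemble an HTTPS remote URL with the token embedded as credentials."""
    return f"https://{token.strip()}@github.com/{username}/{repo_name}.git"

def mask_token(url, token):
    """Replace the token with asterisks so the URL is safe to print."""
    return url.replace(token.strip(), "****")

fake_token = "0123456789abcdef\n"   # placeholder with a trailing newline, NOT a real token
url = build_remote_url(fake_token, "migai", "Kag")
print(mask_token(url, fake_token))  # https://****@github.com/migai/Kag.git

# For comparison: pathlib treats the text as a file path and drops one slash
print(PurePosixPath("https://github.com") / "migai")  # https:/github.com/migai
```

The last line shows why a `Path` object is a poor fit for URLs:  the doubled slash after "https:" does not survive.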

##Optional:  Pull Repo from GitHub to Google Drive
**Use the following code cell to update your local Google Drive files**

In [4]:
##################################################################
# OPTIONAL Code Cell - Only run this if you already have a cloned repo on Google Drive and you wish to pull from remote GitHub repo
##################################################################

os.chdir(GDRIVE_REPO_PATH)
!git pull origin master

From https://github.com/migai/Kag
 * branch            master     -> FETCH_HEAD
Already up to date.
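
If you would rather not change the notebook's working directory with os.chdir(), git's `-C <path>` option runs a command as if it had been started in that folder.  Below is a small sketch using subprocess; the helper names `git_command` and `run_git` are my own and are not part of any library.

```python
import subprocess

def git_command(repo_dir, *args):
    """Build a git command that runs inside repo_dir without changing directory."""
    return ["git", "-C", str(repo_dir), *args]

def run_git(repo_dir, *args):
    """Run the command and return its combined stdout/stderr as text."""
    result = subprocess.run(git_command(repo_dir, *args),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            text=True)
    return result.stdout

# Only builds and shows the command; call run_git(...) to actually execute it
print(git_command(
    "/content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final/Kag",
    "pull", "origin", "master"))
```

Passing the arguments as a list also sidesteps quoting problems with the space in "My Drive".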


##Optional:  Push Google Drive local repo to GitHub

In [5]:
##################################################################
# OPTIONAL Code Cell - Only run this if you already have a cloned repo on Google Drive and you wish to push files to remote GitHub repo
#   ** Change the "push_message" text as necessary
##################################################################


push_message = "v2.5 IPynb investigates shop-item pairs"



################################################
os.chdir(GDRIVE_REPO_PATH)
!git config user.email "{GIT_USER_EMAIL}"
!git config user.name "{GIT_USERNAME}"

# make sure we are in the correct location on GitHub
#!git remote remove origin   
#!git remote add origin "{GITHUB_REPO_PATH}"

!git add .
!git commit -m "{push_message}"
!git push origin master

[master 2ef12e9] New IPynb for Git Coordination in Colab
 3 files changed, 1 insertion(+)
 create mode 100644 helper_code/Git_enabled_Colab_with_GoogleDrive_and_GitHub.ipynb
 rename helper_code/{Enable_Colab_git_GitHub-GDrive.ipynb => OLD_Enable_Colab_git_GitHub-GDrive.ipynb} (100%)
 rename helper_code/{template_for_Clone_GitHub_to_GDrive.ipynb => OLD_OLD_template_for_Clone_GitHub_to_GDrive.ipynb} (100%)
Counting objects: 4, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 6.50 KiB | 1.63 MiB/s, done.
Total 4 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/migai/Kag.git
   f3ca897..2ef12e9  master -> master


##Optional:  Clone Repo from GitHub to Google Drive
***Beware:***

**Use the following code cell ONLY if you are starting a new local Google Drive repo**

(if you already have a local Google Drive repo, use the pull section above instead)

In [0]:
##################################################################
# OPTIONAL Code Cell - Only run this if you are starting with a new repo cloning
##################################################################

# Create empty folder to hold the cloned repo (if not done already), and then navigate to it
GDRIVE_CLONE_PATH.mkdir(parents=True, exist_ok=True)  # parents=True also creates missing parent folders; exist_ok=True ignores the error if the directory already exists
os.chdir(GDRIVE_CLONE_PATH)

# clone it
!git clone "{GITHUB_REPO_PATH}"

##Status Check

In [0]:
os.chdir(GDRIVE_REPO_PATH)
!git status

/content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final/Kag
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   ipynb_versions/MG_EDA_v2.4.ipynb

no changes added to commit (use "git add" and/or "git commit -a")
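
For scripted checks, `git status --porcelain` prints one short line per changed file:  two status characters, a space, then the path.  The sketch below parses that form from a sample string; the second sample line is invented for illustration, and renamed files (which use "old -> new") are not handled.

```python
def parse_porcelain(text):
    """Split `git status --porcelain` output into (status, path) pairs."""
    entries = []
    for line in text.splitlines():
        if len(line) < 4:
            continue                      # skip blank or malformed lines
        entries.append((line[:2], line[3:]))
    return entries

# Illustrative sample only: " M" = modified but not staged, "??" = untracked
sample = " M ipynb_versions/MG_EDA_v2.4.ipynb\n?? data_output/new_file.csv\n"
for status, path in parse_porcelain(sample):
    print(repr(status), path)
```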


##Tips for Fixing Errors

In [0]:
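# Abandon a merge that stopped halfway, then try the pull again
#   (if the pull stops with "unable to start editor", try:  git pull --no-edit origin master)
!git merge --abort
!git pull origin master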

From https://github.com/migai/Kag
 * branch            master     -> FETCH_HEAD
hint: Waiting for your editor to close the file... error: unable to start editor 'editor'
Not committing merge; use 'git commit' to complete the merge.


In [0]:
!git blame data_output/shops2.csv

In [0]:
Path.cwd()

PosixPath('/content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final/Kag')

#Alternative Process
See, for example, Oleg Zero's post at [towardsdatascience.com](https://towardsdatascience.com/colaboratory-drive-github-the-workflow-made-simpler-bde89fba8a39) from October 2019 and his associated [GitHub repositories](https://github.com/OlegZero13).

Also, a slightly more detailed, updated version (December 2019) of Oleg's post is available [here](https://towardsdatascience.com/google-drive-google-colab-github-dont-just-read-do-it-5554d5824228).

##Modifications to the above workflow:
Oleg creates a temporary directory on Google Drive in which to clone the GitHub repo, then copies the cloned files into the intended directory, and removes the temporary files.  I'm not exactly sure why he feels "recloning" is necessary, as opposed to just re-adding the origin and performing a pull operation.  (Oleg mentions "A nice thing about this solution is that it won’t crash if executed multiple times. Whenever executed, it will only update what is new and that’s it.")
Anyhow, this code is shown below, in case it becomes useful in the future.

In [0]:
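# GIT_PATH and PROJECT_PATH are the variable names from Oleg's post; they are not defined in this notebook
!mkdir ./temp
# clone INTO ./temp, so that the mv on the next line has something to move
!git clone "{GIT_PATH}" ./temp
# note:  the * pattern skips hidden entries such as .git
!mv ./temp/* "{PROJECT_PATH}"
!rm -rf ./temp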
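
The same "move everything out of a temporary folder" step can be done in plain Python, which also picks up hidden entries.  This is a sketch under my own assumptions, not Oleg's code:  the name `move_contents` is hypothetical, and the destination is assumed not to contain entries with the same names.

```python
import shutil
import tempfile
from pathlib import Path

def move_contents(src_dir, dst_dir):
    """Move every entry of src_dir (hidden ones included) into dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(src.iterdir()):
        shutil.move(str(entry), str(dst / entry.name))
        moved.append(entry.name)
    return moved

# Demonstration with throwaway folders standing in for ./temp and the project folder
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp()) / "project"
(src / ".git").mkdir()
(src / ".git" / "config").write_text("[core]\n")
(src / "README.md").write_text("demo\n")

moved = move_contents(src, dst)
print(moved)
```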

##Importing (syncing) files Google Drive --> Colab VM, and copying files from Colab VM to Google Drive
Oleg also provides an example of how one could load files from the local Google Drive repo into Colab using the !rsync command. It collects everything that belongs to the Drive directory and copies it into our local runtime.
Also, with rsync we have the option to exclude some of the content, which may be unnecessary or take too long to copy (the example below excludes the import of the directory "data" into the Colab VM)

*To Do:  I need to read more about the rsync command*

In [0]:
!rsync -aP --exclude=data/ "{PROJECT_PATH}"/*  ./

In [0]:
# Copying files from the Colab VM to the Google Drive long-term storage can be done with the !cp command:
!cp -r ./* "{PROJECT_PATH}"
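
For reference, in the rsync call above `-a` is "archive" mode (recursive, preserving timestamps and permissions) and `-P` shows progress and keeps partially transferred files.  A rough Python counterpart of "copy a tree but leave out one folder" is sketched below with shutil.copytree; the name `sync_tree` is hypothetical, `dirs_exist_ok` needs Python 3.8 or newer, and unlike rsync this copies every file each time rather than only the changed ones.

```python
import shutil
import tempfile
from pathlib import Path

def sync_tree(src_dir, dst_dir, exclude=("data",)):
    """Copy src_dir into dst_dir, skipping any entry whose name matches exclude."""
    return shutil.copytree(src_dir, dst_dir,
                           ignore=shutil.ignore_patterns(*exclude),
                           dirs_exist_ok=True)   # allow copying into an existing folder

# Demonstration with throwaway folders
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp())
(src / "data").mkdir()
(src / "data" / "big.csv").write_text("x\n")
(src / "notebooks").mkdir()
(src / "notebooks" / "eda.ipynb").write_text("{}\n")

sync_tree(src, dst)
print(sorted(p.name for p in dst.iterdir()))
```

Be aware that `ignore_patterns("data")` skips anything named "data" at any depth, files as well as folders.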

##One last thing:  quick writing and reading of .py with Colab
Here is one link to some tips:  https://colab.research.google.com/notebooks/io.ipynb

Oleg's post describes using magic commands within Colab to write code from within the Colab notebook, and how one might go about reloading the code after modification such that Colab recognizes the changes (as opposed to using !shred command):

![Code for %%writefile and %reload_ext](https://miro.medium.com/max/1334/0*IlOTzOp9dYEMiTp6.png)
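
The write-then-reload idea can also be tried in plain Python, without the magic commands.  The sketch below is my own stand-in, not the code from Oleg's image:  it writes a tiny module with `Path.write_text`, imports it, rewrites it, and uses `importlib.reload` so the change is picked up.  The module name `demo_helper` is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True            # keep this demo free of cached .pyc files

module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / "demo_helper.py"   # hypothetical module name

# First version of the module
module_file.write_text("def answer():\n    return 1\n")
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import demo_helper
first = demo_helper.answer()

# Edit the file, then reload so Python sees the new code
module_file.write_text("def answer():\n    return 22\n")
importlib.invalidate_caches()
importlib.reload(demo_helper)
second = demo_helper.answer()

print(first, second)
```

Without the reload call, a second `import demo_helper` would silently keep using the version already in memory.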

