##**git for Colab:**
###Google Drive -- Colab VM -- GitHub Integration
---
---
###**Caution: Do not blindly run all cells in this notebook.  </br> Each different section should be run only if it corresponds to your desired actions.**
---
---
When **setting up a new local repo**, proceed in this order:
*  Section 1: OAuth Token
*  Section 2: Personal Parameters
*  Section 3: Mount Google Drive
*  Section 6: Clone Repo
*  Then, use sections 4 and 5 (pull and push) as you wish
</br></br>
---
---
When operating with an **existing local repo**, proceed in this order:
*  Section 2: Personal Parameters
*  Section 3: Mount Google Drive
*  Then, use sections 4 and 5 (pull and push) as you wish
</br></br>
---
---


####**Public Repositories on GitHub:**  

The code in this notebook is presently set up for connection with GitHub **public** repositories.  StackExchange can give you more information on how to connect securely with a **private** GitHub repository using **ssh**, for example.

####**The following code blocks assist with:**

1. (Required on setup; only done once) Obtaining your **GitHub 'OAuth Token'** to enable ***push*** commands from your Google Drive to the public repo on GitHub

2. (Required) Loading your **personal parameters** regarding file locations, GitHub credentials, etc.  

3. (Required) **Mounting Google Drive** so Colab has programmatic access to your Google Drive files

4. (Optional) **git pull** code to update your local Google Drive repo

5. (Optional) **git add/commit/push** code to send updated files to remote GitHub repo

6. (Required on setup; only done once) **Clone** a GitHub remote repository to a desired Google Drive location.  *Do this before doing any pushes or pulls, but only do the cloning once for any given repo.*

7. (Optional) collection of some useful git commands to check status and/or debug

8. Random collection of other stuff found on the web that I didn't want to forget.  Probably not useful to anyone else, but enjoy if you'd like.

##1. **Obtain GitHub OAuth Token**
Do this **once only**.  The first time you set up your local repo on Google Drive, you must follow the instructions in this section.

You do *not* need to repeat this procedure again, as long as your token remains valid.


---
---



The GitHub OAuth Token provides you access to ***push*** any commits to the GitHub remote repo.  (You must also have write access to the GitHub repo, e.g., as a collaborator, to **push** commits, irrespective of Colab / Google Drive.)

(You do *not* need a GitHub OAuth token or write access to ***pull*** or ***clone*** from the public repo...  only to ***push***.)

---
---
</br>

If you do not already have one, go to [github.com/settings/tokens](https://github.com/settings/tokens/new) and create a new *personal access token*, giving it a descriptive name such as "Google Drive Repo".  You may ignore the finer-grained scope options for the token and simply enable the first checkbox, "repo" (full control of private repositories).
</br></br>
**Save your OAuth Token as a single line in a text file named "*GitHub_Token.txt*" that is stored in your default Colab directory (i.e., NOT inside the local repo!).**  If GitHub detects your token in any file you upload to the remote GitHub repo, GitHub will revoke the token and you will have to create a new one.
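A minimal sketch of reading that one-line file defensively, assuming it contains nothing but the token. `read_token` is a hypothetical helper and is not part of Colab or GitHub. Stripping whitespace matters because most editors add a trailing newline, and that newline would otherwise end up inside the authorized URL.

```python
from pathlib import Path


def read_token(token_file: Path) -> str:
    """Return the token stored in a one-line text file (hypothetical helper).

    Surrounding whitespace, including the trailing newline most editors add,
    is stripped so it cannot leak into the git URL.
    """
    token = Path(token_file).read_text().strip()
    if not token:
        raise ValueError(f"{token_file} is empty")
    if any(ch.isspace() for ch in token):
        raise ValueError(f"{token_file} should hold a single token on one line")
    return token
```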
</br></br>
Once you have received and saved your token as directed, you should not have to do it again for this repo, and can enjoy many pushes and pulls for the remainder of time.

---
---

**Tips and Notes:**

The token provided by GitHub will resemble a 40-character combination of numbers and letters, and is tied to your GitHub account (i.e., it is not specific to this repo).  **DO NOT share this token with others**, even if you are in a "committed relationship."  These "relationships" have a way of failing quite often, and then your "partner" can trash years of your valuable labor.  ;)

##2. **Personal Parameters**
You must run this code cell each time you start a new runtime in Colab for this IPynb.
</br></br>

---
---

**Enter *your* personal info by replacing my info in the relevant variables and run the setup code in this section**

---
---

Tip: Your **default Colab directory** is created when you first set up Colab.  If you use Google's standard settings, this directory will be titled "Colab Notebooks" and it will reside at the top level of your Google Drive.  Once you mount your Google Drive in Colab, Colab will recognize this as a directory inside "*/content/drive/My Drive*" at "*/content/drive/My Drive/Colab Notebooks*".  The code below assumes you will store your OAuth Token in this particular directory, and that your local repo will be located in a lower-level directory.
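The code cells that follow build the paths described in this tip with pathlib's `/` operator. The sketch below reproduces that composition with `PurePosixPath`, so it runs even without Google Drive mounted. The values are the example settings from the next cell and should be replaced with your own. `REPO_NAME` stands in for `GIT_REPO_MASTER`.

```python
from pathlib import PurePosixPath

# Example values (assumptions): mirror the settings cell that follows.
COLAB_GDRIVE_MOUNTPOINT = "/content/drive"
COLAB_DEFAULT_DIR = "My Drive/Colab Notebooks"
GDRIVE_PATH_TO_LOCAL_REPO = "NRUHSE_2_Kaggle_Coursera/final"
REPO_NAME = "Kag"

# The "/" operator joins path segments; pure paths never touch the filesystem.
colab_home = PurePosixPath(COLAB_GDRIVE_MOUNTPOINT) / COLAB_DEFAULT_DIR
token_file = colab_home / "GitHub_Token.txt"              # lives outside the repo
repo_path = colab_home / GDRIVE_PATH_TO_LOCAL_REPO / REPO_NAME

print(token_file)  # /content/drive/My Drive/Colab Notebooks/GitHub_Token.txt
print(repo_path)   # /content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final/Kag

# Guard: the token file must not sit inside the repo, or it could be pushed.
assert repo_path not in token_file.parents
```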

In [0]:
##################################################################
# Required Code Cell - Must Run This To Enable Any Future Actions
'''
ADJUST THE VARIABLE ASSIGNMENTS IN THIS CELL TO FIT YOUR PARTICULAR SITUATION
'''
# Examples are provided in the comments for one particular repo and Google Drive path that was used
##################################################################


OAUTH_TOKEN_FILENAME = 'GitHub_Token.txt'                     # this is a one-line text file containing only your GitHub OAuth token (see github.com/settings/tokens) -- do not place this inside your repo!!!
COLAB_GDRIVE_MOUNTPOINT = '/content/drive'                    # leave this unchanged unless you have a specific reason to mount Google Drive elsewhere
COLAB_DEFAULT_DIR = 'My Drive/Colab Notebooks'                # leave this unchanged unless you explicitly created a different default Colab directory (newer Colab versions may mount it as 'MyDrive', without the space)
GDRIVE_PATH_TO_LOCAL_REPO = 'NRUHSE_2_Kaggle_Coursera/final'  # this is the directory (relative to Colab Default) in which you will have cloned the remote GitHub repo
GIT_REPO_MASTER = 'Kag'                                       # Name of the repo on GitHub (NOT a branch name; the branch used below is 'master')
GIT_REPO_PATH_PARENT = 'migai'                                # GitHub account that owns the repo, typically its originator (URL for 'Kag' repo == github.com/migai/Kag )

GIT_USERNAME = 'migai'
GIT_USER_EMAIL = "gaidis@alum.mit.edu"


##3. **Mount the Google Drive in Colab; Create GitHub OAuth URL**
You must run these code cells each time you start a new runtime in Colab for this IPynb.
</br></br>

---
---

**This will mount your Google Drive in the Colab VM, </br>and will create an authorization URL that allows you to push files to the remote GitHub repo**



In [2]:
##################################################################
# Create Paths based on personal information input above, then mount the Google Drive
##################################################################
from urllib.parse import urlunparse
from pathlib import Path
import os

GDRIVE_HOME = Path(COLAB_GDRIVE_MOUNTPOINT)                   # "/content/drive"
COLAB_HOME = GDRIVE_HOME / COLAB_DEFAULT_DIR                  # "/content/drive/My Drive/Colab Notebooks"
TOKEN_FILE = COLAB_HOME / OAUTH_TOKEN_FILENAME                # "/content/drive/My Drive/Colab Notebooks/GitHub_Token.txt"
GDRIVE_CLONE_PATH = COLAB_HOME / GDRIVE_PATH_TO_LOCAL_REPO    # "/content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final"
GDRIVE_REPO_PATH = GDRIVE_CLONE_PATH / GIT_REPO_MASTER        # "/content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final/Kag"

# Printouts of the various paths to ensure your sanity and help with debugging
print(f"GDRIVE_HOME = {GDRIVE_HOME}")
print(f"COLAB_HOME = {COLAB_HOME}")
print(f"TOKEN_FILE = {TOKEN_FILE}")
print(f"GDRIVE_CLONE_PATH = {GDRIVE_CLONE_PATH}")
print(f"GDRIVE_REPO_PATH = {GDRIVE_REPO_PATH}\n\n")

# The following code will mount your personal Google Drive in the Colab VM at "/content/drive"
#   You will be presented an input textbox for your authorization code to give Colab access to your Google Drive
#     To obtain this code, click the lengthy "accounts.google.com" link above the input textbox that appears below
#     Then, in the new browser tab that appears, allow use of your Google account containing the Google Drive
#     This new browser tab should then present you with a lengthy passcode provided by Google.  COPY THE PASSCODE
#     Then, you can close the passcode tab and return to this browser tab to paste and enter the passcode.
#     Colab crunches for a few seconds, and then should return a message that your drive is mounted.
#   (Newer Colab versions may show a pop-up permission dialog instead of asking for a passcode.)

# If Colab disconnects this notebook's runtime, then when you reconnect, you may or may not be prompted by Google
#     for a passcode to mount the drive again.  I'm not sure what the requirements are, but the less time your
#     runtime is disconnected, the more likely your drive will stay mounted (and save you the pain of going
#     through the authorization process again).

from google.colab import drive
drive.mount(COLAB_GDRIVE_MOUNTPOINT)

GDRIVE_HOME = /content/drive
COLAB_HOME = /content/drive/My Drive/Colab Notebooks
TOKEN_FILE = /content/drive/My Drive/Colab Notebooks/GitHub_Token.txt
GDRIVE_CLONE_PATH = /content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final
GDRIVE_REPO_PATH = /content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final/Kag


Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
##################################################################
# Required Code Cell - Must Run This To Enable Any Future Actions
##################################################################

# Rather than explicitly entering your GitHub OAuth token, placing it in a file
#   outside your repo (but still on your Google Drive) allows you to put this IPynb
#   inside your repo if you wish.  GitHub won't deauthorize your token, as this IPynb does not reveal it to any random GitHub browsers.
#   (You do not NEED to put this IPynb in your repo, but this way of reading your OAuth token gives you that option.)
# Note the order of code cells here:
# To have Colab read your token and thus create an authorized access path to GitHub, 
#   we had to mount the Google Drive first (as in the previous code cell above)

GIT_TOKEN = TOKEN_FILE.read_text().strip()   # strip() drops the trailing newline most editors add, which would otherwise break the URL

'''
# If you are having trouble getting a properly formatted URL for GITHUB_REPO_PATH, you can try one of the following:
# Simple string concatenation method of creating authorized URL link to GitHub repo:
GITHUB_REPO_PATH = "https://" + GIT_TOKEN + "@github.com/" + GIT_REPO_PATH_PARENT + "/" + GIT_REPO_MASTER + ".git"

# Don't use Path to join "https://" pieces: Path treats the string as a filesystem path and collapses the double forward slash after "https:" into a single slash
# GITHUB_REPO_PATH = Path("https://" + GIT_TOKEN + "@github.com") / GIT_REPO_PATH_PARENT / (GIT_REPO_MASTER + ".git")
'''

# Pythonic (?) method of using urlunparse to create authorized URL link to GitHub repo
GITHUB_REPO_PATH = urlunparse(("https", GIT_TOKEN+"@github.com", (GIT_REPO_PATH_PARENT + "/" + GIT_REPO_MASTER + ".git"),"","",""))
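You can wrap the same construction in a small function and check the resulting URL with a dummy value instead of your real token. This is a sketch. `build_auth_url` is a hypothetical name and `"TOKEN"` is a placeholder.

```python
from urllib.parse import urlunparse


def build_auth_url(token: str, owner: str, repo: str) -> str:
    """Build https://<token>@github.com/<owner>/<repo>.git (hypothetical helper)."""
    netloc = token.strip() + "@github.com"   # strip() guards against a trailing newline
    path = f"{owner}/{repo}.git"             # urlunparse adds the leading "/" itself
    return urlunparse(("https", netloc, path, "", "", ""))


print(build_auth_url("TOKEN\n", "migai", "Kag"))  # https://TOKEN@github.com/migai/Kag.git
```

The path is passed without a leading slash. `urlunparse` inserts one because a network location (`TOKEN@github.com`) is present.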

##4. **Pull** Remote GitHub repo file updates to your local Google Drive
**Use the following code cell if you wish to update your local Google Drive files**

You *must* have a cloned copy of the repo on your Google Drive before this will work. 
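Before pulling, a quick guard can confirm that the target directory really is a clone. This is a minimal sketch with a hypothetical helper name, `is_git_clone`. It assumes a standard clone, where git metadata lives in a `.git` directory at the repo root. Worktrees and submodules use a `.git` file instead, so this check would miss them.

```python
from pathlib import Path


def is_git_clone(repo_path: Path) -> bool:
    """Return True if repo_path looks like the root of a standard git clone.

    Assumes git metadata lives in a ".git" directory at the repo root, as it
    does after a plain "git clone"; a ".git" file (worktrees, submodules)
    is not recognized here.
    """
    return (Path(repo_path) / ".git").is_dir()
```

In the pull cell, you might call `is_git_clone(GDRIVE_REPO_PATH)` before `!git pull`. If it returns False, stop and print a reminder to run Section 6 first.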

In [4]:
os.chdir(GDRIVE_REPO_PATH)
!git pull origin master

From https://github.com/migai/Kag
 * branch            master     -> FETCH_HEAD
Already up to date.


##5. **Push** Google Drive local repo to GitHub
**Use the following code cell if you wish to push your local file changes to the remote GitHub repo**

Be sure to adjust the *push_message* at the top of the code cell, before you run the code cell.

You *must* have a cloned copy of the repo on your Google Drive before this will work.  You *must* also have a valid GitHub OAuth token (in the GITHUB_REPO_PATH object).
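One caveat with `!git commit -m "{push_message}"` is that the message is pasted into a shell command. A message that itself contains a double quote, a `$` or a backtick can therefore break the command. Below is a hedged sketch of two standard-library workarounds; `commit_command` and `commit` are hypothetical helper names. The first uses `shlex.quote` to build a safely quoted command string. The second calls `subprocess.run` with an argument list, which bypasses the shell entirely.

```python
import shlex
import subprocess


def commit_command(message: str) -> str:
    """Return a shell-safe 'git commit -m <message>' string (hypothetical helper)."""
    return "git commit -m " + shlex.quote(message)


def commit(message: str, repo_dir: str) -> None:
    """Commit without a shell: each list item reaches git unchanged."""
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)


msg = 'fix "items" parsing; cost $5'
print(commit_command(msg))  # git commit -m 'fix "items" parsing; cost $5'
```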

In [5]:

push_message = "separate ipynb to declutter items"


################################################

os.chdir(GDRIVE_REPO_PATH)
!git config user.email "{GIT_USER_EMAIL}"
!git config user.name "{GIT_USERNAME}"

# make sure we are in the correct location on GitHub
#  (you may comment out these two statements if things are running smoothly for you, but they do help prevent git errors, and add minimal overhead)
!git remote remove origin   
!git remote add origin "{GITHUB_REPO_PATH}"

!git add .
!git commit -m "{push_message}"
!git push origin master

[master e465694] v3.1 more inspection of item names
 5 files changed, 22174 insertions(+), 2 deletions(-)
 create mode 100644 data_output/items_delimited.csv
 rewrite helper_code/Git_enabled_Colab_with_GoogleDrive_and_GitHub.ipynb (69%)
 rewrite ipynb_versions/MG_EDA_v3.0 readable.ipynb (71%)
 create mode 100644 ipynb_versions/MG_EDA_v3.1_items_NLP.ipynb
 rename ipynb_versions/{ => old}/MG_EDA_v2.9 cleaning data.ipynb (100%)
Counting objects: 10, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (10/10), 686.45 KiB | 2.35 MiB/s, done.
Total 10 (delta 8), reused 0 (delta 0)
remote: Resolving deltas: 100% (8/8), completed with 7 local objects.
To https://github.com/migai/Kag.git
   66c0194..e465694  master -> master


##6. **Clone** Repo from GitHub to Google Drive (ONLY DO THIS ONCE!)
***Beware:***

**Use the following code cell ONLY if you are starting a new local Google Drive repo.</br>And, only do this cloning once -- do not repeat for the same repo.**

---
---
</br>

The following code is commented out, to make sure you do not inadvertently run it.  If you need to start a new repo connection, uncomment the code and run this cell right after completing Section 3 (mounting your Google Drive and creating the authorized GITHUB_REPO_PATH).  Only then can you safely push or pull between your local Google Drive repo and the remote GitHub repo.
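git refuses to clone into an existing directory that is not empty. A second clone of the same repo therefore fails rather than overwriting your work. A small guard, sketched below, can make the decision explicit before the clone cell is uncommented. `clone_target_is_free` is a hypothetical helper. It only inspects the filesystem, and the actual cloning is still done by the `!git clone` line in the cell below.

```python
from pathlib import Path


def clone_target_is_free(clone_parent: Path, repo_name: str) -> bool:
    """Return True if cloning repo_name into clone_parent would not collide.

    An existing, non-empty <clone_parent>/<repo_name> means the repo is most
    likely cloned already (and git clone would refuse to write there).
    """
    target = Path(clone_parent) / repo_name
    if not target.exists():
        return True
    return target.is_dir() and not any(target.iterdir())
```

For example, you could check `clone_target_is_free(GDRIVE_CLONE_PATH, GIT_REPO_MASTER)` and clone only when it returns True.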

In [0]:
'''
##################################################################
# OPTIONAL Code Cell - Only run this if you are starting with a new repo cloning
##################################################################

# Create empty folder to hold the cloned repo (if not done already), and then navigate to it
GDRIVE_CLONE_PATH.mkdir(parents=True, exist_ok=True)  # "parents=True" also creates missing parent folders; "exist_ok=True" ignores the error if the directory already exists
os.chdir(GDRIVE_CLONE_PATH)

# clone it
!git clone "{GITHUB_REPO_PATH}"
'''

##7. **Debugging** Tips and Code Snippets

###**Number 1 Cause of Issues:  Improper Formatting of URL Path Object**
From the code in the cell after mounting your Google Drive, make sure your GITHUB_REPO_PATH looks something like the upper URL, and not the lower URL:
</br>

https://123abc456def890adfa334af@github.com/migai/Kag.git
</br>

https:/123abc456def890adfa334af@github.com/migai/Kag.git
</br></br>

Simple solution: ensure the code that builds your URL (e.g., urlunparse) produces a properly formatted string.  Building it with Path is a common culprit, because Path collapses the "//" after "https:" into a single "/".  Use simple string concatenation if you must.  Check with print statements, for example, **but do not leave your token visible in any file you upload to GitHub**.  (See #2 below)
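You can reproduce the difference between the two URLs above without a real token. The sketch below uses the placeholder `"TOKEN"`. It shows `PurePosixPath` collapsing the `//` after `https:` while `urlunparse` keeps it. It also adds a hypothetical `redact` helper so the URL can be printed for checking without exposing the token.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunparse

token = "TOKEN"  # placeholder; never paste a real token into a notebook cell

# Joining with Path treats "https://..." as a filesystem path and collapses "//".
bad = PurePosixPath(f"https://{token}@github.com") / "migai" / "Kag.git"

# urlunparse assembles scheme, network location and path as URL parts.
good = urlunparse(("https", f"{token}@github.com", "migai/Kag.git", "", "", ""))


def redact(url: str) -> str:
    """Replace any credentials before the '@' with '***' for safe printing."""
    parts = urlsplit(url)
    if parts.username is None:
        return url
    host = parts.netloc.rsplit("@", 1)[1]
    return parts._replace(netloc="***@" + host).geturl()


print(bad)           # https:/TOKEN@github.com/migai/Kag.git  (one slash: broken)
print(redact(good))  # https://***@github.com/migai/Kag.git
```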
</br>


###**Number 2 Cause of Issues:  Your OAuth Token is in GitHub Repo**
You must not include your GitHub OAuth Token anywhere in any file you push to the GitHub repo.  GitHub scans what you upload, and if it sees an active OAuth token anywhere (including code blocks, text blocks, printouts...), GitHub will revoke your OAuth Token.

</br>

Simple solution: go to https://github.com/settings/tokens/new and create yourself a new token.
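Rather than waiting for GitHub to revoke the token, you could scan the working tree yourself before pushing. This is a minimal sketch with a hypothetical helper, `find_token_leaks`. It is a local check and not a substitute for GitHub's own scanning. Files are read whole as bytes, so very large data files will take memory. The `.git` folder is skipped because, with this notebook's approach, `.git/config` legitimately holds the authorized remote URL and is never pushed.

```python
from pathlib import Path


def find_token_leaks(repo_dir: Path, token: str) -> list:
    """Return paths (relative to repo_dir) of files that contain the token.

    The .git directory is skipped. Files are read as bytes, so notebooks,
    CSVs and binaries can all be scanned without decoding errors.
    """
    needle = token.strip().encode()
    if not needle:
        raise ValueError("empty token")
    repo_dir = Path(repo_dir)
    leaks = []
    for path in sorted(repo_dir.rglob("*")):
        rel = path.relative_to(repo_dir)
        if ".git" in rel.parts or not path.is_file():
            continue
        if needle in path.read_bytes():
            leaks.append(rel)
    return leaks
```

For example, `find_token_leaks(GDRIVE_REPO_PATH, GIT_TOKEN)` should return an empty list before you run the push cell.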

###7.1) Status Check

In [0]:
os.chdir(GDRIVE_REPO_PATH)
!git status

On branch master
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   helper_code/Git_enabled_Colab_with_GoogleDrive_and_GitHub.ipynb
	deleted:    ipynb_versions/MG_EDA_v2.6.ipynb
	modified:   ipynb_versions/MG_EDA_v2.7 correlation heatmaps.ipynb

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	ipynb_versions/MG_EDA_v2.8 finalize shop item cat groups.ipynb
	ipynb_versions/old/MG_EDA_v2.6.ipynb

no changes added to commit (use "git add" and/or "git commit -a")


###7.2) Step-by-Step Add/Commit/Push

In [0]:
!git add "ipynb_versions/old/MG_EDA_v2.6.ipynb"

In [0]:
!git commit -m "finalize shop item cat groups"

[master be62e28] finalize shop item cat groups
 2 files changed, 2 insertions(+)
 create mode 100644 ipynb_versions/MG_EDA_v2.8 finalize shop item cat groups.ipynb
 create mode 100644 ipynb_versions/old/MG_EDA_v2.6.ipynb


In [0]:
!git push origin master

Counting objects: 6, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.73 MiB | 5.53 MiB/s, done.
Total 6 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
To https://github.com/migai/Kag.git
   ddc7edf..be62e28  master -> master


###7.3) Aborting, Tracking Who Did What

In [0]:
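# Colab has no interactive text editor, so a pull that needs a merge commit can stop with
#   "unable to start editor" (as in the output below); "git pull --no-edit origin master" accepts the
#   default merge message instead, and "git commit --no-edit" completes a merge left uncommitted.
!git merge --abort
!git pull origin master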

From https://github.com/migai/Kag
 * branch            master     -> FETCH_HEAD
hint: Waiting for your editor to close the file... error: unable to start editor 'editor'
Not committing merge; use 'git commit' to complete the merge.


In [0]:
!git blame data_output/shops2.csv

###7.4) Checking PATH and Resetting Origin

In [0]:
Path.cwd()

PosixPath('/content/drive/My Drive/Colab Notebooks/NRUHSE_2_Kaggle_Coursera/final/Kag')

In [0]:
# make sure we are pointing at the correct remote repo on GitHub
# If pushes or pulls fail unexpectedly (e.g., "repository not found" or authentication errors), the "origin" remote may have
#   changed or may lack your token; the "push" code cell re-creates origin every time for this reason, although that should not be necessary when things are working correctly
!git remote remove origin   
!git remote add origin "{GITHUB_REPO_PATH}"

##8. Thoughts on Alternative Processes For Colab-GitHub Integration
See, for example, Oleg Zero's post at [towardsdatascience.com](https://towardsdatascience.com/colaboratory-drive-github-the-workflow-made-simpler-bde89fba8a39) from October 2019 and his associated [GitHub repositories](https://github.com/OlegZero13).

Also, a slightly more detailed, updated version (December 2019) of Oleg's post is available [here](https://towardsdatascience.com/google-drive-google-colab-github-dont-just-read-do-it-5554d5824228).


###Modifications to the above workflow:
Oleg creates a temporary directory on Google Drive in which to clone the GitHub repo, then moves the cloned files into the intended directory, and removes the temporary files.  I'm not exactly sure why he feels "recloning" is necessary, as opposed to just re-adding the origin and performing a pull operation.  (Oleg mentions "A nice thing about this solution is that it won’t crash if executed multiple times. Whenever executed, it will only update what is new and that’s it.")
Anyhow, this code is shown below, if for some reason it becomes useful in the future.
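For comparison, the standard library can also express the clone-to-temp-then-copy idea. This sketch covers the copy step only and is not Oleg's code; `copy_clone_into` is a hypothetical name. Unlike `mv ./temp/*`, `shutil.copytree` also copies hidden entries such as `.git`. With `dirs_exist_ok=True` (Python 3.8+), it merges into an existing project directory instead of raising an error.

```python
import shutil
from pathlib import Path


def copy_clone_into(temp_clone: Path, project_path: Path) -> None:
    """Copy everything from a temporary clone into project_path, then delete it.

    Hidden entries (including .git) are copied too; files that already exist
    in project_path are overwritten, and files only in project_path are kept.
    """
    shutil.copytree(temp_clone, project_path, dirs_exist_ok=True)
    shutil.rmtree(temp_clone)
```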

In [0]:
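# GIT_PATH and PROJECT_PATH are Oleg's variable names (not defined in this notebook);
#   here they would roughly correspond to GITHUB_REPO_PATH and GDRIVE_REPO_PATH.
!mkdir ./temp
!git clone "{GIT_PATH}" ./temp
# Note: the "*" glob does not match hidden entries such as .git, so those stay in ./temp and are deleted below
!mv ./temp/* "{PROJECT_PATH}"
!rm -rf ./temp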

###Importing (syncing) files Google Drive --> Colab VM and copying files from Colab VM to Google Drive
Oleg also provides an example of how one could load files from the local Google Drive repo into Colab using the !rsync command. It collects everything that belongs to the Drive directory and copies it into our local runtime.
Also, with rsync we have the option to exclude some of the content, which may be unnecessary or take too long to copy (the example below excludes the import of the directory "data" into the Colab VM)

*To Do:  I need to read more about the rsync command*
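In the meantime, a rough standard-library analogue may help show what `rsync -aP --exclude=data/` does: it copies a tree recursively while skipping directories named `data`. This is a sketch with a hypothetical helper name, `copy_excluding_dirs`, and it is not rsync. It does no incremental transfer and no progress reporting (the `-P` part).

```python
import shutil
from pathlib import Path


def copy_excluding_dirs(src: Path, dst: Path, excluded=("data",)) -> None:
    """Recursively copy src into dst, skipping directories with excluded names.

    Loosely mirrors `rsync -a --exclude=data/`: the trailing slash in rsync
    restricts the pattern to directories, so plain files named "data" are kept.
    """
    def ignore(dirpath, names):
        # Called by copytree for each directory; returns the names to skip.
        return [n for n in names if n in excluded and (Path(dirpath) / n).is_dir()]

    shutil.copytree(src, dst, ignore=ignore, dirs_exist_ok=True)
```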

In [0]:
!rsync -aP --exclude=data/ "{PROJECT_PATH}"/*  ./

In [0]:
# Copying files from the Colab VM to the Google Drive long-term storage can be done with the !cp command:
!cp -r ./* "{PROJECT_PATH}"

###One last thing:  quick writing and reading of .py with Colab
Here is one link to some tips:  https://colab.research.google.com/notebooks/io.ipynb

Oleg's post describes using magic commands within Colab to write code from within the Colab notebook, and how one might go about reloading the code after modification such that Colab recognizes the changes (as opposed to using !shred command):

![Code for %%writefile and %reload_ext](https://miro.medium.com/max/1334/0*IlOTzOp9dYEMiTp6.png)
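If you prefer plain Python over the magics shown in that image, the standard library's `importlib.reload` is one way to pick up edits to a module you have just rewritten. The sketch below writes a throwaway module to a temporary directory; `helper_demo` is a hypothetical name. It illustrates the idea and is not the code from Oleg's post.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module (hypothetical name) into a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "helper_demo.py"
module_file.write_text("VALUE = 1\n")
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()  # make sure the import system sees the new file

import helper_demo
print(helper_demo.VALUE)  # 1

# Rewrite the file (as %%writefile would), then reload to pick up the change.
module_file.write_text("VALUE = 200\n")
helper_demo = importlib.reload(helper_demo)
print(helper_demo.VALUE)  # 200
```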

