## Getting Started with GitHub
- I assume that you have set up a GitHub account for yourself; if not... **set up a GitHub account for yourself**
- We're going to talk about some basic functions of GitHub and how to access it from Colab and Rivanna
- Remember,
    - All the content for the course is at `https://github.com/DS3001` and as long as I work at UVA, that's where I'll put things
    - You are going to do all your work on your own GitHub account; that is where you "submit" everything for the course
- GitHub is basically the "hard drive" for the course: It's where everything gets stored. The RAM/processing power is Google Colab or Rivanna.

In [1]:
! git clone https://github.com/DS3001/github_intro

Cloning into 'github_intro'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 14 (delta 2), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (14/14), 1.41 MiB | 24.46 MiB/s, done.
Resolving deltas: 100% (2/2), done.


# GitHub and Colab
## `! git clone https://github.com/DS3001/github_intro`

## Basic Terms
- **Git** is version control software. It allows you to get content from GitHub (`! git clone ...`, typically after forking the repo on GitHub), track all your desired changes (`commit`), split the project and work on different versions (`branch`), and return them to your GitHub account (`! git push ...`)
- The fundamental idea is: Git turns a directory into a self-contained archive of past changes, called a **repository** or repo. When you make changes that you want to save, you tell Git to create a new snapshot of the repo, called a **commit**.
- GitHub is a commercial product for hosting repos built on the free Git software; GitHub is not Git
- We will start slow, using GitHub and Google Colab to illustrate some of the key pieces
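
To make the terms above concrete, here is a minimal sketch that runs entirely in a throwaway local directory, with no GitHub account involved. The names `demo_repo`, `analysis.py`, and `experiment`, along with the email and name, are placeholders invented for illustration. In a notebook, a multi-line script like this could go in a `%%bash` cell, since each `!` line runs in its own shell.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Turn a directory into a repository.
git init -q demo_repo
cd demo_repo

# Identify yourself for this repo only (placeholder values).
git config user.email "you@example.com"
git config user.name "Your Name"

# Make a change and record a snapshot of it (a commit).
echo "print('hello')" > analysis.py
git add analysis.py
git commit -q -m "Add analysis script"

# Split off a branch to try something without disturbing the original line of work.
git checkout -q -b experiment
echo "print('new idea')" >> analysis.py
git commit -q -am "Try a new idea"

# One line per commit, newest first.
git log --oneline
```

The log should list two commits, both reachable from the `experiment` branch; the branch you started on still points at the first commit only.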

## Forking
- Go to https://github.com/DS3001/programming . This is a repo(sitory) that includes some .ipynb/Jupyter notebooks
- Click the `Fork` button to create your own copy of this repo in your account.

![Forking](fork.png)

- You now have your own copy. This will be especially helpful when we move on to more complex assignments where repos will include code and data.
- Please put anything you ever want graded in a GitHub repo: You can add doc/pdf/xls/etc files to it. You can put anything in a GitHub repo.

## Working with your Repo: Colab
- The simplest way to work on an .ipynb file in your repo is to use Google Colab
- It is lighter weight than Jupyter-Lab/Rivanna, but very convenient to work with
- Go to https://colab.research.google.com/
- Click `File`, then `Open Notebook`, then click the `GitHub` tab. Enter your username or `DS3001`:

![Colab](colab.png)

- This is super convenient: You can create a virtual machine for data science whenever you need one. It is accessible and equitable: Anyone with a tablet can use Colab to do most of what's needed for the class. It is always available, as long as you can reach Google.

## Saving Back to GitHub
- When you are done working, click `File`, then `Save a Copy in GitHub`. Log into your GitHub account. A dialog should appear:

![Committing](commit.png)

- Notice it says `Commit message`? That is an important step. Your work will be saved in a log on GitHub of all the changes to the repo. Type in something meaningful whenever you use GitHub, so you can tell what each iteration of the project entailed. This will update your copy of the repo on GitHub.
- The recommended style is for Git commit messages to be in the imperative mood: "Edit files to add new regression" rather than "Edited files to add new regression".
- In terms of the underlying Git software, you are making a `commit` and changing the current state of the repo.

## The Commit History
- Because you have the entire history of changes contained in the repo, you can always go back if you regret changes you made (we'll cover branching later)
- Go to your copy of the repo in your GitHub account:

![Commit History](commit_history_1.png)

and click on the little `rewind` button to see the history:

![Commit History](commit_history_2.png)

So all the work ever done on the repo is visible.
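
The same history is available from the command line, and it is what makes "going back" possible. The sketch below is one possible approach, not the only one: it builds a tiny local repo, makes a change worth regretting, and then restores the earlier version of the file as a new commit. The repo name `history_demo`, the file `notes.txt`, and the identity values are placeholders.

```shell
# Throwaway repo with placeholder identity.
workdir=$(mktemp -d)
cd "$workdir"
git init -q history_demo
cd history_demo
git config user.email "you@example.com"
git config user.name "Your Name"

# First snapshot.
echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# A second snapshot we will come to regret.
echo "version 2 (regrettable)" > notes.txt
git commit -q -am "Rewrite notes"

# The same kind of history GitHub displays, newest first.
git log --oneline

# Bring back notes.txt as it was one commit ago, and record that as a new commit.
git checkout HEAD~1 -- notes.txt
git commit -q -m "Restore earlier notes"

cat notes.txt
```

Nothing is erased here: the regretted commit stays in the log, and the restore is simply a third commit on top of it.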

## Implications of GitHub
- Here's what happens: We post an assignment/deadline on Canvas under the Assignments tab. You do your work on GitHub. When the deadline is reached, we grade. That's it. If your work is in when we grade, it gets graded. If it is not in, we can talk about it. Your grade appears on Canvas in the Grades tab.
- When it comes to group work, we look at the Commit History to see that everyone did something. If you didn't contribute, you get a worse grade. That means you have to find ways to contribute meaningfully.
- Lying about when you turned in work is worse than pointless. A few students did this last semester after final grades were posted and the commit history for the relevant repo made it obvious they were lying.

## Rivanna Preview
- Using Rivanna is a little bit more difficult, because you have to make commits and then `push` your changes back to your GitHub account manually: There isn't a dialog like on Google Colab
- For now, I'm just going to show you how to get content from GitHub onto Rivanna, and we'll transition there gradually.
- Go to https://rivanna-portal.hpc.virginia.edu/ . You have to log in with the university security. Once in, it should look like this:

![Rivanna](rivanna.png)

## Tools
- As long as your code is available as an .ipynb file in a GitHub repo, I do not care how it's created: You could use Google Colab, Anaconda, a local copy of Jupyter-Lab or VS Code on your computer, VS Code or Jupyter-Lab on Rivanna, whatever.

## Rivanna
- From the OnDemand page, click `Interactive Apps`, then `Jupyter-Lab`
- This opens a dialog where you build your session:

![Building a Session](rivanna_2.png)

- Typically this is enough: 2 cores, 6 GB of memory, and 2-4 hours of time
- When you hit `Launch`, Rivanna will start putting the Virtual Machine together for you. When it's ready, you get a `Connect to Jupyter` button to launch the Jupyter-Lab session:

![Launching a Session](rivanna_3.png)

## Differences between Colab and Jupyter
- The product -- a Jupyter Lab Notebook -- is fundamentally the same
- Colab is not persistent: When your session ends, all that work goes away (you can set up a Google Drive to make it a persistent computing environment). Rivanna is persistent: You have this workspace as long as you're in the class.
- Rivanna does not have great GitHub support: You have to use bash commands to clone/commit/push content back and forth. Colab has great GitHub support: Cute dialog boxes.
- Colab's compute resources are sufficient for most things, but not everything: You can run out of storage space, or have jobs that take too long with the available cores/memory. On Rivanna, you can ask for a much more powerful computing environment, and you have up to a terabyte of storage space in the scratch drive.
- Small jobs and early homework assignments can be done on Colab easily. Projects and more complex assignments are probably easier to do on Rivanna.
- You can make anything work if you have to.

## Cloning a Repo
- On Colab, you can easily open notebooks from GitHub, but that doesn't pull the entire repo into your workspace. On Rivanna, you need some way to access content on GitHub.
- In Jupyter, you can always "escape" from Python up to a command line interface by using the exclamation point, `!`.
- So to pull resources from GitHub onto Colab or Rivanna, type the following in a Jupyter code chunk and execute it:
`! git clone https://github.com/<Username>/<Repo>`
like `! git clone https://github.com/DS3001/programming`.
- This sends a Git command to the operating system (Linux, OSX, Windows), which performs the command for you as long as Git is installed
- This is great: You can access the course content and your repos from anywhere.
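
To see what a clone actually gives you, here is a small sketch that needs no network. It assumes a local stand-in repo (the hypothetical `source_repo`) in place of GitHub; with a real repo, the first argument to `git clone` would be the `https://github.com/<Username>/<Repo>` URL instead. The directory name `my_copy` and the identity values are also placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a small stand-in "remote" repo (on GitHub this would already exist).
git init -q source_repo
git -C source_repo config user.email "you@example.com"
git -C source_repo config user.name "Your Name"
echo "# Demo" > source_repo/README.md
git -C source_repo add README.md
git -C source_repo commit -q -m "Add README"

# Clone it into a new directory called my_copy.
git clone -q source_repo my_copy

# The clone contains the files and the full commit history.
ls my_copy
git -C my_copy log --oneline
```

Unlike opening a single notebook through the Colab dialog, the clone brings every file in the repo into your workspace, which matters once assignments include data alongside code.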

In [2]:
print("This is a change to push to my repo") 

This is a change to push to my repo


In [10]:
! git commit -am "2/27 lecture"

Author identity unknown

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address (got 'kcm7zp@udc-aw29-19b.(none)')
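
The error above means Git refused to create the commit because it did not know who the author was. A minimal sketch of the fix, following the error message's own suggestion, is to set an identity and then commit. It runs in a throwaway repo; `identity_demo`, `change.py`, the email, and the name are placeholders, and omitting `--global` keeps the setting local to this one repo.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q identity_demo
cd identity_demo

# Set the identity for this repository only (placeholders; use your own).
git config user.email "you@example.com"
git config user.name "Your Name"

# A new file has to be added before it can be committed.
echo 'print("This is a change to push to my repo")' > change.py
git add change.py
git commit -q -m "Add lecture example"

# Show who Git recorded as the author of the latest commit.
git log -1 --format='%an <%ae>'
```

Note that `git commit -am` only picks up changes to files Git already tracks, so a brand-new file needs an explicit `git add` first, as in the sketch.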


In [11]:
! git push https://kelseymatsik:<Token>@github.com/kelseymatsik/github_intro

Everything up-to-date
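
The push above reports `Everything up-to-date` because the earlier commit failed, so there was nothing new to send. It also embeds credentials in the URL, which leaves them in the saved notebook for anyone who can read it. The sketch below shows the commit-then-push sequence without any credentials, using a local bare repo (the hypothetical `remote.git`) as a stand-in for the GitHub copy; all names and identity values are placeholders. With GitHub as the remote, Git would ask for your username and token instead of reading them from the command.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A bare repo plays the role of the copy on GitHub.
git init -q --bare remote.git

# Clone it; Git warns that the repo is empty, which is expected here.
git clone -q remote.git local_copy 2>/dev/null
cd local_copy
git config user.email "you@example.com"
git config user.name "Your Name"

# Commit first: a push only sends commits that already exist.
echo "results" > results.txt
git add results.txt
git commit -q -m "Add results"

# Push the current branch to the remote named origin.
git push -q origin HEAD

# A second push has nothing new to send.
git push origin HEAD
```

The second push is the situation in the cell above: the remote already has everything the local repo has.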
