# GitHub and NumPy
### Understanding Uncertainty

# 1. GitHub

## Git
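- Git is version control software: it keeps a history of your project, a bit like Dropbox's version history, except you decide when each version is saved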
- Git is free and open source
- Git can be a hassle to use
- GitHub is an online hosting service for Git repos, which people use to develop software together; Microsoft acquired it in 2018

## Git Words
- A directory that is tracked by Git is called a **repository**, or repo
- Git tracks changes in the contents of the repo over time
- You have to "save" manually; this is called a **commit**, and we will talk about it in a second
- The first rule of GitHub: the real version of the project is on GitHub, not on your computer

## Creating a Repo
1. Go to your GitHub page, click on the Repositories link
2. Click NEW
3. For the repo name, write 'practice', and for the description, write 'For practice'
4. Always add a README, so the repo is not empty when you clone it (cloning an empty repo gives a warning and can cause errors later)
5. Create the repo

!['Create a repo'](./src/new_repo.png)

## Clone Your Repo
- At the command line, navigate to the folder where you want to work, and type `git clone https://github.com/<your_name>/practice`
- This will create a local copy of the repo you can work on

!['Clone a repo'](./src/cloning.png)

## Connecting VS Code to GitHub
1. Either:
    1. At the command line, type `code practice` to open the folder
    2. Use File -> Open Folder, and select `practice`
2. For VS Code and Git to sync correctly, you need to open the folder that is the repo itself, not merely open individual files inside it after cloning. Use the little person button to "Turn on Cloud Changes" and sign into GitHub, so VS Code is authorized to push to your account:

!['Sign in'](./src/person.png)

3. You have to add your user details (name and email) at least once, so Git can record who made each commit:

!['User details'](./src/user_details.png)

## Run the Get Data Code
- Run the `download_data` function in `get_data.py` to download and unzip the data from Zenodo

```python
import os
import urllib.request
import zipfile


def download_data(force=False):
    """Download and extract course data from Zenodo."""
    zip_path = 'data.zip'
    data_dir = 'data'

    # Download the archive unless it is already here (or force=True)
    if not os.path.exists(zip_path) or force:
        print("Downloading course data...")
        urllib.request.urlretrieve(
            'https://zenodo.org/records/16954427/files/data.zip?download=1',
            zip_path
        )
        print("Download complete")

    # Extract the archive unless the data folder already exists (or force=True)
    if not os.path.exists(data_dir) or force:
        print("Extracting data files...")
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(data_dir)
        print("Data extracted")

    return data_dir


download_data()
```

```text
Downloading course data...
Download complete
'data'
```

## Add a .gitignore
- A lot of files (virtual environments, data directories, ipynb checkpoints) do not belong in the repo, and large ones will be rejected by GitHub
- In the file panel on the left, click the New File icon (the file with a plus sign), and add a file called `.gitignore`
- In the `.gitignore` file, type `data.zip` and `data/*` on separate lines, and save it
- You can add the contents of directories like that, as well as specific files, to keep them from being tracked
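
If you would rather add ignore patterns from Python than by hand, here is a minimal sketch. The function name `add_ignore_patterns` is made up for this example (it is not part of Git or of the course code); all it does is append lines to a text file, skipping patterns that are already listed.

```python
from pathlib import Path


def add_ignore_patterns(patterns, gitignore_path=".gitignore"):
    """Append patterns to a .gitignore file, skipping ones already listed.

    Hypothetical helper for illustration; returns the patterns it added.
    """
    path = Path(gitignore_path)
    # Read what is already in the file (empty string if it does not exist yet)
    text = path.read_text() if path.exists() else ""
    existing = text.splitlines()
    # Keep the order given, drop repeats and anything already listed
    new = [p for p in dict.fromkeys(patterns) if p not in existing]
    if new:
        # Make sure the new patterns start on their own line
        if text and not text.endswith("\n"):
            text += "\n"
        path.write_text(text + "\n".join(new) + "\n")
    return new
```

From the top folder of the repo, `add_ignore_patterns(["data.zip", "data/*"])` should leave you with the same `.gitignore` as typing the two lines yourself. Running it a second time adds nothing.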

## Git Commands
- We have some git commands. They are unavoidable:

| Command | Does...?|
| --------------| ---------------|
| git ls-files | List the files Git is tracking |
| git add `file` | Tells git to include `file` in the next commit ("Stages" it); a new file starts being tracked |
| git add . | Stages everything in the current folder and below, except ignored files |
| git rm --cached `file` | Tells git to stop tracking `file` without deleting it |
| git commit -am 'Commit message' | Stage all changes to already-tracked files and save a snapshot of the project (new files still need `git add`) |
| git status | See the current branch of the project and changes to be committed |
| git log | See the log of the repo with the history of changes |

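Because we're using GitHub, we have a very, very annoying problem: file size limits. GitHub rejects a push that contains any file over 100MB (and warns above 50MB), and files uploaded through the browser are limited to 25MB. These commands (and `.gitignore`) let you check what's actually being tracked.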
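
To spot trouble before you commit, you can scan the folder for big files. This is only a sketch: `find_large_files` is a made-up helper name, the 100 MB default is a parameter you can change, and it looks at every file on disk (tracked or not), so treat the result as a list of candidates for `.gitignore`.

```python
import os


def find_large_files(root=".", limit_mb=100, skip_dirs=(".git",)):
    """Return (path, size in MB) pairs for files under root bigger than limit_mb.

    Hypothetical helper for illustration; it does not ask Git what is tracked.
    """
    limit_bytes = limit_mb * 1024 * 1024
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into skipped folders (e.g. Git's own .git directory)
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken links and other oddities
            size = os.path.getsize(path)
            if size > limit_bytes:
                large.append((path, size / (1024 * 1024)))
    # Biggest files first
    return sorted(large, key=lambda item: item[1], reverse=True)


# Report anything over the limit in the current folder
for path, size_mb in find_large_files("."):
    print(f"{size_mb:8.1f} MB  {path}")
```

Anything this prints should either go in `.gitignore` or be removed from tracking with `git rm --cached` before you push.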

## Making a Commit
1. Add `.gitignore` and `get_data.py` to the files being tracked by Git (stage them)
2. Check the files that are currently tracked by Git
3. Make sure you've saved all the files you've edited (Ctrl+S)
4. Click the Git panel, write a commit message, and make a commit

!['Make a commit'](./src/git_commit.png)
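
The steps above can also be run as git commands. Here is a sketch that drives them from Python in a throwaway folder, so nothing in your real repo is touched. The `git` helper, the placeholder name and email, and the file contents are all made up for this example; the git commands themselves are the ones from the table.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Placeholder identity so commits work on a machine with no Git user details
IDENTITY = ["-c", "user.name=Example Student", "-c", "user.email=student@example.com"]


def git(*args, cwd):
    """Run one git command inside cwd and return what it printed."""
    result = subprocess.run(
        ["git", *IDENTITY, *args], cwd=cwd, check=True,
        capture_output=True, text=True,
    )
    return result.stdout


def demo_commit():
    """Stage two files, commit them, and report what Git tracks afterwards."""
    with tempfile.TemporaryDirectory() as repo:
        git("init", cwd=repo)
        Path(repo, ".gitignore").write_text("data.zip\ndata/*\n")
        Path(repo, "get_data.py").write_text("# placeholder script\n")
        Path(repo, "data.zip").write_text("pretend data\n")  # matches .gitignore

        git("add", ".gitignore", "get_data.py", cwd=repo)  # step 1: stage
        tracked = git("ls-files", cwd=repo).split()         # step 2: check
        git("commit", "-m", "Add .gitignore and get_data.py", cwd=repo)  # step 4
        status = git("status", "--porcelain", cwd=repo)     # empty means clean
        return tracked, status


if shutil.which("git"):  # only run the demo if git is installed
    tracked, status = demo_commit()
    print("Tracked files:", tracked)
    print("Left to commit:", repr(status))
```

If everything works, the tracked files should be just `.gitignore` and `get_data.py`, and nothing should be left to commit, because `data.zip` is ignored.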

## Sync Changes 
- To **push** your changes back to GitHub, click the `commit` button and then `Sync Changes` 
- You should see your .gitignore appear now on GitHub in the repo
- Go to the repo on GitHub and verify your changes made it

!['Sync Changes'](./src/push.png)

## Working in Groups
- To work in groups, you need a few commands to update your work

| Command | Does...?|
| --------------| ---------------|
| git fetch | Download new changes from GitHub without touching your working files |
| git pull | Fetch changes and merge them into your current branch (possibly causing merge conflicts with your work) |

!['Retrieve Changes'](./src/fetch_pull.png)
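
To see the difference between the two commands without involving your group, here is a sketch that fakes the whole setup in a temporary folder: a bare repo stands in for GitHub, and two clones stand in for two group members. The folder names (`hub.git`, `alice`, `bob`), the file `notes.txt`, and the placeholder identity are all made up for the example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Placeholder identity so commits work on a machine with no Git user details
IDENTITY = ["-c", "user.name=Example Student", "-c", "user.email=student@example.com"]


def git(*args, cwd):
    """Run one git command inside cwd and return what it printed."""
    result = subprocess.run(
        ["git", *IDENTITY, *args], cwd=cwd, check=True,
        capture_output=True, text=True,
    )
    return result.stdout


def save_and_push(repo, text, message):
    """Write notes.txt, commit it, and push the current branch."""
    (repo / "notes.txt").write_text(text)
    git("add", "notes.txt", cwd=repo)
    git("commit", "-m", message, cwd=repo)
    git("push", "origin", "HEAD", cwd=repo)


def demo_fetch_vs_pull():
    """Return Bob's copy of notes.txt after a fetch, then after a pull."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        git("init", "--bare", "hub.git", cwd=tmp)   # stand-in for GitHub
        git("clone", "hub.git", "alice", cwd=tmp)
        save_and_push(tmp / "alice", "v1\n", "First version")
        git("clone", "hub.git", "bob", cwd=tmp)     # Bob starts with v1

        save_and_push(tmp / "alice", "v2\n", "Second version")

        bob_file = tmp / "bob" / "notes.txt"
        git("fetch", cwd=tmp / "bob")   # downloads v2, leaves Bob's files alone
        after_fetch = bob_file.read_text()
        git("pull", cwd=tmp / "bob")    # merges v2 into Bob's working copy
        after_pull = bob_file.read_text()
        return after_fetch, after_pull


if shutil.which("git"):  # only run the demo if git is installed
    after_fetch, after_pull = demo_fetch_vs_pull()
    print("After fetch:", repr(after_fetch))
    print("After pull: ", repr(after_pull))
```

Bob's file should still hold the old text after the fetch and only change after the pull. In this demo Bob has no changes of his own, so the merge is trivial; when both people edit the same lines, the pull is where a merge conflict would show up.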

## Group Practice #1
- Group "leader", create a new repo with a README in it
- Everyone in the group, clone the repo
- First person on the left of the leader, add an exercise.txt file to it, including something (your CID, a phrase, whatever). Git add the file, commit it, and push it back to the group's repo. If the change arrives as a Pull Request, the group leader can then merge the changes into the repo (just click through the dialogs).
- Second person on the left of the leader, git pull, add something to the exercise.txt file, commit, and push it back.
- Continue until everyone, including the group leader, has added something to exercise.txt

## Group Practice #2
- Everyone git pull to sync your repos
- Everyone make a new file, git add it, git commit, and git push
- Do any conflicts occur?

## Group Practice #3
- Everyone git pull to sync
- Everyone coordinate on making edits to the same file, git commit, and git push
- What happens?

## Doomed Situations
- OK, your merge conflict is a disaster. You just can't or don't want to sort this out.
- Rename the doomed local folder to `repo_doomed`
- Git clone a fresh local copy of `repo`, make your changes using code from `repo_doomed`, then commit and push back before anyone else does