# Week 10, Class 1: Version Control

## Intro: Why version control?

Version control is used ubiquitously across disciplines in scientific programming, in open source software development, and in software engineering broadly. It's the way that everyone works for a reason: it's extremely useful. To see why, consider six scenarios:

1. You're writing a paper over the course of a semester and you'd like to edit it on multiple computers and keep a backup somewhere online. You'll be switching between computers and might want to recover some text and figures you deleted halfway through the writing process. You want to leave yourself some notes along the way to help jog your memory on what you're writing about when you return to the paper after a break.
2. You're collaboratively editing a paper with your advisor and a couple of co-authors. Your advisor is sadistic and wants more rounds of revision than you previously thought possible. You take your time making edits to `paper_v3_ijk_lmn_opq_ijk` so that you can have a clean `paper_v4`, but in the meantime collaborator opq has re-edited your advisor's edits on the Introduction and Conclusions.
3. You're writing some code to process and plot up your thesis data. You've hacked together some mostly-working code that processes the data, and you have some new data coming in soon. In the meantime, you'd like to re-write your code so that it works better for your old and new data. But your re-write will break your code that works(ish).
4. You're co-authoring some code on a big project. You're using your data visualization skills to make beautiful, interactive plots, and others are working on data wrangling, number crunching, etc. You email code to each other when you're done adding a new feature or section, but each time you get someone's emailed code you realize they broke a part you need -- they changed the name of a variable or the location of some data.
5. You're collaborating on a widely used open source software project. You and your colleagues release new versions at regular intervals and work together on updating, improving, and adding features to the code. You need a place for users to download your code and documentation, submit bug reports and issues, request new features, contribute to the wiki, and discuss future development directions. Your team needs a place to track who has been assigned which task, and how those tasks are being completed.
6. You're a GEOL 503 student who needs to turn in some data, code, and documentation for your final project.

All of these situations are ripe for some type of version control. There are several version control systems to choose from.

## Intro: Git and GitHub

Git is free, open-source version control software. If you wanted, you could download it here: [https://git-scm.com/downloads](https://git-scm.com/downloads), but on its own Git is mainly a command-line tool. On the plus side, there are no usernames, passwords, or subscription fees, and it's bombproof.

Git ([wiki](https://en.wikipedia.org/wiki/Git)) was originally written by Linus Torvalds, the same guy who created Linux. It's used [everywhere](https://en.wikipedia.org/wiki/Git#Adoption) now for source code management and beyond. Other version control tools include Subversion (where I got my start many moons ago) and Mercurial. Git does most of the version control work for us, but it doesn't include an online place to host your files.

Almost everyone these days uses an online developer platform alongside Git. We'll be using GitHub ([https://github.com/](https://github.com/)), which is the most popular platform in the science community and the most widely used worldwide for development projects of all kinds. In addition to providing online file hosting, GitHub extends the capabilities of Git, facilitating collaboration and distribution. Many liken GitHub to a social network for software developers.

GitHub has some nice documentation, which you can find here: [https://docs.github.com/en](https://docs.github.com/en)

## Exercise 1: Sign up and download

We'll use the easiest tool available to navigate Git and GitHub -- the graphical user interface GitHub Desktop.

First, go to [https://github.com/](https://github.com/) and, if you don't have one already, create an account with a username and password. Keep the username professional; you will need it later!

Note that as a student you are eligible for GitHub Education benefits, which come with extras and freebies. It takes longer to complete this signup because you need to provide proof of enrollment, such as a picture of your student ID. Find out more [here](https://docs.github.com/en/education/explore-the-benefits-of-teaching-and-learning-with-github-education/github-education-for-students/apply-to-github-education-as-a-student).


Next, download GitHub Desktop from [https://github.com/apps/desktop](https://github.com/apps/desktop) and install it. Starting it for the first time, you'll be prompted to sign in to your GitHub account.

## Exercise 2: GitHub tour

Let's take a tour of GitHub. First, some lingo:

- **Repository, or repo for short** (_noun_): the "home" folder for an entire project. The project's code, data, metadata, etc., live in its files and subfolders. Each repository gets a name, a location, and a URL on GitHub that others can find. The repository contains not just all of its files, but the history of how those files were created and edited.
- **Clone** (_verb_): to make a copy of an entire repository in a new place on your computer.
- **Public** and **Private** repositories (_adjective_): available to everyone, or to only you and the collaborators you choose.
- **README** (_noun, file_): A file (often formatted in Markdown) that contains a description of the repository's purpose, contents, and sometimes, directions on how to use them.
- **Commit** (_verb or noun_): a set of (related) changes to a repository's files that get saved to the repository at once, preferably with an informative message about what's changed.

Some interesting places:

- Noah McLean: [https://github.com/noahmclean](https://github.com/noahmclean)
- Sam Zipper: [https://github.com/samzipper](https://github.com/samzipper)
- Last week's speaker, Yao Lai: [https://github.com/chingyaolai](https://github.com/chingyaolai)
- Michael Shahin: [https://github.com/shahinmg](https://github.com/shahinmg)
- Nick Swanson-Hysell: [https://github.com/swanson-hysell](https://github.com/swanson-hysell)
- Sheree Armistead: [https://github.com/ShereeArmistead](https://github.com/ShereeArmistead)
- ESIP: [https://github.com/esipfed](https://github.com/esipfed)
- NASA-Earth-Data: [https://github.com/opengeos/NASA-Earth-Data](https://github.com/opengeos/NASA-Earth-Data)
- Software Underground Links: [https://github.com/softwareunderground/awesome-open-geoscience](https://github.com/softwareunderground/awesome-open-geoscience)
- MeanderPy: [https://github.com/zsylvester/meanderpy](https://github.com/zsylvester/meanderpy)

## Exercise 3: Make a new remote repository on GitHub

On GitHub, follow along as we all make a new repository for your GEOL 503 final project.  Give it a name that includes your initials and the course number, GEOL 503.  You'll want to make a README.md Markdown file and add a license. Go to [https://choosealicense.com/](https://choosealicense.com/) for some help in choosing your license, if you want.

Your remote repository, when public, is viewable by anyone.  See [https://docs.github.com/en/get-started/git-basics/about-remote-repositories](https://docs.github.com/en/get-started/git-basics/about-remote-repositories). 

## Exercise 4: Cloning a repository on your computer

Cloning a repository makes a local copy of the repository -- and all its history -- available on your computer.  Follow along as we clone your final project repository to your computer. Once cloned, this folder is tracked under version control.  You can learn more about repositories here: [https://docs.github.com/en/repositories](https://docs.github.com/en/repositories). 
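GitHub Desktop does the cloning for you, but it can help to see roughly what happens at the command line. The sketch below is only an illustration under some assumptions: so that it runs offline, a local "bare" repository called `remote.git` stands in for GitHub, and the names `seed`, `remote.git`, `my-project`, and the demo identity are all made up. With a real project you would pass your repository's GitHub URL to `git clone` instead.

```shell
# Work in a throwaway folder so nothing here touches your real project.
cd "$(mktemp -d)"

# Placeholder identity (an assumption) so the demo commit works on a fresh Git install.
export GIT_AUTHOR_NAME="Demo Student" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Build a stand-in for GitHub: a tiny repository with one commit,
# copied into a "bare" repository (history only, no working files).
git init -q seed
echo "# GEOL 503 final project" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Add README"
git clone -q --bare seed remote.git

# Clone the stand-in remote, just as you would clone from a GitHub URL.
git clone -q remote.git my-project

# The clone contains the files and the full history.
ls my-project
git -C my-project log --oneline

# The clone also remembers where it came from, under the name "origin".
git -C my-project remote -v
```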

## Exercise 5: Add some files

Add some files and folders to your local repository. They could be some data you plan to use, some code you've written, or anything. Also, try editing the README file. View hidden files in your file manager (Finder or File Explorer) to see the hidden `.git` folder, which holds the Git machinery for your repository.
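If you'd like to peek at the same thing from a terminal, here is a minimal sketch. It assumes a brand-new practice repository in a temporary folder; the file names (`README.md`, `data/samples.csv`) are invented for illustration.

```shell
# Work in a throwaway folder so nothing here touches your real project.
cd "$(mktemp -d)"
git init -q my-project
cd my-project

# Add a README and a small (made-up) data file.
echo "# GEOL 503 final project" > README.md
mkdir data
echo "sample_id,age_ma" > data/samples.csv

# The -a flag lists hidden entries too, including the .git folder.
ls -a

# Git notices the new files; "??" marks files it isn't tracking yet.
git status --short
```

Everything Git knows about the project lives in `.git`, so deleting that folder would turn the repository back into an ordinary folder with no history.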

## Exercise 6: Commit your changes

Stage your changes and commit them using GitHub Desktop. A commit captures a snapshot of your repository's changes, along with a message documenting what you've done and why. You can commit as many times as you like, at a frequency that suits you.

Don't be slack about your commit messages!  They're important and future-you will thank present-you profusely for making them informative. So will your collaborators.
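For the curious, staging and committing look something like this at the command line. This is a sketch rather than the one true workflow: the repository, file contents, commit messages, and demo identity are all placeholders.

```shell
# Work in a throwaway folder so nothing here touches your real project.
cd "$(mktemp -d)"

# Placeholder identity (an assumption) so commits work on a fresh Git install.
export GIT_AUTHOR_NAME="Demo Student" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q my-project
cd my-project
echo "# GEOL 503 final project" > README.md

# Stage: choose which changes go into the next snapshot.
git add README.md
# Commit: save the snapshot with a message about what changed and why.
git commit -q -m "Add README describing the project"

# Edit the file, then stage and commit again.
echo "Data live in the data/ folder." >> README.md
git add README.md
git commit -q -m "Document where the data live"

# The history lists one line per commit, newest first.
git log --oneline
```

Notice that the log is only as useful as the messages in it, which is the whole argument for writing good ones.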

## Exercise 7: Push your commit(s)

Using GitHub Desktop, **push** your commit(s) to your remote repository on GitHub. To **push** means to send the commits from your local repository to your remote repository. Go check on GitHub that they're there!
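A rough command-line version of a push is sketched below. To keep it runnable offline, it assumes a local bare repository (`remote.git`) as a stand-in for GitHub; that stand-in, the file `terraces.csv`, and the demo identity are all hypothetical.

```shell
# Work in a throwaway folder so nothing here touches your real project.
cd "$(mktemp -d)"

# Placeholder identity (an assumption) so commits work on a fresh Git install.
export GIT_AUTHOR_NAME="Demo Student" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for GitHub: a bare repository holding one starting commit.
git init -q seed
echo "# GEOL 503 final project" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Add README"
git clone -q --bare seed remote.git

# Your local clone of that remote.
git clone -q remote.git my-project
cd my-project

# Make a commit locally. At this point only your computer has it.
echo "elevation_m,age_ka" > terraces.csv
git add terraces.csv
git commit -q -m "Add terrace age data table"

# The branch line reports that you are ahead of the remote by one commit.
git status --short --branch

# Push: send the current branch's new commits to the remote named "origin".
git push -q origin HEAD

# The remote now has both commits.
git -C ../remote.git log --oneline
```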

## Exercise 8: Fetch and pull for simple workflows

To **fetch** a remote repository means to check in on the repository (usually on GitHub) and see what's changed. If anything has changed, the **fetch** lets Git know about it.

If you and your collaborators have been working in the same repository, a fetch will tell you whether the remote has new commits, including changes to files that you're working on. A fetch on its own doesn't alter your files. You can then **pull** the changes to your local repository, synchronizing your local repo on your computer with the remote repo on GitHub. If the incoming changes conflict with your own edits, Git will flag the conflicts and help you navigate them.
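To make the difference between fetch and pull concrete, here is a small offline sketch. It assumes a local bare repository (`remote.git`) playing the role of GitHub and a second clone playing a hypothetical collaborator; all names and file contents are invented.

```shell
# Work in a throwaway folder so nothing here touches your real project.
cd "$(mktemp -d)"

# Placeholder identity (an assumption) so commits work on a fresh Git install.
export GIT_AUTHOR_NAME="Demo Student" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for GitHub: a bare repository holding one starting commit.
git init -q seed
echo "# GEOL 503 final project" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Add README"
git clone -q --bare seed remote.git

# Two clones: yours and a (hypothetical) collaborator's.
git clone -q remote.git my-project
git clone -q remote.git collaborator

# The collaborator commits a new file and pushes it to the remote.
echo "Methods notes" > collaborator/methods.md
git -C collaborator add methods.md
git -C collaborator commit -q -m "Add methods notes"
git -C collaborator push -q origin HEAD

cd my-project
# Fetch: download news from the remote without touching your files.
git fetch -q
behind=$(git rev-list --count "HEAD..@{u}")
echo "Commits on the remote that you don't have yet: $behind"
ls    # methods.md is not in your folder yet

# Pull: bring those commits into your local branch and working files.
git pull -q --ff-only
ls    # now methods.md has arrived
```

The `--ff-only` flag is a cautious choice here: it lets the pull succeed only when your branch can simply be moved forward, and stops with an error if the histories have diverged.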

One of Git's really useful features is a streamlined way of comparing files and addressing conflicts. GitHub Desktop handles this process pretty well, as does VS Code, but it's beyond the scope of this class. For a simple collaborative workflow, see [https://www.atlassian.com/git/tutorials/comparing-workflows](https://www.atlassian.com/git/tutorials/comparing-workflows).
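As a small taste of the file comparison, the sketch below shows `git diff` on a single edited file. The repository, the file `notes.txt`, and the demo identity are made up for illustration; conflict resolution itself is not covered here.

```shell
# Work in a throwaway folder so nothing here touches your real project.
cd "$(mktemp -d)"

# Placeholder identity (an assumption) so the commit works on a fresh Git install.
export GIT_AUTHOR_NAME="Demo Student" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q my-project
cd my-project
printf 'Samples were collected in 2023.\n' > notes.txt
git add notes.txt
git commit -q -m "Add field notes"

# Edit the file, then ask Git what changed since the last commit.
printf 'Samples were collected in 2024.\n' > notes.txt
git diff
```

In the output, lines starting with `-` were removed and lines starting with `+` were added. GitHub Desktop shows the same information with red and green highlighting.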

## Exercise 9: Clone a classmate's repository

Hopefully you gave your project repositories different names!

## Exercise 10: Clone a fun repository from GitHub

Make it a small one so you don't kill your computer.

## More topics for discussion

- Branches and pull requests
- Releases
- Forks
- Other GitHub features, like Issues and Discussions
- GitHub Copilot
- Using Git/GitHub to write a paper
- Limitations (e.g., files that aren't text-based)
- Next steps: Python packages and environments