# CHEM7370 Class 11

# Version control
Have you ever been working on a project and wanted to go back to a previous version of the project? Or perhaps you’ve worked on a group project where multiple people were making changes to files and you ended up with multiple versions of multiple files and it was very confusing? Now imagine that you are working on a software project with 5, 10 or even 100 people. Every person would need their own copies of all the code, but it would be very hard to keep up with the changes each person was making and merge them all together. All of these issues can be handled by using *version control* on your project.

Version control keeps a complete history of your work on a given project. It facilitates collaboration: everyone can work freely on their part of the project without overwriting others’ changes. You can move between past versions and roll back when needed. You can also review the history of your project through commit messages that describe changes to the source code, and see exactly what was modified in any given commit, who made the change, and when.

This is greatly beneficial whether you are working independently or within a team.

## git and GitHub
The software package `git` is one of the most popular software packages for version control. GitHub is an online hosting service which hosts the files of many software packages that use `git` so that these packages can be shared with other people. Anyone can use `git` locally for version control without using GitHub. To share your code on GitHub, you must create a GitHub account and profile.

Let's first make sure that you have `git` installed in your `class` environment in Anaconda. If not, go ahead and install it! Then, you need to open a terminal where you will configure your `git` installation. In the terminal, navigate (using `cd`) to your class directory and type `conda activate class` to get your `class` environment ready in this terminal.

## Configuring Git
The first time you use Git on a particular computer, you need to configure some things.

First, you should set your identity. One of the most important things that version control software like Git does is keep track of who changes what. This helps repository maintainers coordinate the efforts of all the people who contribute to the project. Most importantly, it makes it easier to figure out who to blame when something goes wrong. To set your identity, *open a Terminal window* and type the following commands:

```
git config --global user.name "<Firstname> <Lastname>"
git config --global user.email "<email address>"
```
You will also need to configure a text editor. For example, if you are on Windows and have installed Visual Studio Code, here is how to configure `git` to use it as your text editor.

```
git config --global core.editor "code --wait"
```
If you are on Windows and prefer `notepad++` (and have it installed), this is how you can select it:
```
git config --global core.editor "'C:/Program Files (x86)/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"
```
If you are on a Mac and don't know how to use `vim` (the default option), here is how to switch `git` to use `nano`.
```
git config --global core.editor "nano"
```
Next, configure the credential helper so you don’t have to type your password as often when performing `git` operations.
```
git config --global credential.helper cache
```
After you're done setting these options, type
```
git config -l
```
to show a list of all config options that have been set.
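
To check a single setting rather than scanning the whole list, `git config --get` prints one value. The sketch below is only an illustration: it works in a throwaway repository created with `mktemp -d` and uses a made-up name and email, so your real global settings are left alone. It also shows that a setting written with `--local` applies to that one repository only.

```shell
# Create a scratch repository so nothing outside it is modified.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet

# --local stores the setting in this repository's .git/config only.
# The name and email below are placeholders for illustration.
git config --local user.name "Demo Student"
git config --local user.email "demo.student@example.com"

# --get prints a single value; inside this repository the local
# value takes precedence over any --global one.
git config --get user.name
git config --get user.email
```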

# Initializing git on your project
In `git`, a collection of files related to a specific project is called a *repository*. In the Terminal window, navigate to your class folder. In order for the `git` software to know something is a repository, you have to tell `git` that it is. You can check if you are in a `git` repository already by typing
```
git status
```
If you are not in a `git` repository, you should see
```
fatal: not a git repository (or any of the parent directories): .git
```
Tell `git` that you would like to create a repository here, in your class folder, and keep a record of your project by typing
```
git init
```
After you type this command, `git` will initialize an empty repository. Right now, `git` knows that we have started a project, but it doesn’t know what files to track. `Git` will only track the files you tell it to.

You can see the status of your repository by typing
```
git status
```

The exact output will vary depending on file names in your directory. This will list all of the files in your repository and tell you that none of them are tracked. This means that `git` sees the files, but is not keeping a record of them or watching them for changes. We want to tell `git` to start watching these files.
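
For a compact view of what `git` is and is not tracking, `git status --short` prints one line per file. A minimal sketch, assuming a scratch directory and a hypothetical file `notes.txt` rather than your class folder:

```shell
# Make a scratch directory with one file in it.
demo_dir=$(mktemp -d)
cd "$demo_dir"
echo "first draft of my notes" > notes.txt

# Turn the directory into a repository, then ask what git sees.
git init --quiet
git status --short   # untracked files are marked with "??"
```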

## git add, git status, git commit
Making a commit is like making a checkpoint for a particular version of your code. You can easily return to that checkpoint later.

We might modify many files at a time in a repository. Thus, the first step in creating a checkpoint (or commit) is to tell git which files we want to include in the checkpoint. We do this with a command called `git add`. This adds files to what is called the staging area.

When we use a `git status` command, git tells us to use `git add` to include what will be committed. We want to add the file that we worked on today to the staging area by typing

```
git add Class10.ipynb
```
When you call `git status`, you will see that the output message has changed. It now tells us we should perform a `commit`.

We are now on the second step of creating a commit. We have added our files to the staging area.

To create the checkpoint, or commit, we will now use the `git commit` command. We add a `-m` after the command for “message.” Whenever you create a commit, you should write a message about what the commit does. If you skip the `-m` option, `git commit` will drop you in a text editor to compose a message.

```
git commit -m "add initial project files"
```

Every time you make a commit, this is now part of the official record of what is in the repository, so you have to write a commit message telling people what is being added. You can write anything you want in these comments, but the best practice is to write something short but descriptive about the files that are being added or changed. Even if you think no one else is ever going to use your code, writing good commit messages is a great way to remind yourself of what you have done in the past. It is good practice for these to be descriptive rather than general, so a message like “Add function for calculating bond lengths” is much better than something non-descriptive like “Commit #5.”

Now when you type `git status` it should say “nothing added to commit but untracked files present”. This means that no changes have been made to your tracked file since your last checkpoint or commit, and the other files in your class folder are not tracked.
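
The two-step checkpoint (stage, then commit) can be watched from start to finish in a scratch repository. This is a sketch with a hypothetical file `analysis.py` and a placeholder identity, not the class notebook. `git status --short` marks the file `??` while it is untracked and `A` once it is staged, and prints nothing after the commit.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet
# Placeholder identity, stored for this scratch repository only.
git config user.name "Demo Student"
git config user.email "demo.student@example.com"

echo "print('hello')" > analysis.py
git status --short          # "?? analysis.py": seen but untracked

git add analysis.py         # step 1: put the file in the staging area
git status --short          # "A  analysis.py": staged for the next commit

git commit --quiet -m "add analysis script"   # step 2: create the checkpoint
git status --short          # no output: nothing has changed since the commit
```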

Let’s make and track one more change to our repository. Open `Class10.ipynb` and add a docstring at the top of the first code cell.
```
"""
This code subtracts the background and locates the peak position in Dr. Ohno's spectrum.
"""
```
Save this change and commit it.

First, we will do a `git status`. Then, to create a checkpoint with the new version, type
```
git add Class10.ipynb
git commit -m "add docstring to Class10.ipynb"
```

## The `git log` command
Git creates a history of our project, but how do we see or use that history? You can see a history of commits using the `git log` command.

Each line of this log tells you something important about the commit, or checkpoint, that exists for the project. On the first line,

```
commit adf1dcc0bf88a4971f37edecd80ee25d544c6a6b (HEAD -> master)
```

You have a unique identifier for the commit (`adf1d...`). You can use this identifier, or just its first few characters, to reference this checkpoint.

Then, git records the name of the author who made the change.

```
Author: Your Name <your_email_address@something.com>
```
This should be your information. This way, anyone who downloads this project can see who made each commit. Note that this name and email address match what you specified when you configured `git` above.

```
Date:   Sat Feb 18 22:20:37 2023 -0500
```
Next, it lists the date and time the commit was made.

```
add initial project files
```

Finally, there will be a blank line followed by the commit message. The commit message is whatever the person who made the commit chose to write, but it should describe the change that took place in that commit.

`git log` shows a history of commits to our repository, and they will all have the same format discussed above. Notice that commits are in reverse chronological order, with the most recent change listed first.
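
If the full log is more than you need, `git log` accepts options that limit and reshape the output. A minimal sketch, assuming a scratch repository with two made-up commits: `-n 1` restricts the log to the most recent commit, and `--format` selects fields (`%h` is the abbreviated ID, `%an` the author name, `%s` the first line of the message).

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet
# Placeholder identity, stored for this scratch repository only.
git config user.name "Demo Student"
git config user.email "demo.student@example.com"

echo "one" > data.txt
git add data.txt
git commit --quiet -m "add data file"
echo "two" >> data.txt
git add data.txt
git commit --quiet -m "append second value"

# Only the most recent commit, in the default multi-line layout.
git log -n 1

# Every commit on one line: short ID, author, message.
git log --format="%h %an %s"
```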


## Viewing changes
If you want to see what changed between commits, use the command

```
git diff COMMIT_ID_1 COMMIT_ID_2
```

Let’s do this for our last commit. We will compare the version at commit 2 to commit 1. You can quickly see commit ids using the command

```
git log --oneline
adf1dcc (HEAD -> master) Tiny change because notebook updated.
5fca176 add initial project files
```
We will compare these two commit IDs, giving the older commit first and the newer one second:

```
git diff 5fca176 adf1dcc
```

The `+` next to lines tells us that those lines were added from commit 1 to commit 2. If any lines had been deleted, they would appear with a `-` sign next to them. Reversing the order of the two IDs flips the signs.
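
The direction of the comparison matters: `git diff OLDER NEWER` reports what changed going from the first commit to the second. A minimal sketch in a scratch repository, where the file name and contents are made up and `HEAD~1` is shorthand for the commit just before the latest one (`HEAD`):

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet
# Placeholder identity, stored for this scratch repository only.
git config user.name "Demo Student"
git config user.email "demo.student@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "add notes"
echo "second line" >> notes.txt
git add notes.txt
git commit --quiet -m "extend notes"

# Older commit first, newer second: the new line is shown with "+".
git diff HEAD~1 HEAD

# Swapping the order reverses the signs: the same line is shown with "-".
git diff HEAD HEAD~1
```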

## Checkout and view previous versions
If you need to revert to a previous version, use

```
git checkout COMMIT_ID
```
This will temporarily revert the repository to whatever the state was at the specified commit ID.

Let’s checkout the version before we made the most recent edit to `Class10.ipynb`. You will get your commit ID from `git log`.

```
git checkout 707b644
```
If you now reopen and view the file `Class10.ipynb`, it is the previous version of the file.

To return to the most recent point,

```
git checkout master
```
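
A sketch of the round trip, assuming a scratch repository with two made-up commits. Here `HEAD~1` stands in for the commit ID you would copy from `git log`, and the branch name is read with `git symbolic-ref` because it may be `master` or `main` depending on your setup.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet
# Placeholder identity, stored for this scratch repository only.
git config user.name "Demo Student"
git config user.email "demo.student@example.com"
branch=$(git symbolic-ref --short HEAD)   # "master" or "main"

echo "version 1" > report.txt
git add report.txt
git commit --quiet -m "add report"
echo "version 2" > report.txt
git add report.txt
git commit --quiet -m "update report"

# Step back one commit (--quiet hides the "detached HEAD" notice
# that git normally prints at this point).
git checkout --quiet HEAD~1
cat report.txt     # prints: version 1

# Return to the most recent commit on the branch.
git checkout --quiet "$branch"
cat report.txt     # prints: version 2
```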

## More Tutorials
If you want to learn more functions of `git`, see the following tutorials.

[Software Carpentry Version Control with Git](https://swcarpentry.github.io/git-novice/)

[GitHub 15 Minutes to Learn Git](https://docs.github.com/en/get-started/quickstart/set-up-git)

[Git Commit Best Practices](https://github.com/trein/dev-best-practices/wiki/Git-Commit-Best-Practices)

## Sharing your code on GitHub
Let’s put your project on GitHub so you can share it with others. In your browser, navigate to [github.com](https://github.com). If you already have a GitHub account, click the Sign In button. If you need to create an account, click the Sign Up button.

## Creating an online repository
GitHub is a website which provides us with a place to host our code. We are using the software `git` to version control our code. GitHub is providing us a place on the internet to put copies of those repositories. In general, you should have a different repository for each of your projects.

Once you are signed in to GitHub, on the left side of the page, click the green button that says New to create a new repository. On the next page, choose a name for your repository and write a short description of your project. You can choose whether your repository will be public or private. Even if you choose private, you can identify other specific GitHub users who can see your repository and commit to it. However, if you want other people to be able to find your project without you specifically sharing it with them, you should choose public.

For this class, make your repository **public**.

Note the last question, “Initialize this repository with a README”. We will leave this unchecked in our case because we have an existing repository we are adding to GitHub. If you were creating the repository for the first time on GitHub, you would select this. There are also options for adding a `.gitignore` file or a license, but you don’t have to do that right now.

Click Create repository.

We now have an empty spot on GitHub where we can put a copy of our code. On the next page, GitHub very helpfully gives us directions for how to get our code on GitHub. The most relevant set of instructions for our current situation is the third set, **push an existing repository from the command line**.

## Local and Remote Repositories
We are using the word “repository” here to refer to both the local copy of the code on your machine, and to the copy on GitHub. A “repository” simply refers to a directory with code which is being tracked by `git`. Sometimes, people will shorten “repository” to “repo”.

Since we initialized `git` in our project using `git init`, it is now a repository. We will enter commands into the terminal to get our local copy onto GitHub (our online repository). That is what is meant by “push an existing repository from the command line.”

## Pushing your code online
Before we follow these directions, let’s talk about what the directions mean. First, we have to tell `git` that the online repository we created exists. When you want to be able to put the code that is on your computer (your local repo) into an online repository, you have to add what `git` calls *remotes* to your local repository. Think of remotes like roads; if there is a road between two places you can travel between them. If there is a remote between two repos, information can travel between them.

Currently, our repository has no remotes. You can see this by typing
```
git remote -v
```
You should see no output from this command.

Now, we will follow the directions given by GitHub to add remotes. In the Terminal window, type
```
git remote add origin https://github.com/YOUR-USERNAME/REPOSITORY-NAME.git
```
Note that the URL in your first command will be different because it will contain your own GitHub username and repository name. Just copy the line that GitHub gives you.
```
git push -u origin master
```
Note that you will need to enter the first command, press enter, and then enter the second command. The first command adds a remote named `origin` and sets its URL to our online repository. The second command pushes our repo to the location we have set as `origin`. The word `master` is the name of the branch we are pushing; if `git status` reports that your branch is called `main` instead, use `main` here.

Now if you refresh the GitHub webpage, you should be able to see all of the new files you added to the repository.
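
The same two commands can be tried without a network connection by letting a local *bare* repository (one created with `git init --bare`, which holds history but no working files) stand in for GitHub. This is only a sketch: the directory names and identity are made up, and the branch name is read from `git` rather than assumed to be `master`.

```shell
work=$(mktemp -d)
# A bare repository plays the role of the empty repository on GitHub.
git init --quiet --bare "$work/online.git"

# A small local repository with one commit to push.
git init --quiet "$work/project"
cd "$work/project"
git config user.name "Demo Student"              # placeholder identity
git config user.email "demo.student@example.com"
echo "# My project" > README.md
git add README.md
git commit --quiet -m "add README"
branch=$(git symbolic-ref --short HEAD)          # "master" or "main"

git remote -v                               # no output: no remotes yet
git remote add origin "$work/online.git"    # the path takes the place of the GitHub URL
git remote -v                               # origin is now listed for fetch and push
git push --quiet -u origin "$branch"        # -u makes origin the default for later pushes and pulls
```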

## Adding a README.md file
If your repository contains a file named `README.md`, then GitHub renders it into a nice description so that anyone who comes across your repo knows what the project is about. The `.md` extension on this file refers to “markdown”, the very same simple language for text formatting that we have been using in our Jupyter notebooks.

Here is an example:

```
# Markdown Example

This is an example of markdown formatting.

## Text Formatting
It's very easy to make some words **bold** and others *italic*.

### Ordered Lists
I can also easily make an ordered list.
1. List item 1
1. List item 2
1. List item 3

### Unordered Lists
Unordered list
* List item
* List item
* List item
```

### Exercise
In your favorite text editor, create a new file called `README.md` in your class folder. Write a short description of your spectra analysis code and how it works. If you want to have section headings in your file, use `##` to indicate a section heading. Otherwise, just type in your description.

Using the `git` commands learned above, add the file to your repository, commit it with an appropriate commit message, and push it to the master branch of your GitHub repository.

If you did everything right, you will see your project description at the bottom of the page when you refresh the GitHub page for your project. That’s it! Your code is now on GitHub for the world to see, download, and use!

## Working With Multiple Repositories
One of the most potentially frustrating problems in software development is keeping track of all the different copies of the code. For example, we might start a project on a local desktop computer, switch to working on a laptop during a conference, and then do performance optimization on a supercomputer. In ye olden days, switching between computers was typically accomplished by copying files via a USB drive, or with ssh, or by emailing things to oneself. After copying the files, it was very easy to make an important change on one computer, forget about it, and go back to working on the original version of the code on another computer. Of course, when collaborating with other people these problems get dramatically worse.

`Git` greatly simplifies the process of keeping multiple copies of a code development project. Let’s see this in action by making a clone of our GitHub repository. For this next exercise, you must first navigate out of your project folder.

```
cd ../
git status
```
Before continuing to the next command, make sure you see the following output:
```
fatal: not a git repository (or any of the parent directories): .git
```
If you do not get this message, do `cd ../` until you see it.

Next, make another copy of your repository. We’ll use this to simulate working on another computer.

```
git clone https://github.com/YOUR_GITHUB_USERNAME/YOUR_REPOSITORY_NAME.git friend
cd friend
```
Check the remote on this repository.
```
git remote -v
```
Notice that when you clone a repository from GitHub, it automatically has that repository listed as origin, and you do not have to add the remote the way we did when we did not clone the repository.

Create the file `testing.txt` in this new directory and make it contain the following.
```
I added this file from a new clone!
```
Now we will commit this new file:
```
git status
git add testing.txt
git status
git commit -m "Adds testing.txt"
git log
```
Now push the commit:
```
git push
```
If you check the GitHub page, you should see the `testing.txt` file.

Now change directories into the original local repository, and check if `testing.txt` is there:
```
cd ../<your regular class directory>
ls -l
```
To get the newest commit into this clone, we need to `pull` from the GitHub repository:
```
git pull origin master
```
Now we can actually see `testing.txt` in our original repository.
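
The whole exercise can also be rehearsed offline. In the sketch below a local bare repository stands in for GitHub, and the directory names `original` and `friend` play the roles of your class folder and the new clone. All paths and the identity are made up for illustration.

```shell
# Identity for this shell session only (placeholder values).
export GIT_AUTHOR_NAME="Demo Student"
export GIT_AUTHOR_EMAIL="demo.student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init --quiet --bare "$work/online.git"     # stand-in for GitHub

# The "original" local repository, pushed once so there is something to clone.
git init --quiet "$work/original"
cd "$work/original"
echo "# My project" > README.md
git add README.md
git commit --quiet -m "add README"
branch=$(git symbolic-ref --short HEAD)        # "master" or "main"
git remote add origin "$work/online.git"
git push --quiet -u origin "$branch"

# A second copy, as if on another computer.
git clone --quiet "$work/online.git" "$work/friend"
cd "$work/friend"
echo "I added this file from a new clone!" > testing.txt
git add testing.txt
git commit --quiet -m "Adds testing.txt"
git push --quiet

# Back in the original: the file is missing until we pull.
cd "$work/original"
ls                                   # README.md only
git pull --quiet origin "$branch"
ls                                   # README.md and testing.txt
```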

## Collaborating with others using GitHub
Many software projects, large and small, are hosted on GitHub. If you are working with other people on a project, they may ask you to collaborate and contribute your code through GitHub. To do this, we need to understand a little more about the flow of information on GitHub.

As discussed above, a collection of files for a certain project is called a repository on GitHub. When you are collaborating with others on a project, there are at least three relevant copies of a repository: the main repository on GitHub, your copy of the repository on GitHub, and the local copy of the repository on your computer.

![Alt](episode09_fig1.png "GitHub repository interactions scheme")

## Forking a repository
To make your copy of someone else’s repository on GitHub, you *fork* their repository.

Navigate to [the GitHub page of this class](https://github.com/konpat/CHEM7370_Spring2023). In the upper right-hand corner, click the button that says **Fork**. This will make a copy of the repository *on your GitHub account*.

## Cloning a repository
Now you need to copy the repository on your GitHub to your local computer. This is called a *clone*. Navigate to the GitHub page of your copy of the repository on GitHub. Click the green button that says **Code** (labeled **Clone or download** in older versions of the GitHub interface). Copy the link in the box.

Now open a terminal window and navigate to the location where you want the repository to be stored on your local computer.
Type `git clone` and then paste the link you copied.
```
git clone https://github.com/YOUR-USERNAME/REPOSITORY-NAME.git
```
where the username will be your GitHub username and the repository name will be the name of the repository. This command will create a folder with the same name as the repository that will be under `git` control.

## Setting git remotes
Now that you have all the copies of the repository in place, you need to make connections between the copies so you can transfer information. These connections between the copies are called *remotes*. We already discussed remotes a bit above, when we pushed our code to GitHub.

When you clone a repository, it automatically sets up one remote for you called `origin`. To see the remotes for a repository, go to the command line and type
```
git remote -v
```
Since we cloned the repository from our copy on GitHub, the `origin` remote is between our copy on GitHub and the local copy on our computer.

We probably also want to connect the repository on our local computer to the main repository on GitHub so that we can get updates that are made to that repository. We need to create a new remote called `upstream` that connects the repository on our local computer to the main repository on GitHub. You can actually name any remote anything you want, but `upstream` is the accepted name for the remote to the original main repository.

To set a new remote, go to GitHub and navigate to the page of the main repository (not your copy, the same page you forked from). Click the green button that says **Code** (labeled **Clone or download** in older versions of the GitHub interface). Copy the link in the box.

Go to the terminal and type `git remote add upstream` and then paste the link you copied.
```
git remote add upstream https://github.com/ORIGINAL_OWNER/ORIGINAL_REPOSITORY.git
```
Now if you use `git remote -v` again to list your remotes, you should see two remotes, `origin` (to your copy on GitHub) and `upstream` (to the main repository on GitHub).
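
A minimal offline sketch of the two-remote setup, assuming two empty bare repositories in a scratch directory stand in for the main repository and for your fork on GitHub (the names `main.git`, `fork.git`, and `local` are made up):

```shell
work=$(mktemp -d)
git init --quiet --bare "$work/main.git"   # stand-in for the main repository
git init --quiet --bare "$work/fork.git"   # stand-in for your fork

# Cloning sets up "origin" automatically. Git warns that the clone
# is empty, which is expected here.
git clone --quiet "$work/fork.git" "$work/local"
cd "$work/local"

# Add the second remote by hand and list both.
git remote add upstream "$work/main.git"
git remote -v
```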

## Information flow using GitHub
The normal workflow when you are working on a collaborative project with `git` is:

1. Pull changes to the main project from upstream.
2. Work on the project on your local copy of the repository.
3. Commit your changes to your local copy.
4. Push your changes to your copy of the repository on GitHub.
5. Submit your changes to the main developers through a *pull request* so they can consider including them in the main repository.

### Getting changes from the main repository
To get changes from the main repository into your local repository, you use the `upstream` remote and use the `pull` command.
```
git pull upstream master
```
When you enter this command, several things can happen, depending on what changes have been added to the main repository. If the new changes don’t conflict with what you have already, then you might see something that says `Fast-forward`. If there are conflicts, you will have a merge conflict that you need to resolve, where you select which version of the file you want.

### Pull vs. fetch and merge
The `pull` command is actually a combination of two other `git` commands, `fetch` and then `merge`. The `pull` command is easier because it is just one step, but in complex situations you might want to do a `fetch` so you can see what changes have been made and then decide to `merge` or not.
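
The two-step version can be sketched as follows. The setup is an assumption made for illustration: a small local repository plays the main project, and the clone names its remote `upstream` directly through the `--origin` option of `git clone` to keep the example short. Between the `fetch` and the `merge`, `git log HEAD..upstream/BRANCH` lists the commits that the merge would bring in.

```shell
# Identity for this shell session only (placeholder values).
export GIT_AUTHOR_NAME="Demo Student"
export GIT_AUTHOR_EMAIL="demo.student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# A repository with one commit plays the role of the main project.
git init --quiet "$work/main"
cd "$work/main"
echo "line 1" > notes.txt
git add notes.txt
git commit --quiet -m "add notes"
branch=$(git symbolic-ref --short HEAD)        # "master" or "main"

# Your local copy, with the remote named "upstream" instead of "origin".
git clone --quiet --origin upstream "$work/main" "$work/local"

# Meanwhile, the main project (we are still in $work/main) gains a commit.
echo "line 2" >> notes.txt
git commit --quiet -am "extend notes"

cd "$work/local"
git fetch --quiet upstream                     # download, but change no files
git log --oneline "HEAD..upstream/$branch"     # preview what would be merged
git merge --quiet "upstream/$branch"           # now bring the changes in
```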

### Using branches to make changes
Within a repository, you can actually create different copies of your code. These are called *branches*. This is particularly useful if you want to work on a new feature, but you want to keep another copy of the code that is working in case you mess something up. In all of the examples above, we have been using the `main` or `master` branch.

Let’s make a branch and make some changes to the repository you forked and cloned. First, make sure you are on your `master` branch.
```
git status
```
Now, make a new branch for your edits. You can name your branch anything you want, but we will call it `edits` in this example. Usually you would choose a more informative name, but this is just an example.
```
git branch edits
```
Now we will switch from our `master` branch to the new `edits` branch.
```
git checkout edits
```
Now that we have created a branch to work on, we can make changes to the files without worrying about messing up anyone else’s work.
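
To see that a branch really does isolate your changes, here is a minimal sketch in a scratch repository (file contents and identity are placeholders). It uses `git checkout -b`, a shorthand that creates a branch and switches to it in one step, and then confirms that a commit made on `edits` does not touch the original branch.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet
# Placeholder identity, stored for this scratch repository only.
git config user.name "Demo Student"
git config user.email "demo.student@example.com"
echo "# My project" > README.md
git add README.md
git commit --quiet -m "add README"
main_branch=$(git symbolic-ref --short HEAD)   # "master" or "main"

git checkout --quiet -b edits     # same as: git branch edits, then git checkout edits
git branch                        # the current branch is marked with "*"

echo "Edited on a branch." >> README.md
git commit --quiet -am "describe branch edit"

# The original branch is untouched by the commit made on "edits".
git checkout --quiet "$main_branch"
cat README.md                     # prints only: # My project
```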

### Exercise
Open the `README.md` file and make a change to the file.

Use `git status` to check which files have been modified on this branch. Then use `git add` and `git commit` to commit the changes to the branch.

Now you can push your changes through the `origin` remote to *your* repository on GitHub.
```
git push origin edits
```
This will create a new branch on your GitHub repository called `edits` and push your changes to that branch. Generally, you cannot push your changes upstream into the main repository on GitHub. This is why you need your own fork on GitHub; you can always push to something that belongs to you, but not something that belongs to someone else.

## Being a Collaborator
There are ways that you can push upstream to the main repository. The owner of a repository can add you as a collaborator and give you permission to push changes. For small projects with only a few people, this is fairly common. For large projects with hundreds of developers, no one does this, and you must submit your changes through a *pull request* as described below.

Notice that in this flow of information, if you want to pull new changes from the main repository into your repository on GitHub, there is no remote that directly connects those two. To accomplish this, you would do a `pull` from the main repository to the copy on your local computer through the `upstream` remote and then do a `push` from your local copy to your copy on GitHub through the `origin` remote.

### Submitting your changes to the main developers
Once you have pushed changes to your repository on GitHub, you may want to tell the main developers about your changes and ask them to consider merging them into the main repository. This is called making a *pull request*. Essentially, you are suggesting to the developers that they pull changes from your repository into their repository. To make a pull request, navigate to the GitHub page of the class and click on **Pull Request** in the navigation bar across the top. On the next page, click the green button that says **New Pull Request**.

On the next page, in the left box that says `base`, the main repository should be listed. If it is not, use the dropdown menu to select it. In the right box that says `compare`, use the dropdown menu to select the `edits` branch of your repository. If you don’t see it in the list, then you need to click the blue link above the boxes that says *compare across forks*. Click **Create Pull Request**.

On the next page, you will type a message to the developers telling them what changes you have made. You can also suggest reviewers who you want to check out your changes. When you are finished, click **Create Pull Request**.

The developers will receive a notification of your pull request and then they can review your changes and decide if they are going to merge them into the main repository. You will receive a notification letting you know if/when the pull request has been merged.