# 4: Working with a Git repo

<img src="../../img/icon-plates.png" width="200" />

In this section, we will create a local repo, add files, and work with a remote repo on GitHub.

## 4.1: Creating a repo and committing files

### `git init`: Creating a local repo

The `git init` command creates a local repo in the current directory.  Let's create a new directory, `~/git-workflow`, and initialize a repo inside it.  

(You can use a directory with a different name and location, if you like.)

In [1]:
%%bash

mkdir ~/git-workflow
cd ~/git-workflow
git init

Initialized empty Git repository in /Users/minlu/git-workflow/.git/


You can see the `.git/` directory was created for the database and metadata.  

In [4]:
%%bash

cd ~/git-workflow
ls -lha

total 0
drwxr-xr-x@  3 minlu  staff    96B Jun 13 07:53 .
drwxr-x---+ 66 minlu  staff   2.1K Jun 13 07:53 ..
drwxr-xr-x@  9 minlu  staff   288B Jun 13 07:53 .git


#### Q1: In your $HOME directory, create a new directory named "git-workflow2" and initialize it as a Git repository.

In [None]:
%%bash

### BEGIN SOLUTION
mkdir ~/git-workflow2
cd ~/git-workflow2
git init
### END SOLUTION
ls -al

In [None]:
# Autotest to check if folder is created and is a Git repo

### `git status`: Show the state of the working tree

`git status` shows the state of tracked files, untracked files, and the database. Right now, the working tree is empty, and the database is empty, so there's not much to report.  

We already have a **branch** called `main`, which was created when the repo was initialized.  We'll cover branches in detail soon.  For now, you can think of them like branches of a family tree that describe your project's history.  

In [7]:
%%bash

cd ~/git-workflow
git status

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)


### Creating new files

To create and edit files in your working tree, you can use any text editor.  In `~/git-workflow`, let's create two files.

#### Q2: Create a file named `README.md` with these contents. You might notice a mistake, but don't correct it right now! :)

``` text
# Git Workflow

This is an example repo for the GT OSPO VSIP Spring 2024 Program.  
```

**Note:** Throughout the assignment, you can run the necessary commands whichever way you prefer, whether through a GUI or in the terminal via the Jupyter Notebook cells. 

In [10]:
%%bash

cd ~/git-workflow
### BEGIN SOLUTION
echo "# Git Workflow" > README.md
echo "" >> README.md
echo "This is an example repo for the GT OSPO VSIP Spring 2024 Program." >> README.md
### END SOLUTION

In [None]:
# Autograded test to check if the file exists

#### Q3: Next, create a Python source file within your Git repo named `my_abs.py` with these contents. We will be making changes to `my_abs.py` throughout this demo.

``` python
def my_abs(x):
    if x < 0:
        return -x
    else:
        return x
```

**Note:** The correct file has also been created for you in the `code` directory under the course assignment folder if you prefer to copy the file over instead.

In [12]:
%%bash

### BEGIN SOLUTION
# Run from the assignment folder (where the `code` directory lives), not from inside the repo
cp code/my_abs.py ~/git-workflow/
### END SOLUTION

In [None]:
# Autograded test to check if the file exists

### Untracked files

`git status` reports that the new files are **untracked**.  Recall that untracked files are not recorded in the repo's database, and that Git does not automatically track new files.  Git helpfully suggests that we should track them with the `git add` command, and we'll do that soon.  But first let's discuss the possible **states** for files in your working tree.  Knowing these states can help you troubleshoot many issues when you're working with your repo. 

In [14]:
%%bash

cd ~/git-workflow
git status

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	README.md
	my_abs.py

nothing added to commit but untracked files present (use "git add" to track)


### The state of files in a working tree

Files in the working tree are either tracked or untracked:

* **Untracked**: No versions of the file are stored in the repo's database
* **Tracked**:  One or more versions of the file are in the database

Additionally, tracked files are always in one of three states:

* **Unmodified**:  The file in the working tree is up-to-date with the database.  Git sometimes refers to these files as "up-to-date".  
* **Modified**:  The file has changes that have not yet been stored in the database.  Furthermore, the user hasn't specified that these changes will be stored in the next database update.  
* **Staged**: The file has changes that will be stored in the next database update.  Git sometimes refers to these as "to be committed".  


<img src="../../img/lifecycle.png" width="800" />


(Image credit:  Scott Chacon and Ben Straub.  Pro Git, [Section 2.2](https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository))
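
The states above can be watched changing in a throwaway repo. The following is a minimal sketch, separate from the tutorial repo: the scratch directory, the file name `notes.txt`, and the "Demo User" identity are placeholders chosen for illustration. It uses `git status --short`, a compact form of `git status` that prints a two-character code in front of each file name.

```shell
# Scratch repo, so the tutorial repo in ~/git-workflow is left untouched
scratch=$(mktemp -d)
cd "$scratch"
git init -q

echo "first draft" > notes.txt
git status --short      # "?? notes.txt" -> untracked

git add notes.txt
git status --short      # "A  notes.txt" -> staged

# Placeholder identity, passed only to this one command
git -c user.name="Demo User" -c user.email="demo@example.com" commit -q -m "Add notes"
git status --short      # no output      -> unmodified

echo "second draft" >> notes.txt
git status --short      # " M notes.txt" -> modified
```

The first character of the code describes the staged state and the second describes the working tree, which is why a modified but unstaged file shows a leading space.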


### `git add`: Stage new changes

In our working copy of `git-workflow`, we have two untracked files: `README.md` and `my_abs.py`.  To record their changes, we need two steps (as shown in the [diagram above](#The-state-of-files-in-a-working-tree)):

1. **Add the files:**  This changes the files' states from "Untracked" to "Staged".  At this point, the changes are *ready* to be stored in the database but *are not yet stored*.  
2. **Commit**:  This changes their state from "Staged" to "Unmodified".  At this point, the changes have actually been stored in the database.

To accomplish Step 1 (Untracked ➞ Staged), we use `git add`.  Its usage is:

```
git add <file1> [<file2> ...]
```

In [15]:
%%bash

cd ~/git-workflow
git add README.md
git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   README.md

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	my_abs.py



#### Q4: In the above cell, the new changes made to the `README.md` file have been staged. Now, stage the new changes for `my_abs.py`. 

In [16]:
%%bash

cd ~/git-workflow
### BEGIN SOLUTION
git add my_abs.py
### END SOLUTION

In [18]:
# Autograded test to check if my_abs.py has been added as a tracked file in the repo.

### Modifying a staged file

What happens when you've staged a file, but before you commit it, you realize you need to fix something?  For example, in `README.md`, we made a typo.  We wrote "Spring 2024" instead of "Summer 2024".  

**Correct the date shown in the last line of `README.md` with your text editor, and then look at the `git status`:**

In [17]:
%%bash

cd ~/git-workflow
git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   README.md
	new file:   my_abs.py



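Once the edit is saved, Git tells us:
* `README.md` and `my_abs.py` have changes that are staged ("to be committed")
* `README.md` also has changes that are not staged, listed under "Changes not staged for commit"

(The output captured above does not show this second section; it appears only once the edit to `README.md` has been saved.)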

How can one file have both staged and unstaged changes?  The reason is that `git add` stages the state of the file **at the exact moment** you run `git add`.  So if you run `git add`, and make additional changes afterwards (like changing "Spring" to "Summer"), then those additional changes are not automatically staged.  
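
To make the "snapshot at the moment of `git add`" idea concrete, here is a small sketch in a throwaway repo. The scratch directory and the one-line file contents are illustrative assumptions, not the tutorial's actual `README.md`. It relies on `git show :<path>`, which prints the staged copy of a file, and on `git status --short`, which prints a two-character code per file.

```shell
# Scratch repo, so the tutorial repo in ~/git-workflow is left untouched
scratch=$(mktemp -d)
cd "$scratch"
git init -q

echo "Spring 2024" > README.md
git add README.md                # stages the snapshot that contains "Spring"

echo "Summer 2024" > README.md   # edit made after staging

git status --short               # "AM README.md": staged (A) plus unstaged edits (M)
git show :README.md              # the staged snapshot still says "Spring 2024"
cat README.md                    # the working tree says "Summer 2024"
```

Until `git add` is run again, a commit would record the "Spring" snapshot, not the corrected text.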

### Staging new content (again)

To fix this, we'll run `git add` again to stage the new changes ("Spring" to "Summer").  You can see that Git suggests this, too, when it says: 'use "git add <file>..." to update what will be committed'

In [20]:
%%bash

cd ~/git-workflow
git add README.md

Now we see that all the changes in `README.md` have been staged, since there are no "Changes not staged for commit" anymore.  

In [21]:
%%bash

cd ~/git-workflow
git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   README.md
	new file:   my_abs.py



### `git commit`: Updating the database

Finally, we'll use `git commit` to record the previously-staged changes to the database.  

You can use the `-m` option to specify a **commit message** on the command line.  This is a short message that lets other humans know what changes you've made.  Many projects have conventions about what info should go into a commit message.  Ask your project manager for details.  

(If you do not use `-m`, a text editor will pop up and prompt you to enter a message.)

In [22]:
%%bash

cd ~/git-workflow
git commit -m "First commit of README and my_abs (no try/except yet)"

[main (root-commit) f17a6d7] First commit of README and my_abs (no try/except yet)
 2 files changed, 8 insertions(+)
 create mode 100644 README.md
 create mode 100644 my_abs.py


### `git log`:  Showing the repo's history

Now that we actually have information in our repo's database, we can use `git log` to show the history.  `git log` has many options to show more or less information about the history.  You can run `git help log` to see all the available options.

In [23]:
%%bash

cd ~/git-workflow
git log

commit f17a6d75a3aea6712539bd3f3ee18a731a42fc60
Author: Min Lu <luming2k10@gmail.com>
Date:   Fri Jun 13 08:20:42 2025 -0400

    First commit of README and my_abs (no try/except yet)
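
As a rough illustration of a few commonly used `git log` options, the sketch below builds a throwaway repo with two commits and prints its history in different forms. The scratch directory, file name, commit messages, and "Demo User" identity are placeholders, not part of the tutorial repo.

```shell
# Scratch repo with two commits
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Demo User"            # placeholder identity, local to this scratch repo
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"

echo "two" >> file.txt
git add file.txt
git commit -q -m "Second commit"

git log --oneline                # one line per commit: short hash and subject
git log -n 1 --stat              # only the newest commit, with per-file change counts
git log --format='%h %an %s'     # custom format: short hash, author name, subject
```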


## 4.2: Working with remotes

Up to this point, we've only made changes to the **local** repo on our computer.  We haven't touched GitHub at all.  Now, we are going to create a **remote** repo on GitHub and upload our local repo to it.

### Creating a remote repo

From the front page of [GitHub](https://github.com), you can create a new repo by clicking on the "+" button on the top menu bar.  You can also go directly to [https://github.com/new](https://github.com/new).

<img src="../../img/working-with-remote-01.png" width="500" />

You should now see the "Create a new repository" page.  Since we are uploading an existing repository (instead of creating a new one), we'll only need a few options.

* **Repository template**:  GitHub has starter templates (with directories and some boilerplate code) for common types of projects.  We'll select "No template".
* **Owner**:  The repo's owner can be an individual account or an organization (a group of accounts).  A GitHub organization is a powerful tool for collaborating and managing permissions, and more likely than not, you'll be working in an organization for your VSIP project.  For this tutorial, let's use your individual GitHub account.
* **Repository name**: This will be the last part of the repo's URL.  When someone clones your repo, it will also be the default name for the project directory.  Let's give it the same name as your existing project directory.
* **Visibility**: You can make your repo either Public or Private, whichever you prefer.
* **Initialize this repository** and below:  Since we are uploading an existing repo, we should not create a `README`, `.gitignore`, or license.

When you're finished, click the "Create repository" button.

<img src="../../img/working-with-remote-02.png" width="500" />

You will now be taken to your repo's webpage, and you should see the following info.  We're going to focus on the section "...or push an existing repo from the command line."  In particular, we will focus on the `git remote add` command and the `git push` command. 

(*The `git branch -M main` command renames the current branch to "main".  It is intended for repos whose default branch has a different name.  Historically, Git's default branch name was "master".  Starting around 2020, the Git community began efforts to change this naming convention to "main".  This was widely embraced by stakeholders in the community, such as [Git](https://lore.kernel.org/git/pull.656.v4.git.1593009996.gitgitgadget@gmail.com/) itself, [GitHub](https://github.blog/changelog/2020-10-01-the-default-branch-for-newly-created-repositories-is-now-main/), [GitLab](https://about.gitlab.com/blog/2021/03/10/new-git-default-branch-name/), and [BitBucket](https://bitbucket.org/blog/moving-away-from-master-as-the-default-name-for-branches-in-git).  Today, GitHub creates new repos with "main" as the default branch, and Git lets you choose the default name for new local repos through the `init.defaultBranch` setting.  Our repo was already initialized on `main`, so you do not need to run `git branch -M main` here, although running it anyway is harmless.*)

<img src="../../img/working-with-remote-03.png" width="600" />

### `git remote`:  Add and manage remotes

The `git remote add` command lets you link your local repo to a remote repo.  The syntax is:
```
git remote add <name> <url>
```
The `<url>` is the URL of your repo on GitHub.  The `<name>` is a shorter identifier for that URL, which makes life easier for humans.  By convention, humans use "origin" as the name for the default remote.  When you are collaborating with a team and have multiple remotes (which is quite common!), there are naming conventions for other remotes (such as "upstream").

Here, we only have one remote, so we name it **`origin`**.  

#### Q5: Link your local repo to your own remote GitHub repo using the following command. Make sure you use the URL from your own GitHub repo!

In [None]:
%%bash

cd ~/git-workflow
### BEGIN SOLUTION
git remote add origin git@github.com:GeorgeBurdell/git-workflow.git
### END SOLUTION

In [None]:
# Autotest to see if a remote has been added to the repo.

Now you can use the `git remote -v` command to see which remotes are defined.  You can see that this remote is set up for both fetching (downloading) and pushing (uploading).

In [25]:
%%bash

cd ~/git-workflow
git remote -v
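
For a sense of how several remotes look side by side, here is a sketch in a throwaway repo. Both URLs are hypothetical (the `example-user` and `example-org` accounts are placeholders), and nothing is contacted over the network because `git remote add` only records the name and URL.

```shell
# Scratch repo, so the tutorial repo in ~/git-workflow is left untouched
scratch=$(mktemp -d)
cd "$scratch"
git init -q

# Hypothetical URLs: substitute the ones from your own account or organization
git remote add origin   git@github.com:example-user/git-workflow.git
git remote add upstream git@github.com:example-org/git-workflow.git

git remote -v                # each remote is listed twice: once for (fetch), once for (push)
git remote get-url origin    # prints only the URL stored under the name "origin"
```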

### `git push`: Upload to the remote repo

The `git push` command is used to upload changes from your local repo to a defined remote repo.  The syntax is:

```
git push <remote> <branch>
```

A useful first-time option is `-u`.  It sets `<remote>` as the default remote for `<branch>`.  We'll use that below.  The effect is that, in subsequent pushes, we can use `git push` without any extra arguments to push `main` to `origin`.

In [None]:
%%bash

cd ~/git-workflow
git push -u origin main
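
To see what `-u` records without touching GitHub, the sketch below uses a local bare repo as a stand-in for the remote. The scratch paths and the "Demo User" identity are placeholders, and the bare repo is an assumption made for the demo; on GitHub the remote URL would be the one from your repo page.

```shell
scratch=$(mktemp -d)
git init -q --bare "$scratch/remote.git"      # local stand-in for the GitHub repo

git init -q "$scratch/work"
cd "$scratch/work"
git config user.name "Demo User"              # placeholder identity, local to this scratch repo
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "First commit"
git branch -M main                            # make sure the branch is called main

git remote add origin "$scratch/remote.git"
git push -q -u origin main                    # first push: upload and record the default remote

git rev-parse --abbrev-ref 'main@{upstream}'  # prints origin/main
git branch -vv                                # shows [origin/main] next to main
```

Once the upstream is recorded, a plain `git push` from `main` knows where to send its commits.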

### Back to GitHub

Now you can go back to your repo's webpage on GitHub and see your repo's contents!  You might need to click the refresh button in your browser to see the updates.  

## 4.3: Downloading a repo

### `git clone`:  Download a repo

Now that your GitHub repo is populated, let's try downloading (or **cloning**) it somewhere else.  It's common to clone your repo on multiple computers depending on where you're working (like your laptop, a remote server, etc.).  For this demo, we'll just create another directory, `~/somewhere-else`.

(You can use a different name or location, if you like)

In [None]:
%%bash

mkdir ~/somewhere-else
cd ~/somewhere-else

To download a remote repository from GitHub onto our local machine, we can either navigate to the GitHub website and download the files directly as a zip file, or use the `git clone` command to download the repo. The syntax is:

```
git clone <url>
```

The URL is the same one you used with `git remote add`.  This will create a working copy of the remote repo in a new directory.  The directory's name is the last part of the URL (without the `.git` suffix).  Afterwards, we can enter the new working copy and see that our files are present.  We can also verify that our repo's history is present.
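
The naming rule and the "history comes along" point can be tried offline. In this sketch a local bare repo plays the role of the remote; the scratch paths, the name `demo-project`, and the "Demo User" identity are placeholders chosen for illustration.

```shell
scratch=$(mktemp -d)

# Build a small repo, then make a bare copy of it to act as the "remote"
git init -q "$scratch/source"
cd "$scratch/source"
git config user.name "Demo User"           # placeholder identity, local to this scratch repo
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "First commit"
git clone -q --bare "$scratch/source" "$scratch/demo-project.git"

mkdir "$scratch/elsewhere"
cd "$scratch/elsewhere"
git clone -q "$scratch/demo-project.git"   # creates ./demo-project (the ".git" suffix is dropped)

cd demo-project
ls                   # the files are present
git log --oneline    # the history came along with the files
git remote -v        # "origin" was set up automatically, pointing at what we cloned
```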

#### Q6: Download the remote repository you just created, `git-workflow`, from GitHub into your newly created `~/somewhere-else` directory on your local machine. 

In [None]:
%%bash

### BEGIN SOLUTION
cd ~/somewhere-else
git clone git@github.com:GeorgeBurdell/git-workflow.git
### END SOLUTION

In [None]:
# Autotest to see if clone is successful

To check that your new `~/somewhere-else/git-workflow` has the same files and history as your GitHub repo, run the following cell.

In [None]:
%%bash

cd ~/somewhere-else/git-workflow
ls -la
git log

## 4.4: Adding more changes

### Back to our original working copy

Let's return to our original working copy and make some more changes.

### Making more changes

Let's add some new features to `my_abs.py`.  Copy/paste this into `my_abs.py`:

``` python
import math

def my_abs(x):
    try:
        if x < 0:
            return -x
        else:
            return x
    except TypeError:
        return math.nan
```

### `git diff`: Show changes in modified files

At this point, `my_abs.py` is in the "modified" state.  It has changes that have not yet been staged.  It can be very useful to see changes before you stage and commit. `git diff` lets you do exactly that.

The appearance of a "diff" is the same in many places.  GitHub and your IDE will use a very similar appearance when showing you differences in files.  Hence, understanding diffs is a key skill for understanding your code's history.  The meaning is:
* Lines that are green and/or begin with a "+" have been added to the newer version but are not present in the older version.
* Lines that are red and/or begin with a "-" are present in the older version but have been removed in the newer version.
* Lines that are not colored are the same in both versions.

In [None]:
%%bash

cd ~/git-workflow
git diff
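
If you would like to experiment with diffs on something smaller than `my_abs.py`, the sketch below changes one line of a two-line file in a throwaway repo. The scratch directory, the file `words.txt`, and the "Demo User" identity are placeholders. It also shows `git diff --staged`, which compares the staged snapshot against the last commit.

```shell
# Scratch repo, so the tutorial repo in ~/git-workflow is left untouched
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Demo User"         # placeholder identity, local to this scratch repo
git config user.email "demo@example.com"

printf 'alpha\nbeta\n' > words.txt
git add words.txt
git commit -q -m "Add words"

printf 'alpha\ngamma\n' > words.txt      # replace "beta" with "gamma"
git diff             # shows "-beta" and "+gamma"; "alpha" appears as unchanged context

git add words.txt
git diff             # prints nothing: no unstaged changes remain
git diff --staged    # the same change, now shown as staged
```

Plain `git diff` only reports unstaged changes, so it goes quiet as soon as the file is staged.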

### Staging and committing

As we mentioned, `my_abs.py` now has unstaged changes.  We can stage and commit them in one shot with the `-a` flag of `git commit`.  This will stage and commit changes to all currently **tracked** files; untracked files are not included.  

After committing, we can verify that there is a new commit in our history.  We'll use some additional options to show a more condensed log.  Logs are, by default, shown from most to least recent.  

In [None]:
%%bash

cd ~/git-workflow
git status
git commit -am "Added try/except"
git log --oneline --graph --branches --remotes 
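
The "tracked files only" behavior of `git commit -a` is easy to miss, so here is a sketch that shows it in a throwaway repo. The scratch directory, the two file names, and the "Demo User" identity are placeholders chosen for illustration.

```shell
# Scratch repo, so the tutorial repo in ~/git-workflow is left untouched
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Demo User"         # placeholder identity, local to this scratch repo
git config user.email "demo@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

echo "v2" >> tracked.txt                 # change to a tracked file
echo "new" > untracked.txt               # brand-new file, never added

git commit -q -am "Update tracked file"  # stages and commits tracked.txt only
git status --short                       # "?? untracked.txt": still untracked
```

A new file still needs an explicit `git add` before any commit will include it.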

### Pushing to remote

Finally, we'll push our updates to GitHub.  Go back to your GitHub webpage and see your updates!  (You might have to click refresh again.)

In [None]:
%%bash

cd ~/git-workflow
git push

## 4.5: Getting new changes from the remote

Like most things in Git, you have to be extremely explicit about updating your working copies.  If you push changes from one working copy, then other working copies are **not** automatically updated. You must explicitly download the new changes in the other working copies.  

### What's happening in our other working copy?

Let's go back to our other working copy in `~/somewhere-else/git-workflow`.  The log shows that the working copy does *not* have the changes that we just pushed.  And if we look at `my_abs.py`, we'll see it's still at the older version.  

In [None]:
%%bash

cd ~/somewhere-else/git-workflow
git log
cat my_abs.py

### `git pull`: Download changes from the remote

We will bring this working copy up-to-date with `git pull`.  The syntax is:

```
git pull [<remote> <branch_name>]
```
If the branch has a default remote, then we can omit the extra arguments and just run `git pull`.  Now when we look at the log and the contents of `my_abs.py`, we see that everything's up-to-date with the remote.

In [None]:
%%bash

cd ~/somewhere-else/git-workflow
git pull

In [None]:
%%bash
cd ~/somewhere-else/git-workflow
git log
cat my_abs.py
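
The whole push-then-pull round trip can be rehearsed offline. In this sketch a local bare repo stands in for GitHub, and `copy-a` and `copy-b` play the roles of the two working copies; those names, the scratch paths, and the "Demo User" identity are placeholders chosen for illustration.

```shell
scratch=$(mktemp -d)

# First working copy, plus a bare repo standing in for GitHub
git init -q "$scratch/copy-a"
cd "$scratch/copy-a"
git config user.name "Demo User"         # placeholder identity, local to this scratch repo
git config user.email "demo@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "Version 1"
git branch -M main
git clone -q --bare "$scratch/copy-a" "$scratch/remote.git"
git remote add origin "$scratch/remote.git"

# Second working copy, cloned from the stand-in remote
git clone -q "$scratch/remote.git" "$scratch/copy-b"

# Push a new commit from the first copy
echo "v2" >> file.txt
git add file.txt
git commit -q -m "Version 2"
git push -q origin main

# The second copy is unchanged until we pull
cd "$scratch/copy-b"
git log --oneline    # only "Version 1"
git pull -q
git log --oneline    # now "Version 2" and "Version 1"
```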

### Back to our first working copy

We'll be working with our first working copy in the next section, so be sure to move back to `~/git-workflow` before continuing.  Before moving on with the rest of the lesson, make sure to complete the following exercises!

### Quiz

1. The remote repository is not created on the local machine but is instead created by visiting `https://github.com/new`.

In [27]:
# Input the answer below as a Python boolean.
### BEGIN SOLUTION
a1 = True
### END SOLUTION

2. What is the best way to pull the latest changes from a remote Git repo?
   1. Visit the remote repo on GitHub and download the entire repository as a zip file.
   2. Use `git pull` to pull the new changes from the remote upstream.
   3. Use `git diff` to spot the differences between the remote repository and the local repository and manually update the relevant files.

In [28]:
# Input the answer below as a string: 'A' for option 1, 'B' for option 2, 'C' for option 3.
### BEGIN SOLUTION
a2 = 'B'
### END SOLUTION

3. A Git repo is the same thing as a GitHub repo.

In [None]:
# Input the answer below as a Python boolean.
### BEGIN SOLUTION
a3 = False
### END SOLUTION

4. The correct order of commands to ensure that the changes made in the local repository are reflected in the remote repository is:
    1. commit, stage, push
    2. push, commit, stage
    3. add, push, commit
    4. add, commit, push

In [29]:
# Input the answer below as a string: 'A' for option 1, 'B' for option 2, 'C' for option 3, 'D' for option 4.
### BEGIN SOLUTION
a4 = 'D'
### END SOLUTION

In [None]:
### BEGIN HIDDEN TESTS
assert a1 == True
assert a2 == 'B'
assert a3 == False
assert a4 == 'D'
### END HIDDEN TESTS