To use this notebook interactively, install the `bash_kernel` Python package with either:
* `pip install bash_kernel`
* `conda install -c conda-forge bash_kernel`

# OSS Module 01 - Git Basics

# Section 1: What is version control?

# Section 2: Git and GitHub setup

# Section 3:  A Git project directory

## Parts of a local Git project

When you download a Git repo, the contents are stored on your computer as normal files in a normal directory.  In Git terminology, these files are the **working tree**.  You can edit these files freely.  

On your computer, changes to the files are stored in the **local repository** or local **database**.  Soon, we will learn how to store (or **commit**) new versions of the files; and how to retrieve (or **checkout**) previous versions of the files.  

The contents of this repo (`oss-training`) are shown below.  The database and metadata are stored in the hidden `.git/` directory.  You can treat this as a black box: your only interaction with the database and metadata should be through `git` commands.  While some of the files in `.git/` are human-readable, they should *never* be manually edited, since manual edits can leave the repo in an inconsistent state.

In [1]:
ls -la

total 40
drwxr-xr-x  10 rrahaman6  staff   320 Apr 12 08:26 .
drwxr-xr-x  49 rrahaman6  staff  1568 Apr 11 10:51 ..
-rw-r--r--@  1 rrahaman6  staff  6148 Apr 12 08:26 .DS_Store
drwxr-xr-x  13 rrahaman6  staff   416 Apr 25 13:20 .git
drwxr-xr-x   3 rrahaman6  staff    96 Apr 11 11:19 .ipynb_checkpoints
-rw-r--r--   1 rrahaman6  staff    81 Apr 11 10:37 README.md
drwxr-xr-x   3 rrahaman6  staff    96 Apr 11 14:16 git_workflow
drwxr-xr-x   4 rrahaman6  staff   128 Apr 11 11:27 img
-rw-r--r--   1 rrahaman6  staff  8088 Apr 11 14:46 oss-module-01-git.ipynb
drwxr-xr-x   2 rrahaman6  staff    64 Apr 11 11:19 src
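
If you are curious about a repo's layout, you can ask Git instead of opening `.git/` yourself.  The sketch below is a minimal example; it assumes a throwaway repo in a temporary directory (the `scratch` variable is made up for this illustration and is not part of `oss-training`).

```shell
# Create a scratch repo so the example does not touch real work
# ("scratch" is a hypothetical name used only in this sketch)
scratch=$(mktemp -d)
cd "$scratch"
git init -q

# Ask Git where the database and metadata live, relative to the current directory
git rev-parse --git-dir               # prints: .git

# Ask Git whether the current directory is part of a working tree
git rev-parse --is-inside-work-tree   # prints: true
```

Both commands only read information, so they are safe to run in any repo.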


## Tracked vs. Untracked Files

Git does not automatically add files from the working tree to the database.  Instead, the user must explicitly specify which files are stored in the database (**tracked**) and which are not (**untracked**).

In this repo, the *tracked* files are the README, the Jupyter notebooks, the `img/` directory, and the `src/` directory.  If you edit and save this notebook, your working tree will also contain an *untracked* directory, `.ipynb_checkpoints/`, which holds Jupyter's autosave files.  

It's important to consider which files to *not* track.  In a Python project, you should not track byte code (`*.pyc` and `*.pyo` files, usually inside `__pycache__` directories).  Python regenerates these files automatically whenever the source changes or a different Python version runs the code, so tracking them only takes up unnecessary space in the database.  
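
One common way to keep byte code out of the database is a `.gitignore` file, which this module has not introduced yet.  The sketch below is only an illustration under that assumption: the scratch repo and the byte code file name `my_abs.cpython-312.pyc` are hypothetical.

```shell
# Scratch repo for the demonstration ("scratch" is a hypothetical name)
scratch=$(mktemp -d)
cd "$scratch"
git init -q

# Simulate a source file and the byte code Python might generate for it
mkdir __pycache__
touch my_abs.py __pycache__/my_abs.cpython-312.pyc

# List the patterns Git should ignore, one per line
printf '%s\n' '__pycache__/' '*.pyc' '*.pyo' > .gitignore

# Only .gitignore and my_abs.py are reported; the byte code is left out
git status --short

# check-ignore exits with status 0 when the given path is ignored
git check-ignore -q __pycache__/my_abs.cpython-312.pyc && echo "byte code is ignored"
```

The `.gitignore` file itself is usually tracked, so that everyone who works on the project ignores the same files.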

# Section 4: A Git Workflow from Start to Finish

## `git init`: Creating a local repo

The `git init` command creates a new repo in the current directory.  Below, we create a new directory, `~/git_workflow`, and initialize a repo inside it.

In [2]:
mkdir ~/git_workflow
cd ~/git_workflow
git init

Initialized empty Git repository in /Users/rrahaman6/git_workflow/.git/


You can see the `.git/` directory was created for the database.  

In [3]:
ls -lha

total 0
drwxr-xr-x    3 rrahaman6  staff    96B Apr 25 13:24 .
drwxr-xr-x+ 192 rrahaman6  staff   6.0K Apr 25 13:24 ..
drwxr-xr-x    9 rrahaman6  staff   288B Apr 25 13:24 .git
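
As a variation, `git init` also accepts a directory argument, which creates the directory if needed and initializes the repo in one step.  A minimal sketch, assuming a temporary parent directory and a made-up repo name `demo_repo`:

```shell
# Temporary parent directory ("parent" and "demo_repo" are hypothetical names)
parent=$(mktemp -d)

# Create demo_repo/ and initialize a repo inside it in a single command
git init -q "$parent/demo_repo"

# The listing includes the new .git directory
ls -a "$parent/demo_repo"
```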


## `git help`: Getting help

`git help <command>` shows the manual page for the given command.  Each manual page describes all the available options and usually ends with examples.

In [None]:
git help init
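
When you only need a quick reminder of the options, most Git commands also accept `-h`, which prints a brief usage summary in the terminal instead of opening the full manual page.  A small sketch:

```shell
# Print a brief usage summary for "git init" directly in the terminal.
# Git exits with a non-zero status after printing usage, so "|| true"
# keeps a strict shell (set -e) from stopping here.
git init -h || true
```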

## `git status`: Show the state of the working tree

`git status` shows the state of tracked files, untracked files, and the database. Right now, the working tree is empty, and the database is empty, so there is not much to report.  

We already have a **branch** called `main`, which was created when the repo was initialized.  (The default branch name depends on your Git version and configuration; some setups use `master` instead.)  We will cover branches in detail soon.  For now, you can think of them like branches of a family tree that describe your project's history.  

In [5]:
git status

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)


## Creating new files

To create and edit files in your working tree, you can use any text editor.  In `~/git_workflow`, let's create two files.

First, create a file named `README.md` with these contents.  You might notice a mistake; don't correct it right now! :)

``` text
# Git Workflow

This is an example repo for the GT OSPO VSIP Spring 2024 Program.  
```

Then, let's create a Python source file named `my_abs.py` with these contents. We will be making changes to `my_abs.py` throughout this demo.  

``` python
def my_abs(x):
    if x < 0:
        return -x
    else:
        return x
```

## Untracked files

When we look at `git status`, we see that the new files are **untracked**.  Recall that untracked files are not recorded in the repo's database.  Git does not automatically track new files, so the user must be intentional about what to track.

In [5]:
git status

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	README.md
	my_abs.py

nothing added to commit but untracked files present (use "git add" to track)
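
If you want the same information as plain lists (for example, to use in a script), `git ls-files` can report tracked and untracked files separately.  This is a minimal sketch that assumes a fresh scratch repo with the same two file names; the file contents are placeholders.

```shell
# Scratch repo with two new files ("scratch" is a hypothetical name)
scratch=$(mktemp -d)
cd "$scratch"
git init -q
printf '# Git Workflow\n' > README.md
printf 'def my_abs(x):\n    return -x if x < 0 else x\n' > my_abs.py

# Tracked files: nothing is printed, because nothing has been added yet
git ls-files

# Untracked files (skipping anything ignored): README.md and my_abs.py
git ls-files --others --exclude-standard
```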


## The state of files in a working tree

To understand how to record changes to our new files, let's describe possible **states** of files in your working tree.  

Files in the working tree are either tracked or untracked:

* **Untracked**: No versions of the file are stored in the repo's database
* **Tracked**:  One or more versions of the file are in the database

Additionally, a tracked file is always in one of three states:

* **Unmodified**:  The file in the working tree is up-to-date with the database.  Git sometimes refers to this state as "up-to-date".  
* **Modified**:  The file has changes that have not yet been stored in the database.  Furthermore, the user hasn't specified that these changes will be stored in the next database update.  
* **Staged**: The file has changes that will be stored in the next database update.  Git sometimes refers to these changes as "to be committed".  


<img src="img/lifecycle.png" width="800" />

(Image credit:  Scott Chacon and Ben Straub.  Pro Git, [Section 2.2](https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository))
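
The states above can be observed with `git status --short`, which prints a two-character code before each file name.  The sketch below walks one file through the lifecycle; it assumes a scratch repo, a made-up file `notes.txt`, and a placeholder identity for committing.

```shell
# Scratch repo with a placeholder identity (all names here are hypothetical)
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > notes.txt
git status --short      # "?? notes.txt"  untracked

git add notes.txt
git status --short      # "A  notes.txt"  staged (new file)

git commit -q -m "Add notes"
git status --short      # no output: unmodified

echo "second draft" >> notes.txt
git status --short      # " M notes.txt"  modified, not staged

git add notes.txt
git status --short      # "M  notes.txt"  staged again
```

In the two-character code, the first column describes the staged state and the second column describes the working tree.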

## `git add`: Stage new changes



In our working copy of `git_workflow`, we have two untracked files: `README.md` and `my_abs.py`.  To record their changes, we need two steps (as shown in the [diagram above](#The-state-of-files-in-a-working-tree)):

1. **Add the files:**  This changes their state from "Untracked" to "Staged".  At this point, the changes are *ready* to be stored in the database but *are not yet stored*.  
2. **Commit**:  This changes their state from "Staged" to "Unmodified".  At this point, the changes have actually been stored in the database.

To accomplish Step 1 (Untracked to Staged), we use `git add`.  Its usage is:

```
git add <file1> [<file2> ...]
```

In [7]:
git add README.md my_abs.py

Now when we run `git status`, we can see that the files' changes are "to be committed", meaning that they are staged for the next database update.

In [8]:
git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   README.md
	new file:   my_abs.py

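The status output also hints at how to undo staging with `git rm --cached`.  The sketch below tries that hint in a scratch repo with a made-up file `notes.txt`, so that the files in `~/git_workflow` stay staged for the rest of this demo.

```shell
# Scratch repo with one staged file (all names here are hypothetical)
scratch=$(mktemp -d)
cd "$scratch"
git init -q
echo "scratch" > notes.txt
git add notes.txt

# Remove the file from the staging area but keep it in the working tree
git rm -q --cached notes.txt

git status --short    # "?? notes.txt": the file is untracked again
ls notes.txt          # the file itself is still on disk
```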


## Modifying a staged file

What happens when you've staged a file, but before you commit it, you realize you need to fix something?  

For example, in `README.md`, we made a typo.  We wrote "Spring 2024" instead of "Summer 2024".  We'll go ahead and correct that change with our text editor, and then look at the `git status`:

In [9]:
git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   README.md
	new file:   my_abs.py

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   README.md



Git tells us:
* `README.md` and `my_abs.py` have changes that are staged ("to be committed")
* `README.md` also has changes that are not staged

How can one file have both staged and unstaged changes?  The reason is that `git add` stages the state of the file **at the exact moment** you run `git add`.  So if you run `git add`, and make additional changes (like changing "Spring" to "Summer"), then those additional changes are not automatically staged.  
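
To see the two versions side by side, you can compare the staged copy with the working copy.  This is a minimal sketch in a scratch repo, with a one-line stand-in for `README.md`; `git show :<path>` prints the staged copy of a file.

```shell
# Scratch repo ("scratch" is a hypothetical name; README.md is a one-line stand-in)
scratch=$(mktemp -d)
cd "$scratch"
git init -q

echo "Spring 2024" > README.md
git add README.md                # stages the "Spring" version

echo "Summer 2024" > README.md   # edit the file after staging it

git show :README.md              # staged copy:  Spring 2024
cat README.md                    # working copy: Summer 2024

# Files whose working copy differs from the staged copy
git diff --name-only             # prints: README.md
```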

## Staging new content (again)

To fix this, we'll run `git add` again to stage the new changes ("Spring" to "Summer").  You can see that Git suggests this, too, when it says: 'use "git add <file>..." to update what will be committed'

In [10]:
git add README.md

Now we see that all the changes in `README.md` have been staged, since there are no "Changes not staged for commit" anymore.  

In [11]:
git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   README.md
	new file:   my_abs.py



## `git commit`: Updating the database

Finally, we will use `git commit` to record staged changes to the database.  

You can use the `-m` option to specify a **commit message** on the command line.  This is a short message that lets other humans know what changes you've made.  Many projects have conventions about what info should go into a commit message.  Ask your project manager for details.  

(If you do not use `-m`, a text editor will open and prompt you to enter a message.  This is often less convenient than using `-m`.)

In [12]:
git commit -m "First commit of my_abs (no try/except yet)"

[main (root-commit) d8b803d] First commit of my_abs (no try/except yet)
 2 files changed, 8 insertions(+)
 create mode 100644 README.md
 create mode 100644 my_abs.py
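
If a project's conventions call for a short subject line plus a longer explanation, `-m` can be given more than once; each `-m` becomes its own paragraph in the message.  A sketch under assumptions: the scratch repo, the file `notes.txt`, the identity, and the message text are all placeholders.

```shell
# Scratch repo with a placeholder identity (all names here are hypothetical)
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "draft" > notes.txt
git add notes.txt

# The first -m is the subject line; the second becomes a separate body paragraph
git commit -q -m "Add notes file" -m "Longer explanation of why the change was made."

git log -1 --format=%s    # subject: Add notes file
git log -1 --format=%b    # body:    Longer explanation of why the change was made.
```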


## `git log`:  Showing the repo's history

Now that we actually have information in our repo's database, we can use `git log` to show the history.  `git log` has many options to show more or less information about the history.

In [13]:
git log

commit d8b803d9c647a9c4c981ba400c4f802074aa9724 (HEAD -> main)
Author: Ron Rahaman <ron.rahaman@outlook.com>
Date:   Fri Apr 26 12:10:26 2024 -0500

    First commit of my_abs (no try/except yet)
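
Two of the options you may reach for first are `--oneline` (less detail) and `--stat` (more detail).  The sketch below assumes a scratch repo with two placeholder commits so that there is some history to show.

```shell
# Scratch repo with a placeholder identity (all names here are hypothetical)
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "two" >> notes.txt
git add notes.txt
git commit -q -m "Extend notes"

# One line per commit: abbreviated hash and subject, newest first
git log --oneline

# The most recent commit only, with a summary of which files changed
git log -1 --stat
```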
