# **<Center>Version Control with Git</Center>**

## **Tracking Changes**

### **Questions**
* How do I record changes in Git?

* How do I check the status of my version control repository?

* How do I record notes about what changes I made and why?



### **Objectives**
* Go through the modify-add-commit cycle for one or more files.

* Explain where information is stored at each stage of that cycle.

* Distinguish between descriptive and non-descriptive commit messages.

---

* First let’s make sure we’re still in the right directory. You should be in the `SRC` directory created in the previous step.

```bash
pwd

/home/user/git_tutorial/SRC
```

If you are not in `SRC`, navigate back to it with: `cd /home/user/git_tutorial/SRC`

* Let’s create a file called `keyval.py` that contains a single `key: value` pair stored in a Python dictionary.

```bash
touch keyval.py
```
* Edit the file so that it looks like this:
```python
# istantiate a simple python dictionary
keyvals = {'key': 'value'}
```

We’ll use the editor available in the `jupyter lab` environment. On a local machine you can, of course, edit the file with whatever editor you like (e.g. `gedit` or `nano`). In particular, this does not have to be the `core.editor` you set globally earlier. 


In [1]:
cd SRC
touch keyval.py
echo "# istantiate a simple python dictionary" >> keyval.py
echo "keyvals = {'key': 'value'}" >> keyval.py
cat keyval.py

# istantiate a simple python dictionary
keyvals = {'key': 'value'}


### **`git status`**

If we check the status of our project again, Git tells us that it’s noticed the new file:

```bash
git status
```

In [2]:
git status


On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	keyval.py

nothing added to commit but untracked files present (use "git add" to track)


### **`git add`**

The “untracked files” message means that there’s a file in the directory that Git isn’t keeping track of. We can tell Git to track a file using `git add`:



In [3]:
git add keyval.py
git status

On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   keyval.py



### **`git rm`**
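You can also unstage a file that has been added by running `git rm --cached <file>`. The file itself stays in the working directory; only its entry in the staging area is removed.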

In [4]:
git rm --cached keyval.py

rm 'keyval.py'


In [5]:
git status

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	keyval.py

nothing added to commit but untracked files present (use "git add" to track)
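
The round trip above (untracked, staged, untracked again) can be reproduced in a scratch repository. This is only a sketch: the temporary directory created with `mktemp -d` is an assumption made so that nothing in your real project is touched, and `git status --porcelain` is used in place of plain `git status` because it prints the same information in a compact form that is easier to compare.

```bash
set -eu

# Work in a throwaway repository (assumption: not your tutorial directory).
repo=$(mktemp -d)
cd "$repo"
git init -q

touch keyval.py
before=$(git status --porcelain)   # file exists but Git is not tracking it

git add keyval.py
staged=$(git status --porcelain)   # file is now in the staging area

git rm --cached -q keyval.py
after=$(git status --porcelain)    # entry removed from the staging area again

printf 'before add:      %s\n' "$before"
printf 'after add:       %s\n' "$staged"
printf 'after rm cached: %s\n' "$after"
ls keyval.py                       # the file is still on disk
```

In this compact format `??` marks an untracked file and `A` a newly staged one. The final `ls` shows that `git rm --cached` leaves the file itself alone.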


* Now let's **re-add** the file, again with `git add`

In [6]:
git add keyval.py
git status

On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   keyval.py



### **`git commit`**

Git now knows that it’s supposed to keep track of `keyval.py`, but it hasn’t recorded these changes as a commit yet. To get it to do that, we need to run one more command:

```bash
git commit -m "Adding a simple python dictionary"
```

In [7]:
git commit -m "Adding a simple python dictionary"

[master (root-commit) defbe0c] Adding a simple python dictionary
 1 file changed, 2 insertions(+)
 create mode 100644 keyval.py



When we run `git commit`, Git takes everything we have told it to save by using `git add` and stores a copy permanently inside the special `.git` directory. This permanent copy is called a [commit](https://swcarpentry.github.io/reference.html#commit) (or [revision](https://swcarpentry.github.io/reference.html#revision)) and its short identifier is `defbe0c`. Your commit will have a different identifier.


We use the `-m` flag (for “message”) to record a short, descriptive, and specific comment that will help us remember later on what we did and why. If we just run `git commit` without the `-m` option, Git will launch `nano` (or whatever other editor we configured as `core.editor`) so that we can write a longer message.

[Good commit messages](https://chris.beams.io/posts/git-commit/) start with a brief (<50 characters) statement about the changes made in the commit. Generally, the message should complete the sentence “If applied, this commit will”. If you want to go into more detail, add a blank line between the summary line and your additional notes. Use this additional space to explain why you made changes and/or what their impact will be.
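
A summary line followed by a longer explanation can also be written without opening an editor. The sketch below assumes a throwaway repository and a made-up identity (`Example User`); it relies on the fact that `git commit` accepts several `-m` options and joins them as separate paragraphs, which produces the blank line between summary and notes.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the sketch
git config user.email "user@example.com"

echo "keyvals = {'key': 'value'}" > keyval.py
git add keyval.py

# First -m: short summary line. Second -m: the "why", stored as its own paragraph.
git commit -q \
    -m "Add a simple Python dictionary" \
    -m "Start with a single key so later commits can show how the dictionary grows."

subject=$(git log -1 --format=%s)   # summary line only
body=$(git log -1 --format=%b)      # everything after the blank line

echo "summary (${#subject} characters): $subject"
echo "notes: $body"
```

The summary reads naturally after “If applied, this commit will” and stays well under the 50 character guideline.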



### **`git log`**

If we run `git status` now:

In [8]:
git status

On branch master
nothing to commit, working tree clean


it tells us everything is up to date. If we want to know what we’ve done recently, we can ask Git to show us the project’s history using `git log`:

In [9]:
git log

commit defbe0cc4e05327d7bb0da8b03b9a6e71a522ecb (HEAD -> master)
Author: Massimo Di Stefano <epiesasha@me.com>
Date:   Sun Jun 16 18:17:57 2019 +0200

    Adding a simple python dictionary



`git log` lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit’s full identifier (which starts with the same characters as the short identifier printed by the `git commit` command earlier), the commit’s author, when it was created, and the log message Git was given when the commit was created.
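
The relationship between the short and the full identifier can be checked directly. The following is a minimal sketch in a scratch repository with a hypothetical identity; `git rev-parse` is used here only as a convenient way to print the two forms of the identifier.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the sketch
git config user.email "user@example.com"

echo "keyvals = {'key': 'value'}" > keyval.py
git add keyval.py
git commit -q -m "Add a simple Python dictionary"

full=$(git rev-parse HEAD)            # full identifier, as listed by git log
short=$(git rev-parse --short HEAD)   # abbreviated identifier, as printed by git commit

# Check whether the full identifier starts with the short one.
case "$full" in
  "$short"*) prefix_match=yes ;;
  *)         prefix_match=no ;;
esac

echo "short: $short"
echo "full:  $full"
echo "short identifier is a prefix of the full one: $prefix_match"
```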

### **`git diff`**

In [10]:
cat keyval.py

# istantiate a simple python dictionary
keyvals = {'key': 'value'}


* Let's now modify the content of the file so that it looks like:

```python
# istantiate a simple python dictionary
keyvals = {'key1': 'value1',
           'key2': 'value2'}
```


When we run `git status` now, it tells us that a file it already knows about has been modified:



In [11]:
# Open the file keyval.py for editing and apply the changes

In [12]:
git status

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   keyval.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.ipynb_checkpoints/

no changes added to commit (use "git add" and/or "git commit -a")



The last line is the key phrase: “no changes added to commit”. We have changed this file, but we haven’t told Git we will want to save those changes (which we do with `git add`) nor have we saved them (which we do with `git commit`). So let’s do that now. It is good practice to always review our changes before saving them. We do this using `git diff`. This shows us the differences between the current state of the file and the most recently saved version:

In [13]:
git diff

diff --git a/keyval.py b/keyval.py
index def0459..1a8d753 100644
--- a/keyval.py
+++ b/keyval.py
@@ -1,2 +1,3 @@
 # istantiate a simple python dictionary
-keyvals = {'key': 'value'}
+keyvals = {'key1': 'value1',
+           'key2': 'value2'}



The output is cryptic because it is actually a series of commands for tools like editors and `patch` telling them how to reconstruct one file given the other. If we break it down into pieces:

1. The first line tells us that Git is producing output similar to the Unix `diff` command comparing the old and new versions of the file.
2. The second line tells exactly which versions of the file Git is comparing; it uses unique computer-generated labels for those versions.
3. The third and fourth lines once again show the name of the file being changed.
4. The remaining lines are the most interesting; they show us the actual differences and the lines on which they occur. In particular, the `+/-` markers in the first column show where we *added / removed* a line.
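
To see how the `+` and `-` markers map onto an edit, the sketch below repeats the same change in a scratch repository (an assumption, with a hypothetical identity) and counts the marked lines. The `grep` patterns skip the `+++` and `---` file-name lines so that only real additions and removals are counted.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the sketch
git config user.email "user@example.com"

cat > keyval.py <<'EOF'
# instantiate a simple Python dictionary
keyvals = {'key': 'value'}
EOF
git add keyval.py
git commit -q -m "Add a simple Python dictionary"

# Replace the one-line dictionary with a two-line one.
cat > keyval.py <<'EOF'
# instantiate a simple Python dictionary
keyvals = {'key1': 'value1',
           'key2': 'value2'}
EOF

git --no-pager diff

# Count lines marked as added or removed, ignoring the +++ / --- header lines.
added=$(git diff | grep -c '^+[^+]')
removed=$(git diff | grep -c '^-[^-]')
echo "added lines: $added, removed lines: $removed"
```

Changing one line counts as one removal plus one addition, which is why a single edit shows up on both sides of the diff.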

After reviewing our change, it’s time to commit it:

In [14]:
git commit -m "Added new key:value pair in keyval.py"
git status

On branch master
Changes not staged for commit:
	modified:   keyval.py

Untracked files:
	.ipynb_checkpoints/

no changes added to commit
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   keyval.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.ipynb_checkpoints/

no changes added to commit (use "git add" and/or "git commit -a")


**Whoops**: Git **won’t commit** because we didn’t use `git add` first. Let’s fix that:

In [15]:
git add keyval.py
git commit  -m "Added new key:value pair in keyval.py"

[master 79baec7] Added new key:value pair in keyval.py
 1 file changed, 2 insertions(+), 1 deletion(-)


Git insists that we add files to the set we want to commit before actually committing anything. This allows us to commit our changes in stages and capture changes in logical portions rather than only large batches. For example, suppose we’re adding a few citations to relevant research to our thesis. We might want to commit those additions, and the corresponding bibliography entries, but not commit some of our work drafting the conclusion (which we haven’t finished yet).

To allow for this, Git has a special staging area where it keeps track of things that have been added to the current [changeset](https://swcarpentry.github.io/reference.html#changeset) but not yet committed.
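
The thesis example can be sketched as follows. The file names `citations.txt` and `conclusion.txt` are hypothetical, as are the scratch repository and the identity; the point is only that a commit contains what was staged, not everything that was edited.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the sketch
git config user.email "user@example.com"

echo "draft" > citations.txt
echo "draft" > conclusion.txt
git add citations.txt conclusion.txt
git commit -q -m "Start thesis files"

# Edit both files, but stage only the one that is ready.
echo "new citation" >> citations.txt
echo "unfinished thought" >> conclusion.txt
git add citations.txt
git commit -q -m "Add citation for related work"

committed=$(git diff --name-only HEAD~1 HEAD)   # files changed by the last commit
pending=$(git status --porcelain)               # what is still uncommitted

echo "in the last commit: $committed"
echo "still pending:     $pending"
```

Only `citations.txt` ends up in the second commit; the edit to `conclusion.txt` stays in the working directory until it is staged.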

**More:** when working in a `jupyter notebook` environment, you will probably find that `git` wants to keep track of the hidden `.ipynb_checkpoints/` directory (used internally by Jupyter).
This is a good use case for a `.gitignore` file. Just put the line `.ipynb_checkpoints/` (including the trailing slash) in a `.gitignore` file in the main directory of the Git repository. If you instead want to ignore all the files in one specific directory, add a `.gitignore` file to that directory and put a line containing `*` in it.
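
A quick way to confirm that an ignore rule works is to compare `git status` before and after adding it. This sketch assumes a scratch repository and a made-up checkpoint file name (`keyval-checkpoint.py`); `git check-ignore` prints a path only when an ignore rule matches it.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Simulate what Jupyter leaves behind (the checkpoint file name is hypothetical).
mkdir .ipynb_checkpoints
touch .ipynb_checkpoints/keyval-checkpoint.py keyval.py

before=$(git status --porcelain)

# Ignore the whole directory; the trailing slash restricts the rule to directories.
echo ".ipynb_checkpoints/" > .gitignore

after=$(git status --porcelain)
ignored=$(git check-ignore .ipynb_checkpoints/keyval-checkpoint.py)

printf 'before:\n%s\n\nafter:\n%s\n\nignored: %s\n' "$before" "$after" "$ignored"
```

After the rule is added, the checkpoint directory disappears from the status listing, while `.gitignore` itself shows up as a new untracked file that is usually worth committing.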


<blockquote class="callout">
  <h2 id="staging-area">Staging Area</h2>

  <p>If you think of Git as taking snapshots of changes over the life of a project,
<code class="highlighter-rouge">git add</code> specifies <em>what</em> will go in a snapshot
(putting things in the staging area),
and <code class="highlighter-rouge">git commit</code> then <em>actually takes</em> the snapshot, and
makes a permanent record of it (as a commit).
If you don’t have anything staged when you type <code class="highlighter-rouge">git commit</code>,
Git will prompt you to use <code class="highlighter-rouge">git commit -a</code> or <code class="highlighter-rouge">git commit --all</code>,
which is kind of like gathering <em>everyone</em> for the picture!
However, it’s almost always better to
explicitly add things to the staging area, because you might
commit changes you forgot you made. (Going back to snapshots,
you might get the extra with incomplete makeup walking on
the stage for the snapshot because you used <code class="highlighter-rouge">-a</code>!)
Try to stage things manually,
or you might find yourself searching for “git undo commit” more
than you would like!</p>
</blockquote>
<center><img src="https://swcarpentry.github.io/git-novice/fig/git-staging-area.svg"></center>
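
What `git commit -a` picks up can be checked with a small experiment. The sketch assumes a scratch repository, a hypothetical identity, and two made-up files; it shows that `-a` sweeps in every modification to tracked files, whether or not you meant to include it, but does not touch files that were never added.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the sketch
git config user.email "user@example.com"

echo "one" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

echo "two" >> tracked.txt     # change to a tracked file, never staged by hand
echo "new" > untracked.txt    # brand-new file, never added

# -a stages all modified tracked files before committing.
git commit -q -a -m "Update tracked file"

in_commit=$(git diff --name-only HEAD~1 HEAD)
left_over=$(git status --porcelain)

echo "committed by -a: $in_commit"
echo "left behind:     $left_over"
```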

Let’s watch as our changes to a file move from our editor to the staging area and into long-term storage. First, we’ll add another `key:value` pair to the `keyval.py` file, so that it will look like:

```python
# istantiate a simple python dictionary
keyvals = {'key1': 'value1',
           'key2': 'value2',
           'key3': 'value3'}
```
Then review the change with:

```bash
git diff
```

In [17]:
cat keyval.py

# istantiate a simple python dictionary
keyvals = {'key1': 'value1',
           'key2': 'value2',
           'key3': 'value3'}



In [18]:
git diff

diff --git a/keyval.py b/keyval.py
index 1a8d753..a25f46a 100644
--- a/keyval.py
+++ b/keyval.py
@@ -1,3 +1,5 @@
 # istantiate a simple python dictionary
 keyvals = {'key1': 'value1',
-           'key2': 'value2'}
+           'key2': 'value2',
+           'key3': 'value3'}
+


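<p>So far, so good:
we’ve changed the <code class="highlighter-rouge">key2</code> line, added a <code class="highlighter-rouge">key3</code> line,
and added a blank line at the end of the file
(removed lines are shown with a <code class="highlighter-rouge">-</code> and added lines with a
<code class="highlighter-rouge">+</code> in the first column).
Now let’s put that change in the staging area
and see what <code class="highlighter-rouge">git diff</code> reports:</p>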

In [19]:
git add keyval.py
git diff

<p>There is no output:
as far as Git can tell,
there’s no difference between what it’s been asked to save permanently
and what’s currently in the directory.
However,
if we do this:</p>

In [20]:
git diff --staged

diff --git a/keyval.py b/keyval.py
index 1a8d753..a25f46a 100644
--- a/keyval.py
+++ b/keyval.py
@@ -1,3 +1,5 @@
 # istantiate a simple python dictionary
 keyvals = {'key1': 'value1',
-           'key2': 'value2'}
+           'key2': 'value2',
+           'key3': 'value3'}
+


it shows us the difference between the last committed change and what’s in the staging area. Let’s save our changes:

In [21]:
git commit -m "Adding a third key:value pair to the dictionary"

[master 8d403db] Adding a third key:value pair to the dictionary
 1 file changed, 3 insertions(+), 1 deletion(-)


check our status:

In [22]:
git status

On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.ipynb_checkpoints/

nothing added to commit but untracked files present (use "git add" to track)


and look at the history of what we’ve done so far:

In [23]:
git log

commit 8d403db2d6cc9d3c555edc9f95691c37351b0138 (HEAD -> master)
Author: Massimo Di Stefano <epiesasha@me.com>
Date:   Sun Jun 16 18:20:15 2019 +0200

    Adding a third key:value pair to the dictionary

commit 79baec794cffb70d701371de4904d8155c5f85cf
Author: Massimo Di Stefano <epiesasha@me.com>
Date:   Sun Jun 16 18:19:17 2019 +0200

    Added new key:value pair in keyval.py

commit defbe0cc4e05327d7bb0da8b03b9a6e71a522ecb
Author: Massimo Di Stefano <epiesasha@me.com>
Date:   Sun Jun 16 18:17:57 2019 +0200

    Adding a simple python dictionary


<blockquote class="callout">
  <h2 id="word-based-diffing">Word-based diffing</h2>

  <p>Sometimes, e.g. in the case of text documents, a line-wise
diff is too coarse. That is where the <code class="highlighter-rouge">--color-words</code> option of
<code class="highlighter-rouge">git diff</code> comes in very useful, as it highlights the changed
words using colors.</p>
</blockquote>
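
Colored output is hard to reproduce on a page, so the sketch below uses the closely related `--word-diff` option instead of `--color-words`: it performs the same word-level comparison but marks removed words as `[-...-]` and added words as `{+...+}` in plain text. The scratch repository, the identity, and the file `notes.txt` are all assumptions made for the example.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the sketch
git config user.email "user@example.com"

echo "The quick brown fox jumps over the lazy dog" > notes.txt
git add notes.txt
git commit -q -m "Add a sentence"

# Change a single word in a long line.
echo "The quick red fox jumps over the lazy dog" > notes.txt

# Word-level diff with plain-text markers instead of colors.
words=$(git diff --word-diff notes.txt)
echo "$words"
```

A line-wise diff would report the whole sentence as removed and re-added; the word-level view narrows the change down to the single word that differs.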

<blockquote class="callout">
  <h2 id="paging-the-log">Paging the Log</h2>

  <p>When the output of <code class="highlighter-rouge">git log</code> is too long to fit in your screen,
<code class="highlighter-rouge">git</code> uses a program to split it into pages of the size of your screen.
When this “pager” is called, you will notice that the last line in your
screen is a <code class="highlighter-rouge">:</code>, instead of your usual prompt.</p>

  <ul>
    <li>To get out of the pager, press <kbd>Q</kbd>.</li>
    <li>To move to the next page, press <kbd>Spacebar</kbd>.</li>
    <li>To search for <code class="highlighter-rouge">some_word</code> in all pages,
press <kbd>/</kbd>
and type <code class="highlighter-rouge">some_word</code>.
Navigate through matches pressing <kbd>N</kbd>.</li>
  </ul>
</blockquote>

<blockquote class="callout">
  <h2 id="limit-log-size">Limit Log Size</h2>

  <p>To avoid having <code class="highlighter-rouge">git log</code> cover your entire terminal screen, you can limit the
number of commits that Git lists by using <code class="highlighter-rouge">-N</code>, where <code class="highlighter-rouge">N</code> is the number of
commits that you want to view. For example, if you only want information from
the last commit you can use:</p>

  ```bash
  git log -1
  ```
    
  <p>You can also reduce the quantity of information by using the <code class="highlighter-rouge">--oneline</code> option:</p>

  ```bash      
  git log --oneline
  ```

  <p>You can also combine the <code class="highlighter-rouge">--oneline</code> option with others. One useful
combination is:</p>

  ```bash
  git log --oneline --graph --all --decorate
  ```
</blockquote>
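
The effect of these options is easy to check on a small history. This is a sketch under stated assumptions: a scratch repository, a hypothetical identity, and three throwaway commits to a made-up file `notes.txt`.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the sketch
git config user.email "user@example.com"

# Build a history of three small commits.
for n in 1 2 3; do
  echo "line $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Add line $n"
done

# --oneline prints one line per commit, so counting lines counts commits.
all=$(git log --oneline | wc -l | tr -d ' ')
two=$(git log --oneline -2 | wc -l | tr -d ' ')
last=$(git log -1 --format=%s)   # subject of the most recent commit only

echo "commits in total: $all"
echo "commits shown with -2: $two"
echo "most recent commit: $last"
```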

<blockquote class="callout">
  <h2 id="directories">Directories</h2>

Two important facts you should know about directories in Git.

* Git does not track directories on their own, only the files within them, so an empty directory cannot be added or committed. If you need to keep one, put a placeholder file in it (see `.gitkeep` and `.gitignore`).
* If you create a directory in your Git repository and populate it with files,
you can add all files in the directory at once by:
    
  ```bash
   git add <directory-with-files>
  ```

</blockquote>
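
Both facts about directories can be observed in a scratch repository. In this sketch the directory name `results` is hypothetical, and `.gitkeep` is only a naming convention for a placeholder file, not something Git treats specially.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir results
empty=$(git status --porcelain)   # an empty directory does not show up at all

# A placeholder file gives Git something to track inside the directory.
touch results/.gitkeep
git add results                   # adds every file inside the directory
tracked=$(git ls-files)           # files currently in the staging area

echo "status with an empty directory: '$empty'"
echo "tracked after git add results:  $tracked"
```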

* Before moving on, let’s make sure that everything is committed. Because we already committed the last change, Git reports that there is nothing new to commit:

In [24]:
git commit -m "Adding a third key:value pair to the dictionary"

On branch master
Untracked files:
	.ipynb_checkpoints/

nothing added to commit but untracked files present




To recap, when we want to add changes to our repository, we first need to add the changed files to the staging area (`git add`) and then commit the staged changes to the repository (`git commit`):

<center><img src="https://swcarpentry.github.io/git-novice/fig/git-committing.svg"></center>
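
The whole cycle can be traced in a few commands by asking Git, at each stage, where the change currently lives. This is a minimal sketch assuming a scratch repository and a hypothetical identity: `git diff` compares the working directory with the staging area, and `git diff --staged` compares the staging area with the last commit.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity for the sketch
git config user.email "user@example.com"

echo "keyvals = {'key1': 'value1'}" > keyval.py
git add keyval.py
git commit -q -m "Add a simple Python dictionary"

# 1. Modify: the change exists only in the working directory.
echo "keyvals['key2'] = 'value2'" >> keyval.py
unstaged=$(git diff --name-only)

# 2. Add: the change moves to the staging area.
git add keyval.py
unstaged_after_add=$(git diff --name-only)
staged=$(git diff --staged --name-only)

# 3. Commit: the change is recorded in the repository.
git commit -q -m "Add a second key to the dictionary"
after_commit=$(git status --porcelain)
commits=$(git rev-list --count HEAD)

echo "after editing, unstaged: $unstaged"
echo "after git add, unstaged: '$unstaged_after_add', staged: $staged"
echo "after git commit, pending: '$after_commit', commits: $commits"
```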


<blockquote class="keypoints">
  <h2>Key Points</h2>

* `git status` shows the status of a repository.

* Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).

* `git add` puts files in the staging area.

* `git commit` saves the staged content as a new commit in the local repository.

* Write a commit message that accurately describes your changes.
</blockquote>