<div id="body">
    <center>
    <a href="03 Tracking Changes.ipynb">  <font size="8"> &lt; </font></a>
    <a href="index.ipynb">  <font size="8"> Version Control with Git </font> </a>
    <a href="05 Ignore Things.ipynb">  <font size="8"> &gt; </font></a>
    </center>
</div>

# **Exploring History**

## **Questions**

* How can I identify old versions of files?

* How do I review my changes?

* How can I recover old versions of files?



## **Objectives**
* Explain what the HEAD of a repository is and how to use it.

* Identify and use Git commit numbers.

* Compare various versions of tracked files.

* Restore old versions of files.

---

As we saw in the previous lesson, we can refer to commits by their identifiers. You can refer to the most recent commit of the current branch by using the identifier `HEAD`.

We’ve been adding one `key:value` pair at a time to `keyval.py`, so it’s easy to track our progress by comparing versions, and `HEAD` lets us do exactly that. Before we start, let’s make one more change to `keyval.py`, adding yet another `key:value` pair so that it looks like:

```python
# istantiate a simple python dictionary
keyvals = {'key1': 'value1',
           'key2': 'value2',
           'key3': 'value3',
           'key4': 'value4'}
```




In [1]:
# Open the file keyval.py for editing and apply the changes

Now, let's see what we get.

In [2]:
pwd

/home/epinux/notebooks/git


In [3]:
cd SRC

In [4]:
git diff HEAD keyval.py

This compares the working copy of `keyval.py` with the version recorded in `HEAD`. As long as nothing has been staged, it is the same as what you would get if you leave out `HEAD` (try it). The real power of this notation comes when you refer to previous commits. We do that by adding `~1` (where “~” is “tilde”, pronounced [til-duh]) to refer to the commit one before `HEAD`.
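To see when leaving out `HEAD` does and does not matter, here is a minimal sketch for a Bash shell with `git` installed. It works in a throwaway repository created with `mktemp -d`, uses a placeholder identity, and the file contents are made up rather than copied from the lesson's `keyval.py`.

```shell
set -eu
export GIT_PAGER=cat                     # print output directly instead of opening a pager
repo=$(mktemp -d)                        # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo commits
git config user.email "user@example.com"

printf "keyvals = {'key1': 'value1'}\n" > keyval.py
git add keyval.py
git commit -q -m "First version"

# Change the working copy but do not stage the change.
printf "keyvals = {'key1': 'value1', 'key2': 'value2'}\n" > keyval.py

# Nothing is staged, so both commands compare against the same content.
vs_head=$(git diff HEAD keyval.py)
vs_staging=$(git diff keyval.py)
[ "$vs_head" = "$vs_staging" ] && echo "identical diffs while nothing is staged"

# Once the change is staged, plain 'git diff' (working copy vs. staging area)
# has nothing to report, but 'git diff HEAD' still shows the change.
git add keyval.py
[ -z "$(git diff keyval.py)" ] && echo "git diff: empty after staging"
git diff HEAD keyval.py
```

In other words, `git diff` compares the working copy with the staging area, while `git diff HEAD` compares it with the last commit; the two only agree when nothing has been staged.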

In [5]:
git diff HEAD~1 keyval.py

diff --git a/keyval.py b/keyval.py
index 1a8d753..a25f46a 100644
--- a/keyval.py
+++ b/keyval.py
@@ -1,3 +1,5 @@
 # istantiate a simple python dictionary
 keyvals = {'key1': 'value1',
-           'key2': 'value2'}
+           'key2': 'value2',
+           'key3': 'value3'}
+


If we want to see the differences between older commits we can use `git diff` again, but with the notation `HEAD~1`, `HEAD~2`, and so on, to refer to them:

In [6]:
git diff HEAD~2 keyval.py

diff --git a/keyval.py b/keyval.py
index def0459..a25f46a 100644
--- a/keyval.py
+++ b/keyval.py
@@ -1,2 +1,5 @@
 # istantiate a simple python dictionary
-keyvals = {'key': 'value'}
+keyvals = {'key1': 'value1',
+           'key2': 'value2',
+           'key3': 'value3'}
+


We could also use `git show`, which shows the changes made in an older commit together with its commit message, rather than the differences between a commit and our working directory that `git diff` gives us.

In [7]:
git show HEAD~2 keyval.py

commit defbe0cc4e05327d7bb0da8b03b9a6e71a522ecb
Author: Massimo Di Stefano <epiesasha@me.com>
Date:   Sun Jun 16 18:17:57 2019 +0200

    Adding a simple python dictionary

diff --git a/keyval.py b/keyval.py
new file mode 100644
index 0000000..def0459
--- /dev/null
+++ b/keyval.py
@@ -0,0 +1,2 @@
+# istantiate a simple python dictionary
+keyvals = {'key': 'value'}


In this way, we can build up a chain of commits. The most recent end of the chain is referred to as `HEAD`; we can refer to previous commits using the `~` notation, so `HEAD~1` means “the previous commit”, while `HEAD~123` goes back 123 commits from where we are now.
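The `~` notation can be checked directly with `git rev-parse`, which prints the full identifier that a name refers to. The Bash sketch below is an illustration in a scratch repository (file contents and commit messages are made up): it makes three commits and confirms that `HEAD~2` lies two steps back along the chain, the same commit as `HEAD~1~1`.

```shell
set -eu
export GIT_PAGER=cat                     # print output directly instead of opening a pager
repo=$(mktemp -d)                        # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo commits
git config user.email "user@example.com"

for n in 1 2 3; do
  echo "key$n = 'value$n'" >> keyval.py  # one new line per commit
  git add keyval.py
  git commit -q -m "Add key$n"
done

git log --oneline                        # newest commit first
git rev-parse HEAD~2                     # full ID of the first commit
[ "$(git rev-parse HEAD~2)" = "$(git rev-parse HEAD~1~1)" ] && echo "HEAD~2 is HEAD~1~1"
git log -1 --format=%s HEAD~2            # subject of that commit: Add key1

# There are only three commits, so HEAD~3 does not resolve to anything.
git rev-parse -q --verify HEAD~3 >/dev/null || echo "HEAD~3 does not exist"
```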

**Note**: We can also refer to commits using those long strings of digits and letters that `git log` displays. These are unique IDs for the changes, and “unique” really does mean unique: every change to any set of files on any computer has a unique 40-character identifier. 

For example: `git diff 3fc611cdb8f706e8e49e14f33e6e22d036502411 keyval.py`

Git also allows us to use just the first few characters, as long as they are enough to identify a single commit in the repository:
`git diff 3fc611 keyval.py`
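A short Bash sketch can show that an abbreviated identifier names the same commit as the full one. It assumes a throwaway repository with made-up contents; the exact length of the short form depends on your Git configuration and repository size, so the "typically 7 characters" below is not guaranteed.

```shell
set -eu
export GIT_PAGER=cat                     # print output directly instead of opening a pager
repo=$(mktemp -d)                        # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo commits
git config user.email "user@example.com"

echo "key1 = 'value1'" > keyval.py
git add keyval.py
git commit -q -m "Add key1"
echo "key2 = 'value2'" >> keyval.py
git commit -q -am "Add key2"

full=$(git rev-parse HEAD~1)             # full 40-character ID
short=$(git rev-parse --short HEAD~1)    # abbreviated form, typically 7 characters
echo "full:  $full"
echo "short: $short"

# Git expands an unambiguous prefix back to the full ID,
# so either spelling can be passed to 'git diff'.
[ "$(git rev-parse "$short")" = "$full" ] && echo "prefix resolves to the full ID"
git diff "$short" keyval.py
```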

### `git checkout`

<span style="text-decoration:underline">All right!</span> So we can save changes to files and see what we’ve changed. Now, how can we restore older versions of things? Let’s suppose we change our mind about the last update to `keyval.py`, the `key4` pair we added above and have not committed yet.



`git status` now tells us that the file has been changed, but those changes haven’t been staged:

In [10]:
git status

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   keyval.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	[31m.ipynb_checkpoints/[m

no changes added to commit (use "git add" and/or "git commit -a")


In [11]:
cat keyval.py

# istantiate a simple python dictionary
keyvals = {'key1': 'value1',
           'key2': 'value2',
           'key3': 'value3',
           'key4': 'value4'}



We can put things back the way they were by using `git checkout`:

In [12]:
git checkout HEAD keyval.py
cat keyval.py

# istantiate a simple python dictionary
keyvals = {'key1': 'value1',
           'key2': 'value2',
           'key3': 'value3'}



As you might guess from its name, `git checkout` checks out (i.e., restores) an old version of a file. In this case, we’re telling Git that we want to recover the version of the file recorded in `HEAD`, which is the last saved commit. If we want to go back even further, we can use a commit identifier instead:

In [13]:
git checkout HEAD~2 keyval.py 

In [14]:
cat keyval.py

# istantiate a simple python dictionary
keyvals = {'key': 'value'}


In [15]:
git status

On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   keyval.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.ipynb_checkpoints/



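Notice that this time the change has been staged: when we give `git checkout` a commit identifier and a file name, it updates the staging area as well as the working copy. Again, we can put things back the way they were by using `git checkout`: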


In [16]:
git checkout HEAD keyval.py

In [17]:
cat keyval.py

# istantiate a simple python dictionary
keyvals = {'key1': 'value1',
           'key2': 'value2',
           'key3': 'value3'}
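
The restore cycle above can be replayed in a scratch repository. This Bash sketch uses placeholder contents rather than the lesson's exact file: it checks out the file from one commit back, records that the old version is now staged, and then restores the committed version with `git checkout HEAD keyval.py`.

```shell
set -eu
repo=$(mktemp -d)                        # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo commits
git config user.email "user@example.com"

printf "keyvals = {'key': 'value'}\n" > keyval.py
git add keyval.py
git commit -q -m "First version"
printf "keyvals = {'key1': 'value1'}\n" > keyval.py
git commit -q -am "Second version"

# Recover the file as it was one commit before HEAD.
git checkout HEAD~1 keyval.py
old=$(cat keyval.py)
staged=$(git diff --cached --name-only)  # files that differ between HEAD and the staging area
echo "working copy: $old"
echo "staged: $staged"

# Put back the version recorded in HEAD, in both the staging area and the working copy.
git checkout HEAD keyval.py
git status --short                       # prints nothing: the working copy is clean again
```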



**NOTE:** be careful! `git checkout` has other important uses, and Git will do something quite different if the arguments are not exactly what you intended. For example, here is what happens if you leave out `keyval.py` from the command `git checkout HEAD~2 keyval.py`:

In [18]:
git checkout HEAD~2

Note: checking out 'HEAD~2'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at defbe0c Adding a simple python dictionary


&nbsp;

The “detached HEAD” is like “look, but don’t touch” here, so you shouldn’t make any changes in this state. After investigating your repo’s past state, reattach your `HEAD` with `git checkout master`.

In [19]:
git checkout master

Previous HEAD position was defbe0c Adding a simple python dictionary
Switched to branch 'master'


In [20]:
git status

On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.ipynb_checkpoints/

nothing added to commit but untracked files present (use "git add" to track)


In [21]:
cat keyval.py

# istantiate a simple python dictionary
keyvals = {'key1': 'value1',
           'key2': 'value2',
           'key3': 'value3'}
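
Whether `HEAD` is detached can be checked from a script with `git symbolic-ref`, which fails when `HEAD` does not point at a branch. The Bash sketch below assumes a scratch repository with made-up commits; because the default branch may be called `master` or `main` depending on your Git setup, it records the branch name first instead of hard-coding `master`.

```shell
set -eu
repo=$(mktemp -d)                        # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo commits
git config user.email "user@example.com"

echo "key1 = 'value1'" > keyval.py
git add keyval.py
git commit -q -m "First version"
echo "key2 = 'value2'" >> keyval.py
git commit -q -am "Second version"

branch=$(git symbolic-ref --short HEAD)  # current branch name, e.g. master or main

git checkout -q HEAD~1                   # no file name: HEAD becomes detached
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
echo "after 'git checkout HEAD~1': $state at $(git rev-parse --short HEAD)"

git checkout -q "$branch"                # reattach HEAD, like 'git checkout master'
echo "back on branch $(git symbolic-ref --short HEAD)"
```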



It’s important to remember that we must use the commit number that identifies the state of the repository before the change we’re trying to undo. A common mistake is to use the number of the commit in which we made the change we’re trying to get rid of. 

<img src="static/images/git-checkout.svg">


So, to put it all together, here’s how Git works in cartoon form:

&nbsp;

<img src="static/images/git_staging.svg">


<blockquote class="callout">
  <h2 id="simplifying-the-common-case">Simplifying the Common Case</h2>

  <p>If you read the output of <code class="highlighter-rouge">git status</code> carefully,
you’ll see that it includes this hint:</p>

  <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">(</span>use <span class="s2">"git checkout -- &lt;file&gt;..."</span> to discard changes <span class="k">in </span>working directory<span class="o">)</span>
</code></pre></div>  </div>

  <p>As it says,
<code class="highlighter-rouge">git checkout</code> without a version identifier restores files to the state saved in the staging area, which is the same as <code class="highlighter-rouge">HEAD</code> as long as you have not staged any changes to them.
The double dash <code class="highlighter-rouge">--</code> is needed to separate the names of the files being recovered
from the command itself:
without it,
Git might treat a file name as a branch name or commit identifier instead.</p>
</blockquote>
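A small Bash experiment in a scratch repository (placeholder contents) shows which version `git checkout -- <file>` restores when the staging area and `HEAD` differ. One edit is staged and a second is left unstaged; the command discards only the unstaged edit, because it copies the file from the staging area rather than from `HEAD`.

```shell
set -eu
repo=$(mktemp -d)                        # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo commits
git config user.email "user@example.com"

printf "keyvals = {'key': 'value'}\n" > keyval.py
git add keyval.py
git commit -q -m "First version"

printf "keyvals = {'key1': 'value1'}\n" > keyval.py
git add keyval.py                                     # first edit: staged
printf "keyvals = {'key2': 'value2'}\n" > keyval.py   # second edit: not staged

git checkout -- keyval.py                # restore the working copy from the staging area
cat keyval.py                            # shows the staged (first) edit, not the HEAD version
```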

The fact that files can be reverted one by one tends to change the way people organize their work. If everything is in one large document, it’s hard (but not impossible) to undo changes to the introduction without also undoing changes made later to the conclusion. If the introduction and conclusion are stored in separate files, on the other hand, moving backward and forward in time becomes much easier.

<blockquote class="challenge">
  <h2 id="recovering-older-versions-of-a-file">Recovering Older Versions of a File</h2>

Jennifer has made changes to the Python script that she has been working on for weeks, and the
modifications she made this morning “broke” the script and it no longer runs. She has spent
about an hour trying to fix it, with no luck…

Luckily, she has been keeping track of her project’s versions using Git! Which commands below will
let her recover the last committed version of her Python script called `data_cruncher.py`?

1. `git checkout HEAD`
2. `git checkout HEAD data_cruncher.py`
3. `git checkout HEAD~1 data_cruncher.py`
4. `git checkout <unique ID of last commit> data_cruncher.py`
5. Both 2 and 4

<blockquote class="solution">
  <h2 id="solution">Solution</h2>
The answer is (5): both 2 and 4.
The `checkout` command restores files from the repository, overwriting the files in your working directory. Answers 2 and 4 both restore the latest version in the repository of the file `data_cruncher.py`. Answer 2 uses `HEAD` to indicate the latest, whereas answer 4 uses the unique ID of the last commit, which is what `HEAD` means.

Answer 3 gets the version of `data_cruncher.py` from the commit before `HEAD`, which is NOT what we wanted.

Answer 1 does not do what Jennifer wants. Without a filename, `git checkout HEAD` asks Git to switch to the commit that `HEAD` already points to, so Git stays on the current branch, simply lists `data_cruncher.py` as modified, and leaves the broken version in place. Be careful with the general form, though: `git checkout` with a different commit identifier and no filename, such as `git checkout HEAD~1`, puts you in the detached `HEAD` state discussed above, and you don’t want to be there.
    </blockquote>
</blockquote>
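Answers 1 and 2 can be compared side by side in a scratch repository. In this Bash sketch the contents of `data_cruncher.py` are made up (a missing closing parenthesis stands in for the broken edit); the point is that `git checkout HEAD` on its own leaves the broken edit in place, while naming the file restores the committed version.

```shell
set -eu
repo=$(mktemp -d)                        # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo commits
git config user.email "user@example.com"

printf 'print("crunching")\n' > data_cruncher.py
git add data_cruncher.py
git commit -q -m "Working cruncher"

printf 'print("crunching"\n' > data_cruncher.py   # hypothetical broken edit

git checkout HEAD                        # answer 1: reports the file as modified, changes nothing
after_answer1=$(cat data_cruncher.py)

git checkout HEAD data_cruncher.py       # answer 2: restores the committed version
after_answer2=$(cat data_cruncher.py)

echo "after answer 1: $after_answer1"
echo "after answer 2: $after_answer2"
```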

<blockquote class="challenge">
  <h2 id="reverting-a-commit">Reverting a Commit</h2>

  <p>Jennifer is collaborating on her Python script with her colleagues and
realizes her last commit to the group repository is wrong and wants to
undo it. Jennifer needs to undo it correctly so everyone in the group
repository gets the correct change. <code class="highlighter-rouge">git revert [wrong commit ID]</code>
will make a new commit that undoes Jennifer’s previous wrong
commit. Therefore <code class="highlighter-rouge">git revert</code> is different from <code class="highlighter-rouge">git checkout [commit
ID]</code> because <code class="highlighter-rouge">checkout</code> only changes the files in your local working copy and staging area; it does not record
a new commit that can be shared with the group repository. Below are the right steps and explanations for
Jennifer to use <code class="highlighter-rouge">git revert</code>:</p>

  <ol>
    <li>
      <p><code class="highlighter-rouge">git log</code>: look at the Git history of the project to find the commit ID.</p>
    </li>
    <li>
      <p>Copy the ID (the first few characters of the ID, e.g. 0b1d055).</p>
    </li>
    <li>
      <p><code class="highlighter-rouge">git revert [commit ID]</code></p>
    </li>
    <li>
      <p>Type in the new commit message.</p>
    </li>
    <li>
      <p>Save and close</p>
    </li>
  </ol>
</blockquote>
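The steps above can be sketched as a Bash script in a scratch repository. The file contents and commit messages are made up, and `--no-edit` is used so that Git keeps its default revert message instead of opening an editor (this replaces typing, saving and closing the message by hand).

```shell
set -eu
export GIT_PAGER=cat                     # print output directly instead of opening a pager
repo=$(mktemp -d)                        # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo commits
git config user.email "user@example.com"

printf 'threshold = 1\n' > data_cruncher.py
git add data_cruncher.py
git commit -q -m "Working version"
printf 'threshold = 2\n' > data_cruncher.py
git commit -q -am "Wrong change"

git log --oneline                        # find the ID of the wrong commit
wrong=$(git rev-parse --short HEAD)      # copy its abbreviated ID
git revert --no-edit "$wrong"            # new commit that undoes it, with the default message
git log --oneline                        # the wrong commit is still there, followed by its revert
cat data_cruncher.py
```

Because the history keeps both the wrong commit and its revert, colleagues who already have the wrong commit can simply pull the new one.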

<blockquote class="challenge">
  <h2 id="understanding-workflow-and-history">Understanding Workflow and History</h2>

  <p>What is the output of the last command in</p>

   ```bash
   mkdir doc
   cd doc
   echo "# First steps with GIT " > README.md
   git add README.md
   echo "This tutorial should give you the **Git superpower**" >> README.md
   git commit -m "Comment on README.md about your new superpower"
   git checkout HEAD README.md
   cat README.md #this will print the contents of README.md to the screen
   ```
    
&nbsp;
    
    
1. `This tutorial should give you the **Git superpower**`
2. `# First steps with GIT` 
3. `# First steps with GIT`<br>
   `This tutorial should give you the **Git superpower**`
4. Error because you have changed README.md without committing the changes
    


<blockquote class="solution">
    <h2 id="solution-1">Solution</h2>

The answer is 2.

The command `git add README.md` places the current version of `README.md` into the staging area. The changes to the file from the second `echo` command are only applied to the working copy, not the version in the staging area.

So, when `git commit -m "Comment on README.md about your new superpower"` is executed, the version of `README.md` committed to the repository is the one from the staging area, which has only one line.

At this time, the working copy still has the second line (and `git status` will show that the file is modified). However, `git checkout HEAD README.md` replaces the working copy with the most recently committed version of `README.md`.

So, `cat README.md` will output

`# First steps with GIT`
    

  </blockquote>
</blockquote>
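The answer can be checked by running the challenge's commands in a scratch repository. This Bash sketch only adds the setup (a temporary directory and a hypothetical identity so that `git commit` works); the remaining commands are the ones from the challenge.

```shell
set -eu
repo=$(mktemp -d)                        # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo commit
git config user.email "user@example.com"

mkdir doc
cd doc
echo "# First steps with GIT " > README.md
git add README.md
echo "This tutorial should give you the **Git superpower**" >> README.md
git commit -m "Comment on README.md about your new superpower"
git checkout HEAD README.md
cat README.md                            # prints only the first line
```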

<blockquote class="challenge">
  <h2 id="getting-rid-of-staged-changes">Getting Rid of Staged Changes</h2>

  <p><code class="highlighter-rouge">git checkout</code> can be used to restore a previous commit when unstaged changes have
been made, but will it also work for changes that have been staged but not committed?
Make a change to <code class="highlighter-rouge">keyval.py</code>, add that change, and use <code class="highlighter-rouge">git checkout</code> to see if
you can remove your change.</p>
</blockquote>

<blockquote class="challenge">
  <h2 id="explore-and-summarize-histories">Explore and Summarize Histories</h2>

  <p>Exploring history is an important part of Git, and it is often a challenge to find
the right commit ID, especially if the commit is from several months ago.</p>

  <p>Imagine your project has more than 50 files.
You would like to find a commit that modifies specific text in <code class="highlighter-rouge">keyval.py</code>.
When you type <code class="highlighter-rouge">git log</code>, a very long list appears.
How can you narrow down the search?</p>

  <p>Recall that the <code class="highlighter-rouge">git diff</code> command allows us to explore one specific file,
e.g. <code class="highlighter-rouge">git diff keyval.py</code>. We can apply a similar idea here.</p>

  <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git log keyval.py
</code></pre></div>  </div>

  <p>Unfortunately, some of these commit messages are very ambiguous, e.g. <code class="highlighter-rouge">update files</code>.
How can you see what each of these commits actually changed?</p>

  <p>Both <code class="highlighter-rouge">git diff</code> and <code class="highlighter-rouge">git log</code> are very useful and they summarize a different part of the history 
for you.
Is it possible to combine both? Let’s try the following:</p>

  <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git log <span class="nt">--patch</span> keyval.py
</code></pre></div>  </div>

  <p>You should get a long list of output, and you should be able to see both commit messages and 
the difference between each commit.</p>

  <p>Question: What does the following command do?</p>

  <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git log <span class="nt">--patch</span> HEAD~9 <span class="k">*</span>.txt
</code></pre></div>  </div>
</blockquote>
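Besides `--patch`, two `git log` options can narrow a long history: limiting the log to one path, and the `-S` search (the “pickaxe”, an option not covered above), which lists only commits that changed the number of occurrences of a string. The Bash sketch below is an illustration in a scratch repository with made-up files, including deliberately vague `update files` messages.

```shell
set -eu
export GIT_PAGER=cat                     # print output directly instead of opening a pager
repo=$(mktemp -d)                        # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo commits
git config user.email "user@example.com"

echo "key1 = 'value1'" > keyval.py
git add keyval.py
git commit -q -m "update files"
echo "unrelated notes" > notes.txt       # hypothetical second file
git add notes.txt
git commit -q -m "update files"
echo "key2 = 'value2'" >> keyval.py
git commit -q -am "update files"

# Only the commits that touched keyval.py (2 of the 3).
git log --oneline -- keyval.py

# Only the commit whose changes added or removed the text "key2", with its patch.
git log --patch -Skey2 -- keyval.py
```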

<blockquote class="keypoints">
  <h2>Key Points</h2>
    
* `git diff` displays differences between commits.

* `git checkout` recovers old versions of files.

</blockquote>
