# Objective
Get comfortable with Git objects. This tutorial is adapted heavily from [here](https://book.git-scm.com/book/en/v2/Git-Internals-Git-Objects).

At the end of this, you should be able to answer:
1. What is a blob?
2. What is a tree?
3. What is a commit?

You should also be able to describe the lower-level procedures that underpin:
- `git add`
- `git commit -m`

We will also cover `git stash`.

First, let's check what working directory we are in.

In [27]:
%%bash
echo $PWD

/mnt/c/Users/jingapore/Desktop/Repos/git_tutorial/notebooks/mock_repo


In [28]:
%cd mock_repo

[WinError 2] The system cannot find the file specified: 'mock_repo'
C:\Users\jingapore\Desktop\Repos\git_tutorial\notebooks\mock_repo


In [29]:
%%bash
echo $PWD

/mnt/c/Users/jingapore/Desktop/Repos/git_tutorial/notebooks/mock_repo


# Creating git object out of thin air

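The following command pipes 'test content' to the command `git hash-object -w --stdin`. (The `|` on the command line is a pipe.) The `-w` flag tells Git to write the object into its object store rather than just print the hash, and `--stdin` tells it to read the content from standard input instead of from a file.

This creates a Git object that we can see using the command `find .git/objects -type f`, or just by cd-ing to the `.git/objects` directory and taking a look ourselves.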

In [30]:
%%bash
echo 'test content' | git hash-object -w --stdin

d670460b4b4aece5915caf5c68d12f560a9fe3e4
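
Where does this hash come from? Here is a minimal Python sketch, assuming Git's default SHA-1 object format. The function name `git_blob_hash` is made up for this illustration and is not part of Git. The idea is that Git hashes a small header (`blob`, a space, the content size in bytes, and a NUL byte) followed by the content itself.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob with this content."""
    # Git prepends a header: object type, a space, the size in bytes, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# `echo` appends a newline, so the blob content is 13 bytes long.
print(git_blob_hash(b"test content\n"))
# d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the hash depends only on the content, the same text always produces the same blob hash, no matter what the file is called.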


In [31]:
%%bash
cd .git/objects
ls -la

total 0
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:11 .
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:10 ..
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:11 d6
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:10 info
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:10 pack


Ah, it looks like there is a sub-directory called `d6`. This is where objects with hashes starting with 'd6' go. Let's check out this directory.

In [32]:
%%bash
cd .git/objects/d6
ls -la

total 0
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:11 .
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:11 ..
-rwxrwxrwx 1 jingapore jingapore   29 Jun 12 10:11 70460b4b4aece5915caf5c68d12f560a9fe3e4


Aha! Our file is there. The file name is the hash value minus its first 2 characters. Those 2 characters are the name of the `.git/objects/` sub-directory, i.e. 'd6'.
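
The mapping from hash to file location can be sketched in a few lines of Python. The helper name `loose_object_path` is hypothetical, and the sketch assumes the object is stored "loose" (i.e. not yet packed into `.git/objects/pack`).

```python
from pathlib import PurePosixPath


def loose_object_path(sha: str) -> PurePosixPath:
    """Map a 40-character hash to its path under .git/objects/."""
    # The first 2 hex characters name the sub-directory; the remaining 38 name the file.
    return PurePosixPath(".git/objects") / sha[:2] / sha[2:]


print(loose_object_path("d670460b4b4aece5915caf5c68d12f560a9fe3e4"))
# .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
```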

Now, we see what the contents of the file are.

In [33]:
%%bash
cd .git/objects/d6
git cat-file -p $(ls -la | awk 'FNR == 4{a="d6"$NF; print a}')

test content


Nothing unexpected. The content is 'test content'.

Let's break down what the commands are doing.

First, we run the command wrapped in `$(ls -la | awk ...)`. This pipes the output of `ls -la` (which lists the files in the directory, one per line) to `awk`.

Now, what in the world is `awk`?
**awk**<br>
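Awk is a useful tool for parsing text line by line.
- `FNR` is the record number (i.e. the line number) within the current file.
- `NF` is the number of fields in the record, where fields are the columns. So, `$NF` refers to the last field. Each file line of the `ls -la` output has 9 fields, so here `$NF` is the 9th column, i.e. the file name. If we wanted the 8th column, we would go with `$8`.
- So, `FNR == 4` selects the 4th record (i.e. the 4th line), and `a="d6"$NF` prepends 'd6' to the last field of that line, which rebuilds the full hash.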
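
If `awk` is new to you, the same selection written in Python may help. This is only a rough equivalent for illustration; the `ls -la` text is copied from the output shown earlier, and the variable names are arbitrary.

```python
# Sample `ls -la` output for .git/objects/d6, copied from the earlier cell.
ls_output = """\
total 0
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:11 .
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:11 ..
-rwxrwxrwx 1 jingapore jingapore   29 Jun 12 10:11 70460b4b4aece5915caf5c68d12f560a9fe3e4
"""

lines = ls_output.splitlines()
record = lines[3]                # FNR == 4: the 4th line (Python counts from 0)
fields = record.split()          # like awk, split on runs of whitespace
object_hash = "d6" + fields[-1]  # $NF: the last field, prefixed with the directory name

print(len(fields))   # 9
print(object_hash)   # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Note that picking "the 4th line" is fragile: it only works here because the directory contains exactly one object file.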

Then, we pass the output of the command `$(ls -la | awk ...)` as an argument to `git cat-file`.

**git cat-file**<br>
The command substitution evaluates to the full hash of our object, and we feed this hash into `git cat-file`. What `git cat-file` does is take the hash of a Git object and print its contents. `-p` is a flag specifying pretty-print: Git works out the object type by itself, so we can cat-file without specifying the object type.

An alternative command without the flag `-p` is:
`git cat-file blob d670460b4b4aece5915caf5c68d12f560a9fe3e4`

This is the simpler way, involving just copy-and-pasting. But it's useful to know your way around the command line with `awk`.

In [34]:
%%bash
git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4

test content
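
What does `git cat-file` have to undo to get back to 'test content'? A loose object file is zlib-compressed, and the decompressed bytes start with the `<type> <size>` header followed by a NUL byte. The sketch below assumes that layout; `read_loose_object` is a hypothetical helper, and the "file" is simulated in memory so the example runs without a repository.

```python
import zlib


def read_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    data = zlib.decompress(raw)
    # Everything before the first NUL byte is the header, e.g. b"blob 13".
    header, _, content = data.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    assert int(size) == len(content), "size in header should match content length"
    return obj_type, content


# Simulate the file on disk: zlib-compressed header plus content.
stored = zlib.compress(b"blob 13\0test content\n")
print(read_loose_object(stored))
# ('blob', b'test content\n')
```

To try it on the real object, you could pass in the bytes read from `.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4` (opened in binary mode) instead of `stored`.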


Taking a step back, we see that our working directory is empty. But we have a blob in `.git/objects/`. Why is this the case?

In [35]:
%%bash
echo $PWD
ls -la

/mnt/c/Users/jingapore/Desktop/Repos/git_tutorial/notebooks/mock_repo
total 0
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:10 .
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:10 ..
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:10 .git


What we have done is indeed unusual. We didn't create a file in the working directory, but we created a blob (a type of Git object), by piping a stream of text into the command `git hash-object`.

With the more common `git add`, things happen in a different order. Before we add a file to the index (i.e. while the file exists only in the working directory), its content is not in the directory `.git/objects/`. But the moment we add the file to the index, the file content exists in two places: the working directory, and also `.git/objects/`, as a hashed object whose contents are revealed with `git cat-file`.

# Adding git object to index/staging

OK, so that seemed strange. We created a blob in the Git data store, but not in the working directory.

Usually, it is the other way around. We create a file in the working directory. Then, we run `git add` which (1) adds the file to the index, and (2) creates a Git object in the Git data store.
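
As a mental model, the two steps of `git add` can be mimicked with two plain Python dictionaries. This is a toy sketch only: `object_store`, `index`, and the three functions are made-up names, and real Git stores much more per index entry (timestamps, file size, flags).

```python
import hashlib

object_store: dict[str, bytes] = {}     # stands in for .git/objects/
index: dict[str, tuple[str, str]] = {}  # stands in for .git/index: path -> (mode, hash)


def hash_object(content: bytes) -> str:
    """Like `git hash-object -w`: store the content as a blob and return its hash."""
    sha = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
    object_store[sha] = content
    return sha


def update_index(path: str, sha: str, mode: str = "100644") -> None:
    """Like `git update-index --add --cacheinfo`: record path -> blob in the index."""
    index[path] = (mode, sha)


def add(path: str, content: bytes) -> None:
    """Toy `git add`: write the blob, then stage it."""
    update_index(path, hash_object(content))


add("test.txt", b"test content\n")
print(index)
# {'test.txt': ('100644', 'd670460b4b4aece5915caf5c68d12f560a9fe3e4')}
```

In this tutorial we perform the two halves separately and by hand, which is why the blob can exist before any file does.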

First, let's get this blob into a file, in the working directory. In the following command, `cat test.txt` outputs content of 'test.txt', and we get the expected content of 'test content'.

In [36]:
%%bash
git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4 > test.txt
ls -la
cat test.txt

total 0
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:11 .
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:10 ..
drwxrwxrwx 1 jingapore jingapore 4096 Jun 12 10:10 .git
-rwxrwxrwx 1 jingapore jingapore   13 Jun 12 10:11 test.txt
test content


So, the file contents now exist in two places. First, the Git object store. Second, the working directory.

But since we are running lower-level commands here, this file is not registered in the Git index. Let's:
1. Confirm that the Git index is indeed empty, and
2. Add this blob to the Git index.

In [37]:
%%bash
git ls-files

Note: The above command, i.e. `git ls-files`, should not produce any output, because you haven't added your file to the Git index yet.

In [38]:
%%bash
git update-index --add --cacheinfo 100644 d670460b4b4aece5915caf5c68d12f560a9fe3e4 test.txt

Two remarks on the above command `git update-index --add --cacheinfo 100644 d670460b4b4aece5915caf5c68d12f560a9fe3e4 test.txt`.
1. When you see 'cache' as an option in a Git command, you're likely working on the Git index. E.g. `git rm --cached` removes a file not from your working directory, but from the Git index.
2. The digits '100644' represent the file mode. The leading '100' marks a regular file. The last 3 digits, in this case '644', are the permissions for (a) the owner, (b) the group, and (c) others. Each digit is a 3-bit binary number that sums read (4), write (2) and execute (1). So '6' (binary 110) means read and write, and '4' (binary 100) means read only.
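
To check the reading of '100644', we can decode it with Python's standard `stat` module. This is just an illustration of the Unix-style mode bits; the variable names are arbitrary.

```python
import stat

mode = 0o100644  # the mode passed to `git update-index`, read as an octal number

print(stat.S_ISREG(mode))        # True: the leading '100' marks a regular file
print(stat.filemode(mode))       # -rw-r--r--
print(oct(stat.S_IMODE(mode)))   # 0o644: the permission bits only

# Decode one permission digit (owner = 6) into its read/write/execute bits.
owner = (mode >> 6) & 0o7
print(bool(owner & 4), bool(owner & 2), bool(owner & 1))  # True True False
```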

OK, back to business. After we've run `git update-index`, let's see if it is in the index.

In [39]:
%%bash
git ls-files

test.txt


For completeness, we take a look at what is in the file `.git/index`. It looks like gibberish, because it is a binary file containing a sorted list of path names, each with its permissions and the hash of its Git object. `git ls-files` is the human-readable way of seeing what is in this file. See the Stack Overflow answer [here](https://stackoverflow.com/questions/4084921/what-does-the-git-index-contain-exactly).

In [40]:
%%bash
cat .git/index

DIRC                                �¤            ÖpFKJìå‘\¯\hÑ/V
Ÿãä test.txt  0¿†’^kÖ¿6þÓ]™÷1n0�


If you see 'test.txt', you've successfully:
1. Created a file and written it into the Git object store as a blob with a hash.
2. Written that file into the Git index.
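
The `.git/index` file is not entirely opaque. It starts with a 12-byte header: the signature `DIRC`, a 4-byte version number, and a 4-byte count of entries, all big-endian. The sketch below parses just that header; `parse_index_header` is a hypothetical helper, and the sample bytes are hand-built for illustration (version 2 with one entry is an assumption, not read from the repo above).

```python
import struct


def parse_index_header(data: bytes) -> tuple[bytes, int, int]:
    """Return (signature, version, entry_count) from the first 12 bytes of an index file."""
    # ">4sII": big-endian, 4-byte string, then two unsigned 32-bit integers.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return signature, version, entry_count


# A hand-built header for illustration: version 2, one entry.
sample = b"DIRC" + struct.pack(">II", 2, 1)
print(parse_index_header(sample))
# (b'DIRC', 2, 1)
```

To try it on the real file, you could pass in `open(".git/index", "rb").read()` instead of `sample`.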

# Exploring tree objects

Thus far, we have worked with blobs, a type of Git object. As you may recall from the slides, the tree is another object.

Trees have children, which are other trees or blobs. If it is a blob, it terminates there.

Commit objects (which we haven't created) point to trees.

In [41]:
%%bash
git write-tree

80865964295ae2f11d27383e5f9c0b58a8ef21da


The above command writes the Git index (which contains your blob or 'test.txt') out to a tree object.

We go back into `.git/objects/` and see that there is a new sub-directory named after the first 2 characters of the tree object's hash.

In [42]:
%%bash
cd .git/objects
ls

80
d6
info
pack


We compare the contents of the (a) blob and (b) tree.

In the output, we see that:
- the blob just outputs the content of 'test.txt', whereas
- the tree points to blobs.

In [43]:
%%bash
git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
git cat-file -p 80865964295ae2f11d27383e5f9c0b58a8ef21da

test content
100644 blob d670460b4b4aece5915caf5c68d12f560a9fe3e4	test.txt
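
The pretty-printed tree hides how the tree is actually stored. Here is a rough Python sketch of the raw layout, assuming SHA-1 and only plain files (no sub-trees). The function names are made up. Each entry is the mode, a space, the file name, a NUL byte, and then the 20 raw bytes of the blob hash.

```python
import hashlib


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_sha) entries the way a tree object stores them."""
    body = b""
    # Entries are sorted by name (simplified: real Git has a special rule for sub-trees).
    for mode, name, hex_sha in sorted(entries, key=lambda e: e[1]):
        # "<mode> <name>\0" followed by the 20 raw bytes of the hash.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_sha)
    return body


def tree_hash(entries: list[tuple[str, str, str]]) -> str:
    """Hash the tree like any other object: header, NUL byte, then the body."""
    body = tree_body(entries)
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()


entries = [("100644", "test.txt", "d670460b4b4aece5915caf5c68d12f560a9fe3e4")]
print(len(tree_body(entries)))  # 36
print(tree_hash(entries))
```

If the assumptions above hold, the printed hash should be the same one that `git write-tree` reported for this single-file tree.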


In [44]:
%%bash
git ls-files

test.txt


Let's expand this tree. It's sad if it has just 1 object.

So, we:
1. Create another file called 'new.txt' with content 'new file', and add it to the index, then
2. Run `git write-tree`.

In [45]:
%%bash
echo 'new file' > new.txt
git update-index --add new.txt
git write-tree

21ffec6dd9d7f64d4c693ea205a65905ba1bb41b


Let's examine what's in this new tree object.

In [46]:
%%bash
git cat-file -p 21ffec6dd9d7f64d4c693ea205a65905ba1bb41b

100644 blob fa49b077972391ad58037050f2a75f74e3671e92	new.txt
100644 blob d670460b4b4aece5915caf5c68d12f560a9fe3e4	test.txt


Hooray, the tree is less sad.

# Finally, commit objects

Recall that each branch (which is a ref) points to a commit.

We have blobs in our Git index, and a tree object that records them. Now, we create a commit object.

*Note 1: If you get an error message about user.name not being set, just run this command in a cell with the bash magic: `git config user.name 'foo_user'`. This sets a user.name just for this repo.*

*Note 2: If you run the following cell multiple times, you will actually create multiple commit objects with the same message, because each commit records the time it was created. You can delete these commit objects by going to `.git/objects/` and removing the corresponding sub-directories.*

In [48]:
%%bash
git config user.name foo_user

In [49]:
%%bash
echo 'first commit' | git commit-tree 21ffec6dd9d7f64d4c693ea205a65905ba1bb41b

e79ecc9a42b5d49e7ce9f7fcf04788f148f8e429


So now, we have a commit object pointing to a tree, which in turn points to our blobs. Recall that a commit object is *also* able to point to another commit object as its parent, to create the chain of commits that goes into the Git log.

To achieve this, we go with `git commit-tree <hash of tree> -p <hash of parent>`.
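
To make the parent link concrete, here is a simplified Python sketch of what a commit object contains, assuming SHA-1. The function names, the email address, and the timestamps are all hypothetical, and the sketch uses the same identity for author and committer with a fixed `+0000` offset. A commit is plain text: a `tree` line, zero or more `parent` lines, the author and committer, a blank line, and the message.

```python
import hashlib


def commit_body(tree, message, author, timestamp, parents=()):
    """Build the text of a commit object (simplified: author == committer, UTC offset)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # no parent lines for the first commit
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    return ("\n".join(lines) + f"\n\n{message}\n").encode()


def commit_hash(body: bytes) -> str:
    """Hash the commit like any other object: header, NUL byte, then the body."""
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


tree = "21ffec6dd9d7f64d4c693ea205a65905ba1bb41b"
who = "foo_user <foo@example.com>"  # hypothetical identity

first = commit_body(tree, "first commit", who, 1591927860)
second = commit_body(tree, "second commit", who, 1591927920, parents=[commit_hash(first)])

print(second.decode())
# The same tree and message with a different timestamp gives a different hash.
print(commit_hash(first) != commit_hash(commit_body(tree, "first commit", who, 1591927861)))  # True
```

Since the timestamp is part of the hashed text, re-running `git commit-tree` at a later time produces a new commit object, even with an identical tree and message.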

In [None]:
%%bash
cd .git/objects
ls