# Why is Git so hard to understand?

I've been writing a series of posts on Git to train data scientists in good practices for developing solutions that can be productionized, and for increasing developer and data scientist velocity.


## Why Git?

Git is the de facto standard distributed version control system (DVCS) for software professionals who collaborate on code and want to build robust projects that can be iterated on. The git project was started by [Linus Torvalds](https://github.com/torvalds), which helped its adoption.

## What's wrong with (G)it?

With a little googling, I found dozens of posts acknowledging that even seasoned developers can be puzzled by git's intricacies, and it is widely recognized as difficult to use [[1]](https://spderosso.github.io/onward13.pdf). Most users find a few working patterns and rely on them to develop software, googling when things get out of hand and usually falling back on `git reset --hard` and `git reflog` to get out.

## Object Model

Git doesn't offer users a clean API built around a coherent object model. It evolved somewhat organically as needs arose, and although some effort was made to abstract away what was happening underneath, the layers ended up mixed together.

## Plumbing, Porcelain and Poop

The original idea was that the low-level "plumbing" commands would be for developers of git, not for users, who would instead be given "porcelain" commands. In practice the porcelain commands didn't have enough expressive power, so as users became familiar with git and ran into various use cases, they dropped down into the low-level plumbing. So now we have a poop-like situation where the ways to interact with git are so numerous and involved that it's hard to keep a handle on them.


## Data Structures

Git makes use of, and inadvertently exposes, several data structures. They are never thoroughly explained and are often not needed by new users, which adds to the mystique surrounding git.

### The Git Graph Model

The git graph model is a great way to understand how branches, commits and navigation work. However, it relies on other concepts such as the working tree, branch name pointers, HEAD and branch tips, all of which take time to understand.
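To make the graph idea concrete, here is a minimal sketch in Python. It is a toy model, not how git is implemented: the `Commit` and `ToyRepo` names and their methods are hypothetical, chosen only to show that a branch is a name pointing at a commit, and that HEAD names the current branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A node in the commit graph; parents point backwards in history."""
    message: str
    parents: tuple = ()


class ToyRepo:
    """Hypothetical in-memory model, not git's real storage."""

    def __init__(self):
        self.branches = {}   # branch name -> Commit at the branch tip
        self.head = "main"   # HEAD names the current branch

    def commit(self, message):
        tip = self.branches.get(self.head)
        parents = (tip,) if tip is not None else ()
        new = Commit(message, parents)
        # Only the current branch's pointer moves forward.
        self.branches[self.head] = new
        return new

    def branch(self, name):
        # Creating a branch copies a pointer; no history is copied.
        self.branches[name] = self.branches[self.head]

    def switch(self, name):
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.head = name

    def log(self):
        """Follow first parents from the current tip, newest first."""
        node = self.branches.get(self.head)
        messages = []
        while node is not None:
            messages.append(node.message)
            node = node.parents[0] if node.parents else None
        return messages


repo = ToyRepo()
repo.commit("initial")
repo.branch("feature")
repo.switch("feature")
repo.commit("add feature")
repo.switch("main")
repo.commit("fix typo")

print(repo.log())        # ['fix typo', 'initial']
repo.switch("feature")
print(repo.log())        # ['add feature', 'initial']
```

Both branches share the "initial" commit, and each has one commit the other lacks. In this model, that is all "diverged branches" means.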

### Blobs

Blobs are objects in git's object store that hold the contents of files. Each blob is named by applying a hashing algorithm to its contents, so a blob's name depends only on what the file contains and not on what the file is called.
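As a rough illustration of that naming scheme, the sketch below computes a blob ID in Python. It assumes the default SHA-1 object format, where git hashes a short header (`blob <size>` followed by a NUL byte) together with the content. The function name `git_blob_id` is mine, not part of git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID a SHA-1 repository would give a blob with this content."""
    # Git hashes a header plus the content, not the raw content alone.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Identical content gives an identical ID, whatever the file is called.
print(git_blob_id(b"hello world\n") == git_blob_id(b"hello world\n"))  # True
```

If git is installed, the results should match what `git hash-object` reports for a file with the same bytes.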

### SHAs

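The same hashing algorithm (SHA-1 by default) is applied to commits as well as to blob contents, and the resulting hashes, usually just called SHAs, identify each object. Most users rarely handle them directly, beyond copying a commit ID now and then.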
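One consequence is that a commit's ID covers its parent as well as its message. The sketch below assumes the SHA-1 object format and uses made-up parent IDs and a made-up author line. The helper names `git_object_id` and `commit_body` are hypothetical.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way a SHA-1 repository does: header, NUL, body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, message: str) -> bytes:
    """Build the text of a single-parent commit object."""
    # Hypothetical identity and timestamp, for illustration only.
    who = "A U Thor <author@example.com> 1700000000 +0000"
    lines = [
        f"tree {tree}",
        f"parent {parent}",
        f"author {who}",
        f"committer {who}",
        "",
        message,
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")


# The ID of the empty tree is well known; the parent IDs are made up.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
parent_a = "a" * 40
parent_b = "b" * 40

id_a = git_object_id("commit", commit_body(EMPTY_TREE, parent_a, "same message"))
id_b = git_object_id("commit", commit_body(EMPTY_TREE, parent_b, "same message"))

# Same tree, same message, different parent: the commit IDs differ.
print(id_a != id_b)  # True
```

This is why rewriting an old commit changes the ID of every commit that comes after it.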

## Naming of Commands

Git's command names are very different from those of other version control systems, which can confuse users coming from them.

## Distributed Version Control

The "distributed" in DVCS is git's major departure from centralized version control systems: every clone carries the full history, not just a checkout of it.


## Lack of Consistency In Command Names

The git commands have irregular names and flags, so semantically similar operations are spelled differently from one command to the next. This parable from [Quora](https://www.quora.com/Why-is-Git-so-hard-to-learn) is quite telling:

> A novice was learning at the feet of Master Git. At the end of the lesson he looked through his notes and said, “Master, I have a few questions. May I ask them?”

> Master Git nodded.

> “How can I view a list of all tags?” 

> “git tag”, replied Master Git.

> “How can I view a list of all remotes?”

> “git remote -v”, replied Master Git.

> “How can I view a list of all branches?”

> “git branch -a”, replied Master Git.

> “And how can I view the current branch?”

> “git rev-parse --abbrev-ref HEAD”, replied Master Git.

> “How can I delete a remote?”

> “git remote rm”, replied Master Git.

> “And how can I delete a branch?”

> “git branch -d”, replied Master Git.

> The novice thought for a few moments, then asked: “Surely some of these could be made more consistent, so as to be easier to remember in the heat of coding?”

> Master Git snapped his fingers. A hobgoblin entered the room and ate the novice alive. In the afterlife, the novice was enlightened.

## Conclusion

Git is a brilliant piece of software. It is probably unnecessarily complicated and unwieldy, but given its wide user base, we're stuck with it for some time to come. There are some efforts to produce a cleaner porcelain shell, such as [gitless](https://github.com/sdg-mit/gitless), but they haven't gained enough traction for most organizations.

Seems like it's ripe for disruption.