Git does not store deltas. #6

Closed
otac0n opened this issue Feb 16, 2013 · 7 comments

@otac0n

otac0n commented Feb 16, 2013

Git stores snapshots, not deltas. If a file has changed twice, there will be two copies of it in the repository.

However, Git does re-use files that haven't changed between commits, keeping repositories small.

With this in mind, your intro is lying to people.
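
To make the snapshot-with-reuse point concrete, here is a minimal Python sketch (not Git's own code). The `blob ` header and SHA-1 hashing follow Git's object format for SHA-1 repositories; the file names, contents, and the dictionary standing in for the object store are made up for illustration.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object ID Git gives a file's contents (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents across three commits.
readme_v1 = b"hello\n"
readme_v2 = b"hello, world\n"
license_text = b"MIT\n"

commits = [
    {"README": readme_v1, "LICENSE": license_text},
    {"README": readme_v2, "LICENSE": license_text},  # README changed
    {"README": readme_v2, "LICENSE": license_text},  # nothing changed
]

# The store is keyed by object ID, so identical content is kept only once.
store = {}
for snapshot in commits:
    for content in snapshot.values():
        store[blob_id(content)] = content

print(len(store))  # 3: two full README versions plus one shared LICENSE
```

Both README versions are kept in full, while the unchanged LICENSE is shared by every commit that contains it.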

@cwgreene

The author has proposed "a commit specifies the entire state of a repository, but is usually stored on disk as a set of changes" as an alternative. Does that sound reasonable?

@pcottle
Owner

pcottle commented Feb 16, 2013

I'm assuming you're the same commenter from HN -- thanks a ton for catching this. I'll paste my reply below:

"Thanks a ton for catching this. I guess there is a distinction to be made -- the compression might use deltas, but a commit specifies the entire state of the repository.
It's a tricky line to walk though, because commands like "git show" and "git patch" clearly show the delta-like nature of a single commit. I also don't want newcomers to think that commits are heavy and should be used sparingly.
I'm totally down to discuss this on a GitHub issue with you; we could go over the wording. Maybe something like "a commit specifies the entire state of a repository, but is usually stored on disk as a set of changes"?"

Does that wording work? Here's a draft:


A commit in git specifies the state of the repository. This state encodes what each file looks like, so you can think of it as a snapshot of everything you're working on.

Git wants to keep commits as lightweight as possible though, so it doesn't just copy the entire directory every time you commit. It actually stores each commit as a set of changes, or a "delta", from one version of the repository to the next.

In order to clone a repository, you have to unpack or "resolve" all these deltas. That's why you might see the command line output:

resolving deltas

when cloning a repo.

It's a tricky concept, but for now you can think of commits as snapshots of the directory that are stored as deltas. Combining all the deltas together inside an empty folder gives you the full repository.
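
As a rough illustration of what "resolving" a delta means, the sketch below rebuilds a newer version of a file from an older version plus a list of copy/insert instructions. It is a toy built on Python's `difflib`, not Git's pack encoding, and `make_delta` and `apply_delta` are hypothetical names chosen for this example.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copy-from-base and insert-literal instructions."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are just not copied.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Resolve a delta: rebuild the full target from the base and instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old = b"line one\nline two\nline three\n"
new = b"line one\nline 2\nline three\nline four\n"

delta = make_delta(old, new)
assert apply_delta(old, delta) == new
```

Whatever the storage trick, the result of resolving is always the complete file, which is why a commit can still be treated as a full snapshot.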

@cwgreene

It's not particularly important, but I'm the one from HN. :)

Your expanded draft looks fine (otac0n, if you can see any issues, let us know), and is readable. I'll let you worry over how much you think needs to be explained up front :).

EDIT: I think the important thing is to get across the idea (which you do in the revision) that a commit is a state of the repo, and git handles the storage of said states (and switching between them) efficiently by mostly just keeping track of changes. However, bear in mind that I'm unsure how much of this has to be done explicitly using 'git gc'.

@pcottle
Owner

pcottle commented Feb 16, 2013

It's also a lot to digest on (literally) the first intro screen in the entire app. I wish we could do stepping stones, but then you have to oversimplify either way.

Either way, I changed the dialog for now; let's see how that goes.

@pcottle pcottle closed this as completed Feb 16, 2013
@pcottle
Owner

pcottle commented Feb 16, 2013

Fixed in 168852b

@otac0n
Author

otac0n commented Feb 17, 2013

The thing is that it is USUALLY stored as a snapshot, not as a diff (see git-hash-object).

When .pack files are involved, it gets more complicated. However, in the common case, both versions of the files are stored.
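
The loose-object case can be sketched in Python. This assumes the SHA-1 object format and mimics the on-disk layout of a loose blob: the bytes `blob <size>\0<content>`, zlib-compressed, stored under `objects/<first two hex characters>/<remaining characters>`. The helpers `write_loose_object` and `read_loose_object` are hypothetical names for this example, not Git APIs, and real Git takes more care when writing files.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, content: bytes) -> Path:
    """Write content as a loose blob: zlib(header + content), named by its ID."""
    data = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    oid = hashlib.sha1(data).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(data))
    return path


def read_loose_object(path: Path) -> bytes:
    """Read a loose blob back; no other object is needed to recover it."""
    data = zlib.decompress(path.read_bytes())
    _header, _, content = data.partition(b"\0")
    return content


def demo():
    # A temporary directory stands in for .git/objects.
    with tempfile.TemporaryDirectory() as tmp:
        objects = Path(tmp)
        v1 = write_loose_object(objects, b"first draft\n")
        v2 = write_loose_object(objects, b"first draft, edited\n")
        # Two versions of the file give two separate, complete objects.
        return v1 != v2, read_loose_object(v1), read_loose_object(v2)


print(demo())
```

Each version is readable on its own, which is the sense in which the common case is a snapshot rather than a diff.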

@Cogito

Cogito commented Feb 17, 2013

I followed up with issue #13, which I hope is a clearer, and correct, treatment of the commit data structure.
