Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial diff wrapping #98

Closed
wants to merge 2 commits into from
Closed

Initial diff wrapping #98

wants to merge 2 commits into from

Conversation

carlosmn
Copy link
Member

@carlosmn carlosmn commented Aug 7, 2012

So, let's talk API.

It's based on what pygit2 is doing, but with more data.

# First get the trees you want to compare
repo = Rugged::Repository.new '.'
tree = repo.lookup repo.head.resolve.target
tree2 = repo.lookup(repo.head.resolve.target).parents[0].tree
# Get the diff object, remember it's old <-> new
diff = tree2.diff tree
# Show a plain-text version of the patch
puts diff.patch
# Or list the files
diff.each_file { |f| puts f[:old_file][:path] }
# Or work per hunk
diff.each_hunk { |h| puts h}
# Or show the data (which you'd want for a colour UI)
diff.each_data {} # Not implemented yet, maybe this one should be the plain .each

However, as I've no idea what I'm doing with the ruby extension API, I seem to be getting random trees compared instead of the ones I ask it to.

I'd also like to add an API so you can do commit.diff and it diffs the commit's tree with its parent (git show basically).

@travisbot
Copy link

This pull request fails (merged b05409aa into 0ccf6eb).

@travisbot
Copy link

This pull request fails (merged 23278ed into 0ccf6eb).

@brianmario
Copy link
Contributor

fwiw I made this the other day before realizing this pull even existed, thoughts?

diff = repo.diff(commit1, commit2)

# Maybe this would be aliased as #each as well?
# That way we could include Enumerable and this would
# be the default iteration target?
diff.each_file do |delta|
  # Where delta contains something like this:
  #
  # :status will be one of:
  # [:unmodified, :added, :deleted, :modified, :renamed, :copied,
  #  :ignored, :untracked]
  #
  # {
  #   :status => :modified,
  #   :similarity => 95, # 0-100 for rename/copy detection
  #   :binary => false,
  #   :progress => 25, # overall iteration progress % for the entire diff?
  #   :old_file => {
  #     :oid => 'deadbeef', # blob oid?
  #     :path => 'README.md',
  #     :size => 1234,
  #     :mode => 0666
  #   },
  #   :new_file => {
  #     :oid => 'deadbeef',
  #     :path => 'README.md',
  #     :size => 5678,
  #     :mode => 0666
  #   }
  # }
end

diff.each_hunk do |delta, hunk|
  # Where hunk contains something like this:
  #
  # {
  #   :header => '...',
  #   :range => {
  #     :old_start => 0, # starting line number
  #     :old_lines => 15, # number of lines changed?
  #     :new_start => 100,
  #     :new_lines => 15
  #   }
  # }
end

diff.each_line do |delta, hunk, line_info|
  # Where line_info contains something like this:
  #
  # :line_origin will be one of:
  # [:context, :addition, :deletion, :add_eof_newline, :del_eof_newline]
  #
  # or if it's being formatted via print patch or print compact:
  # [:file_header, :hunk_header, :binary]
  #
  # {
  #   :line_origin => :addition,
  #   :content => "asdkfjldkad"
  # }
end

@brianmario
Copy link
Contributor

This was mainly focused on commit diffs because I thought the tree diffs would be treated a little different (based on my understanding of the tree-diff API in libgit2 at least)

@brianmario
Copy link
Contributor

Oh nevermind about the differences with tree diffs, not sure what I was smoking... ;)

@carlosmn
Copy link
Member Author

A diff between two commits is simply a diff between their trees. The other tree-diff API in libgit2 is of the diff-tree variety but nobody ever really used it and it's superseded by the "normal" diff machinery's per-file callback. I put the diff on the tree because that's what we're comparing anyway (and what we give to the library). C# also has diff on the repo object, IIRC. If that makes more sense, we can put it there.

Giving the block several objects is better than what this does, so I'll change it so it does that. IMO it feels more natural to have #each be #each_line, but I don't feel that strongly about it.

@arthurschreiber
Copy link
Member

What I'd love to see would be something along the lines of (very rough):

diff = repo.diff(commit1, commit2)
diff.each do |file|
  # file.status will be one of:
  #    [:unmodified, :added, :deleted, :modified, :renamed, :copied, :ignored, :untracked]
  # file.similarity # => 95, # 0-100 for rename/copy detection
  # file.binary? # => false,
  # file.progress # => 25, # overall iteration progress % for the entire diff?
  # file.old # => {
  #   :oid => 'deadbeef', # blob oid?
  #   :path => 'README.md',
  #   :size => 1234,
  #   :mode => 0666
  # }
  # file.new # => {
  #   :oid => 'deadbeef',
  #   :path => 'README.md',
  #   :size => 5678,
  #   :mode => 0666
  # }
  # 

  file.each do |hunk|
    # Where hunk contains something like this:
    #
    # hunk.header # => '...'
    # hunk.range # => {
    #   :old_start => 0, # starting line number
    #   :old_lines => 15, # number of lines changed?
    #   :new_start => 100,
    #   :new_lines => 15
    # }
    # 

    hunk.each do |line|
      # Where line contains something like this:
      #
      # line.line_origin will be one of:
      # [:context, :addition, :deletion, :add_eof_newline, :del_eof_newline]
      #
      # or if it's being formatted via print patch or print compact:
      # [:file_header, :hunk_header, :binary]
      #
      # line.line_origin # => :addition,
      # line.content # => "asdkfjldkad"
    end
  end
end

This is just right off the top of my head. And I don't think that this would even be possible with the current libgit2 API.

@carlosmn
Copy link
Member Author

On Sun, 2012-08-12 at 03:01 -0700, Arthur Schreiber wrote:

What I'd love to see would be something along the lines of (very
rough):

[snip #each loops in loops]

This is just right off the top of my head. And I don't think that this
would even be possible with the current libgit2 API.

We can do something similar. We can have Diff#each_file and in that
loop do a Delta#each_hunk or #each_line using git_diff_blobs.
Getting a bunch of lines out of a hunk doesn't seem to be possible right
now in the library, though you may be able to simulate it with some
trickery at the the rugged level, checking when the hunk changes.

cmn

@arthurschreiber
Copy link
Member

Yeah, I'd like that more. I simple don't like the prospect of having to
check if the file/hunk has changed during each iteration of the #each loop.

@brianmario
Copy link
Contributor

A diff between two commits is simply a diff between their trees. The other tree-diff API in libgit2 is of the diff-tree variety but nobody ever really used it and it's superseded by the "normal" diff machinery's per-file callback.

Yeah that's what I noticed after I said what I did, thanks :)

@arthurschreiber yeah agreed. That API in Ruby would definitely be ideal IMO, but not possible given the current libgit2 diff API without ugly hacks in Ruby-C-land.

Maybe if we had something like this in addition to the existing API?

git_diff_file_foreach(git_diff_delta *delta, git_diff_file_fn *file_cb);
git_diff_hunk_foreach(git_diff_file *file, git_diff_hunk_fn *hunk_cb);
git_diff_data_foreach(git_diff_hunk *hunk, git_diff_data_fn *line_cb);

Only thing is it would then mean there were two ways to iterate over the parts of the diff - possibly at the same time?. For example, what should we do if the caller were to specify callbacks with these as well as with git_diff_foreach? Maybe if the callback(s) pointed to the same address we could do something... Anyway, just an idea. It would make that Ruby API possible I think.

@carlosmn
Copy link
Member Author

As for extending the API and what's possible or not, @arrbee is the one who can say. It does make sense if the language only uses one foreach at a time (in C, you could signal file/hunk changes in some shared state as the three should get called in the right order).

@travisbot
Copy link

This pull request fails (merged c6ae857 into 0ccf6eb).

@arrbee
Copy link
Member

arrbee commented Aug 13, 2012

I'm totally open to extending the libgit2 diff API to separate the iteration over files from the actual file content diffing. If we added something like:

GIT_EXTERN(int) git_diff_delta_diff(
    git_diff_delta *delta,
    void *cb_data,
    git_diff_hunk_fn hunk_cb,
    git_diff_data_fn line_cb);

would that be sufficient? You could call the existing git_diff_foreach() will NULL hunk and line callbacks to iterate over the diff and then call the above function on each delta.

Implementing this would require a couple extra fields in the delta structure, I think; notable a git_diff_list pointer to store the context and maybe an extra flags field to preserve some state that is currently on the git_diff_foreach() stack at the time the delta computation is run. But that's not too hard.

@carlosmn
Copy link
Member Author

That function is easily simulated with the one for diffing blobs, isn't it? It'd be nice to have, but nothing that a few lines of code don't give us anyway. The main addition would be something like git_diff_hunk_foreach(git_diff_range *range, void *cb_data, git_diff_data_fn line_cb) as there doesn't seem to be a way of doing that with current code.

@arrbee
Copy link
Member

arrbee commented Aug 16, 2012

So, just to make a note of it, I'm looking at adding a git_diff_iterator that you can create from a git_diff_list which will support scanning through the files in a diff, then the hunks in a file and the lines in a hunk. I think this will be a lot easier to use in the various language bindings provided I can make it work fairly efficiently.

This is a busy week, but I've been sketching out the APIs and data structures. I'll dive in next week and try to wrap it up.

@carlosmn
Copy link
Member Author

Yeah, being able to get a file iter, then a hunk iter out of that file and then a line iter out of the hunk wold be awesome.

@ghost
Copy link

ghost commented Apr 7, 2013

Similar to #103 ...would love to see diffing functionality added.

@arthurschreiber
Copy link
Member

@vmg / @carlosmn can you close this?

@vmg
Copy link
Member

vmg commented May 5, 2013

Clooosing.

@vmg vmg closed this May 5, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants