Rev parse #648

Closed
wants to merge 27 commits into from

5 participants

@ben
libgit2 member

Support for the rev-parse revision-specification language, which takes a string and provides a git_object. So far, these are working:

  • a65fedf39aefe402d3bb6e24df4d4f5fe4547750 -> object with that exact hash
  • a65fedf39a -> object with that hash prefix (uses git_object_lookup_prefix)
  • v0.16.0-234-gd98fa70 (output from git describe) -> object with the hash prefix that follows "-g"
  • refs/heads/master or master -> object pointed to by a fully- or partially-named ref
  • tag^{} -> dereference until we find something that isn't a tag
  • tag^{tree} -> dereference until we find an object of the given type
  • tag^ or master^3 -> dereference until we find a commit, then get the Nth parent
  • tag~3 -> commit that is N steps back from the ref, always following the 1st parent

Things that are still missing:

  • HEAD@{2} -> object that was pointed to by a ref N steps back in the reflog
  • branch@{yesterday}, master@{5 minutes ago} -> object that was pointed to by a ref at the specified time
  • @{-3} -> commit pointed to by the Nth branch checked out before the current one
  • tag^{/foo} -> youngest commit whose message matches the regex after "/", which is reachable from the given starting point
  • :/foo -> youngest commit whose message matches the regex after "/", which is reachable from any ref
  • tag:README.md -> dereference to a tree, then return the blob or tree at the given path
  • :2:Makefile -> blob object in the index at the given merge stage

The entry point is git_revparse_single, and it's expected that there will eventually be a git_revparse_range which supports the rev-parse syntax for finding ranges of commits.

Ben Straub added some commits Apr 25, 2012
Ben Straub First stab at implementation of rev-parse.
This version supports refspecs of these kinds:
- Full & partial SHAs
- Output from "git describe"
- "/refs/heads/master" (full ref names)
- "master" (partial ref names)
- "FETCH_HEAD" (named heads)
32c0bf7
Ben Straub Simpler states and initial structure.
New tests for "foo^2" syntax, but they don't pass
yet. Support for chaining these, i.e.
"foo^2~3^{u}~1' is starting to shape up.
c92dd42
Ben Straub Implemented partial caret syntax for rev-parse.
Supported forms:
- "^n"
- "^0"
- "^"

Still missing: all of the "^{…}" variants.
629072c
Ben Straub Implemented rev-parse's "^{}" syntax. 32775f8
Ben Straub Implemented rev-parse "^{type}" syntax. d98fa70
@travisbot

This pull request fails (merged d98fa70 into 821f6bc).

@arrbee arrbee and 1 other commented on an outdated diff Apr 27, 2012
src/revparse.c
+}
+
+static int dereference_to_type(git_object **out, git_object *obj, git_otype target_type)
+{
+ git_object *obj1 = obj, *obj2 = obj;
+
+ while (1) {
+ git_otype this_type = git_object_type(obj1);
+
+ if (this_type == target_type) {
+ *out = obj1;
+ return 0;
+ }
+
+ /* Dereference once, if possible. */
+ obj2 = dereference_object(obj1);
@arrbee
libgit2 member
arrbee added a note Apr 27, 2012

I think you need to check for NULL and return an error here.

@ben
libgit2 member
ben added a note May 10, 2012

Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arrbee arrbee commented on the diff Apr 27, 2012
src/revparse.c
+ } else {
+ git_buf_putc(&specbuffer, *spec_cur);
+ }
+ spec_cur++;
+
+ if (current_state != next_state) {
+ /* Leaving INIT state, find the object specified, in case that state needs it */
+ retcode = revparse_lookup_object(&next_obj, repo, git_buf_cstr(&specbuffer));
+ if (retcode < 0) {
+ goto cleanup;
+ }
+ }
+ break;
+
+
+ case REVPARSE_STATE_CARET:
@arrbee
libgit2 member
arrbee added a note Apr 27, 2012

The CARET state is interesting. When I was originally thinking about this problem, I was imagining more states. For example: a ^ could put you in a CARET state, but then on the next character you would exit that into one of three states: CARET_DIGITS which you then stay in until a non-digit, CARET_BRACE which you then stay in until a matching close brace and then call a handler for the brace contents, or any other character which causes the caret to execute (i.e. move to parent) and go to the new state. Of course, you could break the CARET_BRACE state even further into CARET_BRACE_SLASH for searches, CARET_BRACE_EMPTY for ^{} and CARET_BRACE_TYPE.

There is nothing wrong with keeping the number of states small, I guess. Essentially the CARET_BRACE state I was imagining is just covered by the broader handle_caret_syntax() function. This design does mean that you don't get to take advantage of the state machine for most of the syntax error detection and have to do more in the handler, but apart from that, I don't think it adds substantially to the code complexity.

@ben
libgit2 member
ben added a note Apr 27, 2012

I was thinking of this too, but there's also the issue of operator chaining: master^2~3^{tree}. I'm using the state machine to detect operators and dispatch the parsing operations, and handle_caret_syntax to parse the argument to the caret. It seems like ^3 syntax use a different state than ^{tree} would simplify the argument handlers, but add transitions to the state diagram, which seems like a wash complexity-wise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@travisbot

This pull request fails (merged 048debe into 821f6bc).

@arrbee arrbee commented on the diff Apr 27, 2012
src/revparse.c
+ spec_cur++;
+
+ if (current_state != next_state) {
+ /* Leaving INIT state, find the object specified, in case that state needs it */
+ retcode = revparse_lookup_object(&next_obj, repo, git_buf_cstr(&specbuffer));
+ if (retcode < 0) {
+ goto cleanup;
+ }
+ }
+ break;
+
+
+ case REVPARSE_STATE_CARET:
+ /* Gather characters until NULL, '~', or '^' */
+ if (!*spec_cur) {
+ retcode = handle_caret_syntax(out,
@arrbee
libgit2 member
arrbee added a note Apr 27, 2012

Instead of exiting with a goto, I would be tempted to add a DONE state and have terminal cases such as this just go into that state and then let the loop exit naturally.

@ben
libgit2 member
ben added a note Apr 27, 2012

Yeah, that seems cleaner. goto made me feel a little dirty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Ben Straub added some commits Apr 27, 2012
Ben Straub Removed goto from state machine loop. fafb0c8
Ben Straub Implementing rev-parse's "ref~2" syntax.
Also extended the test suite to include chaining
operators, e.g. "master^2~3^4".
ec05459
@travisbot

This pull request fails (merged fafb0c8 into 821f6bc).

@travisbot

This pull request fails (merged ec05459 into 821f6bc).

@nulltoken
libgit2 member

Should fix #235

@vmg vmg and 1 other commented on an outdated diff Apr 30, 2012
src/revparse.c
+ return GIT_OBJ_BAD;
+}
+
+static int handle_caret_syntax(git_object **out, git_object *obj, const char *movement)
+{
+ git_commit *commit;
+ int n;
+
+ if (*movement == '{') {
+ if (movement[strlen(movement)-1] != '}') {
+ set_invalid_syntax_err(movement);
+ return GIT_ERROR;
+ }
+
+ /* {} -> Dereference until we reach an object that isn't a tag. */
+ if (strlen(movement) == 2) {
@vmg
libgit2 member
vmg added a note Apr 30, 2012

Stop dupping the strlen, just stick it in a variable.

@ben
libgit2 member
ben added a note May 10, 2012

Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@vmg vmg and 1 other commented on an outdated diff Apr 30, 2012
src/revparse.c
+ git_buf_clear(&stepbuffer);
+ if (retcode < 0) {
+ next_state = REVPARSE_STATE_DONE;
+ }
+ } else if (*spec_cur == '^') {
+ retcode = handle_linear_syntax(&next_obj, cur_obj, git_buf_cstr(&stepbuffer));
+ git_buf_clear(&stepbuffer);
+ next_state = !retcode ? REVPARSE_STATE_CARET : REVPARSE_STATE_DONE;
+ } else {
+ git_buf_putc(&stepbuffer, *spec_cur);
+ }
+ spec_cur++;
+ break;
+
+ case REVPARSE_STATE_DONE:
+ keep_looping = 0;
@vmg
libgit2 member
vmg added a note Apr 30, 2012

A done state sounds superfluous, specially if it's only going to set keep_looping. You can just skip it.

while (!finished) {
  ...
  if (something) {
    finished = 1;
  }
  ...
}
@ben
libgit2 member
ben added a note May 10, 2012

Looks like we hammered out a good way to exit the state machine loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Ben Straub Implementing rev-parse's ref@{n} and @{-n} syntaxes.
Added some reflags to the test repo to support
unit tests.
e1849d7
@travisbot

This pull request fails (merged e1849d7 into 821f6bc).

@arrbee
libgit2 member

Regarding the "DONE" state, I was thinking that the code would be structured like:

    while (current_state != STATE_DONE) {
        char next = *spec_cur++;

        switch (current_state) {
        case A:
            ...
            next_state = ?;
            break;
        case B:
            ...
            next_state = ?;
            break;
        }

        current_state = next_state;
    }

To me, at least, this feels like a natural structure for a state machine. I agree with @tanoku that the separate loop boolean does not improve the structure.

Ben Straub added some commits May 1, 2012
Ben Straub Rev-parse chaining: adding the longest chain
in the test repo.
697eb51
Ben Straub Incorporating feedback from @tanoku.
Removed repeated strlen's, and unnecessary loop-termination variable.
2b7cf74
@travisbot

This pull request fails (merged 2b7cf74 into 821f6bc).

@travisbot

This pull request fails (merged 20849e2 into 821f6bc).

@travisbot

This pull request fails (merged 7db36b3 into 821f6bc).

Ben Straub Rev-parse: "ref@{upstream}" syntax.
Added tracking configuration to the test repo's
config to support unit tests.
5d3cfb9
@travisbot

This pull request fails (merged 5d3cfb9 into 821f6bc).

Ben Straub Fixing legacy unit tests.
New functionality introduced some new refs and
tags in the test repo.
43ad496
@travisbot

This pull request fails (merged 43ad496 into 821f6bc).

@travisbot

This pull request fails (merged fa2ab26 into 821f6bc).

@travisbot

This pull request fails (merged adff19d into 821f6bc).

@arrbee arrbee and 1 other commented on an outdated diff May 8, 2012
src/revparse.c
+ git_reference *ref;
+ git_object *obj = NULL;
+
+ if (!git_reference_lookup(&ref, repo, spec)) {
+ git_reference *resolved_ref;
+ if (!git_reference_resolve(&resolved_ref, ref)) {
+ if (!git_object_lookup(&obj, repo, git_reference_oid(resolved_ref), GIT_OBJ_ANY)) {
+ *out = obj;
+ }
+ git_reference_free(resolved_ref);
+ }
+ git_reference_free(ref);
+ }
+ if (obj) {
+ return 0;
+ }
@arrbee
libgit2 member
arrbee added a note May 8, 2012

revparse_lookup_fully_qualifed_ref can be rewritten using the new git_reference_name_to_oid function as:

{
    git_oid resolved;

    if (git_reference_name_to_oid(&resolved, repo, spec) < 0)
        return -1;

    return git_object_lookup(out, repo, &resolved, GIT_OBJ_ANY);
}
@ben
libgit2 member
ben added a note May 10, 2012

Nice. I didn't know about that call. Thanks!

@arrbee
libgit2 member
arrbee added a note May 10, 2012

It's quite new, possibly added after you forked.

@ben
libgit2 member
ben added a note May 10, 2012

It's actually in there, I just wasn't aware of it. Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arrbee arrbee commented on the diff May 8, 2012
src/revparse.c
+ git_object *obj = NULL;
+ git_oid oid;
+ git_buf refnamebuf = GIT_BUF_INIT;
+ static const char* formatters[] = {
+ "refs/%s",
+ "refs/tags/%s",
+ "refs/heads/%s",
+ "refs/remotes/%s",
+ "refs/remotes/%s/HEAD",
+ NULL
+ };
+ unsigned int i;
+ const char *substr;
+
+ /* "git describe" output; snip everything before/including "-g" */
+ substr = strstr(spec, "-g");
@arrbee
libgit2 member
arrbee added a note May 8, 2012

This is a problem for branches that contain the string "-g". I tried creating a branch named "this-is-not-good" and git_revparse_single could not resolve it (because it was trying to resolve "ood").

I think you either need to look much more carefully to see if the spec looks like git describe output, or probably even better, you need to fall back to the original spec if treating it as git describe output fails. Actually, I think you should do both - require at least slightly tighter matching before trying as describe output (perhaps looking back from the matched "-g" to make sure it is preceded by 1 or more digits and another dash, but that may not be the best choice) - and then if it failed, try again with the original spec. You could probably code it like:

    substr = strstr(spec, "-g");
    if (substr && further_checks_on_spec(spec, substr)) {
        if (revparse_lookup_object(out, repo, substr + 2) == 0)
            return 0;
    }

Using this recursive call (just one, so we don't really have to worry about too much nesting) makes it easy to continue with the normal path if the describe fails to match.

What do you think?

@ben
libgit2 member
ben added a note May 10, 2012

Interesting. Would it be acceptable to use a regex in further_checks_on_spec? I know we like to keep our CPU time down, but this api isn't called all the time, and this would only happen if there's a "-g" in the spec.

@vmg
libgit2 member
vmg added a note May 10, 2012

It looks like a text-book example of using a regex. If we pre-compile it just once, the performance hit will be minimal.

@vmg
libgit2 member
vmg added a note May 10, 2012

Here's a good starting point

/([\.\w\/\-]+)-(\d+)-g([\da-fA-f]{3,})/
@ben
libgit2 member
ben added a note May 10, 2012

Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arrbee arrbee and 1 other commented on an outdated diff May 9, 2012
src/revparse.c
+ return GIT_ERROR;
+ }
+
+ /* "@{-N}" form means walk back N checkouts. That means the HEAD log. */
+ if (refspeclen == 0 && !git__prefixcmp(reflogspec, "@{-")) {
+ if (git__strtol32(&n, reflogspec+3, NULL, 0) < 0 ||
+ n < 1) {
+ giterr_set(GITERR_INVALID, "Invalid reflogspec %s", reflogspec);
+ return GIT_ERROR;
+ }
+ git_reference_lookup(&ref, repo, "HEAD");
+ git_reflog_read(&reflog, ref);
+ git_reference_free(ref);
+
+ refloglen = git_reflog_entrycount(reflog);
+ for (i=0; i < refloglen; i++) {
@arrbee
libgit2 member
arrbee added a note May 9, 2012

I think you are walking the reflog from oldest to newest here instead of newest to oldest so -1 is grabbing the first checkout instead of the last.

@ben
libgit2 member
ben added a note May 10, 2012

Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@vmg vmg and 1 other commented on an outdated diff May 9, 2012
src/revparse.c
+ /* {/...} -> Walk all commits until we see a commit msg that matches the phrase. */
+ if (movement[1] == '/') {
+ int retcode = GIT_ERROR;
+ git_revwalk *walk;
+ if (!git_revwalk_new(&walk, repo)) {
+ git_oid oid;
+ regex_t preg;
+ git_buf buf = GIT_BUF_INIT;
+
+ git_revwalk_sorting(walk, GIT_SORT_TIME);
+ git_revwalk_push(walk, git_object_id(obj));
+
+ /* Extract the regex from the movement string */
+ git_buf_put(&buf, movement+2, strlen(movement)-3);
+
+ if (!regcomp(&preg, git_buf_cstr(&buf), REG_EXTENDED)) {
@vmg
libgit2 member
vmg added a note May 9, 2012

error handling? See giterr_set_regex.

@ben
libgit2 member
ben added a note May 10, 2012

Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arrbee arrbee and 1 other commented on an outdated diff May 9, 2012
src/revparse.c
+ size_t refspeclen = strlen(refspec);
+
+ if (git__prefixcmp(reflogspec, "@{") != 0 ||
+ git__suffixcmp(reflogspec, "}") != 0) {
+ giterr_set(GITERR_INVALID, "Bad reflogspec '%s'", reflogspec);
+ return GIT_ERROR;
+ }
+
+ /* "@{-N}" form means walk back N checkouts. That means the HEAD log. */
+ if (refspeclen == 0 && !git__prefixcmp(reflogspec, "@{-")) {
+ if (git__strtol32(&n, reflogspec+3, NULL, 0) < 0 ||
+ n < 1) {
+ giterr_set(GITERR_INVALID, "Invalid reflogspec %s", reflogspec);
+ return GIT_ERROR;
+ }
+ git_reference_lookup(&ref, repo, "HEAD");
@arrbee
libgit2 member
arrbee added a note May 9, 2012

Think you need to be checking the error return value here and on the next couple lines.

@ben
libgit2 member
ben added a note May 10, 2012

Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arrbee arrbee and 1 other commented on an outdated diff May 9, 2012
src/revparse.c
+ }
+ git_reference_lookup(&ref, repo, "HEAD");
+ git_reflog_read(&reflog, ref);
+ git_reference_free(ref);
+
+ refloglen = git_reflog_entrycount(reflog);
+ for (i=0; i < refloglen; i++) {
+ const char *msg;
+ entry = git_reflog_entry_byindex(reflog, i);
+
+ msg = git_reflog_entry_msg(entry);
+ if (!git__prefixcmp(msg, "checkout: moving")) {
+ n--;
+ if (!n) {
+ char *branchname = strrchr(msg, ' ') + 1;
+ git_buf_printf(&buf, "refs/heads/%s", branchname);
@arrbee
libgit2 member
arrbee added a note May 9, 2012

Unfortunately, the contents of the checkout lines in the reflog are not guaranteed to be a branch name. In my reflog for libgit2, I found: <sha>^0, <sha>, <sha prefix>, FETCH_HEAD, and <branch>

@ben
libgit2 member
ben added a note May 10, 2012

Fixed! Using a regex to pull out the desired value, and doing a full lookup on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Ben Straub added some commits May 10, 2012
Ben Straub Rev-parse: @{time} syntax.
Ported date.c (for approxidate_careful) from git.git
revision aa39b85. Trimmed out the parts we're not
using.
a8c3850
Ben Straub Fixing broken tests. 641ff7e
@travisbot

This pull request fails (merged 641ff7e into 821f6bc).

@travisbot

This pull request fails (merged fc6221d into 821f6bc).

@travisbot

This pull request fails (merged 6309f9e into 821f6bc).

@arrbee
libgit2 member

@benstraub is on 🔥 Fixed! 👊 Fixed! 👊 Fixed! 👊

@vmg
libgit2 member

@ben
libgit2 member

Closing this PR. Stay tuned for a new one that's rebased on top of development.

EDIT: Check out #684! It's ♨️!

@ben ben closed this May 11, 2012
@ben ben referenced this pull request May 11, 2012
Merged

Rev parse #684

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment