Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Rev parse #648

Closed
wants to merge 27 commits into from

5 participants

@ben

Support for the rev-parse revision-specification language, which takes a string and provides a git_object. So far, these are working:

  • a65fedf39aefe402d3bb6e24df4d4f5fe4547750 -> object with that exact hash
  • a65fedf39a -> object with that hash prefix (uses git_object_lookup_prefix)
  • v0.16.0-234-gd98fa70 (output from git describe) -> object with the hash prefix that follows "-g"
  • refs/heads/master or master -> object pointed to by a fully- or partially-named ref
  • tag^{} -> dereference until we find something that isn't a tag
  • tag^{tree} -> dereference until we find an object of the given type
  • tag^ or master^3 -> dereference until we find a commit, then get the Nth parent
  • tag~3 -> commit that is N steps back from the ref, always following the 1st parent

Things that are still missing:

  • HEAD@{2} -> object that was pointed to by a ref N steps back in the reflog
  • branch@{yesterday}, master@{5 minutes ago} -> object that was pointed to by a ref at the specified time
  • @{-3} -> commit pointed to by the Nth branch checked out before the current one
  • tag^{/foo} -> youngest commit whose message matches the regex after "/", which is reachable from the given starting point
  • :/foo -> youngest commit whose message matches the regex after "/", which is reachable from any ref
  • tag:README.md -> dereference to a tree, then return the blob or tree at the given path
  • :2:Makefile -> blob object in the index at the given merge stage

The entry point is git_revparse_single, and it's expected that there will eventually be a git_revparse_range which supports the rev-parse syntax for finding ranges of commits.

Ben Straub added some commits
Ben Straub First stab at implementation of rev-parse.
This version supports refspecs of these kinds:
- Full & partial SHAs
- Output from "git describe"
- "/refs/heads/master" (full ref names)
- "master" (partial ref names)
- "FETCH_HEAD" (named heads)
32c0bf7
Ben Straub Simpler states and initial structure.
New tests for "foo^2" syntax, but they don't pass
yet. Support for chaining these, i.e.
"foo^2~3^{u}~1' is starting to shape up.
c92dd42
Ben Straub Implemented partial caret syntax for rev-parse.
Supported forms:
- "^n"
- "^0"
- "^"

Still missing: all of the "^{…}" variants.
629072c
Ben Straub Implemented rev-parse's "^{}" syntax. 32775f8
Ben Straub Implemented rev-parse "^{type}" syntax. d98fa70
@travisbot

This pull request fails (merged d98fa70 into 821f6bc).

src/revparse.c
((161 lines not shown))
+}
+
+static int dereference_to_type(git_object **out, git_object *obj, git_otype target_type)
+{
+ git_object *obj1 = obj, *obj2 = obj;
+
+ while (1) {
+ git_otype this_type = git_object_type(obj1);
+
+ if (this_type == target_type) {
+ *out = obj1;
+ return 0;
+ }
+
+ /* Dereference once, if possible. */
+ obj2 = dereference_object(obj1);
@arrbee Owner
arrbee added a note

I think you need to check for NULL and return an error here.

@ben
ben added a note

Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arrbee arrbee commented on the diff
src/revparse.c
((291 lines not shown))
+ } else {
+ git_buf_putc(&specbuffer, *spec_cur);
+ }
+ spec_cur++;
+
+ if (current_state != next_state) {
+ /* Leaving INIT state, find the object specified, in case that state needs it */
+ retcode = revparse_lookup_object(&next_obj, repo, git_buf_cstr(&specbuffer));
+ if (retcode < 0) {
+ goto cleanup;
+ }
+ }
+ break;
+
+
+ case REVPARSE_STATE_CARET:
@arrbee Owner
arrbee added a note

The CARET state is interesting. When I was originally thinking about this problem, I was imagining more states. For example: a ^ could put you in a CARET state, but then on the next character you would exit that into one of three states: CARET_DIGITS which you then stay in until a non-digit, CARET_BRACE which you then stay in until a matching close brace and then call a handler for the brace contents, or any other character which causes the caret to execute (i.e. move to parent) and go to the new state. Of course, you could break the CARET_BRACE state even further into CARET_BRACE_SLASH for searches, CARET_BRACE_EMPTY for ^{} and CARET_BRACE_TYPE.

There is nothing wrong with keeping the number of states small, I guess. Essentially the CARET_BRACE state I was imagining is just covered by the broader handle_caret_syntax() function. This design does mean that you don't get to take advantage of the state machine for most of the syntax error detection and have to do more in the handler, but apart from that, I don't think it adds substantially to the code complexity.

@ben
ben added a note

I was thinking of this too, but there's also the issue of operator chaining: master^2~3^{tree}. I'm using the state machine to detect operators and dispatch the parsing operations, and handle_caret_syntax to parse the argument to the caret. It seems like ^3 syntax use a different state than ^{tree} would simplify the argument handlers, but add transitions to the state diagram, which seems like a wash complexity-wise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@travisbot

This pull request fails (merged 048debe into 821f6bc).

@arrbee arrbee commented on the diff
src/revparse.c
((297 lines not shown))
+ spec_cur++;
+
+ if (current_state != next_state) {
+ /* Leaving INIT state, find the object specified, in case that state needs it */
+ retcode = revparse_lookup_object(&next_obj, repo, git_buf_cstr(&specbuffer));
+ if (retcode < 0) {
+ goto cleanup;
+ }
+ }
+ break;
+
+
+ case REVPARSE_STATE_CARET:
+ /* Gather characters until NULL, '~', or '^' */
+ if (!*spec_cur) {
+ retcode = handle_caret_syntax(out,
@arrbee Owner
arrbee added a note

Instead of exiting with a goto, I would be tempted to add a DONE state and have terminal cases such as this just go into that state and then let the loop exit naturally.

@ben
ben added a note

Yeah, that seems cleaner. goto made me feel a little dirty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Ben Straub added some commits
Ben Straub Removed goto from state machine loop. fafb0c8
Ben Straub Implementing rev-parse's "ref~2" syntax.
Also extended the test suite to include chaining
operators, e.g. "master^2~3^4".
ec05459
@travisbot

This pull request fails (merged fafb0c8 into 821f6bc).

@travisbot

This pull request fails (merged ec05459 into 821f6bc).

@nulltoken
Collaborator

Should fix #235

src/revparse.c
((191 lines not shown))
+ return GIT_OBJ_BAD;
+}
+
+static int handle_caret_syntax(git_object **out, git_object *obj, const char *movement)
+{
+ git_commit *commit;
+ int n;
+
+ if (*movement == '{') {
+ if (movement[strlen(movement)-1] != '}') {
+ set_invalid_syntax_err(movement);
+ return GIT_ERROR;
+ }
+
+ /* {} -> Dereference until we reach an object that isn't a tag. */
+ if (strlen(movement) == 2) {
@vmg Owner
vmg added a note

Stop dupping the strlen, just stick it in a variable.

@ben
ben added a note

Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/revparse.c
((373 lines not shown))
+ git_buf_clear(&stepbuffer);
+ if (retcode < 0) {
+ next_state = REVPARSE_STATE_DONE;
+ }
+ } else if (*spec_cur == '^') {
+ retcode = handle_linear_syntax(&next_obj, cur_obj, git_buf_cstr(&stepbuffer));
+ git_buf_clear(&stepbuffer);
+ next_state = !retcode ? REVPARSE_STATE_CARET : REVPARSE_STATE_DONE;
+ } else {
+ git_buf_putc(&stepbuffer, *spec_cur);
+ }
+ spec_cur++;
+ break;
+
+ case REVPARSE_STATE_DONE:
+ keep_looping = 0;
@vmg Owner
vmg added a note

A done state sounds superfluous, specially if it's only going to set keep_looping. You can just skip it.

while (!finished) {
  ...
  if (something) {
    finished = 1;
  }
  ...
}
@ben
ben added a note

Looks like we hammered out a good way to exit the state machine loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Ben Straub Implementing rev-parse's ref@{n} and @{-n} syntaxes.
Added some reflags to the test repo to support
unit tests.
e1849d7
@travisbot

This pull request fails (merged e1849d7 into 821f6bc).

@arrbee
Owner

Regarding the "DONE" state, I was thinking that the code would be structured like:

    while (current_state != STATE_DONE) {
        char next = *spec_cur++;

        switch (current_state) {
        case A:
            ...
            next_state = ?;
            break;
        case B:
            ...
            next_state = ?;
            break;
        }

        current_state = next_state;
    }

To me, at least, this feels like a natural structure for a state machine. I agree with @tanoku that the separate loop boolean does not improve the structure.

Ben Straub added some commits
Ben Straub Rev-parse chaining: adding the longest chain
in the test repo.
697eb51
Ben Straub Incorporating feedback from @tanoku.
Removed repeated strlen's, and unnecessary loop-termination variable.
2b7cf74
@travisbot

This pull request fails (merged 2b7cf74 into 821f6bc).

@travisbot

This pull request fails (merged 20849e2 into 821f6bc).

@travisbot

This pull request fails (merged 7db36b3 into 821f6bc).

Ben Straub Rev-parse: "ref@{upstream}" syntax.
Added tracking configuration to the test repo's
config to support unit tests.
5d3cfb9
@travisbot

This pull request fails (merged 5d3cfb9 into 821f6bc).

Ben Straub Fixing legacy unit tests.
New functionality introduced some new refs and
tags in the test repo.
43ad496
@travisbot

This pull request fails (merged 43ad496 into 821f6bc).

@travisbot

This pull request fails (merged fa2ab26 into 821f6bc).

@travisbot

This pull request fails (merged adff19d into 821f6bc).

src/revparse.c
((41 lines not shown))
+ git_reference *ref;
+ git_object *obj = NULL;
+
+ if (!git_reference_lookup(&ref, repo, spec)) {
+ git_reference *resolved_ref;
+ if (!git_reference_resolve(&resolved_ref, ref)) {
+ if (!git_object_lookup(&obj, repo, git_reference_oid(resolved_ref), GIT_OBJ_ANY)) {
+ *out = obj;
+ }
+ git_reference_free(resolved_ref);
+ }
+ git_reference_free(ref);
+ }
+ if (obj) {
+ return 0;
+ }
@arrbee Owner
arrbee added a note

revparse_lookup_fully_qualifed_ref can be rewritten using the new git_reference_name_to_oid function as:

{
    git_oid resolved;

    if (git_reference_name_to_oid(&resolved, repo, spec) < 0)
        return -1;

    return git_object_lookup(out, repo, &resolved, GIT_OBJ_ANY);
}
@ben
ben added a note

Nice. I didn't know about that call. Thanks!

@arrbee Owner
arrbee added a note

It's quite new, possibly added after you forked.

@ben
ben added a note

It's actually in there, I just wasn't aware of it. Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arrbee arrbee commented on the diff
src/revparse.c
((64 lines not shown))
+ git_object *obj = NULL;
+ git_oid oid;
+ git_buf refnamebuf = GIT_BUF_INIT;
+ static const char* formatters[] = {
+ "refs/%s",
+ "refs/tags/%s",
+ "refs/heads/%s",
+ "refs/remotes/%s",
+ "refs/remotes/%s/HEAD",
+ NULL
+ };
+ unsigned int i;
+ const char *substr;
+
+ /* "git describe" output; snip everything before/including "-g" */
+ substr = strstr(spec, "-g");
@arrbee Owner
arrbee added a note

This is a problem for branches that contain the string "-g". I tried creating a branch named "this-is-not-good" and git_revparse_single could not resolve it (because it was trying to resolve "ood").

I think you either need to look much more carefully to see if the spec looks like git describe output, or probably even better, you need to fall back to the original spec if treating it as git describe output fails. Actually, I think you should do both - require at least slightly tighter matching before trying as describe output (perhaps looking back from the matched "-g" to make sure it is preceded by 1 or more digits and another dash, but that may not be the best choice) - and then if it failed, try again with the original spec. You could probably code it like:

    substr = strstr(spec, "-g");
    if (substr && further_checks_on_spec(spec, substr)) {
        if (revparse_lookup_object(out, repo, substr + 2) == 0)
            return 0;
    }

Using this recursive call (just one, so we don't really have to worry about too much nesting) makes it easy to continue with the normal path if the describe fails to match.

What do you think?

@ben
ben added a note

Interesting. Would it be acceptable to use a regex in further_checks_on_spec? I know we like to keep our CPU time down, but this api isn't called all the time, and this would only happen if there's a "-g" in the spec.

@vmg Owner
vmg added a note

It looks like a text-book example of using a regex. If we pre-compile it just once, the performance hit will be minimal.

@vmg Owner
vmg added a note

Here's a good starting point

/([\.\w\/\-]+)-(\d+)-g([\da-fA-f]{3,})/
@ben
ben added a note

Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/revparse.c
((132 lines not shown))
+ return GIT_ERROR;
+ }
+
+ /* "@{-N}" form means walk back N checkouts. That means the HEAD log. */
+ if (refspeclen == 0 && !git__prefixcmp(reflogspec, "@{-")) {
+ if (git__strtol32(&n, reflogspec+3, NULL, 0) < 0 ||
+ n < 1) {
+ giterr_set(GITERR_INVALID, "Invalid reflogspec %s", reflogspec);
+ return GIT_ERROR;
+ }
+ git_reference_lookup(&ref, repo, "HEAD");
+ git_reflog_read(&reflog, ref);
+ git_reference_free(ref);
+
+ refloglen = git_reflog_entrycount(reflog);
+ for (i=0; i < refloglen; i++) {
@arrbee Owner
arrbee added a note

I think you are walking the reflog from oldest to newest here instead of newest to oldest so -1 is grabbing the first checkout instead of the last.

@ben
ben added a note

Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/revparse.c
((327 lines not shown))
+ /* {/...} -> Walk all commits until we see a commit msg that matches the phrase. */
+ if (movement[1] == '/') {
+ int retcode = GIT_ERROR;
+ git_revwalk *walk;
+ if (!git_revwalk_new(&walk, repo)) {
+ git_oid oid;
+ regex_t preg;
+ git_buf buf = GIT_BUF_INIT;
+
+ git_revwalk_sorting(walk, GIT_SORT_TIME);
+ git_revwalk_push(walk, git_object_id(obj));
+
+ /* Extract the regex from the movement string */
+ git_buf_put(&buf, movement+2, strlen(movement)-3);
+
+ if (!regcomp(&preg, git_buf_cstr(&buf), REG_EXTENDED)) {
@vmg Owner
vmg added a note

error handling? See giterr_set_regex.

@ben
ben added a note

Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/revparse.c
((127 lines not shown))
+ size_t refspeclen = strlen(refspec);
+
+ if (git__prefixcmp(reflogspec, "@{") != 0 ||
+ git__suffixcmp(reflogspec, "}") != 0) {
+ giterr_set(GITERR_INVALID, "Bad reflogspec '%s'", reflogspec);
+ return GIT_ERROR;
+ }
+
+ /* "@{-N}" form means walk back N checkouts. That means the HEAD log. */
+ if (refspeclen == 0 && !git__prefixcmp(reflogspec, "@{-")) {
+ if (git__strtol32(&n, reflogspec+3, NULL, 0) < 0 ||
+ n < 1) {
+ giterr_set(GITERR_INVALID, "Invalid reflogspec %s", reflogspec);
+ return GIT_ERROR;
+ }
+ git_reference_lookup(&ref, repo, "HEAD");
@arrbee Owner
arrbee added a note

Think you need to be checking the error return value here and on the next couple lines.

@ben
ben added a note

Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/revparse.c
((141 lines not shown))
+ }
+ git_reference_lookup(&ref, repo, "HEAD");
+ git_reflog_read(&reflog, ref);
+ git_reference_free(ref);
+
+ refloglen = git_reflog_entrycount(reflog);
+ for (i=0; i < refloglen; i++) {
+ const char *msg;
+ entry = git_reflog_entry_byindex(reflog, i);
+
+ msg = git_reflog_entry_msg(entry);
+ if (!git__prefixcmp(msg, "checkout: moving")) {
+ n--;
+ if (!n) {
+ char *branchname = strrchr(msg, ' ') + 1;
+ git_buf_printf(&buf, "refs/heads/%s", branchname);
@arrbee Owner
arrbee added a note

Unfortunately, the contents of the checkout lines in the reflog are not guaranteed to be a branch name. In my reflog for libgit2, I found: <sha>^0, <sha>, <sha prefix>, FETCH_HEAD, and <branch>

@ben
ben added a note

Fixed! Using a regex to pull out the desired value, and doing a full lookup on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Ben Straub added some commits
Ben Straub Rev-parse: @{time} syntax.
Ported date.c (for approxidate_careful) from git.git
revision aa39b85. Trimmed out the parts we're not
using.
a8c3850
Ben Straub Fixing broken tests. 641ff7e
@travisbot

This pull request fails (merged 641ff7e into 821f6bc).

@travisbot

This pull request fails (merged fc6221d into 821f6bc).

@travisbot

This pull request fails (merged 6309f9e into 821f6bc).

@arrbee
Owner

@benstraub is on :fire: Fixed! :punch: Fixed! :punch: Fixed! :punch:

@vmg
Owner

:sparkles::sparkles::sparkles::sparkles::sparkles::sparkles::sparkles::sparkles:

@ben

Closing this PR. Stay tuned for a new one that's rebased on top of development.

EDIT: Check out #684! It's :hotsprings:!

@ben ben closed this
@ben ben referenced this pull request
Merged

Rev parse #684

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Apr 27, 2012
  1. First stab at implementation of rev-parse.

    Ben Straub authored
    This version supports refspecs of these kinds:
    - Full & partial SHAs
    - Output from "git describe"
    - "/refs/heads/master" (full ref names)
    - "master" (partial ref names)
    - "FETCH_HEAD" (named heads)
  2. Simpler states and initial structure.

    Ben Straub authored
    New tests for "foo^2" syntax, but they don't pass
    yet. Support for chaining these, i.e.
    "foo^2~3^{u}~1' is starting to shape up.
  3. Implemented partial caret syntax for rev-parse.

    Ben Straub authored
    Supported forms:
    - "^n"
    - "^0"
    - "^"
    
    Still missing: all of the "^{…}" variants.
  4. Implemented rev-parse's "^{}" syntax.

    Ben Straub authored
  5. Implemented rev-parse "^{type}" syntax.

    Ben Straub authored
  6. Returning error if dereferencing operation fails.

    Ben Straub authored
  7. Removed goto from state machine loop.

    Ben Straub authored
  8. Implementing rev-parse's "ref~2" syntax.

    Ben Straub authored
    Also extended the test suite to include chaining
    operators, e.g. "master^2~3^4".
Commits on Apr 30, 2012
  1. Adding comment documentation for rev-parse api.

    Ben Straub authored
Commits on May 1, 2012
  1. Implementing rev-parse's ref@{n} and @{-n} syntaxes.

    Ben Straub authored
    Added some reflags to the test repo to support
    unit tests.
  2. Rev-parse chaining: adding the longest chain

    Ben Straub authored
    in the test repo.
  3. Incorporating feedback from @tanoku.

    Ben Straub authored
    Removed repeated strlen's, and unnecessary loop-termination variable.
Commits on May 2, 2012
  1. Rev-parse: plugging (most) memory leaks.

    Ben Straub authored
Commits on May 3, 2012
  1. Fixed last 2 memory leaks in rev-parse.

    Ben Straub authored
  2. Rev-parse: "ref@{upstream}" syntax.

    Ben Straub authored
    Added tracking configuration to the test repo's
    config to support unit tests.
  3. Fixing legacy unit tests.

    Ben Straub authored
    New functionality introduced some new refs and
    tags in the test repo.
Commits on May 4, 2012
Commits on May 7, 2012
  1. Rev-parse: "ref^{/regex}" syntax.

    Ben Straub authored
Commits on May 10, 2012
  1. Rev-parse: @{time} syntax.

    Ben Straub authored
    Ported date.c (for approxidate_careful) from git.git
    revision aa39b85. Trimmed out the parts we're not
    using.
  2. Fixing broken tests.

    Ben Straub authored
  3. Simplifying revparse_lookup_fully_qualified_ref.

    Ben Straub authored
  4. Rev-parse: now capturing and reporting regex errors.

    Ben Straub authored
  5. Plugging memory leak.

    Ben Straub authored
  6. Rev-parse: regex check for "git describe" output.

    Ben Straub authored
  7. Rev-parse: proper error checking.

    Ben Straub authored
Something went wrong with that request. Please try again.