Fix tags not properly indexed in Lucene #291

Merged
merged 1 commit into from Jul 30, 2015

Projects

None yet

2 participants

@plamentotev
Contributor

All tags that reference to particular object are stored in a list within a map.
There is a if statement that inits empty list when there are not references
to the current object yet, but the boolean expression checks for the wrong value.

@plamentotev plamentotev Fix tags not properly indexed in Lucene
All tags that reference to particular object are stored in a list within a map.
There is a if statement that inits empty list when there are not references
to the current object yet, but the boolean expression checks for the wrong value.
d0bb38c
@plamentotev
Contributor

Hi,

I opened this pull request to start a discussion more than to submit a patch.

The tags map contains all tags that point to particular commit. The key of the map is the commit SHA-1 but the if statement checks if the map contains the SHA-1 of the tag. That's a bug but there are two more things I've noticed.

  1. updateIndex contains similar to reindex code but there the tags map is only build but not used. Still can get a NPE exception (see 2. for more details). So it should be fixed there too or deleted.
  2. Tags can point to arbitrary object not only commits. In particular they can point to other tags. Lets say me have commit c, tag tag-a that points to c and tag tag-b that points to tag-a. We have something like that tag-b -> tag-a -> c. With the current code (with the patch applied) in this case the tags will have the following content { c=[tag-a], tag-a=[tag-b] } . (without the patch we may get NPE, that's how I found the bug). Is this is the intended behavior? To me it's seems more logical to have { c=[tag-a, tag-b] }. This could be achieved if RevWalk#peel is used instead of getReferencedObjectId. If the tag is annotated(non-annotated are skipped so we are not interested in them anyway) RevWalk#peel returns the first non-tag object. That's not necessary a commit object btw (not sure if this a problem - I think it's not but I''m not sure). For a real world example you can look at git git repo - v1.0rc6 points to v0.99.9n
@plamentotev
Contributor

And also if I may ask (just out of curiosity) - why only the annotated tags are indexed?

@gitblit
Owner
gitblit commented Jul 16, 2015

The key of the map is the commit SHA-1 but the if statement checks if the map contains the SHA-1 of the tag.

Oops. Good catch.

To me it's seems more logical to have { c=[tag-a, tag-b] }.

I agree and that was the intention.

That's not necessary a commit object btw (not sure if this a problem - I think it's not but I''m not sure)

Generally my answer is the indexer is trying to cover the 98% use-case of tags. While it is possible to tag a non-commit object, as you point out, I believe it to be an infrequently used capability. The indexer indexes commits (with and without tags) and blobs but it does not currently consider that a blob may be tagged. The JGit repo has a good example which tags Shawn's public GPG key. I'd add a Github link if they supported non-commit tags at all. :)

I believe lightweight tags are intended to be used client-side not server-side due to their transitive nature and lack of tagger metadata. For this reason, lightweight tags are deliberately avoided by the indexer.

If we did want to support them, the indexing process would have to be revised. Lucene does not have the ability to update a document so we'd have to destroy all documents which reference the lightweight tag and then re-index the affected objects.

@plamentotev
Contributor

Thanks for the answers.

The indexer indexes commits (with and without tags) and blobs but it does not currently consider that a blob may be tagged

That's exactly what I had in mind. RevWalk#peel may return blob that is not commit object but it will be ignored so it should not not be a problem.

The JGit repo has a good example which tags Shawn's public GPG key. I'd add a Github link if they supported non-commit tags at all. :)

That's very nice functionality. Using tags to point to GPG keys seems like a common use case. But when I click on the object link I get an internal server error.

@gitblit gitblit merged commit 234abbd into gitblit:develop Jul 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment