Lucene decoding does not work when reindexing #408

Closed
gitblit opened this Issue Aug 12, 2015 · 5 comments

gitblit commented Aug 12, 2015

Originally reported on Google Code with ID 112

Indexing of blob content is sometimes incorrect (it does work some of the time).
The content is always decoded as UTF-8.

It turns out that there are two indexing methods: index(), which works, and
reindex(), which doesn't. If I drop the Lucene index, reindex() is used.

Fix attached

Reported by robin.rosenberg on 2012-07-23 14:26:49


- _Attachment: [0001-Fix-the-LuceneExecutor.reindex-to-decode-blobs-the-s.patch](https://storage.googleapis.com/google-code-attachments/gitblit/issue-112/comment-0/0001-Fix-the-LuceneExecutor.reindex-to-decode-blobs-the-s.patch)_

gitblit commented Aug 12, 2015

Oops.  I pushed a lighter-weight fix for this which addresses the problem without having
to use JGitUtils.  Thanks for the report!

Reported by James.Moger on 2012-07-24 00:24:12

  • Status changed: Queued
  • Labels added: Milestone-1.0.1

gitblit commented Aug 12, 2015

The idea was to reduce code duplication in the same fix.

Reported by robin.rosenberg on 2012-07-24 08:45:16


gitblit commented Aug 12, 2015

I do like reusing code, but jumping to that method in JGitUtils opens a new revwalk,
a new treewalk with a path filter, and doesn't reuse any byte buffers for what is a
memory-consuming process.  The other getStringContent would be a better match, but
it still has to perform an unnecessary lookup.  To my mind it is better to keep the
8 lines of code which decode a blob from the repository in the Lucene indexer.
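
As a rough illustration of what decoding a blob without assuming UTF-8 can look like, here is a minimal sketch using only the JDK. It is not the Gitblit code: the class name `BlobDecoder`, the method `decode`, and the idea of trying a caller-supplied list of charsets in order are assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Gitblit or JGit.
class BlobDecoder {

    // Tries each candidate charset in order and returns the first strict decode
    // that succeeds. Falls back to ISO-8859-1, which accepts any byte sequence.
    public static String decode(byte[] content, Charset... candidates) {
        for (Charset cs : candidates) {
            try {
                return cs.newDecoder()
                        // REPORT makes invalid input throw instead of being
                        // silently replaced with U+FFFD.
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(content))
                        .toString();
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next candidate.
            }
        }
        return new String(content, StandardCharsets.ISO_8859_1);
    }
}
```

With candidates `UTF_8, ISO_8859_1`, a blob holding the single byte `0xE9` for "é" fails the strict UTF-8 decode and is picked up by the second charset, whereas a decoder hard-wired to UTF-8 would index a replacement character instead.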

It should be noted that the strategy differs slightly between index() and reindex().
index() is for incrementally updating branches and blobs and is executed in response to
pushed commits. It delegates most git ops to JGitUtils, which I think is reasonable.
reindex() is for ground-zero indexing, which is expensive, so it uses revwalks, treewalks,
etc. directly, in a way that is optimal for indexing the entire repository.
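
The buffer-reuse point can be sketched without JGit. The following is an illustrative example only, assuming each blob is available as an `InputStream`; the class name `ReusedBufferReader`, the method `readFully`, and the buffer size are hypothetical and not taken from Gitblit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: reads many streams while allocating the scratch
// buffer and the accumulating output stream only once.
class ReusedBufferReader {
    private final byte[] tmp = new byte[32 * 1024];
    private final ByteArrayOutputStream os = new ByteArrayOutputStream();

    // Returns the full content of the stream as a fresh byte array.
    public byte[] readFully(InputStream in) {
        os.reset(); // discard the previous blob but keep the allocated capacity
        try {
            int n;
            while ((n = in.read(tmp)) > 0) {
                os.write(tmp, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }
}
```

A full reindex would create one such reader and call `readFully` once per blob, so only the returned array is allocated per blob. The class is not thread-safe, since the buffers are shared between calls.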

Reported by James.Moger on 2012-07-24 12:25:40


gitblit commented Aug 12, 2015

Reported by James.Moger on 2012-08-20 02:06:35

  • Labels added: Milestone-1.1.0
  • Labels removed: Milestone-1.0.1

gitblit commented Aug 12, 2015

Fix/change released in 1.1.0.

Reported by James.Moger on 2012-08-25 12:20:42

  • Status changed: Fixed

@gitblit gitblit closed this Aug 12, 2015

@fzs fzs modified the milestone: 1.1.0 Dec 13, 2016
