Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store and verify AnalyzerGuru and Analyzer versions #2066

Merged
merged 2 commits into from
May 9, 2018

Conversation

idodeclare
Copy link
Contributor

Hello,

Please consider for integration this patch to store and verify version numbers for AnalyzerGuru, FileAnalyzer, and FileAnalyzer subclasses so that re-analysis can be done in a targeted manner as needed upon certain reconfigurations or upon deploying a new OpenGrok release.

AnalyzerGuru version number is a 64-bit value composed from:

  • a static 32-bit value incremented by developers upon the addition of a new FileAnalyzer subclass (as for the patches in queue to support Forth [4be83daf29] and Objective-C [fbd7de4f249d]) or when existing matching logic is altered (e.g., changing a supported suffix for Fortran).
  • a dynamic 32-bit value composed deterministically from users' customizing prefix/extension matching by specifying -A,--analyzer or by using the new OPENGROK_ASSIGNMENTS environment variable.

On finding a mismatch of AnalyzerGuru version for a stored Document, OpenGrok will re-determine the appropriate analyzer for the file (possibly changing e.g. from PlainAnalyzer to ObjectiveCAnalyzer) or decide that the existing analyzer is appropriate and do nothing.

FileAnalyzer and subclasses' version number is a 64-bit value composed from:

  • a static 32-bit value defined in the FileAnalyzer base class that applies to all analyzers.
  • a static 32-bit value defined for each specific FileAnalyzer subclass.

On finding a mismatch of analyzer version, OpenGrok will re-analyze a file. Such a targeted version update might be used e.g. when fixing a specific support for a specific language's syntax.

(Updating the base class FileAnalyzer version will be done very rarely when new developments require that all files be re-analyzed but without requiring users to manually delete their own indexes.)

Thank you.

@tarzanek tarzanek self-assigned this Apr 19, 2018
@tarzanek tarzanek added this to the 1.1 milestone Apr 19, 2018
@tarzanek
Copy link
Contributor

ok, I think this PR is a good step towards the generic upgrade code, that will allow incremental update from old lucene versions ...
lucene itself has code to upgrade indexes, we just lack versions of xrefs and this is a good step towards versioning of xrefs ...

@tarzanek
Copy link
Contributor

ok, lgtm, I am curious on size impact on index for huge projects

so getSpecializedVersionNo will need to be upgraded whenever you change the analyzer ... OK

@tarzanek
Copy link
Contributor

the only thing that comes to my mind if this update of version number won't be done automagically (e.g. from timestamp of class?)

@tarzanek
Copy link
Contributor

but I guess we can start like this ...
another optimization might be to somehow dedup this (if it is needed) ... will the version be stored in index for lots of documents?
would it make sense to have just a mapping table for the type to version?
(this would have the implication that the index would need a FULL update of all documents to upgraded analyzer version ... which might be a prob ... anyways, I am just thinking aloud)

@tarzanek
Copy link
Contributor

tarzanek commented Apr 19, 2018

other than above musings, I'd say let's go and merge ...
@vladak @tulinkry @Orviss any comments?
otherwise I will merge and we will have to live with the change ;)

@tarzanek
Copy link
Contributor

(with the motto better to have a small change in right direction than no change at all :-D )

@idodeclare
Copy link
Contributor Author

Thanks, @tarzanek, for the review. Your question made me realize that the analyzer version values can be stored in various fields in the now-existing metadata document from 39bf187 — so I’ll work on that.

... with upgrader from binary serialization of v1
IndexAnalysisSettings.
@idodeclare
Copy link
Contributor Author

@tarzanek , I pushed that change to avoid redundant storage of analyzer version.

the only thing that comes to my mind if this update of version number won't be done automagically (e.g. from timestamp of class?)

No not easily possible automatically I'm afraid. The version number is stored on the analyzer class but is representing possible changes in multiple objects related to the language; e.g. to related Xref and SymbolTokenizer .lex files or to symbol Consts, etc. so the developer will have to intelligently decide when to bump the language version (and code reviewers will have to double check this as well).

@idodeclare
Copy link
Contributor Author

Ping, @tarzanek

@tarzanek
Copy link
Contributor

tarzanek commented May 3, 2018

sorry for the delays, been stuck in with another nosql big data problem :) ,
thank you for the ping, checking the review ...

@tarzanek
Copy link
Contributor

tarzanek commented May 9, 2018

looks OK to me, not sure if the indexanalysis settings upgrader and new class is needed
since there was no release since first indexanalysis class was released I would say we don't need it, but then it is a good precedence we might want to test for future upgrades
merging, thank you for your patience Chris

@tarzanek tarzanek merged commit 78635c7 into oracle:master May 9, 2018
@idodeclare idodeclare deleted the feature/g_z_vers branch May 10, 2018 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants