IllegalArgumentException when adding a file longer than 2 GiB #2560
Comments
In general, it never hurts to submit an issue for a problem; however, be prepared to do your homework with respect to investigation. The Subversion problem might be a local one. The …
For svn, you can go to the directory …
Thanks for the suggestion. It would seem we're getting authentication errors, most likely because I'm running the OpenGrok indexer under an account that does not have access to the SVN repository.
There is a (clunky) way to pass a username/password to the Subversion process, using the …
Can you try to get more info about the …
(FYI: I accidentally closed and reopened this issue.) The file contains secret information about our company, but it's a 3 GB text file inside a 300 MB .gz file. Is there anything I could check without sharing the actual file itself?
Is there a stack trace in the log associated with the … The logger used in this case should log one, I believe, as it is called like this from …:

```java
try {
    if (alreadyClosedCounter.get() > 0) {
        ret = false;
    } else {
        pctags = ctagsPool.get();
        addFile(x.file, x.path, pctags);
        successCounter.incrementAndGet();
        ret = true;
    }
// ...
} catch (RuntimeException|IOException e) {
    String errmsg = String.format("ERROR addFile(): %s",
        x.file);
    LOGGER.log(Level.WARNING, errmsg, e);
    x.exception = e;
    ret = false;
} finally {
    // ...
}
```

This is because … The exception likely comes from one of the analyzers, …:

```java
if (fa != null) {
    Genre g = fa.getGenre();
    if (g == Genre.PLAIN || g == Genre.XREFABLE || g == Genre.HTML) {
        doc.add(new Field(QueryBuilder.T, g.typeName(), string_ft_stored_nanalyzed_norms));
    }
    fa.analyze(doc, StreamSource.fromFile(file), xrefOut);
    // ...
}
```

In your case it could be …
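For readers less familiar with `java.util.logging`: passing the exception as the last argument of `Logger.log(Level, String, Throwable)` attaches it to the `LogRecord`, and the standard `SimpleFormatter` prints its stack trace after the message. The sketch below checks that with a hypothetical in-memory handler; the logger name, file path and simulated exception are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical demo: logger name, file path and exception are illustrative only.
class StackTraceLoggingDemo {

    /** Handler that keeps every record in memory instead of writing it out. */
    static final class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Logs a simulated per-file failure with the same call shape as the quoted indexer code. */
    static List<LogRecord> logFailure(String file) {
        Logger logger = Logger.getLogger("demo.indexer");
        logger.setUseParentHandlers(false); // do not also print to the console
        CollectingHandler handler = new CollectingHandler();
        logger.addHandler(handler);
        try {
            throw new IllegalArgumentException("simulated analyzer failure");
        } catch (RuntimeException e) {
            String errmsg = String.format("ERROR addFile(): %s", file);
            logger.log(Level.WARNING, errmsg, e); // e becomes record.getThrown()
        } finally {
            logger.removeHandler(handler);
        }
        return handler.records;
    }

    public static void main(String[] args) {
        LogRecord record = logFailure("/src/huge.txt").get(0);
        System.out.println(record.getMessage());
        System.out.println(record.getThrown());
    }
}
```

So if the log shows the `ERROR addFile()` warning, the stack trace should be printed right after it unless the logging configuration uses a formatter that drops the throwable.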
Also, it may be worth bisecting the original file (assuming the exception is caused by the contents and not by the compressed image) to see whether you can find the spot that causes the problem.
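A minimal sketch of the bisection idea, assuming the failure is monotonic in prefix length (once the first n bytes trigger the exception, every longer prefix does too). The `fails` predicate is hypothetical: in practice it would mean extracting the first n bytes of the decompressed file, indexing them, and checking whether `addFile()` logged an error.

```java
import java.util.function.LongPredicate;

// Binary search over prefix lengths. `fails` is a hypothetical callback that
// reports whether indexing the first n bytes reproduces the exception.
class PrefixBisect {

    /**
     * Returns the smallest n in [1, total] with fails.test(n) == true,
     * or -1 if total < 1 or even the full input does not fail.
     */
    static long smallestFailingPrefix(long total, LongPredicate fails) {
        if (total < 1 || !fails.test(total)) {
            return -1;
        }
        long lo = 1;
        long hi = total; // invariant: fails.test(hi) is true
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2; // avoids overflow of lo + hi
            if (fails.test(mid)) {
                hi = mid;       // answer is mid or smaller
            } else {
                lo = mid + 1;   // answer is strictly larger than mid
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Simulated bug: anything longer than Integer.MAX_VALUE bytes fails.
        long n = smallestFailingPrefix(3_000_000_000L, len -> len > Integer.MAX_VALUE);
        System.out.println(n); // 2147483648
    }
}
```

For a 3 GB input this takes about 32 probes (log2 of 3×10^9 is roughly 31.5), and each probe is a full indexing run of that prefix, so it is slow but bounded.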
Unfortunately, someone should never have checked a 300 MB compressed (3 GB uncompressed) text file like this into our repo. I have no desire to get OpenGrok to index the file, but if you need me to debug it for future development, I will. I was planning to either ignore the file or delete it. Here is the stack trace.
If you are running an OpenGrok version older than 1.1-rc80, chances are you are hitting the issue fixed in changeset 3e49081. In general, the Java lexer classes generated from the .lex descriptions should be changed so that they do not accept overly long tokens.
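One way to picture "do not accept overly long tokens", sketched as a toy tokenizer that discards any run of letters/digits longer than `maxLen` instead of emitting it as a single huge token. The limit and the splitting rule are assumptions for illustration; this is not the code from changeset 3e49081 or from OpenGrok's JFlex lexers.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer, not OpenGrok code: maxLen and the alphanumeric splitting
// rule are illustrative assumptions.
class BoundedTokenizer {

    /** Splits on non-alphanumeric chars and drops any token longer than maxLen. */
    static List<String> tokenize(String text, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                i++; // skip separators
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++; // consume one run of letters/digits
            }
            int len = i - start;
            if (len > 0 && len <= maxLen) {
                tokens.add(text.substring(start, i));
            } // runs longer than maxLen are skipped, not emitted
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "ab " + "x".repeat(300) + " cd";
        System.out.println(tokenize(input, 255)); // [ab, cd]
    }
}
```

A real generated lexer would have to bail out while scanning, before the over-long match is buffered; the sketch only shows the filtering rule, not that memory concern.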
Or perhaps this is actually a bug triggered by a file size greater than 2 GiB:

```java
/**
 * Clears, and then resets the instances attributes per the specified
 * arguments.
 * @param str the matched symbol
 * @param start the match start position
 * @param end the match end position
 */
protected void setAttribs(String str, int start, int end) {
    clearAttributes();
    //FIXME increasing below by one(default) might be tricky, need more analysis
    // after lucene upgrade to 3.5 below is most probably not even needed
    this.posIncrAtt.setPositionIncrement(1);
    this.termAtt.setEmpty();
    this.termAtt.append(str);
    this.offsetAtt.setOffset(start, end);
}
```

The trouble starts in …:

```java
private final int start;
private final int end;
```

and bubbles up to … The trouble is that JFlex's …
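A minimal sketch of the suspected overflow, assuming offsets travel as Java `int`s like the `start`/`end` fields quoted above: a token that straddles `Integer.MAX_VALUE` (2,147,483,647) gets an end offset that wraps negative, and a check of the kind Lucene's `OffsetAttribute.setOffset()` performs (start non-negative, end not before start) then throws `IllegalArgumentException`. `checkOffsets()` is a stand-in written for this example, not Lucene code.

```java
// Illustrative only: checkOffsets() mimics an offset sanity check; it is not Lucene code.
class OffsetOverflowDemo {

    /** Rejects negative or decreasing offsets with IllegalArgumentException. */
    static void checkOffsets(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException(
                    "invalid offsets: start=" + start + ", end=" + end);
        }
    }

    /** Narrows a 64-bit position to int, which is what an int-typed offset amounts to. */
    static int asIntOffset(long position) {
        return (int) position; // silently wraps once position exceeds Integer.MAX_VALUE
    }

    public static void main(String[] args) {
        long tokenStart = Integer.MAX_VALUE - 2L; // 2147483645, just under 2 GiB
        long tokenEnd = tokenStart + 5;           // 2147483650, past the int range
        int start = asIntOffset(tokenStart);      // 2147483645
        int end = asIntOffset(tokenEnd);          // -2147483646 after wrapping
        try {
            checkOffsets(start, end);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Everything before the 2^31 boundary passes the check, which fits the observation that only files larger than 2 GiB (after decompression) fail.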
In the meantime, we could limit the maximum size of files to 2 GiB. Maybe it is time to revisit #534.
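Actually, limiting the input file size cannot work, given how the GZip analyzer works: it is based on streams, so the size of the compressed file on disk (here 300 MB) says little about how much text it expands to (here 3 GB).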
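To illustrate why the on-disk size is a poor proxy: the sketch compresses a made-up, highly repetitive sample in memory and then counts the decompressed bytes by streaming through `GZIPInputStream`, which is how a stream-based analyzer would see the data. Real compression ratios depend entirely on the content.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: the sample text and sizes are made up.
class GzipSizeDemo {

    /** Compresses a byte array in memory. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // complete only after gz has been closed
    }

    /** Counts decompressed bytes by reading the whole stream. */
    static long uncompressedSize(byte[] gzData) {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzData))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // 34 bytes x 100,000 = 3,400,000 bytes of repetitive text.
        byte[] text = "some log line repeated many times\n".repeat(100_000)
                .getBytes(StandardCharsets.UTF_8);
        byte[] gz = gzip(text);
        System.out.println("compressed:   " + gz.length + " bytes");
        System.out.println("uncompressed: " + uncompressedSize(gz) + " bytes");
    }
}
```

The gzip trailer does record the uncompressed length, but only modulo 2^32 (the ISIZE field in RFC 1952), so it cannot be trusted for inputs of 4 GiB or more.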
We've been using OpenGrok for many years and I've come to accept the 'warning' messages that show up in the logs. That being said, at what point are they "bugs", or things you want to know about and that I/we should report?
Examples: