Parsing long files #791

Closed
philiprbrenan opened this issue Dec 7, 2015 · 25 comments

Comments

@philiprbrenan

I have a large Java file (4.8MB). If I insert a new { near the start of this file, the run time for the parser becomes very long (5 minutes or so) - while the parser is running the editor is unusable. Is there some way to prevent this from happening? By parser I mean the code that determines the coloration of the keywords, strings, etc. Thanks!

@b4n
Member

b4n commented Dec 7, 2015

Hum, that's odd. 4.8M doesn't look very large, and certainly nothing that would require 5 minutes to process. Could you provide the file so we can check what's going on?

Also, which version of Geany are you using, and on which OS?

Anyway, to work around this you can try a few things:

  • Disable real-time symbol parsing (set Edit → Preferences → Editor → Completions → Symbol list update frequency to 0). This would prevent updating the Symbols pane while typing, so if it's the bottleneck it should help a lot. Note however that the symbols will still be extracted when saving the file.

  • If the above helped but it's still too slow when saving the file, you can disable symbol parsing for the Java filetype by setting the tag_parser settings option to an empty value in filetypes.java:

    [settings]
    tag_parser=
    

    This is kind of a hack but it should disable all symbol parsing for Java files.

  • The nuclear option, disabling all filetype-specific features (highlighting, symbol parsing, etc.), is to use the filetype None for Java files.
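The tag_parser workaround above can also be scripted. What follows is a minimal sketch in Python, assuming that per-user filetype overrides live in ~/.config/geany/filedefs (the usual location on Linux; other platforms may differ). The function name and the default path are illustrative and not part of Geany:

```python
from pathlib import Path

# Assumed default location of per-user filetype overrides on Linux.
DEFAULT_FILEDEFS = Path.home() / ".config" / "geany" / "filedefs"

# The two lines from the workaround: an empty tag_parser value.
OVERRIDE = "[settings]\ntag_parser=\n"


def write_java_override(filedefs_dir: Path = DEFAULT_FILEDEFS) -> Path:
    """Create filetypes.java with an empty tag_parser, never overwriting."""
    filedefs_dir.mkdir(parents=True, exist_ok=True)
    target = filedefs_dir / "filetypes.java"
    if target.exists():
        # Leave an existing override alone; edit it by hand instead.
        raise FileExistsError(f"{target} already exists")
    target.write_text(OVERRIDE, encoding="utf-8")
    return target
```

Calling write_java_override() once and restarting Geany should apply the setting; deleting the generated file should undo it.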

@codebrainz
Member

If it's a 5MB source code file, I'm guessing it's auto-generated (or else Java is way more verbose than I imagined). If so, it could be all on one line, which would make it really slow. If it's not all on one line but has many long lines and long-line wrapping is on, it could also take a long time.

@philiprbrenan
Author

It is mainly hand written and so most of the lines are 40-120 chars long. Line wrapping is off. However there are some generated lines included: the longest of which is 4205 chars long. Java Lint takes 6 seconds. The parser black-out when it occurs typically takes about 5 minutes to recover. If I understand you correctly, you are suggesting breaking up the generated lines or better removing them to a separate file? Geany built on or after 2015-07-13.

I appreciate your time and thoughts on this matter.

Thanks,

Phil

Philip R Brenan


@codebrainz
Member

If I understand you correctly, you are suggesting breaking up the generated lines or better removing them to a separate file?

Well, Scintilla is known not to handle really long lines well (e.g. minified JS), but "line wrapping" (in the Document menu) also causes a major performance hit. I would try to disable the latter, if you enabled it, before messing with the code.

Edit: NVM, I didn't notice you said "Line wrapping is off".

@elextr
Member

elextr commented Dec 8, 2015

@philiprbrenan did you try the suggestions by @b4n, to identify whether it's the symbol parser or the highlighting lexer?

@philiprbrenan
Author

Following your suggestions:

  • Disable real-time symbol parsing: this prevents the problem from occurring.

  • Disable symbol parsing at save: I saved with an extraneous { in place and the save was very slow. However, this is less of a problem because I can control when a save occurs and just not save while the extra bracket is outstanding.

  • Disable all filetype-specific features: this works as suggested. This was how I was getting around the problem earlier, but it takes time to set and unset, and one forgets to do it in advance, so it is an error-prone process.

I appreciate your help!

Thanks,

Phil

Philip R Brenan


@philiprbrenan
Author

Test.java.zip

When I save the attached Java file of 2K lines it takes about 30 seconds to save during which time Geany is unresponsive. Lint of course finds lots of errors very quickly and bails out after less than a second. The brackets match correctly, but class test1 is defined multiple times.

If I shorten the file the parse blackout problem disappears at around 600 lines - although there is still a noticeable pause before the file is reported as saved in the Messages tab.

@elextr
Member

elextr commented Dec 8, 2015

Looks like the particular file is encountering pathological worst case performance of the ctags parser or symbol handling software.

Improvements are welcome.

@codebrainz
Member

It's not that slow here, maybe 0.5-0.75 seconds to save, but my computer is really fast. It seems to perform better if replacing { with \n{\n so that the lines aren't so long (or so that fold points are more spread out, not sure).

Edit: I didn't notice you didn't attach the whole 2K line file as said. It takes about 3-4 seconds when I expand the file to ~2500 lines, and while pasting copies of it, I got it to lock up for about 10 seconds.
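The { to \n{\n replacement mentioned above can be tried on a scratch copy of the file with a few lines of Python. This is only a sketch for experimenting; the function name is made up, and the replacement is deliberately naive:

```python
def split_braces(source: str) -> str:
    """Put every opening brace on its own line.

    Naive on purpose: braces inside string literals or comments are
    rewritten too, so use this only on throwaway test copies.
    """
    return source.replace("{", "\n{\n")


print(split_braces("class A { void f() { } }"))
```

Whether the speed-up comes from the shorter lines or from the fold points being spread out is not settled by this; it only makes the experiment repeatable.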

@techee
Member

techee commented Dec 11, 2015

Maybe one more suggestion - do you experience the same slowness if you switch from the "Symbols" tab in the sidebar to something else? IMO the parser should be fast in this case but I suspect the symbols tree generation is the slow one here.

@philiprbrenan
Author

philiprbrenan commented Dec 11, 2015 via email

@techee
Member

techee commented Dec 11, 2015

Sorry, maybe I said it in a confusing way - in the sidebar select e.g. Documents instead of Symbols. The thing is that when the Symbols tree isn't shown, it isn't rebuilt when you type. I believe this might fix the problem you mentioned in your first post about the freeze when typing { at the beginning of the file. But I doubt it will have any effect on saving.

Could you provide some bigger file for testing? The file you provided is just 100 lines and is insufficient to trigger the issue for me.

@techee
Member

techee commented Dec 11, 2015

OK, never mind, I can reproduce it when copy-pasting the lines in your file several times. I'll have a look at the profiler output to see if I can spot something.
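For anyone else trying to reproduce this, the copy-pasting can be automated. A minimal sketch in Python, where the function name and the file names in the usage note are hypothetical:

```python
from pathlib import Path


def repeat_file(src: Path, dst: Path, copies: int) -> int:
    """Write `copies` back-to-back copies of src to dst; return line count."""
    text = src.read_text(encoding="utf-8")
    if not text.endswith("\n"):
        text += "\n"  # keep copies from running together on one line
    dst.write_text(text * copies, encoding="utf-8")
    return text.count("\n") * copies
```

For example, repeat_file(Path("Test.java"), Path("TestBig.java"), 40) would turn a sample of about 100 lines into roughly 4000 lines. The result will not compile because the classes are defined repeatedly, but that was already true of the original test file.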

@philiprbrenan
Author

philiprbrenan commented Dec 11, 2015 via email

@techee
Member

techee commented Dec 11, 2015

Alright, I just tried with the profiler, and a big part of the problem should be solved by the patch here:

techee@a11f82b

It doesn't fix the slowness completely but at least it should fix the non-linear part of it. With about 4000 lines from your example, about 75% of the time was spent rehighlighting the document. The remaining 25% was spent in the parser (I'm afraid we cannot do much about this).

If you are able to recompile, could you try the patch and see if it helps?

@philiprbrenan
Author

This is absolutely marvellous - I will try to do this - as this is my first attempt at patching and compiling it will take a day or two as I will have to discover the exact procedure.

One of the (many) reasons I use Geany for Java in preference to using the standard Java tools is that Geany scales much better on large files: it responds quickly and predictably with far fewer distracting glitches and unexpected slowdowns. I greatly appreciate your efforts to eliminate this problem - it makes a big difference.

Thanks,

Phil

Philip R Brenan


@techee
Member

techee commented Dec 11, 2015

By the way, I've been playing with the file a bit more and I can see quite some time spent in the symbol tree building too (switch to the Documents tab, edit the file so some symbols get added/removed and switch back to the Symbols tree - it takes quite some time to rebuild it).

In the past I was suggesting we should limit the number of entries in the tree to some sane number, say 10000 entries (your file contains 234 entries per line, with 1000 lines it becomes 234000 entries), because the current implementation doesn't scale very well:

#475 (comment)

I think we should introduce some limit.

Yes, scalability is one of my favorite Geany features too (not only for big files but also for big projects with thousands of files). So I'm definitely interested in improving any code that doesn't scale well.

@techee
Member

techee commented Dec 11, 2015

@philiprbrenan By the way, the patch removes a single line from the code, so instead of pulling the patch you can just get Geany from master and comment out that single line, which might be easier for you.

If you are on Debian, just run

apt-get build-dep geany

which should install all the dependencies and then run

./autogen.sh --disable-html-docs
make
sudo make install

@techee
Member

techee commented Dec 11, 2015

The symbol tree slowness may also be caused by this:

#577 (comment)

The repeated symbol names make things worse for the tree generation - this might be fixable though.

Note to self: learn the difference between multiplication and addition: in the numbers above there are just 35 tags per line.
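With the corrected figure, the earlier estimate can be redone in a few lines of Python. This is just the arithmetic, assuming the tag density of the test file stays uniform; the 10000 limit is the one proposed above, not something Geany implements, and the names are illustrative:

```python
import math

TAGS_PER_LINE = 35       # corrected figure from the note above
PROPOSED_LIMIT = 10_000  # "sane number" suggested for the symbol tree


def total_entries(lines: int, tags_per_line: int = TAGS_PER_LINE) -> int:
    """Number of symbol tree entries for a file of `lines` lines."""
    return lines * tags_per_line


def lines_until_limit(limit: int = PROPOSED_LIMIT,
                      tags_per_line: int = TAGS_PER_LINE) -> int:
    """Smallest line count whose entries reach the limit."""
    return math.ceil(limit / tags_per_line)


print(total_entries(1000))   # 35000
print(lines_until_limit())   # 286
```

So 1000 such lines give 35000 entries rather than 234000, which is still well above the proposed limit; a file like this one would reach it at around 286 lines.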

@philiprbrenan
Author

A huge improvement - down from tens of seconds to too fast to notice. Thank you very much!


@codebrainz
Member

A bit off-topic, but ... @techee are you using the Xcode profiler, GNU gprof, or other? I tried to do a profile build to test this previously, but I was unable to get gprof to produce any output in the report, I suspect because of splitting libgeany out of the main app. All I did was run ./configure [some options] CFLAGS="-pg" CXXFLAGS="-pg" && make install (I also tried -pg in the LDFLAGS) and then run Geany and produce a report from gmon.out. Is there anything else that needs to be done to make it work (assuming you're using gprof)?

@techee
Member

techee commented Dec 12, 2015

@codebrainz I guess you missed the announcement of the wiki page about profiling Geany:

http://article.gmane.org/gmane.editors.geany.devel/9439

@techee
Member

techee commented Dec 12, 2015

@philiprbrenan Great to hear! Even though the original patch wasn't quite right, as Colomban (@b4n) noticed, the updated version should fix the performance issue too.

@codebrainz
Member

I guess you missed the announcement of the wiki page about profiling Geany

Yep, I missed that. Will give it a read, thanks.

@codebrainz
Member

Closing as the issue seems to be resolved according to the comments. Feel free to reopen if it is still an issue.
