Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

org-ref-get-labels taking very long time #647

Closed
XushanLu opened this issue Apr 10, 2019 · 24 comments
Closed

org-ref-get-labels taking very long time #647

XushanLu opened this issue Apr 10, 2019 · 24 comments

Comments

@XushanLu
Copy link

Through profiling emacs, I found that org-ref-get-labels function is taking a lot of the cpu time and memory. Emacs is almost unresponsive when the org file is large.

I am running Spacemacs develop brunch version 0.300.0, and org-ref version is 20190318-1558 and the emacs version is 25.2.2. Is there anything that I can do to improve the speed?

Here is the screenshot of the profiling of cpu time:

  • redisplay_internal (C function) 13608 89%
  • jit-lock-function 13520 88%
  • jit-lock-fontify-now 13520 88%
  • jit-lock--run-functions 13520 88%
    - run-hook-wrapped 13520 88%
    • #<compiled 0x4373dd> 13520 88%
    • font-lock-fontify-region 13496 88%
    • font-lock-default-fontify-region 13496 88%
    • font-lock-fontify-keywords-region 13496 88%
      - org-activate-links 13492 88%
      • org-ref-ref-face-fn 13492 88%
      • org-ref-get-labels 13492 88%
      • org-element-parse-buffer 11776 77%
      • save-excursion 11776 77%
        - org-element--parse-elements 11776 77%
        • save-excursion 11776 77%
        • let 11776 77%
        • while 11776 77%
        • let* 11772 76%
      • org-ref-get-tblnames 1180 7%
      • org-ref-get-custom-ids 492 3%
        org-ref-get-latex-labels 16 0%
        org-ref-get-names 4 0%
        org-ref-get-org-labels 4 0%
        + org-do-latex-and-related 4 0%
    • turn-on-pangu-spacing 24 0%
  • eval 84 0%
  • #<compiled 0x43b19d> 4 0%
  • command-execute 598 3%
  • flyspell-post-command-hook 461 3%
  • evil-normal-post-command 391 2%
  • ... 223 1%
  • sp--save-pre-command-state 8 0%

Here is the screenshot for profiling the memory:

  • command-execute 733,515,189 95%
  • call-interactively 733,510,965 95%
  • funcall-interactively 732,989,943 95%
  • evil-previous-visual-line 655,529,820 85%
    - evil-line-move 655,529,820 85%
    • previous-line 655,529,820 85%
    • line-move 655,529,820 85%
    • line-move-visual 655,519,340 85%
    • posn-at-point 655,298,807 85%
      - jit-lock-function 655,264,803 85%
      • jit-lock-fontify-now 655,263,747 85%
      • jit-lock--run-functions 655,263,747 85%
      • run-hook-wrapped 655,263,747 85%
      • #<compiled 0x1402081> 655,263,747 85%
        - font-lock-fontify-region 655,257,523 85%
        • font-lock-default-fontify-region 655,257,523 85%
        • font-lock-fontify-keywords-region 655,257,523 85%
        • org-activate-links 655,207,351 85%
        • org-ref-ref-face-fn 654,521,457 85%
          - org-ref-get-labels 654,521,457 85%
          • org-ref-get-tblnames 654,043,797 85%
          • org-element-parse-buffer 654,043,797 85%
          • save-excursion 654,043,797 85%
          • org-element--parse-elements 654,043,797 85%
            - save-excursion 654,042,741 85%
            • let 654,041,685 85%
          • org-ref-get-custom-ids 461,280 0%
          • org-ref-get-latex-labels 10,236 0%
            org-ref-get-org-labels 2,048 0%
            org-ref-get-names 2,048 0%
        • org-ref-cite-link-face-fn 602,682 0%
        • org-ref-label-face-fn 49,788 0%
        • org-element-link-parser 13,344 0%
        • org-remove-flyspell-overlays-in 400 0%
          org-activate-dates 9,212 0%
        • org-do-latex-and-related 6,144 0%
        • org-do-emphasis-faces 5,120 0%
          org-fontify-drawers 1,024 0%
          org-activate-tags 1,024 0%
        • org-activate-footnote-links 1,024 0%
          org-fontify-macros 1,024 0%
          org-font-lock-add-priority-faces 1,024 0%
          org-activate-code 1,024 0%
        • org-fontify-meta-lines-and-blocks 1,024 0%
        • #<lambda 0x2147e7b35> 1,024 0%
          + turn-on-pangu-spacing 6,224 0%
          - eval 34,004 0%
      • spaceline-ml-main 34,004 0%
  • counsel-M-x 73,100,942 9%
  • evil-next-visual-line 2,632,458 0%
  • evil-normal-state 1,387,460 0%
  • evil-ex 258,725 0%
  • org-self-insert-command 36,428 0%
  • evil-append 21,504 0%
  • undo-tree-undo 10,692 0%
  • evil-org-delete 9,866 0%
  • evil-force-normal-state 2,048 0%
  • byte-code 384,992 0%
  • list 136,030 0%
  • flyspell-post-command-hook 18,683,461 2%
  • redisplay_internal (C function) 13,190,392 1%
  • evil-normal-post-command 3,136,232 0%
  • indent-guide-pre-command-hook 63,088 0%
  • timer-event-handler 46,740 0%
  • evil-escape-pre-command-hook 38,912 0%
  • winner-save-old-configurations 22,176 0%
  • ... 20,632 0%
  • jit-lock-function 13,440 0%
  • evil-repeat-post-hook 8,188 0%
  • company-post-command 7,392 0%
  • sp--save-pre-command-state 6,336 0%

Thanks!

@XushanLu XushanLu changed the title org-ref-ref-get-labels taking very long time org-ref-get-labels taking very long time Apr 10, 2019
@jagrg
Copy link
Contributor

jagrg commented Apr 10, 2019 via email

@jkitchin
Copy link
Owner

That is a good idea. I am sure This was done early in org-ref development and done in a way to test each kind of label. I think there are benefits to the parsing, but if used it should probably only be done once, and maybe there could be a flag that either switches for large buffers, or is a user preference. I won't have time to work on this until probably late in May, so if anyone wants to tackle it and pull request the change I would be happy to take a look at it.

@XushanLu
Copy link
Author

Thanks for all the comments. Unfortunately, I do not know much about Emacs Lisp. I have found a workaround for now which can mitigate the problem. As the function is called in jit-lock-function, by disabling font-lock-mode in the large org-mode buffer and only fontify the buffer using font-lock-fontify-block to fontify the current block can significantly reduce the cpu time and memory use.

Hopefully, in the future, this problem can be solved by modifying the org-ref-get-labels function in org-ref.

@matthuszagh
Copy link

To add to the above discussion, when I use org-ref with poly-org I experience a several second lag after each keypress (see this issue). Using either mode by itself seems to work fine. I believe the issue on org-ref's side is related to the one here: the function proposed by @jagrg helps, but editing is still to slow to be usable.

@jkitchin
Copy link
Owner

If you add (setq org-ref-colorize-links nil) to your init file, does this issue go away?

You might be interested in trying https://github.com/jkitchin/org-ref/tree/cache-parse. I am trying a version where I cache the parse-data so it gets re-used. it might still be slow for heavy editing, but I think it is a step in the right direction.

@matthuszagh
Copy link

Strangely, org-ref-colorize-links doesn't work for me: I still get org-ref link colors after setting it to nil. The cache-parse branch is a solid improvement. It's still too slow to use in a large file for me, but definitely a step in the right direction as you say.

@vspinu
Copy link

vspinu commented Jun 1, 2019

So the font-lock triggers the re-parsing of the entire file, maybe multiple times, right?

I have zero familiarity with the org/org-ref internals, but could the org-element-parse-buffer be triggered only on the modified part (with some sane pre-extension) in post-command-hook instead?

@jkitchin
Copy link
Owner

jkitchin commented Jun 1, 2019

@matthuszagh you get colored links even after restarting emacs?

@jkitchin
Copy link
Owner

jkitchin commented Jun 1, 2019

Another option could be try (setq org-ref-show-broken-links nil).

@vspinu the buffer does get parsed frequently. This is the most accurate way to ensure you get all the relevant information. I have been thinking of some ways to do it incrementally, but they all sacrifice some accuracy. For example, you could incrementally add labels to a variable, but there isn't a simple way to remove them from that list if you delete the label. Eventually, the variable will need to be remade by parsing the buffer again.

It looks possible to narrow the scope of org-element-parse-buffer with narrowing, so that might be a path towards incremental updates, if I can figure out how to narrow to the visible region. I might work on an approach like this next.

@matthuszagh
Copy link

@jkitchin I do.

I've tried with:

(require 'org-ref)
(setq org-ref-colorize-links nil)

and with setting org-ref-colorize-links before requiring org-ref.

Here's the version info:
org-ref: Version 1.1.1 (git-commit 6b2e0441ddb75b562f15565dc9bd1d7a9a67864e)
GNU Emacs 26.2.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.8) of 2019-05-05

I've tested this with only straight.el and org-ref (plus dependencies if it has them), so there shouldn't be any conflicts with other packages.

@vspinu
Copy link

vspinu commented Jun 2, 2019

For example, you could incrementally add labels to a variable, but there isn't a simple way to remove them from that list if you delete the label.

I think adding a before-change-hook to scan the deleted labels might work well here.

It looks possible to narrow the scope of org-element-parse-buffer with narrowing, so that might be a path towards incremental updates, if I can figure out how to narrow to the visible region. I might work on an approach like this next.

What is the purpose to narrow to the "visible region"? If it's only for the sake of font-lock then the font-lock/jit-lock machinery already has all pieces in place. You can add a function to jit-lock-functions (preferably before all others), and update local cache (add text-properties) only on the region which are being fontified. Another option which is commonly used for such purpose is syntax-propertize-function but it's generally reserved for major-modes, not add-on tools like org-ref.

In fact if org-ref-ref-face-fn is only used for fontifying keywords then I believe you can safely parse the buffer only between (point) and limit argument which each font-lock keyword function receives.

@jkitchin
Copy link
Owner

jkitchin commented Jun 2, 2019

The point of narrowing is to limit what org-element-parse-buffer does. By default it scans the whole buffer, unless narrowing is in place. this function gets called during fontification I think. You may be correct about what needs to be parsed, but you still have to use narrowing to limit the scope of org-element-parse-buffer.

org-element-parse-buffer provides the most accurate, up-to-date and easy to use source of data for this, but it is the slowest.

I think you can avoid that if you (setq org-ref-show-broken-links nil) though. Can you check that?

The slowness comes about because org-ref is checking the validity of a ref link, and doing that requires making sure there is a label it refers to. That is currently done with a full parse of the buffer, eventually inside of org-ref-get-labels. That function (in cache-parse) makes 6 different checks of potential labels, using a combination of regexp searches, and parse-buffer searches. Not all of these are easily updated using font-lock, because some of them are defined outside of org-ref.

it would probably further help performance to replace the parse-buffer code with regexp searches. The parse-buffer code is simpler and works, and regexp code is harder to write and debug I think, but it might be worth exploring. It feels like it would be a lot of work to make this work in an incrementally updating way that also used a before-change-hook to remove deleted labels. I would like to see performance be better for large documents though, it even has an effect on me when I write them!

@vspinu
Copy link

vspinu commented Jun 3, 2019

The point of narrowing is to limit what org-element-parse-buffer does.

I actually was wondering about "visible region" not the narrowing. From what you said it looked that only the "visible region" is of importance for the user experience.

Not all of these are easily updated using font-lock, because some of them are defined outside of org-ref.
it would probably further help performance to replace the parse-buffer code with regexp searches

What I was proposing is that you still use org-element-parse-buffer but only on the region which font-lock fontifies, or better after change hook and parse only modified regions. You know for sure that outside of the modified regions labels are the same, right? So you would need a cache with all the labels and their positions (which I assume you already have). On modification, purge all the labels from the modified region from the cache and re-parse the region with org-element-parse-buffer. Looks simple to me.

@jkitchin
Copy link
Owner

jkitchin commented Jun 3, 2019

LOL. It isn't that simple. Many times the region defined in the before-change-hook is either empty (an insertion), or a single character (a delete), and otherwise a set of characters that may or may not include a full label. org-element-parse-buffer cannot deal with any of these scenarios because they don't contain full labels. Depending on the situation you have to look back and forward to see if the change is on a changing label. You cant rely on storing the positions, because they change as you edit the document, eg after the change, all labels after it have new positions.

Another issue I have run into with this overall approach is that when you first open a document, it is not fully fontified, and if there is some folding of headlines, some labels are missed.

I have a mostly working draft of this in https://github.com/jkitchin/org-ref/tree/font-lock-labels. I haven't solved the issue of how to fontify everything when the buffer opens yet, but it otherwise seems to be working in test buffers. I would be grateful to know how it performs on your large buffers.

@vspinu
Copy link

vspinu commented Jun 4, 2019

LOL.

Ok. let's think aloud.

org-element-parse-buffer cannot deal with any of these scenarios because they don't contain full labels.

Well, you have to extend the region of course. Most of the tooling relying on after-change does that (font-lock, syntax) and I bet you can just re-use font-lock-extend-region-functions for what you need.

Many times the region defined in the before-change-hook is either empty

So what? All you have to know is the range of the deleted region. Before-change-functions are used to clean up the region from the cache. The actual updating of the cache is done in the after-change-functions.

In fact you can use only after-change-functions because you have the 3rd argument that gives you the length of the deleted region.

You cant rely on storing the positions, because they change as you edit the document, eg after the change, all labels after it have new positions.

Either store markers or (better) update the entries in the cash by shifting the positions by the length of the inserted text.

Another issue I have run into with this overall approach is that when you first open a document, it is not fully fontified,

You would need to have an original pass if that's what is required.

I have a mostly working draft of this in https://github.com/jkitchin/org-ref/tree/font-lock-labels.

AFAICS the change caches the results between the modifications. I bet it's a good improvement but it doesn't solve the core issues that the entire buffer is parsed after each character insertion.

Hopefully @matthuszagh could give it a try on his document. I am not an org-ref user. I just want it to be working with poly-org.

BTW, just a hunch. Do you add text properties outside of font-lock? Those should be protected with with-silent-modification in order to avoid triggering after-change-functions. This might be a cause of the bad interaction between polymode and org-ref.

@jkitchin
Copy link
Owner

jkitchin commented Jun 4, 2019

@vspinu have you tried (setq org-ref-show-broken-links nil) yet? This should avoid the font-lock issue I think.

@jkitchin
Copy link
Owner

jkitchin commented Jun 4, 2019

@vspinu would it be possible for you to send me a poly-org setup and example file that you use that is a problem?

@matthuszagh
Copy link

@jkitchin (setq org-ref-show-broken-links nil) does indeed fix the issue for me. There does still appear to be a very minor lag, but at least for the size file I'm currently editing is not an issue.

You can use the same test file I provided to @vspinu, example.zip. Here's my setup for polymode and org-ref.

(use-package org-ref
  :after (org)
  :config
  (setq org-ref-show-broken-links nil)
  (setq org-ref-colorize-links nil))

(use-package polymode)

(use-package poly-org
  :after (polymode org)
  :config
  (add-to-list 'auto-mode-alist '("\\.org" . poly-org-mode)))

I'm still not seeing any effect from setting org-ref-colorize-links to nil.

@jkitchin
Copy link
Owner

jkitchin commented Jun 4, 2019

👍 I got this error trying your setup and example file:
File mode specification error: (void-variable poly-org-hostmode)

Is there some detail missing maybe? update: my poly-mode was out of date.

(setq org-ref-show-broken-links nil) should bypass the call to org-ref-get-labels which is the main issue in the master branch as it does use org-element-parse-buffer which is slow on large files. In the font-lock-labels branch this should be completely avoided, with only incremental updates based on changes in the buffer, so it should be quite a bit faster. There is probably still some smarter way to do this, but it probably warrants a new profiling exercise to find out what the residual lag is.

org-ref-colorize-links was a bad suggestion earlier, I meant the org-ref-show-broken-links instead. It looks like org-ref-colorize-links is only active when org-mode v8 is active. I may look into a way to use it with org-9, which would drop out all link-related fontification. that should make it performant, although, at some loss of functionality.

@jkitchin
Copy link
Owner

jkitchin commented Jun 4, 2019

@matthuszagh if you can do some profiling similar to the one at the beginning of this thread it might help identify where the performance lag is.

@jkitchin
Copy link
Owner

jkitchin commented Jun 4, 2019

LOL.

Ok. let's think aloud.

org-element-parse-buffer cannot deal with any of these scenarios because they don't contain full labels.

Well, you have to extend the region of course. Most of the tooling relying on after-change does that (font-lock, syntax) and I bet you can just re-use font-lock-extend-region-functions for what you need.

This is what the font-lock-labels branch does. Maybe there is a smarter way to do this with font-lock-extend-region-functions too, but I think it does the right thing now.

Many times the region defined in the before-change-hook is either empty

So what? All you have to know is the range of the deleted region. Before-change-functions are used to clean up the region from the cache. The actual updating of the cache is done in the after-change-functions.

In fact you can use only after-change-functions because you have the 3rd argument that gives you the length of the deleted region.

You cant rely on storing the positions, because they change as you edit the document, eg after the change, all labels after it have new positions.

Either store markers or (better) update the entries in the cash by shifting the positions by the length of the inserted text.

thanks for this suggestion. I might try something like this, as it would allow a sorted list of labels to be returned, e.g. in the order they appear in the buffer.

Another issue I have run into with this overall approach is that when you first open a document, it is not fully fontified,

You would need to have an original pass if that's what is required.

this is also done in the branch.

I have a mostly working draft of this in https://github.com/jkitchin/org-ref/tree/font-lock-labels.

AFAICS the change caches the results between the modifications. I bet it's a good improvement but it doesn't solve the core issues that the entire buffer is parsed after each character insertion.

This isn't the case, all changes are incremental after the initial buffer parse as far as I know.

Hopefully @matthuszagh could give it a try on his document. I am not an org-ref user. I just want it to be working with poly-org.

BTW, just a hunch. Do you add text properties outside of font-lock? Those should be protected with with-silent-modification in order to avoid triggering after-change-functions. This might be a cause of the bad interaction between polymode and org-ref.

Thanks for this suggestion too, this branch was doing that for now, but I have wrapped it in that macro to protect it.

I am reasonable sure this branch should majorly improve performance of org-ref-get-labels, but there might still be other places in org-ref that are problematic on big files.

@jkitchin
Copy link
Owner

jkitchin commented Jun 5, 2019

I have run some timing experiments. On the current master branch, it takes 13-14 seconds to open the example/info.org file!, and editing is irritatingly slow with second delays between keypresses as reported. But, if you set org-ref-show-broken-links to nil, then the file opens in about half a second, and editing is snappy, but you don't see if ref links are bad in the buffer.

with the cache-parse branch, interestingly it only takes about 0.25 seconds to open the file, and editing is much faster, but still with a shorter, but still quite noticeable lag editing.

with the font-lock-labels branch, it takes 0.7-1 second to open the file, but after that editing feels very snappy with no noticeable lags, even with org-ref-show-broken-links set t! This is a big improvement! Thanks @vspinu for the hints that got it there, @matthuszagh for the test file, and @CHDtdem for the original report and identification of the issue!

I plan to test this locally for a few more days, but it looks like I will be merging the font-lock-labels branch into master after that if no issues are reported.

@vspinu
Copy link

vspinu commented Jun 5, 2019

Amazing job! Thanks!

@jkitchin
Copy link
Owner

jkitchin commented Jun 7, 2019

I have merged this into master. I am going to close it. If there are remaining problems with this, feel free to reopen it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants