Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Massive CPU Inefficiency (54x longer) introduced with the 2.15.5 release (vale_2.15.5_macOS_64-bit.tar.gz) #567

Closed
universemaster opened this issue Feb 13, 2023 · 9 comments

Comments

@universemaster
Copy link

universemaster commented Feb 13, 2023

The 2.15.5 release takes 54x times longer than 2.15.4.

I am on MacOS 10.15.7 (Catalina) with 2.5 GHz Quad-Core Intel Core i7.

I am using the releases from the release page of github:

The command is of the form:

time /path/to/vale /path/to/markdown/measurements.md --config /path/to/.vale.ini

With vale_2.15.5_macOS_64-bit.tar.gz:

346.57s user 3.23s system 408% cpu 1:25.56 total

With vale_2.15.4_macOS_64-bit.tar.gz:

6.37s user 0.16s system 174% cpu 3.747 total


And with the latest vale_2.23.0_macOS_64-bit.tar.gz the problem is still present:

389.82s user 3.13s system 408% cpu 1:36.23 total

And building 2.23.0 from source on my machine using brew install vale, I get a similar result.


Note that these observations do not void my observations in errata-ai/vale-vscode#113 because I was actually using an older version that didn't have the problem described here. So, these "noisy fan" issues still remain a problem.

@jdkato
Copy link
Member

jdkato commented Feb 14, 2023

I can't reproduce this.

What does your .vale.ini look like?

@universemaster
Copy link
Author

universemaster commented Feb 14, 2023

Here's my .vale.ini I have no skipped scopes, no ignored scopes, no TokenIgnores.

Many of my rules do use nonword: true and scope: raw.
I do sometimes use lookarounds.
I do have an accept.txt and reject.txt.

I have rerun my tests today and double checked vale --version and my results still hold.

This issue seems similar.

@DominikaK
Copy link

I can confirm that the same issue affects Windows 11 (Intel i7-8665U @1.90 GHz).

Running Vale over the same folder with Measure-Command { vale <folder-name> } takes respectively (other versions tested for reference):

vale version 2.22.0

Seconds           : 26
Milliseconds      : 452

vale version 2.15.5

Seconds           : 18
Milliseconds      : 706

vale version 2.15.4

Seconds           : 1
Milliseconds      : 11

vale version 2.10.2

Seconds           : 0
Milliseconds      : 239

This makes it practically impossible to use Vale over whole documentation sets, or with external extensions such as for VSCode (ref. errata-ai/vale-vscode#100)

@universemaster
Copy link
Author

universemaster commented Feb 14, 2023

Thanks @DominikaK, that clearly shows the massive slow-down was at 2.15.5.
However, it's interesting that that there was also a slowdown between 2.10.2 and 2.15.4, with the later taking 1011/239 = 4.2x times as long.

In my test between 2.10.2 and 2.15.4 on MacOS I also see a speed difference, but not quite as large:
2.15.4: 7.86s user 0.17s system 182% cpu 4.411 total
2.10.2: 3.68s user 0.09s system 105% cpu 3.577 total

@jdkato
Copy link
Member

jdkato commented Feb 14, 2023

Here's what I get on a set of 2802 Markdown files using just the Vale style:

Benchmark 1: vale2.23.0 --no-exit docs
  Time (mean ± σ):     11.295 s ±  0.554 s    [User: 45.897 s, System: 2.428 s]
  Range (min … max):   10.806 s … 12.264 s    10 runs

---

Benchmark 1: vale2.15.4 --no-exit docs
  Time (mean ± σ):     13.810 s ±  0.504 s    [User: 66.606 s, System: 1.895 s]
  Range (min … max):   13.457 s … 15.045 s    10 runs

If I switch to the Google style, I get:

Benchmark 1: vale2.23.0 --no-exit docs
  Time (mean ± σ):     25.788 s ±  1.048 s    [User: 109.830 s, System: 3.202 s]
  Range (min … max):   24.213 s … 27.871 s    10 runs

---

Benchmark 1: vale2.15.4 --no-exit docs
  Time (mean ± σ):     27.285 s ±  0.526 s    [User: 125.640 s, System: 2.867 s]
  Range (min … max):   26.600 s … 28.099 s    10 runs

So, some questions:

  1. Do you observe the slow down with only the Vale style enabled (e.g., BasedOnStyles = Vale)? If not, is there a specific rule that causes it?
  2. What file format(s) (Markdown, AsciiDoc, etc.) are you using?
  3. How large are your files / folders?
  4. Are the linting results the same between versions?

@universemaster
Copy link
Author

Some quick answers, I'll edit post this with full details when I have more time:

  1. TODO, but I have over 200 files for my own rules, so it might take a long time to find particular rules causing problems.
  2. Markdown(-ish) with .md extension but with math dollars, and definition lists, and a few custom non-standard things. Many of my own rules are scope: raw, and I do use lookarounds. My math arrays do use \\ (I say this because I notices that splitting by those characters was added in 2.15.5 (but I think only for a non-markdown type?)
  3. Most in the range of 300-1000 lines.
  4. TODO: But I haven't noticed a difference.

@universemaster
Copy link
Author

universemaster commented Feb 15, 2023

I've found the issue. Bad CPU performance occurs we have scope: raw and an accidental nonexistent token, with only hyphen, like this:

extends: existence
message: 'Start a new sentence rather than %s'
link:
ignorecase: true
nonword: true
level: error
scope: raw
tokens:
  - 'and (therefore|then|so|hence|thus|since)'
  - 'also suppose that'
  - 

If I remove scope: raw: CPU performance is not bad.
If I remove the accidental empty token: CPU performance is not bad.

@DominikaK Do you have a similar issue with scope:raw and an empty token?

Clearly, something introduced in 2.15.5 changes this behaviour?

I'm not sure removing this empty token (or doing a code fix for this) will fix the fan noise and briefly spiking CPU that exists in all versions even before-2.15.5 that I reported here, but I'll report back.

@DominikaK
Copy link

@universemaster, I confirm, I notice the same issue.

Seconds           : 19
Milliseconds      : 931

After disabling a single custom rule that has the raw scope with empty token:

Seconds           : 0
Milliseconds      : 580

@jdkato
Copy link
Member

jdkato commented Feb 15, 2023

Empty tokens will be ignored in the next release. However, you should also try to use scope: raw only in cases where you need to match markup syntax. A rule like

extends: existence
message: 'Start a new sentence rather than %s'
link:
ignorecase: true
nonword: true
level: error
scope: raw
tokens:
  - 'and (therefore|then|so|hence|thus|since)'
  - 'also suppose that'

has no need for it and will only be slower if you use it.

@jdkato jdkato closed this as completed Feb 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants