Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip trying to substitute macros into lines that do not contain them #296

Merged

Conversation

albertziegenhagel
Copy link
Contributor

@albertziegenhagel albertziegenhagel commented Jun 22, 2023

While investigating #295 I used cProfile on the code and noticed that most of the parsing time is spend in regex.subn in fortls/parse_fortran.py#L2241.

In that code part, we try to substitute all macros into all lines by calling subn on the compiled regex, which results in an unmodified line if the line did not contain the macro at all. This call to subn still seems to be quite expensive (runtime-wise).

To spare the cost, I added a check whether the line actually contains the macro text and skip the substitution check otherwise. Please note that a line containing the macro text does not necessarily mean that a substitution will take place, since the simple containment test is less strict than the regex (which checks that the macro is an actual token separated from the surrounding context, as in FOO being in FOO_BAR but it should not be substituted).

With these changes I was able to reduce the parsing time of the example described in #295 from ~9.8 seconds to ~1.4 seconds, which comes far closer to the time it takes without specifying any pre-processor definitions (~1 second).

In a real world example I could reduce the time to parse a single file from ~107 seconds to ~7seconds.

The parsing time of our whole code-base was reduced from more than 1.5 hours to less than 7 minutes.

NOTE: If I exchange the simple containment test via

if not def_tmp in line: continue

by checking the actual regex via

if not def_regex.match(line): continue

a few lines further down, this still reduces the time to parse the real world file from ~107 seconds to ~26 seconds.

@codecov
Copy link

codecov bot commented Jun 22, 2023

Codecov Report

Merging #296 (325b821) into master (1f54125) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #296   +/-   ##
=======================================
  Coverage   86.95%   86.96%           
=======================================
  Files          12       12           
  Lines        4569     4571    +2     
=======================================
+ Hits         3973     3975    +2     
  Misses        596      596           
Impacted Files Coverage Δ
fortls/parse_fortran.py 89.12% <100.00%> (+0.01%) ⬆️

Currently, we try to substitute all macros into all lines by calling `subn` on the compiled regex, which results in an unmodified line if the line did not contain the macro at all. This call to `subn` is still quite expensive (runtime-wise).

To spare the cost, we now first check whether the line actually contains the macro text and skip the substitution check otherwise.
Please note that a line containing the macro text does not necessarily mean that a substitution will take place, since the simple containment is less strict than the regex (which checks that the macro is an actual token separated from the sounding context, as in `FOO` being in `FOO_BAR` but it should not be substituted).
@gnikit
Copy link
Member

gnikit commented Jun 22, 2023

This is a good catch. The preprocessor is a complete mess IMO. Some of the future plans were/are to completely remove the preprocessor and instead replace it with an off the shelf C/C++ preprocessor. Unfortunately, I haven't found a preprocessor that fits our needs

Copy link
Member

@gnikit gnikit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gnikit gnikit linked an issue Jun 22, 2023 that may be closed by this pull request
@gnikit gnikit merged commit a3a0f07 into fortran-lang:master Jun 22, 2023
20 checks passed
@albertziegenhagel albertziegenhagel deleted the improve-pp-def-performance branch June 22, 2023 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pre-processor definitions slow down the parser significantly
2 participants