Skip trying to substitute macros into lines that do not contain them #296

albertziegenhagel · 2023-06-22T08:49:49Z

While investigating #295, I ran cProfile on the code and noticed that most of the parsing time is spent in regex.subn in fortls/parse_fortran.py#L2241.
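A measurement like this can be reproduced in miniature. The sketch below is not the fortls parser: `parse_lines`, the `MACRO_{i}` names, and the input lines are hypothetical stand-ins, and it uses the standard-library `re` module. It only shows how cProfile makes a hot `subn` call visible.

```python
import cProfile
import io
import pstats
import re


def parse_lines(lines, patterns):
    """Stand-in workload (hypothetical): run every regex over every line."""
    out = []
    for line in lines:
        for pattern in patterns:
            line, _ = pattern.subn("1", line)
        out.append(line)
    return out


# Lines that contain none of the macros, so every subn call is wasted work.
lines = ["integer :: x"] * 2000
patterns = [re.compile(rf"\bMACRO_{i}\b") for i in range(50)]

profiler = cProfile.Profile()
profiler.enable()
parse_lines(lines, patterns)
profiler.disable()

# Sort by cumulative time and capture the top entries as text.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

In the printed report, the `subn` method shows up with one call per (line, macro) pair, even though no line is ever changed.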

In that part of the code, we try to substitute every macro into every line by calling subn on the compiled regex, which returns the line unmodified if it does not contain the macro at all. Even then, the call to subn is quite expensive at runtime.

To avoid that cost, I added a check for whether the line actually contains the macro text and skip the substitution attempt otherwise. Note that a line containing the macro text will not necessarily be substituted, since the simple containment test is less strict than the regex, which checks that the macro is an actual token separated from the surrounding context. For example, FOO is contained in FOO_BAR but should not be substituted there.
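The idea can be sketched as follows. This is a minimal illustration and not the actual fortls code: the function `substitute_macros` and the `defs` dictionary are hypothetical, and using `\b` word boundaries as the token test is an assumption of this sketch.

```python
import re


def substitute_macros(line: str, defs: dict[str, str]) -> str:
    """Replace whole-token occurrences of each macro name in `line`.

    Hypothetical helper for illustration; not the fortls implementation.
    """
    for name, value in defs.items():
        # Cheap pre-check: a plain substring test rules out most lines
        # without invoking the regex engine at all.
        if name not in line:
            continue
        # Stricter check: only replace `name` where it is a separate token.
        # (Assumption: word boundaries are a sufficient token test here.)
        pattern = re.compile(rf"\b{re.escape(name)}\b")
        line, _count = pattern.subn(lambda _m, v=value: v, line)
    return line


defs = {"FOO": "42"}
no_macro = substitute_macros("integer :: x", defs)      # skipped by the pre-check
false_hit = substitute_macros("call FOO_BAR(x)", defs)  # passes the pre-check, regex leaves it alone
real_hit = substitute_macros("x = FOO + 1", defs)       # substituted
```

The second call is the case described above: the substring test lets the line through, but the regex still refuses to touch `FOO_BAR`, so the pre-check never changes the result and only saves work.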

With these changes I was able to reduce the parsing time of the example described in #295 from ~9.8 seconds to ~1.4 seconds, which comes far closer to the time it takes without specifying any pre-processor definitions (~1 second).

In a real-world example, I could reduce the time to parse a single file from ~107 seconds to ~7 seconds.

The parsing time of our whole code-base was reduced from more than 1.5 hours to less than 7 minutes.

NOTE: If I replace the simple containment test

if def_tmp not in line: continue

with a check against the actual regex

if not def_regex.match(line): continue

a few lines further down, the time to parse the real-world file still drops from ~107 seconds to ~26 seconds.
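The three variants can be compared with a rough micro-benchmark along these lines. Everything in it is made up for illustration (the macro names, the input lines, the `substitute` helper), so its timings say nothing about fortls itself. The regex variant here uses `search` so that the pattern is tried anywhere on the line; that is a choice made for this sketch.

```python
import re
import timeit

# Hypothetical data: many macros, and lines that contain none of them.
names = [f"MACRO_{i}" for i in range(200)]
regexes = [re.compile(rf"\b{name}\b") for name in names]
lines = ["real(8) :: velocity(3), pressure"] * 200


def substitute(line, precheck):
    """Apply every macro regex to `line`, skipping those the pre-check rejects."""
    for name, rx in zip(names, regexes):
        if not precheck(name, rx, line):
            continue
        line, _ = rx.subn("1", line)
    return line


strategies = {
    "always subn": lambda name, rx, line: True,
    "substring test": lambda name, rx, line: name in line,
    "regex search": lambda name, rx, line: rx.search(line) is not None,
}

# All strategies must agree on the result; only the cost differs.
probe = "x = MACRO_7 + MACRO_70_EXTRA"
results = {label: substitute(probe, check) for label, check in strategies.items()}

for label, check in strategies.items():
    seconds = timeit.timeit(lambda: [substitute(l, check) for l in lines], number=3)
    print(f"{label:>15}: {seconds:.3f} s")
```

The substring test avoids the regex engine entirely for lines without the macro, while the regex pre-check still pays for one scan per macro and line, which is consistent with it landing between the other two in the numbers reported above.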

codecov · 2023-06-22T08:53:45Z

Codecov Report

Merging #296 (325b821) into master (1f54125) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #296   +/-   ##
=======================================
  Coverage   86.95%   86.96%           
=======================================
  Files          12       12           
  Lines        4569     4571    +2     
=======================================
+ Hits         3973     3975    +2     
  Misses        596      596

| Impacted Files | Coverage Δ |
|---|---|
| fortls/parse_fortran.py | `89.12% <100.00%> (+0.01%)` ⬆️ |

Currently, we try to substitute every macro into every line by calling `subn` on the compiled regex, which returns the line unmodified if it does not contain the macro at all. Even then, the call to `subn` is quite expensive at runtime. To avoid that cost, we now first check whether the line actually contains the macro text and skip the substitution attempt otherwise. Note that a line containing the macro text will not necessarily be substituted, since the simple containment test is less strict than the regex, which checks that the macro is an actual token separated from the surrounding context. For example, `FOO` is contained in `FOO_BAR` but should not be substituted there.

gnikit · 2023-06-22T12:27:31Z

This is a good catch. The preprocessor is a complete mess IMO. One of the future plans was, and still is, to remove the preprocessor completely and replace it with an off-the-shelf C/C++ preprocessor. Unfortunately, I haven't found a preprocessor that fits our needs.

gnikit

LGTM

albertziegenhagel requested a review from gnikit as a code owner June 22, 2023 08:49

albertziegenhagel force-pushed the improve-pp-def-performance branch from dd7649c to 325b821 Compare June 22, 2023 09:07

gnikit approved these changes Jun 22, 2023

gnikit linked an issue Jun 22, 2023 that may be closed by this pull request

Pre-processor definitions slow down the parser significantly #295

Closed

gnikit merged commit a3a0f07 into fortran-lang:master Jun 22, 2023
20 checks passed

albertziegenhagel deleted the improve-pp-def-performance branch June 22, 2023 12:47
