Poor Performance of Decorator extraction when file does not contain Tags #466

@jlittle-ptc

Describe the bug
When running GCToolkit against a large G1GC log file that does not contain tags, the regex used to parse out the decorator tags appears to cause a performance hit.

Running the sample against a 58 MB log file, the Maven run reports:

[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:06 min

Profiling shows that the time is spent in the following lines of com.microsoft.gctoolkit.parser.jvm.Decorators:

        Matcher tagMatcher = UnifiedLoggingTokens.TAGS.matcher(line);
        if (tagMatcher.find()) {
            numberOfDecorators++;
            tags = String.join(",", Arrays.asList(tagMatcher.group(1).trim().split(",")));
        }

With these lines commented out, re-running the sample against the same file is significantly faster (roughly 8x):

[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  7.869 s
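
One possible mitigation, sketched here for discussion only: since decorators appear as bracketed groups at the start of a unified logging line, the tag lookup could walk just those leading groups instead of running a regex `find()` over the whole line. The issue does not show the `UnifiedLoggingTokens.TAGS` pattern, so this is not a drop-in replacement and not GCToolkit's implementation. The class `LeadingTagScan`, the methods `findTags` and `looksLikeTags`, and the rule used to recognise a tag set (comma-separated lowercase words that are not a log level) are all assumptions made for illustration.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch, not GCToolkit code.
class LeadingTagScan {

    // Assumption: these are the level names that can appear as a decorator.
    private static final Set<String> LEVELS =
            Set.of("trace", "debug", "info", "warning", "error");

    // Walks only the leading "[...]" groups; the message body is never scanned,
    // so a line without tags costs no more than the length of its decorator prefix.
    static Optional<String> findTags(String line) {
        int pos = 0;
        while (pos < line.length() && line.charAt(pos) == '[') {
            int close = line.indexOf(']', pos);
            if (close < 0) {
                break; // unterminated group: stop scanning
            }
            String body = line.substring(pos + 1, close).trim();
            if (looksLikeTags(body)) {
                return Optional.of(body);
            }
            pos = close + 1;
        }
        return Optional.empty();
    }

    // Assumption: a tag set is one or more comma-separated words, each starting
    // with a lowercase letter and continuing with lowercase letters, digits or '_'.
    static boolean looksLikeTags(String body) {
        if (body.isEmpty() || LEVELS.contains(body)) {
            return false;
        }
        boolean expectWordStart = true;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == ',') {
                if (expectWordStart) {
                    return false; // empty tag between commas
                }
                expectWordStart = true;
            } else if (c >= 'a' && c <= 'z') {
                expectWordStart = false;
            } else if (!expectWordStart && ((c >= '0' && c <= '9') || c == '_')) {
                // digit or underscore inside a word: accepted
            } else {
                return false;
            }
        }
        return !expectWordStart; // reject a trailing comma
    }

    public static void main(String[] args) {
        System.out.println(findTags("[0.123s][info][gc,heap] Heap region size: 1M"));
        System.out.println(findTags("[0.123s][info] GC(0) Pause Young (Normal)"));
    }
}
```

The recognition rule is deliberately simple and would misclassify a decorator such as a lowercase hostname, so a real fix would need to be checked against the decorator formats GCToolkit actually supports.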

To Reproduce
Steps to reproduce the behavior:

  • Run the sample application against the attached file.

largegc.zip
