Skip to content

Conversation

@jsoriano
Copy link
Member

@jsoriano jsoriano commented Mar 18, 2022

While trying some changes in pipeline tests I found that testing some packages was taking much more time than I would have expected.

I have enabled profiling and I have found that most of the time, elastic-package is compiling regular expressions.

Refactoring the code to don't use regular expressions drastically improves the execution times. For example in the case of the panw module it goes in my laptop from more than 3 minutes to less than 10 seconds.

@jsoriano jsoriano requested a review from a team March 18, 2022 20:48
@jsoriano jsoriano self-assigned this Mar 18, 2022
@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 18, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-03-22T18:32:13.689+0000

  • Duration: 20 min 14 sec

Test stats 🧪

Test Results
Failed 0
Passed 602
Skipped 1
Total 603

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit that I didn't expect it to happen. Good catch!

@jsoriano
Copy link
Member Author

I have refactored the code to don't need regular expressions, I see similar improvements in time than using a regular expressions cache.

@jsoriano jsoriano marked this pull request as ready for review March 21, 2022 16:35
@jsoriano jsoriano requested a review from mtojek March 21, 2022 16:38
@jsoriano
Copy link
Member Author

jsoriano commented Mar 21, 2022

I found profiling that strings.Split was also expensive (though nothing close to regexp compilation), I have further refactored to do char by char comparison, this reduced the testing time of the panw module to less than 6 seconds (~3 mins original code, ~10 seconds with regexp cache or with first iteration without regexps, ~6 seconds with second iteration).

Shared profile with current code:
cpu.profile.gz

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization, although the code readability might be impaired a bit. Do you think that you can get rid of "regexp" dependency here at all?

Before merging this PR, I suggest releasing a new minor version, so we can easily roll back if this PR becomes problematic.

func compareKeys(key string, def FieldDefinition, searchedKey string) bool {
k := strings.ReplaceAll(key, ".", "\\.")
k = strings.ReplaceAll(k, "*", "[^.]+")
// Loop over every byte in `key` to find if there is a matching byte in `searchedKey`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that we need some unit tests as it looks like a custom logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that we already had good coverage for this, but adding a unit test has helped me found some wrong cases with wildcards. Good idea, thanks 👍

@jsoriano
Copy link
Member Author

Nice optimization, although the code readability might be impaired a bit.

Well, this is relative, others may also find counterintuitive to build regular expressions on the fly 🙂 In any case the code is quite localized inside a single function with well-defined expectations. I have added comments in case it helps.

Do you think that you can get rid of "regexp" dependency here at all?

The other case where we use regexp is to validate patterns defined in the field definition, we need regexp compilation there. We could consider caching these regular expressions if they are a problem, but I think they are used only once most of the times, I haven't found these compilations while profiling.
We may consider optimizing this in a future PR if we find this is problematic.

Before merging this PR, I suggest releasing a new minor version, so we can easily roll back if this PR becomes problematic.

👍

@jsoriano
Copy link
Member Author

/test

@jsoriano jsoriano requested a review from mtojek March 22, 2022 11:59
@jsoriano
Copy link
Member Author

/test

1 similar comment
@jsoriano
Copy link
Member Author

/test

@jsoriano jsoriano merged commit 61362f7 into elastic:main Mar 22, 2022
@jsoriano jsoriano deleted the optimize-fields-validation branch March 22, 2022 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants