-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Make regexes fall back to binary matching on incompatible runtime version #9040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make regexes fall back to binary matching on incompatible runtime version #9040
Conversation
…sion. Rgexes are precompiled to binary values during code compilation. These binary values may be incompatible between different PCRE versions and OS endianness. This makes code less portable to achieve slight performance improvement. PCRE version and endianness are parts of Regex structure and may be checked in runtime. This PR adds this check and makes regex execution fall back to binary matching if versions are incompatible. This slightly reduces performance (around 5% for simple regexes) for compatible versions because there is an additional version read. Also for incompatible versions it's as fast as binary matching.
This looks great! ❤️ Some notes:
Agreed, I would still keep the warning (we can have a flag to disable it) and note about performance.
Yeah, we can change it to say something like "Recompile is useful if you know you are running on a different system that the current one AND you are doing multiple matches with the regex" Btw, we should also do a pass on the Elixir codebase and remove all instances of |
The warning will be printed every time elixir starts, which may be unneccessary for prebuilt releases or escripts.
lib/elixir/src/elixir.erl
Outdated
"This may impact regex matching performance. " | ||
"You may want to recompile elixir and and source files in a machine with " | ||
"the same endianness as the current one: ~ts~n", | ||
[CompiledEndianness, Endianness]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer for the warning to be kept the same, given we provide a way for you to disable it. If you are running Elixir in production and you see this warning, then you want to act on it (and it should be scary). It may have other implications besides only regexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can revert this.
I had an impression that regexes were the only implication. Do you know any examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know any but I don't want to pass the impression there isn't any other, if that makes any sense. :)
1ff61fe
to
21da203
Compare
Another question is how we should pass the option to disable the warning? |
@hairyhum I thought that for escripts we load the configuration options before we start Elixir, no? If not, I would change them to guarantee that's the case. |
I thied to create a Mix project like this: https://gist.github.com/hairyhum/45e1a4c80495a86ab3ca9bda6c8e11b9 And added loging to
What I see is:
|
@hairyhum please move this line to outside of the |
Config should be available on elixir application start to read config environment variables like `check_endianness`
Moved config loading before app start. Also call |
lib/elixir/lib/regex.ex
Outdated
However, if you find yourself in a scenario where cross-compilation is | ||
necessary, you can manually invoke `Regex.recompile/1` or `Regex.recompile!/1` | ||
to perform a runtime version check and recompile the regex if necessary. | ||
If you know you are running on a different system that the current one AND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AND should go in lowercase IMO, as we don't use that style. You can still use markdown to emphasize it
@@ -319,6 +319,7 @@ defmodule Mix.Tasks.Escript.Build do | |||
|
|||
defp load_config(config) do | |||
each_fun = fn {app, kw} -> | |||
:application.load(app) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So theoretically you don't need to load the application if you are using the persistent: true
, as persistent: true
gives it higher precedence:
iex(1)> Application.get_all_env :ex_unit
[]
iex(2)> Application.put_env(:ex_unit, :stacktrace_depth, 30, persistent: true)
:ok
iex(3)> Application.get_all_env :ex_unit
[stacktrace_depth: 30]
iex(4)> Application.load :ex_unit
:ok
iex(5)> Application.get_all_env :ex_unit
[
exclude: [],
trace: false,
slowest: 0,
after_suite: [],
assert_receive_timeout: 100,
formatters: [ExUnit.CLIFormatter],
autorun: true,
max_failures: :infinity,
capture_log: false,
refute_receive_timeout: 100,
timeout: 60000,
colors: [],
stacktrace_depth: 30,
include: [],
module_load_timeout: 60000
]
In the example above, the default value for the stacktrace depth is 20.
Co-Authored-By: Eksperimental <eksperimental@users.noreply.github.com>
Note to the team: once this PR is merged we should remove the uses of Regex.recompile from the Elixir codebase itself. |
Thanks @hairyhum! 💜 |
Related to discussion: https://groups.google.com/forum/#!topic/elixir-lang-core/CgFdxIONvGg
Current implementation
Regexes are precompiled to binary values during code compilation.
These binary values may be incompatible between different PCRE versions
and OS endianness. This makes code less portable to achieve slight
performance improvement.
Proposed changes
PCRE version and endianness are parts of Regex structure and may be
checked in runtime.
This PR adds this check and makes regex execution fall back to binary
matching if versions are incompatible.
Notes
This slightly reduces performance (around 5% for simple regexes) for
compatible versions because there is an additional version read.
Also for incompatible versions it's as fast as binary matching.
I'm not sure if there should be additional options to not check for version,
having that option would make matching slightly faster.
Also probably the endianness warning should be removed or modified,
because technically now all code is portable, although not so performant.
Also I'm not sure about the doc string regarding recompiles. Recompile will
still help for performance reasons, but not necessary to make things correct
and may not improve performance for one-off matches.