Skip to content

Make regexes fall back to binary matching on incompatible runtime version #9040

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 21, 2019

Conversation

hairyhum
Copy link
Contributor

Related to discussion: https://groups.google.com/forum/#!topic/elixir-lang-core/CgFdxIONvGg

Current implementation

Regexes are precompiled to binary values during code compilation.
These binary values may be incompatible between different PCRE versions
and OS endianness. This makes code less portable to achieve slight
performance improvement.

Proposed changes

PCRE version and endianness are parts of Regex structure and may be
checked in runtime.

This PR adds this check and makes regex execution fall back to binary
matching if versions are incompatible.

Notes

This slightly reduces performance (around 5% for simple regexes) for
compatible versions because there is an additional version read.

Also for incompatible versions it's as fast as binary matching.

I'm not sure if there should be additional options to not check for version,
having that option would make matching slightly faster.

Also probably the endianness warning should be removed or modified,
because technically now all code is portable, although not so performant.

Also I'm not sure about the doc string regarding recompiles. Recompile will
still help for performance reasons, but not necessary to make things correct
and may not improve performance for one-off matches.

…sion.

Rgexes are precompiled to binary values during code compilation.
These binary values may be incompatible between different PCRE versions
and OS endianness. This makes code less portable to achieve slight
performance improvement.

PCRE version and endianness are parts of Regex structure and may be
checked in runtime.

This PR adds this check and makes regex execution fall back to binary
matching if versions are incompatible.

This slightly reduces performance (around 5% for simple regexes) for
compatible versions because there is an additional version read.
Also for incompatible versions it's as fast as binary matching.
@josevalim
Copy link
Member

This looks great! ❤️ Some notes:

Also probably the endianness warning should be removed or modified, because technically now all code is portable, although not so performant.

Agreed, I would still keep the warning (we can have a flag to disable it) and note about performance.

Also I'm not sure about the doc string regarding recompiles.

Yeah, we can change it to say something like "Recompile is useful if you know you are running on a different system that the current one AND you are doing multiple matches with the regex"

Btw, we should also do a pass on the Elixir codebase and remove all instances of Regex.recompile, as there is likely nothing to gain there.

hairyhum added 2 commits May 14, 2019 14:29
The warning will be printed every time elixir starts, which may be
unneccessary for prebuilt releases or escripts.
"This may impact regex matching performance. "
"You may want to recompile elixir and and source files in a machine with "
"the same endianness as the current one: ~ts~n",
[CompiledEndianness, Endianness])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer for the warning to be kept the same, given we provide a way for you to disable it. If you are running Elixir in production and you see this warning, then you want to act on it (and it should be scary). It may have other implications besides only regexes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can revert this.
I had an impression that regexes were the only implication. Do you know any examples?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know any but I don't want to pass the impression there isn't any other, if that makes any sense. :)

@hairyhum hairyhum force-pushed the regex_platform_independent branch from 1ff61fe to 21da203 Compare May 14, 2019 19:15
@hairyhum
Copy link
Contributor Author

Another question is how we should pass the option to disable the warning?
From the OTP point of view, application environment is the way to go, but configuring that in mix may be not that straight forward.
config.exs is loaded after elixir application prints this warning and will not work unless it is a prebuilt release.
Escript will need additional emu_args to set this variable.
@josevalim what would be the best place to document that?

@josevalim
Copy link
Member

@hairyhum I thought that for escripts we load the configuration options before we start Elixir, no? If not, I would change them to guarantee that's the case.

@hairyhum
Copy link
Contributor Author

I thied to create a Mix project like this: https://gist.github.com/hairyhum/45e1a4c80495a86ab3ca9bda6c8e11b9

And added loging to elixir.erl to check the environment variable (based on this branch):

diff --git a/lib/elixir/src/elixir.erl b/lib/elixir/src/elixir.erl
index 19caba540..732c24dc1 100644
--- a/lib/elixir/src/elixir.erl
+++ b/lib/elixir/src/elixir.erl
@@ -32,6 +32,7 @@ start(_Type, _Args) ->
   set_stdio_and_stderr_to_binary_and_maybe_utf8(),
   check_file_encoding(Encoding),
 
+  io:format("~nEndianness setting: ~p~n", [application:get_env(elixir, check_endianness)]),
   case application:get_env(elixir, check_endianness, true) of
     true  -> check_endianness();
     false -> ok

What I see is:

$ mix escript.build

Endianness setting: {ok,true}
Generated escript escript/app with MIX_ENV=dev
$ ./escript/app 

Endianness setting: {ok,true}
HELLO

@josevalim
Copy link
Member

Config should be available on elixir application start
to read config environment variables like `check_endianness`
@hairyhum
Copy link
Contributor Author

Moved config loading before app start. Also call application:load/1 for each application in the config to make sure default application env does not override the loaded config.

However, if you find yourself in a scenario where cross-compilation is
necessary, you can manually invoke `Regex.recompile/1` or `Regex.recompile!/1`
to perform a runtime version check and recompile the regex if necessary.
If you know you are running on a different system that the current one AND
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AND should go in lowercase IMO, as we don't use that style. You can still use markdown to emphasize it

@@ -319,6 +319,7 @@ defmodule Mix.Tasks.Escript.Build do

defp load_config(config) do
each_fun = fn {app, kw} ->
:application.load(app)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So theoretically you don't need to load the application if you are using the persistent: true, as persistent: true gives it higher precedence:

iex(1)> Application.get_all_env :ex_unit
[]
iex(2)> Application.put_env(:ex_unit, :stacktrace_depth, 30, persistent: true)
:ok
iex(3)> Application.get_all_env :ex_unit
[stacktrace_depth: 30]
iex(4)> Application.load :ex_unit
:ok
iex(5)> Application.get_all_env :ex_unit
[
  exclude: [],
  trace: false,
  slowest: 0,
  after_suite: [],
  assert_receive_timeout: 100,
  formatters: [ExUnit.CLIFormatter],
  autorun: true,
  max_failures: :infinity,
  capture_log: false,
  refute_receive_timeout: 100,
  timeout: 60000,
  colors: [],
  stacktrace_depth: 30,
  include: [],
  module_load_timeout: 60000
]

In the example above, the default value for the stacktrace depth is 20.

hairyhum and others added 2 commits May 16, 2019 10:32
Co-Authored-By: Eksperimental <eksperimental@users.noreply.github.com>
@josevalim josevalim changed the title Make regexes fall back to binary matching on incompatible runtime version. Make regexes fall back to binary matching on incompatible runtime version May 18, 2019
@josevalim
Copy link
Member

Note to the team: once this PR is merged we should remove the uses of Regex.recompile from the Elixir codebase itself.

@fertapric fertapric merged commit 145b701 into elixir-lang:master May 21, 2019
@fertapric
Copy link
Member

Thanks @hairyhum! 💜

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants