-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Performance improvements #85
WIP: Performance improvements #85
Conversation
This change makes the dynamically generated functions be written first, allowing overridable functions to be defined. Previously, when the match macro was run as the last statement, it would overwrite any previously defined functions.
Using pattern matching on the start of the string significantly increases the performance. Before ## PhoneBench benchmark name iterations average time Phone.parse/1 with an Andorra phone number 5000 673.05 µs/op Phone.parse/1 with a Zimbabwe phone number 5000 691.99 µs/op Phone.parse/1 with an NANP phone number 1000 1047.04 µs/op With these changes ## PhoneBench benchmark name iterations average time Phone.parse/1 with an Andorra phone number 10000 294.79 µs/op Phone.parse/1 with a Zimbabwe phone number 10000 298.49 µs/op Phone.parse/1 with an NANP phone number 5000 713.80 µs/op Difference ## PhoneBench Phone.parse/1 with a Zimbabwe phone number 0.43 Phone.parse/1 with an Andorra phone number 0.44 Phone.parse/1 with an NANP phone number 0.68 I still think we can do better, especially on NANP phone numbers, but this is a good start.
1 similar comment
Regarding doing phone number parsing with multiple function definitions, the MIME library shows a great example of how to do this: https://github.com/elixir-lang/mime/blob/master/lib/mime.ex#L93-L100. I think I'll look into that and see if I can put together something. |
Great to know that pattern matching even with string partials are faster than regex. I'll take a look at it and try to come up with some strategy to have less pattern matching possible. |
I've done some tests with pure pattern matching and pure regex match, I've come up with those results:
I did it with only one case of pattern matching, I think if that number increase the performance would drop, but the difference is huge. I thought of changing the |
Ideally I'm thinking we could use metaprogramming to generate one big parse function with multiple definitions like this. def parse("1970" <> <<number::byte_size(7)>>) do
# Run regex just to be sure
Regex.match?(...)
{:ok, %{a2: "US", a3: "USA", country: "United States", international_code: "1", area_code: "970", number: number, area_abbreviation: "CO", area_type: "state", area_name: "Colorado"}}
end
def parse("1303" <> <<number::byte_size(7)>>) do
# Run regex just to be sure
Regex.match?(...)
{:ok, %{a2: "US", a3: "USA", country: "United States", international_code: "1", area_code: "303", number: number, area_abbreviation: "CO", area_type: "state", area_name: "Colorado"}}
end My thought is calling one big function will be faster than calling a function for each country (but we should benchmark that to see if it's true). The trick would be storing all of this data so you could generate this function at compile time. Also, I wonder if it would be faster to generate separate function definitions for each area code in a state or use the in guard clause like this: def parse("1"<<area_code::byte_size(3)>> <> <<number::byte_size(7)>>) when area_code in ~w(303 719 720 970) do
# Run regex just to be sure
Regex.match?(...)
{:ok, %{a2: "US", a3: "USA", country: "United States", international_code: "1", area_code: area_code, number: number, area_abbreviation: "CO", area_type: "state", area_name: "Colorado"}}
end This sure is exciting to find room for such a huge performance gain. I don't think we have to do all of the checking via pattern matching, just enough to quickly exclude the other countries so we only have to run 1 regex. Thanks again for looking at this. Please let me know how I can help. |
I understand that performance is a huge issue, but I think the main objective is try to keep the lib easy to contribute(phone rules use to change a lot, even more in small countries). My first idea was to change the edit: |
@aaronrenner if you agree and want to spend more time with that, we could do it both ways, you do it the way you said and I create a pr with the way I'm thinking, and we compare the performances and readability of the code. |
@fcevado I'd be happy to spend a little more time on this so we can try it both ways. Looking forward to the learning experience! 😄 |
@aaronrenner I'll close those PRs, I worked on that this weekend and I believe that i reached it, follows my results:
In a counterpart the complexity was thrown to the compiler, the compilation time is really big. |
I think we can achieve a lot of performance gains in this library by using pattern matching to quickly narrow down the list of possible phone number parsers and then using regular expressions to extract the necessary data and parse the phone number. I tried the following benchmark earlier today to compare pattern matching vs regular expressions and found there's a huge speed difference.
That makes me think if we can leverage elixir's pattern matching, we have a tone of room for performance improvements.
This is my first try at using pattern matching, and I think there are a lot more gains we can make. I changed how you set the the match type for each country module (to eventually allow for overriding functions) and added a
number_prefix
option that helps with the pattern matching in the generatedmatch?
function. I would recommend looking at the commits one at a time and letting me know what you think.Here's the before and after performance metrics.
I'm curious to see if we could have even bigger gains if we could move the phone number parsing into one giant (generated) parse function with one definition for each prefix. I'm not sure how we'd do it, but it would be interesting to see how fast we can make this library.