-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PATCH] Optimization of pattern-matching on strings #6269
Comments
Comment author: @alainfrisch I'm very much in favor of such a patch. Benoit: can you say a few words about the algorithm to dispatch on strings? Does it apply only when matching on a single string, or also on more complex patterns involving strings? A minor comment: code such as let lt_pats = List.filter (fun (pat, _) -> Array.length pat < med_sz) pats and ge_pats = List.filter (fun (pat, _) -> Array.length pat >= med_sz) pats in can be simplified using List.partition. |
Comment author: @gasche I haven't reviewed the code carefully, but here are a few words about what I understood from the algorithm. (0) it collects all the constant strings in a pattern and matches on all of them at once (1) Benoît first turns strings into array of machine words, and does the matching on them (he uses (Cload Word) to ask cmm to read a full word from a given offset in the matched string) (2) A combination of two strategies is used: (2a): we look in the constant word-arrays for a good "discriminant", which is an index smaller than the minimum length of all arrays, which has the biggest number of groups of word-arrays having the same value at this index, and outputs a binary tree of dichotomic tests on the scrutinee word at this index (2b): if no such discriminant exists (all such indices have been checked already, which happens if you match eg. "foo" "foobar" "foobaz" after the "foo" prefix has been tested), it does a tree of dichomotic tests on the size of each constant string at the leaves of the test of either strategy, the set of possible matches has been reduced, and (2) starts again I have a couple remarks from looking quickly at the code:
|
Comment author: @alainfrisch
So, each index defines a partition of candidates, and the strategy is to select the index in order to maximize the number of classes in this partition? Intuitively, it would be better to find the index which minimizes the maximum size of a class in the partition, no? |
Comment author: bvaugon Thank you for reviewing the code and comments.
In fact, this optimization is performed after the pattern-matching are match l with is near than 3.1 times faster with this optimization.
Indeed, it might be better to call List.partition instead of calling
Semantically, yes. For efficiency, my version is a bit faster since it
This assumption is statically verified, so if a problem appears later,
No because the "set of cases" changes at each call. Indeed, the
Indeed, it terminates. In fact, all string patterns are distinct at It can be added as a comment, if you want.
Obviously not. According to the memory representation of strings, an
Indeed, to reduce the size of the generated code, it is interesting to |
Comment author: @alainfrisch
I don't think that my proposal can be described as "the less discriminant" column. What I propose to minimize is not the number of classes in the partition, but the size of the largest class in it. This strategy should minimize the maximum number of comparisons required at runtime. Let's consider the following example: a000 According to your criterion (assuming it operates on characters, not words), one should first dispatch on the first character (it gives a partition with 3 classes, while other choices give partitions with only 2 classes), but when it is 'a', we still have 5 candidates, and we need 3 more dispatches to conclude. It seems better to start with any of the last three characters, because they all give partitions whose largest class has 4 elements (and 2 more comparisons are then enough), while the first character gives a partition whose largest class has 5 elements. In general, finding the optimal strategy (which minimizes the depth of the decision tree) is probably difficult, but this seems to be a good heuristics. Also, I'm wondering whether it wouldn't be appropriate to dispatch first on the size: since it is usually quite small, the dispatch could be compiled as a table-based switch. |
Comment author: @gasche
In fact you could want to see "testing character at offset i" and
I was thinking of using a hashtable internally to each and extract_distinct pats i =
My understanding of the previous implementation is that the List.assoc |
Comment author: bvaugon
Ah ok, sorry. Well, I compared and benchmarked 4 strategies: It was particulary hard to find examples giving significant It is also this strategie (2/) which generates the fastest code (just match s with Please find attached the new version, using the strategie (2/), using |
Comment author: @alainfrisch Did you try a strategy based on always dispatching on the size (either with a decision tree, or with a Cswitch) first? |
Comment author: @gasche I discussed this proposal with Luc. He points out that his paper on compiling pattern-matching with decision trees tries a lot of different heuristics and comes to the conclusion that it is indeed the "least discriminant choice" that is better. He also points out that the Switch module (which provides a functor) should be used instead of building the dichomotic test sequences by hand -- see how the call_switcher function is used in in bytecomp/matching.ml, and transl_switch in cmmgen. I now have an intuitive understanding of why the least discriminant choice is better; it strongly depends on the existence of a fallback case if none of the constants are matched. Indeed, what happens if the N string constants were all the same, except on the first character? The least-discriminant match would recommend testing the second character, third characters, etc., before the first one. This seems useless but is necessary to check that we're not in the "none of the constants matched" case anyway. After we made all those non-discriminating checks, we check the first character that makes a difference. Note that if you test the first character first, you duplicate a lot of code, as all the tests for all the other characters are done N times (in each branch you still need to check for the failure case). In Luc's pattern-matching paper, this was avoided of by hashconsing the decision trees (to share identical test tails). But the least-discriminating strategy was still best, and his argument is that locally (if you only look at the level of one test), it is the one that produces the less code (less branches, smaller tree), and empirically this property extends in a glutton way. Now let's be clear about the fact that:
I also wonder whether we couldn't actually do that in the typedtree/lambda-code level, by using the %caml_string_get{32,64} primitives. I'm not sure this would be better, though, in particular I don't know how this would handle the extraneous padding bits at the end of the string (I'm still not sure how the present patch handles this either, though). |
Comment author: @damiendoligez
Possibly irrelevant, but this reminds me of the Paige & Tarjan partition-refinement algorithm (IIRC, for computing bisimulations on automata). |
Comment author: @maranget I have reviewed the patch and well I have rewritten strmatch.ml... I have somehow simplified the algorithm by first discriminating on I have also made some effort to use code defined in cmmgen that is passed
The rest of the v2 patch has not changed. |
Comment author: @maranget I am to include the patch in trunk. Given the interaction with the new feature of shared constants, Bytecode compilation has also improved, from N string tests to |
Original bug ID: 6269
Reporter: bvaugon
Assigned to: @maranget
Status: closed (set by @xavierleroy on 2015-12-11T18:25:57Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: 4.01.0
Fixed in version: 4.02.0+dev
Category: back end (clambda to assembly)
Tags: patch
Monitored by: @gasche @ygrek @jmeber @hcarty
Bug description
I wrote a patch to improve performances of pattern-matching on strings
in native code.
For example, with this optimization, the following code:
match s with
| "let" -> 0
| "match" -> 1
| "with" -> 2
| "if" -> 3
| "then" -> 4
| "else" -> 5
| "constraints" -> 6
| "module" -> 7
| "type" -> 8
| "struct" -> 9
| "sig" -> 10
| "end" -> 11
| _ -> -1
becomes about:
In order to be uniform, this optimization also impacts simple
comparisons with string constants. Consequently, this kind of code:
if s = "Hello World" then ... else ...
is also a bit more efficient.
This optimization is performed at the translation of clambda
to C-- because it generates low-level instructions unrepresentable in
higher intermediate languages.
The size of the generated code remains proportional to the size of the
source code. However, it may become a bit smaller or bigger than it is
currently. For example, with this optimization, the bootstraped
ocamlopt.opt is 1.0015 times bigger, and the preceding match with 13
cases is 224 bytes smaller.
To replace the current implementation of pattern-matching on strings,
people usually uses a hash table. Indeed, Hashtbl access is in O(1),
while current string pattern-matching is in O(n) and the new one is in
O(log(n)) where n is the number of cases. With this optimized string
pattern-matching, it is no longer necessary to resort to Hashtbl
unless the number of cases is over 17000 (on an "Intel(R) Core(TM)
i7-3770 CPU @ 3.40GHz")!
File attachments
The text was updated successfully, but these errors were encountered: