Original bug ID: 6826 Reporter:@alainfrisch Assigned to:@alainfrisch Status: resolved (set by @alainfrisch on 2017-03-24T14:36:56Z) Resolution: fixed Priority: normal Severity: feature Target version: later Fixed in version: 4.06.0 +dev/beta1/beta2/rc1 Category: typing Monitored by:@diml@jmeber@hcarty@yakobowski
Bug description
Every time an "open" is type-checked, all items from the opened signature are added to the environment. This can become problematic esp. with local opens. Consider for instance:
module A = struct
let x1 = 1
...
let xN = N
end
let x = A.(x1)
...
let x = A.(xN)
Type-checking this with ocamlc.opt shows a clear quadratic behavior:
N = 1000: 1.04s
N = 2000: 4.54s
N = 3000: 10.98s
I don't know how bad this is in practice, since people don't often have modules with so many elements, but this show that perhaps something could be improved.
This is not related to sub-modules, so I don't think that #5916 is relevant here. I checked manually, and all the time in Env.open_signature is spent in computing newenv, not on computing the substituted signature sg. And even more specifically, most of the time is spent in Ident.add (and a non-negligible part of it in the string comparison between id.name and k.ident.name).
This could be improved in several ways:
Optimize Ident tables. For instance, using a different data structure (note: it seems that allowing a height difference of 2 as in Set/Map, and not 1 as currently, actually degrades performance) or a more optimized comparison function (either precomputing a hash of names for each ident, perhaps using hash-consing to assign globally unique int to each name). Or tweaking the datatype definition to reduce allocations (e.g. merging the first level of the 'a data record into 'a tbl). This should improve type-checking performance globally, not only for opens.
Memoize the transformation of included signatures into a env-like representation (with quick lookup), and use a version of Ident table with an efficient merge operation. Typically, one could keep the merge symbolically (meaning that each lookup might need to be dispatched to both arguments sequentially) and only materialize it after a given number of lookups aren't found in the left argument (this number would be related to the size of this argument). The idea being that there is usually a small number of lookups done in the environment resulting from a local open, and if the included signature is large enough, it's better not to actually merge the maps. (Actually, if the depth of nested opens is not too big, it might be relevant to never merge the maps.)
I've attached a quick POC to evaluate the idea of keeping a symbolic representation merged Env.t to type-check "open" constructs. This is mostly encapsulated in the EnvTbl module, which now has an optional "previous" field.
Results are as follow:
N = 1000: 0.07s
N = 2000: 0.13s
N = 3000: 0.26s
This is very lightly tested, and doesn't support warnings on open statements (unused opens, shadowing) for now. Also, one should probably use this new representation only for local opens, not global ones (EDIT: done in the second version of the uploaded patch).
I've done some other micro-benchmarks, and already doing 1000 local opens on a module of size 150 (which is not completely unrealistic even for hand-written code) gives a x10 speedup (0.23s -> 0.02s).
Original bug ID: 6826
Reporter: @alainfrisch
Assigned to: @alainfrisch
Status: resolved (set by @alainfrisch on 2017-03-24T14:36:56Z)
Resolution: fixed
Priority: normal
Severity: feature
Target version: later
Fixed in version: 4.06.0 +dev/beta1/beta2/rc1
Category: typing
Monitored by: @diml @jmeber @hcarty @yakobowski
Bug description
Every time an "open" is type-checked, all items from the opened signature are added to the environment. This can become problematic esp. with local opens. Consider for instance:
module A = struct
let x1 = 1
...
let xN = N
end
let x = A.(x1)
...
let x = A.(xN)
Type-checking this with ocamlc.opt shows a clear quadratic behavior:
N = 1000: 1.04s
N = 2000: 4.54s
N = 3000: 10.98s
I don't know how bad this is in practice, since people don't often have modules with so many elements, but this show that perhaps something could be improved.
This is not related to sub-modules, so I don't think that #5916 is relevant here. I checked manually, and all the time in Env.open_signature is spent in computing newenv, not on computing the substituted signature sg. And even more specifically, most of the time is spent in Ident.add (and a non-negligible part of it in the string comparison between id.name and k.ident.name).
This could be improved in several ways:
Optimize Ident tables. For instance, using a different data structure (note: it seems that allowing a height difference of 2 as in Set/Map, and not 1 as currently, actually degrades performance) or a more optimized comparison function (either precomputing a hash of names for each ident, perhaps using hash-consing to assign globally unique int to each name). Or tweaking the datatype definition to reduce allocations (e.g. merging the first level of the 'a data record into 'a tbl). This should improve type-checking performance globally, not only for opens.
Memoize the transformation of included signatures into a env-like representation (with quick lookup), and use a version of Ident table with an efficient merge operation. Typically, one could keep the merge symbolically (meaning that each lookup might need to be dispatched to both arguments sequentially) and only materialize it after a given number of lookups aren't found in the left argument (this number would be related to the size of this argument). The idea being that there is usually a small number of lookups done in the environment resulting from a local open, and if the included signature is large enough, it's better not to actually merge the maps. (Actually, if the depth of nested opens is not too big, it might be relevant to never merge the maps.)
File attachments
The text was updated successfully, but these errors were encountered: