Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashtbl.MakeSeeded.{add,replace,of}_seq use wrong hash function #2202

Merged
merged 2 commits into from Mar 29, 2019

Conversation

Projects
None yet
5 participants
@dra27
Copy link
Contributor

commented Dec 17, 2018

This GPR initially addresses a bug, which is that Hashtbl.Make.of_seq creates hash tables in randomized mode, if it has been turned on (e.g. with OCAMLRUNPARAM=R). It then goes one step further and, given that it wraps Hashtbl.create, adds the ?random parameter to Hashtbl.of_seq. of_seq also creates hash tables with a hard-coded initial size of 16, so while I was there, I added ?size. This gets propagated to Ephemeron.

The first commit should really go in 4.08 - the interface change in the second only adds optional parameters, so should also be safe to go in, even at this stage? Are we planning 4.07.2 (@damiendoligez)?

cc @c-cube, @bobot

Fix Hashtbl.Make.of_seq creating randomized tables
Book-keeping error only - although it does potentially initialise the PRNG
unnecessarily.

@dra27 dra27 added the bug label Dec 17, 2018

@dra27 dra27 added this to the 4.08 milestone Dec 17, 2018

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Dec 17, 2018

This GPR initially addresses a bug, which is that Hashtbl.Make.of_seq creates hash tables in randomized mode, if it has been turned on (e.g. with OCAMLRUNPARAM=R).

The current implementation is not satisfactory, but is it really buggy? In this case you mention, the seed field will be populated, but the hash function will ignore it, so this should be mostly harmless (except perhaps a small slowdown to produce the useless seed). Did you find an actual problem? (I'm not claiming the problem should not be fixed, but I'm trying to understand the extent of the issue.)

@dra27

This comment has been minimized.

Copy link
Contributor Author

commented Dec 17, 2018

It appears to be a necessary fix for the compiler, but I agree that it doesn't look like it should be fixing it. I'll dig further.

@dra27

This comment has been minimized.

Copy link
Contributor Author

commented Dec 17, 2018

It's definitely a bug, but not exactly the one I'd stated. The problem is actually add_seq and replace_seq both of which should be redefined in MakeSeeded to ensure that they use the hash function from the functor, rather than default the hash function.

Further commit following - thanks @alainfrisch!

Fix Hashtbl.MakeSeeded.{add,replace,of}_seq
Hashtbl.MakeSeeded.{add,replace}_seq were not using the hash function
provided by the functor (Hashtbl.MakeSeeded.of_seq uses replace_seq and
so also has to be redefined locally).

@dra27 dra27 force-pushed the dra27:hashtbl-seq branch 2 times, most recently from 8166903 to b8e4c61 Dec 17, 2018

@dra27

This comment has been minimized.

Copy link
Contributor Author

commented Dec 17, 2018

So, this time it should pass CI. cd5f93c is very important (the functor is broken, so that's a bug), 14dee98 fixes the crime of initialising the PRNG when it's not actually needed (i.e. it's not that serious, but it's better to be correct) and then b8e4c61 is what IMO the API should look like (i.e. it's much more optional a change)

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Dec 17, 2018

Ah, this was a serious bug indeed.

Two comments:

  • What about moving the toplevel definition for key_index in Hashtbl and all its dependencies after the implementation of the functor? That would avoid such bugs.

  • I'm concerned that adding optional arguments could break some code such as foo |> X.of_seq, or at least raise warning 48 ("implicit elimination of optional argument ?..."). Is it why you marked the change as breaking?

@dra27

This comment has been minimized.

Copy link
Contributor Author

commented Dec 17, 2018

For the refactor, yes agreed - although presumably better done after merging (the bug fix) since it's quite disruptive to the diff. Equally, there's a lot of duplication - is that just because the functorial interface is younger than the module itself, or is it that we need flambda for include MakeSeeded(struct ... end) to be as efficient as the duplicated definitions?

For the optional arguments, I'd marked as breaking simply because, as an interface change, it is breaking! My opinion, surprisingly at the moment for someone British, is that perhaps you cannot have your Warning 48 cake and then grumble when your use |> eats it...! There are obviously other scenarios where it breaks code - the testsuite being another example (and, indeed, Ephemeron), but I considered that this would primarily affect Hashtbl extensions (Core, Batteries, ExtLib, etc.), which would expect to have to be updated with a new version of OCaml if another function got added anyway.

I could cope with the ?size parameter being dropped, but I think the lack of ?random would not be great? Also happy to split this into two GPRs, given that the bug bit is more serious than I'd originally realised...

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Dec 17, 2018

I find Warning 48 quite reasonable; without it, a simple identifier reference can end up executing code and allocating. It would be interesting to see how performance is degraded for code such as foo |> X.of_seq due to the addition of the optional arguments; does the optimizer removes the closure allocation?

Anyway, the bug fix part should obviously be accepted quickly, so splitting the PR is a good idea.

@dra27

This comment has been minimized.

Copy link
Contributor Author

commented Dec 17, 2018

I wasn't arguing against warning 48, per se - it was the combination of the two features! In terms of performance, wouldn't we be micro-managing this a bit - of_seq is responsible for allocating an entire hash table with, at present, 16 potentially unused buckets, so are the closures for any optional arguments likely to be a concern?

@dra27 dra27 force-pushed the dra27:hashtbl-seq branch from b8e4c61 to 14dee98 Dec 17, 2018

@dra27 dra27 changed the title Add ?size and ?random to Hashtbl.of_seq and related functions Hashtbl.MakeSeeded.{add,replace,of}_seq use wrong hash function Dec 17, 2018

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Dec 17, 2018

LGTM. One could merge as is if needed, but it would be better if you could add a non-regression test for that bug.

@dra27 dra27 referenced this pull request Feb 6, 2019

Open

Add Hashtbl.{to,of}_list #2236

@gasche

This comment has been minimized.

Copy link
Member

commented Feb 6, 2019

What's the status of this PR? It looks like something we want to have in 4.08 if we have it.

@XVilka

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

Will be this a part of the upcoming 4.08 release if I might ask?

@dra27 dra27 force-pushed the dra27:hashtbl-seq branch from 14dee98 to 77495a3 Mar 19, 2019

@dra27

This comment has been minimized.

Copy link
Contributor Author

commented Mar 19, 2019

Rebased.

@gasche - sorry, this slipped off my radar, but both commits should really go into 4.08 (just to clarify, as this PR got split - there are no interface changes in this PR)

@alainfrisch - I just started trying to write a test, but it seems really fake here - the only way this would break would be by actively removing those functions and any regression test wouldn't guard against the additional of some future from_ and of_ functions (cf. #2236). Conversely, your suggestion of refactoring would prevent both regression and any future errors when adding similar functions - I've set myself a reminder to work on this. Are you happy for this to go in without further tests, therefore?

@dra27

This comment has been minimized.

Copy link
Contributor Author

commented Mar 19, 2019

Thanks for the ping, @XVilka!

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

Yes, ok @dra27, we really want the fix in 4.08, with or without tests.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 19, 2019

I agree with the idea of getting this in 4.08 and I think there is still time for this. (Someone needs to approve the PR.) If we don't have tests, I would personally be interested in seeing the suggested refactoring done -- I'm assuming that @alainfrisch approves now and would be willing to review the refactoring as an extra commit.

@xavierleroy
Copy link
Contributor

left a comment

Looks good to me as well.

@xavierleroy xavierleroy merged commit 9474860 into ocaml:trunk Mar 29, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

Merged in trunk (commits fc8be50 and 9474860) and cherry-picked to 4.08 (commits b98f075 and d84c3aa).

@dra27

This comment has been minimized.

Copy link
Contributor Author

commented Mar 29, 2019

@xavierleroy - we have another snap! 🙂 I was just about to push the refactor commit, but I'll put in a separate PR (it's not for 4.08 anyway)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.