New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hash table performance problem in Emitaux (PR#7456) #1092

Merged
merged 4 commits into from Mar 15, 2017

Conversation

Projects
None yet
5 participants
@mshinwell
Contributor

mshinwell commented Mar 10, 2017

The default parameters for Hashtbl.hash are insufficient to hash Debuginfo.t values in any reasonable manner. On the example from PR#7456, which has nearly 300,000 such values in the linearized code, there are a lot of hash collisions and about 30 seconds is taken on a fast machine inside label_debuginfos.

This patch adds a proper hash function which removes the performance regression (introduced between 4.03 and 4.04).

@mshinwell mshinwell added the bug label Mar 10, 2017

@mshinwell mshinwell added this to the 4.05.0 milestone Mar 10, 2017

@@ -95,6 +95,9 @@ let compare dbg1 dbg2 =
in
loop (List.rev dbg1) (List.rev dbg2)
let hash t =

This comment has been minimized.

@alainfrisch

alainfrisch Mar 10, 2017

Contributor

Wouldn't it work to use Hashtbl.hash_param with sufficiently high numbers? Or are the hard-coded caps too low?

@alainfrisch

alainfrisch Mar 10, 2017

Contributor

Wouldn't it work to use Hashtbl.hash_param with sufficiently high numbers? Or are the hard-coded caps too low?

This comment has been minimized.

@mshinwell

mshinwell Mar 10, 2017

Contributor

I don't think it's reasonable to estimate the numbers in general.

@mshinwell

mshinwell Mar 10, 2017

Contributor

I don't think it's reasonable to estimate the numbers in general.

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Mar 10, 2017

Contributor

Well spotted! Is the total time still spend in the compare/hash for debuginfos still noticeable? If needed, equality could be optimized quite easily (by not reversing the lists, in particular).

Contributor

alainfrisch commented Mar 10, 2017

Well spotted! Is the total time still spend in the compare/hash for debuginfos still noticeable? If needed, equality could be optimized quite easily (by not reversing the lists, in particular).

@xavierleroy

This comment has been minimized.

Show comment
Hide comment
@xavierleroy

xavierleroy Mar 10, 2017

Contributor

Once you have a proper compare function, a reference to a finite map could be used instead of a hashtable, with the benefit that you don't need to hack a hash function and don't risk collisions, and the potential risk of being slightly slower in the average case. Just saying.

Contributor

xavierleroy commented Mar 10, 2017

Once you have a proper compare function, a reference to a finite map could be used instead of a hashtable, with the benefit that you don't need to hack a hash function and don't risk collisions, and the potential risk of being slightly slower in the average case. Just saying.

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Mar 10, 2017

Contributor

FWIW, the current compare function does not seem very good, since it reverses the two lists. I don't know how long the lists are, nor if the specific currently implemented ordering is useful in other contexts, but if one wants to use Map, a more efficient comparison should be considered.

Contributor

alainfrisch commented Mar 10, 2017

FWIW, the current compare function does not seem very good, since it reverses the two lists. I don't know how long the lists are, nor if the specific currently implemented ordering is useful in other contexts, but if one wants to use Map, a more efficient comparison should be considered.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Mar 10, 2017

Contributor

@alainfrisch I checked a perf report with this patch and couldn't see any evidence of Debuginfo being a performance problem any more. I would encourage that we don't bikeshed too much, because there are still plenty of other more serious bugs to fix ;)

Contributor

mshinwell commented Mar 10, 2017

@alainfrisch I checked a perf report with this patch and couldn't see any evidence of Debuginfo being a performance problem any more. I would encourage that we don't bikeshed too much, because there are still plenty of other more serious bugs to fix ;)

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Mar 10, 2017

Contributor

Well, it would be unfortunate to miss an easy non-negligible improvement once one have identified a potential relatively hot spot. But I hear you, and I'm ok for merging the PR in its current state. Let's perhaps keep the Mantis ticket open for further investigation.

Contributor

alainfrisch commented Mar 10, 2017

Well, it would be unfortunate to miss an easy non-negligible improvement once one have identified a potential relatively hot spot. But I hear you, and I'm ok for merging the PR in its current state. Let's perhaps keep the Mantis ticket open for further investigation.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Mar 10, 2017

Contributor

I don't think the Mantis ticket needs to remain open; the reported problem has been fixed. (I think in general we need to try harder not to have Mantis issues lying around "just in case", or we'll never clear them up.)

Contributor

mshinwell commented Mar 10, 2017

I don't think the Mantis ticket needs to remain open; the reported problem has been fixed. (I think in general we need to try harder not to have Mantis issues lying around "just in case", or we'll never clear them up.)

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Mar 10, 2017

Contributor

(I've left a comment in the code noting that the comparison function might be improved.)

Contributor

mshinwell commented Mar 10, 2017

(I've left a comment in the code noting that the comparison function might be improved.)

@ygrek

This comment has been minimized.

Show comment
Hide comment
@ygrek

ygrek Mar 11, 2017

Contributor

will this get into 4.04.1 by any chance?

Contributor

ygrek commented Mar 11, 2017

will this get into 4.04.1 by any chance?

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Mar 11, 2017

Member

To me the patch as proposed looks safe enough to be included in 4.04.1 -- plus it fixes something that is a regression in 4.04.0, so it sort of makes sense.

The compare function was added in f8abc41#diff-594700c2711e1ea5b31e5c5659fe3471R79 , from the pull request #666. This is when the conversion to buit-in lists was made by @lpw25; as far as I can tell, there is no specific reason for reversing besides keeping the order of the previous version.

Member

gasche commented Mar 11, 2017

To me the patch as proposed looks safe enough to be included in 4.04.1 -- plus it fixes something that is a regression in 4.04.0, so it sort of makes sense.

The compare function was added in f8abc41#diff-594700c2711e1ea5b31e5c5659fe3471R79 , from the pull request #666. This is when the conversion to buit-in lists was made by @lpw25; as far as I can tell, there is no specific reason for reversing besides keeping the order of the previous version.

@gasche gasche modified the milestones: 4.04.1, 4.05.0 Mar 12, 2017

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Mar 14, 2017

Member

@mshinwell you could rebarge and merge in 4.04.1, 4.05, trunk.

Member

gasche commented Mar 14, 2017

@mshinwell you could rebarge and merge in 4.04.1, 4.05, trunk.

@mshinwell mshinwell merged commit 0903941 into ocaml:4.05 Mar 15, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell Mar 15, 2017

Contributor

Merged to 4.05, I will cherry-pick to 4.04 and trunk presently.

Contributor

mshinwell commented Mar 15, 2017

Merged to 4.05, I will cherry-pick to 4.04 and trunk presently.

mshinwell added a commit that referenced this pull request Mar 15, 2017

camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment