Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8268358: [lworld] toString for primitive class should return ClassName@hash #438

Closed
wants to merge 3 commits into from

Conversation

@mlchung
Copy link
Member

@mlchung mlchung commented Jun 8, 2021

Object::toString implementation of a primitive class should return the traditional ClassName@hash (rather than listing the field values) not to leak any private and security-sensitive information. A primitive class can override toString implementation for their custom string representation.


Progress

  • Change must not contain extraneous whitespace

Issue

  • JDK-8268358: [lworld] toString for primitive class should return ClassName@hash

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/valhalla pull/438/head:pull/438
$ git checkout pull/438

Update a local copy of the PR:
$ git checkout pull/438
$ git pull https://git.openjdk.java.net/valhalla pull/438/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 438

View PR using the GUI difftool:
$ git pr show -t 438

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/valhalla/pull/438.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Jun 8, 2021

👋 Welcome back mchung! A progress list of the required criteria for merging this PR into lqagain will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Jun 8, 2021

@mlchung This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8268358: [lworld] toString for primitive class should return `ClassName@hash`

Reviewed-by: rriggs

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the lqagain branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the lqagain branch, type /integrate in a new comment.

@mlbridge
Copy link

@mlbridge mlbridge bot commented Jun 8, 2021

Webrevs

Copy link
Collaborator

@RogerRiggs RogerRiggs left a comment

If the hashcode of a primitive class uses the normal multiply and add technique,
values may still be leaked because the algorithm is predictable and can be replicated.
Perhaps with a secure hash...

@mlchung
Copy link
Member Author

@mlchung mlchung commented Jun 8, 2021

This is the default implementation of Object::toString. If one wants a class to use secure hash, they can override the hashCode method.

@rose00
Copy link
Collaborator

@rose00 rose00 commented Jun 9, 2021

I agree strongly with this change. The correct
generalization to primitives of System.identityHashCode and
the hex number mentioned by Object.toString is not something
that looks like java.lang.Record field reports, but rather a
VM-chosen number, suitable for hash codes, and hard to
predict.

But, using the algorithm of Objects.hashCode should not be
the final word in producing the VM-chosen number.

(This is not the PR to change that, but this is a very good
moment to make the point.)

We should use a variable (salted) hash code, so that people
will not rely on the presence of the 31*x+ hashcode. It
would be an own-goal if we locked ourselves forever into a
new use of that weak and leaky algorithm.

(The 31*x+ hash code computes a weighted checksum of
Sum[i] x[L-i]*(31**i), truncated to 32 bits. It is very
prone to collisions, especially when neighboring x’s are
linearly related. A better hash code would (a) be more
resistant to simple collisions and (b) would tend to have
all bits of output depend on all bits of input.)

At a bare minimum, the JVM should configure a constant
random 32-bit “salt” at startup time, and xor that number
into the computation of the primitive hash code in such a
way that the output (for one object) is unguessable.

That is not secure (nor am I expecting this) since after you
hash a few thousand test objects you can reverse-engineer
the constants. But adding the randomness now would give
us the ability to tune the algorithm going forward.

Note that the JVM’s identity hash code is configurable and
relatively unpredictable 1.

We could make the same true of the built-in primitive hash
code. The key step would be replacing the 31*x+ step with
a step that uses a larger state (at least 64 bits) and
better mixing (at least another shift and xor).

The cost of such upgrades is negligible. The Marsaglia
register used for identity hash code is 128 bits (a good
size) and each hash step uses four xor operations to produce
one 32-bit value.

A different use of a similar register could consume about 64
bits of primitive field material for each step. More steps
for smaller chunks of field material would improve mixing at
a growing CPU cost. A little parametric salt could be added
either at the beginning or during hash steps. All such
options could be configured at VM startup time, as with
object identity hash codes.

(I think, in the future, using a couple iterations of the
hardware AES instruction, or a 64-bit multiply, might be
slightly more performant with better mixing, but that's a
reseach project for later, and in any case produces more
bits than would be useful to a 32-bit result.)

Again: None of the above suggestions are cryptographic; they
can all be reversed straightforwardly. But they are all
superior to the legacy 31*x+ hash code, and (crucially) if
we adopt a variable (salted) scheme, we can evolve the
algorithm. If we use a fixed algorithm, we are stuck with
it forever.

And 31*x+, as a fixed algorithm, is about the worst
choice. We are stuck with it for java.lang.String. We
don’t need new uses of it.

@mlchung
Copy link
Member Author

@mlchung mlchung commented Jun 9, 2021

This reminds me JDK-8201462.

FWIW the current implementation already adds the randomness to
the hash code computation with a constant random 32-bit “salt”
configured at startup but it still carries the 31*x+ pattern which
should be cleaned up.

@mlbridge
Copy link

@mlbridge mlbridge bot commented Jun 9, 2021

Mailing list message from John Rose on valhalla-dev:

On Jun 9, 2021, at 10:43 AM, Mandy Chung <mchung at openjdk.java.net<mailto:mchung at openjdk.java.net>> wrote:

On Tue, 8 Jun 2021 19:55:46 GMT, Mandy Chung <mchung at openjdk.org<mailto:mchung at openjdk.org>> wrote:

`Object::toString` implementation of a primitive class should return the traditional `ClassName at hash` (rather than listing the field values) not to leak any private and security-sensitive information. A primitive class can override `toString` implementation for their custom string representation.

This reminds me [JDK-8201462](https://bugs.openjdk.java.net/browse/JDK-8201462).

Yes. That is a larger brain dump. My comments
on this thread are hopefully narrower and more
actionable.

FWIW the current implementation already adds the randomness to
the hash code computation with a constant random 32-bit ?salt?
configured at startup but it still carries the `31*x+` pattern which
should be cleaned up.

I didn?t know that; good! Flexibility for the
future my main concern right now.

1 similar comment
@mlbridge
Copy link

@mlbridge mlbridge bot commented Jun 9, 2021

Mailing list message from John Rose on valhalla-dev:

On Jun 9, 2021, at 10:43 AM, Mandy Chung <mchung at openjdk.java.net<mailto:mchung at openjdk.java.net>> wrote:

On Tue, 8 Jun 2021 19:55:46 GMT, Mandy Chung <mchung at openjdk.org<mailto:mchung at openjdk.org>> wrote:

`Object::toString` implementation of a primitive class should return the traditional `ClassName at hash` (rather than listing the field values) not to leak any private and security-sensitive information. A primitive class can override `toString` implementation for their custom string representation.

This reminds me [JDK-8201462](https://bugs.openjdk.java.net/browse/JDK-8201462).

Yes. That is a larger brain dump. My comments
on this thread are hopefully narrower and more
actionable.

FWIW the current implementation already adds the randomness to
the hash code computation with a constant random 32-bit ?salt?
configured at startup but it still carries the `31*x+` pattern which
should be cleaned up.

I didn?t know that; good! Flexibility for the
future my main concern right now.

@mlchung
Copy link
Member Author

@mlchung mlchung commented Jun 11, 2021

/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Jun 11, 2021

Going to push as commit 569dd54.

@openjdk openjdk bot closed this Jun 11, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Jun 11, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Jun 11, 2021

@mlchung Pushed as commit 569dd54.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@mlchung mlchung deleted the substitutable branch Jun 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants