-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8320971: Use BufferedInputStream.buf directly when param of implTransferTo() is trusted #16879
Conversation
👋 Welcome back stsypanov! A progress list of the required criteria for merging this PR into |
@stsypanov The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
// trust all OutputStreams from java.io | ||
if (out.getClass().getPackageName() == BufferedInputStream.class.getPackageName()) { | ||
out.write(getBufIfOpen(), pos, count); | ||
} else { | ||
// Prevent poisoning and leaking of buf | ||
byte[] buffer = Arrays.copyOfRange(getBufIfOpen(), pos, count); | ||
out.write(buffer); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// trust all OutputStreams from java.io | |
if (out.getClass().getPackageName() == BufferedInputStream.class.getPackageName()) { | |
out.write(getBufIfOpen(), pos, count); | |
} else { | |
// Prevent poisoning and leaking of buf | |
byte[] buffer = Arrays.copyOfRange(getBufIfOpen(), pos, count); | |
out.write(buffer); | |
} | |
out.write(getBufIfOpen(), pos, count); |
What do you think of passing the buffer as is?
ByteArrayInputStream
passes the buffer without extra copies anyway:
jdk/src/java.base/share/classes/java/io/ByteArrayInputStream.java
Lines 207 to 213 in 9a6ca23
public synchronized long transferTo(OutputStream out) throws IOException { | |
int len = count - pos; | |
if (len > 0) { | |
int nwritten = 0; | |
while (nwritten < len) { | |
int nbyte = Integer.min(len - nwritten, MAX_TRANSFER_SIZE); | |
out.write(buf, pos, nbyte); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of passing the buffer as is?
No, it should only do for trusted targets. BAIS has an issue in that area that should be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The buffer in question is protected, so any subclass can directly access it. In other words, untrusted code can easily acoess the buffer, and it does not sound fair to add extra overhead to the method which was created for the performance reasons.
Does copyOfRange do any good here? Do you mean JDK should copy every buffer it passes to non-JDK code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The buffer in question is protected, so any subclass can directly access it. In other words, untrusted code can easily acoess the buffer, and it does not sound fair to add extra overhead to the method which was created for the performance reasons.
If something extends BIS then it has access to the protected fields, this isn't a concern here. Instead, the concern is about passing a reference to the target output stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vlsi Yes, unless the JRE comes up with read-only buffers all untrusted code should get copies of JRE-internal buffers only to provide buffer poisoning and spying data located beyond range limits. Subclasses are free to do what they want with the inherited buffer (it is their buffer implicitly), but target output stream might be an injected bad guy that we must not trust in any regard.
byte[] buffer = Arrays.copyOfRange(getBufIfOpen(), pos, count); | ||
out.write(buffer); | ||
// trust all OutputStreams from java.io | ||
if (out.getClass().getPackageName() == BufferedInputStream.class.getPackageName()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think Class::getPackageName documents that the returned String is intern so I wonder if the == check will lead to questions and suggestions of a bug. Classes with names starting with "java." can only be defined to the boot or platform class loader (details in the ClassLoader API docs) so you could just check if the package name equals "java.io". We of course first need to think through the implications of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we only want to trust java.io or anything starting with java.*?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think checking if the package is java.io is secure:
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
BufferedInputStream bis = new BufferedInputStream(bais);
UntrustedOutputStream uos = new UntrustedOutputStream();
bis.transferTo(new java.io.DataOutputStream(uos));
You have to know that it is in the java.io package and it doesn't wrap another stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to know that it is in the java.io package and it doesn't wrap another stream.
That is a good point. In the previous work on this override, we converged on the current implementation to not leak the internal byte[] to the target. It could be special cased for trusted targets but at the cost of auditing and complexity. So more thought needed on this, I don't think the current change can be integrated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of adding OutputStream extends WritableByteChannel
, so transferTo
could be implemented with if (target instanceof WritableByteChannel) { target.write(ByteBuffer.wrap(buf, off, len).asReadOnly()); }
?
Read-only byte buffer will not allow modifying the data in the buffer, it would eliminate buffer copies, and adding write(ByteBuffer)
to OutputStream
seems reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created a draft change for OutputStream#write(ByteBuffer)
, and I have benchmarked several cases including ByteArrayInputStream.transferTo(BufferedOutputStream(ByteArrayOutputStream))
, ByteArrayInputStream.transferTo(DataOutputStream(FileOutputStream))
.
Benchmarks show there's improvement in both allocation rate and latency.
Read-only ByteBuffers
can address poisoning concerns as non-jdk code can't peek into read-only arrays.
The write(ByteBuffer)
approach supports cases like CheckedOutputStream
, DataOutputStream
which would be hard or impossible to optimize when passing naked byte arrays.
Here are the results: https://gist.github.com/vlsi/7f4411515a4f2dbb0925fffde92ccb1d
Here is the diff: vlsi/jdk@ce10844...write_bytebuffer
@mkarg , @bplb , @AlanBateman , could you please review OutputStream.write(ByteBuffer)
approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vlsi This is a very interesting solution 🤩, but it opens a number of questions! 🤔 As a start, here are mine:
- You propose a new public method (
OutputStream.write(ByteBuffer)
) as part of your solution. We need to discuss whether this is worth the effort to change all implementations (and to perceive acceptable performance, all custom authors need to optimize their custom classes, too). We also need to discuss whether we like the design choice that de facto the public API (not just the implementation) of the legacy IO domain (OutputStream
) is now linked to the NIO domain (ByteBuffer
) (which, IMHO, feels some kind of scary beyondChannelOutputStream
). - Just thinking loudly: I wonder if we could simplify the solution, or if we could turn parts of it into some generic
byte[] readOnly(byte[])
utility. - I would like to know how much faster the solution with
ByteBuffer
is compared toArrays.copyOfRange()
: Is it really so huge that it is worth the additional trouble? - I would like to know how much slower the solution with
ByteBuffer
is compared to skippingArrays.copyOfRange()
for trusted cases (as you removed the skipping). - I would like to know the performance of custom streams, because your default implementation is filling a temporary byte array with a Java-implemented byte-by-byte loop, which IMHO would be rather painful compared to hardware-supported
copyOrRange()
, and it does that even in case of trusted targets. - @briangoetz Wouldn't we rather teach the Java language some day to provide native read-only arrays? That would be much faster, much better to read and implies less complex code for the end user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkarg , thank you for the review, I replied in JIRA to avoid mixing comments across different issues: https://bugs.openjdk.org/browse/JDK-8321271?focusedId=14634794&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14634794
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While frozen arrays are on our radar, this is a significant lift, so unlikely to be coming all that soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to separate this PR from Vladimir's proposal, so we can merge this PR first, and if we later see that Vladimir's changes are accepted, we can merge them ontop. Agreed everybody?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for further optimizing transferTo()! If you fix what Alan proposed, this PR LGTM! 👍
byte[] buffer = Arrays.copyOfRange(getBufIfOpen(), pos, count); | ||
out.write(buffer); | ||
// trust all OutputStreams from java.io | ||
if (out.getClass().getPackageName() == BufferedInputStream.class.getPackageName()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we only want to trust java.io or anything starting with java.*?
If there is already test coverage of this area (I don't think there is), then the |
Did you review if all Java.* streams are safe? There are a few stream adapters in sun.nio.ch, which would benefit this optimization too, unfortunately they wrap the arrays with ByteBuffer.wrap, I guess that’s not safe, so the package can’t be allowed? |
@ecki I've checked the streams in |
i think modification is not the problem, the querstion is if they get exposed to user code. (but yes the readonly ByteBuffer wrapper looks like a good thing to use more). |
@ecki , what do you think of using read-only See #16879 (comment) It looks like there might be |
i am not completely sure if exposing buffers is a problem in terms of dirty data and if thats an issue with those wrappers. Well honestly it cant be anissue since we dont have untrusted code, but I understand future undertakings need to take this into account (insert SecurityManager rant here :) |
I think it is safe to poison and leak
Thoughts? |
I think I'm just reliving #10525 (comment) |
* | ||
* @return true if the argument of {@link #write(byte[])}} and {@link #write(byte[], int, int)}} needn't be copied | ||
*/ | ||
boolean trusted() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a strange construction. Any subclass could simply implement this as return true;
. Where is the guard against this, and why not doing it that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically speaking, OutputStream
is an abstract class
, so this declaration of boolean trusted()
is a package-protected method that will be visible and overridable only within JDK itself.
However, I agree it looks suspicious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on this method doesn't look right. The issue for BAIS is that its lifetime may be different to the byte[] that it wraps, think about use after the BAIS has been discarded. You don't want a sink keeping a reference to the byte[] even if it doesn't scribble on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkarg I guess the method can only be implemented by subclasses residing in the same package with OutputStream
, right?
@AlanBateman fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stsypanov Yes but still it is just weird to ask any output stream if it is trusted. I mean, it feels just unsecure to ask: "Do you pretend to be trusted?" instead of "Do we trust you?". I could sleep better if this method would not be part of each OutputStream subclass. We should either move it back to the place where needed, or into some static utility like OutputStreams::isTrusted(OutputStream)
(mind the plural!), or it should at least be final
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Deleted) the new version doesn’t have the issue (albeit now it’s rather complicated formulated) “If stream can be used by JCL callers without extra copies of the byte[] argument to write(). This is not over writeable by user code.”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike BAIS, the BufferedInputStream can wrap an untrusted InputStream. BIS.buf is passed directly to wrapped InputStream so I would assume that we would want to avoid exposing BIS.buf to the out
parameter of transferTo. This way we know the input stream is not able to poison the output stream when a write is in progress.
I assume that the current small list of trusted classes are also immune to poison so I imagine this patch is safe. However, for any FilterInputStream we should either always use Arrays.copyOfRange because the input side can poison the output side or it needs a mirroring allow list for the target input stream to insure that the wrapped input stream is not poisoning the out parameter.
For instance, java.util.zip.CheckedOutputStream in theory could be added as trusted class as it doesn't leak or poison but, looking at the code it would appear that it is not immune to poison.
if (!Arrays.equals(buf, bis.readAllBytes())) { | ||
throw new RuntimeException("Internal buffer was modified"); | ||
} | ||
var internalBuffer = bis.getClass().getDeclaredField("buf"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO you don't need to whitebox here, as the blackbox approach described by me earlier should be sufficient to assert the claim of this PR.
} | ||
var internalBuffer = bis.getClass().getDeclaredField("buf"); | ||
internalBuffer.setAccessible(true); | ||
if (!Arrays.equals(buf, Arrays.copyOf((byte[]) internalBuffer.get(bis), length))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not testing the absence of a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but how could I intercept the argument of OutputStream.write() and check it's identity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I wrote earlier today to Vladimir (see above) I actually do not see a solution for this, frankly spoken, as the interceptor would not be trusted. That's why I asked Brian for his ok to keep the test as-is. We need to wait for his judgement, as he is the reviewer, not me (I only try to give initial advice). My comment was more a hint for me to wait for Brian's judgement, less a request to you to fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bplb are you ok with the current implementation and its test?
* <li>does not leak a reference to the {@code byte[]} to non-trusted classes</li> | ||
* <li>does not modify the contents of the {@code byte[]}</li> | ||
* <li>{@code OutputStream.write(byte[], int, int)} write does not read the contents outside of the offset/length bounds</li> | ||
* </ul> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation change looks fine, just a typo at L671 where it says "write write", I think you can shorten this to say that the write method doesn't read the contents outside of the offset/length bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Can we somehow modify the test to make it white-box one? Maybe it's possible to measure memory allocation before and after method invocation in the way that we could use the difference as a proof of non-allocating invocation for trusted OutputStreams?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you mean "to make it a blackbox"? Actually I do not see how we could do that reliably, as I already wrote recently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
com.sun.management.ThreadMXBean#getCurrentThreadAllocatedBytes
is reliable, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to its JavaDocs it only returns an approximation, whatever that means. And it is measuring memory, while I already explained that the aim of this PR IMHO is not to spare memory, but to spare time. I wonder why we do not simply merge this PR as-is to get the actual benefits integrated ASAP instead of holding the train by heating up this discussion again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkarg apologies, of course I meant black box test.
/integrate |
@stsypanov This pull request has not yet been marked as ready for integration. |
@AlanBateman Kindly requesting your review. :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation changes looks fine, that's for taking all the feedback to get this one to a good place.
@bplb has been reviewing the test, I'll stay out of that and let Brian finish his review.
@stsypanov This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 21 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@AlanBateman, @bplb) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
I think that the test looks all right, but the copyright might need to be changed to |
/integrate |
@stsypanov |
/sponsor |
Going to push as commit 38042ad.
Your commit was automatically rebased without conflicts. |
@bplb @stsypanov Pushed as commit 38042ad. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
It looks like we can skip copying of
byte[]
inBufferedInputStream.implTransferTo()
forOutputStreams
residing injava.io
.See comment by @vlsi in https://github.com/openjdk/jdk/pull/10525/files#diff-e19c508d1bb6ee78697ecca66947c395adda0d9c49a85bf696e677ecbd977af1R612
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16879/head:pull/16879
$ git checkout pull/16879
Update a local copy of the PR:
$ git checkout pull/16879
$ git pull https://git.openjdk.org/jdk.git pull/16879/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16879
View PR using the GUI difftool:
$ git pr show -t 16879
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16879.diff
Webrev
Link to Webrev Comment