Fix #3796, #3786: Implement UTF-8 support in java.util.zip classes #3814
Fix #3796 Fix #3786 Fix #937
javalib java.util.zip classes now support writing and reading UTF-8 ("Unicode Transformation Format – 8-bit") entry names and archive and entry comments.
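As a rough illustration of what this enables, here is a minimal round-trip sketch written against the standard java.util.zip API as it exists on the JVM. The class name `Utf8ZipRoundTrip`, its helper methods, and the sample names are hypothetical and are not taken from this PR's code or tests.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
final class Utf8ZipRoundTrip {

    // Writes a one-entry archive into memory, encoding names and comments as UTF-8.
    static byte[] writeArchive(String entryName, String payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zos =
                     new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
                // "archive comment: h\u00e9llo" contains a 2-byte UTF-8 character.
                zos.setComment("archive comment: h\u00e9llo");
                ZipEntry entry = new ZipEntry(entryName);
                // The entry comment contains 3-byte UTF-8 characters.
                entry.setComment("entry comment: \u65e5\u672c\u8a9e");
                zos.putNextEntry(entry);
                zos.write(payload.getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the archive back and returns the name of its first entry, or null if empty.
    static String firstEntryName(byte[] archive) {
        try (ZipInputStream zis = new ZipInputStream(
                 new ByteArrayInputStream(archive), StandardCharsets.UTF_8)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A name mixing 2-byte and 3-byte UTF-8 characters.
        String name = "donn\u00e9es/\u65e5\u672c\u8a9e.txt";
        byte[] archive = writeArchive(name, "hello");
        System.out.println(name.equals(firstEntryName(archive)));
    }
}
```

Archive and entry comments are stored in the central directory at the end of the file, so a streaming reader such as ZipInputStream does not expose them; checking those would need java.util.zip.ZipFile and an archive on disk.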
java.util.zip.ZipOutputStream now follows the JVM practice of not throwing an Exception if zero entries are written. The former behavior was sensible, but not the JVM way.
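The JVM behavior being matched can be sketched as follows. This is an illustration against the standard API, not code from this PR; the class name `EmptyZip` and its method are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
final class EmptyZip {

    // Opens and closes a ZipOutputStream without ever calling putNextEntry.
    static byte[] writeEmptyArchive() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
                // Intentionally write no entries.
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] archive = writeEmptyArchive();
        // On a current JVM this completes without an exception; the output
        // holds only the end-of-central-directory record.
        System.out.println("bytes written: " + archive.length);
    }
}
```

On a current JVM the close succeeds and the resulting bytes begin with the end-of-central-directory signature `PK\005\006`.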
Both now use standard java.lang.String methods to do Charset conversions. In particular, this should now handle 4-byte UTF-8 code points.
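To make the 4-byte case concrete, here is a small sketch using the standard `String.getBytes(Charset)` method and the `String(byte[], Charset)` constructor. It shows the kind of conversion described and is not the PR's actual code; the class name `FourByteUtf8` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class FourByteUtf8 {

    // Encodes a string to UTF-8 bytes using the standard String method.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back to a string using the standard String constructor.
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane: it is a surrogate
        // pair in a Java String and four bytes in UTF-8.
        String emoji = new String(Character.toChars(0x1F600));
        byte[] utf8 = encode(emoji);
        System.out.println(utf8.length);                 // prints 4
        System.out.println(emoji.equals(decode(utf8)));  // prints true
    }
}
```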
Scala Native java.util.zip methods are still limited to the original zip format. No zip64. Java 8 and above support zip64, so there is room for improvement here.
It should be noted that .zip files have a lot of time-honored complexity, both at the zip level and, especially,
with .zip files written on one operating system being readable on another. The support implemented
in this PR is designed to match the JVM behavior. A file written by the JVM ought to be readable by this code,
and so on for the various 2x2 combinations of writer and reader.
Differing operating systems may or may not be able to display the UTF-8 file names and comments written by
this PR. The intention is that extracting files from archives created by the code of this PR should succeed
and have the expected UTF-8 names, to the greatest extent feasible.
TL;DR - When using UTF-8 names, if it works for you, great! If not, sorry. It may be a Scala Native bug
or it may be the joys of zip and interoperability. The goal is to reduce the former without
diminishing the latter.
Scala Native java.lang currently supports Unicode 13.0. The September 2023 Unicode version is 15.1. The very latest emojis, etc. may not be available. That is an open question.