Fix #3796, #3786: Implement UTF-8 support in java.util.zip classes #3814

Merged

Contributor

@LeeTibbert LeeTibbert commented Mar 6, 2024

Fix #3796 Fix #3786 Fix #937

  • javalib java.util.zip classes now support writing and reading UTF-8 ("Unicode Transformation Format – 8-bit")
    entry names and archive and entry comments.

  • java.util.zip.ZipOutputStream now follows the JVM practice of not throwing an Exception if zero entries
    are written. The former behavior was sensible, but not the JVM way.

  • Both now use standard java.lang.String methods for Charset conversions. In particular, they
    should now handle code points that take 4 bytes in UTF-8.

  • Scala Native java.util.zip methods are still limited to the original zip format; there is no zip64 support.
    Java 8 and above support zip64, so there is room for improvement here.
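
To make the behaviors listed above concrete, here is a minimal sketch of a round trip through the standard java.util.zip streams. It is written against the JVM API that this PR aims to match; the class name `ZipUtf8RoundTrip`, its helper methods, and the sample entry names are hypothetical and are not taken from the PR's own tests. Non-ASCII characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of the PR.
final class ZipUtf8RoundTrip {

    // Writes one entry per name into an in-memory archive, using UTF-8
    // for entry names and for the archive and entry comments.
    static byte[] writeArchive(List<String> names) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(sink, StandardCharsets.UTF_8)) {
            zos.setComment("archive comment \u03ba\u03b1\u03bb\u03b7"); // Greek letters
            for (String name : names) {
                ZipEntry entry = new ZipEntry(name);
                entry.setComment("entry comment \u65e5\u672c\u8a9e"); // CJK characters
                zos.putNextEntry(entry);
                zos.write(name.getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } // close() finishes the archive, even when no entry was written
        return sink.toByteArray();
    }

    // Reads the entry names back in archive order.
    static List<String> readNames(byte[] archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(archive), StandardCharsets.UTF_8)) {
            for (ZipEntry e = zis.getNextEntry(); e != null; e = zis.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        List<String> names = Arrays.asList(
            "plain.txt",
            "\u00e9t\u00e9.txt",     // 2-byte UTF-8 sequences
            "\uD83D\uDE00.txt");     // U+1F600, a 4-byte UTF-8 sequence
        System.out.println(readNames(writeArchive(names)).equals(names));

        // Zero entries: no exception is expected on a current JVM.
        byte[] empty = writeArchive(Collections.<String>emptyList());
        System.out.println(readNames(empty).isEmpty());
    }
}
```

Note that ZipInputStream only sees the local headers, so the comments written here are not read back by this sketch; checking them would need java.util.zip.ZipFile and a real file.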

It should be noted that .zip files carry a lot of time-honored complexity, both in the zip format itself and, especially,
in making .zip files written on one operating system readable on another. The support implemented
in this PR is designed to match the JVM behavior. A file written by the JVM ought to be readable by this code,
and so on for the other combinations in the 2x2 matrix of writers and readers.

Differing operating systems may or may not be able to display the UTF-8 file names and comments written by
this PR. The intention is that extracting files from archives created by the code of this PR should succeed
and have the expected UTF-8 names, to the greatest extent feasible.
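
When a name does not display as expected in another tool, one thing worth checking is whether the archive declares its names as UTF-8. The ZIP specification reserves bit 11 of the general purpose bit flag (the "language encoding flag") for that purpose, and the JVM's ZipOutputStream sets it when its charset is UTF-8. The sketch below is a diagnostic under those assumptions: the class `ZipUtf8FlagProbe` and its methods are hypothetical, it only inspects the first local file header, and this PR text does not itself say how the flag is handled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical diagnostic class; not part of the PR.
final class ZipUtf8FlagProbe {
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50; // "PK\3\4", little-endian
    static final int UTF8_NAME_FLAG = 1 << 11;            // language encoding flag

    // Reads an unsigned little-endian 16-bit value.
    static int le16(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    // Reads a little-endian 32-bit value.
    static int le32(byte[] b, int off) {
        return le16(b, off) | (le16(b, off + 2) << 16);
    }

    // True if the first local file header marks its name and comment as UTF-8.
    // Layout: signature (4 bytes), version needed (2), general purpose flag (2).
    static boolean firstEntryDeclaresUtf8(byte[] archive) {
        if (archive.length < 8 || le32(archive, 0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        return (le16(archive, 6) & UTF8_NAME_FLAG) != 0;
    }

    // Builds a one-entry archive in memory with a non-ASCII entry name.
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(sink, StandardCharsets.UTF_8)) {
            zos.putNextEntry(new ZipEntry("na\u00efve.txt"));
            zos.write(1);
            zos.closeEntry();
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstEntryDeclaresUtf8(sampleArchive()));
    }
}
```

An archive without the flag is not necessarily wrong, since older tools often wrote names in a platform code page, but it is a common reason for names that look garbled after crossing operating systems.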

TL;DR: When using UTF-8 names, if it works for you, great! If not, sorry. It may be a Scala Native bug
or it may be the joys of zip and interoperability. The goal is to reduce the former without
diminishing the latter.

Scala Native java.lang currently supports Unicode 13.0. The September 2023 Unicode version is 15.1.
The very latest emojis, etc., may not be available. That is an open question.
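
One way to explore that question is to separate two things: whether the runtime's character database knows a code point, and whether the code point survives UTF-8 encoding and decoding. The sketch below probes both with standard java.lang calls. The class `CodePointProbe` and the sample code points are hypothetical choices; the `Character.isDefined` result depends on the Unicode version of the runtime in use, so no particular value is claimed for it here, and the behavior on Scala Native is an assumption to verify rather than a stated result.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical probing class; not part of the PR.
final class CodePointProbe {

    // Number of bytes the code point occupies when encoded as UTF-8.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if encoding to UTF-8 and decoding again yields the same string.
    static boolean roundTrips(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).equals(s);
    }

    // One-line summary; "defined" reflects the runtime's Unicode tables.
    static String describe(int codePoint) {
        return String.format("U+%04X defined=%b utf8Bytes=%d roundTrips=%b",
            codePoint, Character.isDefined(codePoint),
            utf8Length(codePoint), roundTrips(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x1F600)); // emoji from an older Unicode version
        System.out.println(describe(0x1FAE8)); // emoji added in Unicode 15.0
    }
}
```

On the JVM, the encoding step does not consult the character database, so a recent emoji can round-trip as an entry name even when `Character.isDefined` reports it as unknown.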

@LeeTibbert LeeTibbert changed the title Fix #3798, #3786: Implement UTF-8 support in java.util.zip classes Fix #3796, #3786: Implement UTF-8 support in java.util.zip classes Mar 6, 2024
@LeeTibbert
Contributor Author

Ready for review, when its turn comes around. Thank you.

The two failures are in macOS JVM compliance. One is a "signal 4". The other
is an error in "pipedOutput". Gratifying but strange that all the straight
macOS tests pass just fine. Go figure. Is there something different
in the environment of those two sets of tests?

Contributor

@WojciechMazur WojciechMazur left a comment

LGTM, thank you!

@WojciechMazur WojciechMazur merged commit 3c5c8d4 into scala-native:main Mar 7, 2024
61 checks passed