Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MANIFEST.MF created by java plugin not always valid UTF-8 #5225

Closed
floscher opened this issue May 1, 2018 · 9 comments
Closed

MANIFEST.MF created by java plugin not always valid UTF-8 #5225

floscher opened this issue May 1, 2018 · 9 comments

Comments

@floscher
Copy link

floscher commented May 1, 2018

Expected Behavior

When creating a *.jar file with the java plugin and a custom manifest, the resulting manifest should be a plain text UTF-8 file.

Example MANIFEST.MF (how I'd expect it)

Current Behavior

If the manifest attributes contain characters that are represented as multiple bytes, these characters can be cut in two by line breaks. The result is a file that is not a valid UTF-8 text file.

Example MANIFEST.MF (how it is currently produced by the jar task)

Context

The JAR File specification rules that each line must be at most 72 bytes long (it does not say if a line that continues in the next lines is allowed to be shorter).

Gradle uses the class java.util.jar.Manifest to write the manifest. This class inserts a newline for overlong lines at exactly 72 bytes, even if that's in the middle of a character.

I reported this issue already to Oracle, but maybe you can fix this by using an alternative for creating the MANIFEST.MF file, especially because there are also other issues with java.util.jar.Manifest, like #2295.

Steps to Reproduce

Add custom manifest attributes to the jar task. The attribute should be long (so it will end up on multiple lines) and contain multi-byte characters (optimally use only multibyte characters as value).

Then inspect the MANIFEST.MF in the *.jar file that is produced by Gradle.

I created an example project demonstrating this issue: https://github.com/floscher/gradle-manifest-multibyte-demo

Related

@marcphilipp
Copy link
Contributor

Thanks for the bug report! Are there other implementations for writing the manifest that do not have this flaw?

@floscher
Copy link
Author

floscher commented May 2, 2018

@marcphilipp I'm currently not aware of one, but haven't really looked into other implementations either.

@floscher
Copy link
Author

floscher commented May 2, 2018

This would be a fix for the implementation in the JDK (based on commit ea246151be08 in the OpenJDK mercurial repo):

Subject: [PATCH] 8202525: Fix make72Safe() to never split UTF-8 characters

---
 src/java.base/share/classes/java/util/jar/Manifest.java | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/java.base/share/classes/java/util/jar/Manifest.java b/src/java.base/share/classes/java/util/jar/Manifest.java
index 8cc53d6..0654d1d 100644
--- a/src/java.base/share/classes/java/util/jar/Manifest.java
+++ b/src/java.base/share/classes/java/util/jar/Manifest.java
@@ -168,14 +168,19 @@ public class Manifest implements Cloneable {
   /**
    * Adds line breaks to enforce a maximum 72 bytes per line.
    */
   static void make72Safe(StringBuffer line) {
     int length = line.length();
     int index = 72;
     while (index < length) {
+      // Decrement index until it points at the first byte of a UTF-8 encoded character
+      final int minIndex = index - 3;
+      while ((line.charAt(index) & 0xC0) == 0x80 && index > minIndex) {
+        index--;
+      }
       line.insert(index, "\r\n ");
       index += 74; // + line width + line break ("\r\n")
       length += 3; // + line break ("\r\n") and space
     }
     return;
   }
 

Edit: I added the patch to my bugreport, let's see if it makes its way through…

@floscher
Copy link
Author

floscher commented May 3, 2018

Marked as won't fix in the OpenJDK issue tracker, because they say it conforms to the spec (my proposal would do that also and make MANIFEST.MF always a valid text file).

@eskatos
Copy link
Member

eskatos commented May 4, 2018

Please also note that the Gradle implementation, just like all other implementations out there, has to properly consume such valid (but surprising) manifest files as they are produced like that by the default implementation.

One thing that could be considered is to always produce manifest files that do not split multi-byte characters over lines. But as I understand it, it would be more about manifest files being easily read by humans than for better interoperability.

Again, all implementations out there consuming manifest files shouldn't choke on split multi-bytes characters. Moreover, "fixing" the default implementation won't change the state of affairs for already published manifest files.

@floscher
Copy link
Author

floscher commented May 4, 2018

@eskatos Absolutely, my intention is for MANIFEST.MF files that will be created in the future to be valid text files, not binary files. That's what the patch above would have changed.

Any implementation reading manifest files must be able to read files with either variant (split or not split multibyte characters), sure. Both are allowed by the specification.

@floscher
Copy link
Author

floscher commented May 4, 2018

I just re-read the specification and found that it does really only allow full UTF-8 characters:

header:          name : value
name:            alphanum *headerchar
value:           SPACE *otherchar newline *continuation
continuation:    SPACE *otherchar newline
alphanum:        {A-Z} | {a-z} | {0-9}
headerchar:      alphanum | - | _
otherchar:       any UTF-8 character except NUL, CR and LF

So a value for an attribute consists of a space, then zero or more UTF8-characters (except NUL, CR and LF) and then a newline.

I'll try to post that also to the OpenJDK bugtracker, but they didn't publish the last comment I sent them, so this one might also not show up.

@vlsi vlsi mentioned this issue Sep 15, 2019
16 tasks
vlsi added a commit to vlsi/gradle that referenced this issue Oct 27, 2019
* Print entries in the proper order (Java8 uses HashMap)
* Do not split surrogate pair characters when wrapping manifest lines

fixes gradle#2295, gradle#5225

Signed-off-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
@ljacomet ljacomet added the @jvm label Feb 18, 2020
@stale
Copy link

stale bot commented Feb 17, 2021

This issue has been automatically marked as stale because it has not had recent activity. Given the limited bandwidth of the team, it will be automatically closed if no further activity occurs. If you're interested in how we try to keep the backlog in a healthy state, please read our blog post on how we refine our backlog. If you feel this is something you could contribute, please have a look at our Contributor Guide. Thank you for your contribution.

@stale stale bot added the stale label Feb 17, 2021
@stale
Copy link

stale bot commented Mar 11, 2021

This issue has been automatically closed due to inactivity. If you can reproduce this on a recent version of Gradle or if you have a good use case for this feature, please feel free to reopen the issue with steps to reproduce, a quick explanation of your use case or a high-quality pull request.

@stale stale bot closed this as completed Mar 11, 2021
floscher added a commit to floscher/gradle-josm-plugin that referenced this issue Feb 6, 2022
@wolfs wolfs closed this as not planned Won't fix, can't repro, duplicate, stale Sep 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants