MANIFEST.MF created by java plugin not always valid UTF-8 #5225
Comments
Thanks for the bug report! Are there other implementations for writing the manifest that do not have this flaw?
@marcphilipp I'm currently not aware of one, but haven't really looked into other implementations either.
This would be a fix for the implementation in the JDK (based on commit
Subject: [PATCH] 8202525: Fix make72Safe() to never split UTF-8 characters
---
 src/java.base/share/classes/java/util/jar/Manifest.java | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/java.base/share/classes/java/util/jar/Manifest.java b/src/java.base/share/classes/java/util/jar/Manifest.java
index 8cc53d6..0654d1d 100644
--- a/src/java.base/share/classes/java/util/jar/Manifest.java
+++ b/src/java.base/share/classes/java/util/jar/Manifest.java
@@ -168,14 +168,19 @@ public class Manifest implements Cloneable {
     /**
      * Adds line breaks to enforce a maximum 72 bytes per line.
      */
     static void make72Safe(StringBuffer line) {
         int length = line.length();
         int index = 72;
         while (index < length) {
+            // Decrement index until it points at the first byte of a UTF-8 encoded character
+            final int minIndex = index - 3;
+            while ((line.charAt(index) & 0xC0) == 0x80 && index > minIndex) {
+                index--;
+            }
             line.insert(index, "\r\n ");
             index += 74; // + line width + line break ("\r\n")
             length += 3; // + line break ("\r\n") and space
         }
         return;
     }
Edit: I added the patch to my bug report, let's see if it makes its way through…
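The boundary test in the patch relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so a candidate split position can be moved back until it no longer points at such a byte. Below is a minimal standalone sketch of that idea on a plain byte array. The class name Utf8Boundary and the method names are hypothetical and not part of the JDK or of the patch; the sketch also adds a bounds check that the patch does not need.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the continuation-byte test from the patch.
class Utf8Boundary {
    /** True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
    static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /**
     * Moves index back by at most 3 positions until it points at the first
     * byte of an encoded character (or at the end of the array), so that
     * splitting before index never cuts a character in two.
     */
    static int backUpToCharStart(byte[] bytes, int index) {
        int minIndex = Math.max(0, index - 3);
        while (index > minIndex && index < bytes.length && isContinuationByte(bytes[index])) {
            index--;
        }
        return index;
    }

    public static void main(String[] args) {
        // "a" (1 byte), U+00E4 (2 bytes), U+20AC (3 bytes)
        byte[] bytes = "a\u00E4\u20AC".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i <= bytes.length; i++) {
            System.out.println(i + " -> " + backUpToCharStart(bytes, i));
        }
    }
}
```

Backing up by at most three bytes is enough because a UTF-8 encoded character is at most four bytes long.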
Marked as won't fix in the OpenJDK issue tracker, because they say it conforms to the spec (my proposal would also conform and would make MANIFEST.MF always a valid text file).
Please also note that the Gradle implementation, just like all other implementations out there, has to properly consume such valid (but surprising) manifest files, as they are produced like that by the default implementation. One thing that could be considered is to always produce manifest files that do not split multi-byte characters over lines. But as I understand it, that would be more about manifest files being easily read by humans than about better interoperability. Again, no implementation consuming manifest files should choke on split multi-byte characters. Moreover, "fixing" the default implementation won't change the state of affairs for already published manifest files.
@eskatos Absolutely, my intention is for
Any implementation reading manifest files must be able to read files with either variant (split or not split multi-byte characters), sure. Both are allowed by the specification.
I just re-read the specification and found that it really does only allow full UTF-8 characters:
So a value for an attribute consists of a space, then zero or more UTF-8 characters (except NUL, CR and LF) and then a newline. I'll try to post that to the OpenJDK bug tracker as well, but they didn't publish the last comment I sent them, so this one might not show up either.
* Print entries in the proper order (Java8 uses HashMap)
* Do not split surrogate pair characters when wrapping manifest lines

fixes gradle#2295, gradle#5225

Signed-off-by: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
This issue has been automatically marked as stale because it has not had recent activity. Given the limited bandwidth of the team, it will be automatically closed if no further activity occurs. If you're interested in how we try to keep the backlog in a healthy state, please read our blog post on how we refine our backlog. If you feel this is something you could contribute, please have a look at our Contributor Guide. Thank you for your contribution.
This issue has been automatically closed due to inactivity. If you can reproduce this on a recent version of Gradle or if you have a good use case for this feature, please feel free to reopen the issue with steps to reproduce, a quick explanation of your use case or a high-quality pull request.
…e always valid UTF-8 This was reported in gradle/gradle#5225
Expected Behavior
When creating a *.jar file with the java plugin and a custom manifest, the resulting manifest should be a plain text UTF-8 file.
Example MANIFEST.MF (how I'd expect it)
Current Behavior
If the manifest attributes contain characters that are represented as multiple bytes, these characters can be cut in two by line breaks. The result is a file that is not a valid UTF-8 text file.
Example MANIFEST.MF (how it is currently produced by the jar task)
Context
The JAR File Specification requires that no line be longer than 72 bytes in its UTF-8 encoded form (it does not say whether a line that is continued on the following lines may be shorter).
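Since the limit is only an upper bound, a writer could break lines slightly earlier so that no character is split. The following is a rough sketch of such a wrapper that walks the line by code point and counts UTF-8 bytes. The class ManifestLineWrapper and its methods are hypothetical names used for illustration and are not part of Gradle or the JDK; the sketch assumes the 72 bytes are counted without the line terminator.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: wrap a manifest line without splitting characters.
class ManifestLineWrapper {
    static final int MAX_LINE_BYTES = 72; // assumption: terminator not counted

    /** Number of bytes a code point occupies when encoded as UTF-8. */
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    /**
     * Splits a logical "Name: value" line into physical lines of at most
     * 72 bytes each. Continuation lines start with a single space.
     */
    static List<String> wrap(String logicalLine) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (int i = 0; i < logicalLine.length(); ) {
            int cp = logicalLine.codePointAt(i);
            int cpBytes = utf8Length(cp);
            if (currentBytes + cpBytes > MAX_LINE_BYTES) {
                // The next character does not fit: start a continuation line.
                lines.add(current.toString());
                current = new StringBuilder(" ");
                currentBytes = 1;
            }
            current.appendCodePoint(cp);
            currentBytes += cpBytes;
            i += Character.charCount(cp); // 2 for surrogate pairs
        }
        lines.add(current.toString());
        return lines;
    }

    public static void main(String[] args) {
        StringBuilder line = new StringBuilder("Test-Attribute: ");
        for (int i = 0; i < 60; i++) {
            line.append('\u20AC'); // 3 bytes each in UTF-8
        }
        for (String physical : wrap(line.toString())) {
            int bytes = physical.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(bytes + " bytes: " + physical);
        }
    }
}
```

Iterating by code point rather than by char also keeps surrogate pairs together, which matters for characters outside the Basic Multilingual Plane.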
Gradle uses the class java.util.jar.Manifest to write the manifest. This class inserts a newline for overlong lines at exactly 72 bytes, even if that's in the middle of a character.
I reported this issue already to Oracle, but maybe you can fix this by using an alternative for creating the MANIFEST.MF file, especially because there are also other issues with java.util.jar.Manifest, like #2295.
Steps to Reproduce
Add custom manifest attributes to the jar task. The attribute should be long (so it will end up on multiple lines) and contain multi-byte characters (optimally use only multi-byte characters as value).
Then inspect the MANIFEST.MF in the *.jar file that is produced by Gradle.
I created an example project demonstrating this issue: https://github.com/floscher/gradle-manifest-multibyte-demo
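The same behavior can probably also be observed without Gradle by writing a manifest directly with java.util.jar.Manifest and decoding the bytes strictly as UTF-8. This is only a sketch: the class ManifestUtf8Check, its method names, and the attribute name Test-Attribute are made up for illustration, and the result may depend on the JDK version in use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical reproduction helper, not part of Gradle or the JDK.
class ManifestUtf8Check {
    /** Strictly decodes the bytes; returns false on any malformed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Writes a manifest with one custom main attribute and returns the raw bytes. */
    static byte[] writeManifest(String attributeName, String value) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Without Manifest-Version the main attributes are not written.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue(attributeName, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Parses the manifest bytes again and returns the named main attribute. */
    static String readAttribute(byte[] manifestBytes, String name) {
        try {
            Manifest parsed = new Manifest(new ByteArrayInputStream(manifestBytes));
            return parsed.getMainAttributes().getValue(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder value = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            value.append('\u00E4'); // 2 bytes each in UTF-8
        }
        byte[] bytes = writeManifest("Test-Attribute", value.toString());
        // Expected to report false on JDKs that wrap at a fixed byte offset.
        System.out.println("valid UTF-8 text file: " + isValidUtf8(bytes));
        System.out.println("round trip intact: "
                + value.toString().equals(readAttribute(bytes, "Test-Attribute")));
    }
}
```

The round-trip check illustrates the point made in the comments: a manifest with split characters still parses correctly, even when the file as a whole is not valid UTF-8 text.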
Related