Why use something "close" to instead of just base 64 encoding diagram strings? #117
Hi, The bad news is that we cannot change this "legacy" encoding anymore because it's widely used by so many tools. The good news is that we are currently thinking about extending the encoding by adding a single character header in front of the URL. So here is our proposal about the new format: If you intend to use Deflate compression:
If you intend to use Brotli compression:
This way, the decoder could safely decode "legacy" encoding (because "legacy" never starts with "0" or "1") and regular Base64 encoding using the initial character header. Note that this is only a proposal and is not currently implemented. What do you think about it?
|
@arnaudroques Thanks for the reply. That sounds like all too familiar of a situation 😅. Definitely understand. Having the "permalinks" is a great feature and breaking backwards compat would definitely be bad with so much already sitting out there and tools generating that format already. I see that's already been added since I last pulled the repo :) 🚨 Doesn't the new logic break existing Brotli links though? Doesn't it change leading 0 from: Just based off a quick compare of what I have checked out from last week vs the new logic. Apologies for not being able to look more deeply at release history, the new encoder logic or any tests. At the office atm. :P |
Yes it does. Fortunately, Brotli encoding has never been officially released or documented, so there are no "permalinks" yet. We were just about to release it, so your suggestion of using Base64 arrives at just the right time :-) Yes, we have already committed the leading "0" / leading "1" option, but we realized that we did it too quickly: we figured out that there are some "legacy" Deflate encodings that do start with "1". So we are going to release yet another version, where the new headers will be different:
Does it sound good to you?
My instinct would be to change something else in the URI if for some reason there were not tricky gotchas with it or other constraints. If the only constraint was maintaining backwards compatibility, and objective to create a URI schema that is:
With all of that, this would be the scheme I would propose.
Examples:
If for some reason only the
Thoughts? @arnaudroques If this is interesting, even if only as a longer-term thing, I could start a PR over the weekend to see its feasibility. And yes, I took some inspiration from the bcrypt prefixes :) https://en.wikipedia.org/wiki/Bcrypt
|
Well, there are two different things, although highly related:
Text encoding is just a way of storing information. So your suggestions about a new URI schema are good, but they should be moved to https://github.com/plantuml/plantuml-server/issues because they are related to the HTTP server, not to the core library. The core library knows nothing about URI/URL. So I agree with your objective to create a Text Encoding that is (I've added some):
Inspiration from bcrypt prefixes is ok, but the $ character is not transfer-safe. We could turn it into '-' for example, so we may have: But this is not very compact. I prefer the following ones: And what about adding simple hex encoding? This is even simpler than Base64 to implement. In version 1.2018.5 we have implemented some stuff (see https://github.com/plantuml/plantuml/blob/master/src/net/sourceforge/plantuml/code/TranscoderSmart.java ) so that you can test and tell us if you like it. So back to URLs (despite what we have written :-), the following examples are now working: http://www.plantuml.com/plantuml/png/SyfFKj2rKt3CoKnELR1Io4ZDoSa70000 The last examples are not (yet) permanent: the discussion is still in progress :-)
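To illustrate why hex is simple to implement, here is a minimal sketch of hex-encoding a diagram's UTF-8 bytes, with a matching decoder. The class and method names are hypothetical, and no header character is added, since the exact header is not shown in this thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PlantUML: plain hex encoding of UTF-8 text.
final class HexDiagramEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Each byte becomes two hex digits (high nibble first).
    static String encode(String diagram) {
        byte[] bytes = diagram.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0x0F]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    // Reverse operation: every pair of hex digits becomes one byte.
    static String decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have even length");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The trade-off is size: hex doubles the byte count and applies no compression, so it is best suited to short diagrams or clients where adding a deflate dependency is undesirable.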
|
I've got an example of how other systems can impact the URLs. Quip will mangle pasted URLs where there is a matching pair of '_'. The url will now include ''. I've filed a bug on their side, but since I'm hosting my own rendering, I catch and convert the invalid URL.
|
One thing missing in the encoding issue and the plantuml documentation (making this a much bigger issue) is how the plantuml encoding differs from base64. The difference is actually not that big. Where in base64 the mapping array for values 0-63 is: for plantuml the array is: Going from one to the other is thus a single string mapping. |
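As a sketch of that single string mapping: the standard Base64 alphabet (RFC 4648) is `A-Z a-z 0-9 + /`, while the PlantUML alphabet, as I understand it from the published encoders, is `0-9 A-Z a-z - _`. The class below is hypothetical and only translates characters position by position. It does not handle `=` padding, which is not part of the PlantUML alphabet, so callers would need to strip or restore padding themselves.

```java
// Hypothetical helper, not part of PlantUML: translates between the two 64-character alphabets.
final class AlphabetMapper {
    static final String BASE64 =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    static final String PLANTUML =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_";

    // Replace each character by the one at the same index in the target alphabet.
    static String translate(String s, String from, String to) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            int idx = from.indexOf(s.charAt(i));
            if (idx < 0) {
                throw new IllegalArgumentException("unexpected character: " + s.charAt(i));
            }
            sb.append(to.charAt(idx));
        }
        return sb.toString();
    }

    static String base64ToPlantUml(String unpaddedBase64) {
        return translate(unpaddedBase64, BASE64, PLANTUML);
    }

    static String plantUmlToBase64(String plantUml) {
        return translate(plantUml, PLANTUML, BASE64);
    }
}
```

For example, the Base64 text `QUJD` (the bytes of `ABC`) maps to `GK93` in the PlantUML alphabet, and back again.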
The `base64` encoding can cause issues with certain diagram definitions, especially when they contain multiple `include` directives. To prevent additional dependencies from having to be added to the component, the `hex` encoding is the most straightforward to implement and unofficially supported by PlantUML since version 1.2018.5 (see plantuml/plantuml#117). A problematic example is

```
!include <azure/AzureCommon>
!include <azure/Containers/AzureContainerInstance>
!include <azure/Identity/AzureActiveDirectory>
!include <C4/C4_Container>
```
|
It has taken a very long time, but we have updated the documentation :-) Finally, we have chosen We are working on brotli compression right now (the header will probably be Does it sound good to you?
|
Hello! I'm the creator of kroki.io, a service that provides a unified API on top of popular diagrams libraries including PlantUML. The Kroki API is using deflate + base64 but I also support the "legacy" encoding using the following code:

```java
String text = URLDecoder.decode(source, "UTF-8");
try {
  Transcoder transcoder = TranscoderUtil.getDefaultTranscoder();
  text = transcoder.decode(text);
} catch (ArrayIndexOutOfBoundsException | IOException e) {
  // Unable to decode with the PlantUML decoder, try the default decoder
  text = DiagramSource.decode(text);
}
return text;
```

The above code is still working but

```
java.io.IOException: java.util.zip.DataFormatException: invalid stored block lengths
  at net.sourceforge.plantuml.code.CompressionZlib.tryDecompress(CompressionZlib.java:130)
  at net.sourceforge.plantuml.code.CompressionZlib.decompress(CompressionZlib.java:92)
  at net.sourceforge.plantuml.code.TranscoderImpl.decode(TranscoderImpl.java:83)
  at net.sourceforge.plantuml.code.TranscoderSmart.decode(TranscoderSmart.java:60)
  at io.kroki.server.decode.DiagramSource.unsafePlantumlDecode(DiagramSource.java:48)
  at io.kroki.server.decode.DiagramSource.plantumlDecode(DiagramSource.java:38)
  at io.kroki.server.service.Plantuml$1.decode(Plantuml.java:100)
  // stacktrace continues...
Cannot decode string
Not Huffman
Cannot decode string
```

As far as I know, PlantUML does not use a logging library or slf4j so I cannot suppress the errors/warnings. I guess another solution would be to change the order in my code:

```java
String text = URLDecoder.decode(source, "UTF-8");
try {
  // Try the default Kroki decoder
  text = DiagramSource.decode(text);
} catch (ArrayIndexOutOfBoundsException | IOException e) {
  // Unable to decode with the Kroki decoder, try the PlantUML decoder
  Transcoder transcoder = TranscoderUtil.getDefaultTranscoder();
  text = transcoder.decode(text);
}
return text;
```

Thanks for your help!
|
We have taken the easiest solution: we have removed the |
Sure, it sounds good, thanks Arnaud 👍 |
|
Note: I am replying here because this ticket is linked in the docs in relation to brotli... Is support for But the does not seem to be correct: I was experimenting with getting brotli compression working, but unsuccessful so far. Is there a working example somewhere? |


Hi there,
I was wondering if there was a particular reason that `AsciiEncoder` does not just use standard base 64 encoding. Are there important benefits over standard b64 encoding?
The use case where this is problematic is in trying to create a new client (particularly one that is not Java) that constructs properly formed URLs to send to a PlantUML server.
It means having to essentially copy, paste and translate the Java code (or the PHP or JS version available in the docs) into Python.
Isn't it just much easier to say the encoded format is:
That transformation can be implemented in almost every language using off the shelf components and not having to potentially re-implement encoding, add the entire jar, or make a subprocess call, just to generate the encoded string.
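As a rough sketch of what "off the shelf components" could look like, the example below compresses the diagram text with the JDK's `Deflater` and encodes the result with URL-safe Base64. This is an assumption about the kind of format being suggested, not PlantUML's actual encoding: the class name is hypothetical, the zlib-wrapped deflate format and the unpadded URL-safe alphabet are choices made for this illustration, and no header character is added.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: zlib-wrapped deflate followed by URL-safe Base64 (assumed format).
final class DeflateBase64 {
    static String encode(String diagram) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(diagram.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out.toByteArray());
    }

    static String decode(String encoded) {
        Inflater inflater = new Inflater();
        inflater.setInput(Base64.getUrlDecoder().decode(encoded));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Equivalent building blocks exist in most languages (for example `zlib` and `base64` in Python), which is the portability argument made above.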