Conversation
|
This bug still persists: https://gitter.im/red/codecs?at=61b346418853d316403de86f |
|
Gee, I thought it was fixed already, thanks, I’ll fix it again. |
|
Found the problem, it happened only with empty tags. |
|
This PR also needs tests. |
Of course, they will be added. |
|
@dockimbel we're still finalizing this, so don't merge just yet. |
|
Talking now about how best to handle options. |
|
Is there some design document or just a documentation on how the Red datatypes are mapped to XML elements? |
|
https://gitlab.com/rebolek/markup/-/blob/main/README.md docs the output formats. There is no schema type system beyond that (though I built one for Prolific in the past). All tags map to words, unless you say to use strings as keys, and all other data is just kept as strings from being parsed. |
| target: [] | ||
| target-stack: [] | ||
|
|
||
| set 'load-xml func [ |
There was a problem hiding this comment.
There is a general problem with how the codecs are implemented. They don't respect the codec API (load/save), but provide adhoc global functions that are circumventing the load/as and save/as ones instead of just been simple handy wrappers on top of them. Though, as this is a global implementation issue in most codecs, I would not veto this PR based on that. I'm just mentioning it here, so that people gets aware of that and of a future refactoring of most codecs.
There was a problem hiding this comment.
For codecs that need options, the load/save interface needs to change. /as doesn't give you enough control. That means adding something completely general like /with, which has been a deep discussion for this codec, and how it compares to using refinements. Then encode/decode in each codec need to support a general /with options refinement, whether they need them or not.
I'm open to all suggestions, since we may not add many more standard codecs, but users will write them and we want those to use a consistent approach.
There was a problem hiding this comment.
Today, the load-*/save-* funcs are for when you need more control. We could eliminate some options from the codecs, so they work just as the defaults wrapped by encode/decode (so the calls go load -> decode -> load-* today).
The tradeoff in removing features is more work for users. e.g. we can always output /pretty for JSON, and let users trim/lines if they want. I believe we added /ascii after research on compatibility, but we can also say "forget people stuck with that" and always use UTF-8. For XML, we can always emit one format and post-process to get the others if people want them. That eliminates /as . We can also always include metadata and let users remove or ignore it if it's there. That eliminates /meta. I think /str-keys was there because XML names can be invalid words. We can tell people who hit that problem that those aren't supported, or we can always load them as strings and post-process to words or deal with them as strings; neither of which is great. We can leave out /str-keys for now, and see who complains. Adding it later won't be a breaking change. For those issues, we can normalize the interface and push the problem around. Is that a win overall?
Then there's CSV. XML is a pain, but a hard spec. CSV is a very soft spec, with lots of variants in the wild. Wrapping those details in the codec save the user enormous pain and possible errors. If we limit the options, it becomes much less useful. So it's a great, concrete example to use when we ask "How should codecs with options be handled?"
There was a problem hiding this comment.
The load-* first appeared in @giesse's JSON codec and I followed that path with CSV and XML. We can get rid of it but they allow for setting options as @greggirwin mentioned. If we can think of a different solution that would work for all codecs generaly (e.g. setting JPEG quality etc.) then I’m for getting rid of load-*/save-*.
There was a problem hiding this comment.
Cross posting from Gitter:
We are a week away from month end, and my goal is to get XML out, along with Animation. We'll note in the blog post that there may be changes, and that we are reviewing the overall codec interface.
The original JSON interface is a legacy issue for us now. R2 didn't have codecs, and that code has been around in various forms for a very long time. Romano did the first version, I took over maintenance and kept it up with Douglas Crockford on JSON.org, then @giesse, @rebolek, and @hiiamboris contributed to improving it; but the original interface lives on. :^\
I don't want to pull anyone's focus right now, but also don't want to just hack in a different interface without thinking it through. So it will be a pretty big task. It will likely also inform other extension models or how a system catalog works, even for I/O protocols perhaps.
environment/codecs/XML.red
Outdated
|
|
||
| Char: charset [ | ||
| #"^(01)" -#"^(D7FF)" #"^(E000)" - #"^(FFFD)" #"^(10000)" - #"^(10FFFF)" | ||
| ] |
There was a problem hiding this comment.
This charset eats 136 KB of memory.
environment/codecs/XML.red
Outdated
| #"^(200C)" - #"^(200D)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" | ||
| #"^(3001)" - #"^(D7FF)" #"^(F900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)" | ||
| #"^(010000)" - #"^(0EFFFF)" | ||
| ] |
There was a problem hiding this comment.
This charset eats 120 KB of memory.
environment/codecs/XML.red
Outdated
|
|
||
| ; -- Character Range | ||
| Char: {[#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]} | ||
| RestrictedChar: {[#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]} |
There was a problem hiding this comment.
What's the point of those definitions as these words are redefined immediatly after?
There was a problem hiding this comment.
I'll hazard a guess that @rebolek meant to comment those out.
There was a problem hiding this comment.
My point about the charsets size is that such memory will be allocated on loading the XML codec, whether or not he user code calls the codec or not. We should be careful to not waste memory space unless needed. I suggest moving all the significant allocations to a
initfunction that is called the first time the codec API is invoked.Such
initcallback function should be part of the codec infrastructure and called automatically when needed without having each codec to implement the logic for that.
I’ll agree on the init function, I’ll update the codec.
environment/codecs/XML.red
Outdated
| NameChar: | ||
| name-char: union name-start-char charset [ | ||
| "-." #"0" - #"9" #"^(B7)" #"^(0300)" - #"^(036F)" #"^(203F)" - #"^(2040)" | ||
| ] |
There was a problem hiding this comment.
This charset eats 120 KB of memory.
|
My point about the charsets size is that such memory will be allocated on loading the XML codec, whether or not he user code calls the codec or not. We should be careful to not waste memory space unless needed. I suggest moving all the significant allocations to a Such |
|
As another global comment, I have always been in favor of a streaming approach for XML (and other formats like JSON and CSV) rather than static mappings to a predefined datastructure. XML for Red is just a serialization format to communicate with third-party software, it is not a native format for us. We should then treat it as that and not try to insert a potentially heavy datastructure between the raw XML data and the user's datastructures. The streaming approach (beyond all performance and resource usage benefits) allows the user to map directly the XML raw data to the final datastructure and is used in the industry since two decades. I haven't looked the the generative counterpart to streaming parser, but we can certainly come up with a good design for that part too. Again, I won't veto this PR based on that comment. I plan to implement in the future (in my "20% fun time with Red") at least one streaming parser to show how it can be done in Red efficiently. |
I agree with not wasting space and time, but are we better off making |
|
I'll be very anxious to see how streaming works, and I think @rebolek has done a SAX parser in the past. I will say that I think the static model is easy to reason about, and we should see what existing systems use streaming JSON or XML that we can test with. Formats designed for streaming are a different story. Formats not designed for streaming pose issues like "I did a whole bunch of work and then hit a malformed bit." Now I have to track state, make sure if I've taken any actions that need to be undone they can be undone with information I have, etc. OTOH, eventing systems and aggregation are going to be key components, so thinking about it is important. |
|
Messaging in general is key, and how you envelope things to pass them around, etc. It's also a wide world of needs and approaches. Everything from MQTT, fire and forget, to 0MQ brokering systems. |
@rebolek put the This is not a criticism of Bolek's code, but of how we address issues. We don't have modules today, but we have worse code to live with once we could easily leave XML out. Given this change (and I hate to see the time and effort on |
|
@greggirwin I can unpack the rule names but I agree that it’s much more fragile. Too bad we can’t extend objects as that would solve this problem and we didn’t have to have the words defined twice. |
|
You're overengineering and dramatizing IMO. Nenad's point was to only stash charsets into |
|
You need to list test file in |
|
@greggirwin everything should be ready for merging. |
|
Looks like compiler didn't swallow that :) |
Can you show me the error? I tried compiling it without a problem (after fixing a few lines). |
|
https://ci.appveyor.com/project/red/red/builds/42439531 |
|
I can't reproduce it either. |
|
Is AppVeyor back to being reliable? It had issues for a while. If you can both compile, and I will do that today as I haven't, that's good enough for me. @dockimbel I think we're ready. |
|
@rebolek please don't forget to doc how to separate metadata from the document. |
|
@greggirwin Added a function There is a little inconsistency in the |
|
On key-val using words instead of set-words, yes, please make that change. Good catch. |
|
@rebolek %xml-tools.red is missing a |
|
@greggirwin I must've deleted it by accident and it was still present in the console session. Thanks for finding it out! |
|
Check for leaks ;) |
|
@hiiamboris thanks! fixed. I guess there’s nothing more left to do for now. |
|
Ping @dockimbel. Ready for merging here. |
|
@dockimbel it's been a while now. Do you remember if this merge was reverted because we weren't sure about whether it should be standard or in red/code? Or because we still had questions about the options interface? Something else? |
|
I was looking at my draft blog post and realized XML wasn't there like I thought it was. |
|
I believe the reason was if it should be standard or in red/code. I’m fine putting it into red/code but then we probably should have some (even temporary until we have some packaging system) easy method to include code from red/code. |
|
Agreed, if it goes in red/code, we need a standard solution for loading it. Nenad may also have been concerned about the effect on compile times. But we could wrap it in |
|
No R/S in the code that I see at a glance. |
|
Right, there’s no R/S. |
No description provided.