Generalize xml helper code #839

DanRStevens · 2021-03-18T06:18:01Z

Related to Issue #797, and PR #838.

It's been quite a little while since I worked on this (about 6 months). At this point I don't quite remember why I stopped and left it. I think there was something I was still working on, though maybe I just got busy with other stuff. It's a conflicting approach to PR #838. At any rate, I thought I'd push what I had as a point of comparison:
https://github.com/lairworks/nas2d-core/compare/generalizeXmlHelperCode?expand=1

Maybe some ideas will be worth taking from there.

This allows parsing of alternate file formats, and returning of alternate parsed data structures.

It seems old versions of Clang are not able to match the auto return type parameter in this context. The stock install of Clang on Ubuntu 18.04 is one of those old versions of Clang.

Brett208 · 2021-03-19T02:40:22Z

What are your thoughts about using element values instead of attributes to store data in dictionary format? More like what I did with PlanetAttributes.xml instead of the current save format?

I am generally happy with the concept of storing data in dictionary form and planned to move more towards parsing in this fashion as discussed in the issues. I would just prefer it be in elements. Maybe best for me to switch to attributes though to be more consistent though...

DanRStevens · 2021-03-19T08:51:04Z

Curious question, under what conditions would you consider using attributes?

I would certainly use element values for recursive structures. Outside of that, I find the choice is a little more arbitrary.

DanRStevens · 2021-03-19T10:46:53Z

Found some interesting discussion points here:
https://stackoverflow.com/questions/1096797/should-i-use-elements-or-attributes-in-xml

Brett208 · 2021-03-21T23:14:07Z

@DanRStevens,

Sorry for the long time between replies. I read the stack overflow you pointed to, and that article is striking the heart of what I'm talking about, in addition to the post it refers to at (https://stackoverflow.com/questions/33746/xml-attribute-vs-xml-element).

It seems like there are 2 camps, OPHD sort of falls in the first camp that extensively uses attributes and I am used to the other camp. When writing XML, I avoid attributes almost exclusively unless it is metadata that does not belong in the actual structure of the parsed data.

Current OPHD Serialization:

<OutpostHD_SaveGame version="0.31">
    <mines>
        <mine x="14" y="85" depth="0" active="0" yield="0" flags="001111" />
    </mines>
</OutpostHD_SaveGame>

How I would have written the snippet:

<OutpostHD_SaveGame version="0.31">
    <mines>
        <mine>
            <x>14</x>
            <y>85</y>
            <depth>0</depth>
            <active>0</active>
            <yield>0</yield>
            <flags>001111</flags>
        </mine>
    </mines>
</OutpostHD_SaveGame>

How I would have written it is certainly more verbose. Probably a individual preference on which is easier to read.

I would just accept the OPHD style here of using attributes, but I think it breaks down for large text fields. For example:

<Planets>
	<Planet>
		<PlanetType>Mercury</PlanetType>
		<ImagePath>planets/planet_d.png</ImagePath>
		<Hostility>High</Hostility>
		<MaxDepth>1</MaxDepth>
		<MaxMines>10</MaxMines>
		<MapImagePath>maps/merc_01</MapImagePath>
		<TilesetPath>tsets/mercury.png</TilesetPath>
		<Name>Mercury Type</Name>
		<MeanSolarDistance>0.4</MeanSolarDistance>
		<Description>Small, hot and not very friendly. This planet resembles our very own Mercury. Close to its parent star, few potential mine locations, very shallow dig depth, no magnetic field and no atmosphere. &#xA; &#xA; The Payoff? The few mines you do get are higher yield and you get plenty of solar energy.</Description>
	</Planet>
</Planets>

Versus how I think OPHD currently would format it using attributes:

<Planets>
	<Planet PlanetType="Mercury" ImagePath="planets/planet_d.png" Hostility="High" MaxDepth="1" MaxMines="10" MapImagePath="maps/merc_01" TilesetPath="tsets/mercury.png Name="Mercury Type" MeanSolarDistance="0.4" Description="Small, hot and not very friendly. This planet resembles our very own Mercury. Close to its parent star, few potential mine locations, very shallow dig depth, no magnetic field and no atmosphere. &#xA; &#xA; The Payoff? The few mines you do get are higher yield and you get plenty of solar energy.">
</Planets>

(These fields probably should use the CDATA escapes. You may want to copy into notepadd++ or something similar to view. GitHub doesn't wrap the text in 'code' mode, and if I don't put it in code mode, it doesn't want to display it for probably good security reasons.)

In the PlanetAttributes case the significant amount of text involved in the attributes reduces readability whereas the attributes looked cleaner with more numerical fields in the mine representation in the saved game.

I would have probably just stuck with using attributes, but it seems incorrect in my mind to put large text blocks into attributes such as the PlanetDescription.

In the past, I have hand written my XML field parsing, which means that a solution could be tailored either direction as needed. One of the downsides of hand writing the parsing is it tends to couple tightly with the implementation (XML in this case). I understand you are advocating for a universal dictionary approach in the project which would remove the tight coupling. Unfortunately, the universal approach becomes more complex if you create edge conditions such as if the text being serialized is a paragraph, use a child element, else always use attributes.

Based on what I've read so far, I'm leaning towards rewriting the PlanetAttributes to use attributes for all stored data and more align with the current code base.

Admittedly, I have read through some of the source code, played with some examples, and made a cursory read of most the associated issues, and have not really 'deep dived' the issue.

Anyways, if this all sounds good, I would recommend merging this PR and I'll rewrite the PlanetAttributes to match this style. Then I can split PlanetAttributes to just parse a dictionary and push the parsing into NAS2D to remove all xml calls from PlanetAttributes.

Thanks for reading. -Brett

DanRStevens · 2021-03-22T09:45:21Z

I would agree that attributes are not ideal for large blocks of text, and the CDATA concern is valid.

In terms of using a consistent approach, that might only be relevant for the writing code. It would be possible to adapt the reading code so it could ingest data from either an attribute or a child element. In the case of planet data, it's static data packaged with the game, which will only ever be read.

Previously there was some dual format read code to handle a custom "options" section, which was formatted a little differently. When loading the config file, it would parse either format, but then only save in the one consistent format. It was eventually removed though, after a transition period.

I would be open to more dual format reading code. That would allow the planet data to be read as is, or perhaps with a mixed form where only the longer description field is an element.

DanRStevens · 2021-03-22T12:29:30Z

Ok, I've had a chance to re-read the code. I remember a couple of things I still wanted to work on:

The last commit changed the template parameters to parseXmlFileData, which required manually specifying the return type at the point of call. I wasn't super happy about it. It was needed to work around some compile problems in older versions of Clang, such as the one shipped with Ubuntu 18.04. Perhaps there's a way to write the method so the return type can be auto inferred, and won't cause compile issues with Clang. I never got around to figuring that out.
Using templates means the function definitions need to be in the header file, which required the XML include to also be in the header file. With older code it was possible to only include the XML header from the .cpp file for the parsing code. I wasn't super happy about moving the include to the header file, which has a cascade include effect on all including files. This pollutes the namespace of those translation units, and leads to (slightly) slower compiles.

As mildly frustrating as those flaws are, they're probably not worth holding back the code updates. Maybe they can be fixed in a future branch.

I'm going to go with Brett's recommendation to merge. Particularly since other fresh items have been brought up that may depends on these or similar changes.

DanRStevens added 7 commits March 18, 2021 00:03

Parameterize method used to parse root element

830494b

This allows parsing of alternate file formats, and returning of alternate parsed data structures.

Generalize saveData code

6c00497

Move generic XML parsing helpers to ParserHelper

d41adb4

Move generic XML formatting helpers to ParserHelper

025641a

Move top level XML parsing/formatting functions to ParserHelper

87783cf

Use more specific XML includes

c46b0b0

Add type parameter to avoid use of auto

901d34b

It seems old versions of Clang are not able to match the auto return type parameter in this context. The stock install of Clang on Ubuntu 18.04 is one of those old versions of Clang.

DanRStevens merged commit 721b39b into master Mar 22, 2021

DanRStevens deleted the generalizeXmlHelperCode branch March 22, 2021 12:29

DanRStevens mentioned this pull request Mar 22, 2021

Xml serializer #838

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generalize xml helper code #839

Generalize xml helper code #839

DanRStevens commented Mar 18, 2021

Brett208 commented Mar 19, 2021

DanRStevens commented Mar 19, 2021

DanRStevens commented Mar 19, 2021

Brett208 commented Mar 21, 2021

DanRStevens commented Mar 22, 2021

DanRStevens commented Mar 22, 2021

Generalize xml helper code #839

Generalize xml helper code #839

Conversation

DanRStevens commented Mar 18, 2021

Brett208 commented Mar 19, 2021

DanRStevens commented Mar 19, 2021

DanRStevens commented Mar 19, 2021

Brett208 commented Mar 21, 2021

DanRStevens commented Mar 22, 2021

DanRStevens commented Mar 22, 2021