Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse option ArrayMode not working #68

Closed
7 tasks done
bb opened this issue Mar 29, 2018 · 8 comments
Closed
7 tasks done

parse option ArrayMode not working #68

bb opened this issue Mar 29, 2018 · 8 comments

Comments

@bb
Copy link
Contributor

bb commented Mar 29, 2018

Checklist

Please fill below checklist

  • Are you running the latest version? 3.3.6
  • Have you include sample input?
  • Have you include actual output?
  • Have you include expected output?
  • Have you checked if you are using correct configuration?
  • Did you try online tool?
  • Did you bookmarked the repository for further updates? ;)

Input data

<x><y><z>z1</z><z>z2</z></y><w>w1</w></x>

Code

import * as FXP from "fast-xml-parser";
import { parseString } from "xml2js";

const xml = "<x><y><z>z1</z><z>z2</z></y><w>w1</w></x>";
parseString(xml, { explicitArray: false }, (err, res) => console.log("xml2js      ", JSON.stringify(res)));
parseString(xml, { explicitArray: true }, (err, res) => console.log("xml2js array", JSON.stringify(res)));
console.log("FXP         ", JSON.stringify(FXP.parse(xml)));
console.log("FXP array   ", JSON.stringify(FXP.parse(xml, { arrayMode: true })));

Output data

xml2js       {"x":{"y":{"z":["z1","z2"]},"w":"w1"}}
xml2js array {"x":{"y":[{"z":["z1","z2"]}],"w":["w1"]}}
FXP          {"x":{"y":{"z":["z1","z2"]},"w":"w1"}}
FXP array    {"x":{"y":{"z":["z1","z2"]},"w":"w1"}}

expected data

xml2js       {"x":{"y":{"z":["z1","z2"]},"w":"w1"}}
xml2js array {"x":{"y":[{"z":["z1","z2"]}],"w":["w1"]}}
FXP          {"x":{"y":{"z":["z1","z2"]},"w":"w1"}}
FXP array    {"x":{"y":[{"z":["z1","z2"]}],"w":["w1"]}}
@bb
Copy link
Contributor Author

bb commented Mar 29, 2018

BTW, I didn't see arrayMode in the docs, but in the source code, e.g https://github.com/NaturalIntelligence/fast-xml-parser/blob/master/src/x2j.js#L22

If I may add a feature request here: It would be great to have the arrayMode configurable, e.g. per tag name:

parse(xml, {forceArrayTags: ["w1"]}) // expected: {"x":{"y":{"z":["z1","z2"]},"w":["w1"]}}

Instead of a tag list as config value, one might consider a arrayModeDecider like this:

parse(xml, {arrayMode: a => a.match(/^w$/) }) // expected: {"x":{"y":{"z":["z1","z2"]},"w":["w1"]}}

Personally, I currently only need tag names, so any of those approaches would work for me. But maybe there are other use cases which would require other callback parameters.

@amitguptagwl
Copy link
Member

amitguptagwl commented Mar 30, 2018

Array mode was previously added by some PR. But since it was having some issues and the original requirement/need was not cleared to me, I decided to remove it from the docs at the time of massive refactoring to support big XML files. So the existing project who are using FXP will not face any issue and new users will not use it until it is properly implemented and tested.

So I would be happy if you raise a PR having separate test file testing all the expectations.

But before that what we need to understand is the need of array mode. Why and in what scenario it can help along with expectations for different type of inputs.

Thanks for raising this issue.

@bb
Copy link
Contributor Author

bb commented Mar 30, 2018

My use case is quite simple: I have elements which sometimes have one child of a name, sometimes multiple. Here's a very simple example:

<root>
    <item id="a">
        <sub>1</sub>
        <sub>2</sub>
    </item>
    <item id="b">
        <sub>3</sub>
    </item>
</root>
{
	"root": {
		"item": [{
			"sub": [1, 2]
		}, {
			"sub": 3
		}]
	}
}

I'd like to be able to always treat the value of "sub" as an array, because in most of my cases it is. But if there's only one value, it is not. Sometimes, I forgot checking the single child case and run into bugs.

Without the option arrayMode, I need to check, like this:

obj.root.item.map(item => !Array.isArray(item.sub) ? [item.sub] : item.sub)

instead of this version which I'd use with option arrayMode.

obj.root.item.map(item => item.sub)

I usually do not want all elements to be arrays, so instead of a global explicit boolean, I'd prefer a version which is based on tag names or xpath or a shouldEnforceArraysForSingleItemsProcessor as given in my last example.
In xml2js, I handled my parsing of those cases not using its explicitArray option but using a validator:

explicitArray: false,
validator: (xpath: string, currentValue: string, newValue: string): string | string[] => {
    if (!currentValue && xpath.split("/").pop() === "sub") {
        return [newValue];
    }
    return newValue;
}

It's (minimally) documented here: https://github.com/Leonidas-from-XIV/node-xml2js/blob/master/test/parser.test.coffee#L33
And called from here: https://github.com/Leonidas-from-XIV/node-xml2js/blob/master/src/parser.coffee#L150

For the above xml example, the validator calls which would actually be executed be like this:

validator("/root/item/sub", undefined, "1");
validator("/root/item/sub", "1", "2");
validator("/root/item", undefined, {"$":{"id":"a"},"sub":["1","2"]});
validator("/root/item/sub", undefined, "3");
validator("/root/item", {"$":{"id":"a"},"sub":["1","2"]}, {"$":{"id":"b"},"sub":"3"});
validator("/root", undefined, {"item":[{"$":{"id":"a"},"sub":["1","2"]},{"$":{"id":"b"},"sub":"3"}]});

I used this validator to get the above infos:

validator: (xpath, currentValue, newValue) => {
    console.log(`validator("${xpath}", ${JSON.stringify(currentValue)}, ${JSON.stringify(newValue)});`);
    return newValue;
}

So, instead of providing explicitArray support directly, I'd rather implement validator which would cover all those use cases. One might even provide a few default validators, like one for explicitArrays etc.

By the way, I don't like the name validator that much. I think I'd rather call it something like elementPostProcessor. I could even imagine having multiple of those chained together for different aspects.

One downside I see when using this kind of validators is the extra time/memory spent for creating the xpath. In my cases it's worth it because I can modify the output structure directly and don't need to do another pass.

I could even imagine more complex usages like creating real classes directly, etc. - but, honestly, this might be a lot of work, both conceptually and then actually implementing it. I'll see when I have time for this and provide a PR. Feel free to close this issue if you don't hear from me in a while ;)

@amitguptagwl
Copy link
Member

Sorry for the late response. I was away from my machine for few days

I would appreciate that you always include downside of the proposed changes.

I agree on

  • having tags/values as array is a good option against having extra condition or iterating whole JSON again.
  • transforming values of a specific tag instead of all
  • elementPostProcessor is better name than validator. And since there is already a validator, it may confuse usrs.

But

The main goal of FXP is to transform XML into Nimn or JSON as fast as possible and to be compatible with maximum browsers. If there is any feature which is impacting the speed of this library drastically but being used by only few users, I would better not to implement that.

Moreover, can we think of any other post processing feature in addition of array mode for selective tags/ values? If No, we don't need the generic name.

Now for the implementation, I believe we can simply check if there is any child node with length === 1 (which is already present) put it into array for all or specific json/xml path. I hope it should not impact the performance very much in this case.

@rbscott
Copy link

rbscott commented Apr 3, 2018

I just started looking at this library as a possibility for parsing xml, and I have a lot of cases where order is important. I think something like arrayMode is what we are looking for as it seems like there is a no way to maintain order if the child elements are converted into dictionaries.

@bb
Copy link
Contributor Author

bb commented Apr 3, 2018

@rbscott: I'm afraid that's yet another array mode you're looking for. As far as I understood, it's only for sibling elements having the same tag; at least that's my use case. You're right: by using a map/object as container, there's no order, so basically you can't use this library for mixed content markup but only for data structure (de-)serialization.

@amitguptagwl
Copy link
Member

array mode (considering that it is not impacting performance) would be an optional option which can be documented with the warning about how deserialization or the sequence of tags can be impacted.

However as the insertion order is always maintained in Array, I can't understand why we can't maintain the order of child elements.

@amitguptagwl
Copy link
Member

Closing the issue due to no activity.. feel free to reopen or to continue discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants