parse option ArrayMode not working #68

bb · 2018-03-29T22:03:44Z

Checklist

Please fill below checklist

Are you running the latest version? 3.3.6
Have you include sample input?
Have you include actual output?
Have you include expected output?
Have you checked if you are using correct configuration?
Did you try online tool?
Did you bookmarked the repository for further updates? ;)

Input data

<x><y><z>z1</z><z>z2</z></y><w>w1</w></x>

Code

import * as FXP from "fast-xml-parser";
import { parseString } from "xml2js";

const xml = "<x><y><z>z1</z><z>z2</z></y><w>w1</w></x>";
parseString(xml, { explicitArray: false }, (err, res) => console.log("xml2js      ", JSON.stringify(res)));
parseString(xml, { explicitArray: true }, (err, res) => console.log("xml2js array", JSON.stringify(res)));
console.log("FXP         ", JSON.stringify(FXP.parse(xml)));
console.log("FXP array   ", JSON.stringify(FXP.parse(xml, { arrayMode: true })));

Output data

xml2js       {"x":{"y":{"z":["z1","z2"]},"w":"w1"}}
xml2js array {"x":{"y":[{"z":["z1","z2"]}],"w":["w1"]}}
FXP          {"x":{"y":{"z":["z1","z2"]},"w":"w1"}}
FXP array    {"x":{"y":{"z":["z1","z2"]},"w":"w1"}}

expected data

xml2js       {"x":{"y":{"z":["z1","z2"]},"w":"w1"}}
xml2js array {"x":{"y":[{"z":["z1","z2"]}],"w":["w1"]}}
FXP          {"x":{"y":{"z":["z1","z2"]},"w":"w1"}}
FXP array    {"x":{"y":[{"z":["z1","z2"]}],"w":["w1"]}}

The text was updated successfully, but these errors were encountered:

bb · 2018-03-29T22:15:13Z

BTW, I didn't see arrayMode in the docs, but in the source code, e.g https://github.com/NaturalIntelligence/fast-xml-parser/blob/master/src/x2j.js#L22

If I may add a feature request here: It would be great to have the arrayMode configurable, e.g. per tag name:

parse(xml, {forceArrayTags: ["w1"]}) // expected: {"x":{"y":{"z":["z1","z2"]},"w":["w1"]}}

Instead of a tag list as config value, one might consider a arrayModeDecider like this:

parse(xml, {arrayMode: a => a.match(/^w$/) }) // expected: {"x":{"y":{"z":["z1","z2"]},"w":["w1"]}}

Personally, I currently only need tag names, so any of those approaches would work for me. But maybe there are other use cases which would require other callback parameters.

amitguptagwl · 2018-03-30T03:58:27Z

Array mode was previously added by some PR. But since it was having some issues and the original requirement/need was not cleared to me, I decided to remove it from the docs at the time of massive refactoring to support big XML files. So the existing project who are using FXP will not face any issue and new users will not use it until it is properly implemented and tested.

So I would be happy if you raise a PR having separate test file testing all the expectations.

But before that what we need to understand is the need of array mode. Why and in what scenario it can help along with expectations for different type of inputs.

Thanks for raising this issue.

bb · 2018-03-30T15:26:25Z

My use case is quite simple: I have elements which sometimes have one child of a name, sometimes multiple. Here's a very simple example:

<root>
    <item id="a">
        <sub>1</sub>
        <sub>2</sub>
    </item>
    <item id="b">
        <sub>3</sub>
    </item>
</root>

{
	"root": {
		"item": [{
			"sub": [1, 2]
		}, {
			"sub": 3
		}]
	}
}

I'd like to be able to always treat the value of "sub" as an array, because in most of my cases it is. But if there's only one value, it is not. Sometimes, I forgot checking the single child case and run into bugs.

Without the option arrayMode, I need to check, like this:

obj.root.item.map(item => !Array.isArray(item.sub) ? [item.sub] : item.sub)

instead of this version which I'd use with option arrayMode.

obj.root.item.map(item => item.sub)

I usually do not want all elements to be arrays, so instead of a global explicit boolean, I'd prefer a version which is based on tag names or xpath or a shouldEnforceArraysForSingleItemsProcessor as given in my last example.
In xml2js, I handled my parsing of those cases not using its explicitArray option but using a validator:

explicitArray: false,
validator: (xpath: string, currentValue: string, newValue: string): string | string[] => {
    if (!currentValue && xpath.split("/").pop() === "sub") {
        return [newValue];
    }
    return newValue;
}

It's (minimally) documented here: https://github.com/Leonidas-from-XIV/node-xml2js/blob/master/test/parser.test.coffee#L33
And called from here: https://github.com/Leonidas-from-XIV/node-xml2js/blob/master/src/parser.coffee#L150

For the above xml example, the validator calls which would actually be executed be like this:

validator("/root/item/sub", undefined, "1");
validator("/root/item/sub", "1", "2");
validator("/root/item", undefined, {"$":{"id":"a"},"sub":["1","2"]});
validator("/root/item/sub", undefined, "3");
validator("/root/item", {"$":{"id":"a"},"sub":["1","2"]}, {"$":{"id":"b"},"sub":"3"});
validator("/root", undefined, {"item":[{"$":{"id":"a"},"sub":["1","2"]},{"$":{"id":"b"},"sub":"3"}]});

I used this validator to get the above infos:

validator: (xpath, currentValue, newValue) => {
    console.log(`validator("${xpath}", ${JSON.stringify(currentValue)}, ${JSON.stringify(newValue)});`);
    return newValue;
}

So, instead of providing explicitArray support directly, I'd rather implement validator which would cover all those use cases. One might even provide a few default validators, like one for explicitArrays etc.

By the way, I don't like the name validator that much. I think I'd rather call it something like elementPostProcessor. I could even imagine having multiple of those chained together for different aspects.

One downside I see when using this kind of validators is the extra time/memory spent for creating the xpath. In my cases it's worth it because I can modify the output structure directly and don't need to do another pass.

I could even imagine more complex usages like creating real classes directly, etc. - but, honestly, this might be a lot of work, both conceptually and then actually implementing it. I'll see when I have time for this and provide a PR. Feel free to close this issue if you don't hear from me in a while ;)

amitguptagwl · 2018-04-02T07:12:32Z

Sorry for the late response. I was away from my machine for few days

I would appreciate that you always include downside of the proposed changes.

I agree on

having tags/values as array is a good option against having extra condition or iterating whole JSON again.
transforming values of a specific tag instead of all
elementPostProcessor is better name than validator. And since there is already a validator, it may confuse usrs.

But

The main goal of FXP is to transform XML into Nimn or JSON as fast as possible and to be compatible with maximum browsers. If there is any feature which is impacting the speed of this library drastically but being used by only few users, I would better not to implement that.

Moreover, can we think of any other post processing feature in addition of array mode for selective tags/ values? If No, we don't need the generic name.

Now for the implementation, I believe we can simply check if there is any child node with length === 1 (which is already present) put it into array for all or specific json/xml path. I hope it should not impact the performance very much in this case.

rbscott · 2018-04-03T19:08:23Z

I just started looking at this library as a possibility for parsing xml, and I have a lot of cases where order is important. I think something like arrayMode is what we are looking for as it seems like there is a no way to maintain order if the child elements are converted into dictionaries.

bb · 2018-04-03T19:49:09Z

@rbscott: I'm afraid that's yet another array mode you're looking for. As far as I understood, it's only for sibling elements having the same tag; at least that's my use case. You're right: by using a map/object as container, there's no order, so basically you can't use this library for mixed content markup but only for data structure (de-)serialization.

amitguptagwl · 2018-04-04T02:01:47Z

array mode (considering that it is not impacting performance) would be an optional option which can be documented with the warning about how deserialization or the sequence of tags can be impacted.

However as the insertion order is always maintained in Array, I can't understand why we can't maintain the order of child elements.

amitguptagwl · 2018-04-14T01:44:48Z

Closing the issue due to no activity.. feel free to reopen or to continue discussion.

See: NaturalIntelligence/fast-xml-parser#68

amitguptagwl added the enhancement label Apr 5, 2018

amitguptagwl closed this as completed Apr 14, 2018

semekh added a commit to hamravesh/fast-xml-parser that referenced this issue May 28, 2018

Force arrayMode

acc5200

See: NaturalIntelligence/fast-xml-parser#68

amitguptagwl mentioned this issue May 3, 2019

Implement a way of parsing xml to closely resemble that of xml. #160

Closed

ebozduman mentioned this issue Sep 29, 2020

Replaces the old xml parser with a new, faster one minio/minio-js#861

Merged

dantio mentioned this issue Jan 27, 2021

feat(arrayMode): support RegEx and function in arrayMode option #316

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

parse option ArrayMode not working #68

parse option ArrayMode not working #68

bb commented Mar 29, 2018

bb commented Mar 29, 2018

amitguptagwl commented Mar 30, 2018 •

edited

bb commented Mar 30, 2018

amitguptagwl commented Apr 2, 2018

rbscott commented Apr 3, 2018

bb commented Apr 3, 2018

amitguptagwl commented Apr 4, 2018

amitguptagwl commented Apr 14, 2018

parse option ArrayMode not working #68

parse option ArrayMode not working #68

Comments

bb commented Mar 29, 2018

Checklist

Input data

Code

Output data

expected data

bb commented Mar 29, 2018

amitguptagwl commented Mar 30, 2018 • edited

bb commented Mar 30, 2018

amitguptagwl commented Apr 2, 2018

rbscott commented Apr 3, 2018

bb commented Apr 3, 2018

amitguptagwl commented Apr 4, 2018

amitguptagwl commented Apr 14, 2018

amitguptagwl commented Mar 30, 2018 •

edited