[EDI] Handle segment compression #114

DGollings · 2020-11-14T19:07:11Z

Disclaimer: I'm only assuming this is segment compression, as defined in the manual:

7.1 Exclusion of segments
Conditional segments containing no data shall be omitted
(including their segment tags).

This is what I encountered in the schema, basically a mandatory/conditional sandwich.

SG25 R 99
43 NAD M 1
44 LOC Orts 9 O

SG25 R 99
45 NAD M 1
46 LOC O 9
    SG29 C 9
    47 RFF M 1

SG25 O 99
48 NAD M 1

SG25 D 99
49 NAD M 1

SG25 D 99
50 NAD M 1

SG25 O 99
51 NAD M 1

SG25 M 99
52 NAD M 1
    SG29 C 9
    53 RFF M 1

SG25 D 99
54 NAD M 1

SG25 R 99
55 NAD M 1

SG25 R 99
56 NAD M 1
    SG26 C 9
    57 CTA O 1
    58 COM O 9

None of the conditional statements were present in the data I was trying to parse, so I ended up fixing it using:

                  {
                    "name": "SG25-SENDER",
                    "min": 1,
                    "type": "segment_group",
                    "child_segments": [
                      {
                        "name": "NAD",
                        "min": 1,
                        "elements": [
                          { "name": "cityName", "index": 1 },
                          { "name": "provinceCode", "index": 2 },
                          { "name": "postalCode", "index": 3 },
                          { "name": "countryCode", "index": 4 }
                        ]
                      },
                      { "name": "LOC", "min": 0 }
                    ]
                  },
                  {
                    "name": "SG25-RECEIVER",
                    "min": 1,
                    "type": "segment_group",
                    "child_segments": [
                      { "name": "NAD", "min": 1 },
                      { "name": "LOC", "min": 0 },
                      {
                        "name": "SG29",
                        "min": 0,
                        "type": "segment_group",
                        "child_segments": [{ "name": "RFF", "min": 1 }]
                      }
                    ]
                  },
                  {
                    "name": "SG25-OTHERS",
                    "min": 0,
                    "max": 99,
                    "type": "segment_group",
                    "child_segments": [
                      {
                        "name": "SG26",
                        "min": 0,
                        "type": "segment_group",
                        "child_segments": [
                          { "name": "CTA", "min": 0 },
                          { "name": "COM", "min": 0, "max": -1 }
                        ]
                      },
                      { "name": "NAD", "min": 0, "max": -1 },
                      { "name": "LOC", "min": 0 },
                      {
                        "name": "SG29",
                        "min": 0,
                        "type": "segment_group",
                        "child_segments": [{ "name": "RFF", "min": 1 }]
                      }
                    ]
                  },

The message I'm trying to parse:

NAD+CZ+46388514++Foo A/S+Foo 2+Foo++Foo+DK'
NAD+CN+46448510++NL01001 Foo Foo Foo:Foo+Foo 6+Foo++Foo+NL'
CTA+CN+AS:NL01001 Foo'
COM+0031765140344:TE'
COM+NL01001@Foo.com:EM'
NAD+LP+04900000250'

Which basically means: grab the two explicit ones (luckily at the top) and do as you wish with the others, in whatever order you encounter them. I'm not sure how I would have handled it if I did care about NAD+LP.

I also had to use min/max 1 instead of the specified 99, because the parser only considers the segment name (NAD), not NAD+FIRSTVALUE, when 'collapsing' similar but not identical segments.
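One way to work around the name-only matching is to split the NAD segments by their first element (the party qualifier) outside the schema. The following is a minimal sketch, assuming the default UN/EDIFACT delimiters (`'` as segment terminator, `+` as element separator) and no release characters. `groupNadByQualifier` is a hypothetical helper, not part of the parser.

```javascript
// Hypothetical helper: split a raw EDIFACT fragment into segments and
// group the NAD segments by their qualifier (the first data element).
// Assumes default delimiters and ignores the release character '?'.
function groupNadByQualifier(raw) {
  const groups = {};
  for (const seg of raw.split("'")) {
    const elements = seg.trim().split("+");
    if (elements[0] !== "NAD") continue; // skip CTA, COM, empty tail, ...
    const qualifier = elements[1];
    (groups[qualifier] = groups[qualifier] || []).push(elements.slice(2));
  }
  return groups;
}

// The sample message from this issue.
const sample = [
  "NAD+CZ+46388514++Foo A/S+Foo 2+Foo++Foo+DK'",
  "NAD+CN+46448510++NL01001 Foo Foo Foo:Foo+Foo 6+Foo++Foo+NL'",
  "CTA+CN+AS:NL01001 Foo'",
  "COM+0031765140344:TE'",
  "COM+NL01001@Foo.com:EM'",
  "NAD+LP+04900000250'",
].join("\n");

const parties = groupNadByQualifier(sample);
console.log(Object.keys(parties)); // [ 'CZ', 'CN', 'LP' ]
console.log(parties.LP);           // [ [ '04900000250' ] ]
```

With the segments keyed by qualifier, NAD+LP can be picked out regardless of where it appears in the message.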

Basically, the EDI specification leaves a lot implicit, which I think makes it quite hard to parse.


jf-tech · 2020-11-15T04:41:04Z

@DGollings
It's hard to infer much from the excerpt of your EDI spec (the part containing

SG25 R 99
43 NAD M 1
44 LOC Orts 9 O
...

). If you can post your spec, or email me your spec and sample data (is

NAD+CZ+46388514++Foo A/S+Foo 2+Foo++Foo+DK'
NAD+CN+46448510++NL01001 Foo Foo Foo:Foo+Foo 6+Foo++Foo+NL'
CTA+CN+AS:NL01001 Foo'
COM+0031765140344:TE'
COM+NL01001@Foo.com:EM'
NAD+LP+04900000250'

the full sample or just a section of it?) along with your schema, I can take a deeper look.

DGollings · 2020-11-15T15:37:31Z

Sure, I had a look around but can't find your e-mail address.

jf-tech · 2020-11-15T16:11:08Z

jf dot tech dot llc at gmail.com

jf-tech · 2020-11-16T04:55:22Z

@DGollings

What you discovered is what we have encountered in the past too. There are many optional SG25 groups whose child segments all look the same (a single NAD), e.g., the ones you listed in the issue:

SG25 O 99
48 NAD M 1

SG25 D 99
49 NAD M 1

SG25 D 99
50 NAD M 1

SG25 O 99
51 NAD M 1

It's nearly impossible (as far as I'm aware) to deterministically parse such SG25s: say you get a NAD, how does the parser know whether it is 48 NAD, 49 NAD, 50 NAD, or 51 NAD? We were often frustrated by how partner specs were written. We discussed this with UPS, which uses EDI 240/214. They basically said that while their spec is meant to be all-inclusive, each individual stream/channel of EDI files doesn't contain non-deterministic combinations of segments. In other words, in terms of your spec, they wouldn't intentionally send an SG25 of 48 NAD followed by an SG25 of 49 NAD, because the receiver would have no deterministic way to tell them apart.

The problem isn't as trivial as what you described (i.e., stack popping). It eventually becomes a DFA or NFA matching problem (a bit like regex): imagine looking at an input file vertically, with each segment line represented by a single character; matching the file against the spec then becomes a regex pattern matching problem. As you are aware, regex pattern matching isn't deterministic in general, and in extreme cases the runtime can be exponential because of backtracking.
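To make the regex analogy concrete, here is a small sketch that maps each segment tag to one character and matches the result against a hand-written pattern. Both the tag-to-letter mapping and the pattern (a simplified stand-in for the SG25/SG29/SG26 part of the spec) are illustrative assumptions, not something the parser does.

```javascript
// Map each segment tag to a single character (illustrative choice).
const TAG_TO_CHAR = { NAD: "N", LOC: "L", RFF: "R", CTA: "C", COM: "M" };

// Encode a list of segment tags as a string, one character per segment.
function encode(tags) {
  return tags.map((t) => TAG_TO_CHAR[t]).join("");
}

// Simplified pattern: one or more groups, each made of a NAD, up to 9 LOCs,
// up to 9 RFFs, and an optional contact block (optional CTA, up to 9 COMs).
const spec = /^(?:NL{0,9}R{0,9}(?:C?M{0,9})?)+$/;

const input = encode(["NAD", "NAD", "CTA", "COM", "COM", "NAD"]);
console.log(input);                             // NNCMMN
console.log(spec.test(input));                  // true
console.log(spec.test(encode(["CTA", "NAD"]))); // false: CTA before any NAD
```

The pattern accepts the sample, but it does not say which of the spec's SG25 entries each `N` belongs to, which is exactly the ambiguity described above.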

So we decided to implement our current greedy algorithm, basically the matchSegment() you've discovered.
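For readers unfamiliar with the approach, a greedy matcher can be sketched roughly as follows. This is a simplified illustration (flat declarations with hypothetical `name`/`min`/`max` fields, no nested groups), not the actual `matchSegment()` code.

```javascript
// Greedy matching sketch: walk the declarations in order and let each one
// consume as many consecutive segments with its name as its max allows.
// Returns the per-declaration counts, or throws if a min is not satisfied
// or segments are left over.
function greedyMatch(decls, tags) {
  let pos = 0;
  const counts = [];
  for (const d of decls) {
    let n = 0;
    while (pos < tags.length && tags[pos] === d.name && n < d.max) {
      pos++;
      n++;
    }
    if (n < d.min) throw new Error(`segment ${d.name}: need ${d.min}, got ${n}`);
    counts.push(n);
  }
  if (pos < tags.length) throw new Error(`unexpected segment ${tags[pos]}`);
  return counts;
}

// Four optional NAD-only groups, as in entries 48-51 of the spec excerpt.
const decls = [
  { name: "NAD", min: 0, max: 99 },
  { name: "NAD", min: 0, max: 99 },
  { name: "NAD", min: 0, max: 99 },
  { name: "NAD", min: 0, max: 99 },
];
console.log(greedyMatch(decls, ["NAD", "NAD", "NAD"])); // [ 3, 0, 0, 0 ]
```

All three NADs land on the first declaration, so the result cannot tell 48 NAD from 49 NAD. The matching itself stays linear in the number of segments.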

As far as we're aware, the only other comprehensive open source EDI library, https://www.smooks.org/, uses the same logic. I'm not sure how IBM/Oracle/MSFT implement their logic, but I doubt they go all the way to DFA/NFA matching.

What it means is: it's kind of hopeless, and not wise, to attempt to implement an EDI schema that follows a partner spec literally and verbatim. We chose to live with the limitation: deal with each channel individually, inspect the input constructs, and work with the partner to verify how they generate EDIs for that particular channel, which is exactly what you're doing here.

jf-tech · 2020-11-17T03:19:26Z

@DGollings let me know if I can close the issue or there is more to discuss.

DGollings · 2020-11-17T14:09:29Z

Oh no, the only possible 'trivial' solution would be something like this:

Spec
Mandatory 1
Mandatory 1
Conditional 1
Conditional 1
Mandatory 1
Mandatory 1

If there are four segments, don't do this:

Mandatory 1 <- 1
Mandatory 1 <- 2
Conditional 1 <- 3
Conditional 1 <- 4
Mandatory 1 <- error
Mandatory 1

but this

Mandatory 1 <- 1
Mandatory 1 <- 2
Conditional 1 <- ignore
Conditional 1 <- ignore
Mandatory 1 <- 3 (taken from C1)
Mandatory 1 <- 4 (taken from C2)

But that only works for very well-defined (and implicit) situations. I would barely know where to begin implementing this:

Mandatory 99
Mandatory 99
Conditional 99
Conditional 99
Mandatory 99
Mandatory 99

With the same four segments as input
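To illustrate both layouts on this example, here is a sketch of a small backtracking matcher. `backtrackMatch` and `layout` are hypothetical names written for illustration; all six declarations are assumed to match the same segment tag, and conditional entries are modelled as `min: 0`.

```javascript
// Backtracking sketch: find how many segments each declaration takes so that
// every min/max is respected and all `total` segments are consumed.
// All declarations are assumed to match the same segment tag.
function backtrackMatch(decls, total, i = 0) {
  if (i === decls.length) return total === 0 ? [] : null;
  const { min, max } = decls[i];
  // Try the greedy choice first, then give segments back one at a time.
  for (let take = Math.min(max, total); take >= min; take--) {
    const rest = backtrackMatch(decls, total - take, i + 1);
    if (rest) return [take, ...rest];
  }
  return null;
}

// Build the M, M, C, C, M, M layout with a given max repeat count.
const layout = (max) =>
  ["M", "M", "C", "C", "M", "M"].map((k) => ({ min: k === "M" ? 1 : 0, max }));

console.log(backtrackMatch(layout(1), 4));  // [ 1, 1, 0, 0, 1, 1 ]
console.log(backtrackMatch(layout(99), 4)); // [ 1, 1, 0, 0, 1, 1 ]
console.log(backtrackMatch(layout(99), 5)); // [ 2, 1, 0, 0, 1, 1 ]
```

With four segments there is exactly one valid assignment in both layouts, but with five segments and max 99 several assignments are valid and the sketch simply returns the first one it finds. The search can also revisit the same positions many times, which is the backtracking cost mentioned earlier in the thread.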

So I agree, the current greedy match is best. A debug mode would help the user figure out the hopelessness of attempting to implement the specs as designed :)

What might help anyone encountering this problem (mixed and unknown mandatory/conditional) is using a custom func:

                "parcel_identification": {
                  "custom_func": {
                    "name": "javascript",
                    "args": [
                      {
                        "const": "response = {};
for (i = 0; i < input.length; i++) {
    switch (input[i].type) {
        case '24':
            response.id = input[i].value;
            break;
        case '28':
            response.customer_id = input[i].value;
            break
    }
};
response"
                      },
                      { "const": "input" },
                      {
                        "array": [
                          {
                            "xpath": "SG37/PCI",
                            "object": {
                              "value": { "xpath": "value" },
                              "type": { "xpath": "type" }
                            }
                          }
                        ]
                      }
                    ]
                  }
                }

With input being something like
PCI+type+value

This returns an object in which each PCI 'type' ends up in its own field (id for type 24, customer_id for type 28).
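The script body can also be tried outside the schema. The sketch below wraps the same loop in a plain function (`pickParcelIds`, a hypothetical name) and feeds it a hand-made `input` array. The values `ABC123`, `CUST42` and the type `99` entry are made up, and the shape of `input` is assumed to match what the `array` argument above produces.

```javascript
// Standalone version of the script passed to the custom func above.
// Each input item is assumed to look like { type: "...", value: "..." }.
function pickParcelIds(input) {
  const response = {};
  for (let i = 0; i < input.length; i++) {
    switch (input[i].type) {
      case "24":
        response.id = input[i].value;
        break;
      case "28":
        response.customer_id = input[i].value;
        break;
      // Any other type is ignored.
    }
  }
  return response;
}

// Made-up sample values; the order of the PCI segments does not matter.
const result = pickParcelIds([
  { type: "28", value: "CUST42" },
  { type: "99", value: "ignored" },
  { type: "24", value: "ABC123" },
]);
console.log(result); // { customer_id: 'CUST42', id: 'ABC123' }
```

Because the loop switches on the type value rather than on position, it does not depend on which conditional segments are present or in what order they arrive.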

jf-tech added the EDI label Nov 15, 2020

DGollings closed this as completed Nov 17, 2020
