[EDI] Handle segment compression #114

Closed
DGollings opened this issue Nov 14, 2020 · 6 comments
@DGollings

Disclaimer: I'm only assuming this is segment compression as defined in the manual:

7.1 Exclusion of segments
Conditional segments containing no data shall be omitted
(including their segment tags).

This is what I encountered in the schema, basically a mandatory/conditional sandwich.

SG25 R 99
43 NAD M 1
44 LOC Orts 9 O

SG25 R 99
45 NAD M 1
46 LOC O 9
    SG29 C 9
    47 RFF M 1

SG25 O 99
48 NAD M 1

SG25 D 99
49 NAD M 1

SG25 D 99
50 NAD M 1

SG25 O 99
51 NAD M 1

SG25 M 99
52 NAD M 1
    SG29 C 9
    53 RFF M 1

SG25 D 99
54 NAD M 1

SG25 R 99
55 NAD M 1

SG25 R 99
56 NAD M 1
    SG26 C 9
    57 CTA O 1
    58 COM O 9

None of the conditional segments were present in the data I was trying to parse. I ended up fixing it with this schema:

                    "name": "SG25-SENDER",
                    "min": 1,
                    "type": "segment_group",
                    "child_segments": [
                      {
                        "name": "NAD",
                        "min": 1,
                        "elements": [
                          { "name": "cityName", "index": 1 },
                          { "name": "provinceCode", "index": 2 },
                          { "name": "postalCode", "index": 3 },
                          { "name": "countryCode", "index": 4 }
                        ]
                      },
                      { "name": "LOC", "min": 0 }
                    ]
                  },
                  {
                    "name": "SG25-RECEIVER",
                    "min": 1,
                    "type": "segment_group",
                    "child_segments": [
                      { "name": "NAD", "min": 1 },
                      { "name": "LOC", "min": 0 },
                      {
                        "name": "SG29",
                        "min": 0,
                        "type": "segment_group",
                        "child_segments": [{ "name": "RFF", "min": 1 }]
                      }
                    ]
                  },
                  {
                    "name": "SG25-OTHERS",
                    "min": 0,
                    "max": 99,
                    "type": "segment_group",
                    "child_segments": [
                      {
                        "name": "SG26",
                        "min": 0,
                        "type": "segment_group",
                        "child_segments": [
                          { "name": "CTA", "min": 0 },
                          { "name": "COM", "min": 0, "max": -1 }
                        ]
                      },
                      { "name": "NAD", "min": 0, "max": -1 },
                      { "name": "LOC", "min": 0 },
                      {
                        "name": "SG29",
                        "min": 0,
                        "type": "segment_group",
                        "child_segments": [{ "name": "RFF", "min": 1 }]
                      }
                    ]
                  },

The message I'm trying to parse

NAD+CZ+46388514++Foo A/S+Foo 2+Foo++Foo+DK'
NAD+CN+46448510++NL01001 Foo Foo Foo:Foo+Foo 6+Foo++Foo+NL'
CTA+CN+AS:NL01001 Foo'
COM+0031765140344:TE'
COM+NL01001@Foo.com:EM'
NAD+LP+04900000250'

This basically means: grab the two explicit ones (luckily at the top), and do as you wish with the others in whatever order you encounter them. I'm not sure how I would have handled it if I had cared about NAD+LP.

I also had to use a min/max of 1 instead of the specified 99, because the parser only considers the segment name (NAD), not the name plus its first value (NAD+FIRSTVALUE), when 'collapsing' similar but not identical segments.
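To illustrate the NAD versus NAD+FIRSTVALUE distinction, here is a minimal sketch that tags each segment of the sample message with its name plus first data element. `tagSegments` is a hypothetical helper, not part of the library, and it assumes the default delimiters (`'` and `+`) with no release-character escaping.

```javascript
// Hypothetical helper (not a library API): split a raw EDIFACT fragment into
// segments and tag each one with "NAME+FIRSTVALUE".
// Assumes default delimiters (' and +) and no release character (?) handling.
function tagSegments(raw) {
  return raw
    .split("'")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => {
      const elements = s.split("+");
      const name = elements[0];
      const qualifier = elements[1] || "";
      return { name, qualifier, key: name + "+" + qualifier, elements };
    });
}

const sample = [
  "NAD+CZ+46388514++Foo A/S+Foo 2+Foo++Foo+DK'",
  "NAD+CN+46448510++NL01001 Foo Foo Foo:Foo+Foo 6+Foo++Foo+NL'",
  "CTA+CN+AS:NL01001 Foo'",
  "COM+0031765140344:TE'",
  "COM+NL01001@Foo.com:EM'",
  "NAD+LP+04900000250'",
].join("\n");

const tagged = tagSegments(sample);

// By name alone the three NAD segments are indistinguishable;
// by key they are NAD+CZ, NAD+CN and NAD+LP.
console.log(tagged.filter((s) => s.name === "NAD").map((s) => s.key));
```

Matching on the key rather than the name would let a schema keep the specified max of 99 per group, but that is a design idea, not something the current schema format is known to support.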

Basically, the EDI specification leaves a lot implicit, which I think makes it quite hard to parse.

@jf-tech jf-tech added the EDI label Nov 15, 2020
@jf-tech
Owner

jf-tech commented Nov 15, 2020

@DGollings
It's a bit hard to infer much from the excerpt of your EDI spec (the part containing

SG25 R 99
43 NAD M 1
44 LOC Orts 9 O
...

). If you can post your spec, or email me your spec and sample data (is

NAD+CZ+46388514++Foo A/S+Foo 2+Foo++Foo+DK'
NAD+CN+46448510++NL01001 Foo Foo Foo:Foo+Foo 6+Foo++Foo+NL'
CTA+CN+AS:NL01001 Foo'
COM+0031765140344:TE'
COM+NL01001@Foo.com:EM'
NAD+LP+04900000250'

the full sample or only a section of it?) and your schema, I can take a deeper look.

@DGollings
Author

Sure. I had a look around but can't find your e-mail?

@jf-tech
Owner

jf-tech commented Nov 15, 2020

jf dot tech dot llc at gmail.com

@jf-tech
Owner

jf-tech commented Nov 16, 2020

@DGollings

What you discovered is what we encountered in the past too. There are many optional SG25 groups, and their child segments all look the same (a single NAD), e.g. the ones you listed in the issue:

SG25 O 99
48 NAD M 1

SG25 D 99
49 NAD M 1

SG25 D 99
50 NAD M 1

SG25 O 99
51 NAD M 1

It's nearly impossible (as far as I'm aware) to deterministically parse such SG25s: say you get a NAD, how does the parser know whether it is 48 NAD, 49 NAD, 50 NAD, or 51 NAD? We were often frustrated by how partner specs were written. We discussed this with UPS, which uses EDI 240/214. They basically said that while their spec is meant to be all-inclusive, each individual stream/channel of EDI files doesn't contain non-deterministic combinations of segments. In other words, in terms of your spec, they wouldn't intentionally send an SG25 of 48 NAD followed by an SG25 of 49 NAD, because the receiver would have no deterministic way to tell the two apart.

The problem isn't as trivial as what you described (i.e. stack popping). It eventually becomes a DFA or NFA matching problem (a bit like regex): imagine looking at an input file vertically, with each segment line represented by a single character; parsing then becomes a regex pattern matching problem. As you are aware, backtracking regex matching has to try alternatives whenever a pattern is ambiguous, and in extreme cases the runtime can be exponential.
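The regex analogy can be made concrete with a small sketch. It assumes each NAD segment is written as the character `N` and models the four look-alike optional SG25 groups as four consecutive `N{0,99}` captures. `countSplits` is a hypothetical helper that counts how many ways the same input could be divided among the groups (it ignores the cap of 99, which is fine for small inputs).

```javascript
// Four optional SG25 groups, each just "NAD up to 99 times", encoded as regex.
const fourOptionalGroups = /^(N{0,99})(N{0,99})(N{0,99})(N{0,99})$/;

// Two NAD segments in the input, one character per segment.
const match = fourOptionalGroups.exec("NN");

// The greedy quantifier hands both segments to the first group.
console.log(match.slice(1));

// Count the ways `n` identical segments can be split across `groups`
// ordered groups. Every one of these splits is a valid parse.
function countSplits(n, groups) {
  if (groups === 0) return n === 0 ? 1 : 0;
  let total = 0;
  for (let taken = 0; taken <= n; taken++) {
    total += countSplits(n - taken, groups - 1);
  }
  return total;
}

console.log(countSplits(2, 4));
```

The pattern matches, but nothing in the input says which of the valid splits the sender intended; the regex engine simply picks the greedy one.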

So we decided to implement our current greedy algorithm, basically the matchSegment() you've discovered.
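As a rough illustration of what greedy matching means here, the following is a minimal sketch over a flat list of slots. It is not the library's actual `matchSegment()` implementation; `greedyMatch`, the slot shape (`id`, `name`, `min`, `max`), and the flattening of groups into slots are all assumptions made for the example.

```javascript
// Hypothetical greedy matcher: each input segment goes to the current slot
// if it fits; otherwise the matcher moves forward and never backtracks.
function greedyMatch(slots, segments) {
  const assigned = slots.map(() => 0);
  const result = [];
  let slot = 0;
  for (const seg of segments) {
    // Skip slots that cannot take this segment (wrong name or already full).
    while (
      slot < slots.length &&
      (slots[slot].name !== seg || assigned[slot] >= slots[slot].max)
    ) {
      if (assigned[slot] < slots[slot].min) {
        throw new Error("slot " + slots[slot].id + " is missing a mandatory " + slots[slot].name);
      }
      slot++;
    }
    if (slot === slots.length) {
      throw new Error("unexpected segment " + seg);
    }
    assigned[slot]++;
    result.push(slots[slot].id);
  }
  // Every slot, including those never reached, must meet its minimum.
  slots.forEach((s, i) => {
    if (assigned[i] < s.min) {
      throw new Error("slot " + s.id + " is missing a mandatory " + s.name);
    }
  });
  return result;
}

// The four look-alike optional SG25 groups from the spec excerpt.
const optionalSg25 = [
  { id: 48, name: "NAD", min: 0, max: 99 },
  { id: 49, name: "NAD", min: 0, max: 99 },
  { id: 50, name: "NAD", min: 0, max: 99 },
  { id: 51, name: "NAD", min: 0, max: 99 },
];

// All three NADs land in slot 48; slots 49 to 51 are never reached.
console.log(greedyMatch(optionalSg25, ["NAD", "NAD", "NAD"]));
```

The trade-off is predictability and linear runtime in exchange for never revisiting an earlier choice.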

As far as we're aware, the only other comprehensive open source EDI library, https://www.smooks.org/, uses the same logic. I'm not sure how IBM/Oracle/MSFT implement their logic, but I doubt they go all the way to DFA/NFA matching.

What it means is: it's neither feasible nor wise to attempt to implement an EDI schema that follows a partner spec literally and verbatim. We chose to live with the limitation: deal with each individual channel, inspect the input constructs, and work with the partner to verify how they generate EDIs for that particular channel, which is exactly what you're doing here.

@jf-tech
Owner

jf-tech commented Nov 17, 2020

@DGollings let me know if I can close the issue or if there is more to discuss.

@DGollings
Author

Oh no, the only possible 'trivial' solution would be something like this

Spec
Mandatory 1
Mandatory 1
Conditional 1
Conditional 1
Mandatory 1
Mandatory 1

If there are four segments, don't do this:

Mandatory 1 <- 1
Mandatory 1 <- 2
Conditional 1 <-3
Conditional 1 <-4
Mandatory 1 <- error
Mandatory 1

but this

Mandatory 1 <- 1
Mandatory 1 <- 2
Conditional 1 <-ignore
Conditional 1 <-ignore
Mandatory 1 <- 3 (taken from C1)
Mandatory 1 <- 4 (taken from C2)
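The count-based idea above can be sketched as follows, assuming every slot has a max of 1 and all input segments share one name. `assignByCount` is a hypothetical function written for this illustration; it returns `null` whenever the count alone cannot decide which conditional slots to fill.

```javascript
// Hypothetical sketch: decide slot assignment purely from the segment count.
// Returns an array with the 1-based input position per slot (null = skipped),
// or null when the assignment is ambiguous.
function assignByCount(slots, count) {
  const mandatory = slots.filter((s) => s.mandatory).length;
  const conditional = slots.length - mandatory;
  if (count < mandatory) throw new Error("too few segments");
  if (count > slots.length) throw new Error("too many segments");
  const spare = count - mandatory; // segments left for conditional slots
  if (spare !== 0 && spare !== conditional) {
    return null; // some, but not all, conditional slots are used: ambiguous
  }
  let next = 1;
  return slots.map((s) =>
    s.mandatory || spare === conditional ? next++ : null
  );
}

const spec = [
  { mandatory: true },
  { mandatory: true },
  { mandatory: false },
  { mandatory: false },
  { mandatory: true },
  { mandatory: true },
];

console.log(assignByCount(spec, 4)); // conditional slots are skipped
console.log(assignByCount(spec, 5)); // ambiguous
```

With five segments the function gives up, because either conditional slot could hold the extra segment.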

But that only works for very well-defined (and implicit) situations. I would barely know where to begin implementing this:

Mandatory 99
Mandatory 99
Conditional 99
Conditional 99
Mandatory 99
Mandatory 99

With the same four segments as input

So I agree, the current greedy match is best. A debug mode would also help the user figure out the hopelessness of attempting to implement the specs as designed :)

What might help anyone encountering this problem (mixed and unknown mandatory/conditional) is using a custom func:

                "parcel_identification": {
                  "custom_func": {
                    "name": "javascript",
                    "args": [
                      {
                        "const": "response = {};
for (i = 0; i < input.length; i++) {
    switch (input[i].type) {
        case '24':
            response.id = input[i].value;
            break;
        case '28':
            response.customer_id = input[i].value;
            break
    }
};
response"
                      },
                      { "const": "input" },
                      {
                        "array": [
                          {
                            "xpath": "SG37/PCI",
                            "object": {
                              "value": { "xpath": "value" },
                              "type": { "xpath": "type" }
                            }
                          }
                        ]
                      }
                    ]
                  }
                }

With input being something like
PCI+type+value

This returns an object in which each 'type' is mapped to its own field.
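For trying the script outside the schema, here is a standalone sketch of the same logic. The function wrapper `parcelIdentification` and the sample values are assumptions added for illustration; the original script instead ends with a bare `response` expression.

```javascript
// Standalone version of the custom_func script above.
// `input` is an array of { type, value } objects, one per PCI segment.
function parcelIdentification(input) {
  const response = {};
  for (let i = 0; i < input.length; i++) {
    switch (input[i].type) {
      case "24":
        response.id = input[i].value;
        break;
      case "28":
        response.customer_id = input[i].value;
        break;
      // Any other type is ignored.
    }
  }
  return response;
}

// Made-up sample data standing in for PCI+24+ABC123 and PCI+28+CUST42.
const sampleInput = [
  { type: "24", value: "ABC123" },
  { type: "28", value: "CUST42" },
  { type: "99", value: "ignored" },
];

console.log(parcelIdentification(sampleInput));
```

Because the lookup is by type rather than by position, the result does not depend on the order in which the PCI segments arrive.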
