New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Examples/strategies for dealing with broken file format implementations. #49
Comments
I've took a look at the PDF you're linking in, and the problems you're describing, and, well, I don't think that I can offer any "magic bullet" solutions to these ones. Even if you do it by hand (i.e. writing an imperative parser), both of these hacks would still remain hacks, still ugly and hard to maintain. If more hacks would pile on top of that (i.e. what you mention by "20 or 50"), of course, nothing good will come from that either: would it be .ksy, any other declarative format, or hand-coded algorithm, hacks would still be hacks :( The best we can do is try to keep it as organized and easy-to-maintain as we can. For the first problem, i.e. one missing byte from the beginning of the packet, I would opt for something like that: meta:
id: energycam_mbus
endian: le
seq:
- id: len
type: u1
doc: Number of bytes following (without 2 CRCs * 2 bytes)
- id: b2
type: u1
- id: b3
type: u1
- id: b4
type: u1
if: not is_byte_skipping_vendor_hack
# Read 3 bytes normally, read only 2 bytes if we have a byte_skipping_vendor
- id: ident_number
type: u4
doc: Part of A-Field, IdentNumber
instances:
is_byte_skipping_vendor_hack:
value: b2 == 0x11 and b3 == 0x22 # or whatever ManufacturerID of glitchy vendor would there be
command:
value: 'is_byte_skipping_vendor_hack ? 0x44 : b2'
doc: C-Field, command transmitted
manufacturer:
value: 'is_byte_skipping_vendor_hack ? (b3 << 8) + b2 : (b4 << 8) + b3'
doc: M-Field, ManufacturerID Basically, you start by reading b2-b3, then call It seems fairly clean compromise from my point of view, and it generally as effective as it can get (i.e. you'll have to write something similar with such decision point as |
Guten Tag Mikhail Yakshin,
Anyway, thanks a lot for your example. In the end all of those hacks If not, how easy/maintainable do you think is it to extend your Do you provide protected access to reading methods where I could hook I still need to have a look at the code you generated today... Mit freundlichen Grüßen, Thorsten Schöning Thorsten Schöning E-Mail: Thorsten.Schoening@AM-SoFT.de Telefon...........05151- 9468- 55 AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln |
Flipping a bit sounds not that scary either, as flipping some flip is essentially a XOR with something like meta:
id: energycam_mbus
endian: le
# ...
instances:
is_byte_flipping_hack:
value: manufacturer == 0x3344 # or any other condition
types:
energrycam_apl:
seq:
- id: control_info
type: u1
- id: access_number
type: u1
- id: state_of_meter
type: u1
# ...
# This is the "data_block" for normal, standard-compliant devices
- id: data_block_normal
type: data_block
if: not _parent.is_byte_flipping_hack
# This is the "data_block" for glitchy, byte-flipping devices
- id: data_block_flipped
type: data_block
process: xor(0b10000000)
if: _parent.is_byte_flipping_hack
# ...
instances:
# This is a easy access helper that will map to either "normal" or "flipped" versions depending on which one exists
data_block:
value: _parent.is_byte_flipping_hack ? data_block_flipped : data_block_normal
data_block:
seq:
- id: data_information_field
type: u1
- id: value_information_field
type: u1
# ... There we have 3 "layers": As for "having 50s of them in one file vs having 50 different files" - it's generally a matter of preference. Kaitai Struct allows you to have several different files for different structures and allows you to reference types in different files. If you'll be using So, if you feel like having them all in distinct files - feel free to. If you'd ask for my opinion, given that there are 3 types so far there (DLL, APL and that internal "data_block") - I'd stick them all into one file, at least until more hacks would be needed. |
I guess I'm kind of answered that question, but let me elaborate some more. Right now, it's perfectly fine to have stuff like these: meta:
id: top_level
seq:
- id: child
type: child and meta:
id: child
seq:
- id: something
type: u1 # or anything else in 2 distinct files. It's not possible to separate a single type in 2 different types (at least now). If you feel like that we need to add some sort of include mechanism - let's discuss that - it shouldn't be that hard to implement. Right now we've implemented somewhat working IP networking stack using this approach alone (i.e. some files referencing types in other files).
You've probably meant "generated classes"? That would be perfectly reasonable - that's one of the proposed usage scenarios - you get "generated" class, subclass it and add all the imperative methods that you need.
It really depends on particular target language. AFAIR, for Java and C++, we generate private parsing method + private member fields + public getters. It's very easy to fix that and generate protected anything if you need that, of course. |
I think this can be closed for now. |
Parser for EN 13757-3:2012 needed
I need to create a parser for data defined in the standard EN 13757-3:2012 (Communication systems for and remote reading of meters - Part 3: Dedicated application layer) and you can get an example on page 4 of the following PDF. In the end it's like many other binary formats: Some fields of binary data, some with fixed length, some with variable one, all those fields have a special purpose and some implement a bit mask to decide what this purpose is etc.
http://fastforward.ag/downloads/docu/FAST_EnergyCam-Protocol-wirelessMBUS.pdf
I need to decide what/if I build from scratch, where some tools can be of benefit and I like your approach very much. Reading through your documentation, one question came into my mind, though:
How do I deal with the always present exceptions from the standards because of implementation errors of some vendor?
Currently I have already two of those exceptions, where the vendor claims to support EN 13757-3:2012, but actually doesn't in one case by purpose to optimize transferred data and in the other one simply because a firmware bug has been introduced. I'm unsure currently how I would be able to combine default data parsing and recognizing those errors and maybe X more into your approach. In my mind examples for dealing with such things or thinking of strategies how it could be supported at all are of common interest, so I decided to create this issue. If you think otherwise, feel free to close it...
One missing byte
The first problem I have is interesting, because it changes the data format in the first two bytes already: By default the datagrams I need to parse start with 4 bytes, the first containing the length, the second some communication flag and 3/4 are the vendor providing the package. One of my vendors simply removes the second byte with the communication flag, the C-field in the linked PDF with value 44h, to reduce the amount of data transferred, because this byte doesn't ever change in his use cases. But of course I can only know about that case if I read byte 2 and 3 and check if it's that special vendor or not and if so, the parser mode needs to be changed somehow.
With your approach I would create some root type like "header" or such and that would already be different for standard compatible implementation vs. not like I need it.
So would I need to create one of those header types for each implementation I have to deal with? And all of those would need to check something using "if"? And if it's 20 or 50, would all of those be in the same file? Even if I have to heavily document why such a header is needed? Or would be a better approach to generate a different parser for each such case? But what with all the other types then, would I need to redundantly maintain them per file instead of providing some kind of include/link to other types?
In the end, the first example is missing one byte, which influences the position of all the other types created.
Some flipped bits
The second problem is that in the end some arbitrary values are transferred in the datagram, which consist of a triple: One byte to explain the encoding of the value, one byte to explain the unit of the value, one or more bytes for the value itself. It's e.g. "BCD encoded integer" and "hours" and "10" to tell me that "something" last for 10 hours.
Now one of my vendor has a introduced a firmware bug which results in that some datagrams have a flipped bit e.g. for the first of those three bytes. What I need to do is check the vender, the device type and version and then decide that I need to flip some bit in some byte.
Checking for vendor etc. should be easy using "if", but how do I apply the bit flip? Do I need to implement a "process" statement like "zlib", "xor" etc.? Or is using an instance with a "value" calculation the better approach? And what if things get arbitrary complex, like I need to e.g. check a configuration for all versions of devices where this error occurs? Such configuration is not available to your compiler. Additionally, the same question like before applies: Do I combine 50 of such workarounds in the same ksy file?
I would be very thankful for your ideas, because currently I tend to not use your lib and instead implement the parser by hand to have full control to access things like configuration and such.
The text was updated successfully, but these errors were encountered: