Upper limit of a string #817

Closed
computerquip opened this issue Oct 7, 2020 · 3 comments


computerquip commented Oct 7, 2020

Sorry if this is answered somewhere in the docs, I can't seem to find anything that matches the exact thing I'm looking for.
I have a value that is a NULL-terminated string but can be no longer than 256 bytes (with NULL terminator included). I tried doing:

      - id: name
        type: strz
        size: 256
        encoding: ASCII

but this seems to imply that the field is always 256 bytes, not that 256 bytes is the maximum. This doesn't work for me, since the field ends either at the null terminator or at 256 bytes, whichever comes first.
Is there anything that can describe what I want?

EDIT: It would also work if there was maybe an error that occurred if the null-terminator was past the 256-byte mark.
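
To illustrate the difference described above, here is a minimal Python sketch of the two reading strategies. It is not Kaitai's generated code, and the function names are hypothetical. It only mimics, as far as I understand the behavior, how `strz` combined with a fixed `size` consumes the whole field, while a plain terminated read stops right after the terminator.

```python
import io


def read_fixed_strz(stream, size, encoding="ascii"):
    """Consume exactly `size` bytes, then cut the text at the first NUL."""
    buf = stream.read(size)
    if len(buf) < size:
        raise EOFError("field is shorter than its declared size")
    return buf.split(b"\x00", 1)[0].decode(encoding)


def read_strz(stream, encoding="ascii"):
    """Consume bytes up to and including the first NUL, whatever the length."""
    raw = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended before a NUL terminator")
        if byte == b"\x00":
            return raw.decode(encoding)
        raw += byte


data = b"hi\x00next field"

# Fixed-size read: a small size (8) stands in for 256 to keep the demo short.
fixed = io.BytesIO(data)
fixed_name = read_fixed_strz(fixed, 8)
fixed_pos = fixed.tell()  # the next field would start after the full 8 bytes

# Terminated read: the next field would start right after the NUL.
terminated = io.BytesIO(data)
terminated_name = read_strz(terminated)
terminated_pos = terminated.tell()
```

Both reads return the same text, but they leave the stream at different positions, which is why the fixed-size variant misplaces whatever field follows.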

@generalmimon generalmimon self-assigned this Oct 7, 2020
Member

generalmimon commented Oct 7, 2020

Personally, I wouldn't make this a hard restriction even if the spec explicitly says so. I'd rather stick to "be liberal in what you accept from others": if I encountered a file with a 260-byte field that should be at most 256 bytes, I would try to parse it anyway, whether or not it strictly conforms to the spec. But sure, it's possible.

You can enforce the 256-byte limit using the valid/expr key (#435 (comment)).

Note that the valid key is not available in the stable 0.8 version; you need the latest development version, 0.9 (use the unstable KSC from https://kaitai.io/#download or the devel Web IDE for compiling, and then link the master version of the runtime for the language you want).

I see two ways to use it in your situation:

  1. Add a subtype with a string stretching to EOS and wrap this type in a null-terminated substream. The byte size of the substream can be accessed via _io.size:

    seq:
      - id: wrapped_name
        type: str_eos
        terminator: 0
    types:
      str_eos:
        seq:
          - size: 0
            valid: # when this is `false`, a ValidationExprError is thrown
              expr: _io.size <= 256
          - id: s
            type: str
            encoding: ASCII
            size-eos: true

  2. First read a zero-terminated byte array, check its size (again with valid/expr, using the attribute bytes.length), and then reinterpret the byte array as a string using the bytes.to_s(encoding) method:

    seq:
      - id: name_raw
        terminator: 0
        # no `type: ` = byte array in KS
        valid:
          expr: _.length <= 256
    instances:
      name:
        value: name_raw.to_s("ASCII")

A nice thing is that neither of these approaches bothers to parse the actual string from the raw bytes if the <= 256 check fails.
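
As a rough Python sketch of what the second option does (again, not Kaitai's generated code; `read_name`, `NameTooLongError`, and `MAX_NAME_BYTES` are hypothetical names): read up to the terminator, check the length, and only then decode. One assumption to flag: the sketch counts the terminator toward the 256-byte limit, as the question describes it. As far as I can tell, the `valid` expressions above compare the length without the terminator, so if the limit must include it, `<= 255` may be the stricter fit.

```python
import io

MAX_NAME_BYTES = 256  # limit from the question, terminator included (assumption)


class NameTooLongError(ValueError):
    """Hypothetical error type, standing in for a failed validation."""


def read_name(stream, max_bytes=MAX_NAME_BYTES, encoding="ascii"):
    # Step 1: collect raw bytes up to the NUL; the NUL is consumed, not stored.
    raw = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended before a NUL terminator")
        if byte == b"\x00":
            break
        raw += byte

    # Step 2: validate the size before any decoding happens.
    total = len(raw) + 1  # +1 accounts for the consumed terminator
    if total > max_bytes:
        raise NameTooLongError(f"name occupies {total} bytes, limit is {max_bytes}")

    # Step 3: only now reinterpret the bytes as text.
    return raw.decode(encoding)


short_name = read_name(io.BytesIO(b"setup.exe\x00more data"))

try:
    read_name(io.BytesIO(b"a" * 256 + b"\x00"))
    too_long_rejected = False
except NameTooLongError:
    too_long_rejected = True
```

Because the check runs before `decode`, an over-long field is rejected without ever being interpreted as text, which mirrors the point made above.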

BTW, any chance you're describing the format SoundFont 2.04? If so, I have almost finished a KSY spec for this format (kaitai-io/kaitai_struct_formats#293). If you're interested, let me know.

Author

computerquip commented Oct 7, 2020

No, I was making a parser for Microsoft CAB files, mostly for a way to figure out what Kaitai can do and is all about.

I think I will go with the simpler approach you suggested and allow a string of any length. Even though the spec gives a specific size, I don't see any reason to limit the field to that length (outside of buffer optimization purposes, maybe).

I put a link to the ksy below if you want to look at it. There's probably some unorthodox stuff in there. For now I'll close the issue. Thanks for the help.

https://hastebin.com/ubazalidim.less


KOLANICH commented Oct 8, 2020

A parser for CAB? We have been waiting for one for a long time. Please register yourself in https://github.com/kaitai-io/kaitai_struct_formats and note that we have some drafts for CDF in the repo. Also please note https://github.com/kaitai-io/kaitai_compress . You will surely need it, since CAB uses a custom compression algorithm.
