
Bitmap image data type #188

Open
GreyCat opened this issue Jun 23, 2017 · 20 comments

Comments

@GreyCat
Member

GreyCat commented Jun 23, 2017

Writing down yet another somewhat frequently requested feature: ability to mark up 2D bitmap images in some better syntax than just raw bytes (i.e. size: ...). Technically, it's a good idea. It's more or less obvious that all (uncompressed) bitmap images have lots of things in common:

  • they all have width and height (usually derived from other attributes nearby)
  • they all have some sort of pixel format
    • it's either a reference to a palette (and thus we need to provide that palette)
    • or it's a direct RGB color specification like 24-bit RGB, or 32-bit RGBA,
    • or some more obscure encoding like 16-bit 5+6+5 RGB, or YUV422, or stuff like that
    • or even something much more obscure
  • they might have some sort of intricate packing specifics:
    • for example, Windows .bmp file format lists rows from bottommost to topmost, and so do some OpenGL applications due to default OpenGL axis orientation
    • another example is planar composition, which can be pretty complex as well
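To make the pixel-format point above concrete, here is a minimal Python sketch (illustrative only, not part of the proposal; the helper name is invented) that expands one packed 16-bit 5+6+5 RGB pixel into 8-bit channels:

```python
def rgb565_to_rgb888(pixel: int) -> tuple:
    """Expand a packed 5+6+5 RGB pixel into three 8-bit channels."""
    r5 = (pixel >> 11) & 0x1F  # top 5 bits: red
    g6 = (pixel >> 5) & 0x3F   # middle 6 bits: green
    b5 = pixel & 0x1F          # bottom 5 bits: blue
    # Rescale each channel to the full 0..255 range.
    return (r5 * 255 // 31, g6 * 255 // 63, b5 * 255 // 31)

# rgb565_to_rgb888(0xF800) → (255, 0, 0), i.e. pure red
```

Every such encoding would need exactly one entry like this in a shared pixel-format dictionary.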

Note that we totally do not touch a question of compression: it should be solved with process: ..., not here.

Draft of proposed spec

A naive proposal would look something like this:

- id: w
  type: u4
- id: h
  type: u4
- id: bitmap
  type: bitmap
  width: w
  height: h
  pixels: bgr565

or, for paletted image:

- id: w
  type: u4
- id: h
  type: u4
- id: pal
  type: palette
  colors: 256
  pixels: rgb888
- id: bitmap
  type: bitmap
  width: w
  height: h
  pixels: index8
  palette: pal

This yields two new KS data types: palette and bitmap (actually, a palette is more or less the same as a one-dimensional bitmap).

Implementation

Of course, implementation is a huge question, and it would probably be best done as some sort of plugin system. Things that have been cited in various discussions so far:

  • C/C++ has SDL, which has SDL_Texture class that abstracts operations with bitmap images, such as loading a bitmap with particular width / height / pixel format / planarity settings. The most interesting thing is probably SDL_PixelFormatEnum, which has ready-made answers to many pixel format questions.

  • C/C++ raw OpenGL programming reads an image into an uncompressed raw byte array and then does something like this:

glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, data);

Here, GL_RGB and GL_UNSIGNED_BYTE specify pixel format / packing. Probably we could provide some better interface here as well.

  • C++ with Qt has QImage which allows construction of image from arbitrary char * specifying a Format. Nothing too fancy there, though.

  • Desktop Java has awt, which has Raster, which allows creation of usable bitmap image from a DataBuffer, given a pretty intricate pixel format spec / planarity using SampleModel.

  • Java on Android has Bitmap and Bitmap.Config which play roughly the same role. Also there is ImageFormat, although it's not very clear from the very first sight how it's all connected. Range of supported pixel formats is very spartan, though.

  • Python has even more options...

    • PIL provides Image class, which has Image.fromstring method, which allows loading of arbitrary raw data in several pixel formats
    • OpenCV + Numpy allows loading of raw images into arbitrary matrices, according to this SO question, this is done using np.fromfile + reshape of resulting array into a matrix.

Trivial fallback implementation

If a language / graphics library combination is not (yet) supported, then, at the very least, this syntax should provide a better, more readable equivalent of parsing a byte array of the required size, i.e.

- id: bitmap
  type: bitmap
  width: w
  height: h
  pixels: bgr565

instead of

- id: bitmap
  size: w * h * 2 # 2 bytes per pixel, as bgr565 = 5 + 6 + 5 = 16 bits

Implementing parsing + a new internal KS data type + the fallback solution shouldn't be that hard. The only hard part is making a dictionary of pixel format encodings, but that's more or less already done for us in many libraries.
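For the record, such a dictionary can start out tiny. A hedged Python sketch (the format names, values, and helper are invented for illustration, loosely following the draft) of how the fallback could derive the byte array size:

```python
# Hypothetical dictionary mapping pixel format names to bits per pixel.
PIXEL_FORMATS = {
    "rgb888": 24,
    "rgba8888": 32,
    "bgr565": 16,
    "index8": 8,
}

def bitmap_byte_size(width: int, height: int, pixel_format: str) -> int:
    """Byte size of an uncompressed bitmap, as the fallback would read it."""
    bits = PIXEL_FORMATS[pixel_format]
    return (width * bits + 7) // 8 * height  # round each row up to whole bytes

# bitmap_byte_size(320, 200, "bgr565") == 320 * 200 * 2
```

A real implementation would also have to account for the row padding/alignment rules some formats impose (e.g. BMP aligns rows to 4 bytes).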

@KOLANICH

At first sight it looks good, but:
1. Do we really need this? Isn't it better to decide to use some image format like png instead when designing a format? I mean, I doubt that these kinds of structures are widespread in on-disk and network formats.
2. How about implementing built-in multidimensional arrays first?

@GreyCat
Member Author

GreyCat commented Jun 26, 2017

Most, if not all, image file formats have something like that. For example, if you uncompress (process: zlib) the image data in png, you'll end up with a raw rectangle of pixels: exactly width * height * size_of_pixel bytes. The same goes for GIF (after applying LZW decompression), and for BMP and TIFF (even uncompressed), etc.
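This is easy to verify with Python's standard zlib module on a toy 4×2 RGB888 buffer (sizes here are illustrative; real PNG IDAT data also carries one filter byte per row, so it is slightly larger than width * height * size_of_pixel):

```python
import zlib

width, height, bytes_per_pixel = 4, 2, 3  # a tiny RGB888 image
raw = bytes(range(width * height * bytes_per_pixel))

compressed = zlib.compress(raw)         # what would sit inside the file
restored = zlib.decompress(compressed)  # what `process: zlib` would yield

# The uncompressed payload is exactly width * height * size_of_pixel bytes.
assert len(restored) == width * height * bytes_per_pixel
```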

Multi-dimensional arrays are:

  • arguably harder to implement (for instance, there is no direct support for true multi-dimensional arrays in many programming languages — out of what we're targeting, probably only C++ and C# support them natively — and, as far as I know, all of them are available in non-resizable versions only)
  • probably much less relevant — as far as I see, most of the time it's used to encode a 2D image-like structure (either a raster image or some sort of 2D map); rarely, it might see some use in scientific applications (i.e. matrices), but even that is a pretty special use case.

@KOLANICH

KOLANICH commented Jun 26, 2017

there is no direct support for true multi-dimensional arrays in many programming languages — out of what we're targeting, probably only C++ and C# support them natively — and, as far as I know, all of them are available in non-resizable versions only)

I meant OOP-style arrays for languages with overloadable indexing and invocation operators. In C++ it's trivial to implement: it's just a nested template with constexpr everywhere possible (though I don't know how efficient this construct is). For JS we can use a Proxy to overload the indexing operator. Rust has the Index trait. For languages without any good way to create ndarray-like syntax, we can fall back to a 1D array.

@LogicAndTrick
Collaborator

I also think that this is probably not necessary. This adds a decent amount of weight to the framework for minimal gain, in my opinion. There are already KSY definitions for most of the mainstream image formats, and it's easy for a consumer to load them into the image library of their preference. A dependency like zlib makes sense to have because there may be additional processing on the deflated byte stream, but for an image, there's not really any extra processing that can be achieved in KS, so the additional dependency isn't very advantageous.

Also consider that most non-trivial image formats need reasonably complex processing in order to convert into an image. BMP, PNG, TIFF and most others have the bit depth and other key flags set in the data type itself, so you would need some way to dynamically convert those flags into the equivalent KS pixel format. TIFF has a flag that specifies which orientation the data is stored in, so a naive implementation would often load rotated/flipped images. Most formats have a flag to indicate whether an image is indexed or not. Most of them have various compression flags as well.

Most formats I have encountered that contain images are usually container formats that contain images in a certain format (e.g. Windows DLLs contain ICO files, which contain BMP and PNG files) so I don't see a simple implementation being very useful in practice, and a complex implementation shouldn't be within the scope of this project IMO.

Basically, I wouldn't use this feature. If I want an image from a KS stream, I would simply load it as a complete stream or byte array (including headers/etc) and use a library to convert it to an image format in user code. I will always trust a dedicated image library to do image parsing, as it is by no means a trivial task, even for the simple uncompressed formats.

I could be wrong, but I just don't think it's a big enough use case to justify adding support for all the different variations of image formats just to save the user 2 lines of custom image loading code. I think it would be a heavy load on the project to maintain image support in the compiler and all languages, plus adding support for new languages would be significantly more complex as well. Plus it creates a precedent for other formats to start leaking in. What's next, native audio support? Videos?

@GreyCat
Member Author

GreyCat commented Jun 27, 2017

@LogicAndTrick Um, looks like I'm a really poor storyteller ;) Let me try again.

This adds a decent amount of weight to the framework for minimal gain, in my opinion.

It's actually a pretty lightweight change. For example, compared to calculated endianness (which required a gargantuan amount of changes in generated code on many levels, and even now I'm not 100% sure that it should be left as is), this is:

  1. add one or two built-in data types and a few parameters that define them
  2. do a "default" implementation that reads them as a byte array — that actually won't even need any per-language fixes, as all languages already have "read X as byte array" implemented

That's it. In a minimal way, that's literally 30-40 lines of code + a dictionary of pixel formats.

There are already KSY definitions for most of the mainstream image formats

All these definitions use one of the following "techniques":

  1. There is a distinct size field in the header that allows loading image data as a byte array of known size — gif, png, psx_tim, dicom
  2. There are width + height fields that allow calculating the size with some expression and loading image data as a byte array with something like size: width * height * blah — jpeg
  3. Parse everything up to the end of file as "raw image data" (i.e. size-eos: true) — bmp, jpeg
  4. Do nothing, parse just the header — tga, pcx, xwd

All of these approaches have their flaws. (2) requires manual calculations, which is what we definitely want to avoid. (4) is denial: we don't really finish the parsing, we just stop, which is bad. (3) is more like a placeholder to avoid (4) — we don't know how to parse the pixels, so we just use a catch-all byte array in the vague hope that the user will know what to do with it; of course, some formats actually have a trailer after the image data, and we just can't parse that. (1) technically completes the parsing cleanly, but, arguably, from the user's perspective it still doesn't provide what it should. It's as if we ignored the string data type and instead just used raw byte arrays everywhere — like, everyone can convert them to strings manually, what's the difference?

Also consider that most non-trivial image formats need reasonably complex processing in order to convert into an image. BMP, PNG, TIFF and most others have the bit depth and other key flags set in the data type itself, so you would need some way to dynamically convert those flags into the equivalent KS pixel format. TIFF has a flag that specifies which orientation the data is stored in, so a naive implementation would often load rotated/flipped images.

That's exactly what I'm talking about. We need that "translation to a single pixel format dictionary" actually to be a part of .ksy — without that, end users would have to resort to seeking and using a human-readable translation table, and do that "translation" manually in their code — and that's exactly what we want to avoid.

Most formats have a flag to indicate whether an image is indexed or not.

Sure, I've included palette into proposed draft.

Most of them have various compression flags as well.

Most, if not all, compression is applied on a byte array level, not as some sort of pixel transformation. So, essentially, this is up to implementation of a (pluggable) library of additional process: XXX packers/unpackers.

Most formats I have encountered that contain images are usually container formats that contain images in a certain format (e.g. Windows DLLs contain ICO files, which contain BMP and PNG files) so I don't see a simple implementation being very useful in practice, and a complex implementation shouldn't be within the scope of this project IMO.

I can't speak for the people who started raising this issue, but I can speak for the formats I've explored recently. I've seen both that PE => ICO => DIB chain, and the DICOM file format (which 99% of the time just packs a 16-bit-per-pixel single-channel uncompressed bitmap). This proposal solves the problem of specifying the image format of these (and, basically, all of the image formats in our repo and mentioned above) thoroughly.

I will always trust a dedicated image library to do image parsing, as it is by no means a trivial task, even for the simple uncompressed formats.

The aim is to make generation of such libraries possible. Of course, for png you'd use libpng, and for jpeg you'd use libjpeg, but what about DICOM? What about .tim, .cdr, .psd, and tons of more obscure formats, which are used in games, medical / industrial applications, graphical editors, framebuffers of old hardware, etc.?

And it's actually not as complex as it looks. I believe that just doing width + height + pixel-format + orientation would cover all of the existing image formats in our formats repo and probably a pretty big chunk of potential image formats. Basically, the only thing that I'm slightly concerned about is pixel packing formats that are supposed to be decompressed in hardware, like S3TC.

I could be wrong, but I just don't think it's a big enough use case to justify adding support for all the different variations of image formats just to save the user 2 lines of custom image loading code.

It might be just 2 lines of image loading code, but to produce them one has to explore the format thoroughly to find out the exact format of that raw byte array. That's more or less the same as stopping at any arbitrary point in the middle of parsing and just telling the user: "ok, bye, you're now on your own".

What's even more important is that it opens the way for helpful reverse engineering visualizations. Given a visualizer that supports it, you can just write 3 lines of code in a .ksy file and, voila, you see right away whether your hypothesis that the following n bytes are a bitmap image is correct. No need to compile, plug in the module, and write wrapper code to make an ad-hoc bitmap image viewer. Tools like "Tile Molester" do something like that already, and I suspect that we can do it even better, given the much wider array of features that .ksy offers.

I think it would be a heavy load on the project to maintain image support in the compiler and all languages, plus adding support for new languages would be significantly more complex as well.

No, it's not. In its minimal form, it requires only byte array parsing, which is already something that we have/require for a language implementation. It's an opportunity, not something mandatory.

Plus it creates a precedent for other formats to start leaking in. What's next, native audio support? Videos?

Technically, there are already 2 "precedents":

  • KaitaiFS that adds "filesystem" capabilities to some specs
  • mnakamura's 3D model viewer, which allows to specify which fields in an object should be composed to make up a 3D model

I don't see anything wrong with them (except that they're not 100% declarative).

Uncompressed audio support is actually pretty similar to bitmaps, except that it's much simpler. There's a sampling frequency, a number of channels, and a sample format, and that's all. And yeah, I also believe that it's much better to do:

- id: len
  type: u4
- id: buf
  size: len
  type: audio
  freq: 11050
  channels: 1
  sample-format: u1

instead of:

- id: len
  type: u4
- id: buf
  size: len
  doc: |
    This application uses a fixed sampling frequency of 11050 Hz
    for audio, mono, unsigned 1-byte samples.

Exactly because the first one is formal, and the second one requires human brain to process English text.
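And once the sample format is formal, a consumer can convert the samples mechanically; a hedged Python sketch (the helper name is invented) mapping unsigned 8-bit mono samples to floats in [-1.0, 1.0]:

```python
def u8_samples_to_float(buf: bytes) -> list:
    """Map unsigned 8-bit PCM samples (midpoint 128) to floats in [-1.0, 1.0]."""
    return [(b - 128) / 128.0 for b in buf]

# u8_samples_to_float(bytes([0, 128, 255])) → [-1.0, 0.0, 0.9921875]
```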

@LogicAndTrick
Collaborator

It's absolutely useful for visualisers, but I see that as a visualiser plugin rather than a feature for the main KSC application. Maybe you're seeing something that I'm not, but it looks to me like functionality to load even simple bitmaps is much more complex than you may think.

A single bitmap may or may not be compressed using one of many different algorithms (process operation needs to be dynamic), it may or may not be indexed (palette option needs to be dynamic), it could have any number of pixel formats (need to dynamically convert bitmap header flags into KS pixel formats, or pixel formats that KS may not know about). I feel like coding support dynamically for reading formats like this is a pretty significant challenge.

I see three typical situations:

  • The image format is common and well known: use the industry standard library, KS just needs to expose a byte stream. (e.g. JPG, PNG)
  • The image format is uncommon and simple: the KSY is trivially written without any additional KS features required. (e.g. Quake sprite files)
  • The image format is uncommon and complex: KSY image features are insufficient and user code is required anyway. (e.g. Source VTF files)

In these situations I just don't see the functionality being useful enough to justify the maintenance cost. Of course my opinion is just one of many, so if there is demand for it from others, there's probably a whole range of operations where this functionality would be useful, and I just don't know about them :)

(Since you mentioned game formats, I'd be interested to see if you think DXT formats could be supported using this kind of syntax. They're not structured in the same way as typical bitmap formats.)

@GreyCat
Member Author

GreyCat commented Jun 27, 2017

Ok, to make it clear again: this is not my original idea, so I'm kind of playing devil's advocate here. This was suggested like 3 or 4 times by different people during the last year, and this time I just actually sat down and wrote it up ;)

A single bitmap may or may not be compressed using one of many different algorithms (process operation needs to be dynamic)

Sure. This is still a useful feature to have by itself, so I guess it's ok to have something like:

- id: buf
  process:
    switch-on: compression_type
    cases:
      'compression::deflate': zlib
      'compression::lzss': lzss
      'compression::none': none # needs more thought
      'compression::fancy_rle': rle(param1, param2, ...)

And, of course, can always go with a bunch of ifs:

- id: buf_uncompressed
  size: len
  if: compression_type == compression::none
- id: buf_deflate
  size: len
  process: zlib
  if: compression_type == compression::deflate
# ...

it may or may not be indexed (palette option needs to be dynamic)

palette is already a calculated expression, so it's ok to pass a null for a missing palette (which most likely would result from a palette parsing attribute being null due to if: has_palette).

(need to dynamically convert bitmap header flags into KS pixel formats, or pixel formats that KS may not know about)

That's one of the most interesting features, I guess. And I see nothing that makes it impossible to implement with existing tools. pixel-format might take a value from a built-in enum, which, in turn, can be calculated:

# Some pixel format defining flags in the header
- id: is_rgb888
  type: b1
- id: is_rgb565
  type: b1
- id: is_indexed
  type: b1
# Palette, if present
- id: colors_in_palette
  type: b5
- id: my_palette
  type: palette
  num-colors: colors_in_palette
  pixel-format: pixel_format::rgba8888
  if: colors_in_palette > 0
# Finally, the image
- id: my_image
  type: bitmap
  pixel-format: >
    is_rgb888 ? pixel_format::rgb888 :
    is_rgb565 ? pixel_format::rgb565 :
    pixel_format::indexed8 # is_indexed covers the remaining case
  palette: my_palette
  width: ...
  height: ...
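The chained conditional above is nothing exotic; in plain Python the same flag-to-format translation would read (enum values shown as strings, names invented purely for illustration):

```python
def pick_pixel_format(is_rgb888: bool, is_rgb565: bool, is_indexed: bool) -> str:
    """Mirror the chained ternary from the draft: the first set flag wins."""
    if is_rgb888:
        return "rgb888"
    if is_rgb565:
        return "rgb565"
    if is_indexed:
        return "indexed8"
    raise ValueError("header declares no known pixel format")

# pick_pixel_format(False, True, False) → "rgb565"
```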

The image format is common and well known: use the industry standard library, KS just needs to expose a byte stream. (e.g. JPG, PNG)

You can say the very same about the metadata and thus avoiding use of KS entirely. There are already tons of libraries that work with common formats. That's not the point: there are still tons of scenarios when one might want to use KS anyway — i.e. for exploration / learning purposes, for forensic purposes, for meddling with some internal structures on lower level, for getting a unified approach with will work with all wanted image formats, etc.

The image format is uncommon and simple: the KSY is trivially written without any additional KS features required. (e.g. Quake sprite files)

In my opinion, even replacing that char Pixels[width*height] with a formal definition is already worth it.

The image format is uncommon and complex: KSY image features are insufficient and user code is required anyway. (e.g. Source VTF files)

Could you pinpoint what exactly can't be done using the proposed specification for a VTF file? From what I see at the link you've provided, it defines exactly which data areas in a file contain raster images in a designated pixel or block format, which is clearly described with a field in the header.

I'd be interested to see if you think DXT formats could be supported using this kind of syntax.

That's a slight concern, but from what I've seen, they are not that terribly different. In most cases (i.e. in a .dds file, a .vtf file, or any other proprietary texture container), there would be attributes that give us width + height + block compression method. There are well-known formulas that allow calculating the size of the compressed data, so you can read it as a byte array (or we can go with an array of byte arrays to read multiple sequential mipmaps; I haven't seen any non-sequential mipmaps anywhere anyway). Then there are two possibilities:

  1. You want to use a hardware decoder → you just need a pointer to that byte array (or arrays) for the OpenGL/D3D function call that uploads the texture into GPU RAM.
  2. You want to use software decoder → you call a software decoding function (like this one) and end up with a raw uncompressed frame buffer, usually with raw RGB888 or BGR888 pixels.

Both can be generated by ksc, if requested. So, I don't see anything wrong with specifying something like:

- id: buf
  type: bitmap
  pixel-format: pixel_format::dxt1
  width: my_width
  height: my_height
  mipmaps: 6 # not sure about this one — might as well do a normal repeat-expr loop
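As a sanity check on those "well-known formulas": the compressed size of a single DXT mip level is a per-4×4-block computation. A hedged Python sketch (the helper name is invented):

```python
def dxt_level_size(width: int, height: int, block_bytes: int = 8) -> int:
    """Compressed byte size of one mip level built from 4x4 pixel blocks.

    block_bytes is 8 for DXT1 (BC1) and 16 for DXT3/DXT5 (BC2/BC3).
    """
    blocks_w = max(1, (width + 3) // 4)   # partial blocks round up
    blocks_h = max(1, (height + 3) // 4)
    return blocks_w * blocks_h * block_bytes

# A 256x256 DXT1 texture takes 64 * 64 * 8 = 32768 bytes.
```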

@LogicAndTrick
Collaborator

A simple implementation is certainly useful for quick exploration and reversing of an unknown format, but I think there's less value if you're trying to write a robust KSY for production usage. Just a thought: this could be something that the visualiser does, but that KSC doesn't need to know about.

But if you do add it I would be happy to help with the C# implementation, the System.Drawing DLL ships with .NET so it's not too hard to decide on libraries or anything like that. (I would also try to make some evil unit tests with really quirky pixel formats to try and break things! :P )

@koczkatamas
Member

I share the same fears as @LogicAndTrick.

Kaitai Experiments

What I want to do is something I call "Kaitai Experiments" (after Chrome Experiments): a showcase website containing compiled parsing code with thin wrappers which can visualize some example file formats (like images, 3D models), showing how easily they can be implemented with Kaitai (of course, this code will be written only for one language, mostly JavaScript, as it can run on the web directly).

Metadata

But I agree that this meta information (width, height, palette, pixel format, etc.) is important, so we should include it declaratively somehow, but I would detach it from the main functionality, maybe in some kind of plugin form (which can be available by default and come with the default Kaitai runtime / compiler).

It would be a really small modification .ksy-wise, e.g.:

- id: buf
  schema: 
    identifier: bitmap
    pixel-format: pixel_format::dxt1
    width: my_width
    height: my_height
    mipmaps: 6 # not sure about this one — might as well do a normal repeat-expr loop

Where we could describe these uniquely known schemas (bitmap) somewhere, and it could even be extended by 3rd-parties if they want to.

Forensics example

If we take the forensics example, a PNG would be parsed the following way:

  • read every chunk (supported)
  • create a new stream from IDAT chunks' data field (not supported yet)
  • decompress data from this new stream (supported)
  • parse strides (~rows) where the first byte describes the row parsing algorithm (supported)
  • decode the row data (based on the algorithm), where every pixel-byte can depend on the pixel-bytes to the left and above (not supported yet; maybe it will be after Handling of detached indexes defining the size of variable sized blocks #147)
  • then decode the row into colorful pixels based on the pixel format

I think this is the most low-level parsing we can achieve. After this we can map these pixels to a standard bitmap format which can be displayed. But this mapping is basically a new "product", in the sense that currently Kaitai parses raw data but does not convert between existing schemas (I think we've had a "schema-mapping" proposal like this somewhere in the GH issues).
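For the filter-decoding step above, here is what reversing the simplest non-trivial PNG row filter looks like. The Sub filter (type 1) is defined in the PNG spec as Recon(x) = Filt(x) + Recon(x - bpp) mod 256, so each byte depends on the reconstructed byte one pixel to the left:

```python
def undo_sub_filter(filtered_row: bytes, bytes_per_pixel: int) -> bytes:
    """Reverse PNG filter type 1 (Sub) on a single row of image data."""
    recon = bytearray(filtered_row)
    for i in range(bytes_per_pixel, len(recon)):
        recon[i] = (recon[i] + recon[i - bytes_per_pixel]) & 0xFF
    return bytes(recon)

# With 1 byte per pixel, deltas [10, 1, 1, 1] reconstruct to [10, 11, 12, 13].
```

The Up, Average, and Paeth filters additionally reference the row above, which is exactly the "left and above" dependency mentioned in the step list.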

Leave buffer as is

Another option is to leave the buffer as is and then optionally create a platform-specific wrapper for this buffer, backed by the metadata in the .ksy.

In this case I'd use an opaque-type- or process-like solution which accepts the buffer and all required metadata as parameters and returns a language-specific Bitmap class.

And the whole thing is implemented as generically as it can be: the compiler should not know anything about whether the result will be an image, audio, video, etc. We can check the .ksy for an expected schema for convenience, and we can prepare the runtimes for the most frequent types (e.g. image), but other than that I would keep this as extensible as we can (so if somebody adds a new type to their runtime, they could use it without modifying the compiler).

@GreyCat
Member Author

GreyCat commented Jun 27, 2017

I actually like the idea of detaching this stuff (i.e. bitmap-specific, audio-specific, etc) into some sort of schema subelement and a plugin. There is one slight problem, though: it can't be completely ignored by the compiler, as size actually depends on it.

But you're completely right — the compiler is 100% ok with dealing with the resulting bitmaps, audio, etc. as opaque types, very similar to our existing opaque types, but not KaitaiStruct-compatible. So we can specify:

- id: buf
  schema: 
    identifier: bitmap
    pixel-format: pixel_format::rgb888
    width: my_width
    height: my_height

and that needs to generate something like:

// header
QImage* m_buf;
QImage* buf() { return m_buf; }

// _read code
m_buf = Bitmap::read(m__io, my_width(), my_height(), PixelFormat::RGB888);

// and Bitmap::read is implemented in a separate library
QImage* Bitmap::read(kaitai::kstream* io, int width, int height, PixelFormat fmt) {
    QImage::Format qtFmt;
    int pixSize;

    switch (fmt) {
        case PixelFormat::RGB888:
            qtFmt = QImage::Format_RGB888;
            pixSize = 3;
            break;
        // ...
    }

    int size = width * height * pixSize;
    std::string raw = io->read_bytes(size);

    // QImage does not copy the caller's buffer, so deep-copy before `raw` dies
    return new QImage(QImage(reinterpret_cast<const uchar*>(raw.data()),
                             width, height, qtFmt).copy());
}

The only problem here is that ksc must know how to pass arguments into that reader function (i.e. the order of arguments and type checks) and what type it is expected to return. Probably we could invent some metadata to pass along with such libraries? Something like:

meta:
  id: bitmap
external:
  call_func: 'Bitmap::read'
  args:
    # IO is always passed as first argument implicitly
    - id: width
      type: u4
    - id: height
      type: u4
    - id: pixel_format
      type: u1
      enum: pixel_format
  return: 'QImage*'

Hmm, I think we're revisiting #51.

@GreyCat
Member Author

GreyCat commented Jun 27, 2017

I wonder if @adamiwaniuk would be interested in this discussion. After all, these guys do a super cool visualizer that might benefit from formal declaration of imaging formats as well :)

GreyCat added a commit to kaitai-io/kaitai_struct_formats that referenced this issue Nov 8, 2017
  guide.
* Removed dummy external dependencies - at least for a while,
  until kaitai-io/kaitai_struct#188 is
  finalized.
* Made `essid` a raw byte array, not something with a particular
  encoding.
@arekbulski
Member

I have never seen a GitHub issue with posts this long. 😮

I was swayed by @LogicAndTrick's arguments: we want to offer end users a minimalistic parser, not a video-image-data-processor. But... multidimensional arrays do exist for several targets; Python has numpy. For targets that do not have them, merely returning a bytearray would be fine instead. This assumes a semantic where the same field can result in different types on different targets.

I propose a format like:

- id: array
  type: array
  dimensions: w, h
  subtype: u1

Subtype defines the numpy dtype of the array's elements. Dimensions define the numpy shape of the array. If the target does not support ndarrays, it just returns a bytearray of size w*h*sizeof(u1).
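The proposed semantics could be sketched without numpy at all; here is an illustrative Python stand-in (the function name is invented), where a capable target gets a reshaped structure and everything else gets the flat bytearray:

```python
def read_array(buf: bytes, w: int, h: int, have_ndarrays: bool = False):
    """Return an h-by-w structure on capable targets, a flat bytearray otherwise."""
    assert len(buf) == w * h, "buffer must hold exactly w*h u1 elements"
    if not have_ndarrays:
        return bytearray(buf)  # fallback: bytearray of size w*h*sizeof(u1)
    # Stand-in for numpy's reshape: a list of h rows of w elements each.
    return [list(buf[row * w:(row + 1) * w]) for row in range(h)]
```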

@KOLANICH

KOLANICH commented Jan 19, 2018

maybe not dimensions, but just shape?

@arekbulski
Member

That would be fine, numpy ndarray uses shape property for that thing too.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.shape.html#numpy.ndarray.shape

There is another problem, however. You said in #315 that KS should not contain runtime-specific types (did I understand that right?). If so, pretty much all of the types listed in this post are off for those same reasons. And numpy too.

@KOLANICH

KOLANICH commented Jan 19, 2018

I meant type: numpy in the sense that numpy is a Python library, but the idea of Kaitai Struct is to be a metalanguage: you (or someone else) define your structure once and use it everywhere. For example, you implement some RFC once, and you can use it even in PHP.

For example, in 2013 I implemented a part of some network binary protocol (usually implemented in C/C++ in OS drivers) entirely in PHP. It was a pain in the ass to calculate the offsets, binary shifts, and masks needed to deserialize structures into easily usable objects. I implemented only one paragraph of the RFC. Then I started implementing the serialization logic to send responses and came to the idea that it'd be nice to just declare the structure, set some fields, and make the software infer everything else. I googled, asked some people on forums and IRC... and found nothing; even though there were projects with these (at least parsing) features, no one I spoke to and no search engine I used managed to point me at them. Most of the time I was advised to use protobuf or ASN.1, but that was not what I needed. So I got very enthusiastic when I heard about KS.

IMHO: once serialization is implemented, KS should be standardized, and KS definitions should be used as the primary source of truth for every open format description. RFCs should contain not ASCII tables and wordings, but KS code. This would simplify things a lot. Instead of implementing and updating a format parser for every language, you just compile an RFC and get a library ready for use. Instead of guessing what the authors of a spec meant, you just require them to write the spec in a formal language. Instead of having security problems like buffer overflows, you just have a compiler which generates code that is guaranteed (assuming no hardware vulns) to be memory-safe (KS may not be secure for C for now, but I guess this may be fixed at some point).

And since this must work for every language, we shouldn't define formats in terms of numpy or any other library-specific API. Multidimensional arrays are quite universal: you can implement them in any language with OOP and operator overloading: C++, JS, Python. Numpy is just a Python library for more efficient storage of numbers and more convenient use of them in math. We don't want to bind a format to numpy: that would mean there is no sense in using Kaitai for it, since the spec could no longer be used from every supported language.
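To illustrate how little language support a multidimensional array actually needs, here is a minimal sketch using nothing but operator overloading (the `Array2D` class name and its API are invented for this example, not taken from any library):

```python
class Array2D:
    """A bare-bones row-major 2-D array built only on operator
    overloading -- the kind of construct any OOP language can
    express without a dedicated library like numpy."""

    def __init__(self, width, height, fill=0):
        self.width, self.height = width, height
        self._cells = [fill] * (width * height)

    def __getitem__(self, xy):
        x, y = xy
        return self._cells[y * self.width + x]

    def __setitem__(self, xy, value):
        x, y = xy
        self._cells[y * self.width + x] = value

grid = Array2D(4, 3)
grid[2, 1] = 42
print(grid[2, 1])  # 42
```

The same flat-buffer-plus-index-arithmetic pattern translates directly to C++ (`operator()`) or JS (a `get(x, y)` method), which is the point: the concept doesn't belong to any one runtime.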

@arekbulski
Member

You made a fair point. I agree that KS would be a good way to write formal public specs.

Numpy has a serialization protocol, akin to the pickle protocol. In that sense it's not only "a library for more efficient storage of numbers" but a formal binary protocol as well, albeit one only used from Python.

There will be users (like me, actually) who use Kaitai with only one target. The fact that a numpy type would be usable on only one target doesn't make it any less usable. I would discourage people from using such types when e.g. writing schemas for RFCs, but that doesn't justify denying such a type.
It would not be unreasonable to add types that can only be implemented in half of the runtimes, or even in only one runtime. You just write proper documentation for such a type and its restrictions.

@arekbulski
Member

Bottom line: adding runtime-dependent types (e.g. numpy) doesn't get in the way of KS being a universal language (for writing RFCs). It extends what users can do; it doesn't limit them.

@KOLANICH

KOLANICH commented Jan 19, 2018

1. We have opaque types. But plain multidimensional arrays should not be defined as numpy in a spec just because the author of the spec thinks they should be implemented with numpy. They are not numpy, they are multidimensional arrays.
2. In any case, specs shouldn't mention numpy when they really mean binary multidimensional arrays.

Numpy has a serialization protocol, alike pickle protocol. In that sense its not only "a library for more efficient storage of numbers" but a formal binary protocol as well.

Then it should be possible to describe it as a ksy spec.

@arekbulski
Member

Then it should be possible to describe it as a ksy spec.

Possible, but not advised. The native implementation for Python is a one-liner. A schema would be somewhat complicated and somewhat slow. But then, the schema would be usable on all targets. 🙂 Good point.
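For context on how simple the container is: the sketch below parses just the .npy version 1.0 header with the standard library, following numpy's published format description (the function name and error handling here are mine, not numpy's, and only the v1.x two-byte header length is handled):

```python
import ast
import io
import struct

def parse_npy_header(stream):
    # .npy v1.0 layout: 6-byte magic, 2 version bytes, u2le header
    # length, then a Python-literal dict padded with spaces + '\n'.
    if stream.read(6) != b"\x93NUMPY":
        raise ValueError("not an .npy file")
    major, minor = struct.unpack("<BB", stream.read(2))
    (hlen,) = struct.unpack("<H", stream.read(2))
    header = ast.literal_eval(stream.read(hlen).decode("latin1"))
    return header["descr"], header["fortran_order"], header["shape"]

# A hand-built header for a 2x3 int64 array (padding omitted,
# since the parser above reads exactly hlen bytes):
hdr = b"{'descr': '<i8', 'fortran_order': False, 'shape': (2, 3), }\n"
blob = b"\x93NUMPY\x01\x00" + struct.pack("<H", len(hdr)) + hdr
print(parse_npy_header(io.BytesIO(blob)))  # ('<i8', False, (2, 3))
```

A ksy spec for the same layout would be short too (magic, u1 version pair, u2le size, str of that size), though the Python-literal dict at the end would have to remain an opaque string to the schema.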

@theorbtwo

I like the idea of a "schema" tag, the contents of which are assumed to be nonstandard and handled by a plugin. I'm not sure I like the image schema as it's been discussed so far -- in particular, I think it doesn't use existing ksy features enough. A pixel format is completely within the capabilities of ksy, so it seems odd to create an opaque enum of allowed pixel formats. Instead, I propose that the "big block" passed into the schema be a typed array of things representing individual pixels.

Schema is a key attached to a type. The only defined subkey is "kind". If a kind is supported by an implementation, it will know what to do with it; if not, it should ignore the entire schema. The ksy parser itself ignores the schema entirely -- it should treat any subkey as allowed.

For the pixel_data schema, each element must have attributes r, g, and b, and may have an attribute a. r, g, and b are numbers in 0-1, with 0 being fully black and 1 being fully saturated (ideally these are interpreted as sRGB, and ideally values outside 0-1 are handled in a way matching colour theory, but an implementation shouldn't be considered non-conformant on this basis). The optional a attribute is an alpha value in 0-1, with 1 being fully opaque; a being undefined or null is equivalent to 1.

The pixel data shall appear from lowest (pre-transform) x to highest, and then from lowest to highest y. After transformation, x is horizontally left to right, y is vertically top to bottom.

The following additional schema attributes are defined:
- w: width in elements; required.
- h: optional height in elements. If not specified, the height is taken to be the length of the array divided by w.
- x_multiply: x indexes are multiplied by this value to get display x coordinates. Use -1 to flip the x axis, and values other than +/-1 for non-square pixels.
- y_multiply: y indexes are multiplied by this value to get display y coordinates. Use -1 to flip the y axis, and values other than +/-1 for non-square pixels.
- xy_flip: display x coordinates are pixel-data y coordinates, and vice versa. This is applied after the x and y multiplies.
I believe this set of definitions is sufficient for any reasonable image format. It has no explicit pixel formats or palette support, because it should be sufficient to give ksy code that defines these. It is somewhat more verbose than I would like, but that could also be solved by having a less verbose syntax for reading a value of a base type and doing arithmetic on it.

pixel_block:
  type: pixel
  repeat: expr
  repeat-expr: w * h
  schema:
    kind: pixel_data
    w: _parent.w
    h: _parent.h
pixel:
  seq:
    - id: xr
      type: b5
    - id: xg
      type: b6
    - id: xb
      type: b5
  instances:
    r:
      value: xr / 31.0  # 5-bit max
    g:
      value: xg / 63.0  # 6-bit max
    b:
      value: xb / 31.0  # 5-bit max
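For comparison, the same RGB565 normalization that the ksy snippet expresses can be written out in plain Python (assuming the common 5-6-5 red-green-blue bit ordering implied by the b5/b6/b5 fields, red in the high bits):

```python
def rgb565_to_unit_floats(word):
    # Split a 16-bit 5+6+5 word into channels, mirroring the
    # xr/xg/xb bit fields in the ksy sketch above.
    xr = (word >> 11) & 0b11111
    xg = (word >> 5) & 0b111111
    xb = word & 0b11111
    # Normalize each channel to 0..1 by its maximum raw value.
    return xr / 31, xg / 63, xb / 31

print(rgb565_to_unit_floats(0xFFFF))  # (1.0, 1.0, 1.0)
```

Note the float division: in expression languages where `/` on integers is integer division, `xr / 31` would collapse to 0 or 1, which is why the normalizing divisors need to be floats.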
