Bitmap image data type #188
Comments
At first sight it looks good, but |
Most, if not all, image file formats have something like that. For example, if you'll uncompress ( Multi-dimensional arrays are:
|
I meant OOP-style arrays for languages with overloadable indexing and invocation operators. In C++ it's trivial to implement that, it's just a nested template with |
I also think that this is probably not necessary. This adds a decent amount of weight to the framework for minimal gain, in my opinion. There are already KSY definitions for most of the mainstream image formats, and it's easy for a consumer to load them into the image library of their preference. A dependency like zlib makes sense to have because there may be additional processing on the deflated byte stream, but for an image, there's not really any extra processing that can be achieved in KS, so the additional dependency isn't very advantageous. Also consider that most non-trivial image formats need reasonably complex processing in order to convert into an image. BMP, PNG, TIFF and most others have the bit depth and other key flags set in the data type itself, so you would need some way to dynamically convert those flags into the equivalent KS pixel format. TIFF has a flag that specifies which orientation the data is stored in, so a naive implementation would often load rotated/flipped images. Most formats have a flag to indicate whether an image is indexed or not. Most of them have various compression flags as well. Most formats I have encountered that contain images are usually container formats that contain images in a certain format (e.g. Windows DLLs contain ICO files, which contain BMP and PNG files) so I don't see a simple implementation being very useful in practice, and a complex implementation shouldn't be within the scope of this project IMO. Basically, I wouldn't use this feature. If I want an image from a KS stream, I would simply load it as a complete stream or byte array (including headers/etc) and use a library to convert it to an image format in user code. I will always trust a dedicated image library to do image parsing, as it is by no means a trivial task, even for the simple uncompressed formats. 
I could be wrong, but I just don't think it's a big enough use case to justify adding support for all the different variations of image formats just to save the user 2 lines of custom image loading code. I think it would be a heavy load on the project to maintain image support in the compiler and all languages, plus adding support for new languages would be significantly more complex as well. Plus it creates a precedent for other formats to start leaking in. What's next, native audio support? Videos? |
@LogicAndTrick Um, looks like I'm a really poor storyteller ;) Let me try again.
It's actually a pretty lightweight change. For example, compared to calculated endianness (which required a gargantuan amount of changes in generated code on many levels, and even now I'm not 100% sure that it should be left as is), this is:
That's it. In a minimal way, that's literally 30-40 lines of code + a dictionary of pixel formats.
All these definitions use one of the following "techniques":
All of these approaches have their flaws. (2) requires manual calculations, which is what we definitely want to avoid. (4) is denial: we don't really finish the parsing, we just stop, which is bad. (3) is more like a placeholder to avoid (4): we don't know how to parse pixels, so we just use a catch-all byte array in a vague hope that the user will know what to do with it; of course, some formats actually have some trailer after the image data, and we just can't parse that. (1) technically completes the parsing cleanly, but, arguably, from the user's perspective it still doesn't provide what it should. It's as if we ignored the string data type and instead just used raw byte arrays everywhere, as in: everyone can convert them to strings manually, so what's the difference?
That's exactly what I'm talking about. We need that "translation to a single pixel format dictionary" actually to be a part of .ksy — without that, end users would have to resort to seeking and using a human-readable translation table, and do that "translation" manually in their code — and that's exactly what we want to avoid.
Sure, I've included palette into proposed draft.
Most, if not all, compression is applied on a byte array level, not as some sort of pixel transformation. So, essentially, this is up to implementation of a (pluggable) library of additional
I can't speak for the people who started raising this issue, but I can speak for the formats I've explored recently. I've seen both that PE => ICO => DIB chain, and the DICOM file format (which 99% of the time just packs a 16-bit-per-pixel single-channel uncompressed bitmap). This proposal thoroughly solves the problem of specifying the image format for these (and, basically, all of the image formats in our repo and mentioned above).
The aim is to make generation of such libraries possible. Of course, for png you'd use libpng, and for jpeg you'll use libjpeg, but what about DICOM? What about .tim, .cdr, .psd, and tons of more obscure formats, which are used in games, medical / industrial applications, graphical editors, framebuffers of old hardware, etc? And it's actually not as complex as it looks. I believe that if we'll just do
It might be just 2 lines of image loading code, but to produce them one has to explore the format thoroughly to find out the exact format of that raw byte array data. That's more or less the same as stopping in the middle of parsing at an arbitrary point and just saying "ok, bye, you're now on your own" to the user. What's even more important is that it opens a way for helpful reverse engineering visualizations. Given a visualizer that supports it, you can just write 3 lines of code in a .ksy file and, voila, you see right away whether your hypothesis that the following n bytes are a bitmap image is correct or not. No need to compile, plug in the module and write some wrapper code to make an ad-hoc bitmap image viewer. Tools like "Tile Molester" do something like that already, and I suspect that we can do it even better given the much wider array of features that .ksy offers.
No, it's not. In its minimal form, it requires only byte array parsing, which is already something that we have/require for a language implementation. It's an opportunity, not something mandatory.
Technically, there are already 2 "precedents":
I don't see anything wrong with them (except that they're not 100% declarative). Uncompressed audio support is actually pretty similar to bitmaps, except that it's much simpler. There's sampling frequency, number of channels, and there is sample format - and that's all. And yeah, I also believe that it's much better to do:
- id: len
  type: u4
- id: buf
  size: len
  type: audio
  freq: 11050
  channels: 1
  sample-format: u1
instead of:
- id: len
  type: u4
- id: buf
  size: len
  doc: |
    This application uses a fixed sampling frequency of 11050 Hz
    for audio, mono, unsigned 1-byte samples.
Exactly because the first one is formal, and the second one requires human brain to process English text. |
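As a rough illustration of why the formal variant matters, here is a minimal Python sketch of what a consumer could compute once `freq`, `channels` and `sample-format` are machine-readable. `AudioSpec`, `duration_seconds` and `decode_u1_mono` are hypothetical names, not part of any Kaitai Struct runtime, and the sketch assumes the usual convention that unsigned 8-bit silence sits at 128.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSpec:
    """Hypothetical mirror of the proposed freq / channels / sample-format keys."""
    freq: int          # sampling frequency in Hz
    channels: int      # number of interleaved channels
    sample_size: int   # bytes per sample (1 for `u1`)


def duration_seconds(spec, buf):
    """Clip length derived purely from the formal description."""
    frames = len(buf) // (spec.sample_size * spec.channels)
    return frames / spec.freq


def decode_u1_mono(buf):
    """Map unsigned 8-bit samples (assumed silence at 128) to floats in [-1.0, 1.0)."""
    return [(b - 128) / 128.0 for b in buf]


spec = AudioSpec(freq=11050, channels=1, sample_size=1)
one_second = bytes([128] * 11050)
print(duration_seconds(spec, one_second))   # 1.0
print(decode_u1_mono(one_second[:3]))       # [0.0, 0.0, 0.0]
```

With the `doc`-only variant, none of these three numbers would be available to a program without a human reading the text first.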
It's absolutely useful for visualisers, but I see that as a visualiser plugin rather than a feature for the main KSC application. Maybe you're seeing something that I'm not, but it looks to me like functionality to load even simple bitmaps is much more complex than you may think. A single bitmap may or may not be compressed using one of many different algorithms (process operation needs to be dynamic), it may or may not be indexed (palette option needs to be dynamic), it could have any number of pixel formats (need to dynamically convert bitmap header flags into KS pixel formats, or pixel formats that KS may not know about). I feel like coding support dynamically for reading formats like this is a pretty significant challenge. I see three typical situations:
In these situations I just don't see the functionality being useful enough to justify the maintenance cost. Of course my opinion is just one of many, so if there is demand for it from others, there's probably a whole range of operations where this functionality would be useful, and I just don't know about them :) (Since you mentioned game formats, I'd be interested to see if you think DXT formats could be supported using this kind of syntax. They're not structured in the same way as typical bitmap formats.) |
Ok, to make it clear again: this is not my original idea, so I'm kind of playing devil's advocate here. This was suggested 3 or 4 times by different people during the last year, and this time I actually sat down to write it down ;)
Sure. This is still a useful feature to have by itself, so I guess it's ok to have something like:
- id: buf
  process:
    switch-on: compression_type
    cases:
      'compression::deflate': zlib
      'compression::lzss': lzss
      'compression::none': none # needs more thought
      'compression::fancy_rle': rle(param1, param2, ...)
And, of course, can always go with a bunch of
- id: buf_uncompressed
  size: len
  if: compression_type == compression::none
- id: buf_deflate
  size: len
  process: zlib
  if: compression_type == compression::deflate
# ...
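For comparison, the same dynamic choice is a plain dispatch table in user code. The following Python sketch is only an illustration of the idea, not generated code: the `Compression` enum and `process_buf` are hypothetical names, and only the `none` and `deflate` cases are wired up because the standard library covers them.

```python
import zlib
from enum import Enum


class Compression(Enum):
    NONE = 0
    DEFLATE = 1
    LZSS = 2   # declared, but no processor registered in this sketch


# Dispatch table playing the role of `process: switch-on: compression_type`.
PROCESSORS = {
    Compression.NONE: lambda raw: raw,      # pass-through, the "none" case
    Compression.DEFLATE: zlib.decompress,   # zlib-wrapped deflate stream
}


def process_buf(compression_type, raw):
    """Return the unpacked bytes for the given compression type."""
    processor = PROCESSORS.get(compression_type)
    if processor is None:
        raise ValueError(f"unsupported compression: {compression_type}")
    return processor(raw)


payload = b"pixel data " * 4
print(process_buf(Compression.DEFLATE, zlib.compress(payload)) == payload)  # True
print(process_buf(Compression.NONE, payload) == payload)                    # True
```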
That's one of the most interesting features, I guess. And I see nothing impossible with existing tools to implement it.
# Some pixel format defining flags in the header
- id: is_rgb888
  type: b1
- id: is_rgb565
  type: b1
- id: is_indexed
  type: b1
# Palette, if present
- id: colors_in_palette
  type: b5
- id: my_palette
  type: palette
  num-colors: colors_in_palette
  pixel-format: pixel_format::rgba8888
  if: colors_in_palette > 0
# Finally, the image
- id: my_image
  type: bitmap
  pixel-format: >
    is_rgb888 ? pixel_format::rgb888 :
    is_rgb565 ? pixel_format::rgb565 :
    is_indexed ? pixel_format::indexed8
  palette: my_palette
  width: ...
  height: ...
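To make the decision chain concrete, here is a minimal Python sketch of the same logic on the consumer side: pick a pixel format from the header flags, then turn the raw bytes into RGB tuples, going through the palette for indexed images. All names (`PixelFormat`, `select_format`, `decode_pixels`) are hypothetical, and only two formats are decoded to keep the sketch short.

```python
from enum import Enum


class PixelFormat(Enum):
    RGB888 = "rgb888"
    RGB565 = "rgb565"
    INDEXED8 = "indexed8"


def select_format(is_rgb888, is_rgb565, is_indexed):
    """Same decision chain as the `pixel-format` expression above."""
    if is_rgb888:
        return PixelFormat.RGB888
    if is_rgb565:
        return PixelFormat.RGB565
    if is_indexed:
        return PixelFormat.INDEXED8
    raise ValueError("header does not declare a known pixel format")


def decode_pixels(fmt, data, palette=None):
    """Return a list of (r, g, b) tuples with 8-bit channels."""
    if fmt is PixelFormat.RGB888:
        if len(data) % 3:
            raise ValueError("rgb888 data must be a multiple of 3 bytes")
        return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]
    if fmt is PixelFormat.INDEXED8:
        if palette is None:
            raise ValueError("indexed image needs a palette")
        return [palette[index] for index in data]
    raise NotImplementedError(f"{fmt} is left out of this sketch")


palette = [(0, 0, 0), (255, 0, 0)]
fmt = select_format(is_rgb888=False, is_rgb565=False, is_indexed=True)
print(decode_pixels(fmt, bytes([1, 0, 1]), palette))
# [(255, 0, 0), (0, 0, 0), (255, 0, 0)]
```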
You can say the very same about the metadata and thus avoid the use of KS entirely. There are already tons of libraries that work with common formats. That's not the point: there are still tons of scenarios when one might want to use KS anyway, i.e. for exploration / learning purposes, for forensic purposes, for meddling with some internal structures on a lower level, for getting a unified approach which will work with all wanted image formats, etc.
In my opinion, even replacing that
Could you pinpoint what exactly can't be done using proposed specification for a VTF file? From what I see at the link you've provided, it exactly defines data areas in a file which contain raster images in a designated pixel or block format, which is clearly described with a field in a header.
That's a slight concern, but from what I've seen, they are not that terribly different. In most cases (i.e. in a .dds file, a .vtf file, or any other proprietary texture container), there would be attributes that give us width + height + block compression method. There are well-known formulas that allow calculating the size of the compressed data, so you can read it as a byte array (or we can go with an array of byte arrays to read multiple sequential mipmaps; I haven't seen any non-sequential mipmaps anywhere anyway). Then there are two possibilities:
Both can be generated by ksc, if requested. So, I don't see anything wrong with specifying something like:
- id: buf
  type: bitmap
  pixel-format: pixel_format::dxt1
  width: my_width
  height: my_height
  mipmaps: 6 # not sure about this one, might as well do a normal repeat-expr loop
|
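The "well-known formula" for DXT1 can be sketched in a few lines of Python: the format stores each 4x4 pixel block in 8 bytes, and every mipmap level halves both dimensions down to a minimum of 1. The function names are hypothetical, and the sketch assumes levels are listed from largest to smallest; the order in which they are actually stored depends on the container.

```python
def dxt1_size(width, height):
    """Byte size of one DXT1 image: 8 bytes per 4x4 block, partial blocks rounded up."""
    blocks_x = max(1, (width + 3) // 4)
    blocks_y = max(1, (height + 3) // 4)
    return blocks_x * blocks_y * 8


def mipmap_sizes(width, height, levels):
    """Sizes of `levels` successive mipmaps, largest first."""
    sizes = []
    for _ in range(levels):
        sizes.append(dxt1_size(width, height))
        width = max(1, width // 2)
        height = max(1, height // 2)
    return sizes


print(mipmap_sizes(64, 64, 6))   # [2048, 512, 128, 32, 8, 8]
```

Note that the two smallest levels (4x4 and 2x2) both occupy one full block, which is why a naive `width * height / 2` estimate would undercount.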
A simple implementation is certainly useful for quick exploration and reversing of an unknown format, but I think there's less value if you're trying to write a robust KSY for production usage. Just a thought that this could be something that the visualiser could do, but KSC might not need to know about. But if you do add it I would be happy to help with the C# implementation, the |
I share the same fears as @LogicAndTrick.
Kaitai Experiments
What I wanted to do is something I just call "Kaitai Experiments" (after Chrome Experiments): a showcase website containing compiled parsing code with thin wrappers that visualize some example file formats (like images, 3D models) and show how easily they can be implemented with Kaitai (of course, this code will be written for only one language, mostly JavaScript, as it can run on the web directly).
Metadata
But I agree that this meta information (width, height, palette, pixel format, etc) is important, so we should include it declaratively somehow, but I would detach it from the main functionality, maybe in some kind of plugin form (which can be available by default and come with the default Kaitai runtime / compiler). It would be a really small modification .ksy-wise, e.g.:
- id: buf
  schema:
    identifier: bitmap
    pixel-format: pixel_format::dxt1
    width: my_width
    height: my_height
    mipmaps: 6 # not sure about this one, might as well do a normal repeat-expr loop
Where we could describe these uniquely known schemas (
Forensics example
If we take the forensics example, then a PNG, for instance, would be parsed the following way:
I think this is the most low-level parsing we can achieve. After this we can map these pixels to a standard bitmap format which can be displayed. But this mapping is basically a new 'product' in the sense that currently Kaitai parses raw data, but does not convert between existing schemas (I think we've had a 'schema-mapping' proposal like this somewhere in the GH issues).
Leave buffer as is
The other option is to leave the buffer as is and then optionally create a platform-specific wrapper for this buffer, supported by the metadata in the .ksy. In this case I'd use an opaque-type or process-like solution which accepts the buffer and all required metadata as parameters and returns a language-specific Bitmap class. And the whole thing is implemented as generically as it can be: the compiler should not know anything about whether the result will be an image, audio, video, etc. We can check the .ksy for an expected schema for convenience and we can prepare the runtimes for the most frequent types (e.g. image), but other than that I would keep this as extensible as we can (so if somebody adds a new type to his/her runtime, then he/she could use it without modifying the compiler). |
I actually like the idea of detaching this stuff (i.e. bitmap-specific, audio-specific, etc) into some sort of But you're completely right: compiler is 100% ok with dealing with resulting bitmaps, audio, etc, as opaque types, very similar to our existing opaque types, but not KaitaiStruct-compatible. So we can specify:
- id: buf
  schema:
    identifier: bitmap
    pixel-format: pixel_format::rgb888
    width: my_width
    height: my_height
and that needs to generate something like:
// header
QImage* m_buf;
QImage* buf() { return m_buf; };
// _read code
m_buf = Bitmap::read(m__io, my_width(), my_height(), PixelFormat::RGB888);
// and Bitmap::read is implemented in a separate library
QImage* Bitmap::read(kaitai::kstream* io, int width, int height, PixelFormat fmt) {
    QImage::Format qtFmt;
    int pixSize;
    switch (fmt) {
    case PixelFormat::RGB888:
        qtFmt = QImage::Format_RGB888; // 3 bytes per pixel, matches pixSize
        pixSize = 3;
        break;
    // ...
    }
    int size = width * height * pixSize;
    std::string raw = io->read_bytes(size);
    // This QImage constructor does not copy the buffer, so take a deep copy
    // before `raw` goes out of scope; rows are tightly packed (no padding).
    QImage view(reinterpret_cast<const uchar*>(raw.data()), width, height, width * pixSize, qtFmt);
    return new QImage(view.copy());
}
The only problem here is that ksc must know how to pass arguments into that reader function (i.e. order of arguments and type checks) and what type is expected to return. Probably we can invent some metadata to pass along with such libraries? Something like:
meta:
  id: bitmap
  external:
    call_func: 'Bitmap::read'
    args:
      # IO is always passed as first argument implicitly
      - id: width
        type: u4
      - id: height
        type: u4
      - id: pixel_format
        type: u1
        enum: pixel_format
    return: 'QImage*'
Hmm, I think we're revisiting #51. |
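To show what "order of arguments and type checks" could mean in practice, here is a small Python sketch of how a tool might use such metadata. It is purely illustrative: `ExternalSpec`, `bind_args` and `INT_RANGES` are hypothetical names, and the real compiler has no such API. The spec lists arguments in call order, and the binder puts caller-supplied values into that order while checking that each fits its declared integer type.

```python
from dataclasses import dataclass

# Value ranges for the integer types used in the `args` list above.
INT_RANGES = {"u1": (0, 0xFF), "u4": (0, 0xFFFFFFFF)}


@dataclass(frozen=True)
class ExternalSpec:
    """Hypothetical in-memory form of the `meta: external:` block."""
    call_func: str
    args: tuple      # (id, type) pairs, in call order
    returns: str


def bind_args(spec, values):
    """Order and range-check the arguments for the external reader call."""
    missing = [name for name, _ in spec.args if name not in values]
    if missing:
        raise TypeError(f"{spec.call_func}: missing arguments {missing}")
    ordered = []
    for name, type_name in spec.args:
        low, high = INT_RANGES[type_name]
        value = values[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} does not fit {type_name}")
        ordered.append(value)
    return ordered


spec = ExternalSpec(
    call_func="Bitmap::read",
    args=(("width", "u4"), ("height", "u4"), ("pixel_format", "u1")),
    returns="QImage*",
)
print(bind_args(spec, {"pixel_format": 2, "height": 480, "width": 640}))
# [640, 480, 2]
```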
I wonder if @adamiwaniuk would be interested in this discussion. After all, these guys do a super cool visualizer that might benefit from formal declaration of imaging formats as well :) |
guide. * Removed dummy external dependencies - at least for a while, until kaitai-io/kaitai_struct#188 is finalized. * Made `essid` a raw byte array, not something with a particular encoding.
I have never seen a GitHub issue with posts this long. 😮 I was swayed by @LogicAndTrick's arguments: we want to offer end users a minimalistic parser, not a video-image-data-processor. But... multidimensional arrays do exist for several targets; Python has numpy. For targets that do not have those, merely returning a bytearray would be fine instead. This assumes a semantic where the same field can result in different types on different targets. I propose a format like:
- id: array
  type: array
  dimensions: w, h
  subtype: u1
Subtype defines the numpy dtype of array elements. Dimensions define the numpy shape of array. If the target does not support ndarrays, it just returns a bytearray of size |
maybe not |
That would be fine, numpy ndarray uses There is another problem however. You said in #315 that KS should not contain runtime-specific types (did I understand that right?). If so, pretty much all types that were listed in this post are off for those same reasons. And numpy too. |
I meant And since this must work for every language, we shouldn't define formats in terms of numpy or any other specific library API. Multidimensional arrays are quite universal: you can implement them in any language with OOP and operator overloading: C++, JS, Python. Numpy is just a Python library for more efficient storage of numbers and more convenient use of them in math. And we don't want to bind a format to numpy: this would mean that there is no sense in using Kaitai for it, since it could no longer be used with every supported language. |
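A library-independent multidimensional array really is only a few lines where indexing can be overloaded. The Python sketch below wraps a flat byte buffer of `u1` elements in a row-major 2D view; `Array2D` is a hypothetical name, and row-major layout is an assumption of this sketch rather than something the thread specifies.

```python
class Array2D:
    """Row-major 2D view over a flat buffer of one-byte (`u1`) elements."""

    def __init__(self, data, width, height):
        if len(data) != width * height:
            raise ValueError("buffer size does not match width * height")
        self._data = data
        self.width = width
        self.height = height

    def __getitem__(self, index):
        x, y = index
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError(f"({x}, {y}) is outside {self.width}x{self.height}")
        return self._data[y * self.width + x]


arr = Array2D(bytes(range(6)), width=3, height=2)
print(arr[0, 0], arr[2, 1])   # 0 5
```

A target with a native array type could return that instead, while other targets fall back to a wrapper like this one or to the plain byte buffer.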
You made a fair point. I agree that KS would be a good way to write formal public specs. Numpy has a serialization protocol, like the pickle protocol. In that sense it's not only "a library for more efficient storage of numbers" but a formal binary protocol as well, albeit only used in Python. There would be users (like me, actually) who would use Kaitai with only 1 target. The fact that a numpy type would be usable on only 1 target doesn't make it any less usable. I would discourage people from using such types when e.g. writing schemas for RFCs, but that doesn't justify denying such a type. |
Bottom line: adding runtime-dependent types (e.g. numpy) doesn't get in the way of KS being a universal language (for writing RFCs). It extends what users can do rather than limiting them. |
1 we have opaque types. But just multidimensional arrays should not be defined as numpy in a spec just because the author of spec thinks that they should be implemented as numpy. They are not numpy, they are multidimensional arrays.
Then it should be possible to describe it as a ksy spec. |
Possible but not advised. Native impl for Python is a oneliner. Schema would be somewhat complicated and somewhat slow. But then, schema would be usable on all targets. 🙂 Good point. |
I like the idea of a "schema" tag, the contents of which are assumed to be nonstandard and handled by a plugin. I'm not sure I like the image schema as it's been discussed so far -- in particular, I think it doesn't use existing ksy features enough. A pixel format is completely within the capabilities of ksy, so it seems odd to create an opaque enum of allowed pixel formats. Instead, I propose that the "big block" passed in to the schema be a typed array of things representing individual pixels.
Schema is a key attached to a type. The only defined subkey is "kind". If a kind is supported by an implementation, it will know what to do with it. If not, it should ignore the entire schema. The parser will ignore the schema entirely -- it should consider any subkey as allowed.
For the The pixel data shall appear from lowest (pre-transform) x to highest, and then from lowest to highest y. After transformation, x is horizontally left to right, y is vertically top to bottom. The following additional schema attributes are defined:
I believe this set of definitions is sufficient for any reasonable image format. It has no explicit pixel formats or palette support, because it should be sufficient to give ksy code that defines these. It is somewhat more verbose than I would like, but that could also be solved by having a less-verbose syntax for reading a value of a base type and doing arithmetic on it.
pixel_block:
  type: pixel
  repeat: expr
  repeat-expr: w*h
  schema:
    kind: pixel_data
    width: _parent.w
    height: _parent.h
pixel:
  seq:
    - id: xr
      type: b5
    - id: xg
      type: b6
    - id: xb
      type: b5
  instances:
    r:
      value: xr / 0b11111
    g:
      value: xg / 0b111111
    b:
      value: xb / 0b11111
|
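For readers who want to see the arithmetic of the `pixel` type spelled out, here is a minimal Python sketch of the same 5/6/5 split. It assumes the three bit fields are read most significant bit first from a two-byte pixel, and it uses true (floating-point) division so that each channel lands in the range 0.0 to 1.0; `decode_rgb565` is a hypothetical helper name.

```python
def decode_rgb565(pair):
    """Split two bytes into 5/6/5-bit fields (MSB first) and normalize to 0.0..1.0."""
    if len(pair) != 2:
        raise ValueError("an rgb565 pixel is exactly 2 bytes")
    value = int.from_bytes(pair, "big")
    xr = value >> 11                # top 5 bits
    xg = (value >> 5) & 0b111111    # middle 6 bits
    xb = value & 0b11111            # bottom 5 bits
    return xr / 0b11111, xg / 0b111111, xb / 0b11111


print(decode_rgb565(b"\xff\xff"))   # (1.0, 1.0, 1.0)
print(decode_rgb565(b"\xf8\x00"))   # (1.0, 0.0, 0.0)
```

If the expression language in use performs integer division on integer operands, the `value` instances above would need a floating-point divisor to give the same fractional results.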
Writing down yet another somewhat frequently requested feature: ability to mark up 2D bitmap images in some better syntax than just raw bytes (i.e. `size: ...`). Technically, it's a good idea. It's more or less obvious that all (uncompressed) bitmap images have lots of things in common:
Note that we totally do not touch a question of compression: it should be solved with `process: ...`, not here.
Draft of proposed spec
Naive proposal dictates something like that:
or, for paletted image:
This yields two new KS datatypes: `palette` and `bitmap` (actually, `palette` is more or less the same as one-dimensional `bitmap`).
Implementation
Of course, implementation is a huge question and probably it would be better done as some sort of plugin system. Things that were cited in various discussions so far:
C/C++ has SDL, which has SDL_Texture class that abstracts operations with bitmap images, such as loading a bitmap with particular width / height / pixel format / planarity settings. The most interesting thing is probably SDL_PixelFormatEnum, which has ready-made answers to many pixel format questions.
C/C++ raw OpenGL programming reads image into uncompressed raw byte array and then does something like that:
Here, `GL_RGB` and `GL_UNSIGNED_BYTE` specify pixel format / packing. Probably we could provide some better interface here as well.
C++ with Qt has QImage which allows construction of image from arbitrary `char *` specifying a Format. Nothing too fancy there, though.
Desktop Java has awt, which has Raster, which allows creation of usable bitmap image from a DataBuffer, given a pretty intricate pixel format spec / planarity using SampleModel.
Java on Android has Bitmap and Bitmap.Config which play roughly the same role. Also there is ImageFormat, although it's not very clear at first sight how it's all connected. Range of supported pixel formats is very spartan, though.
Python has even more options...
`Image.fromstring` method, which allows loading of arbitrary raw data in several pixel formats
`np.fromfile` + `reshape` of resulting array into a matrix.
Trivial fallback implementation
In case of language / graphic library being unsupported, yet, at the very least, this syntax should provide a better, more readable equivalent of parsing a byte array of required size, i.e.
instead of
Implementing parsing + new internal KS data type + fallback solution shouldn't be that hard. The only hard part is making a dictionary of pixel format encodings, but that's more or less already done for us in many libraries.
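A minimal sketch of that fallback in Python, under stated assumptions: the pixel-format dictionary here is a tiny hypothetical subset, rows are assumed to be tightly packed with no padding, and `read_bitmap_fallback` is an invented name. The point is only that the size is derived from the declared width, height and pixel format, so the stream position ends up right after the image and any trailer can still be parsed.

```python
import io

# Hypothetical subset of a pixel-format dictionary: bytes per pixel.
BYTES_PER_PIXEL = {"indexed8": 1, "rgb565": 2, "rgb888": 3, "rgba8888": 4}


def read_bitmap_fallback(stream, width, height, pixel_format):
    """Read exactly the bytes of an uncompressed, unpadded bitmap as a raw buffer."""
    size = width * height * BYTES_PER_PIXEL[pixel_format]
    raw = stream.read(size)
    if len(raw) != size:
        raise EOFError(f"expected {size} bytes of pixel data, got {len(raw)}")
    return raw


stream = io.BytesIO(bytes(range(12)) + b"trailer")
pixels = read_bitmap_fallback(stream, width=2, height=2, pixel_format="rgb888")
print(len(pixels))      # 12
print(stream.read())    # b'trailer'
```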