Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Most Blob/Buf subtypes are specced, documented, broken and unusable #1209

Open
ghost opened this issue Oct 24, 2017 · 21 comments

Comments

Projects
None yet
9 participants
@ghost
Copy link

commented Oct 24, 2017

A picture is worth a thousand words, so I put together a test file. I'm not sure if all the assumptions made within are 100% correct, but most of the answers it produces are Obviously Wrong and ought to be fixed.

Can't find where I first reported this, I think I did back in… 2013 or 2014 or so? Oh well.

@zoffixznet

This comment has been minimized.

Copy link
Contributor

commented Oct 24, 2017

(just copy-pasting the test file into the Issue, in case original gets lost)

#!/usr/bin/env perl6
use v6;
use Test;
use experimental :pack;

plan 4 × (1 + (1 + 8) + (4 × 7));

my buf8  $byte;
my buf16 $short;
my buf32 $long;
my buf64 $huge;

lives-ok { $byte  .= new(0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0xf0) };
lives-ok { $short .= new(0x12_34, 0x56_78, 0x9a_bc, 0xde_f0) };
lives-ok { $long  .= new(0x12_34_56_78, 0x9a_bc_de_f0) };
lives-ok { $huge  .= new(0x12_34_56_78_9a_bc_de_f0) };


sub test-casts(Buf $in) {
    lives-ok {
        is $in.new(my   int8 @ = $in).gist, $in.gist, "{$in.^name} roundtrips to int8 array";
        is $in.new(my  int16 @ = $in).gist, $in.gist, "{$in.^name} roundtrips to int16 array";
        is $in.new(my  int32 @ = $in).gist, $in.gist, "{$in.^name} roundtrips to int32 array";
        is $in.new(my  int64 @ = $in).gist, $in.gist, "{$in.^name} roundtrips to int64 array";
        is $in.new(my  uint8 @ = $in).gist, $in.gist, "{$in.^name} roundtrips to uint8 array";
        is $in.new(my uint16 @ = $in).gist, $in.gist, "{$in.^name} roundtrips to uint16 array";
        is $in.new(my uint32 @ = $in).gist, $in.gist, "{$in.^name} roundtrips to uint32 array";
        is $in.new(my uint64 @ = $in).gist, $in.gist, "{$in.^name} roundtrips to uint64 array";
    }, "{$in.^name} roundtrips to any native array type without dying";
}

sub test-unpack(Buf $in, Str $desc, Str $template, Int :$head, Int :$tail = $head) {
    my @quux;
    lives-ok { @quux = $in.unpack($template) }, "{$in.^name}.unpack('$template') works - $desc";
    is @quux.elems, $template.chars, 'yields correct list length';
    is @quux.head, $head, 'correct first unpacked value';
    is @quux.tail, $tail, 'correct last unpacked value';
}

for $byte, $short, $long, $huge {
    test-casts($_);
    test-unpack($_, '8-bit',                'c' x 8, head => 0x12,       tail => 0xf0      );
    test-unpack($_, '16-bit big-endian',    'n' x 4, head => 0x1234,     tail => 0xdef0    );
    test-unpack($_, '32-bit big-endian',    'N' x 2, head => 0x12345678, tail => 0x9abcdef0);
    test-unpack($_, '64-bit big-endian',    'Q',     head => 0x123456789abcdef0);
    test-unpack($_, '16-bit little-endian', 'v' x 4, head => 0x3412,     tail => 0xf0de    );
    test-unpack($_, '32-bit little-endian', 'V' x 2, head => 0x78563412, tail => 0xf0debc9a);
    test-unpack($_, '64-bit little-endian', 'q',     head => 0xf0debc9a78563412);
}
@lizmat

This comment has been minimized.

Copy link
Contributor

commented Oct 25, 2017

@jonathanstowe

This comment has been minimized.

Copy link
Contributor

commented Oct 25, 2017

I'd rather if the pack/unpack weren't deprecated with immediate effect as https://github.com/retupmoca/P6-Net-AMQP depends on them rather heavily ( as does the yet to be released MIDI module,) and I'd like to be able to test using PackUnpack (and remedy whichever needs fixing,) first.

But yeah, in principle I agree :)

@ghost

This comment has been minimized.

Copy link
Author

commented Oct 25, 2017

I had another question on how the endianness of Bufs is determined, but it seems like it's been made impossible to pass them to IO which previously worked.

I don't have any code that depends on this, but it's because I've got burned out on and thrown away more projects than I can remember, all because Perl 6 can't speak a common language with the rest of the world. I know I'm not the only one. Please don't make us wait another four years.

@jnthn

This comment has been minimized.

Copy link
Member

commented Oct 25, 2017

We really need some primitives in core for efficiently reading binary values (integers of various endianness, floats, etc.) out of a Blob/Buf, as well as writing them into a Buf. I'm sure the pack/unpack API is not the one I'd want to see there as it's too high-level (but. of course, it can live on in module space, done in terms of whatever primitives we provide).

These kinds of things surely are great candidates for JITting, so we'd want a design that pushes them down to the VM level. That in turn would mean they're available in NQP, which would be rather useful for replacing the current QAST -> MoarVM bytecode phase with something better (at the moment it makes a load of objects and then hands them off to C code that makes a load of unsafe layout assumptions about those objects in order to generate the bunch of bytes; it'd be better to just generate the bytes, saving time and memory).

@samcv

This comment has been minimized.

Copy link
Member

commented Oct 25, 2017

I'm against removing pack/unpack unless we have something to replace it. We really have a lack of ways to work with Buf in general. When I worked with Buf's like @flussence I was frustrated in that there are plenty of operators to do many other things in Perl 6, but working with Buf's is not one of them.

Plenty of other languages can pack/unpack such as Python, Ruby, and Perl 5 can do this so I am against removing pack/unpack.

@jnthn

This comment has been minimized.

Copy link
Member

commented Oct 25, 2017

I'm against removing pack/unpack unless we have something to replace it.

Yes; to be clear, my comment that these aren't where I want us to end up (at least, not at a primitives level) wasn't intended to suggest we should deprecate, let alone remove, them now. We'll need a migration period, and we can't have that until we have something to migrate to.

@niner

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2017

We already have support for structs in NativeCall. And structs are all about mixed data layed out in memory. The actual layout is different to what we'd see in binary storage formats, but I like the interface.

So what about adding some possibility for associating a CStruct like class (just without implicit padding) with a part of a blob at some offset? Instead of inventing yet another sublanguage like with pack, we'd reuse existing syntax and one can use it just like any other Perl 6 object.

So to read an ELF file (executable format on Linux), you'd associate an ELF-Header object with the blob at offset 0. The object starts with { has CArray[int8] $.e_ident_mag is length(4); has $int8 $.e_ident_class; ... }. If $.e_ident_mag.decode('ASCII') eq "\x[7F]ELF" it's an ELF file. The e_phoff field in this object contains the offset of the program header table. So that's where you attach an ELF-Program-Header object and so on.

Underneath this would probably use the primitives jnthn++ described. The declarative properties of the user facing API should give us a lot of freedom in designing the internals and in optimizing the running code.

@ab5tract

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2017

@niner

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2017

Actually with a new repr for packed structures, we shouldn't even need new ops in the VM. We do use the same plain old data types and the repr already tells the VM how the object is layed out in memory. We'd really just need a CStruct repr without the alignment.

Or we add a way to make alignment of CStructs optional. AFAICT locations of attributes are calculated when the class is composed (compute_allocation_strategy called by compose) so this wouldn't even add any branches to runtime code like get_attribute and bind_attribute. Associating the object with the blob would just mean to let the CStruct's cstruct pointer point at the offset in VMArray's slots array.

@samcv

This comment has been minimized.

Copy link
Member

commented Oct 28, 2017

@flussence the test-casts() function no longer fails when I make sure it doesn't test types which can't fit into each other (trying to fit a larger bit size into a smaller one).

You seem to be testing a 'c' directive which does not exist. Our directives work on quantities of items. So if you use buf8, then you would use A directive to extract a char.

I think a misconception about endianess you may have, is that these are arrays of specific uint sizes. So if you tried to process buf16 as big endian, it is processing each uint16 bit number in big endian order, not the specific bytes. I'd be glad to try and solve problems you are having but let me know what you need to do that you have been unable to do so I can propose a solution. Thanks!

@bbkr

This comment has been minimized.

Copy link

commented May 24, 2018

I'd love to see new API for packing/unpacking buffers. I'm working now on GeoIP2 interface and I made a benchmark to find best way to decode Buf-> uint which is used extensively in their file format: bbkr/GeoIP2#15. It shows that current unpack is miserably slow with 9K/s while nativecast version reaches 269K/s. Huge space for improvement.

Idea: When refactoring please short-circuit unpack with IO::*, for example:

my $some_unpacked_value = $handle.unpack(uint32);

This way read and unpack are in the same method, without intermediate Buf object creation.
And without need to specify how many bytes to read because unpacked type determines it.

@lizmat

This comment has been minimized.

Copy link
Contributor

commented May 24, 2018

@bbkr

This comment has been minimized.

Copy link

commented May 25, 2018

I've tried P5pack and performance is still to low to be usable. In my benchmark it gets 9K/s uint32 unpacks, very similar to core unpack result. That's nowhere near 269K/s of nativecast version. And Perl 5 does... 8100K/s in the same test.

I doubt that separate module will ever be able to provide satisfying pack/unpack performance without low level magic. For me pack/unpack is the very crucial feature of language designed for data parsing and processing. Please keep it in core and make it uber-fast :)

@jnthn

This comment has been minimized.

Copy link
Member

commented May 25, 2018

fwiw, I've started drafting up an nqp::op API for dealing with binary data efficiently (though am too busy for the next couple of weeks to work much on it). Over the summer, I'm hoping to get it implemented and then (hopefully with help :-)) replace the current Rakudo code-gen phase (build a huge tree of nodes, hand it to C code to assemble the bytecode) with just generating the bytes directly from NQP code. That should reduce memory consumption a good bit, and hopefully be measurably faster. The work that results from that will also be able to back some far more optimal Perl 6 binary data handling API too.

@bbkr

This comment has been minimized.

Copy link

commented May 25, 2018

Awesome!

BTW: Have you considered changing this 'UqL*v' P5 patterns to something more readable like for example qp:little-endian/ uint32 Str[16] uint32 ** 5 /? This pack/unpack mini-language deserves refactoring.

@jnthn

This comment has been minimized.

Copy link
Member

commented May 25, 2018

I'd rather look at the problem space afresh rather than start from the P5 pack/unpack approach.

@niner

This comment has been minimized.

Copy link
Contributor

commented May 26, 2018

jnthn: what do you think about my packed CStruct based suggestion above?

@lizmat

This comment has been minimized.

Copy link
Contributor

commented May 26, 2018

@MasterDuke17

This comment has been minimized.

Copy link
Contributor

commented Mar 22, 2019

@nine++ recently implemented (most? all? of) @jnthn++'s binary nqp::op API. Maybe now is a good time to revisit some of these ideas.

@jnthn

This comment has been minimized.

Copy link
Member

commented Mar 22, 2019

Not only did @niner++ implement most of that, but also got them JITted, and re-implemented the code-gen layer of the compiler using it. Granted it's a bit apples and oranges since we eliminated a pass along the way, but still, the new binary ops are fast enough that we didn't lose any performance when we switched to the new compiler backend using them. They're also exposed on Buf, so available in Rakudo now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.