Skip to content

Perl 5 Module for Bifcode serialization format

Notifications You must be signed in to change notification settings

mlawren/p5-Bifcode

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NAME
    Bifcode2 - encode and decode Bifcode2 serialization format

VERSION
    2.0.0_15 (yyyy-mm-dd)

SYNOPSIS
        use utf8;
        use boolean;
        use Bifcode2 qw( encode_bifcode2 decode_bifcode2 );

        my $bifcode = encode_bifcode2 {
            bools   => [ boolean::false, boolean::true, ],
            bytes   => \pack( 's<',       255 ),
            integer => 25,
            real    => 1.25e-5,
            null    => undef,
            utf8    => "Ελύτη",
        }

        # 7b 75 35 2e 62 6f 6f 6c 73 3a 5b 66 2c 74 2c    {u5.bools:[f,t,
        # 5d 75 35 2e 62 79 74 65 73 3a 62 32 2e ff  0    ]u5.bytes:b2...
        # 2c 75 37 2e 69 6e 74 65 67 65 72 3a 69 32 35    ,u7.integer:i25
        # 2c 75 34 2e 6e 75 6c 6c 3a 7e 2c 75 34 2e 72    ,u4.null:~,u4.r
        # 65 61 6c 3a 72 31 2e 32 35 65 2d 35 2c 75 34    eal:r1.25e-5,u4
        # 2e 75 74 66 38 3a 75 31 30 2e ce 95 ce bb cf    .utf8:u10......
        # 8d cf 84 ce b7 2c 7d                            .....,}

        my $decoded = decode_bifcode2 $bifcode;

DESCRIPTION
    Bifcode2 implements the *Bifcode2* serialisation format, a mixed
    binary/text encoding with support for the following data types:

    *   Primitive:

        *   Undefined(null)

        *   Booleans(true/false)

        *   Not a Number (NaN)

        *   Integer numbers

        *   Real numbers

        *   +/- Infinity

        *   UTF8 strings

        *   Binary strings

    *   Structured:

        *   Arrays(lists)

        *   Hashes(dictionaries)

    The encoding is simple to construct and relatively easy to parse.
    There is no need to escape special characters in strings. It is not
    considered human readable, but as it is mostly text it can usually
    be visually debugged.

        +---------+--------------------+--------------------+
        | Type    | Perl               | Bifcode2           |
        +---------+--------------------+--------------------+
        | UNDEF   | undef              | ~,                 |
        | TRUE    | boolean::true      | t,                 |
        | FALSE   | boolean::false     | f,                 |
        | NAN     | use bignum;  NaN() | N,                 |
        | INTEGER | -1                 | i-1,               |
        | INTEGER | 0                  | i0,                |
        | INTEGER | 1                  | i1,                |
        | REAL    | 3.1415             | r3.1415e0,         |
        | REAL    | 1.380649e-23       | r1.380649e-23,     |
        | INF     | use bignum;  inf() | +,                 |
        | NEGINF  | use bignum; -inf() | -,                 |
        | BYTES   | $TWO_BYTE_STR      | b2.��,             |
        | UTF8    | 'Plain ASCII'      | u11.Plain ASCII,   |
        | UTF8    | 'MIXΣD ƬΣXƬ'       | u14.MIXΣD ƬΣXƬ,    |
        | ARRAY   | [ 'one', 'two' ]   | [u3.one,u3.two,]   |
        | DICT    | { key => 'value'}  | {u3.key:u5.value,} |
        | BIFCODE | $BIFCODE           | B8.$BIFCODE,       |
        +---------+--------------------+--------------------+

    *Bifcode2* can only be constructed canonically; i.e. there is only
    one possible encoding per data structure. This property makes it
    suitable for comparing structures (using cryptographic hashes)
    across networks.

    In terms of size the encoding is similar to minified JSON. In terms
    of speed this module compares well with other pure Perl encoding
    modules with the same features.

MOTIVATION
    *Bifcode* was created for a project because none of currently
    available serialization formats (Bencode, JSON, MsgPack, Netstrings,
    Sereal, YAML, etc) met the requirements of:

    *   Support for undef

    *   Support for binary data

    *   Support for UTF8 strings

    *   Universally-recognized canonical form for hashing

    *   Trivial to construct on the fly from SQLite triggers

    I have no lofty goals or intentions to promote this outside of my
    specific case, but would appreciate hearing about other uses or
    implementations.

SPECIFICATION
    The encoding is defined as follows:

  BIFCODE_UNDEF
    A null or undefined value correspond to "~,".

  BIFCODE_TRUE and BIFCODE_FALSE
    Boolean values are represented by "t," and "f,".

  BIFCODE_NAN
    Not a number (NaN) is represented by "N,".

  BIFCODE_INF and BIFCODE_NEGINF
    Positive and negative infinity represented by "+," and "-,".

  BIFCODE_UTF8
    A UTF8 string is "u" followed by the octet length of the encoded
    string as a base ten number followed by a "." and the encoded string
    followed by ",". For example the Perl string "\x{df}" (ß)
    corresponds to "u2.\x{c3}\x{9f},".

  BIFCODE_BYTES
    Opaque data is 'b' followed by the octet length of the data as a
    base ten number followed by a "." and then the data itself followed
    by ",". For example a three-byte blob 'xyz' corresponds to
    'b3.xyz,'.

  BIFCODE_INTEGER
    Integers are represented by an 'i' followed by the number in base 10
    followed by a ','. For example 'i3,' corresponds to 3 and 'i-3,'
    corresponds to -3. Integers have no size limitation. 'i-0,' is
    invalid. All encodings with a leading zero, such as 'i03,', are
    invalid, other than 'i0,', which of course corresponds to 0.

  BIFCODE_REAL
    Real numbers are represented by an 'r' followed by a decimal number
    in base 10 followed by a 'e' followed by an exponent followed by a
    ','. For example 'r3.0e-1,' corresponds to 0.3 and 'r-0.1e0,'
    corresponds to -0.1. Reals have no size limitation. 'r-0.0e0,' is
    invalid. All encodings with an extraneous leading zero, such as
    'r03.0e0,', or an extraneous trailing zero, such as 'r3.10e0,', are
    invalid.

  BIFCODE_LIST
    Lists are encoded as a '[' followed by their elements (also
    *Bifcode2* encoded) followed by a ']'. For example
    '[u4.spam,u4.eggs,]' corresponds to ['spam', 'eggs'].

  BIFCODE_DICT
    Dictionaries are encoded as a '{' followed by a list of alternating
    keys and their corresponding values followed by a '}'. Keys must be
    of type BIFCODE_UTF8 or BIFCODE_BYTES and are encoded with a ":" as
    the last character instead of ",".

    For example, '{u3.cow:u3.moo,u4.spam:u4.eggs,}' corresponds to
    {'cow': 'moo', 'spam': 'eggs'} and '{u4.spam:[u1.a,u1.b,]}'
    corresponds to {'spam'. ['a', 'b']}. Keys must appear in sorted
    order (sorted as raw strings, not alphanumerics).

  BIFCODE_BIFCODE
    A Bifcode string is "B" followed by the octet length of the encoded
    string as a base ten number followed by a "." and the encoded string
    followed by ",". This is typically used to frame Bifcode structures
    over a network.

INTERFACE
  "encode_bifcode2( $datastructure [, $enclose ] )"
    The first argument (required) may be a scalar, or may be a reference
    to either a scalar, an array, or a hash. Arrays and hashes may in
    turn contain values of these same types. Returns the appropriate
    BIFCODE_* byte string. If $enclose is true then the result is
    further encoded as BIFCODE_BIFCODE.

    The mapping from Perl to *Bifcode2* is as follows:

    *   'undef' maps directly to BIFCODE_UNDEF.

    *   The "true" and "false" values from the boolean distribution
        encode to BIFCODE_TRUE and BIFCODE_FALSE.

    *   A plain scalar is treated as follows:

        *   BIFCODE_UTF8 if "utf8::is_utf8" returns true; or

        *   BIFCODE_INTEGER if it looks like a canonically represented
            integer; or

        *   BIFCODE_REAL if it looks like a real number; or

        *   BIFCODE_UTF8 if it only contains ASCII characters; or

        *   BIFCODE_BYTES when none of the above applies.

        You can force scalars to be encoded a particular way by passing
        a reference to them blessed as Bifcode2::BYTES,
        Bifcode2::INTEGER, Bifcode2::REAL or Bifcode2::UTF8. The
        "force_bifcode2" function below can help with creating such
        references.

    *   SCALAR references become BIFCODE_BYTES.

    *   ARRAY references become BIFCODE_LIST.

    *   HASH references become BIFCODE_DICT.

    Integers and floats under "bignum" scope are handled transparently,
    but do not always produce the same encoding you get without:

        encode_bifcode(100.2); # r100.2e0
        {
            use bignum;
            encode_bifcode(100.2); # r1.002e2
        }

    The reason is that "reading" Math::BigFloat forces conversion to a
    standardized format (e.g. scientific, engineering, etc).

    This subroutine croaks on unhandled data types.

  "decode_bifcode2( $string [, $max_depth ] )"
    Takes a byte string and returns the corresponding deserialised data
    structure.

    If you pass an integer for the second option, it will croak when
    attempting to parse dictionaries nested deeper than this level, to
    prevent DoS attacks using maliciously crafted input.

    *Bifcode2* types are mapped back to Perl in the reverse way to the
    "encode_bifcode2" function, except for:

    *   Any scalars which were "forced" to a particular type (using
        blessed references) will decode as plain scalars.

    *   BIFCODE_BIFCODE types are fully inflated into Perl structures,
        and not the intermediate *Bifcode2* byte string.

    *   Large numbers encoded under "bignum" or similar scope are not
        currently detected and get converted back to floats (for
        originally large integers) or have less precision (for
        originally large floats).

    Croaks on malformed data.

  "force_bifcode2( $scalar, $type )"
    Returns a reference to $scalar blessed as Bifcode::$TYPE. The value
    of $type is not checked, but the "encode_bifcode2" function will
    only accept the resulting reference where $type is one of 'bytes',
    'real', 'integer' or 'utf8'.

  "diff_bifcode2( $bc1, $bc2, [$diff_args] )"
    Returns a string representing the difference between two bifcodes.
    The inputs do not need to be valid Bifcode; they are only expanded
    with a very simple regex before the diff is done. The third argument
    ($diff_args) is passed directly to Text::Diff.

    Croaks if Text::Diff is not installed.

  AnyEvent::Handle Support
    Bifcode2 implements the AnyEvent::Handle "anyevent_read_type" and
    "anyevent_write_type" functions which allow you to do this:

        $handle->push_write( 'Bifcode2' => { your => 'structure here' } );

        $handle->push_read(
            'Bifcode2' => sub {
                my ( $hdl, $ref ) = @_;
                # do stuff with $ref
            },
            $maxdepth   # passed straight to decode_bifcode2()
        );

DIAGNOSTICS
    The following exceptions may be raised by Bifcode2:

    Bifcode2::Error::Decode
        Your data is malformed in a non-identifiable way.

    Bifcode2::Error::DecodeBytes
        Your data contains a byte string with an invalid length.

    Bifcode2::Error::DecodeBytesTrunc
        Your data includes a byte string declared to be longer than the
        available data.

    Bifcode2::Error::DecodeBytesTerm
        Your data includes a byte string that is missing a ","
        terminator.

    Bifcode2::Error::DecodeDepth
        Your data contains dicts or lists that are nested deeper than
        the $max_depth passed to "decode_bifcode2()".

    Bifcode2::Error::DecodeTrunc
        Your data is truncated.

    Bifcode2::Error::DecodeReal
        Your data contained something that was supposed to be a real but
        didn't make sense.

    Bifcode2::Error::DecodeRealTrunc
        Your data contains a real that is truncated.

    Bifcode2::Error::DecodeInteger
        Your data contained something that was supposed to be an integer
        but didn't make sense.

    Bifcode2::Error::DecodeIntegerTrunc
        Your data contains an integer that is truncated.

    Bifcode2::Error::DecodeKeyType
        Your data violates the *Bifcode2* format constaint that all dict
        keys be BIFCODE_BYTES or BIFCODE_UTF8.

    Bifcode2::Error::DecodeKeyDuplicate
        Your data violates the *Bifcode2* format constaint that all dict
        keys must be unique.

    Bifcode2::Error::DecodeKeyOrder
        Your data violates the *Bifcode2* format constaint that dict
        keys must appear in lexical sort order.

    Bifcode2::Error::DecodeKeyValue
        Your data contains a dictionary with an odd number of elements.

    Bifcode2::Error::DecodeTrailing
        Your data does not end after the first *Bifcode2*-serialised
        item.

    Bifcode2::Error::DecodeUTF8
        Your data contained a UTF8 string with an invalid length.

    Bifcode2::Error::DecodeUTF8Trunc
        Your data includes a string declared to be longer than the
        available data.

    Bifcode2::Error::DecodeUTF8Term
        Your data includes a UTF8 string that is missing a ","
        terminator.

    Bifcode2::Error::DecodeUsage
        You called "decode_bifcode2()" with invalid arguments.

    Bifcode2::Error::DiffUsage
        You called "diff_bifcode2()" with invalid arguments.

    Bifcode2::Error::EncodeBytesUndef
        You attempted to encode "undef" as a byte string.

    Bifcode2::Error::EncodeReal
        You attempted to encode something as a real that isn't
        recognised as one.

    Bifcode2::Error::EncodeRealUndef
        You attempted to encode "undef" as a real.

    Bifcode2::Error::EncodeInteger
        You attempted to encode something as an integer that isn't
        recognised as one.

    Bifcode2::Error::EncodeIntegerUndef
        You attempted to encode "undef" as an integer.

    Bifcode2::Error::EncodeUTF8Undef
        You attempted to encode "undef" as a UTF8 string.

    Bifcode2::Error::EncodeUnhandled
        You are trying to serialise a data structure that contains a
        data type not supported by the *Bifcode2* format.

    Bifcode2::Error::EncodeUsage
        You called "encode_bifcode2()" with invalid arguments.

    Bifcode2::Error::ForceUsage
        You called "force_bifcode2()" with invalid arguments.

BUGS AND LIMITATIONS
    Strings and numbers are practically indistinguishable in Perl, so
    "encode_bifcode2()" has to resort to a heuristic to decide how to
    serialise a scalar. This cannot be fixed.

SEE ALSO
    This distribution includes the diff-bifcode2 command-line utility
    for comparing *Bifcode2* in files.

    Bifcode implements the original (experimental) encoding.

AUTHOR
    Mark Lawrence <nomad@null.net>, heavily based on Bencode by
    Aristotle Pagaltzis <pagaltzis@gmx.de>

COPYRIGHT AND LICENSE
    This software is copyright (c):

    *   2015 by Aristotle Pagaltzis

    *   2017-2022 by Mark Lawrence.

    This is free software; you can redistribute it and/or modify it
    under the same terms as the Perl 5 programming language system
    itself.

About

Perl 5 Module for Bifcode serialization format

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages