Skip to content

scott-griffiths/bitformat

Repository files navigation

⚠️ This project is currently in the planning stage. The documentation is only partially accurate and there are lots of bugs and missing features!

bitformat

CI badge Docs


bitformat is a Python module for creating and parsing file formats, especially at the bit rather than byte level.

It is intended to complement the bitstring module from the same author, and uses its Dtype, Bits and Array classes as the basis for building complex bit formats.


Features

  • A bitformat is a specification of a binary format using fields that can say how to build it from supplied values, or how to parse binary data to retrieve those values.
  • A wide array of data types is supported. Want to use a 13 bit integer or an 8-bit float? Fine - there are no special hoops to jump through.
  • Several field types are available:
    • The simplest is just a Field which contains a single data type, and either a single value or an array of values. These can usually be constructed from just a string.
    • A Format contains a list of other fields. These can be nested to any depth.
    • Fields like Repeat, Find and Condition can be used to add more logical structure.
  • The values of other fields can be used in later calculations via an f-string-like expression syntax.
  • Data is always stored efficiently as a contiguous array of bits.

An Example

A quick example: the MPEG-2 video standard specifies a 'sequence_header' that could be defined in bitformat by

seq_header = Format(['sequence_header_code: hex32 = 0x000001b3',
                     'horizontal_size_value: u12',
                     'vertical_size_value: u12',
                     'aspect_ratio_information: u4',
                     'frame_rate_code: u4',
                     'bit_rate_value: u18',
                     'marker_bit: bool',
                     'vbv_buffer_size_value: u10',
                     'constrained_parameters_flag: bool',
                     'load_intra_quantizer_matrix: bool',
                     Repeat('{load_intra_quantizer_matrix}',
                         'intra_quantizer_matrix: [u8; 64]'),
                     'load_non_intra_quantizer_matrix bool',
                     Repeat('{load_non_intra_quantizer_matrix}',
                         'non_intra_quantizer_matrix: [u8; 64]'),
                     Find('0x000001')
                     ], 'sequence_header')

To parse such a header you can write simply

seq_header.parse(some_bytes_object)

then you can access and modify the field values

seq_header['bit_rate_value'].value *= 2

before rebuilding the binary object

b = seq_header.build()

Copyright (c) 2024 Scott Griffiths

About

A Python module for creating and parsing file formats, especially at the bit rather than byte level.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Languages