CBOR codec extension for PHP
This extension makes it possible to encode/decode CBOR data defined in RFC 8949 on PHP.
This extension itself is not a PHP instance serializer as much as CBOR data format itself is not.
BSD 2-Clause License
phpize
./configure --enable-cbor
make install
See Releases for the Windows binaries.
function cbor_encode(
mixed $value,
int $flags = CBOR_BYTE | CBOR_KEY_BYTE,
?array $options = null,
): string;
function cbor_decode(
string $data,
int $flags = CBOR_BYTE | CBOR_KEY_BYTE,
?array $options = null,
): mixed;
Encodes to or decodes from a CBOR data item.
try {
echo bin2hex(cbor_encode(['binary', 1, 2, null])), "\n";
// 844662696e6172790102f6
var_export(cbor_decode(hex2bin('844662696e6172790102f6')));
// array (
// 0 => 'binary',
// 1 => 1,
// 2 => 2,
// 3 => NULL,
// )
} catch (Cbor\Exception $e) {
echo $e->getMessage(), "\n";
}
When decoding, the CBOR data item must be a single item, or the function will throw an exception with code CBOR_ERROR_EXTRANEOUS_DATA
.
This means this function cannot decode the CBOR sequences format defined in RFC 8742.
See Decoder
class for sequences and progressive decoding.
$options
array elements are:
-
'max_depth'
(default:64
; range:0
..10000
) Maximum number of nesting levels to process. To handle arrays/maps/tags, at least 1 depth is required. -
'max_size'
: (default:65536
; range:0
..0xffffffff
) Decode: Maximum number of elements for array and map to be processed.Depending on the actual limit set by PHP, the value may be lowered.
-
'offset'
(default:0
; range:0
..PHP_INT_MAX
) Decode: Offset of the data to start decoding from. Offset cannot go beyond the length of the data. -
'length'
(default:null
; range:null
|0
..PHP_INT_MAX
) Decode: Length of the data to decode from the offset.null
means the whole remaining data. Length cannot go beyond the available length of the data.Although you cannot decode an empty string,
0
is valid as an option value.
See "Supported Tags" below for the following options:
-
'datetime'
,'bignum'
,'decimal'
:- Encode: default:
true
; values:bool
- Encode: default:
-
'string_ref'
:- Encode: default:
false
; values:bool
|'explicit'
- Decode: default:
true
; values:bool
- Encode: default:
-
'shared_ref'
:- Encode: default:
false
; values:bool
|'unsafe_ref'
- Decode: default:
false
; values:bool
|'shareable'
|'shareable_only'
|'unsafe_ref'
- Encode: default:
Unknown key names are silently ignored.
You can use CBOR_CDE
encoding flag to let encode data satisfy core deterministic encoding requirements.
CBOR types can have multiple binary sequences to express the same value, but they are always encoded in the smallest possible form (well-formed) regardless of the flag.
The exception is float
value (see below).
The CBOR_CDE
flag enforces the CBOR_MAP_NO_DUP_KEY
, CBOR_FLOAT16
and CBOR_FLOAT32
flags, while the CBOR_UNSAFE_TEXT
flag can be used regardless.
The flag cannot be used with options that make CBOR data contextual, which are mentioned later.
With the decoding flag CBOR_EDN
, CBOR data (in binary) is decoded to Extended Diagnostic Notation (EDN) string
defined in RFC 8610 appendix G. (It does not decode EDN string.)
This can be used to inspect CBOR data if something is wrong with it.
var_dump(cbor_decode(hex2bin('83010243030405'), CBOR_EDN)); // string(17) "[1, 2, h'030405']"
$flags
except CBOR_SELF_DESCRIBE
and $options
for decoding are ignored for this mode.
Formatting $options
can be specified:
-
'indent'
(default:false
; values:false
|0
..16
|"\t"
) Number of space characters for indentation.false
for no pretty-printing (one-line)."\t"
to use a single tab character. -
'space'
(default:true
; values:bool
) Whether to insert a space character after separators to improve readability. -
'byte_space'
(default:0
; values:0
..63
) Add space for every 1/2/4/8/16/32 bytes ofbyte string
. For example, if5
(i.e.1 | 4
) is specified,h'112233445566...'
will beh'11 22 33 44 55 66 ...'
. -
'byte_wrap'
(default:false
; values:false
|1
..1024
) Break downbyte string
into multipleh'...'
notation for every specified length.
Note that when CBOR_EDN
is specified, the function will not throw an exception for data error (such as truncated data). It will instead print an error in the returned string as a comment.
Invalid UTF-8 sequences are represented as byte string fragments inside text string, along with consecutive ASCII control characters.
When encoding classes that implement Cbor\Serializable
, the encoder will call cborSerialize()
.
Implementers may return data structure to serialize the instance, or throw an Exception to stop serializing.
Although some classes such as Traversable
and PSR-7 UriInterface
are serializable by default as described below, Serializable
can take precedence.
stdClass
plain object is serialized to map
by default.
When the encoder encounters a Cbor\EncodeParams
instance, it encodes the $value
instance variable with the specified $params
flags and options added to the current flags and options. After encoding the inner $value
, those parameters are back to the previous state.
This can be useful when you want to enforce specific parameters partially.
$params
array elements are:
-
'flags' => int
: Encoding flags to set. -
'flags_clear' => int
: Encoding flags to clear.Flags in
'flags_clear'
are cleared first then flags in'flags'
are set. Note that you don't need to clear conflicting string flags, i.e.CBOR_TEXT
is cleared when settingCBOR_BYTE
and vice versa. The same applies forCBOR_KEY_*
string flags.You cannot clear
CBOR_CDE
flag. -
Other
$options
values for encoding.You cannot change
'max_depth'
or options that make CBOR data contextual.
Unknown or unsupported key names are silently ignored.
The class Cbor\Decoder
can do what cbor_decode()
does in a more controlled way.
Instantiate Decoder
with the optional $flags
and $options
, then feed CBOR data with add()
method. Decoder
will append passed data to the internal buffer.
The 'offset'
and 'length'
options have no effects for this class. You may specify those as parameters of add()
instead.
The parameters act like those of substr()
PHP function.
To process data in the buffer, call process()
. It will return true
if a data item is decoded. You can test it with hasValue()
method too. Call getValue()
to retrieve the decoded value.
If another data item follows (CBOR sequences), call add()
and/or process()
again.
process()
will not return ture
until a complete item is decoded. Until that, isPartial()
returns true
and you need to feed more data to complete the decode.
getBuffer()
returns a copy of the internal buffer. Note that it doesn't contain data that is partially consumed by process()
.
Progressive decoding
With the class, large data can be decoded progressively without loading the whole data in memory at once:
if (!($fp = fopen($filePath, 'rb'))) {
throw new RuntimeException('Cannot open file');
}
$decoder = new Cbor\Decoder();
while (!feof($fp)) {
$data = fread($fp, 32768);
if ($data === false) {
throw new RuntimeException('Cannot read file');
}
$decoder->add($data);
if ($decoder->process()) {
break;
}
}
if (!feof($fp) || $decoder->getBuffer() !== '') {
throw new RuntimeException('Extraneous data.');
}
if (!$decoder->hasValue() || $decoder->isPartial()) {
throw new RuntimeException('Data is truncated.');
}
fclose($fp);
$value = $decoder->getValue();
var_dump($value);
Decode CBOR sequences
Decoder
can decode CBOR sequences which cbor_decode()
complains:
function cbor_decode_seq(string $data, int $flags = CBOR_BYTE | CBOR_KEY_BYTE, ?array $options = null): Generator
{
$decoder = new Cbor\Decoder($flags, $options);
$decoder->add($data);
while ($decoder->process()) {
yield $decoder->getValue();
}
if ($decoder->isPartial()) {
throw new Cbor\Exception('Data is truncated.', CBOR_ERROR_TRUNCATED_DATA);
}
}
CBOR unsigned integer
and negative integer
are translated to PHP int
.
The value must be within the range PHP can handle (PHP_INT_MIN
..PHP_INT_MAX
).
This is -2**63..2**63-1 on 64-bit PHP, which is narrower than CBOR's -2**64..2**64-1.
If decoding data contains out-of-range value, an exception is thrown.
CBOR float
has three sizes. 64-bit values are translated to PHP float
.
32-bit values and 16-bit values are decoded to PHP Cbor\Float32
and Cbor\Float16
respectively (derived from Cbor\FloatX
).
But if the flags CBOR_FLOAT32
and/or CBOR_FLOAT16
are passed, they are decoded to PHP float
.
When encoding PHP float
, values are stored with 64-bit value.
If either of the flags is specified, values are stored in that size.
If both flags are set, the smallest possible type is used; therefore no informational loss is expected.
For Cbor\Float32
type, the size of the value is retained even if it can be expressed in half-precision type. The CBOR_CDE
flag overrides this behavior to enforce the requirements.
CBOR has two types of strings: byte string
(binary octets) and text string
(UTF-8 encoded octets).
PHP string
type does not have this distinction.
If you specify the CBOR_BYTE
flag (default) and/or the CBOR_TEXT
flag on decoding, those strings are decoded to PHP string
. If the flags are not specified, strings are decoded to Cbor\Byte
and Cbor\Text
objects respectively (derived from Cbor\XString
).
On encoding PHP string
, you must specify either of the flags so that the extension can encode strings to CBOR strings. The default is CBOR_BYTE
.
The flags CBOR_KEY_BYTE
and CBOR_KEY_TEXT
are for strings of CBOR map
keys.
If text string
is not a valid UTF-8 sequence, an exception is thrown unless you pass CBOR_UNSAFE_TEXT
flag.
CBOR array
is translated to PHP array
.
If PHP array
has holes or string
keys (i.e. the array is not a "list"), it is encoded to CBOR map
.
CBOR map
is translated to PHP stdClass
object.
If the CBOR_MAP_AS_ARRAY
flag is passed when decoding, it is translated to PHP array
instead.
Keys must be of CBOR string
type.
The extension may accept CBOR integer
keys if the CBOR_INT_KEY
flag is passed. Likewise with the flag, it will encode PHP int
key (including integer numeric string
keys in the range of CBOR integer
) as CBOR integer
key.
If the CBOR_MAP_NO_DUP_KEY
flag is specified on decoding, encountering a duplicated key will throw an exception instead of overriding the former value. This may happen on valid CBOR map
; e.g. all of unsigned integer 1
, text string "1"
, and byte string '1'
may be the same key for PHP.
If the CBOR_CDE
flag is specified on encoding, keys are sorted in the bytewise lexicographic order.
CBOR tag
is translated to PHP Cbor\Tag(int $tag, mixed $content)
object.
Tag is a marker to mark data (including another tag) as some type using an unsigned integer
.
You can consult CBOR tag registry for valid tags.
Tags consume one 'max_level'
for each nesting.
Also see "Supported Tags" below.
CBOR has null
and undefined
value, but PHP does not have undefined
value.
CBOR undefined
is translated to PHP Cbor\Undefined
singleton object.
This object is evaluated as false
in a boolean context. It is not equal to null
.
$undefined = Cbor\Undefined::get();
var_dump($undefined === clone $undefined); // true
Instance of a class that implements Traversable
is encoded to map
.
If the class does not implement Countable
, the instance is encoded to an indefinite-length map
unless the CBOR_CDE
flag is specified.
The CBOR_INT_KEY
flag does not take effect on encoding Traversable
objects, and the key is encoded according to the actual type.
Note: In this section, tag names are written as {tag-name} for clarity.
Option:
'datetime'
:- Encode: default:
true
; values:bool
- Encode: default:
Constant:
Cbor\Tag::DATETIME
If the option is enabled, an instance of DateTimeInterface
is encoded as a text string
with a {date/time} tag.
Since the tag only stores datetime with timezone offset, regional information will be lost.
Option:
'bignum'
:- Encode: default:
true
; values:bool
- Encode: default:
Constants:
Cbor\Tag::BIGNUM_U
Cbor\Tag::BIGNUM_N
If the option is enabled, an instance of GMP
is encoded as a byte string
with {bignum} tag.
Note: If the value is within CBOR integer
range, it is encoded as an integer
. (preferred serialization)
Option:
'decimal'
:- Encode: default:
true
; values:bool
- Encode: default:
Constant:
Cbor\Tag::DECIMAL
If the option is enabled, an instance of Decimal\Decimal
is encoded as an array
of integer mantissa and exponent with {decimal} tag.
Although the precision is retained, the maximum precision specified on instance creation is lost.
Option:
'uri'
:- Encode: default:
true
; values:bool
- Encode: default:
Constant:
Cbor\Tag::URI
If the option is enabled, an instance of class that implements PSR-7 UriInterface
is encoded as a text string
with {uri} tag.
Flag:
CBOR_SELF_DESCRIBE
- default:
false
- default:
Constants:
Cbor\Tag::SELF_DESCRIBE
CBOR_TAG_SELF_DESCRIBE_DATA
Self-Described CBOR is CBOR data that has this tag for the data. This 3-byte binary string (magic pattern) can be used to distinguish CBOR from other data including Unicode text encodings. This is useful if data loader need to identify the format by data itself.
$isCbor = str_starts_with($data, CBOR_TAG_SELF_DESCRIBE_DATA);
On encoding if the flag is set, the tag is prepended to the encoded data. On decoding if the flag is not set, the tag is skipped even if one exists. If the flag is set, the tag is retained in the decoded value, meaning you need to test if the root is this tag to extract the real content.
If the tag is to be prepended/skipped, it is handled specially and not counted as a 'max_depth'
level.
* This tag is not in the RFC but registered in the CBOR Tags registry.
Option:
'string_ref'
:- Encode: default:
false
; values:bool
|'explicit'
- Decode: default:
true
; values:bool
- Encode: default:
Constants:
Cbor\Tag::STRING_REF_NS
Cbor\Tag::STRING_REF
The tag {stringref} is like a compression, that "references" the string previously appeared inside {stringref-namespace} tag. Note that it differs from PHP's reference to string
; i.e. not the concept of $stringRef = &$string
.
On encode, it can save payload size by replacing the string already seen with the tag + index (or at the worst case increase by 3-bytes overall when single-namespaced).
Note that if smaller payload is desired, it should perform better to apply a data compression instead of this tag.
Also, for maps, small integer
keys (1
..23
, -1
..-24
are all 1-byte key) are often used to minimize a payload. You may want to avoid defining key 0
as it sometimes means the empty as a data definition convention. And also because when working with CBOR_MAP_AS_ARRAY
flag, it can transform array to a "list" array in PHP.
On decode, the use of tag can save memory of decoded value because of copy-on-write; PHP can share the identical string
sequences until one of them is going to be modified.
For decoding, the option is enabled as true
by default, while encoding it should be specified explicitly.
If true
is specified on encoding, data is always wrapped with {stringref-namespace} tag. It initializes the string index table (like compression dictionary) for the content inside the tag.
The {stringref-namespace} tag added implicitly is handled specially and not counted as 'max_depth'
level.
Similarly, 'explicit'
makes {stringref} active but the root namespace is not implicitly created, meaning {stringref} is not created on its own.
The use of this tag makes CBOR contextual. CBOR data that use {stringref} can be embedded in other CBOR. But data that doesn't use it cannot always be embedded safely in {stringref} CBOR, because it will corrupt reference indices of the following strings. (As of now, the extension cannot embed raw CBOR data on encoding though.)
Decoders without the support of this tag cannot decode data using {stringref} correctly.
It is recommended to explicitly enable the string_ref
option on decoding if you are sure of the use of {stringref}, so that readers of the code will know of it.
* This tag is not in the RFC but registered in the CBOR Tags registry.
Option:
'shared_ref'
:- Encode: default:
false
; values:bool
|'unsafe_ref'
- Decode: default:
false
; values:bool
|'shareable'
|'shareable_only'
|'unsafe_ref'
- Encode: default:
Constants:
Cbor\Tag::SHAREABLE
Cbor\Tag::SHARED_REF
The tag {sharedref} can refer the previously-defined data.
If the option is enabled, CBOR maps tagged as {shareable} once decoded into PHP object will share the instance among {sharedref} references. If other type of value including CBOR array or tag is tagged {shareable}, it triggers an error. See other option values for possible workarounds.
On encoding, potentially shared (i.e. there are variables holding the instance somewhere) PHP stdClass
objects are tagged {shareable}. When such object is reused, {sharedref} tag is emitted. A reference to variable is dereferenced.
If 'shareable'
is specified, values tagged as {shareable} which are decoded to non-object are wrapped into Cbor\Shareable
object on decoding, and the instances are reused on {sharedref} tag.
On encoding, an instance of Cbor\Shareable
is tagged {shareable} regardless of the option value, unless the flag CBOR_CDE
is specified. In which case, the encoder throws an exception.
'shareable_only'
works similar to 'shareable'
. But CBOR maps tagged {shareable} are also wrapped into Cbor\Shareable
instead of bare object
.
If 'unsafe_ref'
is specified, {shareable}-tagged data that decoded to non-object becomes PHP &
reference. On encoding a reference to variable is tagged {shareable} too.
At first glance it may seem natural to use PHP reference for shared scalars and arrays. But this may cause unwanted side effects when the decoded structure contains references that you don't expect. You replace a single scalar value, and values elsewhere are changed too!
Note that decoder's return value (decoding root value) cannot be a PHP reference. Moreover, a reference to a PHP object cannot be described even with this option.
The use of this tag makes CBOR contextual.