[DOC] Link to packed data doc (#6567)
* Link to packed data doc

* Update pack.rb

Co-authored-by: Peter Zhu <peter@peterzhu.ca>

* Update pack.rb

Co-authored-by: Peter Zhu <peter@peterzhu.ca>

* Update pack.rb

Co-authored-by: Peter Zhu <peter@peterzhu.ca>

Co-authored-by: Peter Zhu <peter@peterzhu.ca>
BurdetteLamar and peterzhu2118 committed Oct 18, 2022
1 parent 1b235dc commit d67b631
Showing 2 changed files with 12 additions and 290 deletions.
4 changes: 2 additions & 2 deletions doc/packed_data.rdoc
@@ -1,4 +1,4 @@
== Packed Data
== Packed \Data

Certain Ruby core methods deal with packing and unpacking data:

@@ -10,7 +10,7 @@ Certain Ruby core methods deal with packing and unpacking data:
forming objects that become the elements of a new array;
returns that array.
- \Method String#unpack1:
Does the same, but returns only the first extracted object.
Does the same, but unpacks and returns only the first extracted object.

Each of these methods accepts a string +template+,
consisting of zero or more _directive_ characters,
298 changes: 10 additions & 288 deletions pack.rb
@@ -1,308 +1,30 @@
class Array
# call-seq:
# arr.pack( aTemplateString ) -> aBinaryString
# arr.pack( aTemplateString, buffer: aBufferString ) -> aBufferString
# pack(template, buffer: nil) -> string
#
# Packs the contents of <i>arr</i> into a binary sequence according to
# the directives in <i>aTemplateString</i> (see the table below)
# Directives ``A,'' ``a,'' and ``Z'' may be followed by a count,
# which gives the width of the resulting field. The remaining
# directives also may take a count, indicating the number of array
# elements to convert. If the count is an asterisk
# (``<code>*</code>''), all remaining array elements will be
# converted. Any of the directives ``<code>sSiIlL</code>'' may be
# followed by an underscore (``<code>_</code>'') or
# exclamation mark (``<code>!</code>'') to use the underlying
# platform's native size for the specified type; otherwise, they use a
# platform-independent size. Spaces are ignored in the template
# string. See also String#unpack.
#
# a = [ "a", "b", "c" ]
# n = [ 65, 66, 67 ]
# a.pack("A3A3A3") #=> "a b c "
# a.pack("a3a3a3") #=> "a\000\000b\000\000c\000\000"
# n.pack("ccc") #=> "ABC"
#
# If <i>aBufferString</i> is specified and its capacity is enough,
# +pack+ uses it as the buffer and returns it.
# When the offset is specified by the beginning of <i>aTemplateString</i>,
# the result is filled after the offset.
# If original contents of <i>aBufferString</i> exists and it's longer than
# the offset, the rest of <i>offsetOfBuffer</i> are overwritten by the result.
# If it's shorter, the gap is filled with ``<code>\0</code>''.
#
# # packed data is appended by default
# [255].pack("C", buffer:"foo".b) #=> "foo\xFF"
#
# # "@0" (offset 0) specifies that packed data is filled from beginning.
# # Also, original data after packed data is removed. ("oo" is removed.)
# [255].pack("@0C", buffer:"foo".b) #=> "\xFF"
#
# # If the offset is bigger than the original length, \x00 is filled.
# [255].pack("@5C", buffer:"foo".b) #=> "foo\x00\x00\xFF"
#
# Note that ``buffer:'' option does not guarantee not to allocate memory
# in +pack+. If the capacity of <i>aBufferString</i> is not enough,
# +pack+ allocates memory.
#
# Directives for +pack+.
#
# Integer | Array |
# Directive | Element | Meaning
# ----------------------------------------------------------------------------
# C | Integer | 8-bit unsigned (unsigned char)
# S | Integer | 16-bit unsigned, native endian (uint16_t)
# L | Integer | 32-bit unsigned, native endian (uint32_t)
# Q | Integer | 64-bit unsigned, native endian (uint64_t)
# J | Integer | pointer width unsigned, native endian (uintptr_t)
# | | (J is available since Ruby 2.3.)
# | |
# c | Integer | 8-bit signed (signed char)
# s | Integer | 16-bit signed, native endian (int16_t)
# l | Integer | 32-bit signed, native endian (int32_t)
# q | Integer | 64-bit signed, native endian (int64_t)
# j | Integer | pointer width signed, native endian (intptr_t)
# | | (j is available since Ruby 2.3.)
# | |
# S_ S! | Integer | unsigned short, native endian
# I I_ I! | Integer | unsigned int, native endian
# L_ L! | Integer | unsigned long, native endian
# Q_ Q! | Integer | unsigned long long, native endian (ArgumentError
# | | if the platform has no long long type.)
# | | (Q_ and Q! is available since Ruby 2.1.)
# J! | Integer | uintptr_t, native endian (same with J)
# | | (J! is available since Ruby 2.3.)
# | |
# s_ s! | Integer | signed short, native endian
# i i_ i! | Integer | signed int, native endian
# l_ l! | Integer | signed long, native endian
# q_ q! | Integer | signed long long, native endian (ArgumentError
# | | if the platform has no long long type.)
# | | (q_ and q! is available since Ruby 2.1.)
# j! | Integer | intptr_t, native endian (same with j)
# | | (j! is available since Ruby 2.3.)
# | |
# S> s> S!> s!> | Integer | same as the directives without ">" except
# L> l> L!> l!> | | big endian
# I!> i!> | | (available since Ruby 1.9.3)
# Q> q> Q!> q!> | | "S>" is the same as "n"
# J> j> J!> j!> | | "L>" is the same as "N"
# | |
# S< s< S!< s!< | Integer | same as the directives without "<" except
# L< l< L!< l!< | | little endian
# I!< i!< | | (available since Ruby 1.9.3)
# Q< q< Q!< q!< | | "S<" is the same as "v"
# J< j< J!< j!< | | "L<" is the same as "V"
# | |
# n | Integer | 16-bit unsigned, network (big-endian) byte order
# N | Integer | 32-bit unsigned, network (big-endian) byte order
# v | Integer | 16-bit unsigned, VAX (little-endian) byte order
# V | Integer | 32-bit unsigned, VAX (little-endian) byte order
# | |
# U | Integer | UTF-8 character
# w | Integer | BER-compressed integer
#
# Float | Array |
# Directive | Element | Meaning
# ---------------------------------------------------------------------------
# D d | Float | double-precision, native format
# F f | Float | single-precision, native format
# E | Float | double-precision, little-endian byte order
# e | Float | single-precision, little-endian byte order
# G | Float | double-precision, network (big-endian) byte order
# g | Float | single-precision, network (big-endian) byte order
#
# String | Array |
# Directive | Element | Meaning
# ---------------------------------------------------------------------------
# A | String | arbitrary binary string (space padded, count is width)
# a | String | arbitrary binary string (null padded, count is width)
# Z | String | same as ``a'', except that null is added with *
# B | String | bit string (MSB first)
# b | String | bit string (LSB first)
# H | String | hex string (high nibble first)
# h | String | hex string (low nibble first)
# u | String | UU-encoded string
# M | String | quoted printable, MIME encoding (see also RFC2045)
# | | (text mode but input must use LF and output LF)
# m | String | base64 encoded string (see RFC 2045)
# | | (if count is 0, no line feed are added, see RFC 4648)
# | | (count specifies input bytes between each LF,
# | | rounded down to nearest multiple of 3)
# P | String | pointer to a structure (fixed-length string)
# p | String | pointer to a null-terminated string
#
# Misc. | Array |
# Directive | Element | Meaning
# ---------------------------------------------------------------------------
# @ | --- | moves to absolute position
# X | --- | back up a byte
# x | --- | null byte
# Formats each element in +self+ into a binary string; returns that string.
# See {Packed Data}[rdoc-ref:packed_data.rdoc].
def pack(fmt, buffer: nil)
Primitive.pack_pack(fmt, buffer)
end
end

class String
# call-seq:
# str.unpack(format) -> anArray
# str.unpack(format, offset: anInteger) -> anArray
#
# Decodes <i>str</i> (which may contain binary data) according to the
# format string, returning an array of each value extracted.
# The format string consists of a sequence of single-character directives,
# summarized in the table at the end of this entry.
# Each directive may be followed
# by a number, indicating the number of times to repeat with this
# directive. An asterisk (``<code>*</code>'') will use up all
# remaining elements. The directives <code>sSiIlL</code> may each be
# followed by an underscore (``<code>_</code>'') or
# exclamation mark (``<code>!</code>'') to use the underlying
# platform's native size for the specified type; otherwise, it uses a
# platform-independent consistent size. Spaces are ignored in the
# format string.
#
# See also String#unpack1, Array#pack.
#
# "abc \0\0abc \0\0".unpack('A6Z6') #=> ["abc", "abc "]
# "abc \0\0".unpack('a3a3') #=> ["abc", " \000\000"]
# "abc \0abc \0".unpack('Z*Z*') #=> ["abc ", "abc "]
# "aa".unpack('b8B8') #=> ["10000110", "01100001"]
# "aaa".unpack('h2H2c') #=> ["16", "61", 97]
# "\xfe\xff\xfe\xff".unpack('sS') #=> [-2, 65534]
# "now=20is".unpack('M*') #=> ["now is"]
# "whole".unpack('xax2aX2aX1aX2a') #=> ["h", "e", "l", "l", "o"]
#
# This table summarizes the various formats and the Ruby classes
# returned by each.
# unpack(template, offset: 0) -> array
#
# Integer | |
# Directive | Returns | Meaning
# ------------------------------------------------------------------
# C | Integer | 8-bit unsigned (unsigned char)
# S | Integer | 16-bit unsigned, native endian (uint16_t)
# L | Integer | 32-bit unsigned, native endian (uint32_t)
# Q | Integer | 64-bit unsigned, native endian (uint64_t)
# J | Integer | pointer width unsigned, native endian (uintptr_t)
# | |
# c | Integer | 8-bit signed (signed char)
# s | Integer | 16-bit signed, native endian (int16_t)
# l | Integer | 32-bit signed, native endian (int32_t)
# q | Integer | 64-bit signed, native endian (int64_t)
# j | Integer | pointer width signed, native endian (intptr_t)
# | |
# S_ S! | Integer | unsigned short, native endian
# I I_ I! | Integer | unsigned int, native endian
# L_ L! | Integer | unsigned long, native endian
# Q_ Q! | Integer | unsigned long long, native endian (ArgumentError
# | | if the platform has no long long type.)
# J! | Integer | uintptr_t, native endian (same with J)
# | |
# s_ s! | Integer | signed short, native endian
# i i_ i! | Integer | signed int, native endian
# l_ l! | Integer | signed long, native endian
# q_ q! | Integer | signed long long, native endian (ArgumentError
# | | if the platform has no long long type.)
# j! | Integer | intptr_t, native endian (same with j)
# | |
# S> s> S!> s!> | Integer | same as the directives without ">" except
# L> l> L!> l!> | | big endian
# I!> i!> | |
# Q> q> Q!> q!> | | "S>" is the same as "n"
# J> j> J!> j!> | | "L>" is the same as "N"
# | |
# S< s< S!< s!< | Integer | same as the directives without "<" except
# L< l< L!< l!< | | little endian
# I!< i!< | |
# Q< q< Q!< q!< | | "S<" is the same as "v"
# J< j< J!< j!< | | "L<" is the same as "V"
# | |
# n | Integer | 16-bit unsigned, network (big-endian) byte order
# N | Integer | 32-bit unsigned, network (big-endian) byte order
# v | Integer | 16-bit unsigned, VAX (little-endian) byte order
# V | Integer | 32-bit unsigned, VAX (little-endian) byte order
# | |
# U | Integer | UTF-8 character
# w | Integer | BER-compressed integer (see Array#pack)
#
# Float | |
# Directive | Returns | Meaning
# -----------------------------------------------------------------
# D d | Float | double-precision, native format
# F f | Float | single-precision, native format
# E | Float | double-precision, little-endian byte order
# e | Float | single-precision, little-endian byte order
# G | Float | double-precision, network (big-endian) byte order
# g | Float | single-precision, network (big-endian) byte order
#
# String | |
# Directive | Returns | Meaning
# -----------------------------------------------------------------
# A | String | arbitrary binary string (remove trailing nulls and ASCII spaces)
# a | String | arbitrary binary string
# Z | String | null-terminated string
# B | String | bit string (MSB first)
# b | String | bit string (LSB first)
# H | String | hex string (high nibble first)
# h | String | hex string (low nibble first)
# u | String | UU-encoded string
# M | String | quoted-printable, MIME encoding (see RFC2045)
# m | String | base64 encoded string (RFC 2045) (default)
# | | base64 encoded string (RFC 4648) if followed by 0
# P | String | pointer to a structure (fixed-length string)
# p | String | pointer to a null-terminated string
#
# Misc. | |
# Directive | Returns | Meaning
# -----------------------------------------------------------------
# @ | --- | skip to the offset given by the length argument
# X | --- | skip backward one byte
# x | --- | skip forward one byte
#
# The keyword <i>offset</i> can be given to start the decoding after skipping
# the specified amount of bytes:
# "abc".unpack("C*") # => [97, 98, 99]
# "abc".unpack("C*", offset: 2) # => [99]
# "abc".unpack("C*", offset: 4) # => offset outside of string (ArgumentError)
#
# HISTORY
#
# * J, J! j, and j! are available since Ruby 2.3.
# * Q_, Q!, q_, and q! are available since Ruby 2.1.
# * I!<, i!<, I!>, and i!> are available since Ruby 1.9.3.
# Extracts data from +self+, forming objects that become the elements of a new array;
# returns that array.
# See {Packed Data}[rdoc-ref:packed_data.rdoc].
def unpack(fmt, offset: 0)
Primitive.pack_unpack(fmt, offset)
end

# call-seq:
# str.unpack1(format) -> obj
# str.unpack1(format, offset: anInteger) -> obj
#
# Decodes <i>str</i> (which may contain binary data) according to the
# format string, returning the first value extracted.
#
# See also String#unpack, Array#pack.
#
# Contrast with String#unpack:
#
# "abc \0\0abc \0\0".unpack('A6Z6') #=> ["abc", "abc "]
# "abc \0\0abc \0\0".unpack1('A6Z6') #=> "abc"
#
# In that case data would be lost but often it's the case that the array
# only holds one value, especially when unpacking binary data. For instance:
#
# "\xff\x00\x00\x00".unpack("l") #=> [255]
# "\xff\x00\x00\x00".unpack1("l") #=> 255
#
# Thus unpack1 is convenient, makes clear the intention and signals
# the expected return value to those reading the code.
#
# The keyword <i>offset</i> can be given to start the decoding after skipping
# the specified amount of bytes:
# "abc".unpack1("C*") # => 97
# "abc".unpack1("C*", offset: 2) # => 99
# "abc".unpack1("C*", offset: 4) # => offset outside of string (ArgumentError)
# unpack1(template, offset: 0) -> object
#
# Like String#unpack, but unpacks and returns only the first extracted object.
# See {Packed Data}[rdoc-ref:packed_data.rdoc].
def unpack1(fmt, offset: 0)
Primitive.pack_unpack1(fmt, offset)
end
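
As a reading aid for this diff (not part of the commit), here is a minimal sketch of the three methods whose inline documentation now links to packed_data.rdoc. The values mirror examples from the removed comments. The local variable names are illustrative only, and the `offset:` keyword is assumed to require Ruby 3.1 or later.

```ruby
# Array#pack: formats each element into a binary string.
packed = [65, 66, 67].pack("C3")             # three 8-bit unsigned integers

# String#unpack: extracts data into a new array.
values = packed.unpack("C*")                 # every byte as an Integer

# String#unpack1: like unpack, but returns only the first extracted object.
first = packed.unpack1("C*")

# offset: skips that many bytes before decoding (assumes Ruby 3.1+).
tail = packed.unpack("C*", offset: 2)

# buffer: packed data is appended to the given string by default.
appended = [255].pack("C", buffer: "foo".b)

# "@0" moves to absolute position 0, so the buffer is overwritten from the start.
overwritten = [255].pack("@0C", buffer: "foo".b)

p packed, values, first, tail, appended, overwritten
```

Using `unpack1` when only one value is expected avoids the extra `.first` call and signals the intended return type to readers.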
