Document the new extensions.

python · Dec 31, 1996 · 1254346 · 1254346
1 parent 3aa27fd
commit 1254346

Showing 2 changed files with 128 additions and 18 deletions.
diff --git a/Doc/lib/libstruct.tex b/Doc/lib/libstruct.tex
@@ -45,23 +45,81 @@ \section{Built-in Module \sectcode{struct}}
   \lineiii{x}{pad byte}{no value}
   \lineiii{c}{char}{string of length 1}
   \lineiii{b}{signed char}{integer}
+  \lineiii{B}{unsigned char}{integer}
   \lineiii{h}{short}{integer}
+  \lineiii{H}{unsigned short}{integer}
   \lineiii{i}{int}{integer}
+  \lineiii{I}{unsigned int}{integer}
   \lineiii{l}{long}{integer}
+  \lineiii{L}{unsigned long}{integer}
   \lineiii{f}{float}{float}
   \lineiii{d}{double}{float}
+  \lineiii{s}{char[]}{string}
 \end{tableiii}
 
 A format character may be preceded by an integral repeat count; e.g.\
 the format string \code{'4h'} means exactly the same as \code{'hhhh'}.
 
-C numbers are represented in the machine's native format and byte
-order, and properly aligned by skipping pad bytes if necessary
-(according to the rules used by the C compiler).
+For the \code{'s'} format character, the count is interpreted as the
+size of the string, not as a repeat count as for the other format
+characters; e.g.\ \code{'10s'} means a single 10-byte string, while
+\code{'10c'} means 10 characters.  For packing, the string is
+truncated or padded with null bytes as appropriate to make it fit.
+For unpacking, the resulting string always has exactly the specified
+number of bytes.  As a special case, \code{'0s'} means a single, empty
+string (while \code{'0c'} means 0 characters).
 
-Examples (all on a big-endian machine):
+For the \code{'I'} and \code{'L'} format characters, the return
+value is a Python long integer if a Python plain integer can't
+represent the required range (note: this is dependent on the size of
+the relevant C types only, not on the sign of the actual value).
+
+By default, C numbers are represented in the machine's native format
+and byte order, and properly aligned by skipping pad bytes if
+necessary (according to the rules used by the C compiler).
+
+Alternatively, the first character of the format string can be used to
+indicate the byte order, size and alignment of the packed data,
+according to the following table:
+
+\begin{tableiii}{|c|l|l|}{samp}{Character}{Byte order}{Size and alignment}
+  \lineiii{@}{native}{native}
+  \lineiii{=}{native}{standard}
+  \lineiii{<}{little-endian}{standard}
+  \lineiii{>}{big-endian}{standard}
+  \lineiii{!}{network (= big-endian)}{standard}
+\end{tableiii}
+
+If the first character is not one of these, \code{'@'} is assumed.
+
+Native byte order is big-endian or little-endian, depending on the
+host system (e.g.\ Motorola and Sun are big-endian; Intel and DEC are
+little-endian).
+
+Native size and alignment are determined using the C compiler's sizeof
+expression.  This is always combined with native byte order.
+
+Standard size and alignment are as follows: no alignment is required
+for any type (so you have to use pad bytes); short is 2 bytes; int and
+long are 4 bytes.  In this mode, there is no support for float and
+double (\code{'f'} and \code{'d'}).
+
+Note the difference between \code{'@'} and \code{'='}: both use native
+byte order, but the size and alignment of the latter are standardized.
+
+The form \code{'!'} is available for those poor souls who claim they
+can't remember whether network byte order is big-endian or
+little-endian.
+
+There is no way to indicate non-native byte order (i.e.\ force
+byte-swapping); use the appropriate choice of \code{'<'} or
+\code{'>'}.
+
+Examples (all using native byte order, size and alignment, on a
+big-endian machine):
 
 \bcode\begin{verbatim}
+from struct import *
 pack('hhl', 1, 2, 3) == '\000\001\000\002\000\000\000\003'
 unpack('hhl', '\000\001\000\002\000\000\000\003') == (1, 2, 3)
 calcsize('hhl') == 8
@@ -71,8 +129,5 @@ \section{Built-in Module \sectcode{struct}}
 a particular type, end the format with the code for that type with a
 repeat count of zero, e.g.\ the format \code{'llh0l'} specifies two
 pad bytes at the end, assuming longs are aligned on 4-byte boundaries.
-
-(More format characters are planned, e.g.\ \code{'s'} for character
-arrays, upper case for unsigned variants, and a way to specify the
-byte order, which is useful for [de]constructing network packets and
-reading/writing portable binary file formats like TIFF and AIFF.)
+(This only works when native size and alignment are in effect;
+standard size and alignment do not enforce any alignment.)
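
The byte-order prefixes and the 's' count rule documented in this file can be sketched as follows. This is only an illustration, and it assumes a modern Python 3 interpreter, where packed data is a bytes object rather than the plain string of the 1996 interpreter this commit targets; the variable names are made up for the example.

```python
import struct

# Standard size and alignment: short is 2 bytes, long is 4 bytes, no padding,
# so these results do not depend on the host machine.
big = struct.pack('>hhl', 1, 2, 3)       # big-endian
little = struct.pack('<hhl', 1, 2, 3)    # little-endian
network = struct.pack('!hhl', 1, 2, 3)   # network order, same as big-endian

# For 's' the count is the size of one string, not a repeat count.
padded = struct.pack('10s', b'abc')      # padded with null bytes to 10 bytes
truncated = struct.pack('2s', b'abc')    # truncated to 2 bytes
one_string = struct.unpack('10s', padded)   # a single 10-byte string
ten_chars = struct.unpack('10c', padded)    # ten separate 1-byte values

# Special case: '0s' yields one empty string, '0c' yields no values at all.
empty_string = struct.unpack('0s', b'')
no_chars = struct.unpack('0c', b'')
```

With a leading '<', '>' or '!' the packed bytes are identical on every host, which is what makes these forms suitable for network packets and portable file formats.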
diff --git a/Doc/libstruct.tex b/Doc/libstruct.tex
@@ -45,23 +45,81 @@ \section{Built-in Module \sectcode{struct}}
   \lineiii{x}{pad byte}{no value}
   \lineiii{c}{char}{string of length 1}
   \lineiii{b}{signed char}{integer}
+  \lineiii{B}{unsigned char}{integer}
   \lineiii{h}{short}{integer}
+  \lineiii{H}{unsigned short}{integer}
   \lineiii{i}{int}{integer}
+  \lineiii{I}{unsigned int}{integer}
   \lineiii{l}{long}{integer}
+  \lineiii{L}{unsigned long}{integer}
   \lineiii{f}{float}{float}
   \lineiii{d}{double}{float}
+  \lineiii{s}{char[]}{string}
 \end{tableiii}
 
 A format character may be preceded by an integral repeat count; e.g.\
 the format string \code{'4h'} means exactly the same as \code{'hhhh'}.
 
-C numbers are represented in the machine's native format and byte
-order, and properly aligned by skipping pad bytes if necessary
-(according to the rules used by the C compiler).
+For the \code{'s'} format character, the count is interpreted as the
+size of the string, not as a repeat count as for the other format
+characters; e.g.\ \code{'10s'} means a single 10-byte string, while
+\code{'10c'} means 10 characters.  For packing, the string is
+truncated or padded with null bytes as appropriate to make it fit.
+For unpacking, the resulting string always has exactly the specified
+number of bytes.  As a special case, \code{'0s'} means a single, empty
+string (while \code{'0c'} means 0 characters).
 
-Examples (all on a big-endian machine):
+For the \code{'I'} and \code{'L'} format characters, the return
+value is a Python long integer if a Python plain integer can't
+represent the required range (note: this is dependent on the size of
+the relevant C types only, not on the sign of the actual value).
+
+By default, C numbers are represented in the machine's native format
+and byte order, and properly aligned by skipping pad bytes if
+necessary (according to the rules used by the C compiler).
+
+Alternatively, the first character of the format string can be used to
+indicate the byte order, size and alignment of the packed data,
+according to the following table:
+
+\begin{tableiii}{|c|l|l|}{samp}{Character}{Byte order}{Size and alignment}
+  \lineiii{@}{native}{native}
+  \lineiii{=}{native}{standard}
+  \lineiii{<}{little-endian}{standard}
+  \lineiii{>}{big-endian}{standard}
+  \lineiii{!}{network (= big-endian)}{standard}
+\end{tableiii}
+
+If the first character is not one of these, \code{'@'} is assumed.
+
+Native byte order is big-endian or little-endian, depending on the
+host system (e.g.\ Motorola and Sun are big-endian; Intel and DEC are
+little-endian).
+
+Native size and alignment are determined using the C compiler's sizeof
+expression.  This is always combined with native byte order.
+
+Standard size and alignment are as follows: no alignment is required
+for any type (so you have to use pad bytes); short is 2 bytes; int and
+long are 4 bytes.  In this mode, there is no support for float and
+double (\code{'f'} and \code{'d'}).
+
+Note the difference between \code{'@'} and \code{'='}: both use native
+byte order, but the size and alignment of the latter are standardized.
+
+The form \code{'!'} is available for those poor souls who claim they
+can't remember whether network byte order is big-endian or
+little-endian.
+
+There is no way to indicate non-native byte order (i.e.\ force
+byte-swapping); use the appropriate choice of \code{'<'} or
+\code{'>'}.
+
+Examples (all using native byte order, size and alignment, on a
+big-endian machine):
 
 \bcode\begin{verbatim}
+from struct import *
 pack('hhl', 1, 2, 3) == '\000\001\000\002\000\000\000\003'
 unpack('hhl', '\000\001\000\002\000\000\000\003') == (1, 2, 3)
 calcsize('hhl') == 8
@@ -71,8 +129,5 @@ \section{Built-in Module \sectcode{struct}}
 a particular type, end the format with the code for that type with a
 repeat count of zero, e.g.\ the format \code{'llh0l'} specifies two
 pad bytes at the end, assuming longs are aligned on 4-byte boundaries.
-
-(More format characters are planned, e.g.\ \code{'s'} for character
-arrays, upper case for unsigned variants, and a way to specify the
-byte order, which is useful for [de]constructing network packets and
-reading/writing portable binary file formats like TIFF and AIFF.)
+(This only works when native size and alignment are in effect;
+standard size and alignment do not enforce any alignment.)
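
The closing note about the zero-repeat trick can be checked with calcsize. The sketch below assumes a modern Python 3 interpreter; the native sizes depend on the host's C compiler, so only the standard-mode results are fixed. The variable names are illustrative only.

```python
import struct

# Native mode ('@', also the default): a trailing '0l' pads the structure
# out to the alignment the C compiler requires for a long.
native_plain = struct.calcsize('@llh')
native_padded = struct.calcsize('@llh0l')

# Standard mode ('='): long is 4 bytes and short is 2 bytes, and no
# alignment is enforced, so the trailing '0l' adds nothing.
standard_plain = struct.calcsize('=llh')
standard_padded = struct.calcsize('=llh0l')

print(native_plain, native_padded)      # host-dependent
print(standard_plain, standard_padded)
```

On a host where longs are aligned on 4-byte boundaries, as the text above assumes, the native padded size is two bytes larger than the unpadded one; in standard mode both sizes are 10.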