# Working with Bytes
This notebook contains a few short examples of how to work with binary data in Python. It's intended purely as background and review.

### bytes and hex format

When working with bytes, it helps to get at least a little used to hexadecimal notation. In modern computing, a byte is 8 bits, which means it can encode unsigned values from 0 to 255.

In [1]:
# expressed in hexadecimal, 255 is ff
# the built-in 'hex' function returns the hexadecimal string prefixed with '0x'
hex(255)

'0xff'

In [2]:
# you can also use str.format with the 'x' format specifier like so:
'{:x}'.format(255)

'ff'

In [3]:
# note that when expressed in hex, a single byte may need up to two characters to represent it.
# if an unsigned byte is less than 16 (0x10), it fits in one hex digit, so we usually pad on the left with a zero.
# we can use str.format with a width of 2 and a '0' fill to do this:
'{:02x}'.format(15)

'0f'
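
Going the other way comes up just as often. The cell below is a minimal sketch using only built-ins: `int` with base 16 parses a hex string back into a number, and f-strings accept the same format specifiers as `str.format`. The variable names are only for illustration.

```python
# parse a hex string back into an int; with base 16 the '0x' prefix is optional
value = int('ff', 16)
same_value = int('0xff', 16)

# f-strings take the same format specifiers as str.format
padded = f'{15:02x}'      # two lowercase hex digits, zero padded
upper = f'{255:02X}'      # 'X' gives uppercase hex digits

print(value, same_value, padded, upper)
```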

### bytes and bytearray ( native Python types )

bytes is a native Python type which holds an immutable sequence of bytes. There are several ways to create one, and we go over a few here.

bytearray is much the same as bytes except that it is mutable. For simplicity, the examples here use bytes.

In [4]:
# initialize from a bytes literal.  Just prefix a string literal with the letter b.
# For byte values in the printable ASCII range (32 to 126), you can just type the ASCII character.
# Any other byte value needs an escape sequence, most generally its hex value preceded by '\x'
our_bytes = b'some text\x01\xff'

# once you have a bytes object you can take slices just as with strings
# when printed, byte values that are printable ASCII characters are shown as ASCII text
# note that indexing a single position returns an int, not a length-1 bytes object
print(our_bytes[0:4])
print(our_bytes[10])

b'some'
255
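
For comparison, here is a small sketch of what mutability means in practice for bytearray, which the examples here otherwise skip. It only uses built-in operations; the names `frozen`, `buf` and `snapshot` are made up for this illustration.

```python
frozen = b'abc'
try:
    frozen[0] = 0x7a            # bytes does not support item assignment
except TypeError as err:
    print('bytes is immutable:', err)

buf = bytearray(b'abc')
buf[0] = 0x7a                   # 0x7a is the ASCII code for 'z'
buf.append(0xff)                # append a single byte value (an int from 0 to 255)
buf.extend(b'\x00\x01')         # append several bytes at once
print(buf)

# convert back to an immutable bytes object when you're done editing
snapshot = bytes(buf)
print(snapshot)
```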


In [5]:
# you can also create bytes by using struct.pack
import struct
our_bytes = struct.pack('<i', 255)

print(our_bytes)
print(our_bytes.hex())   # sometimes it's easier to read bytes when they're expressed in hex

b'\xff\x00\x00\x00'
ff000000


In [6]:
# In the above cell, note that 255 is 'ff' in hex, but that it's followed by three zero bytes,
# which are printed as \x00.  This is because the '<i' format in struct.pack represents a 4 byte signed
# integer stored in 'little endian' byte order.  Little endian byte order means that the least significant
# byte comes first rather than last, the way humans write numbers.  You can also
# pack in 'big endian' byte order by using the '>i' format specifier
our_bytes = struct.pack('>i', 255)
print(our_bytes)

b'\x00\x00\x00\xff'
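
Byte order is easier to see with a value whose four bytes are all different. The sketch below packs the arbitrarily chosen value 0x01020304 both ways and checks that the built-in `int.to_bytes` produces the same bytes as `struct.pack`. It also shows how a negative number comes out, since 'i' is a signed (two's complement) format.

```python
import struct

n = 0x01020304   # arbitrary example value with four distinct bytes

little = struct.pack('<i', n)
big = struct.pack('>i', n)
print(little.hex())   # least significant byte (04) comes first
print(big.hex())      # most significant byte (01) comes first

# int.to_bytes gives the same result without the struct module
assert little == n.to_bytes(4, 'little', signed=True)
assert big == n.to_bytes(4, 'big', signed=True)

# negative values are stored in two's complement, so -1 is all ones
minus_one = struct.pack('<i', -1)
print(minus_one.hex())
```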


In [7]:
# struct.pack can also pack multiple fields into a single bytes object.  Just use a longer
# format string.  Here we'll pack two 4 byte ints followed by a 5 byte field of raw bytes
fmt = '<ii5s'
our_bytes = struct.pack(fmt, 1, 2, b'hiya')
print(our_bytes)  

# we only gave struct.pack 4 bytes from b'hiya' so it zero padded the value to fill the 5 byte field

b'\x01\x00\x00\x00\x02\x00\x00\x00hiya\x00'
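
Two details of fixed-width fields are worth checking for yourself. This sketch uses `struct.calcsize` to confirm how many bytes a format string occupies, and shows that a value longer than an 's' field is silently truncated rather than rejected. The sample values are made up.

```python
import struct

fmt = '<ii5s'

# calcsize reports the packed size of a format: 4 + 4 + 5 bytes here
size = struct.calcsize(fmt)
packed = struct.pack(fmt, 1, 2, b'hiya')
print(size, len(packed))

# a value longer than the field is cut down to the field width
truncated = struct.pack('<5s', b'hello world')
print(truncated)
```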


### extracting ints and strings from bytes

Mostly, when working with binary data, we'll already have the bytes and will need to extract native Python types from them. Here we go over a few ways to do that.

In [8]:
# first let's create a simple bytes object holding two ints and a string
fmt = '<ii5s'
our_bytes = struct.pack(fmt, 128, 129, b'heya')
print(our_bytes)

b'\x80\x00\x00\x00\x81\x00\x00\x00heya\x00'


In [9]:
# to extract the int that we've encoded in the first four bytes we can use the function int.from_bytes
int.from_bytes(our_bytes[0:4], 'little', signed=True)

128

In [10]:
# we can access the second four bytes like so:
int.from_bytes(our_bytes[4:8], 'little', signed=True)

129

In [11]:
# we can also use struct.unpack which can unpack multiple values and will return a tuple
struct.unpack('<ii', our_bytes[0:8])

(128, 129)

In [12]:
# to create Python strings we can use the decode method of bytes
# but note that when applied here, the trailing '\x00' padding byte is decoded too and ends up in the string.
our_bytes[8:].decode('utf-8')

'heya\x00'

In [13]:
# to convert bytes to a str and remove any trailing 'zeros' just use rstrip
our_bytes[8:].decode('utf-8').rstrip('\x00')

'heya'
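
The pieces above can also be combined: passing the full format string to `struct.unpack` pulls out both ints and the raw text field in one call. As an alternative to `rstrip`, the sketch below cuts the text at the first zero byte, which assumes the field is a C-style null-terminated string (that is an assumption about the data, not something the format enforces).

```python
import struct

fmt = '<ii5s'
record = struct.pack(fmt, 128, 129, b'heya')

# unpack all three fields at once; the 's' field comes back as bytes, padding included
first, second, raw_text = struct.unpack(fmt, record)

# keep only what comes before the first zero byte, then decode
text = raw_text.split(b'\x00', 1)[0].decode('utf-8')
print(first, second, raw_text, text)
```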

### saving and loading bytes from files

It's easy to save bytes to files or load them back. Just open the file in binary mode and use read/write.

In [14]:
# for an example, we'll create some binary records with the following format
fmt = '<HHi5s'  # H represents a two byte unsigned integer.
# See the documentation for the struct module for the full list of format characters
r1 = struct.pack(fmt, 9, 1, 1, b'one')
r2 = struct.pack(fmt, 9, 1, 2, b'two')
r3 = struct.pack(fmt, 9, 1, 3, b'three')

# we can concatenate them all together
b = r1 + r2 + r3

# and just write to a file ( make sure to open it in binary mode!!! )
with open('simple_binary.bin', 'wb') as f:
    f.write(b)

In [15]:
# for a second example, we'll create a file that mixes records of two different formats
fmtA = '<HHi5s'  # H represents a two byte unsigned integer.
fmtB = '<HHiiii'
# See the documentation for the struct module for the full list of format characters
r1a = struct.pack(fmtA, 9, 1, 1, b'one')
r2a = struct.pack(fmtA, 9, 1, 2, b'two')
r3a = struct.pack(fmtA, 9, 1, 3, b'three')
r1b = struct.pack(fmtB, 16, 2, 1, 2, 3, 4)
r2b = struct.pack(fmtB, 16, 2, 2, 4, 6, 8)
r3b = struct.pack(fmtB, 16, 2, 3, 6, 9, 12)

# we can concatenate them all together
b = r1a + r1b + r2a + r2b + r3a + r3b

# and just write to a file ( make sure to open it in binary mode!!! )
with open('simple_binary_mixed.bin', 'wb') as f:
    f.write(b)

In [16]:
# we can read our bytes back easily by opening the file and doing a file.read
with open('simple_binary_mixed.bin', 'rb') as f:
    b2 = f.read()
assert b == b2
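
Reading the raw bytes back is only half the job; the records still need to be split apart. The sketch below is one possible way to do it, and it rests on an assumption that the notebook never states: that the first H in each record is the length in bytes of what follows the two H fields, and the second H is a record type (1 for the 'i5s' layout, 2 for the 'iiii' layout). The values used above (9 and 1, 16 and 2) are consistent with that reading, but treat it as a guess. `HEADER`, `PAYLOADS` and `parse_records` are hypothetical names.

```python
import struct

HEADER = struct.Struct('<HH')             # assumed meaning: payload length, record type
PAYLOADS = {1: struct.Struct('<i5s'),     # assumed payload layout for type 1
            2: struct.Struct('<iiii')}    # assumed payload layout for type 2

def parse_records(data):
    """Yield (record_type, fields) for each length-prefixed record in data."""
    offset = 0
    while offset < len(data):
        length, rec_type = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        fields = PAYLOADS[rec_type].unpack(data[offset:offset + length])
        offset += length
        yield rec_type, fields

# rebuild two of the records in memory so this cell runs on its own
data = (struct.pack('<HHi5s', 9, 1, 1, b'one')
        + struct.pack('<HHiiii', 16, 2, 1, 2, 3, 4))

records = list(parse_records(data))
print(records)
```

Because each record carries its own length, the loop never needs to know in advance how the two record types are interleaved.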

If your binary file is really big and you only want to read part of it, you can pass an optional number of bytes to read to file.read. Also see the documentation for file.seek, which lets you move the file position without reading anything, and file.tell, which tells you where you are in the file.

In [17]:
# here we'll just read the first two bytes of the file
with open('simple_binary.bin', 'rb') as f:
    b3 = f.read(2)
print(b3.hex())

0900
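
To illustrate file.seek and file.tell, here is a sketch that jumps straight to the third fixed-size record without reading the first two. It writes its own scratch file in a temporary directory (the file name `records.bin` is made up) using the same '<HHi5s' layout as simple_binary.bin, so it runs on its own.

```python
import os
import struct
import tempfile

record = struct.Struct('<HHi5s')   # 2 + 2 + 4 + 5 = 13 bytes per record
data = b''.join(record.pack(9, 1, n, name)
                for n, name in [(1, b'one'), (2, b'two'), (3, b'three')])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'records.bin')   # hypothetical scratch file
    with open(path, 'wb') as f:
        f.write(data)

    with open(path, 'rb') as f:
        f.seek(2 * record.size)    # skip the first two records without reading them
        start = f.tell()           # current position, in bytes from the start of the file
        fields = record.unpack(f.read(record.size))
        end = f.tell()             # reading advanced the position by one record

print(start, fields, end)
```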
