# Bytes with Python

Byte literals are written like strings, with a `b` in front of them.

Each byte can be written either as a hex escape (`\xNN`) or as the ASCII character with that value.

See here for a hex-to-ASCII converter: https://www.rapidtables.com/convert/number/hex-to-ascii.html

```python
hex_literal = b"\x31\x30"
other_byte_literal = b"10"

print(hex_literal == other_byte_literal)
print(hex_literal)
print(other_byte_literal)
```

```
True
b'10'
b'10'
```
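A `bytes` object is really a sequence of integers from 0 to 255, so the two spellings above are just two ways of writing the same numbers. A short sketch using only built-in `bytes` methods:

```python
data = b"\x31\x30"

# Indexing a bytes object gives an int (0 to 255), not a one-character string
print(data[0])          # 49
print(chr(data[0]))     # 1  (the ASCII character for 0x31)
print(list(data))       # [49, 48]

# bytes.hex() and bytes.fromhex() convert to and from a hex string
print(data.hex())                     # 3130
print(bytes.fromhex("3130") == data)  # True
```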


## Numbers to and from bytes

We can use `int.to_bytes()` and `int.from_bytes()` to convert between integers and bytes.

With this interface, bytes aren't interpreted as ASCII characters. Instead, they are read as the digits of a base-256 number, so you get the numeric value of the bytes themselves.

You need to specify the endianness of the bytes, which says whether the bytes are read from left to right or from right to left. In big endian the most significant byte is on the left; in little endian the most significant byte is on the right.

Little endian can be weird, so after you think you get it, give it another look and try a few examples.

[see here](https://www.digital-detective.net/understanding-big-and-little-endian-byte-order/)


```python
small_byte_number = b"\x31"

# 0x31 == (3*16^1) + (1*16^0) == 49
print(int.from_bytes(small_byte_number, "big"))

bigger_byte_number = b"\x01\x23"

# big endian reads the bytes as 0x0123 = (0*16^3)+(1*16^2)+(2*16^1)+(3*16^0) = 291
print(int.from_bytes(bigger_byte_number, "big"))

# little endian reverses the bytes to 0x2301 = (2*16^3)+(3*16^2)+(0*16^1)+(1*16^0) = 8961
print(int.from_bytes(bigger_byte_number, "little"))
```

```
49
291
8961
```
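To see what `"big"` and `"little"` actually do, here is a sketch that computes the same two values by hand, treating each byte as one base-256 digit, and checks them against `int.from_bytes`:

```python
data = b"\x01\x23"

# Big endian: the first byte is the most significant base-256 digit
big = data[0] * 256 + data[1]
# Little endian: the last byte is the most significant base-256 digit
little = data[1] * 256 + data[0]

print(big, int.from_bytes(data, "big"))        # 291 291
print(little, int.from_bytes(data, "little"))  # 8961 8961

# Reading little endian is the same as reversing the bytes and reading big endian
print(int.from_bytes(data[::-1], "big") == int.from_bytes(data, "little"))  # True
```

The reversal trick in the last line is a handy way to check your intuition when little endian starts to feel backwards.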


`to_bytes` works in a similar manner to `from_bytes`.

You specify the int that you want to convert, the number of bytes to use to represent it, and the endianness.

```python
# Representing 1024 using 6 bytes
val = int.to_bytes(1024, 6, "big")
print(val)
```

```
b'\x00\x00\x00\x00\x04\x00'
```
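A few more `to_bytes` behaviors worth knowing, sketched with the standard `int` methods: little-endian output, what happens when the length is too small, and the `signed` flag for negative numbers:

```python
# The same value, little endian: the least significant byte comes first
print((1024).to_bytes(6, "little"))  # b'\x00\x04\x00\x00\x00\x00'

# Asking for too few bytes to hold the value raises OverflowError
try:
    (1024).to_bytes(1, "big")
    overflowed = False
except OverflowError:
    overflowed = True
print("OverflowError raised:", overflowed)  # OverflowError raised: True

# Negative numbers need signed=True (stored as two's complement)
print((-2).to_bytes(2, "big", signed=True))             # b'\xff\xfe'
print(int.from_bytes(b"\xff\xfe", "big", signed=True))  # -2
```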


## Quick note - data representation

We covered above that certain byte values correspond to ASCII characters. The hex value `0x32`, which is `50` in decimal (`3*16^1 + 2*16^0`), is the ASCII character `2`.

Below you'll see the int `50` converted to `b"\x002"`, which looks strange, since the hex value equivalent to the decimal value `50` is `0x32`. (Related: `0xNN` and `\xNN` are used interchangeably here for bytes in hex.)

This is just shorthand. When Python prints bytes, it shows each byte that is a printable ASCII character as that character, and shows most other bytes as `\x` followed by two hex digits. So `b"\x002"` is two bytes: `\x00` followed by `2` (which is `0x32`). Don't let this confuse you. We can see below that they're the same thing.

```python
v = int.to_bytes(50, 2, "big")
print(v)

z = int.from_bytes(b"\x00\x32", "big")
print(z)

print(b"\x002" == b"\x00\x32")
```

```
b'\x002'
50
True
```
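To see the display rule on more than one byte, this sketch builds a `bytes` object from a list of ints and prints it three ways. Note that `bytes.hex()` with a separator argument needs Python 3.8 or newer:

```python
# 0x00 and 0x7F are not printable, 0x32 is '2', 0x41 is 'A', 0x0A is a newline
data = bytes([0x00, 0x32, 0x41, 0x7F, 0x0A])

print(data)           # b'\x002A\x7f\n'  (printable bytes shown as characters)
print(list(data))     # [0, 50, 65, 127, 10]
print(data.hex(" "))  # 00 32 41 7f 0a
```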


## Using struct.unpack()

The function `struct.unpack()` is also a great way to work with bytes. It allows you to specify the type of data that you want to create from the bytes. For this to make sense, you need to know what the letters are shorthand for.

`H` is an unsigned short (an unsigned integer that is 2 bytes long)

`h` is a signed short (a signed integer that is 2 bytes long)

`c` is a char (a single byte, returned as a `bytes` object of length 1)

etc.

See [this table](https://docs.python.org/3/library/struct.html#format-characters) in the documentation for all of the different types.

You also need to pick a symbol for the endianness: `>` for big endian and `<` for little endian. (Without one, `struct` uses the machine's native byte order and alignment.)

The format string, which is the first argument you pass, starts with the endianness symbol and is followed by one format character per value you want back (a count prefix such as `2H` is shorthand for `HH`). See the examples below.
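The difference between `H` and `h`, and the repeat-count shorthand, can be sketched on a couple of byte strings:

```python
import struct

raw = b"\xff\xfe"

# 'H' reads the two bytes as an unsigned short, 'h' as a signed short
print(struct.unpack(">H", raw))  # (65534,)
print(struct.unpack(">h", raw))  # (-2,)

# A repeat count before a format character: '>2H' means the same as '>HH'
print(struct.unpack(">2H", b"\x00\x01\x00\x02"))  # (1, 2)
```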

```python
import struct

testBytes = b'\x00\x01\x00\x02'
# Unpack the bytes into two unsigned shorts, interpreting the bytes in
# big-endian order
two_shorts = struct.unpack('>HH', testBytes)
print(two_shorts)
```

```
(1, 2)
```


```python
testBytes = b'\x44\x00\x32'
# One char (1 byte) followed by one big-endian unsigned short (2 bytes)
char_and_short = struct.unpack('>cH', testBytes)
print(char_and_short)
```

```
(b'D', 50)
```
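`struct` also works in the other direction. This sketch uses `struct.pack` to rebuild the bytes from the previous cell, `struct.calcsize` to check how many bytes a format needs, and shows why the byte-order prefix matters. The native-alignment size depends on the platform, so treat the `4` below as typical rather than guaranteed:

```python
import struct

# struct.pack is the inverse of struct.unpack
packed = struct.pack(">cH", b"D", 50)
print(packed)                        # b'D\x002'
print(struct.unpack(">cH", packed))  # (b'D', 50)

# calcsize reports how many bytes a format needs
print(struct.calcsize(">cH"))  # 3
# Without a byte-order prefix, native alignment may add a padding byte
print(struct.calcsize("cH"))   # typically 4 on common platforms

# The same two bytes read little endian give a very different number
print(struct.unpack("<H", b"\x00\x32"))  # (12800,)
```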


## Fun with dense representations

Imagine we have a situation where space is at a premium and we have information that we want to store. In this example we'll be storing people's 3-letter initials, an age, and a date of birth.

Normally we would require at least
* 3 bytes for the initials (one for each character)
* 8 bytes for the int age
* 16 bytes for the date of birth (or 4 bytes for a `DDMM` string)

This doesn't exactly play out in Python, because every Python object carries extra overhead (see the `getsizeof` output below), but it's directionally correct.

Using what we know about bytes, we can store it much more densely.

We still need 3 bytes for the initials, but a single unsigned byte can hold values up to 255, so we only need one byte each for age, day, and month: 6 bytes in total.

```python
import struct


class User:
    def __init__(self, initials, age, birthday):
        self.initials = initials
        self.age = age
        self.birthday = birthday  # (day, month)

    def __repr__(self):
        # Birthday is printed as month/day
        return f"initials: {self.initials}, age: {self.age}, birthday: {self.birthday[1]}/{self.birthday[0]}"


def create_dense_representation(initials: str, age: int, DOB: str) -> bytes:
    """Pack 3 ASCII initials, an age, and a DDMM birthday string into 6 bytes."""
    initials_bytes = initials.encode("ascii")      # one byte per character
    age_byte = age.to_bytes(1, "big")              # fits if 0 <= age <= 255
    day_byte = int(DOB[:2]).to_bytes(1, "big")     # DD
    month_byte = int(DOB[2:4]).to_bytes(1, "big")  # MM

    return initials_bytes + age_byte + day_byte + month_byte


def unpack_dense_representation(dense_rep: bytes) -> User:
    # 3 single-byte chars, then 3 unsigned bytes: age, day, month
    r = struct.unpack('>cccBBB', dense_rep)
    user = User(
        initials=(r[0] + r[1] + r[2]).decode("ascii"),
        age=r[3],
        birthday=(r[4], r[5]),
    )

    return user
```

```python
from sys import getsizeof

d = create_dense_representation("PTR", 30, "0602")
print(f"dense representation: {getsizeof(d)}")
print(f'initials: {getsizeof("PTR")}')
print(f'age: {getsizeof(30)}')
print(f'birthday: {getsizeof("0602")}')
```

```
dense representation: 39
initials: 52
age: 28
birthday: 53
```
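For comparison, here is an alternative sketch (not the approach used above) that builds the same 6-byte record with a single `struct.pack` call, using the `3s` format for a 3-byte string. It reuses the sample values from the cell above and prints `len()`, which counts only the payload bytes, unlike `getsizeof`, which also includes CPython's per-object overhead:

```python
import struct

# Same sample values as above: initials "PTR", age 30, birthday "0602" (DDMM)
initials, age, dob = "PTR", 30, "0602"

# Manual approach: encoded initials followed by three single unsigned bytes
manual = initials.encode("ascii") + bytes([age, int(dob[:2]), int(dob[2:4])])

# struct approach: '3s' is a 3-byte string, each 'B' is one unsigned byte
packed = struct.pack(">3sBBB", initials.encode("ascii"), age, int(dob[:2]), int(dob[2:4]))

print(packed == manual)  # True
print(len(packed))       # 6
print(packed)            # b'PTR\x1e\x06\x02'

# Unpacking gives the fields back in one call
raw_initials, age_out, day, month = struct.unpack(">3sBBB", packed)
print(raw_initials.decode("ascii"), age_out, day, month)  # PTR 30 6 2
```

One caveat of `3s`: `struct.pack` silently truncates longer values and pads shorter ones with null bytes, so it is worth checking that the initials are exactly 3 characters before packing.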
