# Bytes & bits

Everything in a computer is ultimately stored as a long sequence of bits:
`010011011010001000101010100010101...`

A **bit** represents a logical state with two possible values, usually written as `0` and `1`.

The **byte** is a unit of digital information that consists of eight bits:

```
    bits:    01000101  00111000  00111111  10001000 ...
    bytes:   1st byte  2nd byte  3rd byte  ...
```
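The grouping shown above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `split_into_bytes` is made up for illustration, and it assumes the input is a string of `0` and `1` characters whose length is a multiple of eight.

```python
# Hypothetical helper: cut a string of bits into groups of eight (bytes).
def split_into_bytes(bits: str) -> list[str]:
    if len(bits) % 8 != 0:
        raise ValueError("the number of bits must be a multiple of 8")
    return [bits[i:i + 8] for i in range(0, len(bits), 8)]


groups = split_into_bytes("01000101001110000011111110001000")
print(groups)

# Each byte, read as an unsigned integer, is a value between 0 and 255.
values = [int(group, base=2) for group in groups]
print(values)
```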

As an example, let's convert a few integers to their binary representation. `int` parses the decimal string, and `bin` returns the bits as a string prefixed with `0b`:

In [None]:
bin(int("0", base=10))

In [None]:
bin(int("1", base=10))

In [None]:
bin(int("2", base=10))

In [None]:
bin(int("3", base=10))

We can transform a sequence of bits back to an integer with:

In [None]:
int("00100001", base=2)
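Note that `bin` drops leading zeros, so its output is not always eight digits long. A small sketch of a round trip with a fixed width of eight bits, using the built-in `format` (the variable names are only illustrative):

```python
number = 33

# bin() keeps only the significant bits and adds the "0b" prefix.
print(bin(number))

# format() with "08b" pads the result with zeros up to 8 binary digits.
bits = format(number, "08b")
print(bits)

# Reading the bits back in base 2 gives the original integer.
restored = int(bits, base=2)
print(restored)
```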

Since everything in a computer is a sequence of bits, the same sequence can be read as an integer, as in the previous examples, or interpreted as a character.

For example, we can convert an integer to a character:

In [None]:
chr(33)

In [None]:
chr(64)

In [None]:
bytes("!", encoding="UTF-8")
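The opposite direction is covered by the built-in `ord`, which returns the integer for a character. The sketch below ties the three views together (character, integer, bits); it assumes UTF-8, in which `!` occupies a single byte.

```python
char = "!"

# ord() is the inverse of chr(): character -> integer.
code = ord(char)
print(code)

# Encoding the character gives a bytes object; indexing it yields integers.
data = char.encode("utf-8")
print(data, data[0])

# The same value written as eight bits.
bits = format(data[0], "08b")
print(bits)
```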

The first 128 numbers (0 to 127) are defined by the **American Standard Code for Information Interchange** (ASCII):

In [None]:
[(i, chr(i)) for i in range(128)]

Here is a table with the 32 control characters (values 0 to 31), plus DEL (127):


| Binary   |   Oct |   Dec | Hex   | Abbr. (1963) | Abbr. (1965) | Abbr. (1967) | Control picture | Caret notation | C escape | Name (1967)                   |
|:---------|------:|------:|:------|:-------------|:-------------|:-------------|:----------------|:---------------|:---------|:------------------------------|
| 000 0000 |     0 |     0 | 00    | NULL         | NUL          | NUL          | ␀               | ^@             | \0       | Null                          |
| 000 0001 |     1 |     1 | 01    | SOM          | SOH          | SOH          | ␁               | ^A             |          | Start of Heading              |
| 000 0010 |     2 |     2 | 02    | EOA          | STX          | STX          | ␂               | ^B             |          | Start of Text                 |
| 000 0011 |     3 |     3 | 03    | EOM          | ETX          | ETX          | ␃               | ^C             |          | End of Text                   |
| 000 0100 |     4 |     4 | 04    | EOT          | EOT          | EOT          | ␄               | ^D             |          | End of Transmission           |
| 000 0101 |     5 |     5 | 05    | WRU          | ENQ          | ENQ          | ␅               | ^E             |          | Enquiry                       |
| 000 0110 |     6 |     6 | 06    | RU           | ACK          | ACK          | ␆               | ^F             |          | Acknowledgement               |
| 000 0111 |     7 |     7 | 07    | BELL         | BEL          | BEL          | ␇               | ^G             | \a       | Bell                          |
| 000 1000 |    10 |     8 | 08    | FE0          | BS           | BS           | ␈               | ^H             | \b       | Backspace                     |
| 000 1001 |    11 |     9 | 09    | HT/SK        | HT           | HT           | ␉               | ^I             | \t       | Horizontal Tab                |
| 000 1010 |    12 |    10 | 0A    | LF           | LF           | LF           | ␊               | ^J             | \n       | Line Feed                     |
| 000 1011 |    13 |    11 | 0B    | VTAB         | VT           | VT           | ␋               | ^K             | \v       | Vertical Tab                  |
| 000 1100 |    14 |    12 | 0C    | FF           | FF           | FF           | ␌               | ^L             | \f       | Form Feed                     |
| 000 1101 |    15 |    13 | 0D    | CR           | CR           | CR           | ␍               | ^M             | \r       | Carriage Return               |
| 000 1110 |    16 |    14 | 0E    | SO           | SO           | SO           | ␎               | ^N             |          | Shift Out                     |
| 000 1111 |    17 |    15 | 0F    | SI           | SI           | SI           | ␏               | ^O             |          | Shift In                      |
| 001 0000 |    20 |    16 | 10    | DC0          | DLE          | DLE          | ␐               | ^P             |          | Data Link Escape              |
| 001 0001 |    21 |    17 | 11    | DC1          | DC1          | DC1          | ␑               | ^Q             |          | Device Control 1 (often XON)  |
| 001 0010 |    22 |    18 | 12    | DC2          | DC2          | DC2          | ␒               | ^R             |          | Device Control 2              |
| 001 0011 |    23 |    19 | 13    | DC3          | DC3          | DC3          | ␓               | ^S             |          | Device Control 3 (often XOFF) |
| 001 0100 |    24 |    20 | 14    | DC4          | DC4          | DC4          | ␔               | ^T             |          | Device Control 4              |
| 001 0101 |    25 |    21 | 15    | ERR          | NAK          | NAK          | ␕               | ^U             |          | Negative Acknowledgement      |
| 001 0110 |    26 |    22 | 16    | SYNC         | SYN          | SYN          | ␖               | ^V             |          | Synchronous Idle              |
| 001 0111 |    27 |    23 | 17    | LEM          | ETB          | ETB          | ␗               | ^W             |          | End of Transmission Block     |
| 001 1000 |    30 |    24 | 18    | S0           | CAN          | CAN          | ␘               | ^X             |          | Cancel                        |
| 001 1001 |    31 |    25 | 19    | S1           | EM           | EM           | ␙               | ^Y             |          | End of Medium                 |
| 001 1010 |    32 |    26 | 1A    | S2           | SS           | SUB          | ␚               | ^Z             |          | Substitute                    |
| 001 1011 |    33 |    27 | 1B    | S3           | ESC          | ESC          | ␛               | ^[             | \e       | Escape                        |
| 001 1100 |    34 |    28 | 1C    | S4           | FS           | FS           | ␜               | ^\             |          | File Separator                |
| 001 1101 |    35 |    29 | 1D    | S5           | GS           | GS           | ␝               | ^]             |          | Group Separator               |
| 001 1110 |    36 |    30 | 1E    | S6           | RS           | RS           | ␞               | ^^             |          | Record Separator              |
| 001 1111 |    37 |    31 | 1F    | S7           | US           | US           | ␟               | ^_             |          | Unit Separator                |
| 111 1111 |   177 |   127 | 7F    | DEL          | DEL          | DEL          | ␡               | ^?             |          | Delete                        |
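The caret notation in the table (`^@`, `^A`, ..., `^?`) follows a simple bit rule: flip the bit with value 64 (`0x40`) in the control code and print the resulting character after a `^`. A minimal sketch, where the function name `caret_notation` is chosen only for this example:

```python
# Hypothetical helper: caret notation for an ASCII control code.
def caret_notation(code: int) -> str:
    if not (0 <= code < 32 or code == 127):
        raise ValueError("not an ASCII control code")
    # XOR with 0x40 flips a single bit, e.g. 1 (SOH) -> 65 ("A").
    return "^" + chr(code ^ 0x40)


for code in (0, 1, 10, 27, 127):
    print(code, caret_notation(code))
```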


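And here is the table with the remaining values, the printable characters (32 to 126). The last three columns show the character assigned in the 1963, 1965, and 1967 revisions of the standard: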


| Binary   |   Oct |   Dec | Hex   | 1963   | 1965   | 1967   |
|:---------|------:|------:|:------|:-------|:-------|:-------|
| 010 0000 |    40 |    32 | 20    | space  | space  | space  |
| 010 0001 |    41 |    33 | 21    | !      | !      | !      |
| 010 0010 |    42 |    34 | 22    | "      | "      | "      |
| 010 0011 |    43 |    35 | 23    | #      | #      | #      |
| 010 0100 |    44 |    36 | 24    | $      | $      | $      |
| 010 0101 |    45 |    37 | 25    | %      | %      | %      |
| 010 0110 |    46 |    38 | 26    | &      | &      | &      |
| 010 0111 |    47 |    39 | 27    | '      | '      | '      |
| 010 1000 |    50 |    40 | 28    | (      | (      | (      |
| 010 1001 |    51 |    41 | 29    | )      | )      | )      |
| 010 1010 |    52 |    42 | 2A    | *      | *      | *      |
| 010 1011 |    53 |    43 | 2B    | +      | +      | +      |
| 010 1100 |    54 |    44 | 2C    | ,      | ,      | ,      |
| 010 1101 |    55 |    45 | 2D    | -      | -      | -      |
| 010 1110 |    56 |    46 | 2E    | .      | .      | .      |
| 010 1111 |    57 |    47 | 2F    | /      | /      | /      |
| 011 0000 |    60 |    48 | 30    | 0      | 0      | 0      |
| 011 0001 |    61 |    49 | 31    | 1      | 1      | 1      |
| 011 0010 |    62 |    50 | 32    | 2      | 2      | 2      |
| 011 0011 |    63 |    51 | 33    | 3      | 3      | 3      |
| 011 0100 |    64 |    52 | 34    | 4      | 4      | 4      |
| 011 0101 |    65 |    53 | 35    | 5      | 5      | 5      |
| 011 0110 |    66 |    54 | 36    | 6      | 6      | 6      |
| 011 0111 |    67 |    55 | 37    | 7      | 7      | 7      |
| 011 1000 |    70 |    56 | 38    | 8      | 8      | 8      |
| 011 1001 |    71 |    57 | 39    | 9      | 9      | 9      |
| 011 1010 |    72 |    58 | 3A    | :      | :      | :      |
| 011 1011 |    73 |    59 | 3B    | ;      | ;      | ;      |
| 011 1100 |    74 |    60 | 3C    | <      | <      | <      |
| 011 1101 |    75 |    61 | 3D    | =      | =      | =      |
| 011 1110 |    76 |    62 | 3E    | >      | >      | >      |
| 011 1111 |    77 |    63 | 3F    | ?      | ?      | ?      |
| 100 0000 |   100 |    64 | 40    | @      | `      | @      |
| 100 0001 |   101 |    65 | 41    | A      | A      | A      |
| 100 0010 |   102 |    66 | 42    | B      | B      | B      |
| 100 0011 |   103 |    67 | 43    | C      | C      | C      |
| 100 0100 |   104 |    68 | 44    | D      | D      | D      |
| 100 0101 |   105 |    69 | 45    | E      | E      | E      |
| 100 0110 |   106 |    70 | 46    | F      | F      | F      |
| 100 0111 |   107 |    71 | 47    | G      | G      | G      |
| 100 1000 |   110 |    72 | 48    | H      | H      | H      |
| 100 1001 |   111 |    73 | 49    | I      | I      | I      |
| 100 1010 |   112 |    74 | 4A    | J      | J      | J      |
| 100 1011 |   113 |    75 | 4B    | K      | K      | K      |
| 100 1100 |   114 |    76 | 4C    | L      | L      | L      |
| 100 1101 |   115 |    77 | 4D    | M      | M      | M      |
| 100 1110 |   116 |    78 | 4E    | N      | N      | N      |
| 100 1111 |   117 |    79 | 4F    | O      | O      | O      |
| 101 0000 |   120 |    80 | 50    | P      | P      | P      |
| 101 0001 |   121 |    81 | 51    | Q      | Q      | Q      |
| 101 0010 |   122 |    82 | 52    | R      | R      | R      |
| 101 0011 |   123 |    83 | 53    | S      | S      | S      |
| 101 0100 |   124 |    84 | 54    | T      | T      | T      |
| 101 0101 |   125 |    85 | 55    | U      | U      | U      |
| 101 0110 |   126 |    86 | 56    | V      | V      | V      |
| 101 0111 |   127 |    87 | 57    | W      | W      | W      |
| 101 1000 |   130 |    88 | 58    | X      | X      | X      |
| 101 1001 |   131 |    89 | 59    | Y      | Y      | Y      |
| 101 1010 |   132 |    90 | 5A    | Z      | Z      | Z      |
| 101 1011 |   133 |    91 | 5B    | [      | [      | [      |
| 101 1100 |   134 |    92 | 5C    | \      | ~      | \      |
| 101 1101 |   135 |    93 | 5D    | ]      | ]      | ]      |
| 101 1110 |   136 |    94 | 5E    | ↑      | ^      | ^      |
| 101 1111 |   137 |    95 | 5F    | ←      | _      | _      |
| 110 0000 |   140 |    96 | 60    |        | @      | `      |
| 110 0001 |   141 |    97 | 61    |        | a      | a      |
| 110 0010 |   142 |    98 | 62    |        | b      | b      |
| 110 0011 |   143 |    99 | 63    |        | c      | c      |
| 110 0100 |   144 |   100 | 64    |        | d      | d      |
| 110 0101 |   145 |   101 | 65    |        | e      | e      |
| 110 0110 |   146 |   102 | 66    |        | f      | f      |
| 110 0111 |   147 |   103 | 67    |        | g      | g      |
| 110 1000 |   150 |   104 | 68    |        | h      | h      |
| 110 1001 |   151 |   105 | 69    |        | i      | i      |
| 110 1010 |   152 |   106 | 6A    |        | j      | j      |
| 110 1011 |   153 |   107 | 6B    |        | k      | k      |
| 110 1100 |   154 |   108 | 6C    |        | l      | l      |
| 110 1101 |   155 |   109 | 6D    |        | m      | m      |
| 110 1110 |   156 |   110 | 6E    |        | n      | n      |
| 110 1111 |   157 |   111 | 6F    |        | o      | o      |
| 111 0000 |   160 |   112 | 70    |        | p      | p      |
| 111 0001 |   161 |   113 | 71    |        | q      | q      |
| 111 0010 |   162 |   114 | 72    |        | r      | r      |
| 111 0011 |   163 |   115 | 73    |        | s      | s      |
| 111 0100 |   164 |   116 | 74    |        | t      | t      |
| 111 0101 |   165 |   117 | 75    |        | u      | u      |
| 111 0110 |   166 |   118 | 76    |        | v      | v      |
| 111 0111 |   167 |   119 | 77    |        | w      | w      |
| 111 1000 |   170 |   120 | 78    |        | x      | x      |
| 111 1001 |   171 |   121 | 79    |        | y      | y      |
| 111 1010 |   172 |   122 | 7A    |        | z      | z      |
| 111 1011 |   173 |   123 | 7B    |        | {      | {      |
| 111 1100 |   174 |   124 | 7C    | ACK    | ¬      | \|     |
| 111 1101 |   175 |   125 | 7D    |        | }      | }      |
| 111 1110 |   176 |   126 | 7E    | ESC    | \|     | ~      |
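The table has some regularities that are easy to check in code. In the 1967 column, each lowercase letter differs from its uppercase counterpart by a single bit (value 32, `010 0000`), and the digits `0` to `9` occupy the consecutive values 48 to 57. The sketch below illustrates both; `toggle_case` and `CASE_BIT` are names invented for this example.

```python
CASE_BIT = 0b010_0000  # 32: the only bit that differs between "A" and "a"


# Hypothetical helper: switch the case of one ASCII letter by flipping a bit.
def toggle_case(letter: str) -> str:
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("expected a single ASCII letter")
    return chr(ord(letter) ^ CASE_BIT)


print(toggle_case("A"), toggle_case("z"))

# The numeric value of a digit character is its distance from "0".
digit_value = ord("7") - ord("0")
print(digit_value)
```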


What happens if we go beyond the 128 values defined by ASCII?

In [None]:
[(i, chr(i)) for i in range(24000)]

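Above 127, `chr` no longer returns an ASCII character: it returns the character with that Unicode code point, and the result does not depend on your computer. What does depend on the system is the encoding used by default to turn characters into bytes, for example when reading or writing text files. In our case, the preferred encoding is: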

In [None]:
import locale

print(locale.getpreferredencoding())

In [None]:
chr(928)  # integer (code point) to string

In [None]:
chr(928).encode()  # string to bytes

In [None]:
b'\xce\xa0'.decode() # bytes to string

In [None]:
bytes("Π", encoding="UTF-8") # string to bytes
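The integer 928 identifies the character `Π`, while the bytes depend on the encoding chosen. As a rough illustration, the sketch below encodes the same character with three codecs from the Python standard library; the choice of codecs is just an example.

```python
text = "Π"

# The code point is a property of the character, not of any encoding.
code_point = ord(text)
print(code_point)

# The same character produces different bytes under different encodings.
encoded = {name: text.encode(name) for name in ("utf-8", "utf-16-be", "iso8859-7")}
for name, data in encoded.items():
    print(f"{name:10} {len(data)} byte(s): {data.hex()}")

# Decoding with the matching encoding always gives the character back.
round_trips = [data.decode(name) for name, data in encoded.items()]
print(round_trips)
```

Decoding with a different encoding than the one used for encoding either fails or silently produces other characters, which is why the encoding has to be known when bytes are read back as text.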

In [None]:
bytes("🍉", encoding="UTF-8")

In [None]:
# string => bytes
"🥑".encode()

In [None]:
# string => bytes => hex
'🥑'.encode().hex()

In [None]:
# string => bytes => hex => int
int('🥑'.encode().hex(), 16)

In [None]:
# string => bytes => hex => int => bits
print(f"{int('🥑'.encode().hex(), 16):08b}")

In [None]:
bin(int.from_bytes("🥑".encode(), "big"))

In [None]:
int('0b11110000100111111010010110010001', base=2)
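The chain above can also be followed byte by byte and then reversed. A minimal sketch, assuming UTF-8 and a known byte length (needed by `int.to_bytes` to rebuild the bytes from the integer):

```python
symbol = "🥑"
data = symbol.encode("utf-8")

# One group of eight bits per byte of the UTF-8 encoding.
bits = " ".join(format(byte, "08b") for byte in data)
print(bits)

# bytes -> int -> bytes -> string: the way back to the original symbol.
number = int.from_bytes(data, "big")
restored = number.to_bytes(len(data), "big").decode("utf-8")
print(restored)
```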

# Convert some symbols from Unicode to bytes and from bytes to bits

😀 😁 😂 😃 😄 😅 😆 😇 😈 😉 😊 😋 😌 😍 😎 😏
😐 😑 😒 😓 😔 😕 😖 😗 😘 😙 😚 😛 😜 😝 😞 😟
😠 😡 😢 😣 😤 😥 😦 😧 😨 😩 😪 😫 😬 😭 😮 😯
😰 😱 😲 😳 😴 😵 😶 😷 😸 😹 😺 😻 😼 😽 😾 😿
🙀 🙁 🙂 🙃 🙄 🙅 🙆 🙇 🙈 🙉 🙊 🙋 🙌 🙍 🙎 🙏
☕⛾🍅🍊🍏🥑🍔🍕🍙🍞🍣🍨🍭🍲🍷🍼🍓
☐ ☑ ☒ ✓ ⍻ ✅ ✔ 🗸 🗹
˟ ⨯ ❌ ✝ ❎ 🕇 🞡 ✚ ✞ 🞤 ⛌ ✠ ♱✟ 🕈☨
❦ ♡ ❧ ☙ ❥ ღ ❤ 💚 💛 🧡 ❤️ 🤎 💞 💕
★ ☆ ✪ ✵ ✯ ٭ ✭ ✰ 🌟 ✡ ⚝ ⚹ ✹ ✷ ⍟ ❃ ✫ ✧ ✦
👍 👎 ☝ 🖒 🖓
Δ 𝚫 𝝙 𝛥 ⍙ ⍍ ⍋
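One possible way to approach this exercise is sketched below. The helper `describe` is a made-up name, and UTF-8 is assumed as the encoding. The second example shows that a symbol that looks like one character can consist of several code points (here a heart followed by a variation selector), so it produces more bytes than expected.

```python
# Hypothetical helper: show the code points, bytes and bits of a symbol.
def describe(symbol: str) -> dict:
    data = symbol.encode("utf-8")
    return {
        "code_points": [hex(ord(ch)) for ch in symbol],
        "bytes": data.hex(" "),
        "bits": " ".join(format(byte, "08b") for byte in data),
    }


smile = describe("\U0001F600")    # 😀, a single code point
heart = describe("\u2764\ufe0f")  # ❤️, two code points

for result in (smile, heart):
    print(result)
```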
