# My TCP/IP

In [2]:
import socket

## Packet socket

In order to read/write data from/to the wire, we need to open a packet socket. A packet socket is a feature of Linux which allows us to "tap into" incoming packets on an interface before they reach the OS IP stack. Incoming packets are cloned from the interface and put on to our socket.

Opening a packet socket is usually restricted by the operating system for security reasons. A user who can open a packet socket can snoop on every packet received on that network interface, including packets destined for other users! So by default packet sockets are only available to root.

On Linux, permission can be given to individual programs by setting *capabilities*. Since we are using Python, that means giving this Python interpreter the capabilities. I created this virtualenv using the `--copies` option, which makes a copy of the interpreter instead of a symlink. Then:

```sh
sudo setcap cap_net_admin,cap_net_raw=eip /path/to/venv/python
```

Now we can write a function to send arbitrary data using a packet socket. The `socket` function takes three arguments:

- socket family: we use `AF_PACKET` for a packet socket,
- socket type: we use `SOCK_RAW` which means we control the ethernet headers,
- protocol: this will filter incoming packets by protocol. We want full control so use `ETH_P_ALL` (this constant is defined in the header `net/ethernet.h` but may not be exposed by Python's `socket` module, so we define it ourselves).

In [3]:
def packet_socket(interface_name):
    ETH_P_ALL = 3
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    s.bind((interface_name, 0))
    return s
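
The `socket.htons` call is there because a packet socket expects the protocol number in network (big-endian) byte order. A minimal sketch of what the conversion does, which runs without any special privileges (the `expected` comparison is only an illustration, not part of the notebook's code):

```python
import socket
import sys

ETH_P_ALL = 3  # same value as in packet_socket above

# htons ("host to network short") converts a 16-bit integer from host byte
# order to network (big-endian) byte order.
converted = socket.htons(ETH_P_ALL)

# Equivalent formulation: write the value big-endian, read it back in host order.
expected = int.from_bytes(ETH_P_ALL.to_bytes(2, "big"), sys.byteorder)

print(sys.byteorder, converted, converted == expected)
```

On a little-endian machine the converted value is 768 (`0x0300`); on a big-endian machine it stays 3.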

But we can't just send arbitrary data; it must be a valid Ethernet frame:

In [4]:
s = packet_socket("lo")
try:
    s.send(b"HELLO")
except OSError:
    print("Invalid Ethernet frame!")
finally:
    s.close()

Invalid Ethernet frame!


We can make a simple context manager to handle closing of the socket:

In [5]:
from contextlib import contextmanager

@contextmanager
def open_packet_socket(interface_name):
    print("Opening socket.")
    sock = packet_socket(interface_name)
    try:
        yield sock
    finally:
        print("Closing socket.")
        sock.close()

In [6]:
with open_packet_socket("lo") as s:
    try:
        s.send(b"HELLO")
    except OSError:
        print("Invalid Ethernet frame!")

Opening socket.
Invalid Ethernet frame!
Closing socket.
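
As an aside, Python socket objects can also be used in a `with` statement directly, which closes them on exit. The hand-written context manager above mainly adds the tracing output. A small sketch, using an unprivileged UDP socket as a stand-in for the packet socket:

```python
import socket

# An unprivileged UDP socket stands in for the packet socket here.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    fd_while_open = s.fileno()

fd_after_close = s.fileno()  # a closed socket reports -1
print(fd_while_open >= 0, fd_after_close)
```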


## OSI model layers

1. Physical (Copper, Fibre, Radio etc.)
2. Data link (Ethernet, PPP etc.)
3. Network (IPv4, IPv6)
4. Transport (TCP, UDP)
5. Session
6. Presentation
7. Application (HTTP etc.)

## Ethernet (Layer 2)

Ethernet supports sending frames. It can be assumed that all frames sent will be received by all other interfaces on the network. All interfaces are uniquely identified by a MAC address and each frame will contain a source and destination MAC address. This way, interfaces can decide which frames are intended for them and which are not.

### MAC addresses

MAC addresses are 48 bits (6 bytes) long. They are usually formatted in hexadecimal with bytes separated by colons.

In [7]:
def encode_mac_addr(mac_address):
    return bytes.fromhex(mac_address.replace(":", ""))

In [8]:
encode_mac_addr("01:02:03:04:05:06")

b'\x01\x02\x03\x04\x05\x06'

In [9]:
def format_mac_addr(mac_address):
    return ":".join(f"{byte:02x}" for byte in mac_address)

In [10]:
format_mac_addr(b"\x01\x02\x03\x04\x05\x06")

'01:02:03:04:05:06'

### Ethernet II frame

In addition to the MAC addresses, an Ethernet II frame contains only a 2-byte *ethertype*, the payload, and a checksum. The checksum is generated automatically below our socket (typically by the network hardware), so we only need to specify the others:

In [11]:
def ethernet_frame(dst_addr, src_addr, ethertype, payload):
    return dst_addr + src_addr + ethertype + payload
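
The same 14-byte header layout can also be expressed with the standard `struct` module, which makes the field sizes explicit. This is only a sketch: `build_frame` and `parse_frame` are hypothetical names, and the ethertype is passed as an integer here rather than as two bytes.

```python
import struct

# Destination MAC (6 bytes), source MAC (6 bytes), ethertype (16-bit, big-endian)
ETHERNET_HEADER = struct.Struct("!6s6sH")

def build_frame(dst_addr, src_addr, ethertype, payload):
    """Hypothetical variant of ethernet_frame taking the ethertype as an int."""
    return ETHERNET_HEADER.pack(dst_addr, src_addr, ethertype) + payload

def parse_frame(data):
    """Split a frame back into its header fields and payload."""
    dst_addr, src_addr, ethertype = ETHERNET_HEADER.unpack_from(data)
    return dst_addr, src_addr, ethertype, data[ETHERNET_HEADER.size:]

example_frame = build_frame(bytes(6), bytes(6), 0x0801, b"Hello, world!")
print(ETHERNET_HEADER.size, len(example_frame))
print(parse_frame(example_frame))
```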

Some common ethertypes are:

In [12]:
ARP  = bytes.fromhex("08 06")
IPV4 = bytes.fromhex("08 00")
IPV6 = bytes.fromhex("86 DD")
CUST = bytes.fromhex("08 01")  # custom, not a real protocol

So we can now send a frame to the loopback interface:

In [13]:
frame = ethernet_frame(
    src_addr=encode_mac_addr("00:00:00:00:00:00"),
    dst_addr=encode_mac_addr("00:00:00:00:00:00"),
    ethertype=CUST,
    payload="Hello, world!".encode(),
)

with open_packet_socket("lo") as s:
    s.send(frame)

Opening socket.
Closing socket.


## Internet Protocol version 4

Ethernet is suitable for small to medium sized networks but it could never work on the scale of the Internet. The Internet Protocol version 4 (IPv4) is used to route packets across many different layer 1/2 networks such that any two devices in the world can communicate.

To achieve this, each device gets an IP address in addition to its MAC address. Unlike MAC addresses, which are hardwired into network interfaces, IP addresses are dynamic and only make sense in layer 3.

In addition to an IP address, each device is also configured with a subnet mask and a gateway. The subnet mask tells the host which other IP addresses are on its local network. For those on its local network, communication is performed directly over Ethernet. For hosts outside of its local network, packets are instead sent to the gateway (a router), which will forward them towards the network that contains the destination host.

For example, a host can be configured like this:

```
Address:     10.0.0.1 (or 10.0.0.1/24)
Subnet Mask: 255.255.255.0
Gateway:     10.0.0.254
```

The host `10.0.0.2` is on the same network and should be reachable on the layer 1/2 network, but the host `10.0.1.1` is on another network and will only be reachable via the gateway at `10.0.0.254`.
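
The example configuration can be checked with Python's standard `ipaddress` module. This is just an illustration of the subnet arithmetic (the helper name `is_local` is made up here); the rest of this notebook works on raw bytes instead.

```python
import ipaddress

# The example host configuration from above.
interface = ipaddress.ip_interface("10.0.0.1/24")

print(interface.netmask)   # 255.255.255.0
print(interface.network)   # 10.0.0.0/24

def is_local(address):
    """True if address is inside the host's own network."""
    return ipaddress.ip_address(address) in interface.network

print(is_local("10.0.0.2"))  # True: reachable directly
print(is_local("10.0.1.1"))  # False: must go via the gateway
```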

In [14]:
def encode_ip4_addr(ip4_addr):
    return bytes(int(s) for s in ip4_addr.split("."))

In [137]:
encode_ip4_addr("10.0.0.254")

b'\n\x00\x00\xfe'

In [138]:
encode_ip4_addr("8.8.8.8")

b'\x08\x08\x08\x08'

In [16]:
def decode_ip4_addr(ip4_addr):
    return ".".join(f"{byte:d}" for byte in ip4_addr)

In [17]:
decode_ip4_addr(b'\n\x00\x00\xfe')

'10.0.0.254'
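
For comparison, the standard library offers the same conversions as `socket.inet_pton` and `socket.inet_ntop`. A short sketch; one difference worth noting is that `inet_pton` rejects strings that are not exactly four octets, whereas the hand-written encoder above would happily return three bytes for `"10.0.0"`.

```python
import socket

packed = socket.inet_pton(socket.AF_INET, "10.0.0.254")
text = socket.inet_ntop(socket.AF_INET, packed)
print(packed, text)

try:
    socket.inet_pton(socket.AF_INET, "10.0.0")  # only three octets
    rejected = False
except OSError:
    rejected = True
print("rejected:", rejected)
```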

### Address Resolution Protocol (Layer 2)

If we somehow know the IP address of a device on our local network, we need to know its MAC address to be able to communicate. ARP is used to resolve IP addresses into MAC addresses. ARP is not specific to IPv4 or Ethernet and supports different address sizes, but we will keep it specific to IPv4 over Ethernet.

In [18]:
def arp_request(src_mac_addr, src_ip_addr, dst_ip_addr):
    hardware_type = bytes.fromhex("00 01")  # Ethernet
    protocol_type = bytes.fromhex("08 00")  # IPv4
    hardware_len  = bytes.fromhex("06")     # MAC address length
    protocol_len  = bytes.fromhex("04")     # IPv4 address length
    operation     = bytes.fromhex("00 01")  # request
    dst_mac_addr  = encode_mac_addr("00:00:00:00:00:00")  # ignored for a request
    
    return (
        hardware_type
        + protocol_type
        + hardware_len
        + protocol_len
        + operation
        + src_mac_addr
        + src_ip_addr
        + dst_mac_addr
        + dst_ip_addr
    )
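
To double-check the layout, here is a sketch of the same message built with `struct`, which makes it easy to see that an IPv4-over-Ethernet ARP message is 28 bytes long. The name `arp_request_struct` and the addresses are made-up illustration values.

```python
import struct

# htype, ptype, hlen, plen, oper, sha, spa, tha, tpa
ARP_IPV4_ETHERNET = struct.Struct("!HHBBH6s4s6s4s")

def arp_request_struct(src_mac_addr, src_ip_addr, dst_ip_addr):
    """Hypothetical struct-based equivalent of arp_request above."""
    return ARP_IPV4_ETHERNET.pack(
        1,          # hardware type: Ethernet
        0x0800,     # protocol type: IPv4
        6,          # MAC address length
        4,          # IPv4 address length
        1,          # operation: request
        src_mac_addr,
        src_ip_addr,
        bytes(6),   # target MAC, ignored for a request
        dst_ip_addr,
    )

example_request = arp_request_struct(
    bytes.fromhex("020000000001"),  # illustrative MAC address
    bytes([10, 0, 0, 1]),
    bytes([10, 0, 0, 254]),
)
print(len(example_request), example_request.hex())
```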

If we want to receive a reply to an ARP request, we must put in our real MAC address (this is technically only necessary if the network uses switches instead of hubs). We will send an ARP request to the router on this network.

In [117]:
MY_MAC_ADDR = encode_mac_addr("1c:87:2c:46:e0:47")
MY_IP_ADDR = encode_ip4_addr("192.168.1.250")
SUBNET_MASK = encode_ip4_addr("255.255.255.0")
ROUTER_IP_ADDR = encode_ip4_addr("192.168.1.1")

arp_packet = arp_request(
    src_mac_addr=MY_MAC_ADDR,
    src_ip_addr=MY_IP_ADDR,
    dst_ip_addr=ROUTER_IP_ADDR,
)

When we send this packet, we want to broadcast it to the network (since we don't know the MAC address of the router yet). To do that, the special MAC address `ff:ff:ff:ff:ff:ff` is used.

In [20]:
BROADCAST_MAC_ADDR = encode_mac_addr("ff:ff:ff:ff:ff:ff")
INTERFACE_NAME = "eno1"

with open_packet_socket(INTERFACE_NAME) as s:
    s.send(
        ethernet_frame(
            src_addr=MY_MAC_ADDR,
            dst_addr=BROADCAST_MAC_ADDR,
            ethertype=ARP,
            payload=arp_packet,
        ),
    )

Opening socket.
Closing socket.


After sending this packet to the network we would expect to receive a reply from the router telling us its MAC address. This can be stored in a table for later use.

To get the reply, we need to read from the socket. Since we will receive all packets on the interface, it is necessary to decode every ethernet frame and discard those without an ARP ethertype. We could also receive other ARP replies at any time (called announcements, or gratuitous ARP messages) so we need to check if this is indeed the reply we were waiting for.

First, a function to decode an ethernet frame:

In [21]:
from collections import namedtuple

EthernetFrame = namedtuple("EthernetFrame", "dst_addr src_addr ethertype payload")
def decode_ethernet_frame(data):
    return EthernetFrame(
        dst_addr=data[0:6],
        src_addr=data[6:12],
        ethertype=data[12:14],
        payload=data[14:],
    )

Now we can receive the first packet on the socket:

In [24]:
with open_packet_socket("eno1") as s:
    data = s.recv(65536)
    
frame = decode_ethernet_frame(data)
print("Destination", format_mac_addr(frame.dst_addr))
print("Source", format_mac_addr(frame.src_addr))
print("Ethertype", frame.ethertype)
print("Payload length:", len(frame.payload), "bytes")

Opening socket.
Closing socket.
Destination 00:1f:16:f8:37:86
Source 1c:87:2c:46:e0:47
Ethertype b'\x08\x00'
Payload length: 40 bytes


But we can never guarantee that the first packet we receive will be the ARP response we are waiting for, so we need a way to skip all other packets until it arrives. First let's see how many packets are received before the response:

In [41]:
with open_packet_socket("eno1") as s:
    s.send(
        ethernet_frame(
            src_addr=MY_MAC_ADDR,
            dst_addr=BROADCAST_MAC_ADDR,
            ethertype=ARP,
            payload=arp_packet,
        ),
    )
    packets_received = 0
    while True:
        data = s.recv(65536)
        packets_received += 1
        frame = decode_ethernet_frame(data)
        if frame.dst_addr == MY_MAC_ADDR and frame.ethertype == ARP:
            print("Packets received: ", packets_received)
            print(frame.payload)
            break

Opening socket.
Packets received:  1
b'\x00\x01\x08\x00\x06\x04\x00\x02\x00\x1f\x16\xf87\x86\xc0\xa8\x01\x01\x1c\x87,F\xe0G\xc0\xa8\x01\xfa\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Closing socket.


We need to be able to decode the ARP payload:

In [30]:
ArpPacket = namedtuple("ArpPacket", "htype ptype hlen plen oper sha spa tha tpa")
def decode_arp_packet(data):
    return ArpPacket(
        htype=data[0:2],
        ptype=data[2:4],
        hlen=data[4:5],
        plen=data[5:6],
        oper=data[6:8],
        sha=data[8:14],
        spa=data[14:18],
        tha=data[18:24],
        tpa=data[24:28],
    )
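
As a check on the field offsets, the reply captured earlier can be unpacked with `struct`. In this sketch the bytes are retyped in hex from the output above; the trailing zero bytes are assumed to be Ethernet padding, since the ARP message itself is only 28 bytes while the minimum Ethernet payload is 46 bytes.

```python
import struct

# The 28-byte ARP reply printed above, followed by zero padding.
captured = bytes.fromhex(
    "0001" "0800" "06" "04" "0002"
    "001f16f83786" "c0a80101"
    "1c872c46e047" "c0a801fa"
) + bytes(18)

htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack_from(
    "!HHBBH6s4s6s4s", captured
)

print(oper)                               # 2 means "reply"
print(":".join(f"{b:02x}" for b in sha))  # 00:1f:16:f8:37:86
print(".".join(str(b) for b in spa))      # 192.168.1.1
print(".".join(str(b) for b in tpa))      # 192.168.1.250
```

Note that the target IP address occupies the four bytes at offsets 24 to 27.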

To avoid a situation where we wait forever we can set a timeout and use the `select` function to enforce the timeout even when we are waiting for packets to arrive:

In [31]:
import select, time

def receive_arp_response(socket, sender_ip_addr, timeout=1.0):
    time_left = timeout
    while True:
        start_select = time.time()
        ready = select.select([socket], [], [], time_left)
        select_time = time.time() - start_select
        if ready[0] == []: # timeout
            return

        data = socket.recv(65536)
        frame = decode_ethernet_frame(data)
        if frame.dst_addr == MY_MAC_ADDR and frame.ethertype == ARP:
            arp_packet = decode_arp_packet(frame.payload)
            if arp_packet.spa == sender_ip_addr:
                return arp_packet
            
        time_left -= select_time
        if time_left <= 0:
            return
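
The function above only subtracts the time spent inside `select`. An alternative is to compute a deadline once with `time.monotonic()` and derive the remaining time from it on each iteration, which also accounts for time spent decoding. The sketch below shows that pattern in a generic form; `recv_matching` is a hypothetical helper, and a Unix datagram socket pair stands in for the packet socket so that it runs without privileges.

```python
import select
import socket
import time

def recv_matching(sock, predicate, timeout=1.0):
    """Return the first datagram accepted by predicate, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        time_left = deadline - time.monotonic()
        if time_left <= 0:
            return None
        readable, _, _ = select.select([sock], [], [], time_left)
        if not readable:  # select timed out
            return None
        data = sock.recv(65536)
        if predicate(data):
            return data

# Demonstration on an unprivileged datagram socket pair.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
with a, b:
    a.send(b"noise")
    a.send(b"wanted")
    found = recv_matching(b, lambda d: d == b"wanted", timeout=0.5)
    missing = recv_matching(b, lambda d: d == b"never", timeout=0.1)
print(found, missing)
```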

In [116]:
with open_packet_socket("eno1") as s:
    s.send(
        ethernet_frame(
            src_addr=MY_MAC_ADDR,
            dst_addr=BROADCAST_MAC_ADDR,
            ethertype=ARP,
            payload=arp_packet,
        ),
    )
    response = receive_arp_response(s, ROUTER_IP_ADDR, timeout=1.0)
    print(format_mac_addr(response.sha))

Opening socket.
00:1f:16:f8:37:86
Closing socket.


We now have enough to write a function to resolve an IP address into a MAC address. It is common for these resolutions to be stored in an ARP table for future use rather than looking them up via ARP each time.

In [119]:
ARP_TABLE = {}

def resolve_local_ip_addr(ip_addr):
    """Resolve an IP address on the LAN to a MAC address"""
    if ip_addr in ARP_TABLE:
        return ARP_TABLE[ip_addr]
    with open_packet_socket("eno1") as s:
        arp_packet = arp_request(
            src_mac_addr=MY_MAC_ADDR,
            src_ip_addr=MY_IP_ADDR,
            dst_ip_addr=ip_addr,
        )
        s.send(
            ethernet_frame(
                src_addr=MY_MAC_ADDR,
                dst_addr=BROADCAST_MAC_ADDR,
                ethertype=ARP,
                payload=arp_packet,
            ),
        )
        response = receive_arp_response(s, ip_addr, timeout=1.0)
        if response:
            ARP_TABLE[ip_addr] = response.sha
            return response.sha
        else:
            return None

In [122]:
mac_addr = resolve_local_ip_addr(encode_ip4_addr("192.168.1.1"))
print(format_mac_addr(mac_addr))

00:1f:16:f8:37:86


However, we can only resolve IP addresses that are on our local network segment. To send messages to other IP addresses, we need to send them to the router instead. To work out which addresses are on our segment we use the subnet mask.

In [123]:
def is_on_same_subnet(ip_addr):
    """Returns True iff ip_addr is on the same subnet as our address"""
    my_network_prefix = bytes(b1 & b2 for b1, b2 in zip(MY_IP_ADDR, SUBNET_MASK))
    other_network_prefix = bytes(b1 & b2 for b1, b2 in zip(ip_addr, SUBNET_MASK))
    return my_network_prefix == other_network_prefix

In [125]:
is_on_same_subnet(encode_ip4_addr("192.168.1.54"))

True

In [127]:
is_on_same_subnet(encode_ip4_addr("8.8.8.8"))

False

In [129]:
def resolve_ip_addr(ip_addr):
    """Resolve any IP address to a MAC address"""
    if is_on_same_subnet(ip_addr):
        return resolve_local_ip_addr(ip_addr)
    else:
        return resolve_local_ip_addr(ROUTER_IP_ADDR)

## IP (Layer 3)

Now that we can resolve IP addresses to MAC addresses we can build IP packets to send across networks.

In [139]:
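def calculate_checksum(header):
    # Internet checksum: one's-complement sum of 16-bit big-endian words
    if len(header) % 2:
        header = header + b"\x00"  # pad odd-length data with a zero byte
    words = [int.from_bytes(header[i:i+2], "big") for i in range(0, len(header), 2)]
    full_sum = sum(words)
    # fold the carries back into the low 16 bits (twice, in case of a new carry)
    overflow = full_sum >> 16
    full_sum = (full_sum&0xFFFF) + overflow
    overflow = full_sum >> 16
    full_sum = (full_sum&0xFFFF) + overflow
    return (~full_sum&0xFFFF)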
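
A worked example may help here. The sketch below reimplements the checksum under a different name (`internet_checksum`, so it runs on its own) and applies it to an illustrative 20-byte header. A useful property of this checksum is that recomputing it over a header that already contains its checksum gives zero, which is how receivers verify it.

```python
def internet_checksum(data):
    """One's-complement sum of 16-bit big-endian words, then inverted."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"  # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Illustrative 20-byte header with the checksum field (bytes 10-11) zeroed.
header = bytearray.fromhex("4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7")
checksum = internet_checksum(header)
header[10:12] = checksum.to_bytes(2, "big")

print(hex(checksum))               # 0xb861
print(internet_checksum(header))   # 0 for a header that verifies
```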

In [47]:
def ip4_packet(src_ip_addr, dst_ip_addr, protocol, payload):
    version = 4
    ihl = 5  # header length in 32-bit words
    dscp = 0
    ecn = 0
    total_length = ihl*4 + len(payload)
    identification = 0  # only important for fragmentation
    flags = 0b010  # don't fragment
    fragment_offset = 0
    ttl = 64
    
    header = bytearray(ihl*4)
    header[0] = (version<<4) + ihl
    header[1] = (dscp<<2) + ecn
    header[2:4] = total_length.to_bytes(2, "big")
    header[4:6] = identification.to_bytes(2, "big")
    header[6:8] = ((flags<<13) + fragment_offset).to_bytes(2, "big")
    header[8] = ttl
    header[9] = protocol
    header[12:16] = src_ip_addr
    header[16:20] = dst_ip_addr
    
    checksum = calculate_checksum(header)
    header[10:12] = checksum.to_bytes(2, "big")
    
    return header + payload

In [50]:
with open_packet_socket("lo") as s:
    s.send(
        ethernet_frame(
            src_addr=MY_MAC_ADDR,
            dst_addr=BROADCAST_MAC_ADDR,
            ethertype=IPV4,
            payload=ip4_packet(
                src_ip_addr=encode_ip4_addr("192.168.1.250"),
                dst_ip_addr=encode_ip4_addr("192.168.1.1"),
                protocol=0xfd,
                payload=b"Hello, world!",
            ),
        ),
    )

Opening socket.
Closing socket.


## ICMP

Internet Control Message Protocol (ICMP) carries error and diagnostic messages for IP. The famous "ping" utility uses ICMP: the client sends an echo request, and a well-behaved host receiving it will send back an echo reply.

In [84]:
def icmp_ping_request_packet(identifier, sequence_no):
    message_type = 8
    message_code = 0
    
    message = bytearray(8)
    message[0] = message_type
    message[1] = message_code
    message[4:6] = identifier.to_bytes(2, "big")
    message[6:8] = sequence_no.to_bytes(2, "big")
    message +=  "Python!!".encode("ascii")

    checksum = calculate_checksum(message)
    message[2:4] = checksum.to_bytes(2, "big")
    
    return message
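
The same message can be sketched with `struct`, which also gives a concrete checksum value to compare against. Everything here is self-contained and illustrative: `internet_checksum` and `echo_request` are stand-in names, not the functions defined above.

```python
import struct

def internet_checksum(data):
    """One's-complement sum of 16-bit big-endian words, then inverted."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def echo_request(identifier, sequence_no, payload=b"Python!!"):
    """Hypothetical struct-based variant of icmp_ping_request_packet."""
    # Type 8 (echo request), code 0; the checksum field is zero while computing.
    unchecked = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence_no) + payload
    checksum = internet_checksum(unchecked)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence_no) + payload

message = echo_request(1, 1)
print(message.hex())
print(internet_checksum(message))  # 0 when the checksum is consistent
```

For identifier 1 and sequence number 1 the request checksum works out to `0xa28c`. An echo reply differs only in its type field (0 instead of 8), which for the same identifier, sequence number and payload gives `0xaa8c`.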

In [102]:
def send_echo_request(socket, dst_ip_addr, identifier, sequence_no):
    socket.send(
        ethernet_frame(
            src_addr=MY_MAC_ADDR,
            dst_addr=resolve_ip_addr(dst_ip_addr),
            ethertype=IPV4,
            payload=ip4_packet(
                src_ip_addr=MY_IP_ADDR,
                dst_ip_addr=dst_ip_addr,
                protocol=1,
                payload=icmp_ping_request_packet(identifier, sequence_no),
            ),
        ),
    )

In order to receive the reply, we'll need corresponding functions to receive and unpack packets.

In [95]:
IcmpPacket = namedtuple("IcmpPacket", "type code checksum identifier sequence_no payload")
def decode_icmp_packet(data):
    return IcmpPacket(
        type=data[0],
        code=data[1],
        checksum=data[2:4],
        identifier=int.from_bytes(data[4:6], "big"),
        sequence_no=int.from_bytes(data[6:8], "big"),
        payload=data[8:],
    )

In [62]:
Ipv4Packet = namedtuple("Ipv4Packet", "version_ihl type length id flags_offset ttl protocol checksum src_addr dst_addr payload")
def decode_ipv4_packet(data):
    return Ipv4Packet(
        version_ihl=data[0],
        type=data[1],
        length=data[2:4],
        id=data[4:6],
        flags_offset=data[6:8],
        ttl=data[8],
        protocol=data[9],
        checksum=data[10:12],
        src_addr=data[12:16],
        dst_addr=data[16:20],
        payload=data[20:],
    )

In [108]:
def receive_echo_reply(socket, src_ip_addr, identifier, sequence_no, timeout=1.0):
    time_left = timeout
    while True:
        start_select = time.time()
        ready = select.select([socket], [], [], time_left)
        select_time = time.time() - start_select
        if ready[0] == []: # timeout
            return

        data = socket.recv(65536)
        frame = decode_ethernet_frame(data)
        if frame.dst_addr == MY_MAC_ADDR and frame.ethertype == IPV4:
            ipv4_packet = decode_ipv4_packet(frame.payload)
            if ipv4_packet.dst_addr == MY_IP_ADDR and ipv4_packet.src_addr == src_ip_addr and ipv4_packet.protocol == 1:
                icmp_packet = decode_icmp_packet(ipv4_packet.payload)
                if icmp_packet.identifier == identifier and icmp_packet.sequence_no == sequence_no:
                    return icmp_packet

        time_left -= select_time
        if time_left <= 0:
            return

In [134]:
with open_packet_socket("eno1") as s:
    start = time.time()
    send_echo_request(s, ROUTER_IP_ADDR, 1, 1)
    response = receive_echo_reply(s, ROUTER_IP_ADDR, 1, 1, timeout=1.0)
    delay = time.time() - start
    print(response)
    print(f"Delay: {delay*1000:.3f}ms")

Opening socket.
IcmpPacket(type=0, code=0, checksum=b'\xaa\x8c', identifier=1, sequence_no=1, payload=b'Python!!\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
Delay: 0.508ms
Closing socket.
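
The ten zero bytes after `Python!!` in the payload are most likely Ethernet padding: the IP packet is 20 + 16 = 36 bytes, which is below the 46-byte minimum Ethernet payload. The decoder above treats everything after byte 20 as payload, so the padding ends up in the ICMP message. A sketch of how the header's own length fields could be used to trim it (`split_ipv4` is a hypothetical helper, and the received data is constructed by hand):

```python
def split_ipv4(data):
    """Return (header, payload), using the lengths stored in the header itself."""
    header_len = (data[0] & 0x0F) * 4             # IHL counts 32-bit words
    total_len = int.from_bytes(data[2:4], "big")  # header + payload, in bytes
    return data[:header_len], data[header_len:total_len]

# Hand-built data: 20-byte header, 16-byte ICMP message, 10 bytes of padding.
header = bytearray(20)
header[0] = 0x45                        # version 4, IHL 5
header[2:4] = (36).to_bytes(2, "big")   # total length
icmp = bytes(8) + b"Python!!"
received = bytes(header) + icmp + bytes(10)

_, payload = split_ipv4(received)
print(len(received), len(payload), payload[8:])
```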


Now, because `resolve_ip_addr` falls back to the router's MAC address for destinations outside our subnet, we should be able to ping Internet hosts too!

In [140]:
GOOGLE_DNS_ADDR = encode_ip4_addr("8.8.8.8")
with open_packet_socket("eno1") as s:
    start = time.time()
    send_echo_request(s, GOOGLE_DNS_ADDR, 1, 1)
    response = receive_echo_reply(s, GOOGLE_DNS_ADDR, 1, 1, timeout=1.0)
    delay = time.time() - start
    print(response)
    print(f"Delay: {delay*1000:.3f}ms")

Opening socket.
IcmpPacket(type=0, code=0, checksum=b'\xaa\x8c', identifier=1, sequence_no=1, payload=b'Python!!\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
Delay: 11.328ms
Closing socket.
