# Finishing the Handshake

Take a peek at [where we left off last time](http://localhost:8888/notebooks/2.%20Reading%20Version%20Messages.ipynb#Parsing-a-complete-Version-response).

We were able to entice a response from our peer and then decode it. However, three problems remain:
1. Our initial `version` message payload is hardcoded. We should be able to construct it using any parameters we like.
2. After receiving our peer's `version` response, we don't listen for their `verack` response as the [version handshake](https://en.bitcoin.it/wiki/Version_Handshake) says we should.
3. We don't send our `verack` upon receipt of our peer's `verack`, the final step in the handshake.

Once we fix all these problems our program will be able to join the Bitcoin peer-to-peer network just like a [Bitcoin Core](https://github.com/bitcoin/bitcoin) full node does. We won't be able to participate nearly as fully or effectively as a Bitcoin Core node, but it's a start!

The last two problems are simple fixes, since the `verack` message [is so simple](https://en.bitcoin.it/wiki/Protocol_documentation#verack).

Problem #1 will be more involved. We've spent a lot of time learning to _deserialize_ Bitcoin network messages -- to turn raw bytes into Python objects. The first problem demands we do the opposite: _serialize_ Bitcoin messages, to turn Python objects into raw bytes that can be sent over the network to our peers, who may not even have Python installed!

If you've ever done any web development I'm sure you've learned to serialize and deserialize JSON, which is the de facto data representation in this arena. Bitcoin is no different, but it uses raw bytes instead of JSON. Pretty simple.
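
To make the analogy concrete, here is a minimal sketch that round-trips the same value through JSON and through raw bytes using only the standard library. It is not part of the lesson's `ibd` code, and the `start_height` value is just an illustrative number.

```python
import json

# A value we might want to send to a peer
start_height = 500000

# JSON: serialize to text, then deserialize back
as_json = json.dumps({"start_height": start_height})
assert json.loads(as_json)["start_height"] == start_height

# Raw bytes: serialize to 4 little-endian bytes, then deserialize back
as_bytes = start_height.to_bytes(4, "little")
assert int.from_bytes(as_bytes, "little") == start_height

print(as_json)
print(as_bytes)
```

The difference is that JSON carries its own field names, while a raw-bytes format relies on both sides agreeing in advance on field order, width, and byte order.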

Let's tackle these problems one-by-one. But first, some imports to set up the notebook:

In [None]:
# this loads a jupyter extension which allows us to reimport python
# files every time we edit them
%load_ext autoreload
%autoreload 2

# import the code from last time
# FIXME: these should be loaded from ibd.two.complete.
# when API stabilizes I will replace ibd.two.complete with ibd.three.complete
from ibd.three.complete import *

# import all libraries we will need
import os, time, socket, ipytest, pytest

# Problem #1: Constructing Version Messages

We want to be able to do something like this:

```python
services = 1  # just NODE_NETWORK ...
my_address = "7.7.7.7"
peer_address = "9.9.9.9"
version_message = VersionMessage(
    version=70015,
    services=services,
    time=int(time.time()),  # whole seconds; the timestamp is serialized as an integer
    addr_from=my_address,
    addr_recv=peer_address,
    nonce=1234567890,
    user_agent=b"bitcoin-corps",
    start_height=0,
    relay=1,
)
version_packet = Packet(
    command=version_message.command, 
    payload=version_message.to_bytes()
)
packet_bytes = version_packet.to_bytes()

sock = socket.socket()
sock.connect((peer_address, 8333))
sock.send(packet_bytes)
print(Packet.from_socket(sock))
```

This would do the same exact thing as [the last cell in lesson 2](http://localhost:8888/notebooks/2.%20Reading%20Version%20Messages.ipynb#Parsing-a-complete-Version-response), but the version message we send is no longer hard-coded.

We are now free to send our peer whatever `version` number we like -- here we're choosing the most recent Bitcoin protocol version number 70015. We can advertise whatever `services` we like. We can define our own custom `user_agent` designating the Bitcoin implementation we're using: `bitcoin-corps`. And we can tell them we haven't started syncing the blockchain yet: `start_height=0`.

Most of the above snippet already works. But the two serialization methods do not:
1. `VersionMessage.to_bytes()`
2. `Packet.to_bytes()`

These will be analogous to the `Packet.from_socket()` and `VersionMessage.from_bytes()` methods we wrote previously, but they will perform exactly the inverse operations.

`Packet.from_socket` builds a Python `Packet` instance from bytes read from a socket, while `Packet.to_bytes` takes a `Packet` instance and converts it into a `bytes` representation which we can send to our peer using `socket.send`.

Similarly, `VersionMessage.from_bytes` takes the `payload` bytes of a `Packet` instance and turns them into a Python `VersionMessage` instance, and `VersionMessage.to_bytes` will take a `VersionMessage` instance and turn it into a `bytes` representation that we can include as the `payload` of an outgoing `Packet`.

### A Simplified Example

Let's pretend the Bitcoin network also has a `pet` data type, similar to the `net_addr` and `services` we dealt with previously. Here's what the corresponding table would look like in the protocol documentation:


| Field Size | Description | Data type | Comments                     |
| ---------- | ----------- | --------- | ---------------------------- |
| 3          | kind        | char[3]   | 'dog', 'cat', 'cow', or 'pig'|
| 10         | name        | char[10]  | The pet's name               |

Pretty simple, right? Two attributes: a 3-character `kind` and a 10-character `name`.

Let's say we have a class like the one below, which already has a `from_bytes` classmethod defined to instantiate `Pet` instances from serialized `bytes` we receive over the wire. This leads to our first exercise ...

### Exercise #X - Write `Pet.to_bytes`

This would allow us to create an instance of our own `Pet`, serialize it into `bytes` and send it across the Bitcoin network (cringe, I know ...).

If you look at the test, you can tell this method is correct if `pet_bytes == Pet.from_bytes(pet_bytes).to_bytes()`. That is, you should be able to turn it from bytes into a Python class and then back to bytes and have the very same bytes you started with.
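
The same round-trip property can be sketched on a different toy type, so as not to give away the exercise. `Point` below is a hypothetical class invented for illustration; it is not part of the Bitcoin protocol or the lesson's code.

```python
import io


class Point:
    """Hypothetical toy type: two 2-byte little-endian unsigned integers."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_bytes(cls, b):
        stream = io.BytesIO(b)
        x = int.from_bytes(stream.read(2), "little")
        y = int.from_bytes(stream.read(2), "little")
        return cls(x, y)

    def to_bytes(self):
        # Write the fields in the same order from_bytes reads them
        return self.x.to_bytes(2, "little") + self.y.to_bytes(2, "little")


point_bytes = b"\x01\x00\xff\x00"
round_tripped = Point.from_bytes(point_bytes).to_bytes()
assert round_tripped == point_bytes
```

If the two methods ever disagree on field order or width, this assertion is the first thing to fail.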

In [None]:
class Pet:
    valid_kinds = [b"cat", b"dog", b"pig", b"cow"]
    
    def __init__(self, kind, name):
        self.kind = kind
        self.name = name
    
    @classmethod
    def from_bytes(cls, b):
        stream = io.BytesIO(b)
        kind = stream.read(3)
        if kind not in cls.valid_kinds:
            raise RuntimeError("invalid 'kind'")
        name = stream.read(10)
        return cls(kind, name)
    
    def to_bytes(self):
        raise NotImplementedError()

In [None]:
def test_pet_to_bytes():
    pet_bytes = b'pigbuddy'
    pet = Pet.from_bytes(pet_bytes)
    assert pet_bytes == pet.to_bytes()
    
ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_pet_to_bytes*")

### Packet.to_bytes

This is what `Packet.to_bytes` will look like. It needs just two helpers: `int_to_bytes` and `encode_command`.

In [None]:
# FIXME these should be exercises
def encode_command(cmd):
    padding_needed = 12 - len(cmd)
    padding = b"\x00" * padding_needed
    return cmd + padding


def int_to_bytes(i, length, byte_order="little"):
    return int.to_bytes(i, length, byte_order)


NETWORK_MAGIC = 0xD9B4BEF9


class Packet:
    def __init__(self, command, payload):
        self.command = command
        self.payload = payload

    @classmethod
    def from_socket(cls, sock):
        magic = read_magic(sock)
        if magic != NETWORK_MAGIC:
            raise RuntimeError(f'Network magic "{magic}" is wrong')

        command = read_command(sock)
        payload_length = read_length(sock)
        checksum = read_checksum(sock)
        payload = read_payload(sock, payload_length)

        computed_checksum = compute_checksum(payload)
        if computed_checksum != checksum:
            raise RuntimeError("Checksums don't match")

        if payload_length != len(payload):
            raise RuntimeError(
                f"Tried to read {payload_length} bytes, only received {len(payload)} bytes"
            )

        return cls(command, payload)

    def to_bytes(self):
        result = int_to_bytes(NETWORK_MAGIC, 4)
        result += encode_command(self.command)
        result += int_to_bytes(len(self.payload), 4)
        result += compute_checksum(self.payload)
        result += self.payload
        return result

    def __repr__(self):
        return f"<Packet command={self.command}>"
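
As a sanity check on the layout, the sketch below rebuilds a `verack` packet using only `hashlib` and `int.to_bytes`. It assumes `compute_checksum` from the earlier lesson returns the first 4 bytes of a double SHA-256 of the payload, as the protocol documentation describes. `build_packet` and `double_sha256_checksum` are hypothetical stand-ins, not the lesson's own functions.

```python
import hashlib

NETWORK_MAGIC = 0xD9B4BEF9


def double_sha256_checksum(payload):
    # Assumed checksum rule: first 4 bytes of sha256(sha256(payload))
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def build_packet(command, payload):
    header = NETWORK_MAGIC.to_bytes(4, "little")   # 4-byte magic
    header += command.ljust(12, b"\x00")           # 12-byte, null-padded command
    header += len(payload).to_bytes(4, "little")   # 4-byte payload length
    header += double_sha256_checksum(payload)      # 4-byte checksum
    return header + payload


verack_bytes = build_packet(b"verack", b"")
print(len(verack_bytes), verack_bytes.hex())
```

With an empty payload the packet is just the 24-byte header, which makes `verack` a convenient test case for `Packet.to_bytes`.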


### Version.to_bytes

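Below is `VersionMessage` with its `to_bytes` method. It relies on a handful of helper functions; some are only stubbed out here and become the exercises that follow, each with its own unit test.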

In [None]:
def time_to_bytes(t, num_bytes):
    return int_to_bytes(t, num_bytes)

def str_to_var_str(s):
    pass

def bool_to_bytes(b):
    pass

class VersionMessage:

    command = b"version"

    def __init__(self,
        version,
        services,
        time,
        addr_recv,
        addr_from,
        nonce,
        user_agent,
        start_height,
        relay,
    ):
        self.version = version
        self.services = services
        self.time = time
        self.addr_recv = addr_recv
        self.addr_from = addr_from
        self.nonce = nonce
        self.user_agent = user_agent
        self.start_height = start_height
        self.relay = relay

    @classmethod
    def from_bytes(cls, payload):
        stream = io.BytesIO(payload)
        version = read_int(stream, 4)
        services = read_services(stream)
        time = read_time(stream)
        addr_recv = Address.from_stream(stream, version_msg=True)
        addr_from = Address.from_stream(stream, version_msg=True)
        nonce = read_int(stream, 8)
        user_agent = read_var_str(stream)
        start_height = read_int(stream, 4)
        relay = read_bool(stream)
        return cls(
            version,
            services,
            time,
            addr_recv,
            addr_from,
            nonce,
            user_agent,
            start_height,
            relay,
        )
    
    def to_bytes(self):
        msg = int_to_bytes(self.version, 4)
        msg += services_to_bytes(self.services)
        msg += time_to_bytes(self.time, 8)
        msg += self.addr_recv.to_bytes()  # TODO
        msg += self.addr_from.to_bytes()
        msg += int_to_bytes(self.nonce, 8)
        msg += str_to_var_str(self.user_agent)
        msg += int_to_bytes(self.start_height, 4)
        msg += bool_to_bytes(self.relay)
        return msg
    
    def __repr__(self):
        return f"<Message command={self.command}>"

### Exercise: implement `bool_to_bytes`

Hint: turn the argument `b` into an integer, then use `int_to_bytes` to serialize it into a 1 byte bytestring.

In [None]:
def bool_to_bytes(b): 
    raise NotImplementedError()

In [None]:
def test_bool_to_bytes():
    booleans = [
        True,
        False
    ]
    answers = [
        b'\x01',
        b'\x00',
    ]
    for boolean, answer in zip(booleans, answers):
        assert answer == bool_to_bytes(boolean)
        
ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_bool_to_bytes*")

In order to implement `str_to_var_str`, which is required to serialize the user-agent, we first need to implement `int_to_var_int` because `str_to_var_str` uses it to encode the length of the variable-length string.

### Exercise: implement `int_to_var_int`

[reference](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer)
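
For reference, here is a sketch of the decoding direction, which may help when working out the encoder. It follows the variable length integer table in the protocol documentation; `decode_var_int` is a hypothetical name, not part of the lesson's code.

```python
import io


def decode_var_int(stream):
    # The first byte is either the value itself or a marker for the width
    first = stream.read(1)[0]
    if first < 0xFD:
        return first
    # 0xfd -> 2 bytes, 0xfe -> 4 bytes, 0xff -> 8 bytes, all little-endian
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(stream.read(width), "little")


assert decode_var_int(io.BytesIO(b"\x10")) == 0x10
assert decode_var_int(io.BytesIO(b"\xfd\x00\x10")) == 0x1000
```

The encoder has to make the mirror-image decision: pick the smallest width that fits the integer, then emit the matching marker byte followed by the little-endian value.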

In [None]:
def int_to_var_int(i):
    if i < 0xfd:
        return bytes([i])
    elif i <= 0xffff:
        return b"\xfd" + "FIXME"
    elif i <= 0xffffffff:
        "FIXME"
    elif "FIXME":
        "FIXME"
    else:
        raise RuntimeError("integer too large: {}".format(i))

In [None]:
# FIXME: we should have 4 separate tests ...
# So student can make one little part pass at a time

def test_int_to_var_int():
    numbers = [
        0x10,
        0x1000,
        0x10000000,
        0x1000000000000000
    ]
    answers = [
        b'\x10',
        b'\xfd\x00\x10',
        b'\xfe\x00\x00\x00\x10',
        b'\xff\x00\x00\x00\x00\x00\x00\x00\x10',
    ]
    for number, answer in zip(numbers, answers):
        assert answer == int_to_var_int(number)

ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_int_to_var_int*")

### Exercise: Implement `str_to_var_str`

[reference](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_string)

In [None]:
def str_to_var_str(s):
    length = len(s)
    return int_to_var_int(length) + s

In [None]:
def test_str_to_var_str():
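    strings = [
        b"x" * 0x10,
        b"x" * 0x1000,
        # Larger cases are disabled: building them would allocate
        # hundreds of megabytes (or gigabytes) of memory.
        # b"x" * 0x10000000,
        # b"x" * 0x100000001,
    ]
    answers = [
        b"\x10" + strings[0],
        # 0x1000 as a 2-byte little-endian integer is b"\x00\x10"
        b"\xfd" + b"\x00\x10" + strings[1],
        # b"\xfe" + b"\x00\x00\x00\x10" + strings[2],
        # b"\xff" + b"\x01\x00\x00\x00\x01\x00\x00\x00" + strings[3],
    ]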
    for string, answer in zip(strings, answers):
        print(answer)
        print(str_to_var_str(string))
        assert answer == str_to_var_str(string)

ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_str_to_var_str*")

### Address.to_bytes()

Serializing an address needs two more helpers, `port_to_bytes` and `ip_to_bytes`. Once those work, `Address.to_bytes` can be written and tested one attribute at a time.

In [None]:
# FIXME these should be exercises

def port_to_bytes(port):
    return int_to_bytes(port, 2, byte_order="big")

IPV4_PREFIX = b"\x00" * 10 + b"\xff" * 2

def ip_to_bytes(ip):
    if ":" in ip:  # determine if address is IPv6
        return socket.inet_pton(socket.AF_INET6, ip)
    else:
        return IPV4_PREFIX + socket.inet_pton(socket.AF_INET, ip)

class Address:
    def __init__(self, services, ip, port, time):
        self.services = services
        self.ip = ip
        self.port = port
        self.time = time

    @classmethod
    def from_bytes(cls, bytes_, version_msg=False):
        stream = io.BytesIO(bytes_)
        return cls.from_stream(stream, version_msg)

    @classmethod
    def from_stream(cls, stream, version_msg=False):
        if version_msg:
            time = None
        else:
            time = read_time(stream)
        services = read_services(stream)
        ip = read_ip(stream)
        port = read_port(stream)
        return cls(services, ip, port, time)

    def to_bytes(self, version_msg=False):
        result = b""
        if self.time is not None:  # version-message addresses carry no timestamp
            result += time_to_bytes(self.time, 4)
        result += services_to_bytes(self.services)
        result += ip_to_bytes(self.ip)
        result += port_to_bytes(self.port)
        return result

    def __eq__(self, other):
        return self.__dict__ == other.__dict__

    def __repr__(self):
        return f"<Address {self.ip}:{self.port}>"
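
To see what the serialized address looks like, the sketch below builds the 26 bytes of a timestamp-free address by hand. It assumes `services_to_bytes` writes an 8-byte little-endian bitfield, which matches the protocol documentation but is not shown in this lesson.

```python
import socket

IPV4_PREFIX = b"\x00" * 10 + b"\xff" * 2

# Assumption: services is serialized as an 8-byte little-endian bitfield
services = (1).to_bytes(8, "little")
# IPv4 addresses are stored as 16-byte IPv4-mapped IPv6 addresses
ip = IPV4_PREFIX + socket.inet_pton(socket.AF_INET, "7.7.7.7")
# Unlike most fields, the port is big-endian (network byte order)
port = (8333).to_bytes(2, "big")

address_bytes = services + ip + port
print(len(address_bytes), address_bytes.hex())
```

The big-endian port is the easiest detail to get wrong, since nearly every other integer in the protocol is little-endian.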


### Constructing and Serializing a Version Message



In [None]:
from ibd.three.complete import *  # get the final version ...

services = 1
my_ip = "7.7.7.7"
peer_ip = "6.6.6.6"
port = 8333
now = int(time.time())

# addresses in version messages don't have "time" attributes
my_address = Address(services, my_ip, port, time=None)
peer_address = Address(services, peer_ip, port, time=None)

version_message = VersionMessage(
    version=70015,
    services=services,
    time=now,
    # FIXME should we make this not in the first time this block of code appears?
    # we're going to send this message, so it's "from" us ...
    addr_from=my_address,
    # and our peer will receive it
    addr_recv=peer_address,
    nonce=73948692739875,
    user_agent=b"bitcoin-corps",
    start_height=0,
    relay=1,
)

payload = version_message.to_bytes()
print("serialized version: ", payload)

packet = Packet(command=version_message.command, payload=payload)
print("serialized packet: ", packet.to_bytes())

# Problem #2: Read Verack

Verack messages [don't have a payload](https://en.bitcoin.it/wiki/Protocol_documentation#verack). Remember all that work we had to do while implementing `Packet.from_socket` and `VersionMessage.from_bytes`? We won't have to do anything of the sort while implementing our `VerackMessage` class.

For consistency's sake we're going to maintain the conventions used previously:
* class-level `command` attribute.
* `from_bytes` classmethod
* `__repr__` method, which dictates how the Verack class will be printed in [some circumstances](https://stackoverflow.com/questions/1436703/difference-between-str-and-repr).

Here's the implementation:

In [None]:
class VerackMessage:

    command = b'verack'
    
    @classmethod
    def from_bytes(cls, s):
        return cls()
    
    def __repr__(self):
        return "<Verack>"

Like I said, pretty simple!

# Problem #3: Verack Response

To send a `verack` response it would be sufficient to do the following:

```python
packet = Packet(command=b"verack", payload=b"")
sock.send(packet.to_bytes())
```

But that would be inconsistent with the convention we set in the last section of creating `Packet` instances this way:

```python
packet = Packet(
    command=some_message.command, 
    payload=some_message.to_bytes()
)
```

And from now on we should avoid hard-coding anything we don't have to. So let's implement a `VerackMessage.to_bytes()` just like we did with `VersionMessage`. This will allow us to use the same calling convention when sending either kind of message.

In [None]:
class VerackMessage:

    command = b'verack'

    @classmethod
    def from_bytes(cls, s):
        return cls()

    def to_bytes(self):
        return b""
    
    def __repr__(self):
        return "<Verack>"
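
One benefit of a shared convention is that a single helper can send any message type. The sketch below is only an illustration: `ExampleMessage`, `RecordingSocket`, `frame`, and `send_message` are all hypothetical, and `frame` is a simplified stand-in for `Packet(...).to_bytes()` that omits the magic and checksum.

```python
class ExampleMessage:
    """Hypothetical message type; any class with `command` and `to_bytes` works."""

    command = b"example"

    def to_bytes(self):
        return b"\x01\x02"


class RecordingSocket:
    """Stand-in for a real socket that just records what was sent."""

    def __init__(self):
        self.sent = b""

    def sendall(self, data):
        self.sent += data


def frame(command, payload):
    # Simplified stand-in for Packet(command, payload).to_bytes():
    # only the padded command, the payload length, and the payload
    return command.ljust(12, b"\x00") + len(payload).to_bytes(4, "little") + payload


def send_message(sock, message):
    # The same two attributes are used no matter which message we send
    sock.sendall(frame(message.command, message.to_bytes()))


sock = RecordingSocket()
send_message(sock, ExampleMessage())
print(sock.sent)
```

Because `send_message` only touches `command` and `to_bytes()`, a `VerackMessage` or `VersionMessage` instance could be passed in the same way.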


# The Handshake 

In [None]:
import socket
import time

from ibd.three.complete import *  # get the final version ...


def handshake(address):
    # Arguments for our outgoing VersionMessage
    services = 1
    my_ip = "7.7.7.7"
    peer_ip = address[0]
    port = address[1]
    now = int(time.time())
    my_address = Address(services, my_ip, port, time=None)
    peer_address = Address(services, peer_ip, port, time=None)

    # Create our outgoing VersionMessage and Packet instances
    version_message = VersionMessage(
        version=70015,
        services=services,
        time=now,
        addr_from=my_address,
        addr_recv=peer_address,
        nonce=73948692739875,
        user_agent=b"bitcoin-corps",
        start_height=0,
        relay=1,
    )
    version_packet = Packet(
        command=version_message.command, 
        payload=version_message.to_bytes()
    )
    serialized_packet = version_packet.to_bytes()

    # Create the socket
    sock = socket.socket()

    # Initiate TCP connection
    sock.connect(address)

    # Initiate the Bitcoin version handshake
    sock.send(serialized_packet)

    # Receive their "version" response
    pkt = Packet.from_socket(sock)
    peer_version_message = VersionMessage.from_bytes(pkt.payload)
    print(peer_version_message)

    # Receive their "verack" response
    pkt = Packet.from_socket(sock)
    peer_verack_message = VerackMessage.from_bytes(pkt.payload)
    print(peer_verack_message)

    # Send our "verack" response
    verack_message = VerackMessage()
    verack_packet = Packet(verack_message.command, payload=verack_message.to_bytes())
    sock.send(verack_packet.to_bytes())

    return sock

In [None]:
handshake(("35.198.151.21", 8333))

# What comes next?

We've successfully executed the handshake, and our `handshake` function returns a live socket ...

Why don't we just listen on the socket forever (or until the process is killed by typing "ii" or hitting the square "stop" button in the menu at the top of the screen) and see what happens?

In [None]:
sock = handshake(("35.198.151.21", 8333))

while True:
    packet = Packet.from_socket(sock)
    print(packet)

# Like a Full Node

You likely received all kinds of different commands.

Some -- like `feefilter` and `sendheaders` -- are your peer attempting to tell you what kind of data they want from you. You'll see a lot of these at the beginning of the output. But as time passes, you'll mostly see `inv` messages. These are containers telling you about all kinds of new objects that your peer just found out about. If you were to decode these `inv` messages you could request the specific objects, and you'd get a bunch of `tx` and some `block` messages in return.
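
As a rough sketch of what decoding an `inv` payload could involve, the code below assumes the layout given in the protocol documentation: a var_int count followed by 36-byte entries, each a 4-byte little-endian type and a 32-byte hash. `parse_inv` and `read_var_int` are hypothetical helpers written for this example, and the payload is made up.

```python
import io

# Inventory type codes from the protocol documentation
INV_TYPES = {1: "tx", 2: "block"}


def read_var_int(stream):
    # First byte is the value itself, or a marker for a wider integer
    first = stream.read(1)[0]
    if first < 0xFD:
        return first
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(stream.read(width), "little")


def parse_inv(payload):
    stream = io.BytesIO(payload)
    count = read_var_int(stream)
    items = []
    for _ in range(count):
        type_code = int.from_bytes(stream.read(4), "little")
        object_hash = stream.read(32)
        # Unknown type codes are kept as raw integers
        items.append((INV_TYPES.get(type_code, type_code), object_hash))
    return items


# A made-up payload: one "tx" entry whose hash is 32 bytes of 0xab
example_payload = b"\x01" + (1).to_bytes(4, "little") + b"\xab" * 32
print(parse_inv(example_payload))
```

Requesting the objects themselves would take a further message type, which this sketch does not cover.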

FIXME should we do this?

### Responding to inv messages

### Following `addr` messages

Another thing we can do is wait until our peer shares their list of currently connected addresses, and attempt to connect to some of their peers. Such a program would basically be a "Bitcoin network crawler", and building it will in fact be the topic of lesson 4.

But to conclude lesson 3, let's build a simple, naive crawler and point out some deficiencies which will need to be corrected in order to realistically crawl the entire network.



In [None]:
from ibd.three.complete import *

 
def simple_crawler():
    addresses = [
        ("35.198.151.21", 8333),
        ("91.221.70.137", 8333),
        ("92.255.176.109", 8333),
        ("94.199.178.17", 8333),
        ("213.250.21.112", 8333),
    ]
    while addresses:
        
        address = addresses.pop()
        print('connecting to ', address)
        sock = handshake(address)
        
        print("Waiting for addr message")
        listening = True
        while listening:
            packet = Packet.from_socket(sock)
            if packet.command == b"addr":
                addr_message = AddrMessage.from_bytes(packet.payload)
                if len(addr_message.addresses) == 1 and addr_message.addresses[0].ip == address[0]:
                    print("Received addr message with only our peer's address. Still waiting ...")
                else:
                    print(f"Received {len(addr_message.addresses)} addrs")
                    addresses.extend([(a.ip, a.port) for a in addr_message.addresses])
                    listening = False
    print("ran out of addresses. exiting.")

In [None]:
simple_crawler()