# Table of Contents

1. serialize version
2. verack
3. look at all these nice messages!
4. let's decode addrs and try to follow them
5. Why so slow? We can only have 2 connections open at a time, and it takes ~30 seconds to get a non-empty addr message from the peer. Graph this. "In the next lesson we solve this".

### lesson #4
* threaded crawler
    * build it slowly. Arrive at class-based design.
    * copy from 3.2
* explain concurrency
    * 3.2
* Give a brief explanation of the reporter script. Invite them to run it overnight ...

# Finishing the Handshake

Take a peek at [where we left off last time](http://localhost:8888/notebooks/2.%20Reading%20Version%20Messages.ipynb#Parsing-a-complete-Version-response).

There are still 3 large problems:
1. You are given a `version` message, when you should be able to construct it yourself using whatever parameters you like.
2. After receiving our peer's `version` response, we don't listen for their `verack` response as the [version handshake](https://en.bitcoin.it/wiki/Version_Handshake) says we should.
3. We don't send our `verack` upon receipt of our peer's `verack`.

Once we fix all these problems our code will be able to join the Bitcoin peer-to-peer network just like a [Bitcoin Core](https://github.com/bitcoin/bitcoin) full node does. We won't be able to participate nearly as fully or effectively as a Bitcoin Core node, but it's a start!

Let's tackle these problems one-by-one.

In [None]:
# this loads a jupyter extension which allows us to reimport python
# files every time we edit them
%load_ext autoreload
%autoreload 2


# import the code from last time
# FIXME: these should be loaded from ibd.two.complete.
# when API stabilizes I will replace ibd.two.complete with ibd.three.complete
from ibd.three.complete import *

# import all libraries we will need
import io, os, time, socket, ipytest, pytest

# Problem #1: Constructing Version Messages

We want to be able to do something like this:

```python
services = 1  # just NODE_NETWORK ...
my_address = "7.7.7.7"
peer_address = "9.9.9.9"
version_message = VersionMessage(
    version=70015,
    services=services,
    time=time.time(),
    addr_from=my_address,
    addr_recv=peer_address,
    nonce=1234567890,
    user_agent="bitcoin-corps",
    start_height=0,
    relay=1,
)
version_packet = Packet(
    command=version_message.command, 
    payload=version_message.to_bytes()
)
packet_bytes = version_packet.to_bytes()

sock = socket.socket()
sock.connect((peer_address, 8333))
sock.send(packet_bytes)
print(Packet.from_socket(sock))
```

This would do the same exact thing as [the last cell in lesson 2](http://localhost:8888/notebooks/2.%20Reading%20Version%20Messages.ipynb#Parsing-a-complete-Version-response), but it's not hard-coded.

We are now free to send our peer whatever `version` number we like -- here we're choosing the most recent Bitcoin protocol version number. We can advertise whatever `services` we like. We can define our own custom `user_agent` designating the bitcoin implementation we're using. And we can tell them we haven't started syncing the blockchain yet: `start_height=0`.

Most of the above snippet already works. But three methods don't exist yet:
1. `VersionMessage.to_bytes()`
2. `Packet.to_bytes()`
3. `Address.to_bytes()`

These will be somewhat analogous to the `Packet.from_socket()` and `VersionMessage.from_bytes()` methods we wrote previously, but they will do exactly the inverse operations.

`Packet.from_socket` loads a Python `Packet` instance from data we receive over the wire through a Python `socket.socket` instance, whereas `Packet.to_bytes` takes a Python `Packet` instance and converts it into a `bytes` representation which we can send to our peer using `socket.send`.

`VersionMessage.from_bytes` takes the `payload` bytes of a `Packet` instance and turns them into a Python `VersionMessage` instance, and `VersionMessage.to_bytes` will take a `VersionMessage` instance and turn it into a `bytes` representation in order to include it as the `payload` of an outgoing `Packet`.

To convince you, this much already works:

In [None]:
services = 1  # just NODE_NETWORK ...
my_address = "7.7.7.7"
peer_address = "9.9.9.9"
version_message = VersionMessage(
    version=70015,
    services=services,
    time=time.time(),
    addr_from=my_address,
    addr_recv=peer_address,
    nonce=1234567890,
    user_agent="bitcoin-corps",
    start_height=0,
    relay=1,
)
print(version_message)

### A Stupid Simplified Example

Let's pretend the Bitcoin network sends us `pet` instances just like it sends us `net_addrs` and `services` etc. Here's what the corresponding table looks like in the protocol documentation:


| Field Size | Description | Data type | Comments                     |
| ---------- | ----------- | --------- | ---------------------------- |
| 3          | kind        | char[3]   | 'dog', 'cat', 'cow', or 'pig'|
| ?          | name        | var_str   | The pet's name               |

Pretty simple, right? Two attributes: a 3-character `kind` and a variable-length string `name`.

Let's say we have a class like the one below, which already has a `from_bytes` classmethod defined to instantiate `Pet` instances from serialized `bytes` we receive over the wire.

### Exercise #X - Write `Pet.to_bytes`

This would allow us to create an instance of our own `Pet`, serialize it into `bytes` and send it across the bitcoin network (cringe, I know ...).

If you look at the test, you can tell this method is correct if `pet_bytes == Pet.from_bytes(pet_bytes).to_bytes()`. That is, you should be able to turn it from bytes into a Python class and then back to bytes and have the very same bytes you started with.

Hint: use the `str_to_var_str` function from last time ...


In [None]:
class Pet:
    valid_kinds = [b"cat", b"dog", b"pig", b"cow"]
    
    def __init__(self, kind, name):
        self.kind = kind
        self.name = name
    
    @classmethod
    def from_bytes(cls, b):
        stream = io.BytesIO(b)
        kind = stream.read(3)
        name = read_var_str(stream)
        return cls(kind, name)
    
    def to_bytes(self):
        # hint: str_to_var_str
        raise NotImplementedError()

In [None]:
class Pet:
    valid_kinds = [b"cat", b"dog", b"pig", b"cow"]
    
    def __init__(self, kind, name):
        self.kind = kind
        self.name = name
    
    @classmethod
    def from_bytes(cls, b):
        stream = io.BytesIO(b)
        kind = stream.read(3)
        name = read_var_str(stream)
        return cls(kind, name)
    
    def to_bytes(self):
        return self.kind + str_to_var_str(self.name)

In [None]:


def test_pet_to_bytes():
    pet_bytes = b'pig\x05buddy'
    pet = Pet.from_bytes(pet_bytes)
    assert pet_bytes == pet.to_bytes()
    
ipytest.run_tests(doctest=True)
ipytest.clean_tests("test_pet_to_bytes*")

But how can we "serialize" the real thing? How can we turn a Python `VersionMessage` instance into a bytestring like the one we've been cheating with:

> b'\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00j\x00\x00\x00\x9b"\x8b\x9e\x7f\x11\x01\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x93AU[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00rV\xc5C\x9b:\xea\x89\x14/some-cool-software/\x01\x00\x00\x00\x01'

This demands we implement the inverse of the previously implemented `VersionMessage.from_bytes()`: we want `VersionMessage.to_bytes()`. With it we'll be able to create and serialize the network packet using `Packet(command=version_message.command, payload=version_message.to_bytes()).to_bytes()`. The result of this operation will look much like the bytestring above.

Let's get started. We'll write `Packet.to_bytes()` first, then `VersionMessage.to_bytes()` and `Address.to_bytes()`.

### Packet.to_bytes

This is what `Packet.to_bytes` will look like.

To make it work we just need to implement `int_to_bytes` and `command_to_bytes`.
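
Here is one hedged sketch of how those two helpers and the packet layout could fit together. It assumes `NETWORK_MAGIC` is the mainnet magic stored as an integer, that commands are `bytes` (as in `VersionMessage.command`), and that `compute_checksum` is the first 4 bytes of a double SHA-256. `serialize_packet` is a hypothetical stand-in for `Packet.to_bytes`, not the course's final code.

```python
import hashlib

NETWORK_MAGIC = 0xD9B4BEF9  # assumption: mainnet magic kept as an int


def int_to_bytes(n, length):
    # Most integers in the Bitcoin protocol are little-endian
    return n.to_bytes(length, "little")


def command_to_bytes(command):
    # The command field is always 12 bytes, right-padded with null bytes
    if len(command) > 12:
        raise ValueError("command must be at most 12 bytes")
    return command.ljust(12, b"\x00")


def compute_checksum(payload):
    # First 4 bytes of sha256(sha256(payload))
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def serialize_packet(command, payload):
    # magic | command | payload length | checksum | payload
    return (
        int_to_bytes(NETWORK_MAGIC, 4)
        + command_to_bytes(command)
        + int_to_bytes(len(payload), 4)
        + compute_checksum(payload)
        + payload
    )


packet_bytes = serialize_packet(b"ping", b"\x01" * 8)
print(packet_bytes[:4].hex())   # the magic as it appears on the wire
print(packet_bytes[4:16])       # the padded command
print(len(packet_bytes))        # 24-byte header plus 8-byte payload
```

The header is always 24 bytes (4 + 12 + 4 + 4), so the total length is 24 plus the payload length.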

In [None]:
def command_to_bytes(command):
    # command is a string, not bytes ...
    pass

def int_to_bytes(n, length):
    pass

class Packet:
    def __init__(self, command, payload):
        self.command = command
        self.payload = payload

    @classmethod
    def from_socket(cls, sock):
        magic = read_magic(sock)
        if magic != NETWORK_MAGIC:
            raise RuntimeError(f'Network magic "{magic}" is wrong')

        command = read_command(sock)
        payload_length = read_length(sock)
        checksum = read_checksum(sock)
        payload = read_payload(sock, payload_length)

        computed_checksum = compute_checksum(payload)
        if computed_checksum != checksum:
            raise RuntimeError("Checksums don't match")

        if payload_length != len(payload):
            raise RuntimeError(
                f"Tried to read {payload_length} bytes, only received {len(payload)} bytes"
            )

        return cls(command, payload)

    def to_bytes(self):
        result = int_to_bytes(NETWORK_MAGIC, 4)
        result += command_to_bytes(self.command)
        result += int_to_bytes(len(self.payload), 4)
        result += compute_checksum(self.payload)
        result += self.payload
        return result

    def __repr__(self):
        return f"<Message command={self.command}>"


### Version.to_bytes

define all helper functions

write unittests to test each line of to_bytes
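
As a starting point, the remaining helpers might look roughly like this. It is a sketch under assumptions: strings are passed around as `bytes`, timestamps are Unix seconds, and `int_to_var_int` is a hypothetical helper name that does not appear elsewhere in this lesson.

```python
def int_to_var_int(n):
    # Variable-length integer: small values take 1 byte, larger values get
    # a 1-byte marker followed by a little-endian integer
    if n < 0xFD:
        return n.to_bytes(1, "little")
    elif n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    elif n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    else:
        return b"\xff" + n.to_bytes(8, "little")


def str_to_var_str(s):
    # A var_str is a var_int length followed by the raw bytes
    # (assumption: s is already bytes, not str)
    return int_to_var_int(len(s)) + s


def time_to_bytes(t, length):
    # Unix timestamp as a little-endian integer; time.time() returns a
    # float, so truncate to whole seconds first
    return int(t).to_bytes(length, "little")


def bool_to_bytes(b):
    # A single byte: 1 for true, 0 for false
    return b"\x01" if b else b"\x00"


print(str_to_var_str(b"/some-cool-software/"))
print(time_to_bytes(1, 8))
print(bool_to_bytes(True))
```

The first `print` reproduces the `\x14/some-cool-software/` user agent visible near the end of the hard-coded version bytes from earlier.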

In [None]:
def time_to_bytes(t, length):
    pass

def str_to_var_str(s):
    pass

def bool_to_bytes(b):
    pass

class VersionMessage:

    command = b"version"

    def __init__(self,
        version,
        services,
        time,
        addr_recv,
        addr_from,
        nonce,
        user_agent,
        start_height,
        relay,
    ):
        self.version = version
        self.services = services
        self.time = time
        self.addr_recv = addr_recv
        self.addr_from = addr_from
        self.nonce = nonce
        self.user_agent = user_agent
        self.start_height = start_height
        self.relay = relay

    @classmethod
    def from_bytes(cls, payload):
        stream = io.BytesIO(payload)
        version = read_int(stream, 4)
        services = read_services(stream)
        time = read_time(stream)
        addr_recv = Address.from_stream(stream, version_msg=True)
        addr_from = Address.from_stream(stream, version_msg=True)
        nonce = read_int(stream, 8)
        user_agent = read_var_str(stream)
        start_height = read_int(stream, 4)
        relay = read_bool(stream)
        return cls(
            version,
            services,
            time,
            addr_recv,
            addr_from,
            nonce,
            user_agent,
            start_height,
            relay,
        )

    def to_bytes(self):
        msg = int_to_bytes(self.version, 4)
        msg += services_to_bytes(self.services)
        msg += time_to_bytes(self.time, 8)
        # FIXME: hack something up so this doesn't blow up ...
        msg += self.addr_recv.to_bytes()
        msg += self.addr_from.to_bytes()
        msg += int_to_bytes(self.nonce, 8)
        msg += str_to_var_str(self.user_agent)
        msg += int_to_bytes(self.start_height, 4)
        msg += bool_to_bytes(self.relay)
        return msg

    def __repr__(self):
        return f"<Message command={self.command}>"


### Address.to_bytes()

don't give them anything

First test implements port_to_bytes

Next test checks `Address.to_bytes` line by line / attribute by attribute ...

In [None]:
def port_to_bytes(p):
    # BIG ENDIAN!!!
    pass

class Address:
    def __init__(self, services, ip, port, time):
        self.services = services
        self.ip = ip
        self.port = port
        self.time = time

    @classmethod
    def from_bytes(cls, bytes_, version_msg=False):
        stream = io.BytesIO(bytes_)
        return cls.from_stream(stream, version_msg)

    @classmethod
    def from_stream(cls, stream, version_msg=False):
        if version_msg:
            time = None
        else:
            time = read_time(stream)
        services = read_services(stream)
        ip = read_ip(stream)
        port = read_port(stream)
        return cls(services, ip, port, time)

    def to_bytes(self, version_msg=False):
        # FIXME: don't call this msg
        msg = b""
        # FIXME: What's the right condition here
        if self.time:
            msg += time_to_bytes(self.time, 4)
        msg += services_to_bytes(self.services)
        msg += ip_to_bytes(self.ip)
        msg += port_to_bytes(self.port)
        return msg

    def __eq__(self, other):
        return self.__dict__ == other.__dict__

    def __repr__(self):
        return f"<Address {self.ip}:{self.port}>"


### Constructing and Serializing a Version Message



In [None]:
services = 1
my_ip = "7.7.7.7"
peer_ip = "6.6.6.6"
port = 8333
now = int(time.time())

# addresses in version messages don't have "time" attributes
my_address = Address(services, my_ip, port, time=None)
peer_address = Address(services, peer_ip, port, time=None)

version_message = VersionMessage(
    version=70015,
    services=services,
    time=now,
    # FIXME should we make this not in the first time this block of code appears?
    # we're going to send this message, so it's "from" us ...
    addr_from=my_address,
    # and our peer will receive it
    addr_recv=peer_address,
    nonce=73948692739875,
    user_agent=b"bitcoin-corps",
    start_height=0,
    relay=1,
)

# Problem #2: Reading Verack Messages

Verack messages don't have a payload. The only information contained in a `verack` message is a confirmation that our previously sent `version` message was received. The very existence of the message is all we need to know, so our `VerackMessage` class won't do much.

The only reason this class even needs to exist is to maintain the consistency of having a class corresponding to every kind of network message:

In [None]:
class VerackMessage:

    command = b'verack'

    @classmethod
    def from_bytes(cls, s):
        return cls()
    
    def __repr__(self):
        return "<Verack>"


Like I said, pretty simple!
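
To see just how little there is to a `verack`, here is a self-contained sketch that builds the raw bytes by hand with nothing but `hashlib`. It assumes the mainnet magic bytes; `verack_packet_bytes` is a hypothetical helper, not part of the lesson's code.

```python
import hashlib

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # assumption: mainnet


def verack_packet_bytes():
    payload = b""  # verack carries no payload at all
    command = b"verack".ljust(12, b"\x00")
    length = len(payload).to_bytes(4, "little")
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return MAINNET_MAGIC + command + length + checksum + payload


raw = verack_packet_bytes()
print(raw.hex())
print(len(raw))
```

The whole packet is just the 24-byte header: magic, the padded command, a zero length, and the checksum of an empty payload.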

# Problem #3: Generating Verack Response

To do this we just need a `VerackMessage.to_bytes()` method, just like the one we wrote for `VersionMessage`. But again, it won't do much:

In [None]:
class VerackMessage:

    command = b'verack'

    @classmethod
    def from_bytes(cls, s):
        return cls()

    def to_bytes(self):
        return b""
    
    def __repr__(self):
        return "<Verack>"


# The Handshake 

In [None]:
import socket
import time

from ibd.three.complete import *  # get the final version ...


def handshake():
    # Arguments for our outgoing VersionMessage
    services = 1
    my_ip = "7.7.7.7"
    peer_ip = "6.6.6.6"
    port = 8333
    now = int(time.time())
    my_address = Address(services, my_ip, port, time=None)
    peer_address = Address(services, peer_ip, port, time=None)

    # Create our outgoing VersionMessage and Packet instances
    version_message = VersionMessage(
        version=70015,
        services=services,
        time=now,
        addr_from=my_address,
        addr_recv=peer_address,
        nonce=73948692739875,
        user_agent=b"bitcoin-corps",
        start_height=0,
        relay=1,
    )
    version_packet = Packet(
        command=version_message.command, payload=version_message.to_bytes()
    )

    # Create the socket
    PEER_IP = "35.198.151.21"
    PEER_PORT = 8333
    sock = socket.socket()

    # Initiate TCP connection
    sock.connect((PEER_IP, PEER_PORT))

    # Initiate the Bitcoin version handshake
    sock.send(version_packet.to_bytes())

    # Receive their "version" response
    pkt = Packet.from_socket(sock)
    peer_version_message = VersionMessage.from_bytes(pkt.payload)
    print(peer_version_message)

    # Receive their "verack" response
    pkt = Packet.from_socket(sock)

    peer_verack_message = VerackMessage.from_bytes(pkt.payload)
    print(peer_verack_message)

    # Send our "verack" response
    verack_message = VerackMessage()
    verack_packet = Packet(verack_message.command, payload=verack_message.to_bytes())
    sock.send(verack_packet.to_bytes())

    return sock

In [None]:
handshake()

# What comes next?

We've successfully executed the handshake, and our `handshake` function returns a live socket ...

Why don't we just listen on the socket forever (or until the process is killed by typing "ii" or hitting the square "stop" button in the menu at the top of the screen) and see what happens?

In [None]:
sock = handshake()

while True:
    packet = Packet.from_socket(sock)
    print(packet)

# Like a Full Node

You likely received all kinds of different commands.

Some -- like `feefilter` and `sendheaders` -- are your peer telling you what kind of data they want from you. You'll see a lot of these at the beginning of the output. But as time passes, you'll mostly see `inv` messages. These are containers telling you about all kinds of new objects that your peer just found out about. If you were to decode these `inv` messages you could request the specific objects, and you'd get a bunch of `tx` and some `block` messages in return.

FIXME should we do this?

### Responding to inv messages
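
Before we could respond, we'd have to decode the `inv` payload. A rough sketch, assuming the layout from the protocol documentation: a var_int count followed by that many 36-byte entries, each a 4-byte little-endian type and a 32-byte hash. `decode_inv` and `INV_TYPES` are hypothetical names, and only the two most common type codes are mapped here.

```python
import io

INV_TYPES = {1: "tx", 2: "block"}  # other type codes exist; left as ints


def read_var_int(stream):
    # 1-byte value, or a marker byte followed by 2, 4 or 8 bytes
    prefix = stream.read(1)[0]
    if prefix < 0xFD:
        return prefix
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[prefix]
    return int.from_bytes(stream.read(size), "little")


def decode_inv(payload):
    stream = io.BytesIO(payload)
    count = read_var_int(stream)
    items = []
    for _ in range(count):
        type_code = int.from_bytes(stream.read(4), "little")
        object_hash = stream.read(32)
        # Hashes are conventionally displayed byte-reversed, as hex
        items.append((INV_TYPES.get(type_code, type_code), object_hash[::-1].hex()))
    return items


# A made-up payload announcing a single transaction
fake_hash = bytes(range(32))
fake_payload = b"\x01" + (1).to_bytes(4, "little") + fake_hash
print(decode_inv(fake_payload))
```

Requesting the announced objects would mean sending a `getdata` message back, which this sketch doesn't cover.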

### Following `addr` messages

Another thing we can do is wait until our peer shares their list of currently connected addresses, and attempt to connect to some of their peers. Such a program would basically be a "Bitcoin network crawler", and building it will in fact be the topic of lesson 4.
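
To make the idea concrete, here is a self-contained sketch of pulling `(ip, port)` pairs out of an `addr` payload using only the standard library. It assumes each entry is 30 bytes: a 4-byte timestamp, 8-byte services, a 16-byte IP (IPv4 addresses arrive mapped into IPv6), and a 2-byte big-endian port. `decode_addr` is a hypothetical helper, separate from the `Address` and `AddrMessage` classes in this lesson.

```python
import io
import ipaddress


def read_var_int(stream):
    # 1-byte value, or a marker byte followed by 2, 4 or 8 bytes
    prefix = stream.read(1)[0]
    if prefix < 0xFD:
        return prefix
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[prefix]
    return int.from_bytes(stream.read(size), "little")


def decode_addr(payload):
    stream = io.BytesIO(payload)
    count = read_var_int(stream)
    peers = []
    for _ in range(count):
        stream.read(4)  # timestamp, ignored in this sketch
        stream.read(8)  # services, ignored in this sketch
        ip = ipaddress.IPv6Address(stream.read(16))
        port = int.from_bytes(stream.read(2), "big")  # note: big-endian
        # Show IPv4-mapped addresses in familiar dotted form
        host = str(ip.ipv4_mapped or ip)
        peers.append((host, port))
    return peers


# A made-up payload containing one peer
entry = (
    (1532313491).to_bytes(4, "little")
    + (1).to_bytes(8, "little")
    + b"\x00" * 10 + b"\xff\xff" + bytes([35, 198, 151, 21])
    + (8333).to_bytes(2, "big")
)
print(decode_addr(b"\x01" + entry))
```

Returning plain `(host, port)` tuples keeps the result in the same shape that `socket.connect` expects.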

But to conclude lesson 3, let's build a simple, naive crawler and point out some deficiencies which will need to be corrected in order to realistically crawl the entire network.



In [None]:
class AddrMessage:

    command = b"addr"

    def __init__(self, addresses):
        # FIXME this is kind of a weird variable name ...
        self.addresses = addresses

    @classmethod
    def from_bytes(cls, bytes_):
        stream = io.BytesIO(bytes_)
        count = read_var_int(stream)
        address_list = []
        for _ in range(count):
            address_list.append(Address.from_stream(stream))
        return cls(address_list)

    def __repr__(self):
        return f"<AddrMessage {len(self.addresses)}>"


def simple_crawler():
    addresses = [("35.198.151.21", 8333)]
    
    print("Waiting for addr message")
    while len(addresses):
        address = addresses.pop()
        sock = handshake(address)
        packet = Packet.from_socket(sock)
        if packet.command == b"addr":
            addr_message = AddrMessage.from_bytes(packet.payload)
            if len(addr_message.addresses) == 1:
                print("Received addr message with only our peer's address. Still waiting ...")
            else:
                print(f"Received {len(addr_message.addresses)} addrs")
                addresses.extend(addr_message.addresses)
                next_address = addresses.pop()
                handshake(next_address)  # FIXME this argument isn't supported yet ...
    print("ran out of addresses. exiting.")

In [None]:
from io import BytesIO

class FakeSocket:
    
    def __init__(self, bytes_):
        self.stream = BytesIO(bytes_)
        
    def recv(self, n):
        return self.stream.read(n)

vb = b'\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00j\x00\x00\x00\x9b"\x8b\x9e\x7f\x11\x01\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x93AU[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00rV\xc5C\x9b:\xea\x89\x14/some-cool-software/\x01\x00\x00\x00\x01'

p = Packet.from_socket(FakeSocket(vb))
print(len(p.payload))