In [1]:
%load_ext autoreload
%autoreload 2

# Finishing the Handshake

Take a peek at [where we left off last time](../2.%20Reading%20Version%20Messages/Lesson.ipynb#Putting-it-all-together).

We were able to entice a response from our peer and then interpret it. However, three problems remain:
1. Our initial `version` message payload is hardcoded. We should be able to construct it using any parameters we like.
2. After receiving our peer's `version` response, we don't listen for their `verack` response as the [version handshake](https://en.bitcoin.it/wiki/Version_Handshake) says we should.
3. We don't send our `verack` upon receipt of our peer's `verack`, the final step in the handshake.

Once we fix all these problems our program will be able to join the Bitcoin peer-to-peer network just like a [Bitcoin Core](https://github.com/bitcoin/bitcoin) full node does. We won't be able to participate nearly as fully or effectively as a Bitcoin Core node, but it's a start!

The last 2 problems are easy to fix. `verack` messages are easy to serialize and deserialize because they [have no payload](https://en.bitcoin.it/wiki/Protocol_documentation#verack). 

Problem #1 will be more involved. We've spent a lot of time learning to _deserialize_ Bitcoin network messages: to turn raw bytes into Python objects. The 1st problem demands we do the opposite: _serialize_ Bitcoin messages, to turn Python objects into raw bytes that can be sent over the network to our peers, who may not even have Python installed!

If you've ever done any web development I'm sure you've learned to serialize and deserialize JSON, which is the de facto data representation on the web. Bitcoin is no different, but it uses raw bytes instead of JSON.
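
To make the analogy concrete, here is a rough sketch (not part of the lesson's code; the `height` key is made up for illustration) of the same round trip done once with JSON and once with raw bytes:

```python
import json

# JSON: Python object -> text -> Python object
obj = {"height": 500000}
text = json.dumps(obj)
assert json.loads(text) == obj

# Raw bytes: Python int -> 4 little-endian bytes -> Python int
raw = (500000).to_bytes(4, "little")
assert int.from_bytes(raw, "little") == 500000

print(text, raw)
```

The big difference is that JSON carries its own field names, while the raw bytes mean nothing unless both sides already agree on the field's size and byte order.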

Let's tackle these problems one-by-one.

# Problem #1: Constructing Version Messages

We desire a `serialize_version_payload` function like this:

```python
version_payload = serialize_version_payload(
    version=7011,
    user_agent=b'/buidl-army/',
)
msg = serialize_message(command=b'version', payload=version_payload)
sock.send(msg)
```

`serialize_version_payload` would set default values for every parameter in a version message, and you could override those values by passing them as arguments. So, for example, it would generate a current `timestamp`. It would also have to throw in some dummy data for `receiver_address` and `sender_address`, but these aren't all that bad because node software throws out these attributes anyway.

Again, such a function would do the exact opposite of the `read_version_payload` function we wrote earlier, and `serialize_message` would do the opposite of the [`read_message`](http://localhost:8889/notebooks/2.%20Reading%20Version%20Messages/Lesson.ipynb#Housekeeping) function. Now that we've learned to do all these `bytes` -> Python conversions, hopefully the inverse Python -> `bytes` conversions will be a little easier!

These two functions would prepare the bytes of the payload, then the bytes of the whole message -- which can be sent directly over the wire via a socket to our peer.

So which should we implement first? Let's do `serialize_version_payload` because these ideas are fresher in our minds.

# `serialize_version_payload`

Below is the outline of a function which can serialize a payload dictionary -- the kind returned by `read_version_payload` last lesson -- into raw bytes that can be sent over a socket connection.

[this file](/edit/3.%20Composing%20Version%20Messages/exercises.py)

In [48]:
import time
from random import randint

ZERO = b'\x00'

dummy_address = {
    "services": 0,
    "ip": '0.0.0.0',
    "port": 8333
}

def serialize_version_payload(
        version=70015, services=0, timestamp=None,
        receiver_address=dummy_address,
        sender_address=dummy_address,
        nonce=None, user_agent=b'/buidl-army/',
        start_height=0, relay=True):
    if timestamp is None:
        timestamp = int(time.time())
    if nonce is None:
        nonce = randint(0, 2**64 - 1)  # randint is inclusive; 2**64 would not fit in 8 bytes
    # message starts empty, we add to it for every field
    msg = b''
    # version
    msg += ZERO * 4
    # services
    msg += ZERO * 8
    # timestamp
    msg += ZERO * 8
    # receiver address
    msg += ZERO * 26
    # sender address
    msg += ZERO * 26
    # nonce
    msg += ZERO * 8
    # user agent
    msg += ZERO * 1 # zero byte signifies an empty varstr
    # start height
    msg += ZERO * 4
    # relay
    msg += ZERO * 1
    return msg

In [50]:
from io import BytesIO
from lib import read_version_payload

version_payload_bytes = serialize_version_payload()
version_payload_dict = read_version_payload(BytesIO(version_payload_bytes))
print(version_payload_dict)

{'version': 0, 'services': 0, 'timestamp': 0, 'receiver_address': {'services': 0, 'ip': '0.0.0.0', 'port': 0}, 'sender_address': {'services': 0, 'ip': '0.0.0.0', 'port': 0}, 'nonce': 0, 'user_agent': b'', 'start_height': 0, 'relay': 0}


A few observations:

It has default parameters for every field in a version message.

We need the `if` statements for `timestamp` and `nonce` because the defaults for these values need to be computed when the function is actually run and not when the code is first executed -- `timestamp` should be current unix time when the function is called, `nonce` should be a different random 8-byte number every time the function is called.
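
If the reason for the `if value is None` pattern isn't obvious, here is a small sketch of what goes wrong without it. The function names are made up, and I use a counter instead of `time.time()` or `randint` so the effect is easy to see:

```python
import itertools

counter = itertools.count()

def frozen_default(value=next(counter)):
    # The default was computed exactly once, when the `def` statement ran
    return value

def fresh_default(value=None):
    # The default is computed again on every call
    if value is None:
        value = next(counter)
    return value

print(frozen_default(), frozen_default())  # the same value twice
print(fresh_default(), fresh_default())    # two different values
```

A `timestamp=int(time.time())` default would behave like `frozen_default`: every message would carry the time the function was defined, not the time it was called.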

Finally, we stick the correct number of zero bytes (`b'\x00'`) for each field as sort of a placeholder.

In this lesson we will go field-by-field through this function and replace the zero bytes with the `bytes` serialization of that field.

Let's start with the integer fields because there are so many of them and because they're simple.

# Integer Fields

`b'\x01'` is one way to encode the integer `1` as `bytes` -- in a single byte.

But that's not the only way to encode the integer `1` as `bytes` ...

`b'\x01\x00'` is another way -- in 2 bytes with little-endian order.

`b'\x00\x00\x01'` is yet another way -- in 3 bytes with big-endian order.

We can always pad the encoding with zero-bytes in higher orders-of-magnitude.
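
A quick way to convince yourself, using nothing but the built-in `int.from_bytes`, is to decode all three encodings and check that they agree. This is just an illustrative check, not lesson code:

```python
encodings = [
    (b'\x01', 'little'),          # 1 byte
    (b'\x01\x00', 'little'),      # 2 bytes, little-endian
    (b'\x00\x00\x01', 'big'),     # 3 bytes, big-endian
]
decoded = [int.from_bytes(raw, order) for raw, order in encodings]
print(decoded)

# Reading padded bytes with the wrong byte order gives a different number
wrong_order = int.from_bytes(b'\x01\x00', 'big')
print(wrong_order)
```

The padding only works if reader and writer agree on which end the zero bytes go, which is why every field in the protocol has a documented endianness.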

This ability is essential because it allows us to have fields with fixed size, and fixed field sizes are one of the ways we make sense of the endless string of bytes we receive from our peers. Also, these fixed-size fields can allow for room to grow in the future. For example, timestamps increase over time. We need to make room for them to grow indefinitely into the future.

For example, this is how you'd calculate the current unix time:

In [3]:
import time

now = int(time.time())
now

1551149436

Some `timestamp` fields in the bitcoin protocol are only 4 bytes. These require no padding:

In [51]:
now.to_bytes(4, 'little')

b'|\xa9t\\'

But eventually these fields will start overflowing. Here I calculate the highest Unix timestamp which can be serialized in only 4 bytes -- `FF FF FF FF`. Then I show the date it corresponds to -- the date these 4 byte timestamp fields will overflow.

In [64]:
from datetime import datetime

# The biggest 4-byte number: 256**4 - 1
n_bytes = b'\xff\xff\xff\xff'
n = int.from_bytes(n_bytes, 'little')
print('The biggest 4 byte Unix timestamp is', n)

overflow_date = datetime.fromtimestamp(n+1)
print('4 byte timestamp fields in the Bitcoin protocol will overflow at', overflow_date)

The biggest 4 byte Unix timestamp is 4294967295
4 byte timestamp fields in the Bitcoin protocol will overflow at 2106-02-07 00:28:16
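
Note that `datetime.fromtimestamp(n)` without a timezone prints local time, so the date above depends on the machine that ran the cell. As a sketch of a machine-independent version, you can ask for UTC explicitly:

```python
from datetime import datetime, timezone

max_4_byte = 2**32 - 1  # the largest value a 4-byte unsigned field can hold
last_moment = datetime.fromtimestamp(max_4_byte, tz=timezone.utc)
print(last_moment.isoformat())
```

This is the last second a 4-byte timestamp can represent; one second later the field wraps around.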


This is a big problem and will most likely require a hard fork in the distant future. One of Satoshi's bigger screwups for sure!

Other `timestamp` fields, however, are 8 bytes. This is much better. They currently have four bytes of padding and have lots of space to grow in the future:

In [65]:
print('Much zero-byte padding, so growing room: ')
now.to_bytes(8, 'little')

Much zero-byte padding, so growing room: 


b'|\xa9t\\\x00\x00\x00\x00'

In fact, this range goes so high that your computer probably can't even interpret the high end as a timestamp:

In [67]:
# The biggest 8-byte number: 256**8 - 1
n_bytes = b'\xff\xff\xff\xff\xff\xff\xff\xff'
n = int.from_bytes(n_bytes, 'little')
print('The biggest 8 byte Unix timestamp is', n)

# On my machine this date is so far into the future that it can't be represented
overflow_date = datetime.fromtimestamp(n+1)
print('8-byte timestamp fields in Bitcoin will overflow on', overflow_date)

The biggest 8 byte Unix timestamp is 18446744073709551615


OverflowError: timestamp out of range for platform time_t

8 byte Unix timestamp fields won't overflow for *billions* of years!

In [69]:
secs_in_year = 365 * 24 * 60 * 60

years = n // secs_in_year

print(f'8-byte timestamp fields in Bitcoin will overflow in the year {1970 + years:,d}')

8-byte timestamp fields in Bitcoin will overflow in the year 584,942,419,325


Now that's what I call breathing room!

Anyway, hopefully now you see why it's nice to have fixed-length fields with some room to grow into the future. 
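
The flip side of a fixed-length field is that a value which doesn't fit can't be written at all. Here's a small sketch using the built-in `int.to_bytes` (the variable names are just for illustration):

```python
value = 2**32  # one more than the largest 4-byte unsigned integer

try:
    value.to_bytes(4, 'little')
    fits_in_4 = True
except OverflowError:
    # to_bytes refuses to silently drop the high-order byte
    fits_in_4 = False

eight = value.to_bytes(8, 'little')
print(fits_in_4, eight)
```

So the size we pass when serializing isn't cosmetic: it decides which numbers the field can carry.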

### Exercise: `int_to_little_endian(int, size)` and `int_to_big_endian(int, size)` 

What does all this mean for our code?

In order to deal with such fields, our `int_to_little_endian` function needs to take 2 arguments -- the integer we will encode and the number of bytes our serialization should occupy.

Take a shot at little-endian serialization first:

In [70]:
def int_to_little_endian(integer, length):
    raise NotImplementedError()

In [71]:
def int_to_little_endian(integer, length):
    return integer.to_bytes(length, 'little')

In [72]:
def test_int_to_little_endian():
    integer = 22
    bytes = b'\x16\x00\x00\x00\x00\x00\x00\x00\x00\x00'
    result = int_to_little_endian(integer, 10)
    assert bytes == result, f'Correct answer: {bytes}. Your answer: {result}'
    print("Test passed!")

test_int_to_little_endian()

Test passed!


Now give big-endian serialization a try:

In [73]:
def int_to_big_endian(integer, length):
    raise NotImplementedError()

In [74]:
def int_to_big_endian(integer, length):
    return integer.to_bytes(length, 'big')

In [75]:
def test_int_to_big_endian():
    integer = 22
    bytes = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x16'
    result = int_to_big_endian(integer, 10)
    assert bytes == result, f'Correct answer: {bytes}. Your answer: {result}'
    print("Test passed!")

test_int_to_big_endian()

Test passed!


### Exercise: Update `serialize_version_payload` to serialize all integer fields using `int_to_little_endian`

`version`, `services`, `timestamp`, `nonce`, and `start_height` are the fields we're interested in here.

In [76]:
def serialize_version_payload(
        version=70015, services=0, timestamp=None,
        receiver_address=dummy_address,
        sender_address=dummy_address,
        nonce=None, user_agent=b'/buidl-army/',
        start_height=0, relay=True):
    if timestamp is None:
        timestamp = int(time.time())
    if nonce is None:
        nonce = randint(0, 2**64 - 1)
    msg = b''
    # version
    msg += ZERO * 4
    # services
    msg += ZERO * 8
    # timestamp
    msg += ZERO * 8
    # receiver address
    msg += ZERO * 26
    # sender address
    msg += ZERO * 26
    # nonce
    msg += ZERO * 8
    # user agent
    msg += ZERO * 1 # a single zero byte is the empty varstr
    # start height
    msg += ZERO * 4
    # relay
    msg += ZERO * 1
    return msg

In [77]:
def serialize_version_payload(
        version=70015, services=0, timestamp=None,
        receiver_address=dummy_address,
        sender_address=dummy_address,
        nonce=None, user_agent=b'/buidl-bootcamp/',
        start_height=0, relay=True):
    if timestamp is None:
        timestamp = int(time.time())
    if nonce is None:
        nonce = randint(0, 2**64 - 1)
    msg = b''
    # version
    msg += int_to_little_endian(version, 4)
    # services
    msg += int_to_little_endian(services, 8)
    # timestamp
    msg += int_to_little_endian(timestamp, 8)
    # receiver address
    msg += ZERO * 26
    # sender address
    msg += ZERO * 26
    # nonce
    msg += int_to_little_endian(nonce, 8)
    # user agent
    msg += ZERO * 1 # zero byte signifies an empty varstr
    # start height
    msg += int_to_little_endian(start_height, 4)
    # relay
    msg += ZERO * 1
    return msg

In [78]:
from io import BytesIO
from lib import read_version_payload

def test_serialize_version_payload_integers():
    now = int(time.time()) - 10
    version_payload_bytes = serialize_version_payload(
        services=3, nonce=4, timestamp=now, start_height=50)
    version_payload = read_version_payload(BytesIO(version_payload_bytes))
    assert version_payload['version'] == 70015
    assert version_payload['services'] == 3
    assert version_payload['timestamp'] == now
    assert version_payload['nonce'] == 4
    assert version_payload['start_height'] == 50
    print('Test passed!')
    
    
test_serialize_version_payload_integers()

Test passed!


## Services

It's all well and good that we can serialize integers. But we still don't really know how to produce the integers themselves in one case: `services`

Let's say I want to set my services bitfield to only offer support for `NODE_NETWORK` and `NODE_BLOOM`. It would currently require some tricky calculations by hand. Let's write a function that will take care of this for us:

### Exercise: `services_dict_to_int`

Here's how I would do this. I'd make a dictionary mapping `services_dict` keys to the value of that bit:

```python
    key_to_multiplier = {
        'NODE_NETWORK': 2**0,
        'NODE_GETUTXO': 2**1,
        'NODE_BLOOM': 2**2,
        'NODE_WITNESS': 2**3,
        'NODE_NETWORK_LIMITED': 2**10,
    }
```

Then I'd loop over the keys and values in `services_dict` (`for key, value in services_dict.items()`) and sum up the `services` integer with the assistance of `key_to_multiplier`.
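
Summing works here because every service owns a different power of two, so adding the values is the same as setting individual bits. A small sketch with two of the flags (the constant names mirror the dictionary keys; they aren't defined anywhere in the lesson code):

```python
NODE_NETWORK = 2**0
NODE_BLOOM = 2**2
NODE_WITNESS = 2**3

# Adding distinct powers of two is the same as OR-ing the bits together
services = NODE_NETWORK + NODE_BLOOM
assert services == NODE_NETWORK | NODE_BLOOM

# A single bit can be tested with bitwise AND
has_bloom = bool(services & NODE_BLOOM)
has_witness = bool(services & NODE_WITNESS)
print(services, has_bloom, has_witness)
```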

In [19]:
def services_dict_to_int(services_dict):
    raise NotImplementedError()

In [20]:
def services_dict_to_int(services_dict):
    key_to_multiplier = {
        'NODE_NETWORK': 2**0,
        'NODE_GETUTXO': 2**1,
        'NODE_BLOOM': 2**2,
        'NODE_WITNESS': 2**3,
        'NODE_NETWORK_LIMITED': 2**10,
    }
    services_int = 0
    for key, on_or_off in services_dict.items():
        services_int += int(on_or_off) * key_to_multiplier.get(key, 0)
    return services_int

In [21]:
def test_services_dict_to_int():
    services_dict = {
        'NODE_NETWORK': True,
        'NODE_GETUTXO': False,
        'NODE_BLOOM': True,
        'NODE_WITNESS': False,
        'NODE_CASH': True,
        'NODE_NETWORK_LIMITED': True,
    }
    answer = 1 + 4 + 1024
    result = services_dict_to_int(services_dict)
    assert answer == result,\
        f'services_dict_to_int({repr(services_dict)}) should equal {answer}, was {result}'
    print('Test passed!')
    
test_services_dict_to_int()

Test passed!


### Exercise: Update `serialize_version_payload` to take `services_dict` argument

Replace the `services` argument with a `services_dict` argument.

Then use `services_dict_to_int` within the body of the function to convert `services_dict` into an integer. Give `services_dict` a default value of the empty dictionary `{}`.

In [79]:
def serialize_version_payload(
        version=70015, services=0, timestamp=None,
        receiver_address=dummy_address,
        sender_address=dummy_address,
        nonce=None, user_agent=b'/buidl-army/',
        start_height=0, relay=True):
    raise NotImplementedError()

In [80]:
def serialize_version_payload(
        version=70015, services_dict={}, timestamp=None,
        receiver_address=dummy_address,
        sender_address=dummy_address,
        nonce=None, user_agent=b'/buidl-army/',
        start_height=0, relay=True):
    if timestamp is None:
        timestamp = int(time.time())
    if nonce is None:
        nonce = randint(0, 2**64 - 1)
    msg = b''
    # version
    msg += int_to_little_endian(version, 4)
    # services
    services = services_dict_to_int(services_dict)
    msg += int_to_little_endian(services, 8)
    # timestamp
    msg += int_to_little_endian(timestamp, 8)
    # receiver address
    msg += ZERO * 26
    # sender address
    msg += ZERO * 26
    # nonce
    msg += int_to_little_endian(nonce, 8)
    # user agent
    msg += ZERO * 1 # zero byte signifies an empty varstr
    # start height
    msg += int_to_little_endian(start_height, 4)
    # relay
    msg += ZERO * 1
    return msg

In [81]:
def test_serialize_version_payload_services_dict():
    now = int(time.time()) - 10
    version_payload = serialize_version_payload(
        services_dict={'NODE_NETWORK': True, 'NODE_BLOOM': True}, 
        nonce=4, timestamp=now, start_height=50)
    version_payload = read_version_payload(BytesIO(version_payload))
    assert version_payload['version'] == 70015
    assert version_payload['services'] == 5
    assert version_payload['timestamp'] == now
    assert version_payload['nonce'] == 4
    assert version_payload['start_height'] == 50
    print('Test passed!')
    
test_serialize_version_payload_services_dict()

Test passed!


# Boolean Fields

### Exercise: `bool_to_bytes(bool)`

Write a function that will serialize a boolean as a single byte.

In [82]:
def bool_to_bytes(bool):
    raise NotImplementedError()

In [83]:
def bool_to_bytes(bool):
    return bytes([int(bool)])

In [84]:
def test_bool_to_bytes():
    assert bool_to_bytes(True) == b'\x01', \
        f'bool_to_bytes(True) should equal b"\\x01", was {bool_to_bytes(True)}'
    assert bool_to_bytes(False) == b'\x00',\
        f'bool_to_bytes(False) should equal b"\\x00", was {bool_to_bytes(False)}'
    print('Test passed!')
    
test_bool_to_bytes()

Test passed!


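### Exercise: Update `serialize_version_payload` to serialize the `relay` field

Use `bool_to_bytes` for the last field. Note that the solution below also calls `serialize_varstr`, which we implement in the next section.

In [91]: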
def serialize_version_payload(
        version=70015, services_dict={}, timestamp=None,
        receiver_address=dummy_address,
        sender_address=dummy_address,
        nonce=None, user_agent=b'/buidl-bootcamp/',
        start_height=0, relay=True):
    raise NotImplementedError()

In [208]:
def serialize_version_payload(
        version=70015, services_dict={}, timestamp=None,
        receiver_address=dummy_address,
        sender_address=dummy_address,
        nonce=None, user_agent=b'/buidl-bootcamp/',
        start_height=0, relay=True):
    if timestamp is None:
        timestamp = int(time.time())
    if nonce is None:
        nonce = randint(0, 2**64 - 1)
    msg = b''
    # version
    msg += int_to_little_endian(version, 4)
    # services
    services = services_dict_to_int(services_dict)
    msg += int_to_little_endian(services, 8)
    # timestamp
    msg += int_to_little_endian(timestamp, 8)
    # receiver address
    msg += ZERO * 26
    # sender address
    msg += ZERO * 26
    # nonce
    msg += int_to_little_endian(nonce, 8)
    # user agent
    msg += serialize_varstr(user_agent)
    # start height
    msg += int_to_little_endian(start_height, 4)
    # relay
    msg += bool_to_bytes(relay)
    return msg

In [93]:
def test_serialize_version_payload_services_dict():
    now = int(time.time()) - 10
    version_payload = serialize_version_payload(
        services_dict={'NODE_NETWORK': True, 'NODE_BLOOM': True}, 
        nonce=4, timestamp=now, start_height=50, user_agent=b'/buidl-army/')
    version_payload = read_version_payload(BytesIO(version_payload))
    assert version_payload['version'] == 70015
    assert version_payload['services'] == 5
    assert version_payload['timestamp'] == now
    assert version_payload['nonce'] == 4
    assert version_payload['start_height'] == 50
    assert version_payload['user_agent'] == b'/buidl-army/'
    assert version_payload['relay'] is True
    print('Test passed!')
    
test_serialize_version_payload_services_dict()

Test passed!


# Variable Length Fields

Next comes the `user_agent` field of type `varstr`.

Remember how `varstr` is itself composed of 2 fields: a `varint` followed by n raw bytes, where n is the value of the `varint` field?

![image](../images/varstr.png)

Therefore `serialize_varstr` will need to call a `serialize_varint` function. So let's implement `serialize_varint` first.

### Exercise: `serialize_varint`

Recall the algorithm for varint serialization of some number `n`:

* If `n` is less than 253 (`0xfd`), return `n` as a single byte.
* Else if `n` can fit in 2 bytes (less than 256**2), return `b'\xfd'`, the marker for a 2 byte varint, followed by `n` serialized in 2 little-endian bytes.
* Else if `n` can fit in 4 bytes (less than 256**4), return `b'\xfe'`, the marker for a 4 byte varint, followed by `n` serialized in 4 little-endian bytes.
* Else if `n` can fit in 8 bytes (less than 256**8), return `b'\xff'`, the marker for an 8 byte varint, followed by `n` serialized in 8 little-endian bytes.

Try to implement this. 

If you want to play around with a finished version of the function to get a better sense for how it works you can make a new cell and import this one: `from lib import serialize_varint`

In [87]:
def serialize_varint(i):
    raise NotImplementedError()

In [88]:
def serialize_varint(i):
    if i < 0xfd:
        return bytes([i])
    elif i < 256**2:
        return b'\xfd' + int_to_little_endian(i, 2)
    elif i < 256**4:
        return b'\xfe' + int_to_little_endian(i, 4)
    elif i < 256**8:
        return b'\xff' + int_to_little_endian(i, 8)
    else:
        raise RuntimeError('integer too large: {}'.format(i))


In [89]:
def test_serialize_varint():
    from lib import serialize_varint as _serialize_varint 
    numbers = [10+256**i for i in [0, 1, 3, 7]]
    for i in numbers:
        expected = _serialize_varint(i)
        result = serialize_varint(i)
        assert expected == result,\
            f'serialize_varint({i}) should return {expected}, returned {result}'

    print('Test passed!')

test_serialize_varint()

Test passed!
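
The interesting cases are right at the boundaries between the four branches. Here is a self-contained sketch that prints them; `varint_sketch` is my own stand-in following the algorithm above, not the version in `lib`:

```python
def varint_sketch(n):
    # Same four branches as the algorithm described above
    if n < 0xfd:
        return n.to_bytes(1, 'little')
    elif n < 256**2:
        return b'\xfd' + n.to_bytes(2, 'little')
    elif n < 256**4:
        return b'\xfe' + n.to_bytes(4, 'little')
    elif n < 256**8:
        return b'\xff' + n.to_bytes(8, 'little')
    raise ValueError(f'integer too large: {n}')

# The last value of each branch and the first value of the next one
for n in [252, 253, 256**2 - 1, 256**2, 256**4 - 1, 256**4]:
    print(n, varint_sketch(n).hex())
```

Notice that 253 itself already needs the `fd` marker: the values `0xfd`, `0xfe` and `0xff` are reserved as markers, so they can never appear as one-byte varints.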


### Exercise: `serialize_varstr(bytes)`

There are two steps here: 
* Calculate the `varint` serialization of the length of the bytes
* Return this varint concatenated with the bytes themselves

In [31]:
def serialize_varstr(bytes):
    raise NotImplementedError()

In [32]:
def serialize_varstr(bytes):
    return serialize_varint(len(bytes)) + bytes

In [90]:
def test_serialize_varstr():
    from lib import serialize_varstr as _serialize_varstr 
    bytestrings = [
        b"hodl",
        b"A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network.  The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work. The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power. As long as a majority of CPU power is controlled by nodes that are not cooperating to attack the network, they'll generate the longest chain and outpace attackers. The network itself requires minimal structure. Messages are broadcast on a best effort basis, and nodes can leave and rejoin the network at will, accepting the longest proof-of-work chain as proof of what happened while they were gone.",
    ]
    
    for b in bytestrings:
        answer = _serialize_varstr(b)
        result = serialize_varstr(b)
        assert result == answer,\
            f'serialize_varstr({b}) should be {answer}, was {result}'
    
    print('Test passed!')
    
test_serialize_varstr()

Test passed!
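
To see what the length prefix buys us, here is a rough round-trip sketch. Both functions are hypothetical stand-ins written for this example (they only handle lengths below 256**2), not the `lib` versions:

```python
from io import BytesIO

def varstr_sketch(data):
    # Length prefix (1-byte or 2-byte varint only) followed by the raw bytes
    n = len(data)
    if n < 0xfd:
        prefix = bytes([n])
    elif n < 256**2:
        prefix = b'\xfd' + n.to_bytes(2, 'little')
    else:
        raise ValueError('too long for this sketch')
    return prefix + data

def read_varstr_sketch(stream):
    # Read the prefix first, then exactly that many bytes
    first = stream.read(1)[0]
    if first < 0xfd:
        n = first
    elif first == 0xfd:
        n = int.from_bytes(stream.read(2), 'little')
    else:
        raise ValueError('prefix not supported by this sketch')
    return stream.read(n)

short = varstr_sketch(b'hodl')
long_one = varstr_sketch(b'x' * 300)
print(short, long_one[:3])

# The reader stops at the right place even when more bytes follow
stream = BytesIO(short + b'more fields follow')
print(read_varstr_sketch(stream), stream.read())
```

Because the length comes first, the reader never has to guess where the string ends and the next field begins.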


### Exercise: Update `serialize_version_payload` to serialize the `user_agent` as a `varstr`

In [91]:
def serialize_version_payload(
        version=70015, services_dict={}, timestamp=None,
        receiver_address=dummy_address,
        sender_address=dummy_address,
        nonce=None, user_agent=b'/buidl-bootcamp/',
        start_height=0, relay=True):
    raise NotImplementedError()

In [92]:
def serialize_version_payload(
        version=70015, services_dict={}, timestamp=None,
        receiver_address=dummy_address,
        sender_address=dummy_address,
        nonce=None, user_agent=b'/buidl-bootcamp/',
        start_height=0, relay=True):
    if timestamp is None:
        timestamp = int(time.time())
    if nonce is None:
        nonce = randint(0, 2**64 - 1)
    msg = b''
    # version
    msg += int_to_little_endian(version, 4)
    # services
    services = services_dict_to_int(services_dict)
    msg += int_to_little_endian(services, 8)
    # timestamp
    msg += int_to_little_endian(timestamp, 8)
    # receiver address
    msg += ZERO * 26
    # sender address
    msg += ZERO * 26
    # nonce
    msg += int_to_little_endian(nonce, 8)
    # user agent
    msg += serialize_varstr(user_agent)
    # start height
    msg += int_to_little_endian(start_height, 4)
    # relay
    msg += bool_to_bytes(relay)
    return msg

In [93]:
def test_serialize_version_payload_services_dict():
    now = int(time.time()) - 10
    version_payload = serialize_version_payload(
        services_dict={'NODE_NETWORK': True, 'NODE_BLOOM': True}, 
        nonce=4, timestamp=now, start_height=50, user_agent=b'/buidl-army/')
    version_payload = read_version_payload(BytesIO(version_payload))
    assert version_payload['version'] == 70015
    assert version_payload['services'] == 5
    assert version_payload['timestamp'] == now
    assert version_payload['nonce'] == 4
    assert version_payload['start_height'] == 50
    assert version_payload['user_agent'] == b'/buidl-army/'
    print('Test passed!')
    
test_serialize_version_payload_services_dict()

Test passed!


# "Network Address" Type

![image](../images/network-address.png)

Recall that network addresses are composed of 4 fields: `timestamp`, `services`, `ip` and a `port`. The latter three are always present, but the `timestamp` is present in all circumstances except version messages -- so our serialization code will need a flag for this just like our deserialization code had.

Here's how we deserialized network addresses in the last lesson:

In [39]:
def read_address(stream, timestamp):
    r = {}
    if timestamp:
        r["timestamp"] = little_endian_to_int(stream.read(4))
    r["services"] = little_endian_to_int(stream.read(8))
    r["ip"] = bytes_to_ip(stream.read(16))
    r["port"] = big_endian_to_int(stream.read(2))
    return r

This takes a byte stream and a boolean `timestamp` flag (our serializer below calls it `has_timestamp`) and returns a dictionary containing all the information about this address. Once again, we need to do the opposite.

Honestly, this lesson is running a little long and this code won't be terribly useful to us. I think you get the idea. I'm just going to give you this one:

In [40]:
import socket

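# The standard IPv4-mapped IPv6 prefix is 10 zero bytes followed by b"\xff\xff".
# This lesson uses 12 zero bytes instead, which is why the all-zero placeholder
# addresses we parsed earlier came back as '0.0.0.0'.
IPV4_PREFIX = b"\x00" * 10 + b"\x00" * 2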

def ip_to_bytes(ip):
    if ":" in ip:
        return socket.inet_pton(socket.AF_INET6, ip)
    else:
        return IPV4_PREFIX + socket.inet_pton(socket.AF_INET, ip)

def serialize_address(address, has_timestamp):
    result = b""
    if has_timestamp:
        result += int_to_little_endian(address['timestamp'], 4)  # read_address reads 4 bytes here
    result += int_to_little_endian(address['services'], 8)
    result += ip_to_bytes(address['ip'])
    result += int_to_big_endian(address['port'], 2)
    return result
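
One detail worth pausing on: the `port` is the only big-endian field in the address, while `services` is little-endian like everything else. Here is a self-contained sketch of the 26-byte layout used in version messages. I use the standard library's `ipaddress` module and an IPv6 address purely for illustration; this is not the lesson's `serialize_address`:

```python
import ipaddress

services = (1).to_bytes(8, 'little')             # 8 bytes, little-endian
ip = ipaddress.IPv6Address('::1').packed         # 16 raw bytes
port = (8333).to_bytes(2, 'big')                 # 2 bytes, big-endian (network byte order)

address = services + ip + port
print(len(address), port.hex())

# The fixed sizes let us slice each field back out by position
assert address[0:8] == services
assert address[8:24] == ip
assert address[24:26] == port
```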

### Exercise: Update `serialize_version_payload` to serialize network addresses

Remember, in the context of the version message the `has_timestamp` flag should be set to `False`.

In [94]:
def serialize_version_payload(
        version=70015, services_dict={}, timestamp=None,
        receiver_address=dummy_address,
        sender_address=dummy_address,
        nonce=None, user_agent=b'/buidl-bootcamp/',
        start_height=0, relay=True):
    raise NotImplementedError()

In [209]:
def serialize_version_payload(
        version=70015, services_dict={}, timestamp=None,
        receiver_address=dummy_address,
        sender_address=dummy_address,
        nonce=None, user_agent=b'/buidl-bootcamp/',
        start_height=0, relay=True):
    if timestamp is None:
        timestamp = int(time.time())
    if nonce is None:
        nonce = randint(0, 2**64 - 1)
    msg = b''
    # version
    msg += int_to_little_endian(version, 4)
    # services
    services = services_dict_to_int(services_dict)
    msg += int_to_little_endian(services, 8)
    # timestamp
    msg += int_to_little_endian(timestamp, 8)
    # receiver address
    msg += serialize_address(receiver_address, has_timestamp=False)
    # sender address
    msg += serialize_address(sender_address, has_timestamp=False)
    # nonce
    msg += int_to_little_endian(nonce, 8)
    # user agent
    msg += serialize_varstr(user_agent)
    # start height
    msg += int_to_little_endian(start_height, 4)
    # relay
    msg += bool_to_bytes(relay)
    return msg

In [96]:
def test_serialize_version_payload_services_dict():
    now = int(time.time()) - 10
    version_payload = serialize_version_payload(
        services_dict={'NODE_NETWORK': True, 'NODE_BLOOM': True}, 
        nonce=4, timestamp=now, start_height=50, user_agent=b'/buidl-army/')
    version_payload = read_version_payload(BytesIO(version_payload))
    assert version_payload['version'] == 70015
    assert version_payload['services'] == 5
    assert version_payload['timestamp'] == now
    assert version_payload['receiver_address'] == dummy_address
    assert version_payload['sender_address'] == dummy_address
    assert version_payload['nonce'] == 4
    assert version_payload['start_height'] == 50
    assert version_payload['user_agent'] == b'/buidl-army/'

    print('Test passed!')
    
test_serialize_version_payload_services_dict()

Test passed!


# Serializing Messages

We can completely serialize version message payloads. Yay!

Now we just need to serialize the outer message structure itself. This will just be reversing the `read_message` function we created in the first lesson:

In [104]:
# To refresh your memory ...

def read_message(stream):
    msg = {}
    magic = stream.read(4)
    if magic != NETWORK_MAGIC:
        raise Exception(f'Magic is wrong: {magic}')
    msg['command'] = stream.read(12).strip(b'\x00')
    payload_length = int.from_bytes(stream.read(4), 'little')
    checksum = stream.read(4)
    raw_payload = stream.read(payload_length)
    calculated_checksum = double_sha256(raw_payload)[:4]
    if calculated_checksum != checksum:
        raise Exception('Checksum does not match')
    # FIXME: read_payload didn't exist last lesson ...
    msg['payload'] = read_payload(msg['command'], BytesIO(raw_payload))
    return msg

Here's an outline of how an inverse `serialize_message` function might work:

In [105]:
def serialize_message(command, payload):
    result = b'magic'
    result += b'command'
    result += b'payload length'
    result += b'checksum'
    result += b'payload'
    return result

First, fill in the network magic. Try not to look it up in the previous lesson code. Pretend that doesn't exist. Instead go to the wiki and figure out how to interpret what the wiki says into python.

In [106]:
def serialize_message(command, payload):
    result = b'\xf9\xbe\xb4\xd9'
    result += b'command'
    result += b'payload length'
    result += b'checksum'
    result += b'payload'
    return result

In [107]:
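def test_network_magic():
    nm = bytes([249, 190, 180, 217])
    # Every serialized message should start with the 4 network magic bytes
    assert serialize_message(b'verack', b'')[:4] == nm

test_network_magic()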

Next, encode the command. Remember that it needs to be a fixed length (look up the exact length in the wiki) and you should right-pad with zero bytes (`b'\x00'`) until that length is reached.

In [108]:
def serialize_message(command, payload):
    result = b'\xf9\xbe\xb4\xd9'
    result += command + b'\x00' * (12 - len(command))
    result += b'payload length'
    result += b'checksum'
    result += b'payload'
    return result
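
If the padding arithmetic looks fiddly, the built-in `bytes.ljust` does the same job. A small sketch showing that the two are equivalent, and that `strip` (which `read_message` uses) undoes the padding:

```python
command = b'version'

# Right-pad with zero bytes up to 12 bytes
padded = command.ljust(12, b'\x00')
assert padded == command + b'\x00' * (12 - len(command))
assert len(padded) == 12

# read_message reverses this by stripping the zero bytes
recovered = padded.strip(b'\x00')
print(padded, recovered)
```

Neither approach truncates, so a command longer than 12 bytes would silently produce an over-long field. Real commands are all short enough that this doesn't come up here.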

After this we need to encode the payload length. 

Hints: 
* the `payload` argument is a dictionary. To get the payloads you need to call `serialize_payload` on that payload dictionary. 
    * FIXME this function doesn't exist in this notebook
    * we could define it -- but kinda pointless since we only know about 1 message type at this point ...
* You'll want to take the `len` of the serialized payload bytes, then encode it as the correct number of bytes (check the wiki for the field size). Make sure you consider endianness!

In [None]:
def serialize_empty_payload(**kwargs):
    return b""


def serialize_payload(command, payload):
    command_to_handler = {
        b"version": serialize_version_payload,
        b"verack": serialize_empty_payload,
    }
    handler = command_to_handler[command]
    return handler(**payload)

In [121]:
def serialize_message(command, payload):
    result = b'\xf9\xbe\xb4\xd9'
    result += command + b'\x00' * (12 - len(command))
    result += int_to_little_endian(len(payload), 4)
    result += b'checksum'
    result += b'payload'
    return result
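
For reference, here is a sketch of what `int_to_little_endian` is assumed to do. I'm guessing it behaves like `int.to_bytes` with little-endian byte order; the real helper may be written differently.

```python
def int_to_little_endian(integer, length):
    # Assumed behavior of the helper used above: least significant byte first.
    return integer.to_bytes(length, "little")


# A payload of 106 bytes, encoded into the 4-byte length field
encoded = int_to_little_endian(106, 4)
print(encoded)  # b'j\x00\x00\x00'

# Reading it back is what read_message does with int.from_bytes
decoded = int.from_bytes(encoded, "little")
print(decoded)  # 106
```

The sample `version` message used later in this lesson has exactly these four bytes (`j\x00\x00\x00`) right after its padded command.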

Second to last comes the checksum. 

I suggest you get the test to pass using the `compute_checksum` function I imported for you.

Once you get that working, try to implement `compute_checksum` yourself. It's good practice to remember how it works!

In [149]:
import hashlib
from lib import compute_checksum

# Once you get this working with the imported `compute_checksum`,
# try implementing it yourself here:
# def compute_checksum(payload_bytes):
#     raise NotImplementedError()

# One possible solution (double SHA-256, keep the first 4 bytes):
# def compute_checksum(payload_bytes):
#     first_round = hashlib.sha256(payload_bytes).digest()
#     second_round = hashlib.sha256(first_round).digest()
#     return second_round[:4]

def serialize_message(command, payload):
    result = b'\xf9\xbe\xb4\xd9'
    result += command +  b'\x00' * (12 - len(command))
    result += int_to_little_endian(len(payload), 4)
    result += compute_checksum(payload)
    result += b'payload'
    return result

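Lastly, append the payload itself. Giving `payload` a default of `b''` lets us serialize payload-free messages like `verack` without passing a payload at all.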

In [167]:
def serialize_message(command, payload=b''):
    result = b'\xf9\xbe\xb4\xd9'
    result += command + b'\x00' * (12 - len(command))
    result += int_to_little_endian(len(payload), 4)
    result += compute_checksum(payload)
    result += payload
    return result

# Verack

In order to complete the [version handshake](https://en.bitcoin.it/wiki/Version_Handshake), we also need to read and write verack messages.

This is going to be easy, though -- verack messages don't contain any payload.

To read a verack message it suffices to call `read_message(stream)`, which will return `{"command": b"verack", "payload": b""}`. There's nothing left to do at that point.

To serialize a verack message, we simply call `serialize_message(command=b"verack")`. This produces raw bytes that can be sent over a socket. We don't pass in anything for `payload` because veracks don't have a payload, and the `payload` parameter of `serialize_message` defaults to empty bytes (`b''`).
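
As a sanity check, here is a self-contained sketch that rebuilds the final `serialize_message` from above and prints the bytes of a verack. The `compute_checksum` here is my own stand-in for the one imported from `lib` (assumed to be double SHA-256 truncated to 4 bytes), and `len(payload).to_bytes(4, "little")` stands in for `int_to_little_endian`.

```python
import hashlib

NETWORK_MAGIC = b"\xf9\xbe\xb4\xd9"


def compute_checksum(payload):
    # Stand-in for lib.compute_checksum: first 4 bytes of double SHA-256
    first_round = hashlib.sha256(payload).digest()
    return hashlib.sha256(first_round).digest()[:4]


def serialize_message(command, payload=b""):
    result = NETWORK_MAGIC
    result += command + b"\x00" * (12 - len(command))
    result += len(payload).to_bytes(4, "little")
    result += compute_checksum(payload)
    result += payload
    return result


verack = serialize_message(command=b"verack")
print(len(verack))   # 24: a header and nothing else
print(verack.hex())  # f9beb4d976657261636b000000000000000000005df6e0e2
```

Since the payload is empty, the checksum is always the same four bytes (`5df6e0e2`), so every mainnet verack is byte-for-byte identical.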

With this out of the way, it's time for the main event ...

# The Handshake

We're finally ready. Fill out this function, which will connect to a Bitcoin peer and perform the version handshake with them:

In [210]:
import socket
from pprint import pprint

def handshake(address):
    sock = socket.create_connection(address, timeout=1)
    stream = sock.makefile("rb")

    # Step 1: our version message
    sock.sendall("OUR VERSION MESSAGE")
    print("Sent version")

    # Step 2: their version message
    peer_version = "READ THEIR VERSION MESSAGE HERE"
    print("Version: ")
    pprint(peer_version)

    # Step 3: their verack message
    peer_verack = "READ THEIR VERACK MESSAGE HERE"
    print("Verack: ", peer_verack)

    # Step 4: our verack
    sock.sendall("OUR VERACK HERE")
    print("Sent verack")

    return sock, stream

In [157]:
V = b'\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00j\x00\x00\x00\x9b"\x8b\x9e\x7f\x11\x01\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x93AU[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00rV\xc5C\x9b:\xea\x89\x14/some-cool-software/\x01\x00\x00\x00\x01'


# The message header is 24 bytes; the version payload follows it
print(read_version_payload(BytesIO(V[24:])))

In [221]:
from lib import read_message

def handshake(address):
    sock = socket.create_connection(address)
    stream = sock.makefile("rb")

    # Step 1: our version message
    version_payload = serialize_version_payload(user_agent=b'/finally/')
    our_version = serialize_message(command=b"version", 
                                    payload=version_payload)

    sock.sendall(our_version)
    print("Sent version")

    # Step 2: their version message
    peer_version = read_message(stream)
    print("Version: ")
    pprint(read_version_payload(BytesIO(peer_version['payload'])))

    # Step 3: their verack message
    peer_verack = read_message(stream)
    print("Verack: ", peer_verack)

    # Step 4: our verack
    our_verack = serialize_message(command=b"verack")
    sock.sendall(our_verack)
    print("Sent verack")

    return sock, stream

In [222]:
ADDRESS = ("46.19.137.74", 8333)

sock, stream = handshake(ADDRESS)

Sent version
Version: 
{'nonce': 7623716393377179586,
 'receiver_address': {'ip': '::ffff:104.5.61.4', 'port': 57134, 'services': 0},
 'relay': 1,
 'sender_address': {'ip': '0.0.0.0', 'port': 0, 'services': 1037},
 'services': 1037,
 'start_height': 564914,
 'timestamp': 1551293522,
 'user_agent': b'/Satoshi:0.17.1(LearnMeABitcoin)/',
 'version': 70015}
Verack:  {'command': b'verack', 'payload': b''}
Sent verack


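There's no unit test accompanying this exercise. You'll know it's working if it prints something like the output above: "Sent version", the peer's `version` payload, the peer's `verack`, and finally "Sent verack".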

Another, even more fun way to test it is to write a function that listens on the stream returned by `handshake` and prints out every message received:

In [215]:
from lib import read_message

def listen(address):
    sock, stream = handshake(address)
    while True:
        message = read_message(stream)
        print(f'Received message "{message["command"]}"')

In [216]:
listen(ADDRESS)

Sent version
Version: 
{'command': b'version',
 'payload': {'nonce': 112871403585732136,
             'receiver_address': {'ip': '::ffff:104.5.61.4',
                                  'port': 45710,
                                  'services': 0},
             'relay': 1,
             'sender_address': {'ip': '0.0.0.0', 'port': 0, 'services': 1037},
             'services': 1037,
             'start_height': 564911,
             'timestamp': 1551292131,
             'user_agent': b'/Satoshi:0.17.1(LearnMeABitcoin)/',
             'version': 70015}}
Verack:  {'command': b'verack', 'payload': {}}
Sent verack
Received message "b'sendheaders'"
Received message "b'sendcmpct'"
Received message "b'sendcmpct'"
Received message "b'ping'"
Received message "b'addr'"
Received message "b'inv'"
Received message "b'inv'"
Received message "b'inv'"
Received message "b'inv'"
Received message "b'inv'"
Received message "b'inv'"
Received message "b'inv'"
Received message "b'inv'"
Received message "b'inv'"

Exception: Magic is wrong: b''
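
The run above ends with `Magic is wrong: b''`. When a `read` on the stream comes back empty, the most likely explanation is that the stream hit end-of-file because the peer closed the connection, not that the peer sent bad magic. One way to make that failure easier to tell apart is a small helper that insists on getting every byte it asked for. This is only a sketch; `read_exactly` is a hypothetical name and isn't part of `lib`.

```python
from io import BytesIO


def read_exactly(stream, n):
    """Read exactly n bytes, or raise EOFError if the stream ends early."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"Expected {n} bytes, got {len(data)}")
    return data


# Simulate a peer that sends 4 bytes and then hangs up
stream = BytesIO(b"\xf9\xbe\xb4\xd9")
magic = read_exactly(stream, 4)

try:
    read_exactly(stream, 12)
    closed = False
except EOFError:
    closed = True

print(magic, closed)
```

A listener could catch `EOFError` to leave its loop cleanly instead of crashing on a misleading magic error.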

In [207]:
version_payload = serialize_version_payload(user_agent=b'/finally/')
second = serialize_message(command=b"version", 
                                payload=version_payload)
first = b'\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00j\x00\x00\x00\x9b"\x8b\x9e\x7f\x11\x01\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x93AU[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00rV\xc5C\x9b:\xea\x89\x14/some-cool-software/\x01\x00\x00\x00\x01'

print(read_message(BytesIO(first)))

print(read_message(BytesIO(second)))

{'command': b'version', 'payload': {'version': 70015, 'services': 1039, 'timestamp': 1532314003, 'receiver_address': {'services': 1039, 'ip': '0.0.0.0', 'port': 0}, 'sender_address': {'services': 1039, 'ip': '0.0.0.0', 'port': 0}, 'nonce': 9937819966277768818, 'user_agent': b'/some-cool-software/', 'start_height': 1, 'relay': 1}}
{'command': b'version', 'payload': {'version': 70015, 'services': 0, 'timestamp': 1551290679, 'receiver_address': {'services': 0, 'ip': '0.0.0.0', 'port': 8333}, 'sender_address': {'services': 0, 'ip': '0.0.0.0', 'port': 8333}, 'nonce': 2654525627436538055, 'user_agent': b'/finally/', 'start_height': 0, 'relay': 0}}


Hopefully this exercise gives you a bit of a sense of what a full node does.

It starts by establishing connections to a few peers. It sends messages over these sockets to request data it is interested in, and has listeners like the one above which read messages one-by-one and route them to handler functions that can handle messages of each different type.
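
To make the "route them to handler functions" idea concrete, here is a minimal sketch that reuses the dictionary-dispatch trick from `serialize_payload`. Everything in it (`HANDLERS`, `route`, the handler names, and the convention of returning the command we would reply with) is made up for illustration; a real node does far more in each handler.

```python
def handle_ping(message):
    # The protocol answers a ping with a pong; here we only name the reply.
    return b"pong"


def handle_inv(message):
    # An inv announces objects the peer knows about; this sketch ignores it.
    return None


HANDLERS = {
    b"ping": handle_ping,
    b"inv": handle_inv,
}


def route(message):
    """Look up the handler for a message's command and call it."""
    handler = HANDLERS.get(message["command"])
    if handler is None:
        return None  # no handler registered for this command
    return handler(message)


replies = [
    route({"command": b"ping", "payload": b""}),
    route({"command": b"inv", "payload": b""}),
    route({"command": b"sendheaders", "payload": b""}),
]
print(replies)  # [b'pong', None, None]
```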

In the next lesson we'll extend this "listener" idea into a crawler. Whenever we connect to a new node we will ask it to send a list of its peers. Once we receive that list, we'll add the addresses to a queue. In this way we will ask every address in the network about its peers.

## Exercise: first hack at `serialize_address`

### Exercise: `ip_to_bytes`

### Exercise: finish `serialize_address`

??? or should we just do `ip_to_bytes` and then the entirety of `serialize_address` 


# Serializing Messages

### Exercise: `serialize_command`

* or should i give them `serialize_command`, `compute_checksum`, `NETWORK_MAGIC` etc and ask them to put it together
    * they should probably implement `serialize_command`
    
### Exercise: `handshake`

reminder how to send bytes

sequence of messages

exercise:
* 