In [15]:
# FIXME: I'm probably leaking my IP address all over the place ...

In [10]:
%load_ext autoreload
%autoreload 2

# Reading Version Message Payloads

In the last lesson we encountered the Bitcoin protocol's [Version Handshake](https://en.bitcoin.it/wiki/Version_Handshake). We saw how Bitcoin network peers will only converse with us if we first introduce ourselves with a `version` message.

But _we cheated_. I gave you a serialized `version` message and didn't tell you how I created it.

_We were lazy_: we didn't parse the cryptic `payload` of the `version` message that our peer sent us.

_We were rude_! After receiving our peer's `version` message we stopped listening. We never acknowledged it with a `verack` of our own, and we never received theirs, so the handshake was never completed. Our peer was left hanging ...

So you see, we have much to fix!

### Housekeeping

Last time we created a `NetworkEnvelope` class. I'm going to throw that away and use functions and dictionaries instead. Simpler this way!

Here's where we are:

In [8]:
from hashlib import sha256

NETWORK_MAGIC = b'\xf9\xbe\xb4\xd9'

def double_sha256(s):
    return sha256(sha256(s).digest()).digest()

def read_message(stream):
    """Read one message from the stream and return its command and raw payload bytes."""
    msg = {}
    magic = stream.read(4)
    if magic != NETWORK_MAGIC:
        raise Exception(f'Magic is wrong: {magic}')
    msg['command'] = stream.read(12).strip(b'\x00')
    payload_length = int.from_bytes(stream.read(4), 'little')
    checksum = stream.read(4)
    msg['payload'] = stream.read(payload_length)
    calculated_checksum = double_sha256(msg['payload'])[:4]
    if calculated_checksum != checksum:
        raise Exception('Checksum does not match')
    return msg


In [9]:
import socket

# magic "version" bytestring
VERSION = b'\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00j\x00\x00\x00\x9b"\x8b\x9e\x7f\x11\x01\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x93AU[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00rV\xc5C\x9b:\xea\x89\x14/some-cool-software/\x01\x00\x00\x00\x01'

sock = socket.socket()
sock.connect(("35.198.151.21", 8333))
stream = sock.makefile('rb')

# initiate the "version handshake"
# (sendall keeps sending until every byte is written; send may stop early)
sock.sendall(VERSION)

# receive their "version" response
msg = read_message(stream)

print(msg)
print(msg['command'])
print(msg['payload'])

{'command': b'version', 'payload': b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x0028j\\\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffFqPG\xa8\xc6\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00!\xf8\xe8\xff\xceL+s\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'}
b'version'
b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x0028j\\\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffFqPG\xa8\xc6\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00!\xf8\xe8\xff\xceL+s\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'



The payload of our message is still `bytes`. We need to decode it the same way we decoded the outer message structure, applying exactly the same skills we started to learn last time.
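Because the payload is just `bytes`, one convenient approach (and the one the exercises below take) is to wrap it in `io.BytesIO` so it can be read field by field, exactly as `read_message` reads from the socket. A minimal sketch using only the first 12 bytes of the payload shown above:

```python
from io import BytesIO

# First 12 bytes of the payload above: the 4-byte version
# followed by the 8-byte services field
payload = b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x00'
stream = BytesIO(payload)

# Each read() consumes bytes and advances the stream's position
version = int.from_bytes(stream.read(4), 'little')
services = int.from_bytes(stream.read(8), 'little')

print(version)        # 70015
print(services)       # 1037
print(stream.tell())  # 12
```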

# Objective: Interpret the Message Payload

Here's how payloads look now:

```
b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x0028j\\\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffFqPG\xa8\xc6\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00!\xf8\xe8\xff\xceL+s\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'
```

In this lesson we'll learn to interpret these bytes as a dictionary like this:

```
{'latest_block': 563541,
 'nonce': 8298811190300702753,
 'receiver_address': {'ip': '::ffff:70.113.80.71',
                      'port': 43206,
                      'services': 0},
 'relay': 1,
 'sender_address': {'ip': '0.0.0.0', 'port': 0, 'services': 1037},
 'services': 1037,
 'timestamp': 1550465074,
 'user_agent': b'/Satoshi:0.16.3/',
 'version': 70015}
```

In [14]:
# FIXME: kill

from io import BytesIO
from pprint import pprint
from proto import read_version_payload

pprint(read_version_payload(BytesIO(msg['payload'])))

{'latest_block': 563541,
 'nonce': 8298811190300702753,
 'receiver_address': {'ip': '::ffff:70.113.80.71',
                      'port': 43206,
                      'services': 0},
 'relay': 1,
 'sender_address': {'ip': '0.0.0.0', 'port': 0, 'services': 1037},
 'services': 1037,
 'timestamp': 1550465074,
 'user_agent': b'/Satoshi:0.16.3/',
 'version': 70015}


### The Payload

Our next task is to parse this payload. Besides the "/Satoshi:0.16.3/" -- clearly a user agent -- the payload bytes of the example above have no ASCII meaning.

But have no fear: we will decode the message payload the same way we decoded the overall message structure in our `read_message` function.

[This chart](https://en.bitcoin.it/wiki/Protocol_documentation#version) from the protocol documentation will act as our blueprint:

![image](../images/version-message.png)

### Old Types

Here we encounter some "types" we're already familiar with from the first lesson (`int32_t`, `uint64_t` and `int64_t`). In a "low-level" language like C++ these are different types, but in Python they all decode to the same `int` type. Our previously implemented `bytes_to_int` can handle these just fine. The only subtlety is that `int32_t` and `int64_t` are *signed*, which makes a difference only if a value is negative.
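A quick illustration with Python's built-in `int.from_bytes` (a sketch, independent of how `bytes_to_int` is actually written): for the non-negative values we see in practice, signed and unsigned decoding agree, and they only diverge once the top bit is set.

```python
# Version field from the payload above: 4 bytes, little endian int32_t
raw_version = b'\x7f\x11\x01\x00'
print(int.from_bytes(raw_version, 'little'))               # 70015
print(int.from_bytes(raw_version, 'little', signed=True))  # 70015

# The same width with the top bit set: now signedness matters
raw = b'\xff\xff\xff\xff'
print(int.from_bytes(raw, 'little'))               # 4294967295
print(int.from_bytes(raw, 'little', signed=True))  # -1
```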

### New Types

But we also encounter some new types: `net_addr`, `varstr`, and `bool`. 

Even worse, if we click on the [`varstr` link](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_string) we see that it contains one additional type: `varint`. 

Worse still, the [`net_addr` link](https://en.bitcoin.it/wiki/Protocol_documentation#Network_address) contains `time`, `services`, `ip` and `port` fields, nominally of types `uint32`, `uint64_t`, `char[16]` and `uint16_t` respectively. (The `time` field is left out of the addresses inside a `version` message, which is why each of them is 26 bytes.) But to make sense of what the hell they mean, each one requires further parsing: the `time` integer as a Unix timestamp, the `services` integer as a "bitfield" (whatever that is!), and the IP address, which can be either IPv4 or IPv6, so our code must be able to tell the difference! Also, according to the wiki, the `port` field is in "network byte order".

Oh, and remember how I mentioned that Satoshi usually, but not always, encoded his integers in "little endian" byte order (least significant byte first)? Well, the `port` attribute of `net_addr` is encoded "big endian" (that's what "network byte order" means), with the *most* significant byte first. Yes, the exact opposite of everything else!!!
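Putting these rules together, here is a minimal sketch of decoding one 26-byte `net_addr` as it appears inside a `version` message (8-byte `services`, 16-byte `ip`, 2-byte `port`, no `time`). The helper name `parse_version_net_addr` is hypothetical, and the sketch leans on Python's standard `ipaddress` module to recognize IPv4 addresses, which are sent as IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`).

```python
import ipaddress

def parse_version_net_addr(raw):
    """Hypothetical helper: decode a 26-byte net_addr from a version message."""
    assert len(raw) == 26, f'expected 26 bytes, got {len(raw)}'
    services = int.from_bytes(raw[:8], 'little')  # little endian, like most fields
    ip = ipaddress.IPv6Address(raw[8:24])          # always stored as 16 bytes
    port = int.from_bytes(raw[24:26], 'big')       # network byte order = big endian
    # Unwrap IPv4-mapped addresses (::ffff:a.b.c.d) to plain IPv4
    if ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return {'services': services, 'ip': str(ip), 'port': port}

# The receiver_address bytes from the payload above
raw = b'\x00' * 8 + b'\x00' * 10 + b'\xff\xff' + bytes([70, 113, 80, 71]) + b'\xa8\xc6'
print(parse_version_net_addr(raw))
# {'services': 0, 'ip': '70.113.80.71', 'port': 43206}
```

Unlike the target dictionary shown earlier, which keeps the `::ffff:` form, this sketch unwraps mapped addresses; either choice works as long as it is applied consistently.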

- [ ] Convert version message `bytes` to a dictionary with appropriate keys and uninterpreted `bytes` values
- [ ] Interpret integer fields
    - [ ] interpret `relay`
    - [ ] interpret `timestamp`
    - [ ] interpret `services`
- [ ] Interpret variable-length fields

Hunker down for a looooooong lesson!

### Exercise: `read_version_payload`

Write a function `read_version_payload` that takes as its argument a stream containing the payload of a version message. The first four bytes represent the version, and so on, as described in the table above.

Your function should return a dictionary with keys equal to the Version Message attributes and values equal to the uninterpreted bytes corresponding to that key:

```
{
    'version': <raw version bytes>,
    'services': <raw services bytes>,
    'timestamp': <raw timestamp bytes>,
    'receiver_address': <raw network address bytes>,
    'sender_address': <raw network address bytes>,
    'nonce': <raw nonce bytes>,
    'user_agent': <raw user agent bytes>,
    'start_height': <raw start height bytes>,
    'relay': <raw relay bytes>,
}
```

Throughout the lesson we will gradually add functions to interpret the raw bytes that this function leaves uninterpreted.

This exercise is simply asking you to read the correct number of bytes for every field and store those bytes under the correct key (listed above).
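One pitfall worth knowing about while counting bytes: `read(n)` on a stream such as `io.BytesIO` silently returns *fewer* than `n` bytes when the data runs out, rather than raising an error, so a miscounted field can quietly shift every field after it. A small helper (the name `read_exact` is hypothetical, not part of this lesson's code) makes a short read fail loudly instead:

```python
from io import BytesIO

def read_exact(stream, n):
    """Hypothetical helper: read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f'wanted {n} bytes, got {len(data)}')
    return data

stream = BytesIO(b'\x7f\x11\x01')  # only 3 bytes available
print(stream.read(4))              # b'\x7f\x11\x01' (short read, no error)

stream = BytesIO(b'\x7f\x11\x01')
try:
    read_exact(stream, 4)
except EOFError as e:
    print(e)                       # wanted 4 bytes, got 3
```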

In [None]:

def read_version_payload(stream):
    # We will build up this dictionary as we go
    r = {}
    
    # First read the 4 byte `version` number and save to the r['version'] key 
    r['version'] = stream.read(4)
    
    # Your turn: follow this pattern to fill in the "services", "timestamp", "receiver_address", "sender_address", and "nonce" fields
    
    # I will do the "user_agent" attribute for you. You will re-implement later ...
    r['user_agent'] = justins_read_varstr(stream)
    
    # Your turn: Fill out the remaining "start_height" and "relay" attributes
    
    # Return the dictionary we've assembled
    return r
    
    

In [40]:
from proto import read_varint

def justins_read_varstr(stream):
    return stream.read(read_varint(stream))

def read_version_payload(stream):
    # We will build up this dictionary as we go
    r = {}
    
    # First read the 4 byte `version` number and save to the r['version'] key 
    r['version'] = stream.read(4)
    
    # Your turn: follow this pattern to fill in the "services", "timestamp", "receiver_address", "sender_address", and "nonce" fields
    r['services'] = stream.read(8)
    r['timestamp'] = stream.read(8)
    r['receiver_address'] = stream.read(26)
    r['sender_address'] = stream.read(26)
    r['nonce'] = stream.read(8)
    
    # I will do the "user_agent" attribute for you. You will re-implement later ...
    r['user_agent'] = justins_read_varstr(stream)
    
    # Your turn: Fill out the remaining "start_height" and "relay" attributes
    r['start_height'] = stream.read(4)
    r['relay'] = stream.read(1)
    
    # Return the dictionary we've assembled
    return r
    
    

In [None]:
from io import BytesIO

VERSION = b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x0028j\\\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffFqPG\xa8\xc6\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00!\xf8\xe8\xff\xceL+s\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'

def assert_len(payload, field, expected_len):
    observed_len = len(payload[field])
    assert observed_len == expected_len,\
        f'The "{field}" field should be {expected_len} bytes, was {observed_len} bytes'

def test_read_version_payload_initial():
    stream = BytesIO(VERSION)
    payload = read_version_payload(stream)
    
    # Keys are correct: none missing and none extra
    observed_keys = set(payload.keys())
    expected_keys = {'version', 'services', 'timestamp', 'receiver_address', 'sender_address',
                     'nonce', 'user_agent', 'start_height', 'relay'}
    missing_keys = expected_keys - observed_keys
    extra_keys = observed_keys - expected_keys
    
    assert not missing_keys, f"The following keys were missing: {missing_keys}"
    assert not extra_keys, f"Encountered unexpected key(s): {extra_keys}"
    
    # Fixed-size values are the correct sizes
    assert_len(payload, 'version', 4)
    assert_len(payload, 'services', 8)
    assert_len(payload, 'timestamp', 8)
    assert_len(payload, 'receiver_address', 26)
    assert_len(payload, 'sender_address', 26)
    assert_len(payload, 'nonce', 8)
    assert_len(payload, 'start_height', 4)
    assert_len(payload, 'relay', 1)
    
    # The variable-length user agent was read in full
    assert payload['user_agent'] == b'/Satoshi:0.16.3/', f"Unexpected user agent: {payload['user_agent']}"
    
    # Every byte of the payload was consumed
    leftover = stream.read()
    assert not leftover, f"{len(leftover)} byte(s) were left unread: {leftover}"
    
    print("Test passed!")
    
test_read_version_payload_initial()

In [33]:
set(list('abc')) - set(list('bc'))

{'a'}

In [24]:
assert_len({'a': '123'}, 'a', 2)

AssertionError: The "a" field should be 2 bytes, was 3 bytes

## Integer fields



### Exercise: `read_int`
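One possible shape for this helper, offered as a sketch: the `n_bytes`, `byteorder` and `signed` parameters are assumptions, not a spec. It reads a fixed number of bytes and decodes them with `int.from_bytes`, defaulting to little endian because that covers every integer in the `version` payload except the `net_addr` port.

```python
from io import BytesIO

def read_int(stream, n_bytes, byteorder='little', signed=False):
    """Sketch: read n_bytes from the stream and decode them as an integer."""
    raw = stream.read(n_bytes)
    if len(raw) != n_bytes:
        raise EOFError(f'wanted {n_bytes} bytes, got {len(raw)}')
    return int.from_bytes(raw, byteorder, signed=signed)

# start_height from the payload above (little endian)
print(read_int(BytesIO(b'U\x99\x08\x00'), 4))              # 563541
# the receiver port from the payload above (big endian)
print(read_int(BytesIO(b'\xa8\xc6'), 2, byteorder='big'))  # 43206
```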


### Exercise: Call `read_int` in appropriate places

(version, services, nonce, start_height)


### Exercise: `read_bool`
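A minimal sketch, assuming a `bool` is a single byte and treating any nonzero value as `True` (the usual C convention; the wiki itself only says `bool`). Also note that the wiki marks `relay` as present only from protocol version 70001 on, so very old peers may omit it.

```python
from io import BytesIO

def read_bool(stream):
    """Sketch: read one byte and treat any nonzero value as True."""
    raw = stream.read(1)
    if len(raw) != 1:
        raise EOFError('wanted 1 byte, got 0')
    return raw != b'\x00'

print(read_bool(BytesIO(b'\x01')))  # True
print(read_bool(BytesIO(b'\x00')))  # False
```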



### Exercise: Call `read_bool` in appropriate places

(relay)

## Timestamps

In a few different places we face the same decision: an integer (or some other type) carries a special meaning. Should we leave it as an integer and define a special "accessor" function that translates it to its special meaning, or should we convert it to the special meaning outright?

Leaving things as ints requires less work and adds less complexity, so we will favor that going forward. But we'll define a library of accessor functions that translate these values into their special meanings when we need them.

Given that our first application of this code will be a crawler that saves everything to a SQL database, it would also be very convenient to keep our data in a universal format like an integer.
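A minimal sketch of the "keep the integer, add an accessor" approach: the dictionary keeps `timestamp` as a plain Unix timestamp (easy to store in a SQL column), and a separate function, here hypothetically named `timestamp_to_datetime`, converts it to a timezone-aware UTC `datetime` only when a human needs to read it.

```python
from datetime import datetime, timezone

def timestamp_to_datetime(timestamp):
    """Hypothetical accessor: Unix timestamp (seconds) -> aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)

version_payload = {'timestamp': 1550465074}  # stays an int in our dictionary
print(timestamp_to_datetime(version_payload['timestamp']))
# 2019-02-18 04:44:34+00:00
```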

### Exercise: `read_timestamp`


### Exercise: Call `read_timestamp` in appropriate places


## Variable Length Fields


### Exercise: `read_varint`


### Exercise: `read_varstr`
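Since a `varstr` is simply a `varint` length followed by that many bytes, here is a sketch of both helpers together. A `varint` stores values below `0xfd` in a single byte; otherwise that first byte is a prefix announcing a wider little endian integer: `0xfd` for 2 bytes, `0xfe` for 4 and `0xff` for 8. This sketch does not reject non-canonical encodings (e.g. a small value needlessly written with a `0xfd` prefix), which a stricter implementation might.

```python
from io import BytesIO

def read_varint(stream):
    """Sketch: decode a variable-length integer (varint) from the stream."""
    first = stream.read(1)[0]
    if first < 0xfd:
        return first                                # small values live in the prefix byte
    n_bytes = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]    # prefix announces the width
    return int.from_bytes(stream.read(n_bytes), 'little')

def read_varstr(stream):
    """Sketch: a varint length followed by that many bytes."""
    length = read_varint(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError(f'wanted {length} bytes, got {len(data)}')
    return data

# The user agent field from the payload above: length 0x10 = 16, then the string
print(read_varstr(BytesIO(b'\x10/Satoshi:0.16.3/')))  # b'/Satoshi:0.16.3/'
print(read_varint(BytesIO(b'\xfd\x00\x01')))         # 256
```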


### Exercise: Call `read_varstr` in appropriate places


## Services
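The `services` field is a bitfield: each bit advertises one capability, and a node sets the bits for everything it supports. The flag values below are an illustrative subset taken from the protocol documentation and Bitcoin Core (treat the list as non-exhaustive), and `SERVICE_FLAGS` and `decode_services` are hypothetical names. Decoding the peer's `1037` from above:

```python
# Illustrative subset of service bits (names as used by Bitcoin Core)
SERVICE_FLAGS = {
    'NODE_NETWORK': 1 << 0,           # can serve the full block chain
    'NODE_GETUTXO': 1 << 1,           # BIP 64
    'NODE_BLOOM': 1 << 2,             # BIP 111 bloom filters
    'NODE_WITNESS': 1 << 3,           # BIP 144 segregated witness
    'NODE_NETWORK_LIMITED': 1 << 10,  # BIP 159 pruned node
}

def decode_services(services):
    """Sketch: return the names of the known flags set in a services integer."""
    return [name for name, bit in SERVICE_FLAGS.items() if services & bit]

print(decode_services(1037))
# ['NODE_NETWORK', 'NODE_BLOOM', 'NODE_WITNESS', 'NODE_NETWORK_LIMITED']
```

In keeping with the "leave it as an int" decision, the payload dictionary can store `services` as a plain integer and call a function like this only when the flags need to be displayed.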



In [None]:
def test_read_version_payload_final():
    
    # TODO: copy paste all the prior test code here so that it 
    
    # Or maybe just call all the prior functions here???
    
    pass