In [None]:
%load_ext autoreload
%autoreload 2

# Reading Version Message Payloads

In the last lesson we encountered the Bitcoin protocol's [Version Handshake](https://en.bitcoin.it/wiki/Version_Handshake). We saw how Bitcoin network peers will only converse with us if we first introduce ourselves with a `version` message.

But _we cheated_. I gave you a serialized `version` message and didn't tell you how I created it.

_We were lazy_: we didn't parse the cryptic `payload` of the `version` message that our peer sent us.

_We were rude_! After receiving our peer's `version` message we stopped listening and never received or responded to their `verack` message, so the handshake was never completed. Our peer was left hanging ...

So you see, we have much to fix!

### Housekeeping

Last time we created a `NetworkEnvelope` class. I'm going to throw that away and use functions and dictionaries instead. Simpler this way!

Here's where we are:

In [None]:
from hashlib import sha256

NETWORK_MAGIC = b'\xf9\xbe\xb4\xd9'

def double_sha256(s):
    return sha256(sha256(s).digest()).digest()

def read_message(stream):
    """ payload attributes at top level """
    msg = {}
    magic = stream.read(4)
    if magic != NETWORK_MAGIC:
        raise Exception(f'Magic is wrong: {magic}')
    msg['command'] = stream.read(12).strip(b'\x00')
    payload_length = int.from_bytes(stream.read(4), 'little')
    checksum = stream.read(4)
    msg['payload'] = stream.read(payload_length)
    calculated_checksum = double_sha256(msg['payload'])[:4]
    if calculated_checksum != checksum:
        raise Exception('Checksum does not match')
    return msg


In [None]:
import socket

# magic "version" bytestring
VERSION = b'\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00j\x00\x00\x00\x9b"\x8b\x9e\x7f\x11\x01\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x93AU[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00rV\xc5C\x9b:\xea\x89\x14/some-cool-software/\x01\x00\x00\x00\x01'

sock = socket.socket()
sock.connect(("35.198.151.21", 8333))
stream = sock.makefile('rb')

# initiate the "version handshake"
sock.send(VERSION)

# receive their "version" response
msg = read_message(stream)

print(msg)
print(msg['command'])
print(msg['payload'])


The payload of our message is still `bytes`. We need to decode the payload in the same way that we decoded the outer message structure itself, applying exactly the same skills we started to learn last time.

# Objective: Interpret the Message Payload

Here's how payloads look now:

```
b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x0028j\\\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffFqPG\xa8\xc6\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00!\xf8\xe8\xff\xceL+s\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'
```

In this lesson we'll learn to interpret these bytes as a dictionary like this:

```
{'latest_block': 563541,
 'nonce': 8298811190300702753,
 'receiver_address': {'ip': '::ffff:70.113.80.71',
                      'port': 43206,
                      'services': 0},
 'relay': 1,
 'sender_address': {'ip': '0.0.0.0', 'port': 0, 'services': 1037},
 'services': 1037,
 'timestamp': 1550465074,
 'user_agent': b'/Satoshi:0.16.3/',
 'version': 70015}
```

In [None]:
# FIXME: kill

from io import BytesIO
from pprint import pprint
from proto import read_version_payload

pprint(read_version_payload(BytesIO(msg['payload'])))

### The Payload

Our next task is to parse this payload. Besides the "/Satoshi:0.16.3/" -- clearly a user agent -- the payload bytes of the example above have no ASCII meaning.

But have no fear -- we will decode the message payload in the same manner as we decoded the overall message structure in our `read_message` function.

[This chart](https://en.bitcoin.it/wiki/Protocol_documentation#version) from the protocol documentation will act as our blueprint:

![image](../images/version-message.png)

### Old Types

Here we encounter some "types" we are now familiar with from the first lesson -- `int32_t` / `uint64_t` / `int64_t` -- which are different types in a "low-level" language like C++, but are all equivalent to the `int` type in Python. Our previously implemented `bytes_to_int` (FIXME) can handle these just fine.
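As a quick check of that claim, here is a minimal sketch that decodes the first two fields of the example payload shown earlier. It relies only on Python's built-in `int.from_bytes`; the variable names are mine and not part of the lesson's code.

```python
# The first 12 bytes of the example payload:
# a 4-byte int32_t followed by an 8-byte uint64_t
raw_version = b'\x7f\x11\x01\x00'
raw_services = b'\r\x04\x00\x00\x00\x00\x00\x00'

# Both decode to a plain Python int, regardless of their width on the wire
version = int.from_bytes(raw_version, 'little')
services = int.from_bytes(raw_services, 'little')

print(version, type(version))    # 70015 <class 'int'>
print(services, type(services))  # 1037 <class 'int'>
```

If a signed field such as `int32_t` could ever hold a negative value, `int.from_bytes` accepts `signed=True`; the values in this payload are all non-negative, so it makes no difference here.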

### New Types

But we also encounter some new types: `net_addr`, `varstr`, and `bool`. 

Even worse, if we click on the [`varstr` link](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_string) we see that it contains one additional type: `varint`. 

Worse still, the [`net_addr` link](https://en.bitcoin.it/wiki/Protocol_documentation#Network_address) contains `time`, `services`, `ip` and `port` fields nominally of types `uint32`, `uint64_t`, `char[16]` and `uint16_t`, but in order for us to make sense of what the hell they mean each requires parsing: the `time` integer as a Unix timestamp, the `services` integer as a "bitfield" (whatever that is!), and the IP address, which can be either IPv4 or IPv6, so our code must be able to tell the difference! Also, the `port` field is in "network byte order" according to the wiki.

Oh, and remember how I mentioned that Satoshi usually, but not always, encoded his integers in "little endian" byte order (the least significant byte is on the left)? Well, the `port` attribute of `net_addr` is encoded "big endian", where the *most* significant byte is on the left. That's what "network byte order" means. Yes, the exact opposite of everything else!!!
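To see what difference the byte order makes, here is a small sketch that decodes the two port bytes from the example payload both ways. The variable names are just for illustration.

```python
# The two port bytes from the example payload's receiver address
raw_port = b'\xa8\xc6'

as_big_endian = int.from_bytes(raw_port, 'big')        # reads the bytes as 0xa8c6
as_little_endian = int.from_bytes(raw_port, 'little')  # reads the bytes as 0xc6a8

print(as_big_endian)     # 43206, the port shown in the parsed dictionary above
print(as_little_endian)  # 50856, what we'd get by assuming the wrong byte order
```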

- [ ] Convert version message `bytes` to dictionary with appropriate keys and uninterpreted `bytes` values
- [ ] Interpret Integer fields
    - [ ] interpret `relay`
    - [ ] interpret `timestamp`
    - [ ] interpret `services`
- [ ] Interpret variable-length fields

Hunker down for a looooooong lesson!

### Exercise: `read_version_payload`

Write a function `read_version_payload` which takes as an argument a stream containing the payload of a version message. The first four bytes represent the version, and so on as described in the table above.

Your function should return a dictionary with keys equal to the Version Message attributes and values equal to the uninterpreted bytes corresponding to that key:

```
{
    'version': <raw version bytes>,
    'services': <raw services bytes>,
    'timestamp': <raw timestamp bytes>,
    'receiver_address': <raw network address bytes>,
    'sender_address': <raw network address bytes>,
    'nonce': <raw nonce bytes>,
    'user_agent': <raw user agent bytes>,
    'start_height': <raw start height bytes>,
    'relay': <raw relay bytes>,
}
```

Throughout the lesson we will slowly add functions to interpret the raw bytes left uninterpreted in this function.

This exercise is simply asking you to read the correct number of bytes for every field and store these bytes under the correct key (listed above).

In [None]:
VERSION_PAYLOAD = b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x0028j\\\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffFqPG\xa8\xc6\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00!\xf8\xe8\xff\xceL+s\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'

In [None]:
# FIXME: justins_read_varstr is shitty

def read_version_payload(stream):
    # We will build up this dictionary as we go
    r = {}
    
    # First read the 4 byte `version` number and save to the r['version'] key 
    r['version'] = stream.read(4)
    
    # Your turn: follow this pattern to fill in the "services", "timestamp", "receiver_address", "sender_address", and "nonce" fields
    
    # I will do the "user_agent" attribute for you. You will re-implement later ...
    r['user_agent'] = justins_read_varstr(stream)
    
    # Your turn: Fill out the remaining "start_height" and "relay" attributes
    
    # Return the dictionary we've assembled
    return r
    
    

In [None]:
def justins_read_varstr(stream):
    from proto import read_varint
    return stream.read(read_varint(stream))

def read_version_payload(stream):
    # We will build up this dictionary as we go
    r = {}
    
    # First read the 4 byte `version` number and save to the r['version'] key 
    r['version'] = stream.read(4)
    
    # Your turn: follow this pattern to fill in the "services", "timestamp", "receiver_address", "sender_address", and "nonce" fields
    r['services'] = stream.read(8)
    r['timestamp'] = stream.read(8)
    r['receiver_address'] = stream.read(26)
    r['sender_address'] = stream.read(26)
    r['nonce'] = stream.read(8)
    
    # I will do the "user_agent" attribute for you. You will re-implement later ...
    r['user_agent'] = justins_read_varstr(stream)
    
    # Your turn: Fill out the remaining "start_height" and "relay" attributes
    r['start_height'] = stream.read(4)
    r['relay'] = stream.read(1)
        
    # Return the dictionary we've assembled
    return r
    
    

In [None]:
# FIXME: should I import the unittests?

VERSION_PAYLOAD = b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x0028j\\\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffFqPG\xa8\xc6\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00!\xf8\xe8\xff\xceL+s\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'

def assert_len(payload, field, expected_len):
    observed_len = len(payload[field])
    assert observed_len == expected_len,\
        f'The "{field}" field should be {expected_len} bytes, was {observed_len} bytes'

def test_read_version_payload_initial():
    stream = BytesIO(VERSION_PAYLOAD)
    payload = read_version_payload(stream)

    # Dictionary keys
    observed_keys = set(payload.keys())
    expected_keys = set(['version', 'services', 'timestamp', 'receiver_address', 'sender_address', 
                         'nonce', 'user_agent', 'start_height', 'relay'])
    missing_keys = expected_keys - observed_keys
    extra_keys = observed_keys - expected_keys
    
    assert not missing_keys, f"The following keys were missing: {missing_keys}"
    assert not extra_keys, f"Encountered unexpected key(s): {extra_keys}"
    
    # Dictionary values
    assert_len(payload, 'version', 4)
    assert_len(payload, 'services', 8)
    assert_len(payload, 'timestamp', 8)
    assert_len(payload, 'receiver_address', 26)
    assert_len(payload, 'sender_address', 26)
    assert_len(payload, 'nonce', 8)
    assert_len(payload, 'start_height', 4)
    assert_len(payload, 'relay', 1)
    
    print("Test passed!")
    
test_read_version_payload_initial()

# Integer fields

In the last lesson we wrote a function `read_length` that could read bytes and interpret them as integers, but it wasn't very flexible: it could only ever read 4 bytes at a time and could only interpret them as little endian byte-order.

Let's break the integer interpretation out into its own functions, depending on whether the bytes are little or big endian.

We'll write two helper functions: `little_endian_to_int(bytes)` and `big_endian_to_int(bytes)`.

### Exercise: `little_endian_to_int(bytes)` and `big_endian_to_int(bytes)` 

In [None]:
def little_endian_to_int(b):
    raise NotImplementedError()

In [None]:
def little_endian_to_int(b):
    return int.from_bytes(b, 'little')

In [None]:
def test_little_endian_to_int():
    i = 22
    b = i.to_bytes(10, 'little')  # avoid shadowing the built-in `bytes`
    result = little_endian_to_int(b)
    assert i == result, f'Correct answer: {i}. Your answer: {result}'
    print("Test passed!")

test_little_endian_to_int()

In [None]:
def big_endian_to_int(b):
    raise NotImplementedError()

In [None]:
def big_endian_to_int(b):
    return int.from_bytes(b, 'big')

In [None]:
def test_big_endian_to_int():
    i = 1_000_000
    b = i.to_bytes(7, 'big')  # avoid shadowing the built-in `bytes`
    result = big_endian_to_int(b)
    assert i == result, f'Correct answer: {i}. Your answer: {result}'
    print("Test passed!")
    
test_big_endian_to_int()

This exercise is a little artificial. You should be able to accomplish each with a single line. `little_endian_to_int(bytes)` isn't all that much simpler than `int.from_bytes(bytes, 'little')`. It really just binds the `byteorder` parameter to `'little'`. But it will hopefully be a little easier to remember and therefore represents a small improvement. At the very least I'm testing that you remember how to do a "little endian bytes" -> integer conversion.

### Exercise: Call `little_endian_to_int` in appropriate places

According to the protocol documentation, the following fields are integers: `version`, `services`, `timestamp`, `nonce`, `start_height`.

Copy over the body of `read_version_payload` you wrote earlier and update the code that handles the fields above. Don't leave them as uninterpreted bytes. Convert them to integers with correct byte order (You can assume each field is little-endian unless the docs tell you otherwise).

In [None]:
def read_version_payload(stream):
    raise NotImplementedError()

In [None]:
def read_version_payload(stream):
    r = {}    
    r['version'] = little_endian_to_int(stream.read(4))
    r['services'] = little_endian_to_int(stream.read(8))
    r['timestamp'] = little_endian_to_int(stream.read(8))
    r['receiver_address'] = stream.read(26)
    r['sender_address'] = stream.read(26)
    r['nonce'] = little_endian_to_int(stream.read(8))
    r['user_agent'] = justins_read_varstr(stream)
    r['start_height'] = little_endian_to_int(stream.read(4))
    r['relay'] = stream.read(1)
    return r

In [None]:
def check_field(payload, field, bytes_value, int_value):
    assert payload[field] == int_value,\
        f'Correct integer interpretation of {bytes_value} is {int_value}'
   
def test_read_version_payload_integer_fields():
    stream = BytesIO(VERSION_PAYLOAD)
    payload = read_version_payload(stream)

    check_field(payload, 'version', b'\x7f\x11\x01\x00', 70015)   
    check_field(payload, 'services', b'\r\x04\x00\x00\x00\x00\x00\x00', 1037)   
    check_field(payload, 'timestamp', b'28j\\\x00\x00\x00\x00', 1550465074)   
    check_field(payload, 'nonce', b'!\xf8\xe8\xff\xceL+s', 8298811190300702753)   
    check_field(payload, 'start_height', b'U\x99\x08\x00', 563541)

    print('Test passed!')
    
test_read_version_payload_integer_fields()

## What do the integers mean?

Now that we know how to interpret these fields as integers, we still don't always know what they mean.

`nonce` is pretty easy -- it's just a random number generated with every request. It's used by Bitcoin clients to determine whether they are connecting to themselves. Basically, you keep track of the nonce in every version message you send and reject any incoming version message with a nonce in this list -- it's almost certainly you connecting to yourself. The chances of it being someone else who happened to pick the same nonce are 1 in `256**8 == 18446744073709551616`.
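The bookkeeping described above can be sketched in a few lines. This is only an illustration under my own assumptions: `sent_nonces`, `new_nonce` and `is_self_connection` are hypothetical names, and real clients may track nonces differently.

```python
import secrets

# Hypothetical bookkeeping: every nonce we have put in an outgoing version message
sent_nonces = set()

def new_nonce():
    """Pick a random 8-byte (64-bit) nonce and remember that we sent it."""
    nonce = secrets.randbits(64)
    sent_nonces.add(nonce)
    return nonce

def is_self_connection(version_payload):
    """True if an incoming version payload carries a nonce we generated ourselves."""
    return version_payload['nonce'] in sent_nonces

ours = new_nonce()
print(is_self_connection({'nonce': ours}))      # True: we sent this nonce
print(is_self_connection({'nonce': ours ^ 1}))  # False: differs from ours in one bit
```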

`start_height` is also straightforward: it's the block height the sending node claims to be at.

`version` is straightforward too. It's just a number signifying a version of the Bitcoin protocol. Here's an exercise:

### Exercise: Given a version message payload, determine whether its sender can send a `pong` message

This exercise should give you a taste of the kind of information the version number encodes. [This table](https://bitcoin.org/en/developer-reference#protocol-versions) will show you the way!

In [None]:
def can_send_pong(version_payload):
    return 

In [None]:
def can_send_pong(version_payload):
    return version_payload['version'] >= 60001

In [None]:
index = VERSION_PAYLOAD.index( b'\x7f\x11\x01\x00')
prefix = VERSION_PAYLOAD[:index]
suffix = VERSION_PAYLOAD[index+4:]

def test_can_send_pong():
    for version, can_send in [(70015, True), (60001, True), (60000, False), (106, False)]:
        stream = BytesIO(prefix + version.to_bytes(4, 'little') + suffix)
        payload = read_version_payload(stream)
        assert can_send == can_send_pong(payload),\
            f'Version "{version}" {"can" if can_send else "cannot"} send "pong" messages'

    print("Test passed!")
    
test_can_send_pong()

Hopefully now you can see why the `version` number is so important. This number tells us which dialects of the bitcoin protocol our peer is capable of speaking. If they can't send `pong` messages, we shouldn't send them `ping`s!

## Timestamp

Next comes the `timestamp` field. This is simply a ["Unix timestamp"](https://en.wikipedia.org/wiki/Unix_time). "Unix time" is just a running count of the number of seconds elapsed since the start of the year 1970 (UTC) -- so it is represented as an integer.

Here's how we interpret a Unix timestamp in Python:

In [None]:
from datetime import datetime, timedelta

unix_time_as_justin_wrote_this_exercise = 1550538302
t = datetime.fromtimestamp(unix_time_as_justin_wrote_this_exercise)

n = datetime.now()

In [None]:
datetime.now() - timedelta(hours=2)

### Exercise: Given a version message payload, tell me whether it is from the last hour or not

You'd probably never want to do something like this with a version message -- but blocks also have timestamps and the bitcoin protocol is supposed to reject blocks with timestamps that are too far in the future or past.

One way to do this would be to use `time.time` to compare with raw timestamps -- you just need to count the seconds in an hour. One minute ago would be calculated with:

In [None]:
import time

unix_now = time.time()
print("Now: ", unix_now)

one_min_ago = unix_now - 60
print("One minute ago: ", one_min_ago)


Another way would be to use `datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)` which would handle such calculations for you. For example, one day and one second ago would be calculated with:

In [None]:
dt_now = datetime.now()
dt_now

In [None]:
dt_now - timedelta(days=1, seconds=1)

In [None]:
def is_less_than_one_hour_old(version_payload_dict):
    raise NotImplementedError()

In [None]:
def is_less_than_one_hour_old(version_payload_dict):
    return version_payload_dict['timestamp'] > time.time() - 60*60

In [None]:

# FIXME: would be much more intelligent to have a function that could just build this dictionary for me .,..
def replace_bytes(base, index, new_bytes):
    prefix = base[:index]
    suffix = base[index+len(new_bytes):]
    return prefix + new_bytes + suffix

def test_is_less_than_one_hour_old():
    five_min_ago = int(time.time() - 60*5)
    raw_version_payload = replace_bytes(VERSION_PAYLOAD, 12, five_min_ago.to_bytes(8, 'little'))
    version_payload_dict = read_version_payload(BytesIO(raw_version_payload))
    assert is_less_than_one_hour_old(version_payload_dict)

    five_hours_ago = int(time.time() - 60*60*5)
    raw_version_payload = replace_bytes(VERSION_PAYLOAD, 12, five_hours_ago.to_bytes(8, 'little'))
    version_payload_dict = read_version_payload(BytesIO(raw_version_payload))
    assert not is_less_than_one_hour_old(version_payload_dict)

    print("Test passed!")

test_is_less_than_one_hour_old()

# "Services" Field

[The version section of the protocol docs](https://en.bitcoin.it/wiki/Protocol_documentation#version) provides us with the following guide for interpreting the `services` field of the `version` payload:

![image](../images/services.png)

The type of this field is "bitfield". [Check out the wikipedia entry](https://en.wikipedia.org/wiki/Bit_field) for a more detailed explanation than I can provide.

A bitfield is an integer. Every bit of the base-2 representation (e.g. "101" is base-2 representation of 5) holds some pre-defined meaning. This particular bitfield is 8 bytes / 64 bits (remember, a byte is just a collection of 8 bits so 8 bytes is 8*8=64 bits).

From the table above we can see that the least significant digit in the binary representation (decimal value `2^0=1`) represents `NODE_NETWORK`, or whether this peer "can be asked for full blocks or just headers".

The second least-significant digit (decimal value `2^1=2`): `NODE_GETUTXO`

The third least-significant digit (decimal value `2^2=4`): `NODE_BLOOM`

The fourth least-significant digit (decimal value `2^3=8`): `NODE_WITNESS`

The eleventh least-significant digit (decimal value `2^10=1024`): `NODE_NETWORK_LIMITED`

The rest of the bits (decimal values `2^n` where n in {4, 5, 6, 7, 8, 9, 11, 12, ..., 63}) have no meaning, yet.

So, in order to interpret this field we need to look up the nth bit in the table above and see if it means anything.
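To make this concrete, here is a small sketch that takes the `services` value from the example payload (1037) and lists which bit positions are switched on. It uses only built-in integer operations; the variable names are mine.

```python
services = 1037  # the `services` value from the example payload

# Show the base-2 representation; the rightmost digit is bit 0
print(format(services, 'b'))  # 10000001101

# List the positions of the bits that are switched on
on_bits = [n for n in range(64) if (services >> n) & 1]
print(on_bits)  # [0, 2, 3, 10]
```

Looking those positions up in the table, this peer advertises `NODE_NETWORK`, `NODE_BLOOM`, `NODE_WITNESS` and `NODE_NETWORK_LIMITED`.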

So, our Python code could produce a dictionary like this for every node we connect to. This would allow us to look up what services that node offers _by name_ (which is why it's called a dictionary!):

```
{
    'NODE_NETWORK': True,
    'NODE_GETUTXO': False,
    'NODE_BLOOM': True,
    'NODE_WITNESS': False,
    'NODE_NETWORK_LIMITED': True,
}
```

Furthermore, we could write a function that produces this lookup table for us given an integer bitfield and a magical `check_bit(n)` function:

```
def services_int_to_dict(services_int):
    return {
        'NODE_NETWORK': check_bit(services_int, 0),           # 1    = 2**0
        'NODE_GETUTXO': check_bit(services_int, 1),           # 2    = 2**1
        'NODE_BLOOM': check_bit(services_int, 2),             # 4    = 2**2
        'NODE_WITNESS': check_bit(services_int, 3),           # 8    = 2**3
        'NODE_NETWORK_LIMITED': check_bit(services_int, 10),  # 1024 = 2**10
    }
```

For now, I'm just going to give you a definition of the magical `check_bit` function:

In [None]:
def check_bit(number, index):
    """See if the bit at `index` in binary representation of `number` is on"""
    mask = 1 << index
    return bool(number & mask)

### Exercise: Fill out the remainder of the `services_int_to_dict` function

Replace each `"FIXME"` with the correct bit index and each `?` placeholder key with the correct service name.

In [None]:
def services_int_to_dict(services_int):
    return {
        'NODE_NETWORK': check_bit(services_int, "FIXME"),
        'NODE_GETUTXO': check_bit(services_int, "FIXME"),
        '?': check_bit(services_int, "FIXME"),
        '??': check_bit(services_int, "FIXME"),
        '???': check_bit(services_int, "FIXME"),
    }

In [None]:
def services_int_to_dict(services_int):
    return {
        'NODE_NETWORK': check_bit(services_int, 0),
        'NODE_GETUTXO': check_bit(services_int, 1),
        'NODE_BLOOM': check_bit(services_int, 2),
        'NODE_WITNESS': check_bit(services_int, 3),
        'NODE_NETWORK_LIMITED': check_bit(services_int, 10),
    }

In [None]:
def test_services_int_to_dict():
    services = 1 + 2 + 4 + 1024
    answer = {
        'NODE_NETWORK': True,
        'NODE_GETUTXO': True,
        'NODE_BLOOM': True,
        'NODE_WITNESS': False,
        'NODE_NETWORK_LIMITED': True,
    }
    assert services_int_to_dict(services) == answer
    print("Tests passed!")

test_services_int_to_dict()

To give you a better idea of what's going on here, check out these `services_int_to_dict` outputs for some possible inputs:

In [None]:
from pprint import pprint

bitfields = [
    1,
    8,
    1 + 8,
    1024,
    8 + 1024,
    1 + 2 + 4 + 8 + 1024,
    2**5 + 2**9 + 2**25,
]

for bitfield in bitfields:
    print(f"(n={bitfield})")
    pprint(services_int_to_dict(bitfield))
    print()

### Exercise: Complete these function definitions to hammer home your understanding of this strange `services` "bitfield"

In [None]:
def offers_node_network_service(services_bitfield):
    # given integer services_bitfield, return whether the NODE_NETWORK bit is on
    raise NotImplementedError()

In [None]:
def offers_node_network_service(services_bitfield):
    services_dict = services_int_to_dict(services_bitfield)
    return services_dict['NODE_NETWORK']

In [None]:
def test_offers_node_network_service():
    assert offers_node_network_service(1) is True
    assert offers_node_network_service(1 + 8) is True
    assert offers_node_network_service(4) is False
    print('Test passed!')

test_offers_node_network_service()

In [None]:
def offers_node_bloom_and_node_witness_services(services_bitfield):
    # given integer services_bitfield, return whether the 
    # NODE_BLOOM and NODE_WITNESS bits are on
    raise NotImplementedError()

In [None]:
def offers_node_bloom_and_node_witness_services(services_bitfield):
    services_dict = services_int_to_dict(services_bitfield)
    return services_dict['NODE_BLOOM'] and services_dict['NODE_WITNESS']

In [None]:
def test_offers_node_bloom_and_node_witness_services():
    assert offers_node_bloom_and_node_witness_services(1) is False
    assert offers_node_bloom_and_node_witness_services(1 + 8) is False
    assert offers_node_bloom_and_node_witness_services(4 + 8) is True
    print('Test passed!')
    
test_offers_node_bloom_and_node_witness_services()

As a parting note, here's a look at some nodes that define services not mentioned in the wiki:

![image](../images/other-services.png)

# Boolean Values

We could have treated `relay` like an `int` given how in Python `True` / `False` values are equivalent to `1` / `0`:

In [None]:
print("True is 1: ", True == 1)
print("False is 0: ", False == 0)

But Python `bool` values will make the data in our programs more readable than just using `1` and `0` so let's write a `bytes_to_bool` function which will handle this for us:

### Exercise: `bytes_to_bool(bytes)`

Write a function that will interpret bytes as a boolean 

In [None]:
def bytes_to_bool(bytes):
    raise NotImplementedError()

In [None]:


def bytes_to_bool(bytes):
    return bool(little_endian_to_int(bytes))

In [None]:
def test_bytes_to_bool():
    assert bytes_to_bool(b'\x00') is False,\
        'bytes_to_bool(b"\\x00") should return False'
    assert bytes_to_bool(b'\x01') is True,\
        'bytes_to_bool(b"\\x01") should return True'
    print('Tests passed!')
    
test_bytes_to_bool()

Question: must you be concerned about the byte order of this field?

You actually don't need to be concerned. Byte-order only applies if you have multiple bytes.

In [None]:
one = b'\x01'
little_endian_to_int(one) == big_endian_to_int(one)

### Exercise: Use `bytes_to_bool` to interpret the `relay` bytes in `read_version_payload`

In [None]:
def read_version_payload(stream):
    raise NotImplementedError()

In [None]:
def read_version_payload(stream):
    r = {}    
    r['version'] = little_endian_to_int(stream.read(4))
    r['services'] = little_endian_to_int(stream.read(8))
    r['timestamp'] = little_endian_to_int(stream.read(8))
    r['receiver_address'] = stream.read(26)
    r['sender_address'] = stream.read(26)
    r['nonce'] = little_endian_to_int(stream.read(8))
    r['user_agent'] = justins_read_varstr(stream)
    r['start_height'] = little_endian_to_int(stream.read(4))
    r['relay'] = bytes_to_bool(stream.read(1))
    return r

In [None]:
def test_read_version_payload_boolean_fields():
    stream = BytesIO(VERSION_PAYLOAD)
    payload = read_version_payload(stream)
    assert payload['relay'] is True
    
    stream = BytesIO(VERSION_PAYLOAD[:-1] + b'\x00')
    payload = read_version_payload(stream)
    assert payload['relay'] is False
    
    print('Test passed!')
    
test_read_version_payload_boolean_fields()

## Timestamps

FIXME: 

In a few different places we are faced with the same decision: an integer or other type has a special meaning. Should we leave it as an integer and define a special "accessor" function to translate it to its special meaning, or should we convert it to the special meaning outright?

Leaving stuff as ints requires less work and adds less complexity, so we will favor that going forward. But we'll define a library of accessor functions to do the translating when we need it.

Given that our first application of this code will be to write a crawler that saves everything to a SQL database, it would be very nice if we can leave our data in some kind of universal format like an integer.
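Here is a rough sketch of what the "accessor" approach could look like for the timestamp. `timestamp_as_datetime` is a hypothetical name I'm using for illustration, and interpreting the value as UTC is my assumption; the payload itself keeps the plain integer.

```python
from datetime import datetime, timezone

def timestamp_as_datetime(version_payload):
    """Hypothetical accessor: translate the integer timestamp only when asked."""
    return datetime.fromtimestamp(version_payload['timestamp'], tz=timezone.utc)

# The stored value stays an int, so it can go straight into a database column
payload = {'timestamp': 1550465074}

print(payload['timestamp'])
print(timestamp_as_datetime(payload).isoformat())
```

The trade-off is that callers must remember to use the accessor whenever they want a human-readable value, but nothing is lost or reformatted on the way into storage.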

### Exercise: `read_timestamp`


### Exercise: Call `read_timestamp` in appropriate places

## Variable Length Fields

In the first exercise of this lesson we encountered the `?`-length `user_agent` field. I gave you a `justins_read_varstr` function and the rest of the code magically worked.

What the hell was going on there?



### Exercise: `read_varint`


### Exercise: `read_varstr`


### Exercise: Call `read_varstr` in appropriate places


## Services

# "Variable Length" fields

Next comes `var_str`, the type of the "User Agent", which is basically an advertisement of the Bitcoin software implementation that the node is using. You can see a listing of popular values [here](https://bitnodes.earn.com/nodes/).

["Variable Length Strings"](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_string) are used for string fields of unpredictable length. This technique strives to use only the space it needs. It does so by prepending a "variable length integer" to the string value being communicated, which tells the receiver how many bytes they should read in order to read the encoded string value. This is kind of similar to how the payload bytes are handled in our `read_message` function -- first we read the payload length and then we read that many bytes to get our raw payload. Same idea here, but now the length of the string isn't a fixed-size integer, but a "variable length integer".
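As a quick illustration of the length-prefix idea, here is a sketch that pulls the user agent out of the tail of the example payload by hand. It assumes the simplest case, where the length prefix fits in a single byte; the variable names are mine.

```python
# The tail of the example payload, starting at the user agent's length prefix
tail = b'\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'

# Assumption: the length is small enough that the prefix is a single byte
length = tail[0]
user_agent = tail[1:1 + length]
rest = tail[1 + length:]

print(length)      # 16
print(user_agent)  # b'/Satoshi:0.16.3/'
print(rest)        # b'U\x99\x08\x00\x01'
```

The leftover bytes in `rest` are the 4-byte `start_height` and the 1-byte `relay` flag.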

How does this `varint` work?

The first byte of a `varint` is a marker which says how many bytes come after it:
* `0xFF`: 8 byte integer follows
* `0xFE`: 4 byte integer follows
* `0xFD`: 2 byte integer follows
* < `0xFD`: 0 bytes follow. Interpret first byte as a 1 byte integer.
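One way to get a feel for the marker rules above is to look at them from the sender's side. The following is a minimal sketch of an encoder; `sketch_encode_varint` is a hypothetical helper of my own, not the lesson's code, and it assumes the integers that follow the marker are little endian like most other fields.

```python
def sketch_encode_varint(n):
    """Hypothetical encoder: pick the smallest varint form that can hold n."""
    if n < 0xfd:
        return n.to_bytes(1, 'little')            # no marker, the byte is the value
    elif n <= 0xffff:
        return b'\xfd' + n.to_bytes(2, 'little')  # marker + 2-byte integer
    elif n <= 0xffffffff:
        return b'\xfe' + n.to_bytes(4, 'little')  # marker + 4-byte integer
    else:
        return b'\xff' + n.to_bytes(8, 'little')  # marker + 8-byte integer

print(sketch_encode_varint(16).hex())     # 10
print(sketch_encode_varint(1000).hex())   # fde803
print(sketch_encode_varint(70000).hex())  # fe70110100
```

Reading a varint is simply this process in reverse: look at the first byte, then decide how many more bytes to read.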

### Exercise: Implement `read_varint`

`read_varstr` will depend on it, and the version message's `user_agent` requires `read_varstr`.

Since this is a somewhat complicated function, I've outlined it for you. Replace the `"FIXME"`s:

In [None]:
def read_varint(stream):
    i = little_endian_to_int(stream.read(1))
    if i == 0xff:
        return little_endian_to_int(stream.read(8))
    elif i == 0xfe:
        return "FIXME"
    elif "FIXME":
        return "FIXME"
    else:
        "FIXME"

In [None]:
def read_varint(stream):
    i = little_endian_to_int(stream.read(1))
    if i == 0xff:
        return little_endian_to_int(stream.read(8))
    elif i == 0xfe:
        return little_endian_to_int(stream.read(4))
    elif i == 0xfd:
        return little_endian_to_int(stream.read(2))
    else:
        return i

In [None]:
eight_byte_int = 2 ** (8 * 8) - 1
four_byte_int = 2 ** (8 * 4) - 1
two_byte_int = 2 ** (8 * 2) - 1
one_byte_int = 7

eight_byte_int_bytes = eight_byte_int.to_bytes(8, 'little')
four_byte_int_bytes = four_byte_int.to_bytes(4, 'little')
two_byte_int_bytes = two_byte_int.to_bytes(2, 'little')
one_byte_int_bytes = one_byte_int.to_bytes(1, 'little')

eight_byte_prefix = (0xff).to_bytes(1, 'little')
four_byte_prefix = (0xfe).to_bytes(1, 'little')
two_byte_prefix = (0xfd).to_bytes(1, 'little')

eight_byte_var_int =  eight_byte_prefix + eight_byte_int_bytes
four_byte_var_int = four_byte_prefix + four_byte_int_bytes
two_byte_var_int = two_byte_prefix + two_byte_int_bytes
one_byte_var_int = one_byte_int_bytes

enumerated = (
    (eight_byte_int, eight_byte_var_int),
    (four_byte_int, four_byte_var_int),
    (two_byte_int, two_byte_var_int),
    (one_byte_int, one_byte_var_int),
)

def test_read_varint():
    for correct_int, var_int in enumerated:
        stream = BytesIO(var_int)
        calculated_int = read_varint(stream)
        assert correct_int == calculated_int, (correct_int, calculated_int)
        
    print('Test passed!')

test_read_varint()

Now that we have that out of the way:

### Exercise: Implement `read_varstr`

In [None]:
def read_varstr(stream):
    raise NotImplementedError()

In [None]:
def read_varstr(stream):
    length = read_varint(stream)
    string = stream.read(length)
    return string

In [None]:
from proto import encode_varstr

long_str = b"A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network.  The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work. The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power. As long as a majority of CPU power is controlled by nodes that are not cooperating to attack the network, they'll generate the longest chain and outpace attackers. The network itself requires minimal structure. Messages are broadcast on a best effort basis, and nodes can leave and rejoin the network at will, accepting the longest proof-of-work chain as proof of what happened while they were gone."
long_var_str = encode_varstr(long_str)

short_str = b"!"
short_var_str = encode_varstr(short_str)

enumerated = (
    (short_str, short_var_str),
    (long_str, long_var_str),
)

def test_read_varstr():
    for correct_byte_str, var_str in enumerated:
        stream = BytesIO(var_str)
        calculated_byte_str = read_varstr(stream)
        assert correct_byte_str == calculated_byte_str
    print('Test passed!')
    
test_read_varstr()
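
To see what `read_varstr` is doing on real bytes, here's a minimal, self-contained sketch that decodes the user agent from the `VERSION` bytestring at the top of this lesson. The names `sketch_read_varint` and `sketch_read_varstr` are hypothetical stand-ins so the block can run on its own; they are not the notebook's own functions.

```python
from io import BytesIO

def sketch_read_varint(stream):
    # Hypothetical stand-in for this lesson's read_varint.
    # A prefix byte below 0xfd is the value itself; 0xfd, 0xfe and 0xff
    # mean "read the next 2, 4 or 8 bytes as a little-endian integer".
    prefix = stream.read(1)[0]
    if prefix == 0xfd:
        return int.from_bytes(stream.read(2), 'little')
    if prefix == 0xfe:
        return int.from_bytes(stream.read(4), 'little')
    if prefix == 0xff:
        return int.from_bytes(stream.read(8), 'little')
    return prefix

def sketch_read_varstr(stream):
    # The varint says how many bytes of string follow
    length = sketch_read_varint(stream)
    return stream.read(length)

# Tail of the VERSION message: var_str user agent, then start_height and relay
wire = b'\x14/some-cool-software/\x01\x00\x00\x00\x01'
stream = BytesIO(wire)
user_agent = sketch_read_varstr(stream)   # 0x14 == 20 bytes of string
leftover = stream.read()                  # what remains for the next fields
print(user_agent, leftover)

# A 300-byte string needs the 0xfd prefix plus a 2-byte length (0x012c)
long_wire = b'\xfd\x2c\x01' + b'a' * 300
long_result = sketch_read_varstr(BytesIO(long_wire))
print(len(long_result))
```

Because the length travels with the string, the reader stops exactly where the next field begins, which is what lets us parse `start_height` and `relay` afterwards.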

# "Network Address" Type

[`net_addr`](https://en.bitcoin.it/wiki/Protocol_documentation#Network_address) is the most complicated new type we encounter in this lesson, so we'll handle it last. Plus, it builds on the `timestamp` and `services` types we learned to read above.

![image](../images/network-address.png)

Reading a network address requires interpreting 4 kinds of data, two of which we've already covered:

1. `time`: Unix timestamp. Already done.
2. `services`: integer bitfield. Already done.
3. `IP address`: complicated ...
4. `port`: 2-byte big-endian `int`, unlike the little-endian integers we've read so far
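
The last two items can be sketched with the standard library alone. This is only an illustration: it uses the `ipaddress` module rather than `socket`, and the sample bytes are made up for the example.

```python
import ipaddress

# The IP field is always 16 bytes. An IPv4 address is stored as an
# IPv4-mapped IPv6 address: ten 0x00 bytes, two 0xff bytes, then the
# four IPv4 bytes.
ip_field = b'\x00' * 10 + b'\xff' * 2 + bytes([10, 10, 10, 10])
ip6 = ipaddress.IPv6Address(ip_field)
mapped = ip6.ipv4_mapped  # None when the field holds a plain IPv6 address
ip_text = str(mapped) if mapped is not None else str(ip6)

# The port is 2 bytes, big-endian. Reading it as little-endian gives
# a plausible-looking but wrong number.
port_field = b'\x20\x8d'
port = int.from_bytes(port_field, 'big')
wrong_port = int.from_bytes(port_field, 'little')

print(ip_text, port, wrong_port)
```

Here `port` comes out as 8333, Bitcoin's default port, while the little-endian reading gives 36128.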

Since network addresses are sort of their own "type" -- and this code will be re-used when we write the crawler -- let's create a `read_address` function that our `read_version_payload` function can call when it reaches the network address bytes.

We give it a `has_timestamp` parameter because the docs mention that the `timestamp` attribute is always present *except* in version messages. So the `has_timestamp` flag tells `read_address` whether it should expect those leading timestamp bytes ...
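
To make the difference concrete, here's a small sketch that builds the same address both ways and compares the lengths. The field values (services of 1, `127.0.0.1`, a fixed timestamp) are arbitrary choices for illustration.

```python
# Fields shared by both layouts
services = (1).to_bytes(8, 'little')
ip = b'\x00' * 10 + b'\xff' * 2 + bytes([127, 0, 0, 1])
port = (8333).to_bytes(2, 'big')

# Layout inside a version message: no timestamp
without_timestamp = services + ip + port

# Layout everywhere else: a 4-byte little-endian timestamp comes first
timestamp = (1_500_000_000).to_bytes(4, 'little')
with_timestamp = timestamp + without_timestamp

print(len(without_timestamp), len(with_timestamp))
```

The two layouts differ only by the 4 leading bytes (26 bytes versus 30), so a reader that guesses wrong will misread every field that follows.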

FIXME: make this into an exercise

In [None]:
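# An IPv4 address is stored in the 16-byte field as an IPv4-mapped IPv6
# address: 10 zero bytes, 2 0xff bytes, then the 4 IPv4 bytes
IPV4_PREFIX = b"\x00" * 10 + b"\xff" * 2

def bytes_to_ip(b):
    """Convert the 16-byte IP field into a human-readable string"""
    # IPv4
    if b[0:12] == IPV4_PREFIX:
        return socket.inet_ntop(socket.AF_INET, b[12:16])

    # IPv6
    else:
        return socket.inet_ntop(socket.AF_INET6, b)

def read_address(stream, has_timestamp):
    """Read a network address; pass has_timestamp=False inside version messages"""
    r = {}
    # The 4-byte timestamp is only present outside of version messages
    if has_timestamp:
        r["timestamp"] = little_endian_to_int(stream.read(4))
    r["services"] = little_endian_to_int(stream.read(8))
    r["ip"] = bytes_to_ip(stream.read(16))
    # Unlike the other integers here, the port is big-endian
    r["port"] = big_endian_to_int(stream.read(2))
    return r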


In [None]:
def test_read_address():
    services = 7
    services_bytes = services.to_bytes(8, 'little')
    ipv4 = '10.10.10.10'
    ipv4_bytes = IPV4_PREFIX + socket.inet_pton(socket.AF_INET, ipv4)
    ipv6 = '2a02:1205:501e:d30:f57a:6958:7e47:2694'
    ipv6_bytes = socket.inet_pton(socket.AF_INET6, ipv6)
    port = 8333
    port_bytes = port.to_bytes(2, 'big')
    
    # IPv4
    stream = BytesIO(services_bytes + ipv4_bytes + port_bytes)
    address = read_address(stream, has_timestamp=False)
    assert address['services'] == services
    assert address['ip'] == ipv4
    assert address['port'] == port
    
    # IPv6
    stream = BytesIO(services_bytes + ipv6_bytes + port_bytes)
    address = read_address(stream, has_timestamp=False)
    assert address['ip'] == ipv6
    
    print('Test passed!')

test_read_address()

# Putting it all together

In [None]:
# FIXME make into exercise
def read_version_payload(stream):
    r = {}
    r["version"] = little_endian_to_int(stream.read(4))
    r["services"] = little_endian_to_int(stream.read(8))
    r["timestamp"] = little_endian_to_int(stream.read(8))
    r["receiver_address"] = read_address(stream, has_timestamp=False)
    r["sender_address"] = read_address(stream, has_timestamp=False)
    r["nonce"] = little_endian_to_int(stream.read(8))
    r["user_agent"] = read_varstr(stream)
    r["start_height"] = little_endian_to_int(stream.read(4))
    r["relay"] = little_endian_to_int(stream.read(1))
    return r

version_payload = read_version_payload(BytesIO(VERSION_PAYLOAD))
version_payload

In [None]:
services_int_to_dict(version_payload['services'])
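
As a sanity check on the field order in `read_version_payload`, the field sizes should add up to the payload length declared in the message header. Here's a sketch of that arithmetic, assuming the `VERSION` bytestring from the top of this lesson, whose header carries the length bytes `j\x00\x00\x00` and whose user agent is 20 bytes long:

```python
# Size in bytes of each field, in the order read_version_payload reads them
field_sizes = {
    'version': 4,
    'services': 8,
    'timestamp': 8,
    'receiver_address': 26,   # services (8) + ip (16) + port (2), no timestamp
    'sender_address': 26,
    'nonce': 8,
    'user_agent': 1 + 20,     # 1-byte varint length + b'/some-cool-software/'
    'start_height': 4,
    'relay': 1,
}
expected_length = sum(field_sizes.values())

# The 4 length bytes from the VERSION header, little-endian
declared_length = int.from_bytes(b'j\x00\x00\x00', 'little')

print(expected_length, declared_length)
```

Both come out to 106, so the parser consumes the payload exactly, with no bytes left over.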

In [None]:
def test_read_version_payload_final():
    
    # TODO: copy paste all the prior test code here so that it 
    
    # Or maybe just call all the prior functions here???
    
    pass

Homework:
* Follow along with [this article](https://coinlogic.wordpress.com/2014/03/09/the-bitcoin-protocol-4-network-messages-1-version/), which covers how to use Wireshark to observe the messages that `bitcoind` sends.