# Reading Version Message Payloads

In the last lesson we encountered the Bitcoin protocol's [Version Handshake](https://en.bitcoin.it/wiki/Version_Handshake). We saw how Bitcoin network peers will only converse with us if we first introduce ourselves with a `version` message.

But _we cheated_. I gave you a serialized `version` message and didn't tell you how I created it.

_We were lazy_: we didn't parse the cryptic `payload` of the `version` message that our peer sent us.

_We were rude_! After receiving our peer's `version` message we stopped listening, so we never received their `verack` message or replied with one of our own to complete the handshake. Our peer was left hanging ...

So you see, we have much to fix!

### Housekeeping

Last time we created a `NetworkEnvelope` class. I'm going to throw that away and use functions and dictionaries instead. Simpler this way!

Here's where we are:

In [None]:
from hashlib import sha256

NETWORK_MAGIC = b'\xf9\xbe\xb4\xd9'

def double_sha256(s):
    return sha256(sha256(s).digest()).digest()

def read_message(stream):
    msg = {}
    magic = stream.read(4)
    if magic != NETWORK_MAGIC:
        raise Exception(f'Magic is wrong: {magic}')
    msg['command'] = stream.read(12).strip(b'\x00')
    payload_length = int.from_bytes(stream.read(4), 'little')
    checksum = stream.read(4)
    msg['payload'] = stream.read(payload_length)
    calculated_checksum = double_sha256(msg['payload'])[:4]
    if calculated_checksum != checksum:
        raise Exception('Checksum does not match')
    return msg

In [None]:
import socket
from pprint import pprint

# magic "version" bytestring
VERSION = b'\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00j\x00\x00\x00\x9b"\x8b\x9e\x7f\x11\x01\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x93AU[\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00rV\xc5C\x9b:\xea\x89\x14/some-cool-software/\x01\x00\x00\x00\x01'

sock = socket.socket()
sock.connect(("35.198.151.21", 8333))
stream = sock.makefile('rb')

# initiate the "version handshake"
sock.send(VERSION)

# receive their "version" response
msg = read_message(stream)

print(msg)



The payload of our message is still `bytes`. We need to decode the payload in the same way that we decoded the outer message structure itself. To do this we'll continue to interpret the Bitcoin wiki's protocol documentation field-by-field.


# Objective: Interpret the Message Payload

Here's how payloads look now:

```
b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x0028j\\\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffFqPG\xa8\xc6\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00!\xf8\xe8\xff\xceL+s\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'
```

In this lesson we'll learn to interpret these bytes as a dictionary like this:

```
{'nonce': 8298811190300702753,
 'receiver_address': {'ip': '::ffff:70.113.80.71',
                      'port': 43206,
                      'services': 0},
 'relay': 1,
 'sender_address': {'ip': '0.0.0.0', 'port': 0, 'services': 1037},
 'services': 1037,
 'start_height': 563541,
 'timestamp': 1550465074,
 'user_agent': b'/Satoshi:0.16.3/',
 'version': 70015}
```

### The Payload

To do this we figure out the fields contained in this payload, together with their respective lengths, types and meanings.

[This chart](https://en.bitcoin.it/wiki/Protocol_documentation#version) from the protocol documentation will act as our blueprint:

![image](../images/version-message.png)

### Old Types

Here we encounter some "types" we are now familiar with from the first lesson -- `int32_t` / `uint64_t` / `int64_t` -- which are different types in a "low-level" language like C++, but are all equivalent to the `int` type in Python just like the `length` field from lesson 1.

### New Types

But we also encounter some new types: `net_addr`, `varstr`, and `bool`. 

Even worse, if we click on the [`varstr` link](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_string) we see that it contains _another_ additional type: `varint`.

Worse still, the [`net_addr` link](https://en.bitcoin.it/wiki/Protocol_documentation#Network_address) contains `time`, `services`, `ip` and `port` fields nominally of types `uint32`, `uint64_t`, `char[16]`, and `uint16_t` but each has some special meaning: the `time` integer is a Unix timestamp, the `services` integer is a "bitfield" (whatever that is!), and the IP address can be either IPv4 or IPv6 and our code must be able to tell the difference!

And remember how I mentioned that Satoshi usually, but not always, encoded integers in "little endian" byte order (the least significant byte is on the left)? Well, the `port` attribute of `net_addr` is encoded "big endian", where the *most* significant byte is on the left. Yes, the exact opposite of everything else!!!
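To make the difference concrete, here is a small sketch using the two port bytes `\xa8\xc6` from the example payload above. It relies only on Python's built-in `int.from_bytes`; the variable names are just for illustration.

```python
# The two raw port bytes from the receiver address in the example payload
port_bytes = b'\xa8\xc6'

# Big endian: the first byte is the most significant (0xa8c6)
as_big_endian = int.from_bytes(port_bytes, 'big')

# Little endian: the first byte is the least significant (0xc6a8)
as_little_endian = int.from_bytes(port_bytes, 'little')

print(as_big_endian)     # 43206, the port shown in the dictionary above
print(as_little_endian)  # 50856, what we'd get with the wrong byte order
```

Same two bytes, two very different port numbers, which is why we have to track the byte order of every field.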

Hunker down for a looooooong lesson!

### Progress

Here's a little blueprint of what we'll cover this lesson. At times we'll get deep into the weeds, and I hope this chart will help you keep track of the plot:

- [ ] `read_version_payload` v1: convert payload to dictionary with `bytes` values
- [ ] Interpret Integer fields
    - [ ] Convert bytes to integers
    - [ ] Interpret `version`
    - [ ] Interpret `timestamp`
    - [ ] Interpret `services`
- [ ] Interpret boolean `relay`
- [ ] Interpret variable-length `user_agent`
    - [ ] Implement `read_varint`
    - [ ] Implement `read_varstr`
- [ ] Interpret sender and receiver `net_addr` values
    - [ ] Interpret IP addresses
    - [ ] Interpret big-endian port numbers
- [ ] `read_version_payload` v2: convert payload to dictionary interpreted values where appropriate

### Exercise: `read_version_payload`

Write a function `read_version_payload(stream)` that takes a stream of bytes composed of the payload of a version message and turns it into a dictionary. The first four bytes represent the `version`, the next eight represent `services`, and so on as described in the table above.

Your function should return a dictionary with keys equal to the version message attributes and values equal to the uninterpreted bytes corresponding to that key:

```
{
    'version': <4 raw version bytes>,
    'services': <8 raw services bytes>,
    'timestamp': <8 raw timestamp bytes>,
    'receiver_address': <26 raw network address bytes>,
    'sender_address': <26 raw network address bytes>,
    'nonce': <8 raw nonce bytes>,
    'user_agent': <variable raw user agent bytes>,
    'start_height': <4 raw start height bytes>,
    'relay': <1 raw relay byte>,
}
```

Throughout the lesson we will slowly add functions to interpret the raw bytes left uninterpreted in this function.

This exercise is simply asking you to read the correct number of bytes for every field and store these bytes under the keys used by the dictionary snippet above.
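If streams are new to you, this sketch shows the one behaviour the exercise depends on: every `read(n)` call picks up where the previous one stopped. The toy bytes and field names here are made up for illustration and are not part of the version message.

```python
from io import BytesIO

# A made-up 6-byte "payload": a 4-byte field followed by a 2-byte field
toy_stream = BytesIO(b'\x01\x02\x03\x04\xaa\xbb')

toy = {}
toy['first_field'] = toy_stream.read(4)   # consumes bytes 0-3
toy['second_field'] = toy_stream.read(2)  # continues at byte 4

print(toy)

# Reading past the end returns an empty bytes object rather than raising
print(toy_stream.read(1))
```

Because the stream remembers its position, the order of your `read` calls has to match the order of the fields in the table.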

In [None]:
from lib import read_varstr as magic_read_varstr

VERSION_PAYLOAD = b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x0028j\\\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffFqPG\xa8\xc6\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00!\xf8\xe8\xff\xceL+s\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'

def read_version_payload(stream):
    # We will build up this dictionary as we go
    r = {}
    
    # First read the 4 byte `version` number and save to the r['version'] key 
    r['version'] = stream.read(4)
    
    # Your turn: Follow this pattern to fill in the "services",
    # "timestamp", "receiver_address", "sender_address", and "nonce" fields
    
    # Giving you this one. You will re-implement later ...
    r['user_agent'] = magic_read_varstr(stream)
    
    # Your turn: Fill out the remaining "start_height" and "relay" attributes
    
    # Return the dictionary we've assembled
    return r

In [None]:
from io import BytesIO
from utils import assert_len

VERSION_PAYLOAD = b'\x7f\x11\x01\x00\r\x04\x00\x00\x00\x00\x00\x0028j\\\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffFqPG\xa8\xc6\r\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00!\xf8\xe8\xff\xceL+s\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'

def test_read_version_payload_initial():
    stream = BytesIO(VERSION_PAYLOAD)
    payload = read_version_payload(stream)

    # Dictionary keys
    observed_keys = set(payload.keys())
    expected_keys = set(['version', 'services', 'timestamp', 'receiver_address', 'sender_address', 
                         'nonce', 'user_agent', 'start_height', 'relay'])
    missing_keys = expected_keys - observed_keys
    extra_keys = observed_keys - expected_keys
    
    assert not missing_keys, f"The following keys were missing: {missing_keys}"
    assert not extra_keys, f"Encountered unexpected key(s): {extra_keys}"
    
    # Dictionary values
    assert_len(payload, 'version', 4)
    assert_len(payload, 'services', 8)
    assert_len(payload, 'timestamp', 8)
    assert_len(payload, 'receiver_address', 26)
    assert_len(payload, 'sender_address', 26)
    assert_len(payload, 'nonce', 8)
    assert_len(payload, 'start_height', 4)
    assert_len(payload, 'relay', 1)
    
    print("Test passed!")
    
test_read_version_payload_initial()

[Link to the answers for this lesson](./Answers.ipynb)

### Progress

The first step is always hardest!

- [x] `read_version_payload` v1: convert payload to dictionary with `bytes` values
- [ ] Interpret Integer fields
    - [ ] Convert bytes to integers
    - [ ] Interpret `version`
    - [ ] Interpret `timestamp`
    - [ ] Interpret `services`
- [ ] Interpret boolean `relay`
- [ ] Interpret variable-length `user_agent`
    - [ ] Implement `read_varint`
    - [ ] Implement `read_varstr`
- [ ] Interpret sender and receiver `net_addr` values
    - [ ] Interpret IP addresses
    - [ ] Interpret big-endian port numbers
- [ ] `read_version_payload` v2: convert payload to dictionary interpreted values where appropriate

# Integer fields

In the last lesson we wrote a function `read_length` that could read bytes and interpret them as integers, but it wasn't very flexible: it could only ever read 4 bytes at a time and could only interpret them as little endian byte-order.

Let's break out the integer interpretation into its own functions depending on whether the bytes are little or big endian: `little_endian_to_int(bytes)` & `big_endian_to_int(bytes)`

### Exercise: `little_endian_to_int(bytes)` and `big_endian_to_int(bytes)` 

In [None]:
def little_endian_to_int(b):
    raise NotImplementedError()

In [None]:
def test_little_endian_to_int():
    i = 22
    bytes = int.to_bytes(i, 10, 'little')
    result = little_endian_to_int(bytes)
    assert i == result, f'Correct answer: {i}. Your answer: {result}'
    print("Test passed!")

test_little_endian_to_int()

In [None]:
def big_endian_to_int(b):
    raise NotImplementedError()

In [None]:
def test_big_endian_to_int():
    i = 1_000_000
    bytes = int.to_bytes(i, 7, 'big')
    result = big_endian_to_int(bytes)
    assert i == result, f'Correct answer: {i}. Your answer: {result}'
    print("Test passed!")
    
test_big_endian_to_int()

This exercise is a little artificial. You should be able to accomplish each with a single line. `little_endian_to_int(bytes)` isn't all that much simpler than `int.from_bytes(bytes, 'little')`. It really just binds the `byteorder` parameter to `"little"`. But it will hopefully be a little easier to remember and therefore represents a small improvement. At the very least I'm testing that you remember how to do a "little endian bytes" -> integer conversion!

### Exercise: Update `read_version_payload` to interpret integers using `little_endian_to_int` and `big_endian_to_int`

According to the protocol documentation, the following fields are integers: `version`, `services`, `timestamp`, `nonce`, `start_height`.

Copy over the body of `read_version_payload` you wrote earlier and update the code that handles the fields above. Don't leave them as uninterpreted bytes. Convert them to integers with correct byte order (You can assume each field is little-endian unless the docs tell you otherwise).

In [None]:
def read_version_payload(stream):
    raise NotImplementedError()

In [None]:
from utils import check_field

def test_read_version_payload_integer_fields():
    stream = BytesIO(VERSION_PAYLOAD)
    payload = read_version_payload(stream)

    check_field(payload, 'version', b'\x7f\x11\x01\x00', 70015)   
    check_field(payload, 'services', b'\r\x04\x00\x00\x00\x00\x00\x00', 1037)   
    check_field(payload, 'timestamp', b'28j\\\x00\x00\x00\x00', 1550465074)   
    check_field(payload, 'nonce', b'!\xf8\xe8\xff\xceL+s', 8298811190300702753)   
    check_field(payload, 'start_height', b'U\x99\x08\x00', 563541)

    print('Test passed!')
    
test_read_version_payload_integer_fields()

## What do the integers mean?

We can now interpret bytes as integers, but what do the integers themselves mean?

`nonce` is pretty easy -- it's just a random number generated with every request. It's used by Bitcoin clients to determine whether they are connecting to themselves. Basically, generate and record a 64-bit random nonce for every version message you send and reject any incoming version messages with a nonce in this list -- it's almost certainly you connecting to yourself accidentally. The chances of it being someone else are one in `256**8 == 18446744073709551616`
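A minimal sketch of that bookkeeping might look like the following. The names `sent_nonces`, `new_nonce`, and `is_self_connection` are hypothetical helpers invented for this illustration, not part of the course library, and `secrets.randbits` is just one way to pick a random 64-bit number.

```python
import secrets

# Hypothetical record of every nonce we have put in an outgoing version message
sent_nonces = set()

def new_nonce():
    # Generate a random 64-bit integer and remember that we sent it
    nonce = secrets.randbits(64)
    sent_nonces.add(nonce)
    return nonce

def is_self_connection(version_payload):
    # An incoming nonce that we generated ourselves almost certainly
    # means that we have connected to ourselves
    return version_payload['nonce'] in sent_nonces

ours = new_nonce()
print(is_self_connection({'nonce': ours}))
print(is_self_connection({'nonce': 8298811190300702753}))
```

A real client would also want to forget old nonces eventually; this sketch keeps them forever for simplicity.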

`start_height` is also straightforward: it's the block height the sending node claims to be at. If you're doing initial block download and your peer is at height 1, you might reject the connection because they can't help you download blocks (although you might allow it because you could help them).

`version` is also straightforward. It's just a number signifying a version of the Bitcoin protocol. Here's an exercise:

### Exercise: Given a version message payload (a dictionary of the sort returned by `read_version_payload`), determine whether its node can send a `pong` message

The `ping` and `pong` messages are used by bitcoin to periodically check if a peer is still online.

This exercise should give you a taste of the kind of information the version number encodes. [This table](https://bitcoin.org/en/developer-reference#protocol-versions) contains all the information you need!

In [None]:
def can_send_pong(version_payload):
    raise NotImplementedError()

In [None]:
from utils import replace_bytes

def test_can_send_pong():
    for version, can_send in [(70015, True), (60001, True), (60000, False), (106, False)]:
        index = VERSION_PAYLOAD.index( b'\x7f\x11\x01\x00')
        new_bytes = version.to_bytes(4, 'little')
        stream = BytesIO(replace_bytes(VERSION_PAYLOAD, index, new_bytes))
        payload = read_version_payload(stream)
        assert can_send == can_send_pong(payload),\
            f'Version "{version}" {"can" if can_send else "cannot"} send "pong" messages'

    print("Test passed!")
    
test_can_send_pong()

Hopefully now you can see how the `version` number can be useful. This number tells us which dialects of the bitcoin protocol our peer is capable of speaking. If they can't send `pong` messages, we shouldn't send them a `ping`!

## `timestamp`

Next comes the `timestamp` field. This is simply a ["Unix timestamp"](https://en.wikipedia.org/wiki/Unix_time). "Unix time" is just a running count of the number of seconds elapsed since the start of the year 1970 in the UTC timezone.
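As a quick illustration, here is the `timestamp` value from the example payload at the top of this lesson turned into a human-readable date. This is only a sketch using the standard library's `datetime.fromtimestamp` with an explicit UTC timezone; the variable names are my own.

```python
from datetime import datetime, timezone

# The timestamp from the example version payload at the top of this lesson
timestamp = 1550465074

# Interpret it as seconds since 1970-01-01 00:00:00 UTC
as_datetime = datetime.fromtimestamp(timestamp, tz=timezone.utc)

print(as_datetime.isoformat())  # 2019-02-18T04:44:34+00:00
```

Passing `tz=timezone.utc` keeps the result independent of the timezone of the machine running the code.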

Here are two ways to compare a given unix timestamp with the current time.

In [None]:
import time

unix_time_as_justin_wrote_this_exercise = 1550538302

# Simple:

unix_now = time.time()
elapsed = unix_now - unix_time_as_justin_wrote_this_exercise
print(elapsed, "seconds have elapsed since this exercise was written")

In [None]:
# Sophisticated

from datetime import datetime, timedelta

dt_as_justin_wrote_this_exercise = datetime.fromtimestamp(
    unix_time_as_justin_wrote_this_exercise)

dt_now = datetime.now()
delta = dt_now - dt_as_justin_wrote_this_exercise
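# total_seconds() covers the whole interval; delta.seconds would only
# give the part left over after counting whole days
print(delta.total_seconds(), "seconds have elapsed since this exercise was written")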
print(delta.days, "days have elapsed since this exercise was written")

In [None]:
# "datetime.datetime" & "datetime.timedelta" objects make arithmetic
# using timestamps easier. You don't need to count seconds yourself!

two_days_ago = datetime.now() - timedelta(days=2)

print("Two days ago", two_days_ago)

### Exercise: Given a version message payload (dictionary of the sort returned by `read_version_payload`), tell me if it is from the last hour or not

You'd probably never want to do something like this with a version message -- but blocks also have timestamps and the bitcoin protocol is supposed to reject blocks with timestamps that are too far in the future or past.

Either use `time.time()` to calculate the current Unix time and count the number of seconds in an hour, or convert the Unix timestamp using `datetime.datetime.fromtimestamp()`, calculate the current time using `datetime.now()`, and let `datetime.timedelta(days=?, seconds=?, microseconds=?, milliseconds=?, minutes=?, hours=?, weeks=?)` do the counting for you.

In [None]:
def is_less_than_one_hour_old(version_payload_dict):
    raise NotImplementedError()

In [None]:
def test_is_less_than_one_hour_old():
    # FIXME: would be much more intelligent to have a function that could just build this dictionary for me .,..
    
    five_min_ago = int(time.time() - 60*5)
    raw_version_payload = replace_bytes(VERSION_PAYLOAD, 12, five_min_ago.to_bytes(8, 'little'))
    version_payload_dict = read_version_payload(BytesIO(raw_version_payload))
    assert is_less_than_one_hour_old(version_payload_dict)

    five_hours_ago = int(time.time() - 60*60*5)
    raw_version_payload = replace_bytes(VERSION_PAYLOAD, 12, five_hours_ago.to_bytes(8, 'little'))
    version_payload_dict = read_version_payload(BytesIO(raw_version_payload))
    assert not is_less_than_one_hour_old(version_payload_dict)

    print("Test passed!")

test_is_less_than_one_hour_old()

# "Services" Field

[The version section of the protocol docs](https://en.bitcoin.it/wiki/Protocol_documentation#version) provides us with the following guide for interpreting the `services` field of the `version` payload:

![image](../images/services.png)

It is a "bitfield". [Check out the Wikipedia entry](https://en.wikipedia.org/wiki/Bit_field) for a more detailed explanation than I can provide.

A bitfield is an integer. Every bit of the base-2 representation (e.g. "101" is base-2 representation of 5) holds some pre-defined meaning. This particular bitfield is 8 bytes / 64 bits (remember, a byte is just a collection of 8 bits so 8 bytes is 8 bytes * 8 bits/byte = 64 bits).
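To see this in action, here is a small sketch that prints the base-2 representation of `1037`, the `services` value from the example payload. It goes through a string purely for visualization; the variable names are my own, and this is not necessarily how you'd want to test individual bits in real code.

```python
# The services value from the example version payload
services = 1037

# Show the 11 lowest bits, most significant on the left
bits = format(services, '011b')
print(bits)  # 10000001101

# List the positions (counting from the right, starting at 0) that are "on"
on_positions = [i for i, bit in enumerate(reversed(bits)) if bit == '1']
print(on_positions)  # [0, 2, 3, 10]

# The bitfield is the sum of 2**position over those positions
print(sum(2**i for i in on_positions))  # 1037
```

So `1037` is really four separate yes/no flags packed into a single integer: `1 + 4 + 8 + 1024`.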

From the table above we can see that the least significant digit in the binary representation (decimal value `2^0=1`) represents `NODE_NETWORK`. If this bit is on the node can serve full blocks, if it's off it can only serve block headers.

The second least-significant digit (decimal value `2^1=2`) represents `NODE_GETUTXO`, from a _failed_ BIP by Mike Hearn proposing a new `getutxos` network message that would have allowed SPV clients to query full nodes regarding specific UTXOs of interest to them. Currently, the [only nodes advertising this service are running very old versions of Bitcoin XT](https://bitnodes.earn.com/nodes/?q=node_getutxo). This is kind of interesting -- these specific nodes support extra network messages that current Bitcoin Core clients do not!

The third least-significant digit (decimal value `2^2=4`) represents `NODE_BLOOM`. [BIP 37](https://github.com/bitcoin/bips/blob/master/bip-0037.mediawiki) created a system for SPV clients to be able to download specific transactions (those which affect their UTXOs) without having to download the whole blocks (which are unnecessary because SPV nodes don't validate the blockchain). This system uses ["Bloom filters"](https://en.wikipedia.org/wiki/Bloom_filter). The subsequent [BIP 111](https://github.com/bitcoin/bips/blob/master/bip-0111.mediawiki) defined a `services` bit to advertise this service.

The fourth least-significant digit (decimal value `2^3=8`) represents `NODE_WITNESS`, or whether the emitting node can serve transactions and blocks containing Segwit "witness data". This is described in [BIP 144](https://github.com/bitcoin/bips/blob/master/bip-0144.mediawiki).

The eleventh least-significant digit (decimal value `2^10=1024`) represents `NODE_NETWORK_LIMITED`. This bit has the same meaning as `NODE_NETWORK`, except it only applies to the last 288 blocks (about two days' worth; see BIP 159). In a nutshell, this identifies a "pruned full node" which does validate blocks but doesn't maintain a historical copy of the blockchain.

[Here are these definitions in the Bitcoin Core source code](https://github.com/bitcoin/bitcoin/blob/38429c4b622887f2c1db15a7826215477ca6868c/src/protocol.h#L247) BTW

The rest of the bits (decimal values `2^n` where n in {4, 5, 6, 7, 8, 9, 11, 12, ..., 63}) have no meaning in this table, yet.

So, in order to interpret this field we need to look up the nth bit in the table above and see if it means anything.

Let's write a function that can turn one of these `services` integers into a dictionary that would allow us to look up whether any given service is on or off:

```
{
    'NODE_NETWORK': True,
    'NODE_GETUTXO': False,
    'NODE_BLOOM': True,
    'NODE_WITNESS': False,
    'NODE_NETWORK_LIMITED': True,
}
```

This would be pretty easy if we had a `check_bit(bitfield, index)` function that could tell us whether the `n`-th bit is on-or-off.

### Exercise: `check_bit(bitfield, index)`

Return a boolean value indicating whether the `index`-th bit is on-or-off. This is a tricky one ....

In [None]:
def check_bit(bitfield, index):
    raise NotImplementedError()

In [None]:
def test_check_bit():
    n =  2** 0 + 2**2 + 2**4
    assert check_bit(n, 0) is True
    assert check_bit(n, 1) is False
    assert check_bit(n, 2) is True
    assert check_bit(n, 3) is False
    assert check_bit(n, 4) is True
    print('Test passed!')
    
test_check_bit()

A graphical representation of this testcase is kind of interesting:

In [None]:
n =  2** 0 + 2**2 + 2**4

for i in range(5):
    print(f"The {i} bit of {bin(n)} is {'on' if check_bit(n, i) else 'off'}")

### Exercise #7:  `services_int_to_dict`

Write a function `services_int_to_dict` which produces the following dictionary, for example, when called with argument `1029`:

```
{
    'NODE_NETWORK': True,
    'NODE_GETUTXO': False,
    'NODE_BLOOM': True,
    'NODE_WITNESS': False,
    'NODE_NETWORK_LIMITED': True,
}
```

Hint: use `check_bit`

In [None]:
def services_int_to_dict(services_int):
    raise NotImplementedError()

In [None]:
def test_services_int_to_dict():
    services = 1 + 2 + 4 + 1024
    answer = {
        'NODE_NETWORK': True,
        'NODE_GETUTXO': True,
        'NODE_BLOOM': True,
        'NODE_WITNESS': False,
        'NODE_NETWORK_LIMITED': True,
    }
    assert services_int_to_dict(services) == answer
    print("Tests passed!")

test_services_int_to_dict()

To give you a better idea what's going on here, check out these `services_int_to_dict` outputs for some possible inputs:

In [None]:
from pprint import pprint

bitfields = [
    1,
    8,
    1 + 8,
    1024,
    8 + 1024,
    1 + 2 + 4 + 8 + 1024,
    2**5 + 2**9 + 2**25,
]

for bitfield in bitfields:
    pprint(f"(n={bitfield})")
    pprint(services_int_to_dict(bitfield))
    print()

### Exercise #8: Complete these function definitions to hammer home your understanding of this strange `services` "bitfield"

In [None]:
def offers_node_network_service(services_bitfield):
    # given integer services_bitfield, 
    # return whether the NODE_NETWORK bit is on
    raise NotImplementedError()

In [None]:
def test_offers_node_network_service():
    assert offers_node_network_service(1) is True
    assert offers_node_network_service(1 + 8) is True
    assert offers_node_network_service(4) is False
    print('Test passed!')

test_offers_node_network_service()

In [None]:
def offers_node_bloom_and_node_witness_services(services_bitfield):
    # given integer services_bitfield, return whether the 
    # NODE_BLOOM and NODE_WITNESS bits are on
    raise NotImplementedError()

In [None]:
def test_offers_node_bloom_and_node_witness_services():
    assert offers_node_bloom_and_node_witness_services(1) is False
    assert offers_node_bloom_and_node_witness_services(1 + 8) is False
    assert offers_node_bloom_and_node_witness_services(4 + 8) is True
    print('Test passed!')
    
test_offers_node_bloom_and_node_witness_services()

As a parting note, here's a look at some nodes that define services not mentioned in the wiki:

![image](../images/other-services.png)

### Progress report:

- [x] `read_version_payload` v1: convert payload to dictionary with `bytes` values
- [x] Interpret Integer fields
    - [x] Convert bytes to integers
    - [x] Interpret `version`
    - [x] Interpret `timestamp`
    - [x] Interpret `services`
- [ ] Interpret boolean `relay`
- [ ] Interpret variable-length `user_agent`
    - [ ] Implement `read_varint`
    - [ ] Implement `read_varstr`
- [ ] Interpret sender and receiver `net_addr` values
    - [ ] Interpret IP addresses
    - [ ] Interpret big-endian port numbers
- [ ] `read_version_payload` v2: convert payload to dictionary interpreted values where appropriate

# Boolean Values

We could have treated `relay` like an `int` given how in Python `True` / `False` values are equivalent to `1` / `0`:

In [None]:
print("True is 1: ", True == 1)
print("False is 0: ", False == 0)

But Python `bool` values will make the data in our programs more readable than just using `1` and `0` so let's write a `bytes_to_bool` function which will handle this for us:

### Exercise: `bytes_to_bool(bytes)`

Write a function that will interpret bytes as a boolean 

In [None]:
def bytes_to_bool(bytes):
    raise NotImplementedError()

In [None]:
def test_bytes_to_bool():
    assert bytes_to_bool(b'\x00') is False,\
        'bytes_to_bool(b"\\x00") should return False'
    assert bytes_to_bool(b'\x01') is True,\
        'bytes_to_bool(b"\\x01") should return True'
    print('Tests passed!')
    
test_bytes_to_bool()

Question: must you be concerned about the byte order of this field?

You actually don't need to be concerned. Byte-order only applies if you have multiple bytes.

In [None]:
one = b'\x01'
little_endian_to_int(one) == big_endian_to_int(one)

### Exercise: Use `bytes_to_bool` to interpret the `relay` bytes in `read_version_payload`

In [None]:
def read_version_payload(stream):
    raise NotImplementedError()

In [None]:
def test_read_version_payload_boolean_fields():
    stream = BytesIO(VERSION_PAYLOAD)
    payload = read_version_payload(stream)
    assert payload['relay'] is True
    
    stream = BytesIO(VERSION_PAYLOAD[:-1] + b'\x00')
    payload = read_version_payload(stream)
    assert payload['relay'] is False
    
    print('Test passed!')
    
test_read_version_payload_boolean_fields()

# "Variable Length" fields

Next comes `var_str`, the type of the `user_agent` field, which is basically an advertisement of the Bitcoin software implementation that the node is using. You can see a listing of popular values [here](https://bitnodes.earn.com/nodes/).

["Variable length strings"](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_string) are used for string fields of unpredictable length. This technique strives to use only the space it needs. It does so by prepending a "variable length integer" to the string value being transmitted, which tells the receiver how many bytes they should read in order to read the encoded string value. This gives us the flexibility to send strings anywhere from 0 to 18446744073709551615 bytes long. This is similar to how the payload bytes are handled in our `read_message` function -- first we read `length` and then we read `length`-many bytes to get our raw payload. Same idea here, but now the length isn't a fixed-size integer, but a "variable length integer".
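Here is that idea applied by hand to the tail end of our example payload. This sketch assumes the length prefix fits in a single byte, which happens to be true for this payload but not in general; the variable names are my own.

```python
# The tail of the example payload, starting at the user_agent field
tail = b'\x10/Satoshi:0.16.3/U\x99\x08\x00\x01'

# Here the length prefix is a single byte: 0x10 == 16
length = tail[0]

# The next `length` bytes are the string itself
user_agent = tail[1:1 + length]

# Whatever is left over belongs to the fields that follow
remaining = tail[1 + length:]

print(length)      # 16
print(user_agent)  # b'/Satoshi:0.16.3/'
print(remaining)   # b'U\x99\x08\x00\x01'
```

The five leftover bytes are the 4-byte `start_height` and the 1-byte `relay` flag.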

How does this `varint` work?

The first byte of a `varint` is a marker which says how many bytes come after it:
* `0xFF`: 8 byte integer follows
* `0xFE`: 4 byte integer follows
* `0xFD`: 2 byte integer follows
* < `0xFD`: 0 bytes follow. Interpret first byte as a 1 byte integer.
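Seen from the sender's side, these rules can be sketched as a small encoder. `encode_varint` is a hypothetical helper written for this illustration (it is not the course's `lib` code), and it assumes the multi-byte integers are little endian like most other fields. It goes in the opposite direction from the `read_varint` exercise below.

```python
def encode_varint(n):
    # Hypothetical helper: serialize a non-negative integer as a varint
    if n < 0:
        raise ValueError('varint cannot be negative')
    if n < 0xfd:
        # Small values are sent as a single byte with no marker
        return n.to_bytes(1, 'little')
    if n <= 0xffff:
        return b'\xfd' + n.to_bytes(2, 'little')
    if n <= 0xffffffff:
        return b'\xfe' + n.to_bytes(4, 'little')
    if n <= 0xffffffffffffffff:
        return b'\xff' + n.to_bytes(8, 'little')
    raise ValueError('integer too large for a varint')

print(encode_varint(16))     # b'\x10'
print(encode_varint(253))    # b'\xfd\xfd\x00'
print(encode_varint(70015))  # b'\xfe\x7f\x11\x01\x00'
```

Note that `253` itself needs the `0xFD` marker, because a lone `0xFD` byte would be mistaken for a marker rather than a value.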

[Here's another brief tutorial on the `varint` structure](http://learnmeabitcoin.com/glossary/varint) if you thirst for more!

### Exercise:  Implement `read_varint`, since `read_varstr` will depend on it and the version message's `user_agent` requires `read_varstr`

Since this is a somewhat complicated function, I've outlined it for you. Replace the `"FIXME"`s:

In [None]:
def read_varint(stream):
    i = little_endian_to_int(stream.read(1))
    if i == 0xff:
        return little_endian_to_int(stream.read(8))
    elif i == 0xfe:
        return "FIXME"
    elif "FIXME":
        return "FIXME"
    else:
        "FIXME"

In [None]:
def test_read_varint():
    # FIXME: Ungodly amount of test code ...

    eight_byte_int = 2 ** (8 * 8) - 1
    four_byte_int = 2 ** (8 * 4) - 1
    two_byte_int = 2 ** (8 * 2) - 1
    one_byte_int = 7

    eight_byte_int_bytes = eight_byte_int.to_bytes(8, 'little')
    four_byte_int_bytes = four_byte_int.to_bytes(4, 'little')
    two_byte_int_bytes = two_byte_int.to_bytes(2, 'little')
    one_byte_int_bytes = one_byte_int.to_bytes(1, 'little')

    eight_byte_prefix = (0xff).to_bytes(1, 'little')
    four_byte_prefix = (0xfe).to_bytes(1, 'little')
    two_byte_prefix = (0xfd).to_bytes(1, 'little')

    eight_byte_var_int =  eight_byte_prefix + eight_byte_int_bytes
    four_byte_var_int = four_byte_prefix + four_byte_int_bytes
    two_byte_var_int = two_byte_prefix + two_byte_int_bytes
    one_byte_var_int = one_byte_int_bytes

    enumerated = (
        (eight_byte_int, eight_byte_var_int),
        (four_byte_int, four_byte_var_int),
        (two_byte_int, two_byte_var_int),
        (one_byte_int, one_byte_var_int),
    )
    for correct_int, var_int in enumerated:
        stream = BytesIO(var_int)
        calculated_int = read_varint(stream)
        assert correct_int == calculated_int, (correct_int, calculated_int)
        
    print('Test passed!')

test_read_varint()

Now that we have that out of the way:

### Exercise: Implement `read_varstr`

Read a varint, and then read that many bytes:

In [None]:
def read_varstr(stream):
    raise NotImplementedError()

In [None]:
def test_read_varstr():
    from lib import serialize_varstr

    long_str = b"A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network.  The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work. The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power. As long as a majority of CPU power is controlled by nodes that are not cooperating to attack the network, they'll generate the longest chain and outpace attackers. The network itself requires minimal structure. Messages are broadcast on a best effort basis, and nodes can leave and rejoin the network at will, accepting the longest proof-of-work chain as proof of what happened while they were gone."
    long_var_str = serialize_varstr(long_str)

    short_str = b"!"
    short_var_str = serialize_varstr(short_str)

    enumerated = (
        (short_str, short_var_str),
        (long_str, long_var_str),
    )
    for correct_byte_str, var_str in enumerated:
        stream = BytesIO(var_str)
        calculated_byte_str = read_varstr(stream)
        assert correct_byte_str == calculated_byte_str
    print('Test passed!')
    
test_read_varstr()

### Exercise: `read_version_payload` calls your `read_varstr`

This one is easy. In `read_version_payload`, replace the `magic_read_varstr` I gave you with the `read_varstr` you just wrote.

In [None]:
def read_version_payload(stream):
    raise NotImplementedError()

In [None]:
def test_read_version_payload_varstr():
    stream = BytesIO(VERSION_PAYLOAD)
    payload = read_version_payload(stream)
    assert payload['user_agent'] == b'/Satoshi:0.16.3/'
    
    print('Test passed!')
    
test_read_version_payload_varstr()

### Progress

- [x] `read_version_payload` v1: convert payload to dictionary with `bytes` values
- [x] Interpret Integer fields
    - [x] Convert bytes to integers
    - [x] Interpret `version`
    - [x] Interpret `timestamp`
    - [x] Interpret `services`
- [x] Interpret boolean `relay`
- [x] Interpret variable-length `user_agent`
    - [x] Implement `read_varint`
    - [x] Implement `read_varstr`
- [ ] Interpret sender and receiver `net_addr` values
    - [ ] Interpret IP addresses
    - [ ] Interpret big-endian port numbers
- [ ] `read_version_payload` v2: convert payload to dictionary with interpreted values where appropriate

# "Network Address" Type

[`net_addr`](https://en.bitcoin.it/wiki/Protocol_documentation#Network_address) is the most complicated new type we encounter this lesson, so we'll handle it last. Plus, it builds on the `timestamp` and `services` types we learned to read above.

![image](../images/network-address.png)

Network addresses require we interpret 4 new kinds of data:

1. `time`: Unix timestamp. Already done.
2. `services`: integer bitfield. Already done.
3. `IP address`: complicated ...
4. `port`: big-endian encoded `int`
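
Byte order matters for the port because the other integers in this lesson are little-endian. A small sketch, assuming the two bytes `b"\x20\x8d"` stand in for a port field read off the wire:

```python
port_bytes = b"\x20\x8d"  # assumed example value for a 2-byte port field

as_big = int.from_bytes(port_bytes, "big")
as_little = int.from_bytes(port_bytes, "little")

print(as_big)     # 8333, Bitcoin's default mainnet port
print(as_little)  # 36128, what we'd get with the wrong byte order
```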

### IP Address

IPv4 addresses are 4 bytes. We usually represent them with the decimal representation of each byte separated by dots (`.`):

```
172.16.254.1
```

Since there are only `256**4 = 4_294_967_296` possible addresses -- fewer than the global human population -- a new format was needed: IPv6.

IPv6 addresses are 16 bytes. They are represented as eight groups of four hexadecimal digits separated by colons, for example `2001:0db8:0000:0042:0000:8a2e:0370:7334`, although methods to abbreviate this full notation exist. Each group between colons therefore holds 2 bytes.
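
To see the full and abbreviated notations side by side, here is a short sketch using Python's standard-library `ipaddress` module. This lesson doesn't otherwise use that module; it appears here only for illustration.

```python
import ipaddress

full = "2001:0db8:0000:0042:0000:8a2e:0370:7334"
addr = ipaddress.IPv6Address(full)

print(len(addr.packed))  # number of raw bytes in the address
print(addr.exploded)     # full notation, eight groups of four hex digits
print(addr.compressed)   # abbreviated notation with leading zeros dropped
```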

Since the internet will eventually need to migrate to IPv6 addresses, Satoshi decided to make the `ip` field in the `net_addr` structure 16 bytes -- just like IPv6 addresses.

But this raises a question: with all this extra space, how should we represent a 4-byte IPv4 address?

As the network address table above describes, Satoshi employed an IPv4-mapped IPv6 address. This just means that he stuck a 12-byte prefix on the left side: `b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff'`. The rightmost 4 bytes represent the IPv4 address.
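
Going in the encoding direction, the mapping can be sketched as follows. `MAPPED_PREFIX` is a name chosen for this illustration, and the example address is the `172.16.254.1` shown earlier.

```python
import socket

# 10 zero bytes followed by two 0xff bytes: the 12-byte prefix
MAPPED_PREFIX = b"\x00" * 10 + b"\xff" * 2

# inet_pton "packs" a dotted-decimal string into its 4 raw bytes
packed_ipv4 = socket.inet_pton(socket.AF_INET, "172.16.254.1")

mapped = MAPPED_PREFIX + packed_ipv4
print(len(mapped))  # 16, the size of the ip field
print(mapped)
```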

### Exercise: Read an IPv4 address

What IP address is encoded in the `ipv4_bytes` value below?

In [None]:
# No unit test this time. 
# Play around with this in Jupyter and try to extract the IP address. 
# Maybe make a few more cells and break the process into a few steps. 
# Have fun with it!

ipv4_bytes = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\n\x00\x00\x01'
ipv4_bytes

[answer](./Answers.ipynb#Exercise:-Read-an-IPv4-address)

### Exercise: Read an IPv6 address

What IP address is encoded in the `ipv6_bytes` value below?

In [None]:
ipv6_bytes = b'\xfe\x80\x00\x00\x00\x00\x00\x00\x02\x02\xb3\xff\xfe\x1e\x83)'
ipv6_bytes

[answer](./Answers.ipynb#Exercise:-Read-an-IPv6-Address)

Luckily, we won't have to rely on our own code for such a low-level operation: [`socket.inet_ntop(address_family, packed_ip)`](https://docs.python.org/3/library/socket.html#socket.inet_ntop) will do this work for us:

In [None]:
# socket.AF_INET is the IPv4 address family.
# Because the second argument is a "packed" IP address,
# we must lop off the 12-byte prefix before calling.
socket.inet_ntop(socket.AF_INET, ipv4_bytes[-4:])

In [None]:
# socket.AF_INET6 is the IPv6 address family
# Notice how it does some compacting of empty parts for us ...
socket.inet_ntop(socket.AF_INET6, ipv6_bytes)

### Exercise: `bytes_to_ip`

Write a function that takes 16 bytes and converts them to the correct IP address string, based on whether they hold an IPv4 or an IPv6 address.

In [None]:
# this IPv4 prefix will be useful:
IPV4_PREFIX = b"\x00" * 10 + b"\xff" * 2

def bytes_to_ip(b):
    raise NotImplementedError()

In [None]:
def test_bytes_to_ip():
    assert bytes_to_ip(ipv4_bytes) == '10.0.0.1'
    assert bytes_to_ip(ipv6_bytes) == 'fe80::202:b3ff:fe1e:8329'
    print('Test passed!')
    
test_bytes_to_ip()

### Exercise: `read_address`

Since network addresses are sort of their own "type" -- and this code will be re-used when we write the crawler -- let's create a `read_address` which our `read_version_payload` function can call when it reaches the network address bytes.

We give it a `has_timestamp` parameter because the docs note that the `timestamp` field is always present *except* in `version` messages. The flag tells `read_address` whether or not we're in such a context ...

![image](../images/netaddr-timestamp.png)
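
The effect of the flag on the byte layout can be sketched from the writing side. `serialize_address_sketch` is a hypothetical helper for illustration, restricted to IPv4 peers; the field widths (4-byte timestamp, 8-byte services, 16-byte IP, 2-byte port) follow the protocol documentation table linked above.

```python
import socket

def serialize_address_sketch(services, ipv4, port, timestamp=None):
    """Hypothetical helper: build net_addr bytes for an IPv4 peer."""
    out = b""
    if timestamp is not None:
        out += timestamp.to_bytes(4, "little")   # omitted in version messages
    out += services.to_bytes(8, "little")
    # 12-byte IPv4-mapped prefix followed by the 4 packed address bytes
    out += b"\x00" * 10 + b"\xff" * 2 + socket.inet_pton(socket.AF_INET, ipv4)
    out += port.to_bytes(2, "big")               # port is big-endian
    return out

in_version = serialize_address_sketch(1, "10.0.0.1", 8333)
elsewhere = serialize_address_sketch(1, "10.0.0.1", 8333, timestamp=1550465074)
print(len(in_version), len(elsewhere))  # 26 30
```

The two encodings differ only by the 4 leading timestamp bytes, which is why a single reader with a flag can handle both.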

Fill in the `?`s with appropriate values:

In [None]:
def read_address(stream, has_timestamp):
    r = {}
    if has_timestamp:
        r["timestamp"] = ?
    r["services"] = ?
    r["ip"] = ?
    r["port"] = ?
    return r

In [None]:
def test_read_address():
    services = 7
    services_bytes = services.to_bytes(8, 'little')
    ipv4 = '10.0.10.0'
    ipv4_bytes = IPV4_PREFIX + socket.inet_pton(socket.AF_INET, ipv4)
    ipv6 = '2a02:1205:501e:d30:f57a:6958:7e47:2694'
    ipv6_bytes = socket.inet_pton(socket.AF_INET6, ipv6)
    port = 8333
    port_bytes = port.to_bytes(2, 'big')
    
    # IPv4
    stream = BytesIO(services_bytes + ipv4_bytes + port_bytes)
    address = read_address(stream, has_timestamp=False)
    assert address['services'] == services
    assert address['ip'] == ipv4
    assert address['port'] == port
    
    # IPv6
    stream = BytesIO(services_bytes + ipv6_bytes + port_bytes)
    address = read_address(stream, has_timestamp=False)
    assert address['ip'] == ipv6
    
    print('Test passed!')

test_read_address()

### Progress

- [x] `read_version_payload` v1: convert payload to dictionary with `bytes` values
- [x] Interpret Integer fields
    - [x] Convert bytes to integers
    - [x] Interpret `version`
    - [x] Interpret `timestamp`
    - [x] Interpret `services`
- [x] Interpret boolean `relay`
- [x] Interpret variable-length `user_agent`
    - [x] Implement `read_varint`
    - [x] Implement `read_varstr`
- [x] Interpret sender and receiver `net_addr` values
    - [x] Interpret IP addresses
    - [x] Interpret big-endian port numbers
- [ ] `read_version_payload` v2: convert payload to dictionary with interpreted values where appropriate

# Putting it all together

### Exercise: final `read_version_payload`

Copy the last version of `read_version_payload` you had written, and add in calls to `read_address` where appropriate. This is the finished product!

In [None]:
def read_version_payload(stream):
    raise NotImplementedError()

version_payload = read_version_payload(BytesIO(VERSION_PAYLOAD))
version_payload

In [None]:
def test_read_version_payload_final():
    vp = version_payload = read_version_payload(BytesIO(VERSION_PAYLOAD))
    assert vp['version'] == 70015
    assert vp['services'] == 1037
    assert vp['timestamp'] == 1550465074
    assert vp['receiver_address'] == {'services': 0, 'ip': '70.113.80.71', 'port': 43206}
    assert vp['sender_address'] == {'services': 1037, 'ip': '::', 'port': 0}
    assert vp['nonce'] == 8298811190300702753
    assert vp['user_agent'] == b'/Satoshi:0.16.3/'
    assert vp['start_height'] == 563541
    assert vp['relay'] == 1
    print('Test passed!')

test_read_version_payload_final()

# That's all, folks!

If you have any feedback about this lesson, take a moment to fill out [this one-question Google form](https://goo.gl/forms/izKsoQkN5QSDGHF63). This will help me iterate and improve this course!

# Homework

* Follow along with this article: https://coinlogic.wordpress.com/2014/03/09/the-bitcoin-protocol-4-network-messages-1-version/. It covers how to use Wireshark to observe the messages that `bitcoind` sends.