# Basics of using protobufs
- based mostly on https://developers.google.com/protocol-buffers/docs/pythontutorial

# Step 0: installing dependencies and protobuf

### install the protobuf tooling
- simple setup: ```sudo snap install protobuf --classic ```
- complete setup: https://github.com/protocolbuffers/protobuf/blob/master/src/README.md

### install the Python requirements
```pip install -r requirements.txt```

# Step 1: define a protocol format
- the schema is stored in a `.proto` file
- define one message for each data structure you want to serialize

In [1]:
# canonical example used by google documentation is an address book
# it's a list of people with their phone numbers
# https://developers.google.com/protocol-buffers/docs/pythontutorial
! cat proto/addressbook.proto


syntax = "proto2";

package tutorial;

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    optional string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}


# Step 2: Generate language-specific bindings
- given the `.proto` schema, generate the necessary files for each target language
- `protoc` is the protobuf compiler; it creates the necessary boilerplate files in the target language

<img src="../figures/protoc.png"/>
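
The same `protoc` invocation can also be driven from plain Python, which makes a missing compiler easier to spot than a shell error. A minimal sketch, assuming the `proto`/`tmp` layout used in this notebook; `build_protoc_cmd` and `run_protoc` are hypothetical helper names, not part of protobuf:

```python
import shutil
import subprocess
from pathlib import Path


def build_protoc_cmd(src_dir, dst_dir, proto_name, lang="python"):
    """Return the protoc argument list for one .proto file (hypothetical helper)."""
    src, dst = Path(src_dir), Path(dst_dir)
    return [
        "protoc",
        f"-I={src}",            # where protoc looks for .proto files
        f"--{lang}_out={dst}",  # where the generated bindings are written
        str(src / proto_name),
    ]


def run_protoc(src_dir, dst_dir, proto_name, lang="python"):
    """Run protoc, failing early with a clear message if it is not installed."""
    if shutil.which("protoc") is None:
        raise FileNotFoundError("protoc not found on PATH; see Step 0")
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_protoc_cmd(src_dir, dst_dir, proto_name, lang), check=True)


# only builds and prints the command, so it works even without protoc installed
print(build_protoc_cmd("proto", "tmp", "addressbook.proto"))
```

Calling `run_protoc("proto", "tmp", "addressbook.proto")` would then generate the Python bindings, and passing `lang="cpp"` the C++ ones.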

In [2]:
import os

!rm -rf tmp
!mkdir tmp

SRC_DIR = os.path.realpath("proto") # keep the .proto file in proto folder
DST_DIR = os.path.realpath("tmp") # keep generated files in tmp folder

# generate python bindings
!protoc -I=$SRC_DIR --python_out=$DST_DIR  $SRC_DIR/addressbook.proto

# generate c++ bindings for later
!protoc -I=$SRC_DIR --cpp_out=$DST_DIR  $SRC_DIR/addressbook.proto

/bin/bash: protoc: command not found
/bin/bash: protoc: command not found


# Step 3: instantiation, serialization and de-serialization
- the encoding process is described in more detail at https://developers.google.com/protocol-buffers/docs/encoding
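
To give a feel for that encoding, here is a small pure-Python sketch of two of its building blocks: base-128 varints and field tags. It is an illustration of the documented rules, not the library's implementation, and `encode_varint` and `encode_tag` are hypothetical helper names:

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit stays clear
            return bytes(out)


def encode_tag(field_number, wire_type):
    """A field's key is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


VARINT, LEN = 0, 2  # the two wire types used below

# `optional int32 id = 2` set to 1234
id_field = encode_tag(2, VARINT) + encode_varint(1234)

# `optional string name = 1` set to "person_name": tag, byte length, UTF-8 payload
payload = "person_name".encode("utf-8")
name_field = encode_tag(1, LEN) + encode_varint(len(payload)) + payload

print(id_field.hex(), name_field.hex())
```

Note that only the field numbers from the `.proto` file end up on the wire; the field names do not.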

In [None]:
import sys
sys.path.insert(0, DST_DIR) # add the tmp folder to sys.path so the generated module can be imported

import addressbook_pb2

### Equality of objects is based on state

In [None]:
def instantiate_person():
    person = addressbook_pb2.Person()
    person.id = 1234
    person.name = "person_name"
    person.email = "person_email"
    phone = person.phones.add()
    phone.number = "123-4567"
    phone.type = addressbook_pb2.Person.HOME
    return person

def test_equality():
    """establishes the concept of equality for a protobuf message in python"""
    first = instantiate_person()
    second = instantiate_person()

    # different memory addresses
    assert id(first) != id(second)

    # yet they are equal since their fields have the same values
    assert first == second

    # once a field changes, they're not equal anymore
    first.name = "new_name"
    assert first != second

test_equality()
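
The same distinction between identity and value equality exists in plain Python. As an analogy only (this is a standard-library dataclass, not a protobuf message, and `PersonLike` is a hypothetical stand-in for the generated class):

```python
from dataclasses import dataclass, field


@dataclass
class PersonLike:
    """Hypothetical stand-in for the generated Person class."""
    name: str = ""
    id: int = 0
    phones: list = field(default_factory=list)


first = PersonLike(name="person_name", id=1234, phones=["123-4567"])
second = PersonLike(name="person_name", id=1234, phones=["123-4567"])

assert first is not second   # two distinct objects in memory
assert first == second       # equal because every field matches

first.name = "new_name"
assert first != second       # one differing field breaks equality
```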

### Objects can be converted back and forth to wire format (serialized)

In [None]:
def test_serialization():
    """establishes properties of the conversion object <-> bytes"""
    first = instantiate_person()
    
    first_serialized = first.SerializeToString()
    
    # SerializeToString generates a byte representation of the object
    # this is suitable for sending over a network or storing in a file, as long as
    # the schema evolves compatibly (e.g. existing field numbers are not reused or changed)
    assert isinstance(first_serialized, bytes)
    print("this is what it looks like serialized:\n{}".format(first_serialized))
    
    # we can re-construct an object from bytes after serializing
    first_deserialized = addressbook_pb2.Person()
    first_deserialized.ParseFromString(first_serialized)

    assert id(first) != id(first_deserialized)
    assert first == first_deserialized

test_serialization()
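
To see what those bytes contain, the sketch below walks a byte string field by field without using the schema. It is a simplified reader that only handles the VARINT and LEN wire types, run on hand-built sample bytes that follow the encoding rules (not on captured output); `read_varint` and `iter_fields` are hypothetical helper names:

```python
def read_varint(buf, pos):
    """Decode one varint starting at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, wire_type, value) for VARINT and LEN fields only."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:      # VARINT
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:    # LEN: a length prefix followed by that many bytes
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise NotImplementedError(f"wire type {wire_type} not handled in this sketch")
        yield field_number, wire_type, value


# hand-built bytes for a Person with name (1), id (2) and email (3) set
sample = b"\x0a\x0bperson_name" + b"\x10\xd2\x09" + b"\x1a\x0cperson_email"

for field_number, wire_type, value in iter_fields(sample):
    print(field_number, wire_type, value)
```

The reader can split the stream into fields without the schema, but it needs the schema to know that field 1 is `name` and that a LEN payload is a string rather than a nested message.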

### We can convert a protobuf object to json (useful for debugging)

In [None]:
from google.protobuf.json_format import MessageToJson

def test_json_conversion():
    """shows what a protobuf message looks like when converted to JSON"""
    person = instantiate_person()
    return MessageToJson(person)

print(test_json_conversion())

# Step 4: communication between languages/architectures
- one of the big challenges of a wire format is to abstract away architecture-specific details e.g. endianness
- there's an extensive description of various "wire types" in the message structure part of https://developers.google.com/protocol-buffers/docs/encoding
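
The endianness point can be made concrete with the standard-library `struct` module. The sketch below packs the same integer with different byte orders. The encoding documentation defines the fixed-width wire types (I32, I64) as little-endian, which is what the `"<I"` format pins down here; this is an illustration, not the library's own code:

```python
import struct
import sys

value = 1234  # 0x000004D2

little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first
native = struct.pack("=I", value)  # whatever byte order this machine uses

print("this machine is", sys.byteorder, "endian")
print("little:", little.hex(), "big:", big.hex(), "native:", native.hex())

# A writer pins the byte order explicitly instead of relying on the native one,
# which is how a fixed32 value would be laid out on the wire.
fixed32_on_wire = struct.pack("<I", value)

# A reader on any architecture decodes with the same explicit byte order.
(decoded,) = struct.unpack("<I", fixed32_on_wire)
assert decoded == value
```

Varints sidestep the question in a different way: they are defined as a sequence of 7-bit groups, least significant group first, so their layout does not depend on the host either.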


In [None]:
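# compile the C++ reader together with the generated addressbook.pb.cc
# -I adds tmp to the include path, assuming proto_reader.cpp includes addressbook.pb.h
!g++ -I $DST_DIR cpp/proto_reader.cpp $DST_DIR/addressbook.pb.cc -lprotobuf -pthread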

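In [None]:
# person only exists inside the functions above, so create one before inspecting it
person = instantiate_person()
help(person)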
