# Data Encoding, Decoding and Flow

This section introduces **Protocol Buffers** (often called **Protobuf**), a serialization technology developed by Google. It is beginner-friendly, but also covers the technical details you need for practical use.

---

## 1️⃣ What are Protocol Buffers?

**Protocol Buffers** are a **language-neutral, platform-neutral, extensible mechanism for serializing structured data**—basically, a way to encode data efficiently for storage or communication.

It plays a similar role to **JSON or XML**, but is:

* Typically **faster** to serialize and parse
* **Smaller** on the wire (binary encoding)
* Strongly typed through a schema

They are widely used in **network communications, microservices, or storing data** where efficiency matters.

---

## 2️⃣ Why use Protocol Buffers?

Compared to JSON or XML:

| Feature        | JSON/XML        | Protobuf                                               |
| -------------- | --------------- | ------------------------------------------------------ |
| Size           | Larger          | Smaller (binary format)                                |
| Speed          | Slower to parse | Faster parsing                                         |
| Schema         | Optional        | Mandatory (strongly typed)                             |
| Versioning     | Harder          | Easy (can add/remove fields without breaking old data) |
| Human-readable | Yes             | No (binary)                                            |

So, Protobuf is preferred when you care about **performance and bandwidth**, especially in **distributed systems**.

---

## 3️⃣ How it works

Protobuf involves **three main steps**:

### Step 1: Define your data structure

You create a `.proto` file describing your data schema. Example:

```proto
syntax = "proto3";

message Person {
  int32 id = 1;
  string name = 2;
  string email = 3;
}
```

* `message` is like a class or struct.
* Each field has a **type**, a **name**, and a **unique tag number** (used in the binary encoding).
* `syntax = "proto3";` selects version 3 of the Protobuf schema language (proto3).

---

### Step 2: Compile the schema

Use the **Protobuf compiler** (`protoc`) to generate code for your programming language:

```bash
protoc --python_out=. person.proto
```

This generates a module `person_pb2.py` containing a Python class `Person` that you can use to **serialize/deserialize** objects.

---

### Step 3: Serialize and deserialize data

Example in Python:

```python
from person_pb2 import Person

# Create a new person
p = Person()
p.id = 1
p.name = "Elton"
p.email = "elton@example.com"

# Serialize to binary
data = p.SerializeToString()

# Deserialize
p2 = Person()
p2.ParseFromString(data)

print(p2)
```

* `SerializeToString()` → converts the object into a **compact binary format**.
* `ParseFromString()` → converts binary back to the object.
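
To make the "compact binary format" concrete, the following is a minimal pure-Python sketch that hand-encodes the same `Person` according to the Protobuf wire format, where each field starts with a varint key of the form `(field_number << 3) | wire_type`. The helper names (`encode_varint`, `encode_key`, `encode_int_field`, `encode_string_field`) are illustrative and not part of the Protobuf library, and the sketch only handles non-negative integers and strings.

```python
WIRE_VARINT = 0  # used for int32, int64, bool, enum, ...
WIRE_LEN = 2     # used for string, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Encode the field key that precedes every value."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_key(field_number, WIRE_VARINT) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    payload = text.encode("utf-8")
    return encode_key(field_number, WIRE_LEN) + encode_varint(len(payload)) + payload


# Same values as the Person built above: id = 1, name = 2, email = 3.
person_bytes = (
    encode_int_field(1, 1)
    + encode_string_field(2, "Elton")
    + encode_string_field(3, "elton@example.com")
)

print(person_bytes.hex(" "))
print(len(person_bytes), "bytes")
```

The field names never appear in the output; only tag numbers, lengths, and values do. Under the assumptions above, these bytes should correspond to what `SerializeToString()` returns for this message.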

---

## 4️⃣ Key Concepts

1. **Fields and Tags**: Each field in a message has a unique tag number. This is used in the binary format, not the field name.
2. **Optional/Repeated fields**:

   * `repeated` → like a list/array.
   * Fields can be optional; missing fields just get default values.
3. **Nested messages**: Messages can contain other messages.
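4. **Backward/forward compatibility**: You can add or remove fields without breaking old code, as long as you never reuse a tag number for a different field. Marking the tags of removed fields as `reserved` makes the compiler enforce this.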
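
The compatibility rule can be sketched with a tiny hand-written decoder. A reader that only knows tags 1 and 2 skips a tag-3 field written by newer code, and a newer reader simply finds no tag 3 in older data. The functions `read_varint` and `decode`, and the dictionaries standing in for schemas, are hypothetical helpers for illustration. Unlike this sketch, the real library also handles the remaining wire types and can retain unknown fields.

```python
def read_varint(buf: bytes, pos: int):
    """Read one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, known_fields: dict) -> dict:
    """Decode varint (0) and length-delimited (2) fields; ignore unknown tags."""
    record = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        name = known_fields.get(field_number)
        if name is not None:
            record[name] = value
        # Unknown tag numbers were consumed above but are not stored.
    return record


# Bytes as written by "new" code: id = 1, name = "Elton", email = "elton@example.com"
new_data = bytes.fromhex("08 01 12 05 45 6c 74 6f 6e") + b"\x1a\x11elton@example.com"
old_data = new_data[:9]  # what "old" code would have written: id and name only

old_schema = {1: "id", 2: "name"}
new_schema = {1: "id", 2: "name", 3: "email"}

print(decode(new_data, old_schema))  # old reader, new data: email is skipped
print(decode(old_data, new_schema))  # new reader, old data: email is absent
```

In the second case the caller would fall back to the field's default value, which is what the generated Protobuf classes do for missing fields.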

---

## 5️⃣ Advantages

* Compact: smaller than JSON/XML.
* Fast: binary encoding is optimized.
* Cross-platform: supports many languages (Python, Java, Go, C++, JavaScript, etc.)
* Version-tolerant: easy to evolve data schemas.

---

## 6️⃣ When to use it

* Microservices communication (gRPC uses Protobuf by default)
* Mobile apps communicating with a server
* Storing large datasets in a compact format
* Any high-performance network application

---

## Protobuf vs Thrift

**Protocol Buffers (Protobuf)** and **Apache Thrift** are two popular frameworks for data serialization and RPC. The comparison below covers what they are, how data is defined, RPC support, performance, and typical use cases.

---

## 1️⃣ What They Are

| Feature          | Protocol Buffers (Protobuf)              | Apache Thrift                             |
| ---------------- | ---------------------------------------- | ----------------------------------------- |
| Purpose          | Serialization + optional RPC (with gRPC) | Serialization + RPC framework             |
| Developer        | Google                                   | Facebook (now Apache)                     |
| Format           | Binary                                   | Binary (compact) and others               |
| Language Support | Many (Python, Java, Go, C++, etc.)       | Many (Python, Java, C++, Go, Ruby, etc.)  |
| Human-readable   | No (binary)                              | No (binary); JSON is supported optionally |

Both are designed to **serialize structured data efficiently** and support multiple programming languages. Thrift also **natively supports RPC**, whereas Protobuf does this mostly through **gRPC**.

---

## 2️⃣ Data Definition

### Protobuf

* Uses `.proto` files.
* Strongly typed.
* Supports nested messages and repeated fields.

```proto
message Person {
  int32 id = 1;
  string name = 2;
}
```

### Thrift

* Uses `.thrift` files.
* Strongly typed.
* Supports structs, enums, and services (for RPC).

```thrift
struct Person {
  1: i32 id,
  2: string name
}
```

* Notice that Thrift also lets you define **services** directly in the same file.

---

## 3️⃣ RPC Support

| Feature            | Protobuf                                                     | Thrift                                              |
| ------------------ | ------------------------------------------------------------ | --------------------------------------------------- |
| RPC built-in?      | No (gRPC adds RPC)                                           | Yes                                                 |
| Service Definition | `service` block in `.proto` (implemented by gRPC or similar) | `service` block in `.thrift`                        |
| Transport/Protocol | Needs gRPC or custom transport                               | Has transport + protocol built-in (HTTP, TCP, etc.) |

Example of a Thrift service:

```thrift
service UserService {
  Person getUserById(1:i32 id)
}
```

Protobuf also declares services in the `.proto` file, but it needs an RPC framework such as **gRPC** to generate and run them:

```proto
service UserService {
  rpc GetUserById (UserRequest) returns (UserResponse);
}
```

---

## 4️⃣ Serialization & Performance

| Feature          | Protobuf                 | Thrift                                                      |
| ---------------- | ------------------------ | ----------------------------------------------------------- |
| Encoding         | Binary (compact)         | Binary, Compact, JSON, or other protocols                   |
| Size             | Very small               | Small; comparable to Protobuf with the Compact protocol     |
| Speed            | Very fast                | Comparable; depends on protocol and language implementation |
| Schema Evolution | Add/remove fields easily | Add/remove fields easily                                    |

Both are **efficient binary formats**. Protobuf is often reported as slightly smaller and faster, but the gap depends on the Thrift protocol chosen, the language implementation, and the shape of the data, so measure with your own workload.

---

## 5️⃣ Ecosystem & Use Cases

### Protobuf

* Mostly used with **gRPC** for modern microservices.
* Popular in Google, cloud-native systems.
* Strong ecosystem in Go, Python, Java.

### Thrift

* Used where **both serialization + RPC** is needed out-of-the-box.
* Often in legacy Facebook / Apache systems.
* Supports **more protocols/transports natively** (e.g., JSON, HTTP, TCP).

---

## 6️⃣ Pros & Cons

### Protobuf

**Pros:**

* Small, fast, widely used.
* Excellent cross-language support.
* Works very well with gRPC.

**Cons:**

* No native RPC support (needs gRPC).
* Binary format is not human-readable.

### Thrift

**Pros:**

* Built-in RPC and multiple protocols.
* Works well for legacy and diverse systems.

**Cons:**

* Slightly more complex.
* Less compact than Protobuf in some cases.
* Smaller community than Protobuf + gRPC.

---

### ✅ Summary

| Aspect                                         | Recommendation |
| ---------------------------------------------- | -------------- |
| Microservices with gRPC                        | Protobuf       |
| Lightweight serialization                      | Protobuf       |
| Need built-in RPC & multiple transport options | Thrift         |
| Large enterprise systems with legacy RPC       | Thrift         |

---


## Protocol Buffers (Protobuf)

Protobuf types include:

- double: double precision floating point number
- float: single precision floating point number
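- int32: signed integer, uses variable-length encoding; inefficient for negative numbers (prefer sint32 for those)
- int64: signed integer, uses variable-length encoding; inefficient for negative numbers (prefer sint64 for those)
- uint32: unsigned integer, uses variable-length encoding
- uint64: unsigned integer, uses variable-length encoding
- sint32: signed integer, uses variable-length encoding with ZigZag mapping; more efficient than int32 for negative values
- sint64: signed integer, uses variable-length encoding with ZigZag mapping; more efficient than int64 for negative values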
- fixed32: unsigned integer, always 4 bytes
- fixed64: unsigned integer, always 8 bytes
- sfixed32: signed integer, always 4 bytes
- sfixed64: signed integer, always 8 bytes
- bool: boolean value
- string: UTF-8 text string
- bytes: sequence of bytes
- enum: enumerated type
- message: nested message type
- map: map type
- Any: dynamic type
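
The difference between `int32` and `sint32` is easiest to see by counting encoded bytes. The sketch below computes the size of the value part only (without the field key), assuming the standard rules: negative `int32` values are sign-extended to 64 bits before varint encoding, while `sint32` first applies the ZigZag mapping. The function names are illustrative and not part of any Protobuf API.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def int32_size(n: int) -> int:
    # Negative int32 values are sign-extended to 64 bits, then varint-encoded.
    return varint_size(n & 0xFFFFFFFFFFFFFFFF)


def zigzag32(n: int) -> int:
    # Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def sint32_size(n: int) -> int:
    return varint_size(zigzag32(n))


FIXED32_SIZE = 4  # fixed32 / sfixed32 always use 4 bytes

for n in (1, -1, 300, -300, 2**31 - 1):
    print(f"{n:>11}: int32={int32_size(n):>2}  sint32={sint32_size(n)}  sfixed32={FIXED32_SIZE}")
```

Small negative numbers cost 10 bytes as `int32` but 1 or 2 bytes as `sint32`, while values near the top of the 32-bit range take 5 bytes as a varint, which is where the fixed-width types become the smaller choice.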

Protobuf schema definitions are defined in `.proto` files. The Protobuf compiler generates code in various languages from the `.proto` files.

### Encoding

We can encode the previous example record in Protobuf using the following schema in the `.proto` file:

```protobuf
message Person {
  required string user_name = 1;
  optional int64 favorite_number = 2;
  repeated string interests = 3;
}
```

The data encoded with this schema looks like this:
![protobuf](../assets/protobuf.png)

Protobuf handles lists in an interesting way. Instead of a dedicated list or array datatype, it uses a `repeated` marker on the field, which is a third option alongside `required` and `optional`. (The schema above uses proto2 syntax; proto3 dropped `required`.)

As depicted in the figure, a repeated field is simply represented by the same field tag appearing multiple times in the record. The advantage of this approach is that converting an optional (single-valued) field into a repeated (multi-valued) field is permissible. When new code reads old data, it interprets it as a list with either zero or one element, depending on whether the field was present. On the other hand, old code reading new data only perceives the last element of the list.
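
The behaviour described above can be sketched with hand-built bytes in which tag 3 appears twice. One reader treats tag 3 as repeated and collects every occurrence; the other treats it as single-valued and keeps only the last one. All helper names are hypothetical, and the sketch assumes every field in the record is a string (wire type 2).

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def string_field(tag: int, text: str) -> bytes:
    """Encode one length-delimited string field: key, length, payload."""
    payload = text.encode("utf-8")
    return encode_varint((tag << 3) | 2) + encode_varint(len(payload)) + payload


def read_varint(buf: bytes, pos: int):
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_string_fields(buf: bytes):
    """Yield (tag, text) pairs in the order they appear in the record."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        if key & 0x07 != 2:
            raise ValueError("this sketch only handles string fields")
        length, pos = read_varint(buf, pos)
        text = buf[pos:pos + length].decode("utf-8")
        pos += length
        yield key >> 3, text


def read_as_repeated(buf: bytes, tag: int) -> list:
    return [text for t, text in read_string_fields(buf) if t == tag]


def read_as_singular(buf: bytes, tag: int):
    value = None
    for t, text in read_string_fields(buf):
        if t == tag:
            value = text  # later occurrences overwrite earlier ones
    return value


# New data: interests (tag 3) is repeated, so the tag appears twice.
new_data = (
    string_field(1, "Martin")
    + string_field(3, "daydreaming")
    + string_field(3, "hacking")
)
# Old data: the field was single-valued and, in this record, not set at all.
old_data = string_field(1, "Martin")

print(read_as_repeated(new_data, 3))  # new code, new data
print(read_as_singular(new_data, 3))  # old code, new data: last element wins
print(read_as_repeated(old_data, 3))  # new code, old data: empty list
```

This applies cleanly to strings, as in the example schema. Repeated numeric fields can be written in a packed form, so the same conversion is not generally safe for them.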

### gRPC

Now, let's look at how to use the generated code to make remote procedure calls. We will use gRPC, a high-performance RPC framework that uses Protobuf as its default interface definition language and message format. gRPC follows a client-server model: the client initiates an RPC call and waits for a response, while the server executes the requested operation and returns the result.

The schema in the cell below breaks down as follows:

* `syntax = "proto3";` selects version 3 of the Protobuf schema language.
* `message Person` declares three fields:

   * `string user_name = 1;` is a singular field. proto3 has no `required` keyword; an unset field reads back as its default value (the empty string).
   * `optional int64 favorite_number = 2;` tracks presence explicitly, so code can tell an unset field from one set to `0`.
   * `repeated string interests = 3;` is a list of strings.
* `message CourseRequest` bundles a `Person` with a course name and serves as the input type of the gRPC method.
* `service School` defines a gRPC service with one RPC method, `teachCourse`, which takes a `CourseRequest` and returns a `Person`.

```protobuf
%%writefile ../schema/person.proto
syntax = "proto3";

message Person {
  string user_name = 1;
  optional int64 favorite_number = 2;
  repeated string interests = 3;
}

message CourseRequest {
  Person person = 1;
  string course = 2;
}

service School {
  rpc teachCourse(CourseRequest) returns (Person) {}
}
```

Then, run the following command in a terminal from the project root (the directory that contains `schema/`) to generate the Python code:

```bash
python -m grpc_tools.protoc -I./schema --python_out=. --grpc_python_out=. ./schema/person.proto
```

This will generate the following files:

```bash
person_pb2.py
person_pb2_grpc.py
```

```python
%%writefile ../person_protobuf_server.py
from concurrent import futures

import grpc
import person_pb2_grpc


class School(person_pb2_grpc.SchoolServicer):
    """Implements the School service declared in person.proto."""

    def teachCourse(self, request, context):
        # Add the course to the person's interests and return the updated Person.
        request.person.interests.append(request.course)
        return request.person


# Serve requests on port 50051 using a small thread pool.
server = grpc.server(futures.ThreadPoolExecutor(max_workers=2))
person_pb2_grpc.add_SchoolServicer_to_server(School(), server)
server.add_insecure_port('[::]:50051')
server.start()
server.wait_for_termination()
```

Then, run `python person_protobuf_server.py` in a new terminal. This will start the server.

```python
import sys
sys.path.append('..')  # the generated modules live in the parent directory
import grpc
import person_pb2
import person_pb2_grpc
```

```python
def teach_course(stub, person, course):
    # Call the remote teachCourse method and return the updated Person.
    person = stub.teachCourse(person_pb2.CourseRequest(person=person, course=course))
    return person


with grpc.insecure_channel('localhost:50051') as channel:
    martin = person_pb2.Person(user_name='Martin', favorite_number=1337, interests=["daydreaming", "hacking"])
    course = "coding"
    stub = person_pb2_grpc.SchoolStub(channel)
    martin = teach_course(stub, martin, course)
    print(martin.interests)
```

The flow is the same as with **Thrift**; the main differences are syntax, tooling, and some framework conventions.

---

## 1️⃣ Core Flow Comparison: Protobuf/gRPC vs Thrift

| Step | Protobuf / gRPC                                             | Thrift                                                         |
| ---- | ----------------------------------------------------------- | -------------------------------------------------------------- |
| 1    | Define **messages & services** in `.proto`                  | Define **structs & services** in `.thrift`                     |
| 2    | Compile `.proto` → generates code for your language         | Compile `.thrift` → generates code for your language           |
| 3    | Implement **server** (inherits from generated base class)   | Implement **server** (inherits from generated interface class) |
| 4    | Implement **client** → create stub / call RPC methods       | Implement **client** → create client / call RPC methods        |
| 5    | Data is **serialized & sent over the wire** (binary format) | Data is **serialized & sent over the wire** (binary format)    |
| 6    | RPC method returns structured objects                       | RPC method returns structured objects                          |
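
The shared flow in the table can be sketched without either framework. In the toy example below, JSON stands in for the binary wire format and a direct function call stands in for the network. The classes `SchoolServicer` and `SchoolStub` and the function `handle_request` are hypothetical stand-ins written for this illustration; they are not gRPC or Thrift APIs.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Person:
    """Steps 1-2: stands in for the class a schema compiler would generate."""
    user_name: str = ""
    favorite_number: int = 0
    interests: list = field(default_factory=list)


class SchoolServicer:
    """Step 3: the server-side implementation of the service."""

    def teach_course(self, person: Person, course: str) -> Person:
        person.interests.append(course)
        return person


def handle_request(servicer: SchoolServicer, request_bytes: bytes) -> bytes:
    """Server side of the wire: decode the request, dispatch, encode the reply."""
    request = json.loads(request_bytes)
    person = Person(**request["person"])
    result = servicer.teach_course(person, request["course"])
    return json.dumps(asdict(result)).encode("utf-8")


class SchoolStub:
    """Step 4: the client-side stub that hides serialization and transport."""

    def __init__(self, transport):
        self._transport = transport  # any callable taking bytes and returning bytes

    def teach_course(self, person: Person, course: str) -> Person:
        # Step 5: serialize the request and send it over the "wire".
        request_bytes = json.dumps(
            {"person": asdict(person), "course": course}
        ).encode("utf-8")
        response_bytes = self._transport(request_bytes)
        # Step 6: the caller gets a structured object back, not raw bytes.
        return Person(**json.loads(response_bytes))


servicer = SchoolServicer()
stub = SchoolStub(lambda data: handle_request(servicer, data))

martin = Person("Martin", 1337, ["daydreaming", "hacking"])
martin = stub.teach_course(martin, "coding")
print(martin.interests)
```

Swapping the lambda for a real network transport and JSON for a binary encoding is, conceptually, what both gRPC and Thrift provide through generated code.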

---

## 2️⃣ What’s Different

1. **Syntax**

   * `.proto` vs `.thrift`
   * Protobuf uses `message` and `service`; Thrift uses `struct` and `service`.
2. **Framework**

   * gRPC (Protobuf) uses stubs for client/server.
   * Thrift has its own **transport + protocol layers** and can run over TCP, HTTP, etc.
3. **Optional Fields**

   * In `proto3`, unset fields read back as default values; the `optional` keyword adds explicit presence tracking.
   * Thrift fields can be marked `optional` or `required` and can declare default values.
4. **Language Support & Ecosystem**

   * Both support multiple languages, but gRPC + Protobuf has stronger adoption in cloud/microservices.
5. **Streaming**

   * gRPC supports **streaming RPCs** natively.
   * Apache Thrift does not offer an equivalent first-class streaming API, so streaming patterns generally have to be built by hand.

---

## 3️⃣ Example Flow in the Code Above

1. **Define `Person` and `CourseRequest`** → `.proto` (like Thrift `.thrift`)
2. **Compile** → generates `person_pb2.py` + `person_pb2_grpc.py` (like Thrift-generated code)
3. **Server**:

   * Implements `teachCourse` → modifies `Person` object (like Thrift service implementation)
4. **Client**:

   * Creates a stub and calls `teachCourse` → gets returned `Person` object (like Thrift client)

> The **conceptual architecture is the same**; only the **code style, tooling, and protocols** differ.

---



> Add a new field `grade` (0-100) with an appropriate type annotation to the `Person` message. Then, add a new method `assignGrade` to the `School` service that takes a `GradeRequest` message (which consists of a `Person` record and a `grade` argument), assigns the `grade` to the `Person`, and returns the `Person`. Then call the method by passing the "Martin" `person` and a grade number, and print his grade.