# Data Encoding, Decoding and Flow

Apache Thrift is an open-source framework for developing cross-language services and applications. It enables communication between systems that use different programming languages by providing a way to define data types and service interfaces in a simple, compact language. Thrift then generates the necessary code for a wide range of programming languages, allowing for seamless communication between systems that otherwise wouldn't be able to understand each other.

You can think of Apache Thrift as a translator that helps systems written in different programming languages communicate with each other.
A translator takes a message in one language and renders it in another. In the same way, Thrift lets different parts of a system "talk" to each other: it encodes data structures and method calls into a language-neutral format, and code generated for any supported language can decode that format.

Here are the main components of Apache Thrift:

1. **Interface Definition Language (IDL)**: Thrift uses its own IDL to define data types and services. You describe your data structures and RPC (Remote Procedure Call) service methods using this language. For example, you can define complex data structures (like structs and enums) and service operations that can be called remotely.

2. **Code Generation**: After defining the data types and services in Thrift IDL, the framework generates source code in multiple languages (such as Java, C++, Python, Go, etc.). This code includes the data structures, serialization, and service handling mechanisms.

3. **Serialization/Deserialization**: Thrift allows for efficient binary serialization and deserialization of data, making it suitable for high-performance applications, especially when transferring large amounts of data across networks.

4. **Cross-Language Communication**: Thrift supports many programming languages. Each language's Thrift runtime library includes a transport layer, which the generated code uses to send and receive data across the network. This allows systems written in different languages to communicate easily with each other.

5. **RPC (Remote Procedure Call)**: It provides an RPC framework that allows clients to call functions on a remote server as if they were local, with the underlying complexity of network communication abstracted away.

**Use Cases**:

* Building microservices that communicate across different languages.
* Creating high-performance, cross-platform systems where different components are written in different languages.
* Efficient communication in distributed systems, especially in environments where high scalability and performance are critical.



### How it works, step by step:

1. **Define a common language (IDL)**: You describe the data structures and services in **Thrift's Interface Definition Language (IDL)**. This is like creating a "universal dictionary" that all systems can understand.

2. **Generate the translation code**: Thrift generates code in multiple programming languages (like Python, Java, C++, etc.) from your IDL. This code includes the data structures, methods, and serialization/deserialization logic needed to translate between languages.

3. **Server (Speaker 1)**: On the server side, you implement the actual service methods, using the Thrift-generated code. The server will "speak" its own language (e.g., Python) but use Thrift to translate the messages from other languages.

4. **Client (Speaker 2)**: On the client side, you use Thrift-generated code to call the server's methods, and Thrift handles the translation. The client might be in a different language (e.g., Java), but Thrift allows it to communicate with the server by "translating" the data.

### Example:

* You have a **Python** server that needs to accept requests from a **Java** client.
* You define the data structures (like `User` objects) and methods (like `getUserDetails()`) in Thrift's IDL.
* Thrift generates the necessary Python and Java code to **serialize** the `User` data and **make remote calls** between the two languages.
* When the Java client sends a request to the Python server, Thrift handles the **translation** of the method call and data between the two languages.

So, in this case, Thrift acts as the "translator" that allows the client and server to understand each other, even if they speak different programming languages.



## Apache Thrift

Thrift's type system includes base types such as _bool, byte, double, string_ and several integer types. It also includes special types such as _binary_ and _struct_ (similar to classes), and containers (_list, set, map_) that correspond to interfaces commonly available in most programming languages.

Base types:

- bool: A boolean value (true or false)
- byte: An 8-bit signed integer
- i16: A 16-bit signed integer
- i32: A 32-bit signed integer
- i64: A 64-bit signed integer
- double: A 64-bit floating point number
- string: A text string encoded using UTF-8 encoding
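In Thrift's binary protocol, the fixed-size base types above are written as fixed-width big-endian values, and a string is written as a 4-byte length followed by its UTF-8 bytes. The sketch below illustrates those widths with Python's `struct` module; the `BASE_TYPE_FORMATS` table and `encode_base` helper are our own illustration, not generated Thrift code.

```python
import struct

# Big-endian struct format codes matching the fixed widths of Thrift's
# base types in the binary protocol (illustrative table, not Thrift code).
BASE_TYPE_FORMATS = {
    "bool": ">?",    # 1 byte (0 or 1)
    "byte": ">b",    # 8-bit signed
    "i16": ">h",     # 16-bit signed
    "i32": ">i",     # 32-bit signed
    "i64": ">q",     # 64-bit signed
    "double": ">d",  # 64-bit IEEE 754 floating point
}


def encode_base(thrift_type, value):
    """Encode one base-type value; raises struct.error if it does not fit."""
    if thrift_type == "string":
        data = value.encode("utf-8")
        # 4-byte signed length prefix, then the UTF-8 bytes
        return struct.pack(">i", len(data)) + data
    return struct.pack(BASE_TYPE_FORMATS[thrift_type], value)


print(encode_base("i16", 1337).hex())  # 0539
print(len(encode_base("i64", 1337)))   # 8
print(encode_base("string", "Martin"))

try:
    encode_base("byte", 200)  # outside the signed 8-bit range -128..127
except struct.error as exc:
    print("does not fit in a Thrift byte:", exc)
```

The same value takes a different number of bytes depending on the declared type, which is why the schema, not the data, decides how each field is laid out.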

Thrift types and services are declared in `.thrift` files, and the Thrift compiler generates code in various languages from those files.

### Encoding

Let's use the following example record (JSON or dictionary-like) to encode:

```json
{
  "userName": "Martin",
  "favoriteNumber": 1337,
  "interests": ["daydreaming", "hacking"]
}
```

We can encode the record in Thrift using the following schema in the `.thrift` file:

```thrift
struct Person {
  1: required string userName,
  2: optional i64 favoriteNumber,
  3: optional list<string> interests
}
```

Thrift comes with a code generation tool that takes a schema definition like the ones shown here, and produces classes that implement the schema in various programming languages. Our code can call this generated code to encode or decode records of the schema.

The data encoded with this schema looks like this:
![thrift_binary_protocol](../assets/thrift_binary_protocol.png)

Each field has a type annotation (to indicate whether it is a string, integer, list, etc.) and, where required, a length indication (length of a string, number of items in a list). The strings that appear in the data ("Martin", "daydreaming", "hacking") are encoded as UTF-8.

There are no field names (userName, favoriteNumber, interests) in the encoded data. Instead, it contains _field tags_, which are numbers (1, 2, and 3): the same numbers that appear in the schema definition. Field tags are like aliases for fields; they are a compact way of saying which field we're talking about, without having to spell out the field name.
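To make the layout concrete, here is a hand-written sketch that encodes the `Person` record above the way the binary protocol does: a 1-byte type code, a 2-byte field tag, then the value, with a single 0 byte ending the struct. The type-code constants follow the Thrift binary protocol; the helper functions are hypothetical and stand in for what the generated code and the Thrift runtime do for you.

```python
import struct

# Type codes used by Thrift's binary protocol
T_STOP = 0
T_I64 = 10
T_STRING = 11
T_LIST = 15


def field_header(ttype, field_tag):
    # 1-byte type code, then the field tag as a 2-byte big-endian integer
    return struct.pack(">bh", ttype, field_tag)


def encode_string(s):
    # Strings are a 4-byte length followed by the UTF-8 bytes
    data = s.encode("utf-8")
    return struct.pack(">i", len(data)) + data


def encode_person(user_name, favorite_number, interests):
    out = bytearray()
    out += field_header(T_STRING, 1) + encode_string(user_name)
    out += field_header(T_I64, 2) + struct.pack(">q", favorite_number)
    out += field_header(T_LIST, 3)
    # List header: element type code, then the item count as a 4-byte integer
    out += struct.pack(">bi", T_STRING, len(interests))
    for item in interests:
        out += encode_string(item)
    out += struct.pack(">b", T_STOP)  # a single 0 byte marks the end of the struct
    return bytes(out)


encoded = encode_person("Martin", 1337, ["daydreaming", "hacking"])
print(len(encoded))            # 59
print(b"userName" in encoded)  # False: only the tags 1, 2, 3 are stored
print(encoded.hex(" "))
```

Because only the tag is stored, a field can be renamed in the schema without changing the encoded bytes, but its tag number must never change.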

Next, let's add a service: a collection of methods that clients can call remotely. In a `.thrift` file, a service is defined like this:

```thrift
service School {
    Person teachCourse(1: required Person person, 2: required string course)
}
```

The first line declares a service called `School`. The second line declares a method called `teachCourse`, which takes two arguments: a `Person` record and a `string`. The method returns a `Person` record.
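The argument IDs `1:` and `2:` play the same role as field tags: when a client calls `teachCourse`, the arguments travel as a struct whose field 1 is `person` and field 2 is `course`, wrapped in a message header that carries the method name and a sequence ID. The sketch below builds such a call message by hand, assuming the strict binary-protocol header (the `0x80010000` version marker combined with message type CALL = 1). The helper names are hypothetical, and `person_struct` is a minimal `Person` containing only `userName`.

```python
import struct

VERSION_1 = 0x80010000  # strict binary protocol version marker
CALL = 1                # message type for a client request
T_STOP, T_STRING, T_STRUCT = 0, 11, 12


def encode_string(s):
    data = s.encode("utf-8")
    return struct.pack(">i", len(data)) + data


def encode_call(method, seqid, person_struct, course):
    # Header: version|type (packed unsigned so the high bit fits), name, seqid
    header = struct.pack(">I", VERSION_1 | CALL) + encode_string(method)
    header += struct.pack(">i", seqid)
    # Argument struct: field 1 = person (nested struct), field 2 = course
    args = struct.pack(">bh", T_STRUCT, 1) + person_struct
    args += struct.pack(">bh", T_STRING, 2) + encode_string(course)
    args += struct.pack(">b", T_STOP)
    return header + args


# A minimal Person holding only field 1 (userName), ended by a STOP byte
person_struct = (
    struct.pack(">bh", T_STRING, 1) + encode_string("Martin") + struct.pack(">b", T_STOP)
)

message = encode_call("teachCourse", seqid=1, person_struct=person_struct, course="coding")
print(message[:4].hex())  # 80010001
print(len(message))       # 54
```

Leaving out `favoriteNumber` and `interests` is valid here because both are `optional` in the schema; omitting the `required` `userName` would not be.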

### RPC - Remote Procedure Call

A remote procedure call (RPC) calls a function on another computer as if it were a local function. The function actually runs on the remote machine, not on your own.
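Python's standard library ships a small RPC framework of its own, XML-RPC, which makes the idea easy to try without Thrift. This sketch is not Thrift: it uses `xmlrpc.server` and `xmlrpc.client`, which exchange XML over HTTP. It starts a server in a background thread on an OS-assigned local port, then calls its `teachCourse` function through a proxy as if it were local. The dict-based person record is an illustrative stand-in for the Thrift `Person` struct.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def teach_course(person, course):
    """Runs on the server: append the course to the person's interests."""
    person.setdefault("interests", []).append(course)
    return person


# Port 0 asks the OS for any free port; logRequests=False keeps output quiet
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
server.register_function(teach_course, "teachCourse")
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# The proxy turns an ordinary-looking method call into an HTTP request
with ServerProxy(f"http://{host}:{port}/") as school:
    result = school.teachCourse({"userName": "Martin", "interests": ["hacking"]}, "coding")

server.shutdown()
server.server_close()
print(result["interests"])  # ['hacking', 'coding']
```

The calling code never touches sockets or XML; Thrift offers the same convenience, but with a typed schema and a compact binary encoding.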

Now, let's look at how to use the generated code to make remote procedure calls. We will write code for the two sides of a client-server application: the client initiates an RPC call and waits for a response from the server, and the server executes the requested operation and returns a response to the client.

Here, we use the `%%writefile` magic command to write a cell's contents to a file instead of running it in the notebook.

🎯 **Why RPC matters**

RPC hides the networking complexity. You don't have to handle:

- sockets
- JSON parsing
- HTTP handling
- serialization
- protocols
- message framing

The RPC framework (Thrift, gRPC, etc.) handles all of that. You just call a method, and the framework carries the call across the network and brings the result back.
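Message framing is a good example of what the framework hides. A TCP stream has no message boundaries, so the receiver must be told where each message ends. Thrift's framed transport does this by prefixing every message with its length as a 4-byte big-endian integer. The sketch below imitates that idea over a connected socket pair from the standard library; `send_frame`, `recv_exactly`, and `recv_frame` are hypothetical helpers, not Thrift APIs.

```python
import socket
import struct


def send_frame(sock, payload):
    # 4-byte big-endian length prefix, then the payload bytes
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exactly(sock, n):
    # recv() may return fewer bytes than requested, so loop until n arrive
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf += chunk
    return bytes(buf)


def recv_frame(sock):
    (length,) = struct.unpack(">I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)


client, server = socket.socketpair()
# Two messages written back to back arrive as one continuous byte stream...
send_frame(client, b"first message")
send_frame(client, b"second")
# ...but the length prefixes let the receiver split them apart again
first = recv_frame(server)
second = recv_frame(server)
print(first)   # b'first message'
print(second)  # b'second'
client.close()
server.close()
```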

```thrift
%%writefile ../schema/person.thrift

struct Person {
  1: required string userName,
  2: optional i64 favoriteNumber,
  3: optional list<string> interests
}

service School {
    Person teachCourse(1: required Person person, 2: required string course)
}
```

**The `School` service**

A service defines RPC (remote procedure call) methods: the operations that clients can call on the server.

- Method name: `teachCourse`
- Arguments: `(1: required Person person, 2: required string course)`
  - Argument 1: field ID 1, required, type `Person` (the struct defined above)
  - Argument 2: field ID 2, required, type `string`
- Return type: `Person` (the method returns a `Person` object)

🧠 **So what does this service do conceptually?**

The client calls:

`updatedPerson = client.teachCourse(person, "Math")`

The server receives:

- a `Person` object
- the string `"Math"`

The server then returns another `Person` object (possibly modified).

Example logic the server might implement:

- append the course to `interests`
- update the person's learning progress
- return the updated record
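Because a service implementation is just a class with a method, its logic can be checked locally before any server is involved. This sketch assumes a hypothetical `Person` dataclass as a local stand-in for the struct that thriftpy2 generates from the schema. `teachCourse` implements the "append the course to interests" logic described above, and it also handles the case where the optional `interests` field was never set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """Hypothetical local stand-in for the generated Thrift Person struct."""
    userName: str
    favoriteNumber: Optional[int] = None
    interests: Optional[List[str]] = None  # optional field: may be unset (None)


class School:
    def teachCourse(self, person, course):
        # interests is optional in the schema, so it may arrive as None
        if person.interests is None:
            person.interests = []
        person.interests.append(course)
        return person


school = School()
updated = school.teachCourse(Person(userName="Ada"), "Math")
print(updated.interests)  # ['Math']
```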

The code block below is server-side code. It:

- loads the Thrift schema
- implements the service method (`teachCourse`)
- starts a server that listens for client RPC calls
- waits for clients to call `teachCourse(person, course)`

It does not create a client, and it does not send data anywhere on its own: it only receives requests and returns responses.

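```python
%%writefile ../person_thrift_server.py
import thriftpy2
from thriftpy2.rpc import make_server

# Load the schema at runtime; this builds the Person struct and School service.
# The path is relative to the directory the server is started from.
person_thrift = thriftpy2.load("./schema/person.thrift", module_name="person_thrift")


class School(object):
    """Implementation of the School service defined in person.thrift."""

    def teachCourse(self, person, course):
        # interests is optional, so it may be None if the client never set it
        if person.interests is None:
            person.interests = []
        person.interests.append(course)
        return person


# Listens on thriftpy2's default address (localhost:9090);
# client_timeout=None disables the socket timeout for client connections
server = make_server(person_thrift.School, School(), client_timeout=None)
server.serve()
```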

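Then run `python person_thrift_server.py` in a new terminal, from the directory that contains the file, to start the server. The script loads `./schema/person.thrift` relative to its working directory, and it keeps running until you stop it.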
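The client connects as soon as it is created, so the server must already be listening. A small standard-library helper can wait for that. It assumes the thriftpy2 defaults of `localhost` and port `9090`, which `make_server` and `make_client` use when no host or port is passed; `wait_for_port` is a hypothetical helper, not part of thriftpy2.

```python
import socket
import time


def wait_for_port(host="localhost", port=9090, timeout=10.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection to see if anything is listening
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # nothing listening yet; retry shortly
    return False
```

For example, running `wait_for_port()` in a notebook cell before creating the client returns `True` once the server from the previous step is up, and `False` if it never started.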

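The code block below creates the client side:

- `person_thrift = thriftpy2.load("../schema/person.thrift", module_name="person_thrift")` loads the Thrift schema, turning the struct and service definitions (`Person`, `School`) into Python classes.
- `school = make_client(person_thrift.School, timeout=None)` creates a client for the `School` service and connects it to the server (thriftpy2 defaults to `localhost:9090`, the same address `make_server` uses). `timeout=None` disables the client-side socket timeout.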

```python
import thriftpy2
from thriftpy2.rpc import make_client

person_thrift = thriftpy2.load("../schema/person.thrift", module_name="person_thrift")

# Connects to the running server (thriftpy2 defaults to localhost:9090)
school = make_client(person_thrift.School, timeout=None)
```

```python
martin = person_thrift.Person(
    userName="Martin", favoriteNumber=1337, interests=["daydreaming", "hacking"]
)
```

```python
martin.interests
```

```python
martin = school.teachCourse(martin, "coding")
```

The cell above is a remote procedure call (RPC) made through Thrift.

**Client side**

- `school` is the client object that `make_client()` built from the `School` service in the `.thrift` schema.
- The client does not execute `teachCourse` locally.
- Instead, it packages the arguments (`martin` and `"coding"`) into a message (serialized with thriftpy2's default binary protocol) and sends it over the network to the server.

**Server side**

- The server is running the implemented service class:

```python
class School(object):
    def teachCourse(self, person, course):
        person.interests.append(course)
        return person
```

- The Thrift server receives the message, deserializes the arguments, and calls the actual `teachCourse` method on the server.
- The server executes the logic and returns the result.

**Back to the client**

- Thrift serializes the returned `Person` object and sends it back to the client.
- The client deserializes it automatically.
- The variable `martin` now holds the updated object returned by the server.
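The client-stub and server-processor roles described above can be simulated in a single process. In this sketch JSON stands in for Thrift's binary protocol and a direct function call stands in for the socket; `ClientStub`, `Processor`, `SchoolHandler`, and the message layout are hypothetical names chosen to mirror the steps above, not thriftpy2 internals.

```python
import json


class SchoolHandler:
    """Server-side implementation of teachCourse (same logic as the Thrift handler)."""

    def teachCourse(self, person, course):
        person.setdefault("interests", []).append(course)
        return person


class Processor:
    """Server side: deserialize a request, dispatch it by method name, serialize the reply."""

    def __init__(self, handler):
        self.handler = handler

    def process(self, request_bytes):
        request = json.loads(request_bytes)
        method = getattr(self.handler, request["method"])
        result = method(*request["args"])
        reply = {"seqid": request["seqid"], "result": result}
        return json.dumps(reply).encode("utf-8")


class ClientStub:
    """Client side: looks like a local object, but each call is serialized and sent."""

    def __init__(self, send):
        self.send = send  # delivers request bytes, returns reply bytes
        self.seqid = 0

    def teachCourse(self, person, course):
        self.seqid += 1
        request = {"method": "teachCourse", "seqid": self.seqid, "args": [person, course]}
        reply = json.loads(self.send(json.dumps(request).encode("utf-8")))
        # The sequence ID lets the client match the reply to its request
        if reply["seqid"] != self.seqid:
            raise RuntimeError("reply does not match request")
        return reply["result"]


processor = Processor(SchoolHandler())
# A direct function call stands in for the network round trip
school = ClientStub(send=processor.process)

original = {"userName": "Martin", "favoriteNumber": 1337, "interests": ["daydreaming", "hacking"]}
martin = school.teachCourse(original, "coding")
print(martin["interests"])    # ['daydreaming', 'hacking', 'coding']
print(original["interests"])  # ['daydreaming', 'hacking']
```

The second print shows why the notebook reassigns `martin = school.teachCourse(...)`: the server modifies its own deserialized copy, so the change reaches the client only through the return value.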

Visual Flow

```text
Client code: martin = school.teachCourse(martin, "coding")
        │
        │  (serialize Person + course)
        ▼
   ┌────────────────┐
   │ Network/Socket │
   └────────────────┘
        │
        ▼
Server: teachCourse(person, course) executes
        │
        │  (serialize updated Person)
        ▼
   ┌────────────────┐
   │ Network/Socket │
   └────────────────┘
        │
        ▼
Client receives updated Person → assigns it to martin
```



```python
martin.interests
```

> 1. Add a new field `grade` (0-100) with an appropriate type to the `Person` struct. Then add a new method `assignGrade` to the `School` service that takes a `Person` record and a `grade` argument, assigns the `grade` to the `Person`, and returns the `Person`. Finally, call the method with `martin` and a grade, and print his grade.
>
> 2. Add a method `teachCourses` to `School` that adds a list of courses instead of a single course. Then pass `martin` and the list of courses `["cooking", "sewing"]` to the method, and print his new interests.