
Split generated serde code from transport handling by introducing middlewares #343

Open
fishcakez opened this issue May 29, 2018 · 12 comments

@fishcakez
Member

fishcakez commented May 29, 2018

Currently we generate the client and server code for TFramedTransport with somewhat tight coupling between the pieces. However, there are four distinct parts to this end-to-end pipeline:

  • Client, which generates a set of functions representing the service
  • Protocol serde
  • Transport
  • Server behavior, which generates the callbacks required to act as the service

To decouple, we provide a common data structure that is passed through the pipeline on each of the peers:

%Thrift.Message{name: String.t, type: :call | :oneway | :reply | :exception, seq_id: integer, payload: struct()}

On the client side the function for a call (or oneway) would create a %Thrift.Message{} with the method name, request type and payload set to the request struct. This would be passed to the protocol layer, which replaces the request struct in the message with a new struct (e.g. %Thrift.Binary{}) containing the serialized request struct. The protocol then passes the message to the transport layer (e.g. Thrift.Binary.Framed.Client), which converts the message to data and sends it on the socket. The transport layer then receives the response, deserializes it into a new Thrift.Message with a serialized payload struct (e.g. %Thrift.Binary{}) and returns the message to the protocol layer. The protocol layer deserializes the payload to the response struct, puts that in the message's payload and returns the message to the generated client function. Finally the client function handles the response struct and converts it to the function's result, i.e. {:ok, result} | {:error, Exception.t}.

This means our client side control flow is Client -> Protocol -> Transport -> Protocol -> Client. For the server side we reverse this to Transport -> Protocol -> Server -> Protocol -> Transport.

On the server side the transport layer (e.g. Thrift.Binary.Framed.Server) receives data from the socket, deserializes this into a message containing the method name, request type, sequence id and serialized payload (e.g. Thrift.Binary struct). The message is passed to the protocol layer, which deserializes the payload to the request struct and passes the message to the server layer. The server layer unrolls the request struct and dispatches it to the server's callback module, which handles the request and returns a response. The server layer turns this response into the response struct, puts it in the message struct with the new type (e.g. :reply) and returns it to the protocol layer. The protocol layer serializes the response struct into its own payload (e.g. %Thrift.Binary{}) and returns it to the transport layer. The transport layer serializes the message and sends it over the socket.

Let's take this through an example with the following schema:

namespace elixir Example

service Service
{
  string ping(),
}

The generated client code would look approximately like:

defmodule Example.Service.Client do
  def ping(stack) do
    case Thrift.Pipeline.call(stack, %Thrift.Message{name: "ping", type: :call, payload: Example.Service.PingArgs.new()}) do
      %Thrift.Message{type: :reply, payload: %Example.Service.PingResponse{success: string}} ->
        {:ok, string}
      %Thrift.Message{type: :exception, payload: %Thrift.TApplicationException{} = error} ->
        {:error, error}
    end
  end
end

However, it is awkward to have to pass the stack around to all the clients, so we will allow a stack and client to be compiled into their own module:

Thrift.defclient(Example.MyClient, Example.Service.Client, stack)

This would create Example.MyClient with a ping/0 function whose spec is:

@spec ping() :: {:ok, String.t} | {:error, Thrift.TApplicationException.t}

The client stack is a list of modules and arguments:

[{Example.Service.Binary, []}, {Thrift.Binary.Framed, [pool: Example.MyClient.Pool]}]

With Thrift.Pipeline.call/2 definition approximately:

def call([{mod, opts}], msg), do: mod.call(msg, opts)
def call([{mod, opts} | stack], msg), do: mod.call(msg, &call(stack, &1), opts)

With Example.Service.Binary.call/3 definition approximately (next being &Thrift.Binary.Framed.call(&1, opts)):

def call(%Thrift.Message{type: type, payload: req} = req_msg, next, _) when type in [:call, :oneway] do
  %Thrift.Message{payload: rep} = rep_msg = next.(%Thrift.Message{req_msg | payload: serialize(req)})
  %Thrift.Message{rep_msg | payload: deserialize(rep)}
end

With Thrift.Binary.Framed.call/2 definition approximately:

def call(%Thrift.Message{type: fun} = req_msg, opts) when fun in [:call, :oneway] do
  pool = Keyword.fetch!(opts, :pool)
  apply(Thrift.Binary.Framed.Pool, fun, [pool, req_msg, opts])
end

The Thrift.Binary.Framed.Pool would checkout a connection, send request, receive reply, checkin connection.
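The checkout/send/receive/checkin cycle could be sketched as below. This is a hypothetical sketch only: Conn.checkout/2, Conn.send_framed/2, Conn.recv_framed/2 and Conn.checkin/2 stand in for whatever pool and socket primitives are actually chosen.

```elixir
defmodule Thrift.Binary.Framed.Pool do
  # Sketch: Conn.* are assumed primitives, not an existing API.
  def call(pool, %Thrift.Message{} = req_msg, opts) do
    {:ok, conn} = Conn.checkout(pool, opts)

    try do
      # Send the framed request and block for the framed reply.
      :ok = Conn.send_framed(conn, req_msg)
      Conn.recv_framed(conn, opts)
    after
      # Always return the connection, even if the call raised.
      Conn.checkin(pool, conn)
    end
  end

  def oneway(pool, %Thrift.Message{} = req_msg, opts) do
    {:ok, conn} = Conn.checkout(pool, opts)

    try do
      # Oneway requests do not await a reply.
      :ok = Conn.send_framed(conn, req_msg)
    after
      Conn.checkin(pool, conn)
    end
  end
end
```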

On the server side we also have a compiled module:

Thrift.Binary.Framed.defserver(Example.MyServer, stack)

It is started like:

Example.MyServer.start_link(opts)

The server stack is a similar list of modules:

[{Example.Service.Binary, []}, {Example.Service.Handler, {Example.MyHandler, opts}}]

The Thrift.Binary.Framed.Server would have a function to handle each request, approximately:

defp handle_framed_packet(data, stack) do
  msg = deserialize(data)
  try do
    stack
    |> Thrift.Pipeline.call(msg)
    |> serialize()
  rescue
    error ->
      serialize(%Thrift.Message{msg | type: :exception, payload: %Thrift.TApplicationException{message: error.message}})
  end
end

With Example.Service.Binary.call/3 definition approximately (next being &Example.Service.Handler.call(&1, {handler, opts})):

def call(%Thrift.Message{type: type, payload: req} = req_msg, next, _) when type in [:call, :oneway] do
  %Thrift.Message{payload: rep} = rep_msg = next.(%Thrift.Message{req_msg | payload: deserialize(req)})
  %Thrift.Message{rep_msg | payload: serialize(rep)}
end

With Example.Service.Handler.call/2 definition approximately:

def call(%Thrift.Message{type: :call, payload: %Example.Service.PingArgs{}} = msg, {handler, opts}) do
  string = apply(handler, :ping, [opts])
  %Thrift.Message{msg | type: :reply, payload: %Example.Service.PingResponse{success: string}}
end

Once these are in place we can now support custom middleware as elements in the stack on client and server side. For example we could support Thrift.BinaryCompact (#333) as a protocol instead of our current Thrift.Binary. We could support both, or even additional custom protocols, on the same server as we could dispatch to the correct protocol module based on the "magic bytes" in the framed binary message.
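Dispatching on the magic bytes could look like the following fragment. The leading bytes come from the upstream protocol definitions (strict TBinary messages begin with the version bytes 0x80 0x01, TCompact with the protocol id 0x82); the middleware module names are assumptions from the example above.

```elixir
# Sketch: choose a protocol middleware from the first byte(s) of a
# framed payload before running the rest of the server stack.
defp protocol_for(<<0x80, 0x01, _rest::binary>>), do: Example.Service.Binary
defp protocol_for(<<0x82, _rest::binary>>), do: Example.Service.Compact
defp protocol_for(_), do: :error
```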

Notably the middlewares are identical on client and server, so a single middleware can be reused on both sides; for example, a middleware that measures per-method latency, which is awkward to do right now. There are many other common or generic middlewares that would be useful. Fortunately there is a protocol-agnostic middleware library at https://github.com/fishcakez/stack we can build on top of, which supports retries, concurrency limits, request criticality, deadlines and distributed tracing out of the box.
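As a sketch of such a reusable middleware (the module name is hypothetical), a per-method latency measurement fits the call/3 middleware contract shown earlier and works unchanged in either a client or server stack:

```elixir
defmodule Thrift.Middleware.Latency do
  # Times next.(msg) and reports the elapsed microseconds per method
  # name; matches any message carrying a :name field (e.g. %Thrift.Message{}).
  def call(%{name: name} = msg, next, report) do
    start = System.monotonic_time(:microsecond)

    try do
      next.(msg)
    after
      report.(name, System.monotonic_time(:microsecond) - start)
    end
  end
end
```

It would be added to a stack as e.g. `{Thrift.Middleware.Latency, &MyMetrics.observe/2}`, where MyMetrics is whatever reporting function the application provides.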

We don't currently support a thrift protocol/transport that supports distributed tracing. There are open source Thrift protocols that do, TTwitter from Finagle and THeader from fbthrift, which could be supported via middlewares. These would require an additional headers field in the Thrift.Message struct:

%Thrift.Message{name: String.t, type: :call | :oneway | :reply | :exception, seq_id: integer, headers: %{optional(atom) => binary}, payload: struct()}

The middleware design also enables a convenient way to test clients and servers because we can write a client that calls through to the server side implementation without the protocol or transport:

stack = [{Example.Service.Handler, {Example.MyHandler, opts}}]
Thrift.defclient(Example.MyTestClient, Example.Service.Client, stack)

We would implement this in stages:

@fishcakez fishcakez added the rfc label May 29, 2018
@jparise
Collaborator

jparise commented May 29, 2018

This makes good sense to me. I like the pipeline-based model, and I appreciate the testing benefits we'll see as a result of this (re)organization.

I don't think we'll see much compile-time (speed) benefit from the change, will we? It seems like we'll still be generating a good amount of code for the clients, servers, and serialization functions.

We may also want to consider some small namespace changes, such as Thrift.Protocol.Binary instead of Thrift.Binary, but that's tangential to the proposal itself.

@jparise
Collaborator

jparise commented May 29, 2018

If we agree on this approach, should we also include this in the 2.0 milestone?

@fishcakez
Member Author

I don't think we'll see much compile-time (speed) benefit from the change, will we? It seems like we'll still be generating a good amount of code for the clients, servers, and serialization functions.

I agree that it should be about the same.

If we agree on this approach, should we also include this in the 2.0 milestone?

I'd like to get #323 completed before moving forward with this.

@thecodeboss
Collaborator

This sounds great, I love the idea of being able to plug custom middleware into the middle of the stack to do things like tracing or logging. I think this change overall makes the implementation more thrift-like, where you can plug in whatever protocols or transports you want.

@scohen
Collaborator

scohen commented Jul 31, 2018

This all sounds good (and is the approach that Absinthe uses), but why tie this internal refactor to 2.0 being released? Seems like a good candidate for 2.1.

@fishcakez
Member Author

fishcakez commented Jul 31, 2018

This all sounds good (and is the approach that Absinthe uses), but why tie this internal refactor to 2.0 being released? Seems like a good candidate for 2.1.

This isn't in the 2.0 milestone :) but this should be usable externally, it isn't internal only.

@scohen
Collaborator

scohen commented Jul 31, 2018

Perfect.

I guess you're right, it will have some effects, like having to do defclient.

@jparise jparise added this to the 2.1 milestone Aug 6, 2018
@fishcakez
Member Author

fishcakez commented Nov 27, 2018

After discussion with @thecodeboss we are going to try a different approach, at least for the first two tasks, which should allow lower level access to the thrift serialization and deserialization, and provide idiomatic support for multiple protocols. After completing it we should be in much better shape for 2.0, and could move the remaining steps to a different repo (to avoid more complex parts than needed in the thrift library itself).

Introduce a Thrift.Serializable protocol that all generated structs (Thrift.Generator.StructGenerator) will implement.

defprotocol Thrift.Serializable do
  def serialize(thrift_struct, protocol_struct)
  def deserialize(thrift_struct, protocol_struct)
end

The implementations (per thrift struct) of the protocol will delegate to a SerDe protocol per thrift struct with arguments reversed so that the second level of delegation is per protocol struct. This would be equivalent to:

defimpl Thrift.Serializable do
  def serialize(%name{} = thrift_struct, protocol_struct) do
     Module.concat(name, SerDe).serialize(protocol_struct, thrift_struct)
  end

  def deserialize(%name{} = thrift_struct, protocol_struct) do
    Module.concat(name, SerDe).deserialize(protocol_struct, thrift_struct)
  end
end

With each SerDe defined like:

defprotocol NameSpace.Struct.SerDe do
  alias NameSpace.Struct
  @spec serialize(protocol_struct, Struct.t) :: protocol_struct when protocol_struct: var
  def serialize(protocol_struct, thrift_struct)
  
  @spec deserialize(protocol_struct) :: {Struct.t, protocol_struct} | :error when protocol_struct: var
  def deserialize(protocol_struct)

  @spec deserialize(protocol_struct, Struct.t) :: {Struct.t, protocol_struct} | :error when protocol_struct: var
  def deserialize(protocol_struct, thrift_struct)
end

Therefore each thrift protocol (i.e. TBinary, TCompact #333) would implement this protocol for each struct in a schema, and this would be the only code generated for those protocols (e.g. Thrift.Generator.StructBinaryProtocol).

To avoid unnecessary allocations and dispatching logic the generated code (per protocol) will make assumptions about the defined implementations of other structs (for its protocol), and call the implementation module directly with a binary in place of the protocol struct. This is an internal optimization that won't be exposed externally, as the Elixir protocol will delegate based on the protocol struct, so that code can never be reached externally. Note we already make calls with the same overhead and assumptions between structs today. This would be equivalent to:

def serialize(binary, %_struct{nested: nested_struct}) when is_binary(binary) do
  [NameSpace.NestedStruct.SerDe.Thrift.Protocol.Binary.serialize(binary, nested_struct), 0]
end

def serialize(%Thrift.Protocol.Binary{payload: payload} = wrapper, struct) do
  %Thrift.Protocol.Binary{wrapper | payload: serialize(payload, struct)}
end

def deserialize(binary) when is_binary(binary) do
  deserialize_struct(binary, %NameSpace.Struct{})
end

def deserialize(wrapper) do
  deserialize(wrapper, %NameSpace.Struct{})
end
  
def deserialize(%Thrift.Protocol.Binary{payload: payload} = wrapper, default_struct) do
  case deserialize_struct(payload, default_struct) do
    {struct, rest} ->
      {struct, %Thrift.Protocol.Binary{payload: rest}}
    :error ->
      :error
  end
end

defp deserialize_struct(<<_field_id::16, bin::binary>>, acc) do
  case NameSpace.NestedStruct.SerDe.Thrift.Protocol.Binary.deserialize(bin) do
    {nested_struct, rest} ->
      deserialize_struct(rest, %{acc | nested: nested_struct})
    :error ->
      :error
  end
end

Note the extra deserialize/1 not used by Thrift.Serializable. This is used to dispatch between structs so that we can avoid having %nested_struct{} in the top level struct's implementation. This means the module dependency remains a runtime dependency and does not become a compile time (struct field) dependency. The same would hold for users of the direct SerDe protocols. It would be possible to call NameSpace.NestedStruct.new/0 but this allows us to avoid the extra function call.

In order to fully decouple transports from protocols we also need to implement SerDe for Thrift.TApplicationException, and introduce Thrift.Message, with its own SerDe protocol:

defmodule Thrift.Message do
  @enforce_keys [:type, :seq_id, :name, :payload]
  defstruct [:type, :seq_id, :name, :payload]
  @type t() :: t(term())
  @type t(payload) :: %__MODULE__{payload: payload}

  defprotocol SerDe do
    alias Thrift.TApplicationException

    @spec serialize(payload, Message.t(payload | struct)) :: payload when payload: var
    def serialize(payload, msg)

    @spec deserialize(payload) :: {Message.t(payload), payload} | :error when payload: var
    def deserialize(payload)

    @spec deserialize(payload, Message.t(thrift_struct)) :: {Message.t(thrift_struct | TApplicationException.t), payload} | :error when thrift_struct: struct, payload: var
    @spec deserialize(payload, Message.t(nil)) :: {Message.t(payload), payload} | :error when payload: var
    def deserialize(payload, msg)
  end

  defimpl Thrift.Serializable do
    def serialize(msg, payload), do: SerDe.serialize(payload, msg)
    def deserialize(msg, payload), do: SerDe.deserialize(payload, msg)
  end
end

Note the parameterization of Thrift.Message and how that type indicates how to serialize/deserialize a message. We can serialize a message and its payload at the same time (or include an already serialized payload); we can also specify how to deserialize the message, either keeping a serialized payload or deserializing the payload to a specific struct (the response struct).
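As a usage sketch (assuming a TBinary implementation of Thrift.Message.SerDe and the example service above), the two deserialize modes would look like:

```elixir
# Transport side: deserialize the envelope only, keeping the payload serialized.
{%Thrift.Message{payload: serialized_payload} = msg, rest} =
  Thrift.Message.SerDe.deserialize(data)

# Protocol side: seed the message with the expected response struct so the
# envelope and payload are deserialized in one pass.
{%Thrift.Message{payload: %Example.Service.PingResponse{}} = msg, rest} =
  Thrift.Message.SerDe.deserialize(data, %Thrift.Message{msg | payload: %Example.Service.PingResponse{}})
```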

@thecodeboss
Collaborator

This is a very well-thought-out approach, thanks for doing this @fishcakez.

I believe it adds a couple layers of complexity by now having multiple protocols to reason about (Thrift.Serializable as well as the actual thrift protocol), but the nice thing is that with this approach we would be decoupling protocols, transports, etc. I think that's a win.

For the last example, are you suggesting something like

defimpl Thrift.Message.SerDe, for: FramedTransport do
  def serialize(payload, msg), do: ...
  def deserialize(payload), do: ...
end

where each transport would implement the Thrift.Message.SerDe protocol?

@fishcakez
Member Author

The protocols we generate code for (TBinary, TCompact) would implement the Thrift.Message.SerDe protocol. Unsure about other protocols/transports which wrap those.

@jparise
Collaborator

jparise commented Nov 28, 2018

@fishcakez are you suggesting we pull this into the 2.0 milestone?

@fishcakez
Member Author

@jparise yes but only the latest part, I actually started a bit towards this. Plan to have a PR by end of the week.

@jparise jparise pinned this issue Dec 22, 2018
@jparise jparise unpinned this issue Dec 22, 2018