
Use SMF as a library to send/receive data within a conventional application #282

Closed
luomai opened this issue Oct 27, 2018 · 20 comments

@luomai

luomai commented Oct 27, 2018

Hello, I have been looking into using Seastar to build a network stack for an application. However, I find that this usually requires completely rewriting the application to adopt Seastar. Looking at SMF, I am wondering whether this RPC library can be used as a communication library within a conventional application?

@luomai luomai changed the title Use SMF as a library to send/receive data within an alien application Use SMF as a library to send/receive data within an non-seastar application Oct 27, 2018
@luomai luomai changed the title Use SMF as a library to send/receive data within an non-seastar application Use SMF as a library to send/receive data within a conventional application Oct 27, 2018
@emaxerrno emaxerrno self-assigned this Oct 28, 2018
@emaxerrno
Collaborator

Hi @luomai,

So the spec is quite simple: a 16-byte header.

We have working Go and Java server implementations.

Currently we only have one C++ implementation, which uses Seastar.
I don't see why one couldn't port it to other runtimes.

I'll be cleaning up a few issues with the repo tomorrow, but the C++ code is in use today and it works.

In that way, you can use any language to talk to a C++ server written with SMF + Seastar.

I think any coroutine implementation you choose (say facebook::folly or boost::coro) will require rewriting large parts of the application to take advantage of cooperative yielding.

In Seastar there is alien::submit_to, which you might be able to use to partially port an application over.

Not sure if this answers your question.
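For anyone reading along, here is one plausible way a fixed 16-byte header could be laid out in C++. Only the size and checksum fields are implied by the parsing pseudocode later in this thread; the remaining eight bytes and all field names are placeholders, and the authoritative layout is the rpc header definition in the smf repo.

#include <cstdint>

// Illustrative only: a packed 16-byte wire header. 'size' and 'checksum' are the
// fields referenced by the pseudocode later in this thread; 'flags' is a
// placeholder for smf's remaining metadata bytes (see the real header in the repo).
#pragma pack(push, 1)
struct wire_header {
  uint32_t size;      // payload length in bytes
  uint32_t flags;     // placeholder: compression / session / other metadata
  uint64_t checksum;  // xxhash64 of the payload
};
#pragma pack(pop)
static_assert(sizeof(wire_header) == 16, "the spec calls for exactly 16 bytes");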

@emaxerrno
Collaborator

Also, please use the mailing list for questions: https://groups.google.com/forum/#!forum/smf-dev

I'm sure others could use it, and Google can index it better.

Feel free to file an issue if it's related to the source code.

@luomai
Author

luomai commented Oct 30, 2018

Thank you very much for the quick reply, @senior7515.

I will try to adopt SMF in my application.

@luomai
Author

luomai commented Oct 30, 2018

I am trying to use the auto-generated SMFStorageClient to call a remote SMFStorageService. In the demo, you create a Seastar app to generate the traffic. I am wondering whether I can use the SMFStorageClient directly in my program, or whether I have to use alien::submit_to to send the message into the Seastar runtime before I can call the SMFStorageClient. I could not find anything in the documentation that clarifies this.

I am using the sample client-side code from the docs, but I get a segmentation fault.

// NOTE: no seastar::app_template / reactor has been started at this point.
int main(int argc, char **argv) {
  smf::rpc_typed_envelope<smf_gen::demo::Request> req;
  req.data->name = "Hello, smf-world!";

  auto addr = seastar::ipv4_addr("127.0.0.1", 20776);
  auto client = smf_gen::demo::SmfStorageClient::make_shared(addr);

  client->Get(req.serialize_data()).then([](auto reply) {
    std::cout << reply->name() << std::endl;
  });  // Here I get the segmentation fault error.
}
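For reference, here is a minimal sketch of the same calls wrapped in a seastar::app_template, since the replies below point out that the client can only run once the Seastar reactor has been started. The include path for the generated header and the connect() step are assumptions based on the smf demo, not something stated in this thread.

#include <iostream>
#include <seastar/core/app-template.hh>
#include "demo_service.smf.fb.h"  // hypothetical path to the generated client header

int main(int argc, char **argv) {
  seastar::app_template app;
  return app.run(argc, argv, [] {
    auto addr = seastar::ipv4_addr("127.0.0.1", 20776);
    auto client = smf_gen::demo::SmfStorageClient::make_shared(addr);
    // Assumed: the client must be connected before issuing RPCs, as in the smf demo.
    return client->connect().then([client] {
      smf::rpc_typed_envelope<smf_gen::demo::Request> req;
      req.data->name = "Hello, smf-world!";
      return client->Get(req.serialize_data());
    }).then([client](auto reply) {  // capturing client keeps it alive until the reply
      std::cout << reply->name() << std::endl;
    });
  });
}

The key difference from the snippet above is that everything runs inside app.run(), so the reactor exists by the time Get() is called.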

@emaxerrno
Collaborator

emaxerrno commented Oct 30, 2018 via email

@emaxerrno
Collaborator

emaxerrno commented Oct 30, 2018 via email

@luomai
Author

luomai commented Oct 30, 2018

No problem. Thanks for the quick reply anyway! 👍

We are trying to use SMF to replace gRPC as the underlying network stack for Google TensorFlow. The TensorFlow runtime is written in C++, which takes the Go and Java bindings off the table.

What is the best way to implement a non-Seastar C++ client? The simplest way is to use a normal client socket to connect to the remote Seastar server. The second way is to create a Seastar client runtime and use alien::submit_to to send messages. Which approach do you think preserves the low latency and high throughput of SMF?

I will look into how to implement the non-Seastar client in C++ given your advice and thoughts.

@emaxerrno
Collaborator

If you can get access to the raw socket of gRPC, then all you have to do is flush the 16-byte header and the payload.

That's it.

The RPC mechanism of SMF is very small and binary, with no serialization step: just a pointer cast.

That's where most of the gains come from.

Have you looked at facebook::folly?

That would be a pretty great base to start from.

They have an AsyncSocket as well as a regular socket.

It's easier to integrate with existing code bases.
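A rough sketch of that idea over a plain blocking TCP socket: build the 16-byte header, then write header and payload back to back. The wire_header layout here is the same illustrative one as earlier (only size and checksum are grounded in this thread), XXH64 comes from the xxHash library, and endianness/framing details are glossed over.

#include <cstdint>
#include <string>
#include <sys/socket.h>
#include <xxhash.h>  // XXH64()

#pragma pack(push, 1)
struct wire_header {   // illustrative layout, not smf's actual schema
  uint32_t size;
  uint32_t flags;
  uint64_t checksum;
};
#pragma pack(pop)

// Send one request on an already-connected socket: header first, then payload.
bool send_request(int fd, const std::string& payload) {
  wire_header hdr{};
  hdr.size = static_cast<uint32_t>(payload.size());
  hdr.flags = 0;
  hdr.checksum = XXH64(payload.data(), payload.size(), 0);

  // "Just a pointer cast": the header goes out as raw bytes, no serialization pass.
  if (::send(fd, &hdr, sizeof(hdr), 0) != static_cast<ssize_t>(sizeof(hdr)))
    return false;
  return ::send(fd, payload.data(), payload.size(), 0) ==
         static_cast<ssize_t>(payload.size());
}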

@luomai
Author

luomai commented Oct 30, 2018

I am studying the Go binding in order to implement my non-Seastar C++ client. The Go client seems to use a standard Go socket connection to send data. Does that mean we lose the benefit of kernel bypass and DPDK on the client side?

If I want to preserve client-side kernel bypass, is alien::submit_to the only option? alien::submit_to uses a message queue to send a message from an alien thread to a Seastar engine, so it seems it will incur extra overhead. An example of alien::submit_to is here

@emaxerrno
Collaborator

No, that won't help.

The only thing that will work is to invert your logic.

Wrap your application in a Seastar app.

For the synchronous work, use alien::submit_to
and then go back to the Seastar world.

That's how Ceph is doing it.

That's the only way to keep the Seastar benefits.
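A structural sketch of that inversion: Seastar owns main(), the legacy (non-Seastar) code runs on its own thread, and alien::submit_to is the bridge back into the reactor for each piece of synchronous work. Note that the alien API has changed across Seastar releases (newer versions take a seastar::alien::instance& as the first argument), so treat this as the shape of the approach rather than a drop-in.

#include <chrono>
#include <future>
#include <iostream>
#include <thread>
#include <seastar/core/alien.hh>
#include <seastar/core/app-template.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>

// Stand-in for the existing non-Seastar application (e.g. the TensorFlow runtime).
void legacy_application_loop() {
  // From this "alien" thread, hop onto shard 0 of the reactor for the RPC work and
  // block on the std::future it hands back. (Older Seastar: submit_to(shard, func);
  // newer Seastar also requires an alien::instance& argument.)
  std::future<int> reply = seastar::alien::submit_to(0, [] {
    // Inside the reactor: this is where an smf client call would go.
    return seastar::make_ready_future<int>(42);
  });
  std::cout << "reply from reactor: " << reply.get() << "\n";
}

int main(int argc, char **argv) {
  seastar::app_template app;
  return app.run(argc, argv, [] {
    // Seastar owns the process; the legacy code lives on a plain std::thread.
    std::thread legacy(legacy_application_loop);
    legacy.detach();
    // Keep the reactor alive for the demo; a real app would coordinate shutdown.
    return seastar::sleep(std::chrono::seconds(1));
  });
}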

@crackcomm
Contributor

@luomai that's true, the current Go implementation uses the standard net library. I was experimenting with some event loops but without any performance gains. It would be a good idea to explore DPDK libraries in Go, which I haven't done yet; I'm open to any suggestions.

@luomai
Author

luomai commented Oct 30, 2018

@senior7515 I have thought about how to wrap the TensorFlow program in a Seastar app. However, that looks difficult, as TensorFlow has its own threading models for both CPUs and GPUs.

When you say "no, that won't help", do you mean that alien::run_on and alien::submit_to are only designed for synchronization and should not be used on the traffic-intensive data path?
If that is the case, using a conventional high-performance socket (e.g., a folly socket) on the client side is the way to go. Right?

Thanks in advance!

@crackcomm DPDK is a layer-2 library. I am not aware of a layer-3 Go-based socket that can run on top of DPDK so far. A possible approach is to adopt the f-stack project, which exposes an epoll/kqueue-equivalent API backed by DPDK. In Go land, we can use cgo to call this API.

@crackcomm
Contributor

@luomai I have only seen https://github.com/intel-go/nff-go but haven't read the code at all, so I'm not sure whether it contains what we want.

@emaxerrno
Collaborator

@luomai you got it.

  1. Seastar is an intrusive building block; I wrote about it here: https://www.alexgallego.org/concurrency/smf/2017/12/16/future.html
    So you can't just take parts of Seastar. It's all or nothing.

  2. If you don't plan on using Seastar but still want to use the RPC, you'll need some socket-parsing code.

The pseudocode is this:

# server
ptr = read 16 bytes
memcpy( rpc:header, ptr) 

# read payload
payload = read rpc:header:size()

# checksum 
xxhash64 (payload) == rpc:header:checksum


That's it.

On the client side, you do the inverse, effectively.
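Here is a hedged C++ rendering of that server-side pseudocode using plain blocking reads. The wire_header layout is still the illustrative one from earlier (only size and checksum come from the pseudocode), recv_exact is just a local helper, and XXH64 is from the xxHash library.

#include <cstdint>
#include <vector>
#include <sys/socket.h>
#include <xxhash.h>

#pragma pack(push, 1)
struct wire_header {   // illustrative layout, not smf's actual schema
  uint32_t size;
  uint32_t flags;
  uint64_t checksum;
};
#pragma pack(pop)

// Helper: read exactly n bytes from the socket or fail.
static bool recv_exact(int fd, void *buf, size_t n) {
  auto *p = static_cast<char *>(buf);
  while (n > 0) {
    ssize_t r = ::recv(fd, p, n, 0);
    if (r <= 0) return false;
    p += r;
    n -= static_cast<size_t>(r);
  }
  return true;
}

// One request: "read 16 bytes" -> "read payload of header size" -> "verify checksum".
bool read_request(int fd, std::vector<char> *payload) {
  wire_header hdr{};
  if (!recv_exact(fd, &hdr, sizeof(hdr))) return false;

  payload->resize(hdr.size);
  if (!recv_exact(fd, payload->data(), hdr.size)) return false;

  return XXH64(payload->data(), payload->size(), 0) == hdr.checksum;
}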

@crackcomm
Contributor

crackcomm commented Oct 31, 2018

@luomai from my understanding, which may be totally wrong:

You will receive a buffer from TensorFlow to write into, and it will be on the GPU or the CPU. In my understanding it can't always be on the GPU, always on the CPU, or always free; it needs to be managed by TensorFlow.

In this case, what you can do is copy into the GPU or CPU buffer from TensorFlow.
Possibly also use zero-copy buffers if they are really small and the operations are not performed on the GPU.

@emaxerrno
Collaborator

emaxerrno commented Oct 31, 2018

@luomai @lgarithm

I was thinking about this last night.

One thing that could make it very easy to use DPDK acceleration is to actually use SMF as a proxy.

So, something like this:

[ tensor flow]                           [tensor flow app on second computer]
     ↓                                                 ↑
     ↓                                                 ↑
[ smf proxy ] ---------> NETWORK ------------> [smf proxy ]

@emaxerrno
Collaborator

That way you can do this:

  1. Do an HTTP POST to the smf-proxy.
  2. Have the proxy use the SMF protocol to send the data to the other hosts.
  3. Upon receiving the data, pass it to the TensorFlow app on the other end.

Since SMF is pretty efficient, you can actually just give it one core to start out with and a deterministic amount of memory.

BTW, this is how many cloud providers do their networking between virtual machines and DPDK,
so in that sense it's a proven architecture.

Let me know if you guys need help getting started.

@lgarithm

lgarithm commented Nov 1, 2018

@senior7515 Thank you for your advice.
(I really like this idea. To me, this is really like using SMF as a service mesh for tensors.)
We will explore this approach as part of our system design.

Currently our concern is that domain sockets are not supported by Seastar,
so we will have TCP sockets between the SMF proxy and our application.
I wonder how much that would reduce the benefit that SMF gives us.

@emaxerrno
Collaborator

+1 for an "smf as a service mesh for tensors" label.

Test it; I would be surprised if it shows up in the profile as anything worth fixing, especially with DPDK. 4KB requests are exactly zero-copy on the SMF side of things.

Don't be too concerned about Unix sockets; that can be easily patched.

If your project takes off and uses SMF, I'll be happy to help you do it.

@liutongxuan

@luomai
Just saw your question. We submitted a PR to TensorFlow about the integration; if you are interested in the commit, please help review and comment on it. Your suggestions and code changes are welcome. Reference PR: tensorflow/tensorflow#27454
