Transfer data using tarpc #299

Open
zy1994-lab opened this issue Mar 30, 2020 · 4 comments
zy1994-lab commented Mar 30, 2020

tarpc provides a nice interface to program against, so I want to use it to transfer data among different machines in a distributed cluster. Apart from the 'normal' RPC use case, given as the following:

type GetFut = Ready<Vec<u32>>;
fn get(self, _: context::Context) -> Self::GetFut {
    // something here
}

I would like to also have the following method:

type PutFut = Ready<()>;
fn put(self, _: context::Context, data: Vec<u32>) -> Self::PutFut {
    // something here
}
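
For reference, here is a rough sketch of the full service definition these two snippets would plug into (assuming tarpc's #[tarpc::service] macro; the Store trait, the StoreServer struct, and the example bodies are made up for illustration):

use futures::future::{self, Ready};
use tarpc::context;

// The #[tarpc::service] macro expands this trait into a service trait with
// one associated future type per method (GetFut, PutFut) and a generated
// client type (StoreClient).
#[tarpc::service]
trait Store {
    async fn get() -> Vec<u32>;
    async fn put(data: Vec<u32>);
}

#[derive(Clone)]
struct StoreServer;

impl Store for StoreServer {
    type GetFut = Ready<Vec<u32>>;
    fn get(self, _: context::Context) -> Self::GetFut {
        // Return whatever data the server holds.
        future::ready(vec![1, 2, 3])
    }

    type PutFut = Ready<()>;
    fn put(self, _: context::Context, data: Vec<u32>) -> Self::PutFut {
        // Store or process `data` here.
        let _ = data;
        future::ready(())
    }
}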

In this way, I suppose I can both get data from and send data to the server. The vector in my use case can be very large. My questions are:

  1. Will the upload and download streams cause congestion in the same channel, given that the data size can be very large in both directions?
  2. How would this put method compare to, say, using a handwritten TcpStream or something like MPI?

I have used tarpc before, but I don't really understand how it works under the hood. Can anyone kindly help me? Thanks so much!

tikue added the feature label Apr 8, 2020
tikue (Collaborator) commented Apr 9, 2020

Hey! Sorry, I lost track of this. So in general, blob transfer is a hard problem that unary RPCs are not well suited to:

  1. It is expected that a single request is small enough to fit in memory. For very large blobs, this is not true. RAM usage could very easily spike if you're, say, transferring a Blu-ray video in a single request.
  2. Load shedding is on a per-request basis: in tarpc, message deserialization typically happens before load shedding. Blobs can be arbitrarily large, which effectively breaks load shedding. A sufficiently smart transport layer could handle this better, perhaps by only reading X bytes before yielding back to the scheduler (Tokio just had a blog post about stuff like this).
  3. If you send only a small chunk in each RPC, you won't have the above problems, but then you won't necessarily have a guarantee that the chunks are hitting the same backend, e.g. if your client round-robins to multiple backends. This could be a problem if you're streaming to a file on a specific server; it's less of a problem if you know there's only one backend you could be talking to. (A rough client-side chunking sketch follows below.)
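
As a rough sketch of the chunk-per-RPC approach from point 3 (reusing the hypothetical Store service sketched earlier; the upload helper and chunk size are illustrative, not tarpc APIs):

use tarpc::context;

// Each slice of the payload goes out as its own unary RPC, so no single
// request has to hold the whole blob, and per-request memory bounds and
// load shedding apply to one chunk at a time.
const CHUNK_LEN: usize = 16 * 1024; // elements per request, chosen arbitrarily

async fn upload(client: &mut StoreClient, data: &[u32]) -> std::io::Result<()> {
    for chunk in data.chunks(CHUNK_LEN) {
        client.put(context::current(), chunk.to_vec()).await?;
    }
    Ok(())
}

Note that this gives no ordering or backend-affinity guarantee across chunks by itself, which is exactly the caveat in point 3.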

zy1994-lab (Author) commented

Thanks Tim. Right now I'm splitting a large file into small chunks and sending them in multiple RPC requests, because in my understanding this won't cause too much trouble as long as the size of each RPC is reasonable. Some systems, like Timely Dataflow, automatically break large payloads into small batches during data transfer; would it be possible to add this feature to tarpc?

zy1994-lab (Author) commented

BTW, what's the maximum payload/frame size I can send using tarpc? Can I configure this number?

tikue (Collaborator) commented Apr 12, 2020

Max payload/frame size is up to the transport to decide. For example, if you're using an in-memory channel that doesn't serialize requests or responses, you probably don't want to enforce a payload size at all. Many serde serializers, like bincode, support a maximum serialization size.
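
As a standalone sketch of that bincode knob (assuming bincode 1.x's Options API; the 16 MiB limit is an arbitrary choice, and wiring it into a tarpc transport is left out):

use bincode::{DefaultOptions, Options};

fn main() {
    // Refuse to (de)serialize anything larger than 16 MiB instead of
    // allocating unboundedly for an oversized payload.
    let limit: u64 = 16 * 1024 * 1024;

    let payload: Vec<u32> = vec![7; 1024];
    let bytes = DefaultOptions::new()
        .with_limit(limit)
        .serialize(&payload)
        .expect("payload exceeds the configured size limit");
    let decoded: Vec<u32> = DefaultOptions::new()
        .with_limit(limit)
        .deserialize(&bytes)
        .expect("frame exceeds the configured size limit");
    assert_eq!(payload, decoded);
}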
