Report early worker failures back to master (and warn on possible R base ABI mismatch) #155

Closed
NathanSkene opened this issue Jul 5, 2019 · 3 comments

Comments

@NathanSkene

Hi,

Firstly, many thanks for developing this great piece of software.

I've just been working on setting it up to connect via SSH to a SLURM cluster. I was finding that it would hang on "Sending common data ...". After checking the logs on the server I saw:

*** Successfully loaded .Rprofile ***
> clustermq:::ssh_proxy(ctl=54382, job=54802)
master ctl listening at: tcp://localhost:54382
forwarding local network from: tcp://longleaf-login2:9655
sent PROXY_UP to master ctl
Error in unserialize(ans) : 
  cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer
Calls: <Anonymous> -> <Anonymous> -> unserialize
Execution halted

Might be worth adding a script to check the version of R first, and enable it to fail more gracefully if the wrong version of R is loaded?

@mschubert
Owner

mschubert commented Jul 7, 2019

Thank you for flagging this!

What I want to fix generally is that if the worker encounters an early error, it still sends this back to the master loop.
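A minimal sketch of that idea, not clustermq's actual implementation: the worker's startup code is wrapped in `tryCatch()`, and any early error is passed to a reporting function. Both `run_with_early_report` and the `report` callback are hypothetical names used only for illustration; `report` stands in for whatever channel reaches the master.

```r
# Hypothetical helper: run worker startup code and hand any early error
# message to `report` instead of letting the worker die silently.
run_with_early_report <- function(startup, report) {
    tryCatch(
        startup(),
        error = function(e) {
            report(conditionMessage(e))
            invisible(NULL)
        }
    )
}

# Usage with a stand-in reporter that only collects messages locally
msgs <- character()
res <- run_with_early_report(
    startup = function() stop("cannot read workspace version 3"),
    report = function(m) msgs <<- c(msgs, m)
)
print(msgs)
```

This only helps when the reporting channel itself still works, which is the complication described next.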

In this particular case, I'll see what I can do. It is a bit more complicated because

  1. Serialization is the backbone of all master-worker communication. If it breaks, we cannot send any messages, including an error message.
  2. Some changes between R versions are simply not documented. For instance, this is a breaking change to a stable (>=1.0) API, so it should only occur on a major (x.0.0) version bump. Yet R routinely breaks functionality in minor versions, and it is impossible to anticipate which functionality or when.

So maybe the best approach is to display a warning if the SSH R has a different major or minor version than the master process.
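Such a check could look roughly like the sketch below. The helper names (`major_minor`, `same_major_minor`, `check_worker_version`) are made up for illustration and are not part of clustermq, and how the worker's version string would reach the master is left open here.

```r
# Hypothetical helpers, not part of clustermq.

# Extract the major and minor components of an R version as integers.
major_minor <- function(v) {
    unclass(numeric_version(as.character(v)))[[1]][1:2]
}

# TRUE if both versions share the same major and minor component.
same_major_minor <- function(master, worker) {
    identical(major_minor(master), major_minor(worker))
}

# Warn (but do not stop) when the worker's R differs from the master's.
check_worker_version <- function(worker, master = getRversion()) {
    ok <- same_major_minor(master, worker)
    if (!ok) {
        warning("worker runs R ", as.character(worker),
                ", master runs R ", as.character(master),
                "; serialized data may not be readable on both sides",
                call. = FALSE)
    }
    invisible(ok)
}

same_major_minor("3.6.0", "3.6.1")  # patch level is ignored
same_major_minor("3.6.0", "3.4.1")  # minor versions differ
```

Comparing only the major and minor components keeps patch releases from triggering the warning.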

Can you check whether a message serialized with your R < 3.5.0 can be unserialized with R >= 3.6.0?

i.e.:

saveRDS(serialize(1:10, NULL), "test.rds") # on your server w/ R<3.5.0
unserialize(readRDS("test.rds")) # on your local machine with current R

@mschubert mschubert changed the title Add check for R version to avoid hanging on "Sending common data" Report early worker failures back to master (and warn on possible R base ABI mismatch) Jul 7, 2019
@NathanSkene
Author

Thanks for looking into it!

I ran this line on the server running R 3.4.1:

saveRDS(serialize(1:10, NULL), "test.rds")

I downloaded the file and opened it locally with R 3.6.0. It worked fine.
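The log above fails in the opposite direction: the older R on the server cannot read data written by R 3.6.0, which uses serialization format version 3 by default. A sketch of how that direction could be tested is to write version 2 explicitly, since both `serialize()` and `saveRDS()` take a `version` argument. Whether clustermq can be told to use version 2 is not something this thread establishes, and the temporary file is only a stand-in for `test.rds`.

```r
# Write in serialization format version 2, which R releases older than
# 3.5.0 can read. Both the payload and the .rds container need it.
path <- tempfile(fileext = ".rds")  # stand-in for "test.rds"
payload <- serialize(1:10, NULL, version = 2)
saveRDS(payload, path, version = 2)

# Reading it back works on the writing side as well.
restored <- unserialize(readRDS(path))
identical(restored, 1:10)
```

Copying a file written this way to the server and reading it there with `unserialize(readRDS(...))` would exercise the direction that fails in the log.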

@mschubert
Owner

This will be addressed by #150.
