Q: process connections do not involve R connections, correct? #91

Closed
HenrikBengtsson opened this issue Dec 12, 2017 · 15 comments

Comments

@HenrikBengtsson

HenrikBengtsson commented Dec 12, 2017

I'd just like to confirm that processx is not limited by the maximum number of connections R can have open, i.e. NCONNECTIONS=128. I've played around with things such as p <- process$new(..., stdout = "|") and p$get_output_connection(), and it appears not to be, but could you please confirm?

EDIT: stdout not stdin

@gaborcsardi
Member

The CRAN version uses R connections, so that is limited.

The GH version uses its own connection implementation, so that is not limited. It is only limited by the OS limits for the number of open file descriptors.

Plus, on Windows, poll() is limited to 64 connections. This is to be fixed by using IOCP (#81), but that is not an easy change...

@gaborcsardi
Member

Btw. I guess you mean stdout=... because stdin is not supported...

@gaborcsardi
Member

Btw. it is also easy to try :)

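library(processx)
# Start 150 processes with piped stdout (more than NCONNECTIONS = 128), then kill them all.
px <- replicate(150, process$new("sleep", "10", stdout = "|"), simplify = FALSE)
invisible(lapply(px, function(x) x$kill()))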

@HenrikBengtsson
Author

Great. Yes, I played around with it like that, but I wanted to make sure I didn't miss anything.

Among other things, I think this is a big advantage when it comes to parallelism; machines with > 125 cores will soon be commonly available, and people are starting to hit the limit with classical SOCK clusters of the parallel package. Of course, it's not too hard to bump up the limit in R itself, but it requires some convincing :/

Any ETAs for CRAN releases, or do you play it by ear?

@gaborcsardi
Member

CRAN is ignoring my processx updates, basically, so I can't promise anything. Nevertheless I keep submitting them: https://win-builder.r-project.org/incoming_pretest

@HenrikBengtsson
Author

Oh... that's unfortunate. Since there are no obvious errors, hopefully it's just that t/he/y are busy right now.

PS. I only knew about ftp://cran.r-project.org/incoming/ - didn't know more details are available under https://win-builder.r-project.org/incoming_pretest/ - useful.

@HenrikBengtsson
Author

Follow up: It looks like you've merged in some of the processx internals to callr - is that correct? If so, do comments in this thread also apply to callr? Specifically,

Plus, on Windows, poll() is limited to 64 connections. This is to be fixed by using IOCP (#81), but that is not an easy change...

@gaborcsardi
Member

Follow up: It looks like you've merged in some of the processx internals to callr - is that correct? If so, do comments in this thread also apply to callr? Specifically,

Yes, the same code is in callr now, so all this applies. I'll rewrite the Windows IO with IOCP soonish, in the next 1-2-3 months, and then this limitation will go away.

@HenrikBengtsson
Author

Awesome. Thxs.

@HenrikBengtsson
Author

Just for the record (in case someone stumbles upon this thread): Since processx 3.1.0 (2018-05-15) there is no longer a limit of "64 connections" on Windows. From NEWS of processx 3.1.0:

Allow polling more than 64 connections on Windows, by using IOCP instead of WaitForMultipleObjects() (#81, #106).

@rsettlage

Any news on Linux? I have a 128-core machine and determined through experimentation that makeCluster(124) was the largest cluster I could create.

@HenrikBengtsson
Author

processx/callr never had a limit on Linux; the limit was on Windows. So, you should be good to go using as many parallel {callr} processes as you'd like.

I saw your PR on future (thxs), so if you relate this to PSOCK workers vs callr workers, then yes, you can use the future.callr backend to parallelize on your local machine with as many callr workers as you'd like.
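
For anyone landing here, a minimal sketch of what using the future.callr backend might look like. It assumes the future.callr package is installed; the worker count of 4 is a hypothetical value chosen for illustration, not a recommendation.

```r
library(future.callr)   # also attaches the future package

# Hypothetical worker count; adjust to what your machine can handle.
n_workers <- 4
plan(callr, workers = n_workers)

# Launch one future per worker and collect the process ID each one ran in.
fs <- lapply(seq_len(n_workers), function(i) future(Sys.getpid()))
pids <- vapply(fs, value, integer(1))

plan(sequential)  # switch back to the default sequential backend
```

Each future runs in its own callr background process, so none of the collected PIDs should equal the PID of the main R session.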

@rsettlage

Sure, no problem. I am waiting for the day the future package fully covers Rmpi. ;)

As to this, earlier today I reached the point where I was sure this was an R issue. The issue I was having was starting workers using the doParallel package:

cl <- makeCluster(32)
stopCluster(cl)
cl <- makeCluster(64)
stopCluster(cl)
cl <- makeCluster(96)
stopCluster(cl)
cl <- makeCluster(128)
Error in socketAccept(socket = socket, blocking = TRUE, open = "a+b", :
all connections are in use
cl <- makeCluster(124)

@HenrikBengtsson
Author

FYI, I've collected info and references on the 125 connection limit in R in HenrikBengtsson/Wishlist-for-R#28. If you're into building R from source, then you'll see there that it's just a single line of code you need to tweak to increase this limit.
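
To see how many connections a given R session still has available, something like the following base-R sketch can be used. The helper name count_free_connections is made up for illustration; it opens anonymous file connections until R refuses with "all connections are in use", then closes them all again.

```r
# Hypothetical helper: count how many more connections this session can open.
count_free_connections <- function() {
  opened <- list()
  # Always close whatever we managed to open, even if something goes wrong.
  on.exit(for (con in opened) close(con), add = TRUE)
  repeat {
    # file() with no arguments opens an anonymous temporary file connection.
    con <- tryCatch(file(), error = function(e) NULL)
    if (is.null(con)) break   # R refused: no free connection slots left
    opened[[length(opened) + 1L]] <- con
  }
  length(opened)
}

n_free <- count_free_connections()
n_free
```

The result depends on how R was built and on how many connections the session already holds, so no particular number is guaranteed.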

@rsettlage

I saw that. It seems everything we are doing on this AMD machine is built from source. Any hints on getting performance from R on an AMD Epyc machine?? ;) I used EasyBuild for this; I am going to have to see what the innards of that module contain for sure.


3 participants