Error in unserialize(socklist[[n]]) : error reading from connection #44

OlexiyPukhov · 2022-10-24T17:17:13Z

I am dealing with a large dataset (27000 x 140). When I first run KernelSHAP without parallel computation, the progress bar does not move (it stays at 0). When I turn on parallel processing as described on the home page of your repo, I get the error:

Error in unserialize(socklist[[n]]) : error reading from connection

What should I do? I am using Windows on a Core i7-12700K.

mayer79 · 2022-10-24T17:35:26Z

Hello, and sorry you ran into a problem.

I probably cannot solve the error coming from parallel computing on Windows. But maybe I can help you with the single-threaded problem (no movement in the progress bar):

SHAP analyses are usually done by decomposing 200-2000 predictions. Maybe you can start with an X consisting of 500 randomly sampled rows.
What is the size of your background dataset? Scott Lundberg proposes using small sets of only a few rows (up to 100). Here too, I'd suggest starting with a bg_X of 20 rows. If the speed is acceptable, you can increase it to 100.
I guess you are not using exact = TRUE?
Are you using the current CRAN version?

What do you observe?
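The suggestion above (explain a random sample of about 500 rows against a background set of 20 rows) might look roughly like the sketch below. The data, the `lm` model, and the names `full_data`, `features`, and `fit` are hypothetical stand-ins, not the dataset from this issue. The `kernelshap(object, X, bg_X = ...)` call assumes a recent version of the package; older releases may use a different argument order.

```r
set.seed(2022)

# Toy stand-in for the real data (hypothetical): 5000 rows, 6 features, 1 response
n <- 5000
full_data <- data.frame(matrix(rnorm(n * 6), nrow = n))
full_data$y <- full_data$X1 + 2 * full_data$X2 + rnorm(n)
features <- setdiff(colnames(full_data), "y")

# Rows to explain: a random sample of 500 instead of the full data
X <- full_data[sample(n, 500), features]

# Background data: start with 20 rows, raise to 100 if the speed is acceptable
bg_X <- full_data[sample(n, 20), features]

# Any model with a predict() method would do; lm keeps the sketch fast
fit <- lm(y ~ ., data = full_data)

# Run KernelSHAP only if the package is installed
if (requireNamespace("kernelshap", quietly = TRUE)) {
  s <- kernelshap::kernelshap(fit, X, bg_X = bg_X)
  print(dim(s$S))  # matrix of SHAP values, one row per explained row
}
```

If this runs in acceptable time, the background sample can be enlarged step by step before touching the number of explained rows.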

OlexiyPukhov · 2022-10-24T22:45:43Z

Thank you for the swift answer. I was able to solve the problem thanks to your ideas and some tinkering. Initially, my bg_X and X were the same, both being 27k x 140. I set X to be a subset of 2000 rows and bg_X to be 500 rows, with parallel processing enabled on 12 threads and exact = FALSE. Processing finished after about 5 minutes.

Will the fact that I am now using a smaller dataset for X and bg_X change my results relative to using my full dataset for both? I am able to use the full dataset for X and bg_X with TreeSHAP.

mayer79 · 2022-10-25T05:34:20Z

Sweet! Thanks for testing. With 500 rows, your background data is still very large, but 5 minutes is quite acceptable. Using even larger X and bg_X will change the result, but only slightly. I usually decompose between 1000 and 2000 predictions, even with TreeSHAP.

TreeSHAP is indeed orders of magnitude faster than KernelSHAP, but it can only be used for tree-based models, while KernelSHAP works for all model classes. For trees, I would not use KernelSHAP in practice.

By default, in your case, exact = FALSE, so you don't need to specify it explicitly.
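For readers who want to reproduce the kind of setup discussed in this thread (a subsample for X, a smaller bg_X, and a parallel backend), a minimal sketch follows. It assumes a recent kernelshap version with a `parallel` argument and a doFuture/future backend; the toy data, the `lm` model, and the names `full_data`, `n_workers`, and `have_pkgs` are hypothetical. Sizes are scaled down from the 2000/500 rows and 12 threads mentioned above so that the sketch stays light.

```r
set.seed(2022)

# Toy stand-in data (hypothetical), not the 27000 x 140 dataset from the issue
n <- 5000
full_data <- data.frame(matrix(rnorm(n * 6), nrow = n))
full_data$y <- full_data$X1 - full_data$X3 + rnorm(n)
features <- setdiff(colnames(full_data), "y")

X <- full_data[sample(n, 200), features]    # rows to explain
bg_X <- full_data[sample(n, 50), features]  # background rows
fit <- lm(y ~ ., data = full_data)

n_workers <- 2  # the thread used 12 threads

# Only attempt the parallel run if all required packages are available
pkgs <- c("kernelshap", "doFuture", "future")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  doFuture::registerDoFuture()                             # foreach adapter
  future::plan(future::multisession, workers = n_workers)  # start workers
  s <- kernelshap::kernelshap(fit, X, bg_X = bg_X, parallel = TRUE)
  future::plan(future::sequential)                         # shut workers down
  print(dim(s$S))
}
```

Since exact = FALSE is already the default for wide data such as the 140 columns here, it is not passed explicitly in the call.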

OlexiyPukhov · 2022-10-25T11:47:16Z

Since I'm doing research, a longer processing time is acceptable, but I suppose TreeSHAP would be preferred for production. Thanks again!

OlexiyPukhov closed this as completed Oct 25, 2022
