Get data timeout, key=root:110:ALLGATHER #70
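The above was with the semi2k protocol; after switching to aby3, I got this instead:
I'm currently using 3 compute nodes, each with 8 cores and 16 GB of RAM.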
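Hi @mingo0117,
A Ray cluster of 3 machines, each with 8 cores and 16 GB of RAM.
Hi @mingo0117, it looks like the SPU device was not created successfully, although it should have worked for you before. Now, please make sure:
Please refer to #66.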
Hi @mingo0117, please try forcing the SPU device to be rebuilt as follows:
Thanks.
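OK, the SPU device rebuild problem is gone. I'd still like to come back to the original issue. I solved it myself, and it's interesting, but I don't understand why:
If I just switch the data source to the example from the official website, everything works:
Then something strange happened: the job kept running, CPU and memory on the machines were fully used, and it finally failed with the Get data timeout error; switching protocols only turned it into a memory error. It seemed to be a problem with reading the data through pandas. Looking at the source of load_breast_cancer, I saw that it doesn't use pandas, so I switched to reading the data with np instead:
Problem solved... the result came back in just over 30 seconds. Why is that? Reading with pandas works perfectly well when I run LR in plaintext, so I never suspected the loading step.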
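A minimal sketch of a NumPy-only loading path like the one described above, assuming a comma-separated file with one header row, only numeric columns, and the label in the last column. `load_features_numpy` is a hypothetical helper for illustration, not the reporter's actual code.

```python
import io

import numpy as np


def load_features_numpy(source, label_col=-1):
    """Load a numeric CSV (one header row) into float64 (X, y) arrays.

    Hypothetical helper; assumes every column is numeric and the label
    sits in column `label_col` (last column by default).
    """
    # Parse the whole file as one contiguous float64 array, skipping the header
    data = np.loadtxt(source, delimiter=",", skiprows=1, dtype=np.float64)
    # A single data row would come back 1-D; force a 2-D (rows, cols) shape
    data = np.atleast_2d(data)
    y = data[:, label_col]
    X = np.delete(data, label_col, axis=1)
    return X, y


csv_text = "f0,f1,label\n0.5,1.0,0\n1.5,2.0,1\n"
X, y = load_features_numpy(io.StringIO(csv_text))
```

Because `np.loadtxt` returns plain float arrays, no DataFrame objects are involved when the data is later handed to Ray workers or devices.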
Interesting behavior. Could you share the complete code and reproduction steps? Thanks!
See above.
Just look at load_train_dataset; the only difference is the commented-out part. You can try it on a three-party cluster with any 20-column test dataset.
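To reproduce with "any 20-column test dataset", a synthetic one can be generated. This sketch assumes 19 numeric feature columns plus a 0/1 label in the last column; `make_test_csv` and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np


def make_test_csv(path, n_rows=1000, n_columns=20, seed=0):
    """Write a numeric CSV with n_columns - 1 features and a 0/1 label column."""
    rng = np.random.default_rng(seed)
    n_features = n_columns - 1
    X = rng.normal(size=(n_rows, n_features))
    # Labels come from a random linear rule so logistic regression has signal
    w = rng.normal(size=n_features)
    y = (X @ w > 0).astype(np.float64)
    data = np.column_stack([X, y])
    header = ",".join([f"f{i}" for i in range(n_features)] + ["label"])
    # comments="" keeps savetxt from prefixing the header line with "# "
    np.savetxt(path, data, delimiter=",", header=header, comments="")
    return data


csv_path = os.path.join(tempfile.mkdtemp(), "test_20_columns.csv")
data = make_test_csv(csv_path)
```

The same file can then be loaded once with `pd.read_csv` and once with `np.loadtxt` to compare the two paths on the cluster.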
Could you use np.array_equal to check whether the data read by np.loadtxt and pd.read_csv differ?
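One way to run the suggested check is to load the same CSV both ways and compare. `compare_loaders` is a hypothetical helper and the sample CSV is made up. Note that `pd.read_csv` infers a dtype per column, while `np.loadtxt` here parses everything as float64, so the dtypes are worth printing alongside the equality result.

```python
import io

import numpy as np
import pandas as pd


def compare_loaders(csv_text):
    """Load the same CSV text with np.loadtxt and pd.read_csv and compare."""
    # np.loadtxt: skip the header row, parse every column as float64
    a = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
    # pd.read_csv: header row becomes column names, dtypes inferred per column
    df = pd.read_csv(io.StringIO(csv_text))
    b = df.to_numpy()
    same_shape = a.shape == b.shape
    numeric = b.dtype.kind in "biuf"
    return {
        "np_dtype": str(a.dtype),
        "pd_dtypes": [str(t) for t in df.dtypes],
        "same_shape": same_shape,
        # Exact element-wise equality (False if shapes differ)
        "equal": bool(np.array_equal(a, b)),
        # Tolerance-based check, useful if only the last float digits differ
        "close": bool(same_shape and numeric and np.allclose(a, b)),
    }


report = compare_loaders("f0,f1,label\n0.5,1.0,0\n1.5,2.0,1\n")
print(report)
```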
Hi @mingo0117, as far as we know, serialization is costly in Ray, so please use the NumPy IO APIs whenever possible. Please refer to https://docs.ray.io/en/releases-1.11.1/ray-core/serialization.html#serialization. If you have to use pandas for IO, please also check https://docs.ray.io/en/latest/data/modin/index.html#using-pandas-on-ray-modin. Thanks.
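To illustrate why NumPy arrays are cheap to move, here is a standard-library sketch of pickle protocol 5 out-of-band buffers, which Ray's serialization documentation names as the mechanism it uses for NumPy arrays. It does not use Ray itself and does not measure Ray's actual cost for pandas objects.

```python
import pickle

import numpy as np


def serialize_out_of_band(obj):
    """Pickle obj with protocol 5, collecting large buffers out of band."""
    buffers = []
    # buffer_callback receives PickleBuffer objects; list.append returns None,
    # which tells pickle to keep each buffer out of the main payload.
    payload = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    return payload, buffers


features = np.arange(20_000, dtype=np.float64).reshape(1000, 20)
payload, buffers = serialize_out_of_band(features)

print(len(payload))             # small: only metadata such as dtype and shape
print(buffers[0].raw().nbytes)  # 160000: the raw array bytes, kept out of the payload

# Reconstruct the array from the payload plus the separately held buffer
restored = pickle.loads(payload, buffers=buffers)
```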
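Thanks. One more question: does SPU have logs related to the protocol or the computation? How do I turn them on? I'd like to see what happens inside.
Hi @mingo0117, first, you need to enable the corresponding logs through the SPU config:
Then, you need to turn on log_to_driver when calling secretflow init, something like: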
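A hedged sketch of the two steps, kept runnable without SecretFlow installed. The runtime_config flag names below (`enable_action_trace`, `enable_pphlo_trace`) are assumptions and should be checked against the RuntimeConfig of the SPU version in use; the node entries and protocol value are placeholders, and `with_spu_logging` is a hypothetical helper. `log_to_driver` is the option named in the reply above (it is also a `ray.init` parameter).

```python
import copy

# ASSUMED flag names: verify against your SPU version's RuntimeConfig.
ASSUMED_SPU_LOG_FLAGS = {
    "enable_action_trace": True,
    "enable_pphlo_trace": True,
}


def with_spu_logging(cluster_def, flags=None):
    """Return a copy of an SPU cluster_def with extra runtime_config flags set."""
    flags = ASSUMED_SPU_LOG_FLAGS if flags is None else flags
    new_def = copy.deepcopy(cluster_def)
    new_def.setdefault("runtime_config", {}).update(flags)
    return new_def


# Placeholder cluster definition; real code uses the SPU protocol enum
cluster_def = {
    "nodes": [
        {"party": "alice", "id": "local:0", "address": "127.0.0.1:9001"},
        {"party": "bob", "id": "local:1", "address": "127.0.0.1:9002"},
        {"party": "carol", "id": "local:2", "address": "127.0.0.1:9003"},
    ],
    "runtime_config": {"protocol": "SEMI2K"},
}
logged_def = with_spu_logging(cluster_def)

# Forward worker logs to the driver process
init_kwargs = {"log_to_driver": True}
# With SecretFlow installed, roughly:
#   import secretflow as sf
#   sf.init(address="<ray head address>", **init_kwargs)
#   spu = sf.SPU(logged_def)
```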
Got it, thank you very much.
Issue Type
Others
Source
binary
Secretflow Version
latest
OS Platform and Distribution
ubuntu 18.04
Python version
3.8.13
Bazel version
No response
GCC/Compiler version
No response
What happened and what you expected to happen.
Reproduction code to reproduce the issue.