flow test fails after adding certificate authentication #8

Closed
NookLook2014 opened this issue Apr 21, 2022 · 1 comment

Comments

@NookLook2014

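Prerequisites

Three nodes were deployed with Ansible FATE-1.7.0 using single-sided deployment. The detailed configuration is as follows:
Node 210, Exchange role: certificates were generated with /bin/bash deploy/deploy.sh keys, and both server-side and client-side authentication are enabled.
Its route_table.json configuration is: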
{
  "route_table":
  {
    "211":
    {
      "default": [
        {
          "is_secure": true,
          "ip": "10.32.122.211",
          "port": 9371
        }
      ]
    },
    "213":
    {
      "default": [
        {
          "is_secure": true,
          "ip": "10.32.122.213",
          "port": 9371
        }
      ]
    },
    .....
The relevant settings in eggroll.properties are:
eggroll.core.security.client.ca.crt.path=/data/projects/data/fate/keys/exchange-client-ca.pem
eggroll.core.security.client.crt.path=/data/projects/data/fate/keys/exchange-client-client.pem
eggroll.core.security.client.key.path=/data/projects/data/fate/keys/exchange-client-client.key

eggroll.core.security.ca.crt.path=/data/projects/data/fate/keys/exchange-ca.pem
eggroll.core.security.crt.path=/data/projects/data/fate/keys/exchange-server.pem
eggroll.core.security.key.path=/data/projects/data/fate/keys/exchange-server.key

Node 213, Host role: the certificates from node 210 were copied into the corresponding directory, for example:
scp deploy/keys/exchange/ca.pem app@10.32.122.213:/data/projects/data/fate/keys/host-client-ca.pem
.....
Its route_table.json configuration is:
{
  "route_table":
  {
    "default":
    {
      "default": [
        {
          "is_secure": true,
          "ip": "10.32.122.210",
          "port": 9370
        }
      ]
    },
    "213":
    {
      "default": [
        {
          "ip": "10.32.122.213",
          "port": 9370
        }
      ],
      "fateflow": [
        {
          "ip": "10.32.122.213",
          "port": 9360
        }
      ]
    }
  },
  "permission":
  {
    "default_allow": true
  }
}

The relevant settings in eggroll.properties are:
eggroll.rollsite.lan.insecure.channel.enabled=true
eggroll.rollsite.secure.port=9371

eggroll.core.security.client.ca.crt.path=/data/projects/data/fate/keys/host-client-ca.pem
eggroll.core.security.client.crt.path=/data/projects/data/fate/keys/host-client-client.pem
eggroll.core.security.client.key.path=/data/projects/data/fate/keys/host-client-client.key
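
Since the Host's client certificate files were copied over with scp, one quick sanity check is that every path named in eggroll.properties actually exists on that node. The following is a minimal sketch, not part of FATE or EggRoll: `parse_properties` and `missing_files` are hypothetical helper names, and the parser assumes simple `key=value` lines like the ones quoted above.

```python
import os


def parse_properties(text):
    """Parse simple key=value lines; blank lines and '#' comments are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_files(props, prefix="eggroll.core.security."):
    """Return configured '*.path' values under `prefix` that are not files on disk."""
    return [
        value
        for key, value in sorted(props.items())
        if key.startswith(prefix)
        and key.endswith(".path")
        and not os.path.isfile(value)
    ]


# Host settings as quoted in this issue.
HOST_PROPERTIES = """
eggroll.rollsite.lan.insecure.channel.enabled=true
eggroll.rollsite.secure.port=9371
eggroll.core.security.client.ca.crt.path=/data/projects/data/fate/keys/host-client-ca.pem
eggroll.core.security.client.crt.path=/data/projects/data/fate/keys/host-client-client.pem
eggroll.core.security.client.key.path=/data/projects/data/fate/keys/host-client-client.key
"""

props = parse_properties(HOST_PROPERTIES)
for path in missing_files(props):
    print("missing:", path)
```

Run on node 213, this would list any certificate or key path that the configuration names but the filesystem does not contain. It does not validate the certificates themselves.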

Node 211, Guest role: the certificates from node 210 were copied into the corresponding directory, for example:
scp deploy/keys/exchange/ca.pem app@10.32.122.211:/data/projects/data/fate/keys/guest-client-ca.pem
.....
Its route_table.json configuration is:
{
  "route_table":
  {
    "default":
    {
      "default": [
        {
          "is_secure": true,
          "ip": "10.32.122.210",
          "port": 9371
        }
      ]
    },
    "211":
    {
      "default": [
        {
          "ip": "10.32.122.211",
          "port": 9370
        }
      ],
      "fateflow": [
        {
          "ip": "10.32.122.211",
          "port": 9360
        }
      ]
    }
  },
  "permission":
  {
    "default_allow": true
  }
}

The relevant settings in eggroll.properties are:
eggroll.rollsite.lan.insecure.channel.enabled=true
eggroll.rollsite.secure.port=9371

eggroll.core.security.client.ca.crt.path=/data/projects/data/fate/keys/guest-client-ca.pem
eggroll.core.security.client.crt.path=/data/projects/data/fate/keys/guest-client-client.pem
eggroll.core.security.client.key.path=/data/projects/data/fate/keys/guest-client-client.key
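
The Host and Guest route tables quoted above both send `default` traffic to the Exchange at 10.32.122.210 with `is_secure` set to true, but they name different ports (9370 on the Host, 9371 on the Guest). A minimal sketch for surfacing such differences follows. `default_endpoints` is a hypothetical helper, the JSON strings are trimmed copies of the tables in this issue, and whether this mismatch is related to the reported failure is not established anywhere in the thread.

```python
import json

# Trimmed to the "default" route only; values copied from the issue.
HOST_ROUTE_TABLE = """
{"route_table": {"default": {"default": [
    {"is_secure": true, "ip": "10.32.122.210", "port": 9370}
]}}}
"""

GUEST_ROUTE_TABLE = """
{"route_table": {"default": {"default": [
    {"is_secure": true, "ip": "10.32.122.210", "port": 9371}
]}}}
"""


def default_endpoints(route_table_json):
    """Return (ip, port, is_secure) tuples for the table's default route."""
    table = json.loads(route_table_json)["route_table"]
    entries = table.get("default", {}).get("default", [])
    return [(e["ip"], e["port"], e.get("is_secure", False)) for e in entries]


host_default = default_endpoints(HOST_ROUTE_TABLE)
guest_default = default_endpoints(GUEST_ROUTE_TABLE)

if host_default != guest_default:
    print("default routes differ:", host_default, "vs", guest_default)
```

With the values from this issue the two lists are not equal, so the difference is reported.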

Test

The fate-rollsite service was restarted on all nodes.
On node 211, run:
source /data/projects/fate/bin/init.sh
flow test toy -gid 211 -hid 213
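
Before running the toy test, it can help to confirm that each RollSite port named in the route tables accepts TCP connections from the node that will dial it. The sketch below uses only the Python standard library. `accepts_tcp` and `probe_all` are hypothetical names, and the endpoint list is copied from the Exchange route table in this issue. A successful TCP connect says nothing about whether the TLS handshake or certificate verification would succeed.

```python
import socket

# Secure endpoints the Exchange (node 210) is configured to dial.
EXCHANGE_TARGETS = [
    ("10.32.122.211", 9371),
    ("10.32.122.213", 9371),
]


def accepts_tcp(ip, port, timeout=3.0):
    """Return True if a TCP connection to (ip, port) can be opened."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def probe_all(endpoints, timeout=3.0):
    """Map each (ip, port) endpoint to whether it accepted a connection."""
    return {(ip, port): accepts_tcp(ip, port, timeout) for ip, port in endpoints}
```

Calling `probe_all(EXCHANGE_TARGETS)` on node 210 would show which of the two secure ports is reachable from the Exchange.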

The command fails with the following error:
(venv) app@cestc211:/data/projects/fate/eggroll/conf$ flow test toy -gid 211 -hid 213
{
"jobId": "202204211045098230110",
"retcode": 103,
"retmsg": "Traceback (most recent call last):\n File "/data/projects/fate/fateflow/python/fate_flow/scheduler/dag_scheduler.py", line 124, in submit\n raise Exception("create job failed", response)\nException: ('create job failed', {'guest': {211: {'data': {'components': {'secure_add_example_0': {'need_run': True}}}, 'retcode': 0, 'retmsg': 'success'}}, 'host': {213: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, Please check rollSite and fateflow network connectivityrpc request error: <_Rendezvous of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = "UNAVAILABLE: \n[Roll Site Error TransInfo] \n location msg=UNAVAILABLE: io exception \n stack info=io.grpc.StatusRuntimeException: UNAVAILABLE: io exception\n\tat io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:240)\n\tat io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:221)\n\tat io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:140)\n\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$DataTransferServiceBlockingStub.unaryCall(DataTransferServiceGrpc.java:348)\n\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:138)\n\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\n\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)\n\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\n\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\n\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\n\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\n\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\n\tat 
io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:817)\n\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: io.grpc.netty.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /10.32.122.213:9371\nCaused by: java.net.ConnectException: finishConnect(..) failed: Connection refused\n\tat io.grpc.netty.shaded.io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)\n\tat io.grpc.netty.shaded.io.netty.channel.unix.Socket.finishConnect(Socket.java:243)\n\tat io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:660)\n\tat io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:637)\n\tat io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)\n\tat io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:473)\n\tat io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:383)\n\tat io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)\n\tat io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.lang.Thread.run(Thread.java:748)\n \n\nexception trans path: 10.32.122.210(${id}) --> 10.32.122.211(211)"\n\tdebug_error_string = 
"{"created":"@1650509120.047216058","description":"Error received from peer ipv4:10.32.122.211:9370","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"UNAVAILABLE: \\n[Roll Site Error TransInfo] \\n location msg=UNAVAILABLE: io exception \\n stack info=io.grpc.StatusRuntimeException: UNAVAILABLE: io exception\\n\\tat io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:240)\\n\\tat io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:221)\\n\\tat io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:140)\\n\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$DataTransferServiceBlockingStub.unaryCall(DataTransferServiceGrpc.java:348)\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:138)\\n\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\n\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)\\n\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\n\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\n\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\n\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\n\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\n\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:817)\\n\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\n\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\n\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\n\\tat 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\n\\tat java.lang.Thread.run(Thread.java:748)\\nCaused by: io.grpc.netty.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /10.32.122.213:9371\\nCaused by: java.net.ConnectException: finishConnect(..) failed: Connection refused\\n\\tat io.grpc.netty.shaded.io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)\\n\\tat io.grpc.netty.shaded.io.netty.channel.unix.Socket.finishConnect(Socket.java:243)\\n\\tat io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:660)\\n\\tat io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:637)\\n\\tat io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)\\n\\tat io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:473)\\n\\tat io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:383)\\n\\tat io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)\\n\\tat io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\\n\\tat io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\\n\\tat java.lang.Thread.run(Thread.java:748)\\n \\n\\nexception trans path: 10.32.122.210(${id}) --> 10.32.122.211(211)","grpc_status":14}"\n>'}}})\n"
}
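
The retmsg above is long, but the underlying cause is stated in the `Caused by:` lines: `Connection refused: /10.32.122.213:9371`. A small sketch for pulling such endpoints out of a RollSite error string is shown below. `refused_endpoints` is a hypothetical helper, and the sample text is a short excerpt of the message quoted in this issue.

```python
import re

# Matches the "Connection refused: /<ipv4>:<port>" form seen in the stack trace.
REFUSED = re.compile(r"Connection refused: /(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def refused_endpoints(message):
    """Return unique (ip, port) pairs reported as refused, in order of appearance."""
    seen = []
    for ip, port in REFUSED.findall(message):
        endpoint = (ip, int(port))
        if endpoint not in seen:
            seen.append(endpoint)
    return seen


SAMPLE = (
    "Caused by: io.grpc.netty.shaded.io.netty.channel.AbstractChannel$"
    "AnnotatedConnectException: finishConnect(..) failed: "
    "Connection refused: /10.32.122.213:9371\n"
    "Caused by: java.net.ConnectException: finishConnect(..) failed: "
    "Connection refused\n"
)

print(refused_endpoints(SAMPLE))
```

For this excerpt the only endpoint found is 10.32.122.213 on port 9371, which is the Host's address paired with the secure port from the Exchange route table.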

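Additional information

Without certificates (is_secure: false, port: 9370), the test runs normally.

Expected

The correct configuration for running with certificates.

Thanks!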

@Mr-lq7

Mr-lq7 commented Sep 21, 2022

So was this issue ever resolved?
