Docker P2P deployment: psi: check_allowed_values failed? #319

Closed
gxcuit opened this issue May 21, 2024 · 7 comments
gxcuit commented May 21, 2024

Update: could this be a problem with psi 0.0.1? I switched to 0.0.4 and the problem persists.

--

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py", line 547, in <module>
    main()
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py", line 527, in main
    res = comp_eval(sf_node_eval_param, storage_config, sf_cluster_config)
  File "/usr/local/lib/python3.10/site-packages/secretflow/component/entry.py", line 164, in comp_eval
    res = comp.eval(
  File "/usr/local/lib/python3.10/site-packages/secretflow/component/component.py", line 1092, in eval
    reader = EvalParamReader(instance=param, definition=definition)
  File "/usr/local/lib/python3.10/site-packages/secretflow/component/eval_param_reader.py", line 129, in __init__
    self._preprocess()
  File "/usr/local/lib/python3.10/site-packages/secretflow/component/eval_param_reader.py", line 226, in _preprocess
    raise EvalParamError(f"attr {full_name}: check_allowed_values failed.")
secretflow.component.eval_param_reader.EvalParamError: attr protocol: check_allowed_values failed
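
For readers unfamiliar with this error, the following is a minimal sketch of what an allowed-values check of this kind typically does. It is illustrative only and is not secretflow's code: the local `EvalParamError` class, the `validate_attr` helper, and the `HYPOTHETICAL_ALLOWED` list are all assumptions made up for this example. The point is that a value accepted by one component version can be rejected by another whose definition lists different allowed strings.

```python
class EvalParamError(Exception):
    """Local stand-in for the exception seen in the traceback (assumption)."""


def check_allowed_values(value, allowed_values):
    """Return True if no restriction is declared or the value is listed."""
    if not allowed_values:
        return True
    return value in allowed_values


def validate_attr(full_name, value, allowed_values):
    """Raise an error shaped like the one in the traceback when the check fails."""
    if not check_allowed_values(value, allowed_values):
        raise EvalParamError(f"attr {full_name}: check_allowed_values failed.")
    return value


# Hypothetical allowed list; the real list depends on the component
# definition shipped in the secretflow image and is NOT taken from the thread.
HYPOTHETICAL_ALLOWED = ["PROTOCOL_X", "PROTOCOL_Y"]

message = None
try:
    # "ECDH_PSI_2PC" is the value the example job passes for `protocol`.
    validate_attr("protocol", "ECDH_PSI_2PC", HYPOTHETICAL_ALLOWED)
except EvalParamError as exc:
    message = str(exc)

print(message)
```

Under this reading, the fix is to make the job's attribute values match the component definition in the image actually being run, rather than to change the check itself.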

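Original post:

Hi, I am deploying by following the documentation.

Authorization is fine, as shown in the screenshots:

Alice:
[image]

Bob:
[image]


When I run scripts/user/create_example_job.sh on Alice, the job fails, and only Bob has pod information; Alice has none.

How should I troubleshoot this?
[image]
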

[root@root-kuscia-autonomy-alice kuscia]# kubectl get kt -n cross-domain secretflow-task-20240521151841-single-psi -o yaml
apiVersion: kuscia.secretflow/v1alpha1
kind: KusciaTask
metadata:
  annotations:
    kuscia.secretflow/initiator: alice
    kuscia.secretflow/interconn-bfia-parties: ""
    kuscia.secretflow/interconn-kuscia-parties: bob
    kuscia.secretflow/interconn-self-parties: alice
    kuscia.secretflow/job-id: secretflow-task-20240521151841
    kuscia.secretflow/self-cluster-as-initiator: "true"
    kuscia.secretflow/task-alias: single-psi
  creationTimestamp: "2024-05-21T07:18:43Z"
  generation: 1
  labels:
    kuscia.secretflow/controller: kuscia-job
    kuscia.secretflow/job-uid: 898d1c99-99f4-43f8-b314-b2d8bd9fd9af
  name: secretflow-task-20240521151841-single-psi
  namespace: cross-domain
  ownerReferences:
  - apiVersion: kuscia.secretflow/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: KusciaJob
    name: secretflow-task-20240521151841
    uid: 898d1c99-99f4-43f8-b314-b2d8bd9fd9af
  resourceVersion: "6882"
  uid: 4636416d-2930-4fb0-af84-aedc8993cd6a
spec:
  initiator: alice
  parties:
  - appImageRef: secretflow-image
    domainID: alice
    template:
      spec: {}
  - appImageRef: secretflow-image
    domainID: bob
    template:
      spec: {}
  scheduleConfig: {}
  taskInputConfig: '{"sf_datasource_config":{"alice":{"id":"default-data-source"},"bob":{"id":"default-data-source"}},"sf_cluster_desc":{"parties":["alice","bob"],"devices":[{"name":"spu","type":"spu","parties":["alice","bob"],"config":"{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"},{"name":"heu","type":"heu","parties":["alice","bob"],"config":"{\"mode\":
    \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}"}],"ray_fed_config":{"cross_silo_comm_backend":"brpc_link"}},"sf_node_eval_param":{"domain":"data_prep","name":"psi","version":"0.0.1","attr_paths":["input/receiver_input/key","input/sender_input/key","protocol","precheck_input","bucket_size","curve_type"],"attrs":[{"ss":["id1"]},{"ss":["id2"]},{"s":"ECDH_PSI_2PC"},{"b":true},{"i64":"1048576"},{"s":"CURVE_FOURQ"}]},"sf_input_ids":["alice-table","bob-table"],"sf_output_ids":["psi-output"],"sf_output_uris":["psi-output.csv"]}'
status:
  allocatedPorts:
  - domainID: bob
    namedPort:
      secretflow-task-20240521151841-single-psi-0/client-server: 30650
      secretflow-task-20240521151841-single-psi-0/fed: 30652
      secretflow-task-20240521151841-single-psi-0/global: 30653
      secretflow-task-20240521151841-single-psi-0/node-manager: 30648
      secretflow-task-20240521151841-single-psi-0/object-manager: 30649
      secretflow-task-20240521151841-single-psi-0/spu: 30651
  - domainID: alice
    namedPort:
      secretflow-task-20240521151841-single-psi-0/client-server: 32008
      secretflow-task-20240521151841-single-psi-0/fed: 32010
      secretflow-task-20240521151841-single-psi-0/global: 32011
      secretflow-task-20240521151841-single-psi-0/node-manager: 32012
      secretflow-task-20240521151841-single-psi-0/object-manager: 32013
      secretflow-task-20240521151841-single-psi-0/spu: 32009
  completionTime: "2024-05-21T07:19:01Z"
  conditions:
  - lastTransitionTime: "2024-05-21T07:18:43Z"
    status: "True"
    type: ResourceCreated
  - lastTransitionTime: "2024-05-21T07:18:48Z"
    status: "True"
    type: Running
  - lastTransitionTime: "2024-05-21T07:19:01Z"
    status: "False"
    type: Success
  lastReconcileTime: "2024-05-21T07:19:01Z"
  message: The remaining no-failed party task counts 1 are less than the threshold
    2 that meets the conditions for task success. pending party[], running party[alice],
    successful party[], failed party[bob]
  partyTaskStatus:
  - domainID: bob
    phase: Failed
  - domainID: alice
    phase: Failed
  phase: Failed
  podStatuses:
    alice/secretflow-task-20240521151841-single-psi-0:
      createTime: "2024-05-21T07:18:43Z"
      namespace: alice
      nodeName: root-kuscia-autonomy-alice
      podName: secretflow-task-20240521151841-single-psi-0
      podPhase: Failed
      readyTime: "2024-05-21T07:18:48Z"
      startTime: "2024-05-21T07:18:45Z"
  serviceStatuses:
    alice/secretflow-task-20240521151841-single-psi-0-fed:
      createTime: "2024-05-21T07:18:43Z"
      namespace: alice
      portName: fed
      portNumber: 32010
      readyTime: "2024-05-21T07:18:48Z"
      scope: Cluster
      serviceName: secretflow-task-20240521151841-single-psi-0-fed
    alice/secretflow-task-20240521151841-single-psi-0-global:
      createTime: "2024-05-21T07:18:43Z"
      namespace: alice
      portName: global
      portNumber: 32011
      readyTime: "2024-05-21T07:18:48Z"
      scope: Domain
      serviceName: secretflow-task-20240521151841-single-psi-0-global
    alice/secretflow-task-20240521151841-single-psi-0-spu:
      createTime: "2024-05-21T07:18:43Z"
      namespace: alice
      portName: spu
      portNumber: 32009
      readyTime: "2024-05-21T07:18:48Z"
      scope: Cluster
      serviceName: secretflow-task-20240521151841-single-psi-0-spu
  startTime: "2024-05-21T07:18:43Z"
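
The `taskInputConfig` string in the dump above is hard to read by eye. A small sketch like the one below could pair each entry of `attr_paths` with its value, which makes it easy to see which component version and which `protocol` value the job actually submits. The function name `summarize_eval_param` is made up for this example, and the sample contains only the `sf_node_eval_param` part copied from the dump.

```python
import json


def summarize_eval_param(task_input_config: str) -> dict:
    """Pair each attr path with its value from a taskInputConfig JSON string."""
    config = json.loads(task_input_config)
    param = config["sf_node_eval_param"]
    attrs = {}
    for path, attr in zip(param["attr_paths"], param["attrs"]):
        # Each attr is a single-key dict such as {"s": "..."}; keep its only value.
        (value,) = attr.values()
        attrs[path] = value
    return {
        "component": f'{param["domain"]}/{param["name"]}',
        "version": param["version"],
        "attrs": attrs,
    }


# Abbreviated sample: only the sf_node_eval_param section from the dump above.
sample = json.dumps({
    "sf_node_eval_param": {
        "domain": "data_prep",
        "name": "psi",
        "version": "0.0.1",
        "attr_paths": [
            "input/receiver_input/key",
            "input/sender_input/key",
            "protocol",
            "precheck_input",
            "bucket_size",
            "curve_type",
        ],
        "attrs": [
            {"ss": ["id1"]},
            {"ss": ["id2"]},
            {"s": "ECDH_PSI_2PC"},
            {"b": True},
            {"i64": "1048576"},
            {"s": "CURVE_FOURQ"},
        ],
    }
})

summary = summarize_eval_param(sample)
print(summary["component"], summary["version"], summary["attrs"]["protocol"])
```

On a live cluster the input string could presumably be obtained with `kubectl get kt <task-name> -n cross-domain -o jsonpath='{.spec.taskInputConfig}'` and passed to the same function.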
@zimu-yuxi
Collaborator

Please take a look at the documentation here, and paste the relevant logs from the stdout directory in Bob's container.

@gxcuit
Author

gxcuit commented May 21, 2024

@zimu-yuxi Hi, could this be a problem with psi 0.0.1? I switched to 0.0.4 and the problem persists.

Below is Alice's stdout:

[root@root-kuscia-autonomy-alice kuscia]# less var/pods/f841a2b9-f6ba-4844-a063-478134e4a18e/
containers/ volumes/
[root@root-kuscia-autonomy-alice kuscia]# less var/stdout/pods/alice_secretflow-task-20240521153955-single-psi-0_f841a2b9-f6ba-4844-a063-478134e4a18e/secretflow/0.log
2024-05-21T15:40:12.379422651+08:00 stdout F 2024-05-21 07:40:12,377|alice|WARNING|secretflow|entry.py:comp_eval:160|
2024-05-21T15:40:12.379436747+08:00 stdout F --
2024-05-21T15:40:12.379450587+08:00 stdout F *cluster_config*
2024-05-21T15:40:12.379463715+08:00 stdout F
2024-05-21T15:40:12.379477372+08:00 stdout F desc {
2024-05-21T15:40:12.379491913+08:00 stdout F   parties: "alice"
2024-05-21T15:40:12.379506596+08:00 stdout F   parties: "bob"
2024-05-21T15:40:12.379521256+08:00 stdout F   devices {
2024-05-21T15:40:12.37953576+08:00 stdout F     name: "spu"
2024-05-21T15:40:12.379550454+08:00 stdout F     type: "spu"
2024-05-21T15:40:12.379564551+08:00 stdout F     parties: "alice"
2024-05-21T15:40:12.379598866+08:00 stdout F     parties: "bob"
2024-05-21T15:40:12.379619506+08:00 stdout F     config: "{\"runtime_config\":{\"protocol\":\"REF2K\",\"field\":\"FM64\"},\"link_desc\":{\"connect_retry_times\":60,\"connect_retry_interval_ms\":1000,\"brpc_channel_protocol\":\"http\",\"brpc_channel_connection_type\":\"pooled\",\"recv_timeout_ms\":1200000,\"http_timeout_ms\":1200000}}"
2024-05-21T15:40:12.379634403+08:00 stdout F   }
2024-05-21T15:40:12.379648443+08:00 stdout F   devices {
2024-05-21T15:40:12.379662374+08:00 stdout F     name: "heu"
2024-05-21T15:40:12.379676244+08:00 stdout F     type: "heu"
2024-05-21T15:40:12.379689958+08:00 stdout F     parties: "alice"
2024-05-21T15:40:12.379703799+08:00 stdout F     parties: "bob"
2024-05-21T15:40:12.379718032+08:00 stdout F     config: "{\"mode\": \"PHEU\", \"schema\": \"paillier\", \"key_size\": 2048}"
2024-05-21T15:40:12.379751406+08:00 stdout F   }
2024-05-21T15:40:12.379767817+08:00 stdout F   ray_fed_config {
2024-05-21T15:40:12.379782434+08:00 stdout F     cross_silo_comm_backend: "brpc_link"
2024-05-21T15:40:12.379796355+08:00 stdout F   }
2024-05-21T15:40:12.379810382+08:00 stdout F }
2024-05-21T15:40:12.379824425+08:00 stdout F public_config {
2024-05-21T15:40:12.379838536+08:00 stdout F   ray_fed_config {
2024-05-21T15:40:12.379852393+08:00 stdout F     parties: "alice"
2024-05-21T15:40:12.379866097+08:00 stdout F     parties: "bob"
2024-05-21T15:40:12.379880047+08:00 stdout F     addresses: "0.0.0.0:23895"
2024-05-21T15:40:12.379899615+08:00 stdout F     addresses: "secretflow-task-20240521153955-single-psi-0-fed.bob.svc:80"
2024-05-21T15:40:12.379914388+08:00 stdout F   }
2024-05-21T15:40:12.379928395+08:00 stdout F   spu_configs {
2024-05-21T15:40:12.379942276+08:00 stdout F     name: "spu"
2024-05-21T15:40:12.379956146+08:00 stdout F     parties: "alice"
2024-05-21T15:40:12.379970077+08:00 stdout F     parties: "bob"
2024-05-21T15:40:12.379997581+08:00 stdout F     addresses: "0.0.0.0:23894"
2024-05-21T15:40:12.380014091+08:00 stdout F     addresses: "http://secretflow-task-20240521153955-single-psi-0-spu.bob.svc:80"
2024-05-21T15:40:12.380028318+08:00 stdout F   }
:
2024-05-21T15:40:12.379362915+08:00 stderr F Traceback (most recent call last):
2024-05-21T15:40:12.380170599+08:00 stderr F   File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2024-05-21T15:40:12.380253835+08:00 stderr F     return _run_code(code, main_globals, None,
2024-05-21T15:40:12.380277882+08:00 stderr F   File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
2024-05-21T15:40:12.380308237+08:00 stderr F     exec(code, run_globals)
2024-05-21T15:40:12.380329343+08:00 stderr F   File "/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py", line 547, in <module>
2024-05-21T15:40:12.380343921+08:00 stderr F     main()
2024-05-21T15:40:12.380358451+08:00 stderr F   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
2024-05-21T15:40:12.381894114+08:00 stderr F     return self.main(*args, **kwargs)
2024-05-21T15:40:12.38191974+08:00 stderr F   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
2024-05-21T15:40:12.381935435+08:00 stderr F     rv = self.invoke(ctx)
2024-05-21T15:40:12.381949558+08:00 stderr F   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
2024-05-21T15:40:12.381990079+08:00 stderr F     return ctx.invoke(self.callback, **ctx.params)
2024-05-21T15:40:12.38200632+08:00 stderr F   File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
2024-05-21T15:40:12.382426265+08:00 stderr F     return __callback(*args, **kwargs)
2024-05-21T15:40:12.382449593+08:00 stderr F   File "/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py", line 527, in main
2024-05-21T15:40:12.382860338+08:00 stderr F     res = comp_eval(sf_node_eval_param, storage_config, sf_cluster_config)
2024-05-21T15:40:12.382881929+08:00 stderr F   File "/usr/local/lib/python3.10/site-packages/secretflow/component/entry.py", line 164, in comp_eval
2024-05-21T15:40:12.383062231+08:00 stderr F     res = comp.eval(
2024-05-21T15:40:12.383082699+08:00 stderr F   File "/usr/local/lib/python3.10/site-packages/secretflow/component/component.py", line 1092, in eval
2024-05-21T15:40:12.383860158+08:00 stderr F     reader = EvalParamReader(instance=param, definition=definition)
2024-05-21T15:40:12.383887036+08:00 stderr F   File "/usr/local/lib/python3.10/site-packages/secretflow/component/eval_param_reader.py", line 129, in __init__
2024-05-21T15:40:12.384171727+08:00 stderr F     self._preprocess()
2024-05-21T15:40:12.384193982+08:00 stderr F   File "/usr/local/lib/python3.10/site-packages/secretflow/component/eval_param_reader.py", line 226, in _preprocess
2024-05-21T15:40:12.384462986+08:00 stderr F     raise EvalParamError(f"attr {full_name}: check_allowed_values failed.")
2024-05-21T15:40:12.384484273+08:00 stderr F secretflow.component.eval_param_reader.EvalParamError: attr protocol: check_allowed_values failed.

gxcuit changed the title from "Docker P2P deployment: only Bob has pod information?" to "Docker P2P deployment: psi: check_allowed_values failed?" May 21, 2024
@zimu-yuxi
Collaborator

I think I know roughly what the problem is. Please wait a moment while we look into it internally. Thanks.

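@gshilei
Contributor

gshilei commented May 21, 2024

Hi @gxcuit

You can follow the documentation on the v0.8.0b0 branch.

Here is the quick-start tutorial from that documentation:

1. Specify the Kuscia image.
   export KUSCIA_IMAGE=secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia:0.8.0b0
2. Prepare the deployment script kuscia.sh.
   docker pull $KUSCIA_IMAGE && docker run --rm $KUSCIA_IMAGE cat /home/kuscia/scripts/deploy/kuscia.sh > kuscia.sh && chmod u+x kuscia.sh
3. Install Kuscia.
   ./kuscia.sh center
4. Create and start a job (a two-party PSI task).
   docker exec -it ${USER}-kuscia-master scripts/user/create_example_job.sh
5. Check the job status.
   docker exec -it ${USER}-kuscia-master kubectl get kj -n cross-domain

When following the tutorial from that branch's documentation, make sure the Kuscia image version is secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia:0.8.0b0

In addition, because secretflow is still iterating rapidly, component parameters and versions differ between secretflow releases. To address this, starting with v0.8.0b0 we will gradually manage the correspondence between Kuscia versions and secretflow versions. The direct result is that the v0.8.0b0 documentation on the release branch uses a specific Kuscia image version rather than latest.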
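
As a rough illustration of the pinning advice, a deployment script could refuse to proceed when the image reference is not pinned to a specific tag. The sketch below is an assumption-laden example, not part of Kuscia: the helper names `image_tag` and `is_pinned` are made up, and digest references (`name@sha256:...`) are deliberately not handled.

```python
def image_tag(image_ref: str) -> str:
    """Return the tag of an image reference, or "latest" when none is given.

    Only the last path component is inspected, so a registry port such as
    "registry:5000/repo/name" is not mistaken for a tag.
    """
    last_component = image_ref.rsplit("/", 1)[-1]
    if ":" in last_component:
        return last_component.split(":", 1)[1]
    # Docker falls back to the "latest" tag when no tag is specified.
    return "latest"


def is_pinned(image_ref: str) -> bool:
    """True when the reference names a specific tag other than "latest"."""
    return image_tag(image_ref) != "latest"


KUSCIA_IMAGE = "secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/kuscia:0.8.0b0"
print(image_tag(KUSCIA_IMAGE), is_pinned(KUSCIA_IMAGE))
```

A check like this only guards the Kuscia image tag; which secretflow image that Kuscia release deploys still has to be read from the running cluster.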

@gshilei gshilei self-assigned this May 21, 2024
@gxcuit
Author

gxcuit commented May 21, 2024

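@gshilei Hi, I have a few questions:

  1. Which secretflow version does Kuscia 0.8.0b0 use? Isn't 1.6.0b0 still unreleased? See "secretflow-lite-anolis8:1.6.0b0 image not found" #312
  2. Following the quick start seems to work fine. The problem occurs when I follow the guide for deploying P2P via Docker.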

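@gshilei
Contributor

gshilei commented May 21, 2024

@gxcuit It uses secretflow 1.6.0b0; the image was released today.
You can check the exact secretflow version with the following command:
docker exec -it ${USER}-kuscia-master kubectl get appimage secretflow-image -o yaml | grep "image:" -A2

Regarding "the problem occurs when I follow the guide for deploying P2P via Docker": please try again following the v0.8.0b0 documentation.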

@gxcuit
Author

gxcuit commented May 21, 2024

Thanks! I'll give it a try.

@gshilei gshilei closed this as completed May 29, 2024