Conversation

@JackCaoG
Collaborator

Reverts #3536

@JackCaoG
Collaborator Author

Need to set the port for the GPU CI and also handle the PJRT test in the CPU tests.

@JackCaoG
Collaborator Author

Need to look into the CPU CI failure.

@JackCaoG
Collaborator Author

Not able to repro the hang on my dev machine. Downloading the Docker image used by CI.

@JackCaoG
Collaborator Author

Able to repro using the CircleCI Docker image, looking into it.

@JackCaoG
Collaborator Author

Seems like for the multi-CPU cc tests, multiple gRPC servers are started:

2022-05-24 03:23:04.964812: I  159322 tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:438] Started server with target: grpc://localhost:37785
2022-05-24 03:23:04.975578: I  159321 tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job localservice -> {0 -> localhost:55535}
2022-05-24 03:23:04.976036: I  159321 tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:438] Started server with target: grpc://localhost:55535
2022-05-24 03:23:05.000570: I  159319 tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job localservice -> {0 -> localhost:36029}
2022-05-24 03:23:05.002110: I  159319 tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:438] Started server with target: grpc://localhost:36029
2022-05-24 03:23:05.002230: I  159319 tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:1489] Creating mesh service bound to de4155e68d22:34453
2022-05-24 03:23:05.007282: I  159320 tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job localservice -> {0 -> localhost:32791}
2022-05-24 03:23:05.008858: I  159320 tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:438] Started server with target: grpc://localhost:32791

This won't work with the single xrt_server approach in this PR.
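For context, a common way test harnesses elsewhere sidestep random-port collisions (not what this PR does) is to let the kernel assign a free port by binding to port 0. A hypothetical sketch, with `pick_free_port` being an illustrative helper, not a function from this codebase:

```python
import socket

# Hypothetical sketch, not part of this PR: instead of picking a random port
# and hoping it is free, bind to port 0 and let the kernel choose an unused
# one. There is still a small race between returning the port and the server
# binding it, but collisions are far less likely than with blind random picks.
def pick_free_port(host: str = "localhost") -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))          # port 0 -> kernel assigns a free port
        return s.getsockname()[1]  # the port the kernel picked
```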

@JackCaoG
Collaborator Author

Well, I guess this makes sense: we have ~8 cc op tests, and each of them will start 4 gRPC servers on random ports. Every time there is a port conflict, a test will fail. One thing I can do is limit these cc op tests on CPU to only run on the pytorch/xla CI. It is very unlikely that upstream will break the pt/xla cc ops anyway.
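A rough back-of-the-envelope check of why occasional collisions are expected. This is a hypothetical birthday-paradox estimate, assuming all ~32 ports (8 tests x 4 servers) are drawn independently and uniformly from the default Linux ephemeral range, which the actual tests may not do exactly:

```python
# Hypothetical illustration, not code from this PR: estimate the probability
# that at least two of N independently chosen random ports collide.
def collision_probability(servers: int, port_range: int) -> float:
    """Birthday-paradox probability of at least one port collision."""
    p_no_collision = 1.0
    for i in range(servers):
        # i ports are already taken; the next draw must avoid all of them
        p_no_collision *= (port_range - i) / port_range
    return 1.0 - p_no_collision

# Default Linux ephemeral range 32768-60999 is ~28k ports.
prob = collision_probability(servers=32, port_range=61000 - 32768)
print(f"collision chance per run: {prob:.2%}")
```

Under these assumptions the chance per run is on the order of a couple of percent, which is low but more than enough to produce regular flakes across many CI runs.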

@JackCaoG
Collaborator Author

I will close this PR and open a new one that only runs the cc op tests on the torch/xla CI.

@JackCaoG closed this May 24, 2022