Make the topology fetch API do not require the XRT computation client… #1003
… to be setup. Use a single init for TPU mesh setup.
Have we tested this works on a pod slice?
I did on Borg.
Bad news, I get the error message:
2019-09-06 04:18:22.859156: E tensorflow/compiler/xla/xla_client/xla_util.cc:72] The TPU system has not been initialized.
Full client-side logs: https://gist.github.com/jysohn23/d6631b3cde065500ebf9a4d462492ef4.
Have you tried testing with a slice that hadn't already been initialized previously? With a previously initialized (N^2 inits) TPU slice, trying this single init way had worked for me previously, which was misleading. Would you mind reproing the failure on a fresh slice? (Not that it matters, but I tried on v3-32 and v3-512.)
Yes, freshly handed out by Borg.
I pushed the CL this afternoon, are you sure it has been picked up by
nightly?
…On Thu, Sep 5, 2019 at 21:24 Jin Young Sohn ***@***.***> wrote:
Have we tested this works on a pod slice?
I did on Borg.
Bad news, I get the error message:
2019-09-06 04:18:22.859156: E tensorflow/compiler/xla/xla_client/xla_util.cc:72] The TPU system has not been initialized.
Full client-side logs:
https://gist.github.com/jysohn23/d6631b3cde065500ebf9a4d462492ef4.
Have you tried testing with a slice that hadn't already been initialized
previously? With a previously initialized (N^2 inits) TPU slice, trying
this single init way had worked for me previously which was misleading.
Would you mind reproing the failure on a fresh slice? (not that it matters,
but I tried on v3-32 and v3-512).
Yes, I triggered an on-demand nightly after your PR was merged and verified pip freeze, and we don't see the manual configs one by one on each worker.
PR?
You meant CL?
The TF mesh CL needs to be in the TFRC servers on the TPU VMs.
…On Thu, Sep 5, 2019 at 21:45 Jin Young Sohn ***@***.***> wrote:
Yes, freshly handed out by Borg. I pushed the CL this afternoon, are you
sure it has been picked up by nightly?
Yes, I triggered an on-demand nightly after your PR was merged and
verified pip freeze and we don't see the manual configs one by one on
each worker.
Oh, yes, you are right, thanks for reminding me; I forgot about the TFRC update, of course! I'll try this tomorrow.