Conversation

@will-cromar
Collaborator

No description provided.

@JackCaoG
Collaborator

OK, this is one of those annoying errors that only shows up when we build with XLA_CUDA. What I usually do is:

  1. Pull the CI docker image in a TPU VM and build with XLA_CUDA=1
  2. Find the offending CL and see if I can rebase to a commit before it
  3. If not, revert the PR in a patch and fix it internally

This is what I do, and why we ended up with the temporary [ffp_gpu.diff]; a rough sketch of steps 1 and 3 follows.
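
The image tag, mount path, checkout path, and patch file name below are illustrative assumptions, not the exact values CI uses:

  # Step 1: pull the CI image and reproduce the CUDA-only failure locally.
  docker pull gcr.io/tpu-pytorch/xla_base:latest
  docker run -it --rm -v "$PWD:/pytorch/xla" gcr.io/tpu-pytorch/xla_base:latest bash

  # Inside the container: XLA_CUDA=1 enables the GPU build paths where
  # errors like the one below surface.
  cd /pytorch/xla
  XLA_CUDA=1 python setup.py develop

  # Step 3: if rebasing past the offending CL is not possible, capture a
  # revert of it as a patch to apply at build time.
  git -C third_party/tensorflow revert --no-commit <offending-commit>
  git -C third_party/tensorflow diff --cached > tf_patches/revert_offending.diff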

@will-cromar
Collaborator Author

From the error here:

tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:3498:43: error: ‘CUDNN_TENSOR_REORDERING_INT8x32’ was not declared in this scope
                                         ? CUDNN_TENSOR_REORDERING_INT8x32
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I'm going to take an educated guess that this is the problem commit: tensorflow/tensorflow@3e24055

I'll try patching that out while I set up a GPU build environment.
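
For context, CUDNN_TENSOR_REORDERING_INT8x32 belongs to cuDNN's cudnnBackendTensorReordering_t enum, which only exists in newer cuDNN headers. The usual fix pattern is a version guard; this is a sketch under assumptions (the 8300 threshold and the kVectReordering name are mine), not the actual upstream patch:

  // Sketch of a cuDNN version guard; the threshold 8300 (cuDNN 8.3) is an
  // assumption, and the real fix may differ.
  #include <cudnn.h>  // defines CUDNN_VERSION

  #if CUDNN_VERSION >= 8300
  // Newer headers define cudnnBackendTensorReordering_t, so the constant
  // can be referenced directly.
  static constexpr cudnnBackendTensorReordering_t kVectReordering =
      CUDNN_TENSOR_REORDERING_INT8x32;
  #else
  // Older headers, like the one in this CI image, lack the symbol entirely
  // (exactly the "not declared in this scope" error above), so this path
  // has to be compiled out, or the offending commit reverted via a patch.
  #endif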

@JackCaoG
Collaborator

The CI error is unrelated; merging for now.

JackCaoG merged commit dbe596c into master Feb 14, 2023
JackCaoG added a commit that referenced this pull request Feb 16, 2023
* Fix HLO dumping (#4619)

* Update TF pin to 2/13 (#4615)

* Update TF pin to 2/13

* Fix pinned commit

* Add patch to revert TF 3e24055

* Add comment to new patch

* Fix patch command in TPU CI (#4623)

* Skip execution for extract_compiled_graph (#4612)

* Only warm up cache for dynamo extract_graph step

* Add missing config

* Make sure warm up run does not cause place holder to be created

* Fix tests

* Disable failing `test_operations.py` tests on TPU (#4622)

* Disable `test_operations.py` tests failing on TPU

* Add to TPU CI

* Bazel (#4528)

* Replace tensorflow with a bazel external repository

* Basic migration to bazel for xla_client.

* Revert to blob

* Add vscode config.

* Update newlines

* Merge with pjrt client test build changes.

* Migrate tests to new build

* Format test and plugin

* Order imports

* Conditionally apply tf patches; apply pt patches always.

* Format python

* configure formatters

* Mirror TF pin update and fixes in bazel.

* Support local and sandboxed build based on flags

* Add cloud cache URLs for llvm.

* Merge with upstream

* Update TF pin

* Fix patching regression

* Revert "Bazel (#4528)" (#4631)

This reverts commit 3a90f5a.

---------

Co-authored-by: JackCaoG <59073027+JackCaoG@users.noreply.github.com>
Co-authored-by: Will Cromar <wcromar@google.com>
Co-authored-by: stgpetrovic <stgpetrovic@gmail.com>
JackCaoG pushed a commit that referenced this pull request Feb 16, 2023
* Update TF pin to 2/13

* Fix pinned commit

* Add patch to revert TF 3e24055

* Add comment to new patch
chandrasekhard2 pushed a commit that referenced this pull request Feb 22, 2023 (same commit list as above)
mateuszlewko pushed a commit that referenced this pull request Mar 15, 2023 (same commit list as above)