
RetinaNet TPU Compatibility [looking for a long term fix instead of #1021] #1018

Closed
LukeWood opened this issue Nov 15, 2022 · 20 comments

@LukeWood
Contributor

We can enable graph mode for decoding predictions in predict().

This could be a big performance delta, but it is much lower priority than TPU compatibility.

@bhack
Contributor

bhack commented Nov 20, 2022

Shouldn't we use this ticket to investigate #1021 (comment) further?

@LukeWood
Contributor Author

Sure, we can investigate further.

@LukeWood LukeWood reopened this Nov 20, 2022
@bhack
Contributor

bhack commented Nov 20, 2022

@ianstenbit, if you could share your pre-populated gs:// bucket, it would be useful, since re-running the extraction is super slow.

@ianstenbit
Contributor

@bhack sorry for the delay on this.
I've moved this data to data_dir="gs://keras-cv/datasets", where you should have access. I've also updated my repro colab to reflect the change.

Also, extraction on GCS hangs (see tensorflow/datasets#4115), so make sure to pass download=False to your tfds.load call.
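A minimal sketch of reusing the pre-extracted records, assuming the dataset name `voc/2007` (taken from the public URL path quoted later in this thread) and a `train` split. `download=False` makes TFDS read the already-prepared files under `data_dir` instead of trying to download and extract again. The helper names `voc_load_kwargs` and `load_voc` are hypothetical.

```python
def voc_load_kwargs(split="train", data_dir="gs://keras-cv/datasets"):
    """Build keyword arguments for tfds.load that reuse pre-extracted records.

    The dataset name "voc/2007" and the default split are assumptions for
    illustration; data_dir is the bucket path shared in this thread.
    """
    return {
        "name": "voc/2007",
        "split": split,
        "data_dir": data_dir,
        # Skip download/extraction, which hangs on GCS (tensorflow/datasets#4115).
        "download": False,
    }


def load_voc(split="train", data_dir="gs://keras-cv/datasets"):
    # Imported lazily so the kwargs helper works without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(**voc_load_kwargs(split, data_dir))


if __name__ == "__main__":
    train_ds = load_voc("train")
    print(train_ds)
```

Keeping the arguments in one helper makes it harder to forget `download=False` when switching between the train and eval splits.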

@ianstenbit ianstenbit changed the title RetinaNet decode predictions in predict() does not use graph mode. RetinaNet TPU Compatibility [looking for a long term fix instead of #1021] Nov 21, 2022
@bhack
Contributor

bhack commented Nov 21, 2022

@ianstenbit Thanks

@bhack
Contributor

bhack commented Nov 21, 2022

I don't think the public URL is gs://keras-cv/datasets. What should we use?

@ianstenbit
Contributor

When I pull the public URL of, e.g., the dataset info, I see this:

https://storage.googleapis.com/keras-cv/datasets/voc/2007/4.0.0/dataset_info.json

So I suppose you could try https://storage.googleapis.com/keras-cv/datasets, but I would expect the gs:// version to work, so I'm not sure.
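The `gs://` to `https://storage.googleapis.com/` mapping shown above can be checked without credentials, which is what an incognito tab does. This is a minimal standard-library sketch. The helper names are hypothetical, and the code treats any HTTP error, such as a 401 for a missing `storage.objects.get` permission, as "not publicly readable."

```python
import urllib.error
import urllib.request


def gcs_public_url(gs_uri):
    """Map gs://bucket/path to the public storage.googleapis.com URL."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    return "https://storage.googleapis.com/" + gs_uri[len(prefix):]


def is_anonymously_readable(gs_uri, timeout=10):
    """Return True if an unauthenticated GET on the object succeeds."""
    request = urllib.request.Request(gcs_public_url(gs_uri))  # no auth header
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Any HTTP error (e.g. 401/403 for missing storage.objects.get, or 404)
        # counts as not readable by an anonymous caller.
        return False


if __name__ == "__main__":
    uri = "gs://keras-cv/datasets/voc/2007/4.0.0/dataset_info.json"
    print(gcs_public_url(uri), is_anonymously_readable(uri))
```

Note that this only tests a single object GET. A directory-style URL or a bucket listing needs list permission, which is a separate grant.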

@bhack
Contributor

bhack commented Nov 21, 2022

What is the value of:

Once public access has been granted, Copy URL appears for each object in the public access column. You can click this button to get the public URL for the object.

@bhack
Contributor

bhack commented Nov 21, 2022

It is not working:

RuntimeError: PermissionDeniedError: Failed to construct dataset voc: Error executing an HTTP request: HTTP response code 401 with body '{
  "error": {
    "code": 401,
    "message": "Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).",
    "errors": [
      {
        "message": "Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).",
        "domain": "global",
        "reason": "required",
        '
	 when reading metadata of gs://kerascv-dataset/dataset/voc/2007

Can you browse it in an incognito browser tab?

@ianstenbit
Contributor

Yes, I can reach it without an auth token (e.g. in incognito mode).

Are you not able to reach those files in incognito mode?

@ianstenbit
Contributor

One other thing you can do is load and extract the VOC dataset locally, then upload the resulting TFDS folder to a bucket you control. (That's how I got these records into GCS in the first place, since extraction fails when calling tfds.load against GCS.)
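The local-extract-then-upload route can be sketched as follows. The sketch assumes TFDS's default local data directory (`~/tensorflow_datasets`) and a hypothetical destination bucket `gs://my-bucket/datasets`. The `upload_plan`/`upload` helpers are illustrative names. The copy step uses `tf.io.gfile`, which accepts both local paths and `gs://` URIs. Running `gsutil cp -r` on the folder would do the same job.

```python
import os


def upload_plan(local_root, bucket_root):
    """List (local_path, remote_path) pairs mirroring local_root under bucket_root.

    Remote paths are joined with "/" because GCS object names always use it.
    """
    plan = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        rel_dir = os.path.relpath(dirpath, local_root)
        for name in sorted(filenames):
            rel = name if rel_dir == "." else os.path.join(rel_dir, name)
            remote = bucket_root.rstrip("/") + "/" + rel.replace(os.sep, "/")
            plan.append((os.path.join(dirpath, name), remote))
    return plan


def upload(plan):
    # tf.io.gfile understands both local paths and gs:// URIs.
    import tensorflow as tf

    for src, dst in plan:
        tf.io.gfile.makedirs(dst.rsplit("/", 1)[0])
        tf.io.gfile.copy(src, dst, overwrite=True)


if __name__ == "__main__":
    import tensorflow_datasets as tfds

    # Extract locally first; by default TFDS writes to ~/tensorflow_datasets.
    tfds.builder("voc/2007").download_and_prepare()
    local = os.path.expanduser("~/tensorflow_datasets/voc")
    # "gs://my-bucket/datasets" is a hypothetical bucket you control.
    upload(upload_plan(local, "gs://my-bucket/datasets/voc"))
```

Building the plan separately from the copy lets you inspect the object names before anything is written to the bucket.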

@bhack
Contributor

bhack commented Nov 21, 2022

Yes, I know, but it's faster if I can reuse yours.

Opening https://storage.googleapis.com/keras-cv/datasets/voc/ in incognito mode gives:

<Error>
<Code>AccessDenied</Code>
<Message>Access denied.</Message>
<Details>
Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).
</Details>
</Error>

@ianstenbit
Contributor

Interesting. That works from my work device as well as my personal device.

I suspect tfds.load also requires object list permission, in addition to read access to the underlying files. I don't have the power to grant that permission on any of these GCS buckets, so I'm not sure there's anything else I can do.

FWIW, you shouldn't need VOC specifically to repro the issue.
You could create a fake bbox dataset with this method and use it as train_ds and eval_ds.
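The method linked above isn't reproduced here. As a hypothetical stand-in, this NumPy sketch generates random images and bounding boxes. The padded `[x1, y1, x2, y2]` layout, the `-1` padding value, and the `images`/`boxes`/`classes` keys are assumptions for illustration, not the KerasCV format. Padding to `max_boxes` and batching with `drop_remainder=True` keep every shape static, which XLA on TPU expects.

```python
import numpy as np


def make_fake_bbox_dataset(num_examples=8, image_size=256, max_boxes=4,
                           num_classes=20, seed=0):
    """Generate random images plus padded xyxy boxes with static shapes.

    Each example has between 1 and max_boxes real boxes; unused rows are
    padded with -1 so every example has the same shape.
    """
    rng = np.random.default_rng(seed)
    images = rng.random((num_examples, image_size, image_size, 3), dtype=np.float32)
    boxes = np.full((num_examples, max_boxes, 4), -1.0, dtype=np.float32)
    classes = np.full((num_examples, max_boxes), -1, dtype=np.int32)
    for i in range(num_examples):
        n = rng.integers(1, max_boxes + 1)
        # Sorting two random coordinates per axis yields x1 <= x2 and y1 <= y2.
        xs = np.sort(rng.random((n, 2)) * image_size, axis=1)
        ys = np.sort(rng.random((n, 2)) * image_size, axis=1)
        boxes[i, :n] = np.stack([xs[:, 0], ys[:, 0], xs[:, 1], ys[:, 1]], axis=1)
        classes[i, :n] = rng.integers(0, num_classes, size=n)
    return images, boxes, classes


if __name__ == "__main__":
    import tensorflow as tf

    images, boxes, classes = make_fake_bbox_dataset()
    ds = tf.data.Dataset.from_tensor_slices(
        {"images": images, "boxes": boxes, "classes": classes}
    )
    # drop_remainder=True keeps the batch dimension static as well.
    train_ds = ds.batch(4, drop_remainder=True)
    print(train_ds.element_spec)
```

`num_classes=20` matches VOC's 20 classes, so the classification head keeps the same shape as in the real repro.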

If you'd like, I can update the repro colab to use that instead of VOC.

@bhack
Contributor

bhack commented Nov 21, 2022

Thanks, but I can't find a free TPU right now.

@bhack
Contributor

bhack commented Nov 21, 2022

Also, on tf-nightly I still see:

(0) INVALID_ARGUMENT: {{function_node __inference_train_function_133447}} TF to XLA legalization failed: <unknown>:0: error: loc(fused["SigmoidGrad:", "gradient_tape/FocalLoss/Sigmoid/SigmoidGrad"]): 'mhlo.constant' op result #0 must be statically shaped tensor of floating-point or pred (AKA boolean or 1-bit integer) or 8/16/32/64-bit signless integer or 8/16/32/64-bit unsigned integer or complex type with 32-bit float or 64-bit float elements values, but got 'tensor<?x?x20xf32>'
<unknown>:0: note: loc(fused["SigmoidGrad:", "gradient_tape/FocalLoss/Sigmoid/SigmoidGrad"]): see current operation: %5996 = "mhlo.constant"() {value = dense<1.000000e+00> : tensor<?x?x20xf32>} : () -> tensor<?x?x20xf32>

I cannot dump to the local filesystem or to GCS with:

import os

# These are typically read when TensorFlow initializes, so set them before importing it.
os.environ["TF_DUMP_GRAPH_PREFIX"] = "gs://keras_cv/dataset/"  # or /tmp/
os.environ["TF_XLA_FLAGS"] = "--tf_xla_clustering_debug --tf_xla_auto_jit=2"
os.environ["XLA_FLAGS"] = "--xla_dump_hlo_as_text --xla_dump_to=/tmp/generated"

@smit-hinsu How could we dump on a Colab TPU? (I still see tensorflow/tensorflow#49702.)

@smit-hinsu

Sorry, I don't have an idea on that.

@bhack
Copy link
Contributor

bhack commented Nov 21, 2022

@smit-hinsu Do you know someone who could help with this failed legalization?

@bhack
Contributor

bhack commented Nov 21, 2022

We have a problem with:
tensorflow/tensorflow@76079c0#diff-20844a653ebc1cb9b4b2adb26672259647f08dc1e78a06458d2e753a1449336bR2070

But in our case the SigmoidGrad lowering produces tensor<?x?x20xf32>, while the constraint requires result #0 to be a statically shaped tensor. The sigmoid comes from this branch of the focal loss:

if self.from_logits:
    y_pred = tf.nn.sigmoid(y_pred)
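This NumPy sketch of the standard sigmoid focal loss (Lin et al., 2017) shows where the `from_logits` sigmoid enters the computation. In TF, the backward pass of that `tf.nn.sigmoid` is the `SigmoidGrad` op named in the error, applied to a `[batch, anchors, 20]` tensor whose first two dimensions are unknown. This is not the KerasCV implementation. `alpha=0.25` and `gamma=2.0` are the common defaults from the paper, assumed here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def binary_focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, from_logits=False):
    """Elementwise sigmoid focal loss.

    Mirrors the branch above: logits are squashed with a sigmoid first.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    if from_logits:
        y_pred = sigmoid(y_pred)
    eps = 1e-7
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # p_t is the predicted probability of the true label.
    p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
    # alpha weights positives, (1 - alpha) weights negatives.
    alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
    # (1 - p_t) ** gamma down-weights well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

For a logit of 0 (p = 0.5), the loss is 0.25 · 0.25 · ln 2 for a positive label and 0.75 · 0.25 · ln 2 for a negative one. The shapes involved do not change the math. XLA only needs them to be known at compile time.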

@bhack
Contributor

bhack commented Nov 22, 2022

Tracking this upstream at tensorflow/tensorflow#58645

@tanzhenyu tanzhenyu self-assigned this Dec 24, 2022
@tanzhenyu
Contributor

Done through a recent series of refactorings.


5 participants