Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve mean/std swap for VITDet backbone #2087

Merged
merged 1 commit into from
Sep 28, 2023

Conversation

ianstenbit
Copy link
Contributor

@tirthasheshpatel does your SAM demo perform as-expected with this change?

Copy link
Contributor

@tirthasheshpatel tirthasheshpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ianstenbit for the quick fix! Quoting from #2086 (comment):

Here's the demo I usually run to test that the model is close to the original model at facebookresearch/segment_anything.

Just one non-blocking suggestion: What do you think about keeping the normalization step an opt-in rather than always having it on? This way the users can choose the preprocessing they want on their images.

@ianstenbit
Copy link
Contributor Author

Here's the demo

@tirthasheshpatel what is the purpose of making this optional? You mean the mean/stddev bit I assume?
Throughout KerasCV we've standardized as 0-255 being the default input format, so unless there's a compelling reason I'd like to stick with the standard.

@ianstenbit ianstenbit marked this pull request as ready for review September 26, 2023 15:24
@tirthasheshpatel
Copy link
Contributor

what is the purpose of making this optional? You mean the mean/stddev bit I assume?

Yes, I was referring to the mean/stddev part. The reason for doing that is to give users more control over input preprocessing (which might be useful for training). But I am not against changing the 0-255 input precedent (having that is great!). Just something like an imagenet_rescaling=True argument that'd let the users control whether they want that or not. What do you think?

@ianstenbit
Copy link
Contributor Author

what is the purpose of making this optional? You mean the mean/stddev bit I assume?

Yes, I was referring to the mean/stddev part. The reason for doing that is to give users more control over input preprocessing (which might be useful for training). But I am not against changing the 0-255 input precedent (having that is great!). Just something like an imagenet_rescaling=True argument that'd let the users control whether they want that or not. What do you think?

I'm not sure what the value of this would be -- is there a reason that a user would care how the model internally does rescaling, as long as the API is clear about what is expected from the caller?

@ianstenbit
Copy link
Contributor Author

/gcbrun

@tirthasheshpatel
Copy link
Contributor

I'm not sure what the value of this would be -- is there a reason that a user would care how the model internally does rescaling, as long as the API is clear about what is expected from the caller?

I just thought there would be cases where the ImageNet rescaling would mess a bit with padded inputs (making borders non-zero). Although there is a way to avoid that (using mean as the padding value), just that some users might find it unintuitive.

Thinking in the wild here :) Please feel free to make a decision based on your best judgment!

@ianstenbit
Copy link
Contributor Author

I'm not sure what the value of this would be -- is there a reason that a user would care how the model internally does rescaling, as long as the API is clear about what is expected from the caller?

I just thought there would be cases where the ImageNet rescaling would mess a bit with padded inputs (making borders non-zero). Although there is a way to avoid that (using mean as the padding value), just that some users might find it unintuitive.

Thinking in the wild here :) Please feel free to make a decision based on your best judgment!

Ahh I see your point here -- I think in this case it's okay. Zero padding will still end up being represented as the same pixel values as all-black pixels, which is logically the same as if no rescaling were done, it's just that the values aren't literally 0.

@tirthasheshpatel
Copy link
Contributor

Zero padding will still end up being represented as the same pixel values as all-black pixels, which is logically the same as if no rescaling were done, it's just that the values aren't literally 0.

Ah OK, makes sense. Thanks for clarifying! Feel free to merge!

@ianstenbit ianstenbit merged commit 0edf88c into keras-team:master Sep 28, 2023
8 of 9 checks passed
@ianstenbit ianstenbit deleted the vitdet-mean-std branch September 28, 2023 15:38
ghost pushed a commit to y-vectorfield/keras-cv that referenced this pull request Nov 16, 2023
yuvraj-wale pushed a commit to yuvraj-wale/keras-cv that referenced this pull request Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants