fixes clip interpolate #30783
Base: main
Conversation
Force-pushed from dd72bd1 to 6ed7b47
Thanks for working on this!
A few general comments:
- `interpolate_pos_encoding` should be a boolean. It should be possible to call the model with this flag, i.e. `model(**inputs, interpolate_pos_encoding=True)`
- All the docstrings should be updated to include this argument
- Print statements should be removed
- Tests should make sure that the processed image being passed to the model is not the default size
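The flag's effect can be sketched with a standalone helper, assuming the usual ViT layout of one CLS token followed by a square grid of patch position embeddings (the function name, patch size, and shapes here are illustrative, not the PR's actual implementation):

```python
import torch
import torch.nn.functional as F

def interpolate_pos_encoding(pos_embed, new_height, new_width, patch_size=32):
    # pos_embed: (1, 1 + num_patches, dim), with a leading CLS token.
    cls_token, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    dim = pos_embed.shape[-1]
    old_grid = int(patch_pos.shape[1] ** 0.5)

    # Reshape the flat patch positions back into a 2D grid of channels.
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)

    # Bicubically resample the grid to match the new image resolution.
    new_grid_h, new_grid_w = new_height // patch_size, new_width // patch_size
    patch_pos = F.interpolate(
        patch_pos, size=(new_grid_h, new_grid_w), mode="bicubic", align_corners=False
    )

    # Flatten back and re-attach the CLS token.
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid_h * new_grid_w, dim)
    return torch.cat([cls_token, patch_pos], dim=1)
```

For a 224px checkpoint with 32px patches (a 7x7 grid, 50 positions with CLS), a 480px input would need a 15x15 grid, i.e. 226 positions.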
@@ -1099,6 +1136,7 @@ def forward(
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
    interpolate_pos_encoding: Optional[bool] = False,
The value should be `True` or `False`, but not `None`:
- interpolate_pos_encoding: Optional[bool] = False,
+ interpolate_pos_encoding: bool = False,
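The suggested change drops `Optional` because that annotation is shorthand for "bool or None", and `None` is never a meaningful value for this flag. A quick stdlib check of what `Optional[bool]` actually declares:

```python
from typing import Optional, get_args

# Optional[bool] is just Union[bool, None]: it advertises that None is a
# valid value. Since the flag defaults to False and None is never handled,
# the plain `bool` annotation in the suggestion is the accurate hint.
assert get_args(Optional[bool]) == (bool, type(None))
```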
image_processor = BridgeTowerProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Same here
image_processor = ChineseCLIPProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Same here
# to visualize self-attention on higher resolution images.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(torch_device)

image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
Three comments:
- This is returning the processor, not the image_processor
- `size` should be a dictionary here, e.g. `size={"shortest_edge": 480}`
- This won't test the interpolation, because the image processor crops after resizing. `crop_size` also has to be overridden
- image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
+ processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
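The crop-after-resize point can be shown with plain arithmetic. These helpers only mimic the image processor's resize-then-crop pipeline; the names are illustrative, not the transformers API:

```python
def shortest_edge_resize(height, width, shortest_edge):
    # Scale so the shorter side equals `shortest_edge`, keeping aspect ratio.
    scale = shortest_edge / min(height, width)
    return round(height * scale), round(width * scale)

def center_crop(height, width, crop_height, crop_width):
    # The final size is fixed by the crop, regardless of the resize target.
    return min(height, crop_height), min(width, crop_width)

# Resizing a 300x400 image with size={"shortest_edge": 480} upsamples it...
h, w = shortest_edge_resize(300, 400, 480)
assert (h, w) == (480, 640)

# ...but the default crop_size={"height": 224, "width": 224} undoes that,
# so the model still sees the default resolution and the interpolation
# branch is never exercised unless crop_size is overridden too.
h, w = center_crop(h, w, 224, 224)
assert (h, w) == (224, 224)
```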
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224", padding_side="left")

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Same here - parameters affecting output size have to be updated
# to visualize self-attention on higher resolution images.
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32").to(torch_device)

image_processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32", size=480)
Same here:
- returns a processor
- needs to override crop size
- size and crop_size should be dicts
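A corrected override, per the comments above, would pass both parameters as dicts and raise `crop_size` so the crop doesn't cancel the resize. The values below are illustrative, and the `from_pretrained` call is shown only in a comment because it downloads weights:

```python
# Hypothetical corrected overrides: both parameters are dicts, and crop_size
# is raised to match the resize target so the larger resolution survives.
size = {"shortest_edge": 480}
crop_size = {"height": 480, "width": 480}

# Intended usage (not executed here; requires network access):
# processor = XCLIPProcessor.from_pretrained(
#     "microsoft/xclip-base-patch32", size=size, crop_size=crop_size
# )
assert crop_size["height"] >= size["shortest_edge"]
```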
What does this PR do?
Solves #30579
Who can review?
@amyeroberts
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.