
fixes clip interpolate #30783

Open
wants to merge 1 commit into base: main

Conversation

@nileshkokane01 (Contributor) commented May 13, 2024

What does this PR do?

Solves #30579

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@amyeroberts (Collaborator) left a comment

Thanks for working on this!

A few general comments:

  • interpolate_pos_encoding should be a boolean
  • It should be possible to call the model with this flag, e.g. model(**inputs, interpolate_pos_encoding=True) (see the sketch after this list)
  • All the docstrings should be updated to include this argument
  • Print statements should be removed
  • Tests should make sure that the processed image being passed to the model is not the default size
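
A minimal sketch of the call pattern the bullets above describe, assuming the PR exposes interpolate_pos_encoding on the CLIP forward pass; the checkpoint, prompt, and fixture path mirror the tests in this PR and are illustrative, not prescriptive:

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = processor(text=["what's in the image"], images=image, return_tensors="pt")

with torch.no_grad():
    # The flag is a plain bool and is forwarded as a keyword argument.
    outputs = model(**inputs, interpolate_pos_encoding=True)

Note that with the default image processor the pixel values come out at the checkpoint's training resolution (224x224 here), so a test additionally needs to force a larger processed size, as the size/crop_size comments further down point out.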

tests/models/x_clip/test_modeling_x_clip.py (comment resolved)
tests/models/altclip/test_modeling_altclip.py (comment resolved)
@@ -1099,6 +1136,7 @@ def forward(
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
interpolate_pos_encoding: Optional[bool] = False,
Collaborator:

The value should be True or False, but not None

Suggested change
- interpolate_pos_encoding: Optional[bool] = False,
+ interpolate_pos_encoding: bool = False,

image_processor = BridgeTowerProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Collaborator:

Same here

image_processor = ChineseCLIPProcessor.from_pretrained(model_name)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = image_processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Collaborator:

Same here

# to visualize self-attention on higher resolution images.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(torch_device)

image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
Collaborator:

Three comments:

  • This is returning the processor, not the image_processor
  • size should be a dictionary, e.g. size={"shortest_edge": 480} here
  • This won't test the interpolation, because the image processor crops after resizing; crop_size also has to be overridden (a sketch follows the suggested change below)
Suggested change
- image_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
+ processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32", size=480)
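
For reference, a sketch of overriding both parameters so the interpolation path is actually exercised; the dictionary keys follow the CLIP image processor's size/crop_size conventions, and 480 is an illustrative value rather than one taken from the PR:

from PIL import Image
from transformers import CLIPProcessor

# Override both the resize target and the crop size; otherwise the centre crop
# brings the image back to the default 224x224 and no interpolation happens.
processor = CLIPProcessor.from_pretrained(
    "openai/clip-vit-base-patch32",
    size={"shortest_edge": 480},
    crop_size={"height": 480, "width": 480},
)

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
assert pixel_values.shape[-2:] == (480, 480)  # the processed image is no longer the default size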

processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224", padding_side="left")

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = processor(text="what's in the image", images=image, return_tensors="pt").to(torch_device)
Collaborator:

Same here - parameters affecting output size have to be updated

# to visualize self-attention on higher resolution images.
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32").to(torch_device)

image_processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32", size=480)
Collaborator:

Same here:

  • returns a processor
  • needs to override crop size
  • size and crop_size should be dicts
