Try this to increase resolution w/o finetuning (Instruction) #62
Comments
I also tested another resolution (1536 x 1152), multiplying t by (512/1536), and the results were reasonable. So far this works well for objects that have a "good number" of features.
How do I modify the parameter t? def get_cos_sin(self, D, seq_len, device, dtype):
If you're going from 512 to 1024, you would do:
@KyunHwan thanks
With the default setup, large input images were resized to 512 x 384 (using DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth), but I wanted results at a higher resolution (1024 x 768). Following "Extending Context Window of Large Language Models via Position Interpolation" by Meta, I made only two changes: I changed the default image_size value from 512 to 1024 in demo.py, and I multiplied the variable t in the get_cos_sin method of RoPE2D (croco/models/pos_embed.py) by (512/1024). This scales the position indices at the new resolution back into the range the model saw during training. It gave pretty good results, though finetuning is most likely required for better ones.
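To make the scaling of t concrete, here is a minimal standalone sketch of position interpolation for rotary embeddings. It is not the code from croco/models/pos_embed.py: the function name rope_cos_sin, its parameters train_size and target_size, the base value of 100.0, and the token counts 32 and 64 are all assumptions chosen for illustration. It only shows the idea of multiplying each position by (train_size / target_size) before computing the rotation angles.

```python
import math


def rope_cos_sin(dim, seq_len, train_size=512, target_size=512, base=100.0):
    """Build cos/sin tables for 1D rotary embeddings with rescaled positions.

    Hypothetical helper for illustration only. Returns two nested lists
    of shape [seq_len][dim // 2].
    """
    # Interpolation factor, e.g. 512 / 1024 = 0.5 when doubling the resolution.
    scale = train_size / target_size
    # One inverse frequency per pair of channels (standard RoPE layout).
    inv_freq = [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]

    cos_table, sin_table = [], []
    for pos in range(seq_len):
        # This is the "multiply t by (512/1024)" step from the post.
        t = pos * scale
        angles = [t * f for f in inv_freq]
        cos_table.append([math.cos(a) for a in angles])
        sin_table.append([math.sin(a) for a in angles])
    return cos_table, sin_table


# Assumed example: 32 positions per axis at 512, 64 positions at 1024.
lo_cos, lo_sin = rope_cos_sin(dim=8, seq_len=32)
hi_cos, hi_sin = rope_cos_sin(dim=8, seq_len=64, train_size=512, target_size=1024)

# After interpolation, position 2k at the higher resolution gets the same
# angles as position k at the training resolution.
print(hi_cos[2] == lo_cos[1])
```

With this scaling, every second position at the higher resolution lines up with a position seen during training, and the positions in between fall at half steps. That is why the approach can work without finetuning, and also why finetuning would likely still help.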