Input image size #636
Replies: 2 comments
-
Try the ViT-L-14-336 model, which resizes inputs to 336x336, and see if that works for your use case.
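To make the resolution trade-off concrete, here is a quick back-of-the-envelope on how the patch-token count grows with input size (a sketch assuming the standard 14x14 patch size of ViT-L/14; the function name is illustrative, not from the library):

```python
def vit_token_count(image_size: int, patch_size: int = 14) -> int:
    """Number of patch tokens a ViT produces (excluding the CLS token)."""
    grid = image_size // patch_size  # patches per side
    return grid * grid

for size in (224, 336, 1024):
    print(size, vit_token_count(size))
# 224 -> 256 tokens, 336 -> 576 tokens, 1024 -> 5329 tokens
```

Attention cost scales roughly quadratically in the token count, which is one reason the pretrained models stay at these modest resolutions.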
-
@MarkIsDoingIt Such is life in the world of NNs; we're still stuck with relatively small images at current compute levels. The original models were trained at the stated image sizes, so the extra 'detail' that would be lost from yours wouldn't be of much use to them anyway, since they never saw such detail in training... you'd need to train from scratch at 1024x1024, and that is incredibly expensive. The 336x336 L/14 is a good suggestion, and I trained some ConvNeXt large and base models at 320x320. The ConvNeXt models that were trained with augreg handle image sizes larger than their training resolution a bit better than most, and the ConvNeXt models will accept any image size since they are fully convolutional. You'll still see a significant drop in accuracy by 1024x1024, though, since the feature scales will be very different. Moving to discussions for future reference.
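The "fully convolutional" point can be illustrated with a toy trunk (a minimal sketch, not the actual ConvNeXt architecture): convolutions plus a global pool produce a fixed-size feature vector regardless of input resolution, whereas a plain ViT is tied to its positional-embedding grid.

```python
import torch
import torch.nn as nn

# Toy fully-convolutional trunk: a strided conv "stem" followed by a
# global average pool, so any input H x W yields the same feature size.
trunk = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=4, stride=4),  # patchify-style stem
    nn.GELU(),
    nn.AdaptiveAvgPool2d(1),                   # global pool over the spatial grid
    nn.Flatten(),
)

with torch.no_grad():
    for size in (224, 320, 1024):
        x = torch.randn(1, 3, size, size)
        print(size, tuple(trunk(x).shape))  # always (1, 8)
```

The model runs at any resolution, but as noted above, accuracy still degrades far from the training size because the learned filters see objects at very different feature scales.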
-
I want to use a pretrained vision transformer from CLIP to extract features from images. My original image size is 1024x1024. What is the largest input image size for any pretrained CLIP version? Resizing the image to the classical 224x224 will lose information. Thanks!