Custom dataset #635
-
Hello, I have a bit of a technical question. I have a dataset of 20 million image-caption pairs with a healthy distribution of PNGs, JPEGs, JPGs, and TIFFs. Should I convert all of them to JPEG before converting them to a webdataset? Also, they're all of different sizes - should I resize them to 224? I'd rather crop them to 224 during training.
Replies: 8 comments 1 reply
-
You have two choices:
1. Prepare data during data loading
2. Prepare data in advance
The benefits of preparing data in advance are:
1. Saving compute during training, thanks to uniform sample and shard sizes.
2. Reducing the chance of bad data creeping in.
I'd recommend you:
* convert everything to JPEG
* resize every image whose shortest side is larger than 224 so that the shortest side becomes 224 (i.e. 224xH or Wx224)
You can totally do it all during data loading, but it may make things more complicated.
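A minimal sketch of that offline preprocessing, assuming PIL is available (the function name, target size, and JPEG quality here are illustrative choices, not a prescribed pipeline):

```python
from PIL import Image


def convert_and_resize(src: str, dst: str, target: int = 224, quality: int = 95) -> None:
    """Convert any input image (PNG/TIFF/etc.) to JPEG, resizing so the
    *shortest* side equals `target` -- but only when it is larger than
    `target`, so small images are never upscaled."""
    img = Image.open(src).convert("RGB")  # drops alpha, normalizes mode
    w, h = img.size
    shortest = min(w, h)
    if shortest > target:
        scale = target / shortest
        img = img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
    img.save(dst, "JPEG", quality=quality)
```

For 20M images you'd want to parallelize this (e.g. with a process pool) before packing the results into webdataset shards.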
-
If you do decide to do everything at data loading time, be sure to change this code: https://github.com/mlfoundations/open_clip/blob/main/src/training/data.py#L174
-
@rom1504 Thank you! Do you mean that if the width of an image is greater than 224 pixels, and the width is the larger dimension, the image should be resized to a width of 224 pixels, with the height adjusted proportionally to maintain the aspect ratio? And similarly if the height is the larger dimension?
-
No: smallest side, not largest side. Resize so the *smallest* side is 224. That way, if you then center-crop to 224x224, you won't need to upscale, only crop.
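To make the distinction concrete, here is a sketch of shortest-side resize followed by a center crop (helper names and the 224 target are illustrative). Because the shorter side is scaled to 224, both dimensions end up >= 224 and the crop never needs to upscale:

```python
from PIL import Image


def resize_shortest(img: Image.Image, target: int = 224) -> Image.Image:
    """Scale so the *shorter* side equals `target`; both sides end up >= target."""
    w, h = img.size
    scale = target / min(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)


def center_crop(img: Image.Image, size: int = 224) -> Image.Image:
    """Crop a size x size square from the center of the image."""
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))
```

Had the *longest* side been resized to 224 instead, the shorter side would fall below 224 and the crop step would have to upscale or pad.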
-
I'm sorry, do you mind clarifying the last part a bit?
-
I wrote a pair of transforms a little while back that let you slide between fully shortest and fully longest with a float input, plus a crop-or-pad fn that fills and crops as needed. It works for both square and non-square target sizes...
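The idea described above can be sketched as a single scale factor interpolated between a shortest-side fit and a longest-side fit. This is only an illustrative reimplementation of the concept (the function name, parameter name, and square-target simplification are mine, not the actual open_clip transform):

```python
from PIL import Image


def resize_between(img: Image.Image, target: int = 224, longest: float = 0.0) -> Image.Image:
    """Aspect-ratio-preserving resize. longest=0.0 fits the shortest side to
    `target` (crop afterwards); longest=1.0 fits the longest side (pad
    afterwards); values in between blend the two scale factors."""
    w, h = img.size
    short_scale = target / min(w, h)  # shortest-side fit: image covers target
    long_scale = target / max(w, h)   # longest-side fit: image fits inside target
    scale = short_scale * (1.0 - longest) + long_scale * longest
    return img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
```

At `longest=0.0` a 640x480 image becomes 299x224 (crop to finish); at `longest=1.0` it becomes 224x168 (pad to finish).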
-
It's usually best to resize your dataset to something a step away from your target, to give yourself a bit of room for augmentation and for training at different resolutions without rewriting your dataset for each run. This is a tradeoff decision. If you leave the images large, though, it can be VERY expensive to resize on the fly: with 20MP digital camera images or really large medical scans, downscaling to 224 just hammers your CPU. Moving to discussions for future reference.
-
Thanks @rwightman! I ended up going with something similar to resize-longest; I just had trouble understanding the term. Also, I did notice a small bug here: open_clip/src/open_clip/transform.py, line 34 in f692ec9. I think it should be `self.fn = min if fn == 'min' else max` instead.