Hi, I'm using DINOv2 to pretrain a ViT on a dataset significantly larger than ImageNet-22k (between 100M and 1B JPEG images). I stuck to the ImageNet22k dataset class for handling and loading data, i.e. a combination of tarball files for storing the images and a single .npy file for the metadata (start and end offsets, plus the information needed to tell which tarball file a given image lives in). I put the code snippet below.
Unfortunately, I am facing very slow data loading times:
Large tarball files: some tarballs I work with contain as many as 6M images. I suspect this increases RAM usage, which could explain the slow data loading times -- or even out-of-memory errors -- I face.
To mitigate this, I split the large tarballs into smaller ones (about 1 GB each). While this offers some relief by reducing the memory footprint during data loading, it doesn't scale well with the batch size: the bigger the batch size, the more tarball files have to be opened and closed concurrently, which adds significant overhead and slows down data loading.
I've also looked into alternative tools (WebDataset, TorchData) but wasn't successful with them. I am therefore reaching out for any advice or alternative strategies for handling large-scale vision datasets. Thank you!
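(The original snippet isn't reproduced here; below is a minimal sketch of what an ImageNet22k-style, offset-indexed tarball dataset along these lines might look like. The shard naming scheme and the metadata field names are assumptions, not the actual DINOv2 code.)

```python
# Sketch only: an offset-indexed tarball dataset. Assumed metadata layout:
# a structured .npy array with one record per image, holding the shard id
# and the byte range of the raw JPEG inside that shard.
import io
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset


class TarballImageDataset(Dataset):
    def __init__(self, root, metadata_file="entries.npy", transform=None):
        self.root = Path(root)
        # mmap_mode avoids loading the whole metadata array into RAM.
        self.entries = np.load(self.root / metadata_file, mmap_mode="r")
        self.transform = transform
        self._handles = {}  # per-worker cache of open shard file handles

    def __len__(self):
        return len(self.entries)

    def _shard(self, shard_id):
        # Keep shard files open instead of reopening one for every sample.
        handle = self._handles.get(shard_id)
        if handle is None:
            handle = open(self.root / f"shard_{shard_id:06d}.tar", "rb")
            self._handles[shard_id] = handle
        return handle

    def __getitem__(self, index):
        entry = self.entries[index]
        shard = self._shard(int(entry["shard_id"]))
        shard.seek(int(entry["start_offset"]))
        data = shard.read(int(entry["end_offset"]) - int(entry["start_offset"]))
        image = Image.open(io.BytesIO(data)).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image
```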
I ran into a similar issue, and it really comes down to your device's computational power. One workaround I initially used when running out of memory was to write data to the hard drive and back, having it act as pseudo-RAM of sorts; this is incredibly slow, though. With a dataset that large, I highly recommend offloading your processing onto a supercluster if you have the ability to do so. If you are simply trying to create and load the dataset, I would recommend using Hugging Face to store and load it in pieces: do not load everything at once, but rather stream it in batches. Please let me know if anything I said above is unclear or not applicable to your situation.
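For what it's worth, a rough sketch of that streaming idea with the Hugging Face `datasets` library (the repository id and the column name are placeholders): nothing is materialized up front, samples are decoded lazily as you iterate, and shuffling happens approximately through a small buffer. A streaming dataset can also be wrapped in a regular PyTorch DataLoader if you prefer.

```python
# Sketch: stream a large image dataset from the Hugging Face Hub in batches
# instead of loading it all at once. "user/large-image-dataset" is a placeholder.
from datasets import load_dataset

stream = load_dataset("user/large-image-dataset", split="train", streaming=True)
stream = stream.shuffle(seed=0, buffer_size=10_000)  # approximate shuffle via a buffer

batch, batch_size = [], 256
for sample in stream:
    batch.append(sample["image"])  # PIL image, decoded on the fly
    if len(batch) == batch_size:
        ...  # hand the batch to the training step
        batch = []
```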