Out of memory (CPU) when finetuning with 50M text-image pairs #1129
I'll test with a pseudo 50M text-image pair dataset. How much CPU RAM does your system have?
@kohya-ss Thanks, about 1007 GiB of RAM on every machine.
Thank you! To be honest, the script is not intended to handle that large a volume of images... However, it should work with appropriate options... If you cache the latents to memory, they will take roughly 13 TB of RAM for a dataset of that size.
If we cache the latents to disk, will the same 13 TB of capacity be used on disk? Is there any way to speed up loading the data into the script? Most GPU servers use hourly billing, and it is inefficient to spend 2 days just loading images into RAM without even starting training.
It should be. Therefore I think it may be better to disable latent caching for large-scale finetuning. The script doesn't load the images if caching is disabled or if the cache was already created on disk in advance, but it still checks the size of every image, which will take a long time (though 2 days is too long, so I suspect the latents are being cached).
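For reference, a back-of-envelope estimate of the latent cache size, assuming 1024×1024 training resolution, 4 latent channels, the usual 8× VAE downsampling factor, and float32 storage (all assumptions; the actual cache format may differ), lands close to the 13 TB figure mentioned above:

```python
# Rough latent cache size estimate (all numbers are assumptions, not measured).
num_images = 50_000_000
resolution = 1024                       # assumed training resolution
latent_side = resolution // 8           # 8x VAE downsampling
bytes_per_latent = 4 * latent_side * latent_side * 4  # 4 channels * H * W * sizeof(float32)
total_tb = num_images * bytes_per_latent / 1e12
print(f"{bytes_per_latent / 1024:.0f} KiB per image, ~{total_tb:.1f} TB total")  # -> 256 KiB per image, ~13.1 TB total
```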
You can always add multiprocessing for loading files to speed things up; it can be added anywhere you are loading images/captions/metadata.
train_util.py:

```python
from multiprocessing import Pool
from typing import Any, Tuple

from PIL import Image
from tqdm import tqdm

...

logger.info("loading image sizes (multiprocessed).")
pool = Pool()  # create a multiprocessing pool (one worker per CPU core by default)
iterator = pool.imap(load_image_size, self.image_data.items())  # read sizes in parallel worker processes
for key, image_size in tqdm(iterator, total=len(self.image_data), smoothing=0.01):
    self.image_data[key].image_size = image_size  # store the size returned by the worker


# must be a module-level function so the Pool workers can pickle it
def load_image_size(image_data) -> Tuple[str, Any]:
    image_key, image_info = image_data
    if image_info.image_size is not None:
        # return the current size if it already exists
        return image_key, image_info.image_size
    image = Image.open(image_info.absolute_path)
    return image_key, image.size
```

Synchronous image loading is a pain, so the above change was the largest gain in time saved. Edit: off the top of my head, it now takes about 20 minutes to load a 3M-image dataset if the images are pre-cached (NVMe drive).
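Two general caveats on the sketch above (these are notes on the technique, not from the thread): load_image_size has to be a module-level function so the Pool workers can pickle it, and with tens of millions of entries a chunksize larger than the default of 1 cuts per-item inter-process overhead, for example:

```python
# Same loop as above, but with a larger imap chunksize; 1024 is an illustrative
# value, not a tuned recommendation.
with Pool() as pool:
    iterator = pool.imap(load_image_size, self.image_data.items(), chunksize=1024)
    for key, image_size in tqdm(iterator, total=len(self.image_data), smoothing=0.01):
        self.image_data[key].image_size = image_size
```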
Thanks for your great work. There is an "out of memory (CPU)" error when finetuning with 50M text-image pairs: after loading the data (which takes about 2 days), the process dies at the beginning of training. We found the cause is running out of CPU memory; for details please see the following screenshot. For now I am trying to split the dataset into 10 patches and load one of these 10 patches each epoch. Is there any other solution to support huge datasets such as 100M text-image pairs?
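For illustration, the splitting approach mentioned above could look roughly like the sketch below. It assumes the finetuning metadata is a single JSON dict keyed by image path (as in the meta_*.json files the finetuning scripts consume); the file names and shard count are made up:

```python
import json

def split_metadata(in_json: str, num_shards: int = 10) -> None:
    """Split one large metadata JSON into num_shards smaller files so each
    run only has to load a fraction of the entries."""
    with open(in_json, "r", encoding="utf-8") as f:
        metadata = json.load(f)  # dict: image_key -> {"caption": ..., ...}
    shards = [{} for _ in range(num_shards)]
    for i, (key, value) in enumerate(metadata.items()):
        shards[i % num_shards][key] = value
    for i, shard in enumerate(shards):
        with open(f"meta_shard_{i}.json", "w", encoding="utf-8") as f:
            json.dump(shard, f)

split_metadata("meta_lat.json", num_shards=10)  # "meta_lat.json" is a placeholder name
```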