out of memory(cpu) when finetuning with 50M text image pairs #1129

Open
jucic opened this issue Feb 21, 2024 · 6 comments
Comments

@jucic
jucic commented Feb 21, 2024

Thanks for your nice work. There is an "out of memory (CPU)" error when fine-tuning with 50M text-image pairs: after loading the data (which takes about 2 days), the process dies at the beginning of training. We found the cause is running out of CPU memory; see the screenshot below for details. For now I am trying to split the dataset into 10 shards and load one shard per epoch. Is there any other solution to support huge datasets such as 100M text-image pairs?
[screenshot: out-of-memory (CPU) error at the start of training]
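(For reference, a minimal sketch of the per-epoch sharding workaround described above; the metadata path, shard count, and helper name are hypothetical and not part of the actual scripts:)

from typing import Any, Dict
import json
import random

def load_metadata_shard(metadata_path: str, num_shards: int, epoch: int) -> Dict[str, Any]:
    # load the full metadata JSON ({image_key: {...}, ...}), then keep only 1/num_shards of the keys
    with open(metadata_path, "r", encoding="utf-8") as f:
        metadata = json.load(f)

    keys = sorted(metadata.keys())
    random.Random(0).shuffle(keys)                  # fixed seed so the shard split is stable across epochs
    shard_keys = keys[(epoch % num_shards)::num_shards]  # every num_shards-th key, offset by the epoch
    return {k: metadata[k] for k in shard_keys}

# e.g. build the dataset for epoch 3 from shard 3 of 10 instead of all 50M entries:
# metadata = load_metadata_shard("meta_lat.json", num_shards=10, epoch=3)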

@kohya-ss
Owner

I'll test with pseudo 50M text-image pairs. How much CPU RAM does your system have?

@jucic
Copy link
Author

jucic commented Feb 22, 2024

I'll test with pseudo 50M text-image pairs. How much CPU RAM does your system have?

@kohya-ss Thanks, about 1007 GiB of RAM on each machine.

@kohya-ss
Owner

Thank you! To be honest, the script is not intended to handle that large a volume of images... However, it should work with appropriate options...

If you cache the latents to memory with --cache_latents but without --cache_latents_to_disk, the amount of memory used by the latents will be H * W * C * sizeof(float) * num_images = 128 * 128 * 4 * 4 * 50M ≈ 13 TB. So I guess you are either not caching the latents or are using --cache_latents_to_disk. Is that correct?
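(Just to double-check that number, assuming 128×128×4 float32 latents:)

bytes_per_image = 128 * 128 * 4 * 4          # H * W * C * sizeof(float32) = 256 KiB per latent
total_bytes = bytes_per_image * 50_000_000   # 50M images
print(total_bytes / 1000**4, "TB")           # ~13.1 TB
print(total_bytes / 1024**4, "TiB")          # ~11.9 TiB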

@dill-shower

dill-shower commented Feb 22, 2024

If you cache the latents to memory with --cache_latents but without --cache_latents_to_disk, the amount of memory used by the latents will be H * W * C * sizeof(float) * num_images = 128 * 128 * 4 * 4 * 50M ≈ 13 TB.

If we cache the latents to disk, will the same 13TB capacity be used on disk?

Is there any way to speed up loading the data into the script? Most GPU servers use hourly billing, and it is inefficient to spend 2 days just loading images into RAM without even starting training.

@kohya-ss
Owner

kohya-ss commented Feb 22, 2024

If we cache the latents to disk, will the same 13TB capacity be used on disk?

It should be. Therefore, I think it may be better to disable latent caching for large-scale fine-tuning.

The script doesn't load the images if caching is disabled or the cache has already been created on disk in advance, but it still checks the size of every image. That also takes a long time (though 2 days is too long, so I suspect the latents are being cached).

@thojmr

thojmr commented Feb 23, 2024

You can always add multiprocessing to the file-loading steps to speed things up.

You can add it anywhere the images/captions/metadata are loaded:

  • image size loading (like below)
  • processing metadata
  • checking cache validity

train_util.py

from multiprocessing import Pool
from typing import Any, Tuple

from PIL import Image
from tqdm import tqdm

# fetch the size of one image; must be a top-level function so the Pool can pickle it
def load_image_size(image_data) -> Tuple[str, Any]:
    image_key, image_info = image_data

    if image_info.image_size is not None:
        # return the current size if it is already known
        return image_key, image_info.image_size

    with Image.open(image_info.absolute_path) as image:
        return image_key, image.size

...
logger.info("loading image sizes (multiprocessed).")
with Pool() as pool:  # create a multiprocessing Pool (one worker per CPU core)
    iterator = pool.imap(load_image_size, self.image_data.items())  # process the (key, info) pairs in parallel
    for key, image_size in tqdm(iterator, total=len(self.image_data), smoothing=0.01):
        self.image_data[key].image_size = image_size

Synchronous image loading is a pain, so the above change gave the largest time savings.

Edit: off the top of my head, I think it now takes about 20 minutes to load a 3M-image dataset if the images are pre-cached (NVMe drive).
