Running on CIFAR 10 #57

Open
DushyantSahoo opened this issue Jul 8, 2022 · 14 comments

@DushyantSahoo

Hi,

I am trying to train on and sample from the CIFAR-10 dataset. My code is below.

from keras.datasets import mnist
import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer
import numpy as np
import tensorflow as tf

model = Unet(
    dim = 16,
    dim_mults = (1, 2, 4)
)

diffusion = GaussianDiffusion(
    model,
    image_size = 32,
    timesteps = 1000,   # number of steps
    loss_type = 'l1'    # L1 or L2
)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = np.asarray(x_train)
x_train = x_train.astype(np.float16)
new_train = torch.from_numpy(np.swapaxes(x_train,1,3))
training_images = torch.randn(8, 3, 128, 128) # images are normalized from 0 to 1

trainer = Trainer(
    diffusion,
    new_train,
    train_batch_size = 128,
    train_lr = 1e-4,
    train_num_steps = 70000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.9999,                # exponential moving average decay
    amp = True                        # turn on mixed precision
)
trainer.train()

I modified Trainer so that it accepts the dataset directly. The original Trainer had the following code:

self.ds = Dataset(folder, self.image_size, augment_horizontal_flip = augment_horizontal_flip)
dl = DataLoader(self.ds, batch_size = train_batch_size, shuffle = True, pin_memory = True, num_workers = cpu_count())

which I modified to

my_dataset = TensorDataset(data) # create your dataset
dl = DataLoader(data, batch_size = train_batch_size, shuffle = True, pin_memory = True, num_workers = cpu_count())
self.dl = cycle(dl)

With this setup, the training loss goes to inf after about 20k iterations. If I stop before that and sample from the model, the images are just a bunch of random colors. Is there a script I can use to generate samples from CIFAR-10?

Thank You
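
A minimal sketch of the preprocessing a setup like this usually needs, assuming the array has the shape (N, 32, 32, 3) and dtype uint8 that `cifar10.load_data()` returns, and assuming the model expects float images in [0, 1] as the comment in the snippet above suggests. The helper name `to_nchw_unit_range` is hypothetical. Note that `np.swapaxes(x, 1, 3)` turns (N, H, W, C) into (N, C, W, H), which transposes every image, while `transpose(0, 3, 1, 2)` gives (N, C, H, W).

```python
import numpy as np


def to_nchw_unit_range(images_nhwc_uint8):
    """Convert (N, H, W, C) uint8 images to (N, C, H, W) float32 in [0, 1]."""
    x = np.asarray(images_nhwc_uint8)
    if x.ndim != 4:
        raise ValueError(f"expected a 4-D array, got shape {x.shape}")
    # Scale to [0, 1] in float32 rather than casting raw 0..255 values to float16.
    x = x.astype(np.float32) / 255.0
    # Move channels first without swapping height and width.
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))


# Stand-in for x_train so the sketch runs without downloading CIFAR-10.
fake_batch = np.random.default_rng(0).integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
prepared = to_nchw_unit_range(fake_batch)
print(prepared.shape, prepared.dtype)
```

The result can be wrapped with `torch.from_numpy(...)` before it is handed to a modified Trainer like the one described above.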

@LangdonYu

Maybe you can convert the CIFAR-10 dataset to PNG images, store them in a folder, and then train using the author's second method.
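
A rough sketch of that conversion, assuming the images are available as an (N, 32, 32, 3) uint8 array (as `cifar10.load_data()` returns them) and that Pillow is installed. The function name `export_pngs` and the file naming scheme are hypothetical.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def export_pngs(images_nhwc_uint8, out_dir):
    """Write each (H, W, 3) uint8 image to out_dir as 000000.png, 000001.png, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    images = np.asarray(images_nhwc_uint8, dtype=np.uint8)
    for i, img in enumerate(images):
        # All files go into one flat folder, with no per-class subfolders.
        Image.fromarray(img).save(out / f"{i:06d}.png")
    return len(images)


# Usage sketch (x_train as loaded in the original post):
# export_pngs(x_train, "cifar10_png")
```

The resulting folder path can then be passed to the Trainer in place of a dataset.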

@HalcyonForest

Maybe you can convert the CIFAR-10 dataset to PNG images, store them in a folder, and then train using the author's second method.

How do I deal with the fact that there is a separate folder for each class? I unpacked the dataset into PNG files, but when I try to train the model I get an error: ValueError: num_samples should be a positive integer, but got num_samples=0. This indicates that the model can't read the dataset. I hoped that converting CIFAR-10 to PNG would help with this problem, but it didn't.
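
An error with num_samples=0 typically means the dataset came up empty, i.e. no image files were found at the given path. A small standard-library sketch for checking that and, if needed, copying per-class subfolders into one flat folder. Whether the library's Dataset searches subfolders depends on the version, so treating it as top-level-only is an assumption here; `count_images` and `flatten_class_folders` are hypothetical helper names.

```python
import shutil
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def count_images(folder, recursive=False):
    """Count image files directly in `folder`, or in all subfolders if recursive."""
    root = Path(folder)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sum(1 for p in candidates if p.is_file() and p.suffix.lower() in IMAGE_EXTS)


def flatten_class_folders(src, dst):
    """Copy src/<class>/<name>.png to dst/<class>_<name>.png; return the number copied.

    `dst` should live outside `src` so that copies are not picked up again.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for p in sorted(src.rglob("*")):
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS:
            # Prefix the file name with its class folder to avoid name clashes.
            shutil.copy2(p, dst / "_".join(p.relative_to(src).parts))
            copied += 1
    return copied
```

If `count_images(folder)` is 0 while `count_images(folder, recursive=True)` is not, the images exist but only inside subfolders; if both are 0, the path itself is probably wrong.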

@Allencheng97

Hi, did you solve this problem?

@HalcyonForest

Hi, did you solve this problem?

@Allencheng97

Yes, I forked this repo and changed torch.Dataset() to torch.CIFAR10() (around lines 719-721).
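
A hedged sketch of what such a swap could look like. It assumes that torch.Dataset() and torch.CIFAR10() refer to the repository's own Dataset class and to `torchvision.datasets.CIFAR10`, and that the training loop expects each batch to be a plain image tensor rather than an (image, label) pair. The wrapper class `ImagesOnly` is hypothetical and not part of either library.

```python
class ImagesOnly:
    """Map-style dataset wrapper that drops the label from (image, label) pairs."""

    def __init__(self, labelled_dataset):
        self.base = labelled_dataset

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        image, _label = self.base[index]
        return image


# Usage sketch (needs torchvision; downloads CIFAR-10 on first run):
# from torchvision import datasets, transforms
# base = datasets.CIFAR10(root="./data", train=True, download=True,
#                         transform=transforms.ToTensor())  # float tensors in [0, 1]
# ds = ImagesOnly(base)  # pass this to DataLoader instead of the folder-based dataset
```

Because a DataLoader accepts any object with `__len__` and `__getitem__`, the wrapper does not need to subclass anything.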

@Allencheng97

Hi, did you solve this problem?

@Allencheng97

Yes, I forked this repo and changed torch.Dataset() to torch.CIFAR10() (around lines 719-721).
Thanks!

@greens007

I have found what caused it. You should set amp to False when training on CIFAR-10. Once I did this, the model converged and generated normal pictures instead of a bunch of random colours.

@DevJake

DevJake commented Aug 24, 2022

I have found what caused it. You should set amp to False when training on CIFAR-10. Once I did this, the model converged and generated normal pictures instead of a bunch of random colours.

Would it be possible to share your code? I'm having some issues getting my version to actually converge, despite also using cifar10. Thanks!

@SilvesterYu

I have found what caused it. You should set amp to False when training on CIFAR-10. Once I did this, the model converged and generated normal pictures instead of a bunch of random colours.

Would it be possible to share your code? I'm having some issues getting my version to actually converge, despite also using cifar10. Thanks!

Hi, is your problem resolved? I am facing a similar issue.

@DevJake

DevJake commented Sep 21, 2022

My use case for this library was slightly different from its original purpose, so my version is considerably modified. It does resolve the issue with training on CIFAR-10, although how useful its modifications are to you may vary.

You can check out my repository at DevJake/EEG-diffusion-pytorch. Let me know if it's of use!

@LangdonYu

Hi, did you solve this problem?

@Allencheng97

Yes, I forked this repo and changed torch.Dataset() to torch.CIFAR10() (around lines 719-721).

Yes, your method is effective. I heard from my lab that converting the images to PNG format for training hurts accuracy a bit.

@DevJake

DevJake commented Oct 12, 2022

I wonder why that is... I would've figured the subtle compression applied by JPG format would cause potential data loss and thus losses in performance. Equally, it might act as a form of very low-level image augmentation by adding in artefacts. Got anything more on your findings? I'd be interested to know how it was determined
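
One way to check the format question directly, as a small sketch assuming Pillow and NumPy are installed: PNG uses lossless compression, so an 8-bit RGB image should survive a save/load round trip exactly, whereas JPEG is lossy. The helper name `roundtrip` is hypothetical, and the random test image stands in for a real CIFAR-10 sample. This does not explain the accuracy drop mentioned above; it only rules the PNG encoding itself in or out.

```python
import io

import numpy as np
from PIL import Image


def roundtrip(image_uint8, fmt):
    """Encode an (H, W, 3) uint8 image in memory as `fmt` and decode it again."""
    buf = io.BytesIO()
    Image.fromarray(image_uint8).save(buf, format=fmt)
    buf.seek(0)
    return np.array(Image.open(buf).convert("RGB"))


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Largest per-pixel difference after one encode/decode cycle.
png_err = int(np.abs(roundtrip(img, "PNG").astype(int) - img.astype(int)).max())
jpg_err = int(np.abs(roundtrip(img, "JPEG").astype(int) - img.astype(int)).max())
print("max abs error after round trip, PNG:", png_err, "JPEG:", jpg_err)
```

If the PNG error is zero, any difference in results would have to come from elsewhere in the pipeline, for example resizing, cropping, or normalization applied when loading from a folder.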

@kaka45inablink

kaka45inablink commented Mar 29, 2023

#57 (comment)

I faced the same problem with FFHQ; it was fixed by setting amp=False.
But why? @greens007
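
One plausible contributor, offered as a hypothesis rather than a confirmed diagnosis for this library: float16 has a much smaller range than float32 (its largest finite value is 65504), so intermediate values computed in half precision can overflow to inf. This is especially easy to trigger when inputs are not scaled to [0, 1], as in the original post, where raw 0..255 pixels are cast to float16. The sketch below only illustrates the numeric effect with NumPy; `sum_of_squares_fp16` is a hypothetical helper.

```python
import math

import numpy as np


def sum_of_squares_fp16(pixels):
    """Compute sum(x * x) entirely in float16, silencing overflow warnings."""
    x = np.asarray(pixels, dtype=np.float16)
    with np.errstate(over="ignore"):
        return float(np.sum(x * x, dtype=np.float16))


raw = np.full((3, 32, 32), 255.0)   # one all-white image with unscaled pixel values
scaled = raw / 255.0                # the same image scaled to [0, 1]

print("largest finite float16:", float(np.finfo(np.float16).max))
print("raw pixels overflow:", math.isinf(sum_of_squares_fp16(raw)))
print("scaled pixels overflow:", math.isinf(sum_of_squares_fp16(scaled)))
```

Mixed-precision training normally guards against this with loss scaling, so other factors may well be involved when the inputs are already normalized.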

@baizhenzheng

@LangdonYu
How exactly should I modify it? I could not find torch.Dataset() at line 721 of the denoising_diffusion_pytorch.py file. Thanks!

@tsWen0309

@LangdonYu How exactly should I modify it? I could not find torch.Dataset() at line 721 of the denoising_diffusion_pytorch.py file. Thanks!

Did you solve this problem?
