This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Index out of error #115

Closed
Zumbalamambo opened this issue Feb 14, 2021 · 8 comments
Labels: bug / fix, help wanted, waiting on author

@Zumbalamambo

It throws the following error during training:

self_supervised/simclr/simclr_module.py", line 249, in optimizer_step
    param_group["lr"] = self.lr_schedule[self.trainer.global_step]
IndexError: index 900 is out of bounds for axis 0 with size 900
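The traceback shows a 900-entry schedule being indexed at position 900, one past its last valid index (899). As a stopgap, the lookup can be clamped so that the final learning rate is held once the schedule runs out. This is only a minimal sketch: `lr_at_step` and the 900-entry `schedule` are hypothetical stand-ins, not the bolts implementation and not the fix that was eventually merged.

```python
import numpy as np


def lr_at_step(lr_schedule: np.ndarray, global_step: int) -> float:
    """Return the scheduled learning rate, holding the last value past the end."""
    last_index = len(lr_schedule) - 1
    # Clamp the step so that any index >= len(lr_schedule) reuses the final entry.
    return float(lr_schedule[min(global_step, last_index)])


# Hypothetical schedule with the same length as the one in the traceback.
schedule = np.linspace(0.1, 0.0, 900)

# schedule[900] would raise IndexError; the clamped lookup does not.
final_lr = lr_at_step(schedule, 900)
```

Clamping hides the mismatch rather than explaining it, so it is still worth checking why training takes more steps than the schedule has entries.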
@Zumbalamambo added the bug / fix and help wanted labels Feb 14, 2021
@aribornstein
Contributor

@ananyahjha93

@akihironitta
Contributor

Maybe the same as Lightning-Universe/lightning-bolts#436?

@Zumbalamambo
Author

Still the same problem, even after setting max_epochs :(

@ananyahjha93
Contributor

@Zumbalamambo are you using the simclr/swav script from bolts? If yes, can you post the num samples in your dataset, your batch size, accelerator count and then max epochs?
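These four values matter because, together, they would determine both how long a precomputed schedule is and how many optimizer steps training actually takes. The sketch below compares the two under stated assumptions: that the schedule length is `(num_samples // global_batch_size) * max_epochs`, and that each device batches an evenly padded shard of the data. Neither is confirmed in this thread, and `schedule_len_and_steps` is a hypothetical helper, not a bolts function.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def schedule_len_and_steps(num_samples, batch_size, num_devices, max_epochs,
                           drop_last=False):
    """Return (assumed schedule length, assumed total optimizer steps)."""
    devices = max(num_devices, 1)
    global_batch_size = batch_size * devices

    # Assumption: one schedule entry per floor-divided iteration per epoch.
    schedule_len = (num_samples // global_batch_size) * max_epochs

    # Assumption: samples are split evenly (with padding) across devices,
    # and each device batches its own shard.
    per_device = ceil_div(num_samples, devices)
    if drop_last:
        steps_per_epoch = per_device // batch_size
    else:
        steps_per_epoch = ceil_div(per_device, batch_size)

    return schedule_len, steps_per_epoch * max_epochs


# Hypothetical numbers: the partial last batch adds one step per epoch.
print(schedule_len_and_steps(10_000, 256, 1, 10))                  # (390, 400)
print(schedule_len_and_steps(10_000, 256, 1, 10, drop_last=True))  # (390, 390)
```

If the second number is larger than the first, `global_step` would eventually index past the end of the schedule. Under these assumptions, dropping the last partial batch makes the two counts agree.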

@edenlightning edenlightning added this to the 0.2 milestone Mar 22, 2021
@pengbohua

pengbohua commented Apr 18, 2021

I ran into the same issue when trying to reproduce SwAV pretraining on CIFAR-10.

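import argparse

import pytorch_lightning as pl
from pl_bolts.datamodules import CIFAR10DataModule
from pl_bolts.models.self_supervised import SwAV
from pl_bolts.models.self_supervised.swav.transforms import (
    SwAVEvalDataTransform,
    SwAVTrainDataTransform,
)
from pytorch_lightning.callbacks import EarlyStopping

# data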
batch_size = 2048
dm = CIFAR10DataModule(data_dir='./data/', batch_size=batch_size, normalize=True)
# loaders are contained in the DataModule which are self consistent

parser = argparse.ArgumentParser('SwAV CIFAR-10')
parser = SwAV.add_model_specific_args(parser)

args = parser.parse_args('')

# model
args.gpus = 1
args.arch = 'resnet18'
args.hidden_mlp = 1024
args.max_epochs = 100
args.dataset = dm
args.batch_size = batch_size
args.size_crops = [32, 16]
args.maxpool1 = False
args.nmb_crops = [2, 1]
args.gaussian_blur = False
args.num_samples = dm.num_samples
dm.train_transforms = SwAVTrainDataTransform(
    size_crops=args.size_crops,
    nmb_crops=args.nmb_crops,
    gaussian_blur=args.gaussian_blur
)

dm.val_transforms = SwAVEvalDataTransform(
    size_crops=args.size_crops,
    nmb_crops=args.nmb_crops,
    gaussian_blur=args.gaussian_blur
)
dm.test_transforms = SwAVEvalDataTransform(
    size_crops=args.size_crops,
    nmb_crops=args.nmb_crops,
    gaussian_blur=args.gaussian_blur
)
print('hypers', args)

# logger
from pytorch_lightning.loggers import CSVLogger, TensorBoardLogger

csv_logger = CSVLogger("/content/drive/MyDrive/contrastive_learning/Swav/logs", name="SwAV-CIFAR10")
model = SwAV(**args.__dict__)


# fit
trainer = pl.Trainer(max_epochs=args.max_epochs, gpus=1, precision=16, logger=csv_logger, callbacks=[EarlyStopping(monitor='val_loss')])
trainer.fit(model, datamodule=dm)


#error message
usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in optimizer_step(self, optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
    431             on_tpu=self.trainer._device_type == DeviceType.TPU and _TPU_AVAILABLE,
    432             using_native_amp=using_native_amp,
--> 433             using_lbfgs=is_lbfgs,
    434         )
    435 

/usr/local/lib/python3.7/dist-packages/pl_bolts/models/self_supervised/swav/swav_module.py in optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx, optimizer_closure, on_tpu, using_native_amp, using_lbfgs)
    329         # adjust LR of optim contained within LARSWrapper
    330         for param_group in optimizer.param_groups:
--> 331             param_group["lr"] = self.lr_schedule[self.trainer.global_step]
    332 
    333         # from lightning

IndexError: index 1900 is out of bounds for axis 0 with size 1900
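The numbers in this report can be worked backwards. With `max_epochs = 100` and a schedule of size 1900, the schedule holds 19 entries per epoch. The arithmetic below is a sketch that assumes the schedule is built from `num_samples // batch_size` on a single GPU and that the dataloader keeps the final partial batch. Neither assumption is confirmed in the thread, and the variable names are illustrative only.

```python
batch_size = 2048
max_epochs = 100
schedule_len = 1900  # size reported in the IndexError

# Entries per epoch implied by the schedule size.
iters_per_epoch = schedule_len // max_epochs  # 19

# Range of num_samples for which num_samples // batch_size == 19.
min_samples = iters_per_epoch * batch_size            # 38912
max_samples = (iters_per_epoch + 1) * batch_size - 1  # 40959

# Assumption: a partial last batch is kept, giving one extra step per epoch.
steps_per_epoch = iters_per_epoch + 1  # 20

# Number of full epochs completed before global_step reaches schedule_len.
epochs_before_overflow = schedule_len // steps_per_epoch  # 95
```

Under these assumptions, training would complete 95 epochs and fail on the first step of the next one, when `global_step` equals 1900.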

@edenlightning edenlightning modified the milestones: 0.2, 0.3 Apr 19, 2021
@edgarriba
Contributor

@Zumbalamambo are you still having these issues?
BTW, what version of flash are you using?

@tarunn2799

@edgarriba I'm facing the same issue when training on a custom dataset. I'm running on 4 GPUs with a batch size of 2048.

@ethanwharris ethanwharris modified the milestones: 0.3, 0.3.x Jun 9, 2021
@Borda Borda modified the milestones: 0.3.x, 0.4 Aug 3, 2021
@ananyahjha93
Contributor

This has been fixed in bolts master.

10 participants