R/W generated dataset to/from disk #738

zhixiangteoh · 2023-05-03T21:08:47Z

Two-step process.

Generate, e.g.:

python case_studies/721_decoder_speedup/main.py mode=generate cached_simulator.file_data_capacity=100 simulator.n_batches=5

`generator.py` instantiates a `SimulatedDataset` and uses it to render images in minibatches of `cfg.simulator.prior.batch_size`, then concatenates and flattens the generated minibatches before writing them out to `.pkl` files via `pickle`.
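Below is a minimal sketch of the generate step described above. It rests on several assumptions. `render_minibatch` is a hypothetical stand-in for `SimulatedDataset` rendering, with strings standing in for image tensors. "Flattening" is read as splitting the concatenated minibatches into one record per image. `file_data_capacity` is taken to be the number of records per `.pkl` file. The `dataset_<k>.pkl` naming is also hypothetical.

```python
import os
import pickle
import tempfile


def render_minibatch(batch_size, seed):
    # Hypothetical stand-in for one SimulatedDataset minibatch: a dict of
    # per-key lists, each holding batch_size entries.
    return {
        "images": [f"img-{seed}-{i}" for i in range(batch_size)],
        "n_sources": [(seed + i) % 3 for i in range(batch_size)],
    }


def generate_to_disk(out_dir, n_batches, batch_size, file_data_capacity):
    # Render n_batches minibatches, then concatenate them key by key.
    batches = [render_minibatch(batch_size, seed=b) for b in range(n_batches)]
    keys = list(batches[0])
    concatenated = {k: [x for batch in batches for x in batch[k]] for k in keys}

    # "Flatten" (assumed meaning): split into one record per image.
    n_items = len(concatenated[keys[0]])
    records = [{k: concatenated[k][i] for k in keys} for i in range(n_items)]

    # Write at most file_data_capacity records to each hypothetical .pkl file.
    paths = []
    for start in range(0, n_items, file_data_capacity):
        path = os.path.join(out_dir, f"dataset_{start // file_data_capacity}.pkl")
        with open(path, "wb") as f:
            pickle.dump(records[start : start + file_data_capacity], f)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # 5 minibatches of 64 images -> 320 records -> files of 100, 100, 100, 20.
        print(generate_to_disk(tmp, n_batches=5, batch_size=64, file_data_capacity=100))
```

The batch size of 64 is made up for illustration. The PR commands set `simulator.n_batches=5` and `cached_simulator.file_data_capacity=100` but do not show the prior's batch size.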

Train, e.g.:

python case_studies/721_decoder_speedup/main.py mode=train training.use_cached_simulator=true cached_simulator.file_data_capacity=100 simulator.n_batches=5

Setting `training.use_cached_simulator=true` makes the training step pass `datamodule=instantiate(cfg.cached_simulator)` to PyTorch Lightning's `trainer.fit`.
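Below is a sketch of the branch this flag controls. It assumes `cfg` exposes attributes the way the Hydra config does. It also assumes the non-cached path instantiates `cfg.simulator`, which the PR text does not state. `instantiate` is passed in as a parameter so the sketch runs without Hydra. In practice it would be `hydra.utils.instantiate`, and `trainer` would be a PyTorch Lightning `Trainer`. `fit_with_configured_data` is a hypothetical name.

```python
def fit_with_configured_data(cfg, trainer, model, instantiate):
    """Hypothetical sketch of choosing the datamodule passed to trainer.fit."""
    if cfg.training.use_cached_simulator:
        # Read pre-rendered images from the .pkl files written by mode=generate.
        datamodule = instantiate(cfg.cached_simulator)
    else:
        # Assumed fallback: render images on the fly with the simulator.
        datamodule = instantiate(cfg.simulator)
    trainer.fit(model, datamodule=datamodule)
    return datamodule
```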

Addresses #721.

codecov · 2023-05-04T20:44:09Z

Codecov Report

Merging #738 (9c664f1) into master (63d9f35) will increase coverage by 0.07%.
The diff coverage is 92.80%.

@@            Coverage Diff             @@
##           master     #738      +/-   ##
==========================================
+ Coverage   88.33%   88.41%   +0.07%     
==========================================
  Files          13       14       +1     
  Lines        1132     1252     +120     
==========================================
+ Hits         1000     1107     +107     
- Misses        132      145      +13

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `88.41% <92.80%> (+0.07%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| bliss/train.py | `62.92% <66.66%> (-0.30%)` | ⬇️ |
| bliss/generate.py | `92.95% <92.95%> (ø)` | |
| bliss/simulator/simulated_dataset.py | `90.38% <95.83%> (-4.36%)` | ⬇️ |


case_studies/721_decoder_speedup/config.yaml

bliss/simulator/simulated_dataset.py

bliss/generate.py

bliss/train.py

bliss/generate.py

jeff-regier

Looks good! Please see requested changes inline.

jeff-regier · 2023-05-05T13:57:47Z

Also, please make sure training iterations / second is at least 5.5.

bliss/generate.py

zhixiangteoh · 2023-05-08T20:44:29Z

> Also, please make sure training iterations / second is at least 5.5.

Currently averaging about 4 training iterations/second (each epoch starts out slower, then speeds up to > 5 it/s).

bliss/simulator/simulated_dataset.py

DiskDataset renders images in chunks of `disk_batch_size` and writes/reads them to/from disk via pickle. The encoder consumes this DiskDataset in batches of `ImagePrior.batch_size`. It does not currently handle multiple workers; we may eventually want multiple workers to write/read at the same time.

Also fix formatting.

`case_studies/721_decoder_speedup/main.py mode=generate` renders images via the decoder and writes them to `.pkl` files on disk. Later, `.../main.py mode=train` uses the cached image data for training.
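Below is a minimal sketch of reading the cached files back. It assumes each `.pkl` file holds a list of per-image records. `DiskDatasetSketch` is a hypothetical name. It regroups records into batches of `batch_size`, which plays the role of `ImagePrior.batch_size`. Batch boundaries do not depend on how many records each file holds. Like the DiskDataset described above, it runs in a single process only.

```python
import pickle
from pathlib import Path


class DiskDatasetSketch:
    """Hypothetical reader for .pkl chunks written by a generate step.

    Assumes each file holds a list of per-image records; single process only.
    """

    def __init__(self, cache_dir, batch_size):
        # Lexicographic sort: fine for a sketch, but dataset_10 sorts before dataset_2.
        self.paths = sorted(Path(cache_dir).glob("*.pkl"))
        self.batch_size = batch_size

    def __iter__(self):
        buffer = []
        for path in self.paths:
            with open(path, "rb") as f:
                buffer.extend(pickle.load(f))
            # Emit full batches as soon as enough records are buffered,
            # so batch boundaries need not line up with file boundaries.
            while len(buffer) >= self.batch_size:
                yield buffer[: self.batch_size]
                buffer = buffer[self.batch_size :]
        if buffer:
            # Leftover records form a final, smaller batch.
            yield buffer
```

Decoupling the on-disk chunk size from the training batch size lets the two be tuned separately. A multi-worker version would additionally need to split `self.paths` across workers so that no file is read twice.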

Shuffles the disk-cached dataset on each epoch via `random.shuffle` when creating the iterator in `__iter__`.
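Below is a minimal sketch of per-epoch shuffling inside `__iter__`. It assumes the dataset keeps its cached items in a list. `EpochShuffledDataset` is a hypothetical name, and the PR does not say whether the real code shuffles file paths or individual records.

```python
import random


class EpochShuffledDataset:
    """Hypothetical sketch: yields cached items in a new random order per epoch."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        # A fresh permutation is drawn every time an iterator is created,
        # i.e. once per epoch when a training loop re-iterates the dataset.
        order = list(range(len(self.items)))
        random.shuffle(order)
        return (self.items[i] for i in order)
```

Shuffling an index list instead of `self.items` keeps the stored order intact, so each epoch's permutation is independent of the previous one.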

zhixiangteoh · 2023-05-10T15:21:54Z

> Also, please make sure training iterations / second is at least 5.5.

With configuration as in case_studies/721_diskrw_gsimages/config.yaml, running

python case_studies/721_diskrw_gsimages/main.py 'mode=train'

gives median 6.1 it/s.
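Below is one way a median above 6 it/s can coexist with a lower per-epoch average. With a slow warm-up, the median per-step rate discounts the slow steps, while overall throughput (steps divided by total time) does not. The timings are made up. It is also an assumption that the reported figure is a median of per-step rates.

```python
import statistics


def median_iterations_per_second(step_durations):
    # Median of per-step rates; robust to a slow warm-up at the start of an epoch.
    return statistics.median(1.0 / d for d in step_durations)


def overall_iterations_per_second(step_durations):
    # Steps completed divided by total wall-clock time.
    return len(step_durations) / sum(step_durations)


# Hypothetical timings in seconds: two slow warm-up steps, then faster steps.
durations = [0.5, 0.25, 0.125, 0.125, 0.125]
print(median_iterations_per_second(durations))   # 8.0
print(overall_iterations_per_second(durations))  # ~4.44
```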

bliss/simulator/simulated_dataset.py

jeff-regier · 2023-05-10T15:28:15Z

bliss/simulator/simulated_dataset.py

+        return DataLoader(self.valid, batch_size=None, num_workers=0)
+
+    def test_dataloader(self):
+        return DataLoader(self, batch_size=None, num_workers=0)


Is the test set the same as the training set? That seems like a problem. We should probably have separate files that contain the test set.
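Below is a sketch of one way to keep the test set in separate files. It partitions the cached `.pkl` files into disjoint train/valid/test lists before the dataloaders are built. `split_cached_files` and its fraction parameters are hypothetical and not part of the PR.

```python
import random


def split_cached_files(paths, valid_frac, test_frac, seed=0):
    """Hypothetical: partition cached .pkl files into disjoint train/valid/test lists."""
    # Sort first so the split depends only on the seed, not on glob order.
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_valid = int(len(shuffled) * valid_frac)
    test = shuffled[:n_test]
    valid = shuffled[n_test : n_test + n_valid]
    train = shuffled[n_test + n_valid :]
    return train, valid, test
```

Each list would then back its own dataset. `test_dataloader` would read only the test files instead of wrapping the training dataset itself. Generating a separate test set in `mode=generate` would achieve the same separation.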

tests/conftest.py

tests/testing_config.yaml

jeff-regier · 2023-05-10T15:32:34Z

> With configuration as in case_studies/721_diskrw_gsimages/config.yaml, running
> python case_studies/721_diskrw_gsimages/main.py 'mode=train'
> gives median 6.1 it/s.

Excellent!

jeff-regier

Looks great!

zhixiangteoh self-assigned this May 3, 2023

zhixiangteoh force-pushed the 721-diskrw-gsimages branch 2 times, most recently from bbfaf2f to 4933705 May 4, 2023 20:35

zhixiangteoh requested a review from jeff-regier May 4, 2023 20:44

zhixiangteoh changed the title ~~[WIP] R/W generated dataset to/from disk~~ R/W generated dataset to/from disk May 4, 2023

jeff-regier linked an issue May 4, 2023 that may be closed by this pull request: speed up the ImageDecoder #721 (Closed, 6 tasks)