speed up the ImageDecoder #721

jeff-regier · 2023-04-19T18:22:37Z

ImageDecoder is currently quite slow. Even with 32 workers, it can't keep up with the encoder. GalSim's ScatteredImageBuilder may let us write batches of light sources more efficiently. It could also be useful to look at how imSim interacts with GalSim.

If we can't speed up ImageDecoder by at least 10x, then as an alternative we can generate simulated training images ahead of time and write them to disk. We may also want to use data augmentation in this case to make more of each image we generate: apply random 90-degree rotations and small translations. Such data augmentation would need to be reflected in the tile catalog too.
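To make the augmentation idea concrete, here is a minimal sketch of a 90-degree rotation plus a small translation where the source positions are updated together with the image. It is only an illustration: the `augment` and `random_augment` names are hypothetical, positions are assumed to be integer (row, col) pixel indices rather than the actual tile catalog format, and the translation wraps around the image edges via `np.roll`, which a real pipeline might replace with cropping or padding.

```python
import numpy as np


def augment(image, positions, k, shift):
    """Rotate `image` by k counterclockwise quarter turns, then shift it.

    `positions` holds integer (row, col) pixel indices of the sources
    (an assumed stand-in for the tile catalog). The same transform is
    applied to them so the catalog stays consistent with the image.
    """
    image = np.asarray(image)
    pos = np.asarray(positions, dtype=int).reshape(-1, 2).copy()

    for _ in range(k % 4):
        width = image.shape[1]  # width before this quarter turn
        image = np.rot90(image)
        # np.rot90 moves pixel (r, c) to (width - 1 - c, r).
        pos = np.stack([width - 1 - pos[:, 1], pos[:, 0]], axis=1)

    dr, dc = shift
    # Assumption: periodic wraparound at the image edges.
    image = np.roll(image, shift=(dr, dc), axis=(0, 1))
    pos[:, 0] = (pos[:, 0] + dr) % image.shape[0]
    pos[:, 1] = (pos[:, 1] + dc) % image.shape[1]
    return image, pos


def random_augment(image, positions, rng, max_shift=2):
    """Draw a random quarter-turn count and a small random shift."""
    k = int(rng.integers(0, 4))
    dr, dc = (int(s) for s in rng.integers(-max_shift, max_shift + 1, size=2))
    return augment(image, positions, k, (dr, dc))
```

Usage would look like `random_augment(image, positions, np.random.default_rng(0))`, called each time a cached image is served, so one simulated image yields many distinct training examples.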

Steps for using cached images:

1. Modify case_studies/summer_template/main.py to support a new mode: generate.
2. Create a new file named bliss/generate.py that is analogous to predict.py and train.py. It would contain a function called generate(...) that takes cfg as an argument.
3. When called, generate(cfg) should create a SimulatedDataset object (using the instantiate function provided by hydra, as in train.py), generate a lot of data, serialize the data, and write it to a file whose name is specified in the cfg object.
4. case_studies/summer_template/config.yaml should probably have a new top-level entry called cached_simulator (analogous to simulator, but with many fewer fields). This is where we'd store the path of the file (or directory) that contains the cached images.
5. simulated_dataset.py should contain an additional class called CachedSimulatedDataset, with a constructor that takes a filename (or directory path) of the cached images and loads the file into memory.
6. CachedSimulatedDataset won't use any workers because it's just looking up loaded data. It will provide minibatches that are sampled at random from the available cached images.
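The steps above could be sketched roughly as follows. This is a standard-library illustration under several assumptions, not the actual BLISS code: cfg is a plain dict rather than a hydra config, the keys `mode`, `n_batches`, `batch_size`, `seed`, and `cached_data_path` are hypothetical, `simulate_batch` is a made-up stand-in for drawing a batch from SimulatedDataset, and pickle is just one possible serialization format.

```python
import pickle
import random
from pathlib import Path


def simulate_batch(batch_size, rng):
    """Hypothetical stand-in for drawing one batch from SimulatedDataset."""
    return [
        {
            "image": [[rng.random() for _ in range(4)] for _ in range(4)],
            "n_sources": rng.randint(0, 3),
        }
        for _ in range(batch_size)
    ]


def generate(cfg):
    """Simulate cfg['n_batches'] batches and pickle them to cfg['cached_data_path']."""
    rng = random.Random(cfg.get("seed", 0))
    examples = []
    for _ in range(cfg["n_batches"]):
        examples.extend(simulate_batch(cfg["batch_size"], rng))

    path = Path(cfg["cached_data_path"])
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(examples, f)
    return path


class CachedSimulatedDataset:
    """Loads cached examples into memory and serves random minibatches."""

    def __init__(self, cached_data_path, batch_size, seed=0):
        with open(cached_data_path, "rb") as f:
            self.examples = pickle.load(f)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def sample_minibatch(self):
        # Sample without replacement within a minibatch; no workers are
        # needed because this is only a lookup into in-memory data.
        size = min(self.batch_size, len(self.examples))
        return self.rng.sample(self.examples, size)


def main(cfg):
    """Dispatch on a hypothetical 'mode' key, as main.py might."""
    if cfg["mode"] == "generate":
        return generate(cfg)
    raise ValueError(f"unsupported mode in this sketch: {cfg['mode']}")
```

Running `main` with `mode` set to `generate` writes the cache once; training would then construct `CachedSimulatedDataset` with the same path and call `sample_minibatch()` in its loop instead of invoking the decoder.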


jeff-regier assigned zhixiangteoh May 1, 2023

zhixiangteoh mentioned this issue May 3, 2023

R/W generated dataset to/from disk #738

Merged

jeff-regier linked a pull request May 4, 2023 that will close this issue

R/W generated dataset to/from disk #738

Merged

jeff-regier closed this as completed in #738 May 10, 2023
