ImageDecoder is quite slow currently. Even with 32 workers, it can't keep up with the encoder. GalSim's ScatteredImageBuilder may let us write batches of light sources more efficiently. It could also be useful to look at how imSim interacts with GalSim.
If we can't speed up ImageDecoder by at least 10x, then as an alternative we can generate simulated training images ahead of time and write them to disk. We may also want to use data augmentation in this case to make more of each image we generate: apply random 90-degree rotations and small translations. Such data augmentation would need to be reflected in the tile catalog too.
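A minimal sketch of the augmentation idea, to show how source positions have to move together with the pixels. It is not BLISS code: it assumes a square image stored as nested lists and a flat list of (row, col) source positions in pixel units, whereas the real tile catalog has its own layout. The function names, the `max_shift` parameter, and the choice to implement translation as a random crop are all hypothetical.

```python
import random


def rotate90(image, positions):
    """Rotate a square image 90 degrees counterclockwise, moving positions with it."""
    n = len(image)
    rotated = [[image[j][n - 1 - i] for j in range(n)] for i in range(n)]
    # A point at (row y, col x) lands at (row n - x, col y) after the rotation.
    moved = [(n - x, y) for (y, x) in positions]
    return rotated, moved


def translate_by_crop(image, positions, dy, dx, size):
    """Cut a size x size window whose top-left pixel is (dy, dx)."""
    window = [row[dx:dx + size] for row in image[dy:dy + size]]
    # Shift positions into window coordinates and drop sources that fall outside.
    kept = [
        (y - dy, x - dx)
        for (y, x) in positions
        if 0 <= y - dy < size and 0 <= x - dx < size
    ]
    return window, kept


def augment(image, positions, max_shift, rng):
    """Apply a random number of 90-degree rotations, then a small random shift."""
    for _ in range(rng.randrange(4)):
        image, positions = rotate90(image, positions)
    size = len(image) - max_shift
    dy = rng.randint(0, max_shift)
    dx = rng.randint(0, max_shift)
    return translate_by_crop(image, positions, dy, dx, size)
```

Cropping makes the augmented image `max_shift` pixels smaller on each side, so the cached images would need to be generated slightly larger than the training size. Sources that leave the window are dropped from the catalog rather than wrapped around.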
Steps to use cached images:
Modify case_studies/summer_template/main.py to support a new mode: generate
Create a new file named bliss/generate.py that is in some sense analogous to predict.py and train.py. It would contain a function called generate(...) that takes cfg as an argument
When called, generate(cfg) should create a SimulatedDataset object (using the instantiate function provided by hydra, as in train.py), generate a lot of data, serialize the data, and write it to a file whose name is specified in the cfg object.
case_studies/summer_template/config.yaml should probably have a new top-level entry called cached_simulator (analogous to simulator, but with many fewer fields). This is where we'd store the path of the file (or directory) that contains the cached images.
simulated_dataset.py should contain an additional class called CachedSimulatedDataset, with a constructor that takes a filename (or directory path) of the cached images and loads the file into memory.
CachedSimulatedDataset won't use any workers because it's just looking up loaded data. It will provide minibatches that are sampled at random from the available cached images.
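The steps above could be sketched roughly as follows. This is a standalone illustration, not the proposed implementation: to keep it runnable without hydra or PyTorch, cfg is a plain dict, `toy_simulator` is a hypothetical stand-in for sampling from SimulatedDataset, and the data is serialized with pickle. The config keys (`generate.n_images`, `generate.seed`, `cached_simulator.cache_path`) and the `sample_batch` method are assumptions made for this example.

```python
import pickle
import random
import tempfile
from pathlib import Path


def toy_simulator(rng):
    """Hypothetical stand-in for drawing one sample from SimulatedDataset."""
    image = [[rng.random() for _ in range(4)] for _ in range(4)]
    return {"image": image, "n_sources": rng.randint(0, 3)}


def generate(cfg):
    """Simulate a fixed number of samples and write them to the cache file."""
    rng = random.Random(cfg["generate"]["seed"])
    samples = [toy_simulator(rng) for _ in range(cfg["generate"]["n_images"])]
    path = Path(cfg["cached_simulator"]["cache_path"])
    with path.open("wb") as f:
        pickle.dump(samples, f)
    return path


class CachedSimulatedDataset:
    """Loads every cached sample into memory and serves random minibatches."""

    def __init__(self, cache_path, batch_size, seed=None):
        with Path(cache_path).open("rb") as f:
            self.samples = pickle.load(f)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.samples)

    def sample_batch(self):
        # Sample with replacement, so batch_size may exceed the cache size.
        return self.rng.choices(self.samples, k=self.batch_size)


# Usage: write a small cache to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    cfg = {
        "generate": {"n_images": 10, "seed": 0},
        "cached_simulator": {"cache_path": str(Path(tmp) / "cache.pkl")},
    }
    dataset = CachedSimulatedDataset(generate(cfg), batch_size=4, seed=0)
    print(len(dataset), len(dataset.sample_batch()))
```

Because all samples sit in memory after the constructor runs, drawing a batch is only a lookup, which is why no worker processes are needed. If the cache grows too large for memory, a directory of smaller files loaded on demand would be the natural variation.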