This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Description
When I started nowcasting_dataset, the intention was to use it to generate batches on-the-fly during ML training, from separate Zarr stores for the satellite data, NWPs, and PV. But that turned out to be too slow and fragile :) So we swapped to using nowcasting_dataset to pre-prepare batches ahead of time and save them to disk. During ML training, we just need to load the batches from disk, and we're good to go. (Pre-preparing batches has a number of other advantages, too.)
But this development history means that nowcasting_dataset still depends on PyTorch (e.g. it uses the PyTorch DataLoader to run multiple processes). The code may become cleaner, faster, and more flexible if we strip out PyTorch and instead use something like concurrent.futures.ProcessPoolExecutor to spread the work across multiple processes.
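A minimal sketch of what that could look like, using only the standard library. The `prepare_batch` function and the filename scheme are hypothetical stand-ins for nowcasting_dataset's actual batch-creation code (which would load from the Zarr stores and write a batch to disk):

```python
from concurrent.futures import ProcessPoolExecutor


def prepare_batch(batch_idx: int) -> str:
    # Hypothetical stand-in: in the real code this would read from the
    # satellite / NWP / PV Zarr stores, assemble one batch, and save it
    # to disk, returning the path of the saved file.
    return f"batch_{batch_idx}.nc"


def prepare_all_batches(n_batches: int, max_workers: int = 4) -> list:
    # Each batch is prepared in a separate worker process; executor.map
    # preserves the input order, so filenames come back sorted by index.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(prepare_batch, range(n_batches)))


if __name__ == "__main__":
    filenames = prepare_all_batches(8)
    print(filenames)
```

Unlike the PyTorch DataLoader, this carries no ML-framework dependency, and error handling is explicit: any exception raised in a worker propagates out of `executor.map` in the parent process.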
TODO: