Skip to content
This repository was archived by the owner on Sep 11, 2023. It is now read-only.
This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Remove PyTorch from the code #86

@JackKelly

Description

@JackKelly

When I started nowcasting_dataset, the intention was to use nowcasting_dataset to generate batches on-the-fly during ML training from separate Zarr stores for the satellite data, NWPs, and PV. But that turned out to be too slow and fragile :) So, we swapped to using nowcasting_dataset to pre-prepare batches ahead-of-time, and save them to disk. During ML training, we just need to load the batches from disk, and we're good-to-go. (Pre-preparing batches has a number of other advantages, too).

But, this development history means that nowcasting_dataset still uses PyTorch (e.g. using the PyTorch DataLoader to run multiple processes). The code may become cleaner and faster and more flexible if we strip out PyTorch, and instead (maybe) use concurrent.futures.ProcessPoolExecutor to use multiple processes.

TODO:

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions