Move all data normalisation to nowcasting_dataloader #231
Comments
I would lean towards normalizing it when making it a Tensor, just so we have the raw data stored, and if we want to try different normalizations, such as MetNet's which is a bit different, we can easily do that. And normalization shouldn't take too long when loading on the fly |
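To illustrate normalising at load time while keeping the raw data stored, here is a minimal sketch. All names (`standardise`, `min_max`, `load_example`) are hypothetical and not part of nowcasting_dataloader, and `min_max` is only a stand-in for "a different normalisation", not MetNet's actual scheme.

```python
from typing import Callable, List, Sequence

# A normaliser maps raw values to normalised values without touching the input.
Normaliser = Callable[[Sequence[float]], List[float]]


def standardise(mean: float, std: float) -> Normaliser:
    """Build a normaliser that subtracts the mean and divides by the std."""
    def apply(raw: Sequence[float]) -> List[float]:
        return [(x - mean) / std for x in raw]
    return apply


def min_max(lo: float, hi: float) -> Normaliser:
    """Build an alternative normaliser that rescales [lo, hi] to [0, 1]."""
    def apply(raw: Sequence[float]) -> List[float]:
        return [(x - lo) / (hi - lo) for x in raw]
    return apply


def load_example(raw: Sequence[float], normalise: Normaliser) -> List[float]:
    """Normalise on the fly; the stored raw data is left unchanged."""
    return normalise(raw)


# Hypothetical raw values, as they might be read from a stored batch.
raw = [0.0, 5.0, 10.0]
standardised = load_example(raw, standardise(mean=5.0, std=5.0))
rescaled = load_example(raw, min_max(lo=0.0, hi=10.0))
```

Because the normaliser is passed in at load time, trying a different scheme only means swapping the function, not regenerating the stored data.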
This all sounds great! I agree, maybe |
This also saves us from having to compute means and standard deviations across, potentially, hundreds of TB of raw data. Instead we can just compute the statistics for the ~1 TB |
Yeah, I think just the training batches, as otherwise it would still leak some data about the validation and test sets |
Yeah, good point. Perhaps we need a script that runs through all the training batches and calculates the mean and/or std |
Or maybe it's OK just to have a rough number for this. It's only normalised for the purpose of the ML model, right? |
I think a script would be good (that runs through the training batches). I think some types of ML models can cope with badly normalised data; but, to enable fair comparison between all types of models, I think we should probably try to normalise as well as we can 🙂 |
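A script like the one discussed could compute the statistics in a single pass, so that the training set never has to fit in memory. The sketch below uses Welford's running update in plain Python. The function name and the toy batches are hypothetical, and real batches would be arrays read from disk rather than lists.

```python
import math
from typing import Iterable, Tuple


def running_mean_std(batches: Iterable[Iterable[float]]) -> Tuple[float, float]:
    """Return (mean, population std) over every value in every batch, in one pass."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for batch in batches:
        for value in batch:
            count += 1
            delta = value - mean
            mean += delta / count
            m2 += delta * (value - mean)
    if count == 0:
        raise ValueError("no training data supplied")
    return mean, math.sqrt(m2 / count)


# Hypothetical training batches only; validation and test batches are
# deliberately excluded so their statistics do not leak into training.
training_batches = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean, std = running_mean_std(training_batches)
```

The resulting `mean` and `std` can then be stored as constants and reused for the validation and test sets.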
I've taken the liberty of changing the title to "Move all data normalisation to nowcasting_dataloader", because it might be good to tidy up all the normalisation :) |
There is also normalization in |
This method might only work for one batch? Is it better to normalise over the whole training set? |
Yeah, it would probably be better to normalize over the whole training set, rather than the batch. |
I think it's probably easier to reason about the code if the normalisation is done in one place only (e.g. in |
Yeah, agreed, that's what the two PRs above are doing |
Cool beans, thank you, sorry I'm a bit behind the curve! |
all good |
We need to move the GSP and PV normalization over there too |
Is all the normalisation code moved from |
GSP and PV are not |
GSP is probably quite easy to do, but PV might take a bit longer. |
I'll do GSP first |
SGTM, thanks! |
Detailed Description
Currently satellite normalisation can happen in two places
Context
Possible Implementation
Only have the option to do it in one place.