Skip to content
This repository was archived by the owner on Jun 2, 2025. It is now read-only.

Conversation

@dfulu
Copy link
Member

@dfulu dfulu commented May 31, 2024

This new datapipe is for generating batches, where in each batch there are 317 samples for all of the 317 regional GSPs at the same init time t0. This is much faster (at least 10x faster) than our current method of creating similar batches.

This will allow us to create batches for the summation model and to run backtests much faster.

INCLUDED

  • New pipeline for concurrent batches
  • Minor refactoring and linting

TODO:

  • Bug fixes
  • Tests

TODO in PVNet library after merging

  • Use new pipeline within concurrent batch creation script
  • Use new pipeline within UK backtest script

@dfulu dfulu requested review from jacobbieker and peterdudfield and removed request for jacobbieker May 31, 2024 12:12
@codecov
Copy link

codecov bot commented May 31, 2024

Codecov Report

Attention: Patch coverage is 52.28571% with 167 lines in your changes missing coverage. Please review.

Project coverage is 75.24%. Comparing base (d55139e) to head (ab5b649).
Report is 129 commits behind head on main.

Current head ab5b649 differs from pull request most recent head e79560b

Please upload reports for the commit e79560b to get more accurate results.

Files Patch % Lines
ocf_datapipes/training/pvnet_all_gsp.py 0.00% 137 Missing ⚠️
ocf_datapipes/select/select_spatial_slice.py 84.25% 17 Missing ⚠️
ocf_datapipes/select/pick_t0_times.py 61.11% 7 Missing ⚠️
ocf_datapipes/load/gsp/utils.py 77.27% 5 Missing ⚠️
ocf_datapipes/select/pick_locations.py 96.42% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #320      +/-   ##
==========================================
- Coverage   75.65%   75.24%   -0.42%     
==========================================
  Files         126      128       +2     
  Lines        5994     6208     +214     
==========================================
+ Hits         4535     4671     +136     
- Misses       1459     1537      +78     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dfulu dfulu marked this pull request as ready for review June 3, 2024 14:19
self.shuffle = shuffle

def _yield_all_iter(self, xr_dataset):
# Get the spatial coords
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you say what this iter does?

@@ -0,0 +1,510 @@
"""Create the training/validation datapipe for training the PVNet Model"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this say something about all gsps?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tmight be worth a top summary of this differnet function in here too. Just bullet points of the function name and what they do would be great

return datapipe


if __name__ == "__main__":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this, i think its already in a test?

if from_coords == "lon_lat":
x, y = lon_lat_to_osgb(x, y)

# else the from_coords=="osgb" and we don't need to convert
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there worth putting and error message, if we dont have the message. or a logger statement if no converstaion is needed

locidx = xr_data[location_idx_name].values

# Create a KDTree
tree = KDTree(list(zip(lat, lon)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does KDTree do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. This is old code that I just refactored into functions rather than only have it inside a datapipe. I was refactoring the functions I use so I decided just to do the rest

Copy link
Contributor

@peterdudfield peterdudfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ive left a few comments, but everything else looks good

@peterdudfield
Copy link
Contributor

This new datapipe is for generating batches, where in each batch there are 317 samples for all of the 317 regional GSPs at the same init time t0. This is much faster (at least 10x faster) than our current method of creating similar batches.

This will allow us to create batches for the summation model and to run backtests much faster.

INCLUDED

  • New pipeline for concurrent batches
  • Minor refactoring and linting

TODO:

  • Bug fixes
  • Tests

TODO in PVNet library after merging

  • Use new pipeline within concurrent batch creation script
  • Use new pipeline within UK backtest script

How much is it worth testing this TODO's before merging?
Im not sure here

@dfulu
Copy link
Member Author

dfulu commented Jun 4, 2024

How much is it worth testing this TODO's before merging? Im not sure here

I've already got it working locally in the concurrent batch creation script in PVNet, so I'm confident enough to merge

@dfulu dfulu merged commit ee1396f into main Jun 4, 2024
@dfulu dfulu deleted the pvnet_concurrent_datapipe branch June 4, 2024 10:14
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants