add id to metadata #599
Conversation
JackKelly
left a comment
This looks good.
One thought is:
In the near future (this month?), we plan to create batches across the entire geospatial extent of the satellite data (over the ocean; over Europe; over Africa; etc.). (These training examples won't have PV data, of course. Instead they'll be used to pre-train the model to predict the next few frames of satellite imagery).
And, separately, I also plan to create batches which are centered on individual PV systems (and where the model will know nothing about GSPs).
And we plan to start using PV data from mainland Europe and elsewhere.
In all three of these scenarios, there won't be a "GSP ID". As such, is the approach proposed in this PR compatible with our plans to create batches which aren't associated with a GSP? When there isn't a GSP then I suppose we could just fill the id columns with NaNs or something like that, but that maybe feels like a bit of a hack?
Also, if we do stick with including the GSP ID, then I'd advocate for changing the name to gsp_id rather than just id, just to be really explicit?
Yeah, I agree it feels like a bit of a hack. But yeah, that would be an option, i.e. set id to NaN or None. This is all work to get the forecast MVP working. It's because there is no GSP data to load, but I think we do need the GSP ID for the MVP. One option would be to have some static GSP data, and do a hack when making batches/examples. Perhaps doing #600 would help both tasks. Perhaps I'll give this a go, and it'll actually make things tidier.
Would be something like this, and then
Cool beans. But, um, isn't the GSP ID (and PV system ID) already in each batch? I haven't checked the code recently but I definitely remember feeding both the GSP ID and PV system ID into my ML models back in December 🙂 Or am I getting confused?
No, you're right! It just means that for the live models we would need the gsp data source in order for the system to work, even though the model might not use GSP data. I'm making good progress with #600, so hopefully it'll all come together.
Comments for chat with @JackKelly and @jacobbieker
jacobbieker
left a comment
Looks good to me! Just one small comment. Good job!
Co-authored-by: Jacob Bieker <jacob@openclimatefix.org>
@JackKelly I'm going to merge this to push on with the MVP, but please feel free to add comments, and I'll make those changes.
Pull Request
Description
Add id to metadata. This is optional, so it can be None.
Fixes #598 #600
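For illustration, the optional id described above could look roughly like the sketch below. This is not the code from this PR: the class name `ExampleMetadata`, the other field names, and the `has_id` helper are all hypothetical, and a plain dataclass is used only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleMetadata:
    """Hypothetical per-example metadata; names are illustrative, not from the PR."""

    t0_datetime_utc: str
    x_center_osgb: float
    y_center_osgb: float
    # Optional: examples not associated with a GSP simply leave this as None.
    id: Optional[int] = None

    def has_id(self) -> bool:
        """Return True only when an id was supplied for this example."""
        return self.id is not None


# An example centred on a GSP carries its id ...
with_gsp = ExampleMetadata("2021-06-01T12:00:00Z", 500_000.0, 200_000.0, id=7)
# ... while a satellite-only example has no id at all.
without_gsp = ExampleMetadata("2021-06-01T12:00:00Z", 500_000.0, 200_000.0)
```

Using None rather than NaN keeps the field an integer when it is present, and avoids the fact that NaN never compares equal to itself.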
How Has This Been Tested?
Normal unit tests.
Checklist: