-
Notifications
You must be signed in to change notification settings - Fork 7
add start and end date #523
Conversation
Codecov Report
@@ Coverage Diff @@
## main #523 +/- ##
==========================================
- Coverage 86.78% 86.72% -0.07%
==========================================
Files 37 37
Lines 2338 2350 +12
==========================================
+ Hits 2029 2038 +9
- Misses 309 312 +3
Continue to review full report at Codecov.
|
JackKelly
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good! thanks!
please also remove the logger.warning("{GSP,PV}DataSource is using hard-coding start_dt and end_dt") on:
- line 70 of gsp_data_source.py
- line 52 of pv_data_source.py
nowcasting_dataset/config/model.py
Outdated
| ), | ||
| ) | ||
|
|
||
| start_datetime: datetime = Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed, given that the StartEndDatetimeMixin will inject a start_datetime and end_datetime fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that these are the general or default ones, and then in PV and GSP, separate ones can be set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, that raises an interesting question:
What's the use-case where we'd want differente start_datetimes and end_datetimes for each datasource? Or should start_datetime and end_datetime always be "global"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps if one data source is dodge before a certain date, but we still want data from before this date for the batches, but with this data source missing. Perhaps this is a bit far fetched ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point about avoiding dodgy data...
But Manager starts by computing the intersection of all the t0_datetimes available across all the DataSources. So if one data source starts really late, then all the data sources will start really late.
so, yeah, I'm starting to wonder if maybe we should keep things simple and just have a single, global start_datetime and end_datetime? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe its simplier to remove this bit - simplier is often better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea two different ways to do it
- Currently its just a mxinin in PV and GSP
- other option is to have a global one which is move into PV GSP through a validation method.
I think 1 is simple pydantic model. Can always change this in the future if we find we are changing things.
Pull Request
Description
add configuration for start and end datetimes for PV and GSP models
Fixes #425
How Has This Been Tested?
normal unittests
Checklist: