This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Conversation

@peterdudfield
Contributor

@peterdudfield peterdudfield commented Dec 3, 2021

Pull Request

Description

Add configuration for start and end datetimes for the PV and GSP models.

Fixes #425

How Has This Been Tested?

Normal unit tests.

  • No
  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@peterdudfield peterdudfield self-assigned this Dec 3, 2021
@peterdudfield peterdudfield added this to the v16 dataset milestone Dec 3, 2021
@codecov-commenter

codecov-commenter commented Dec 3, 2021

Codecov Report

Merging #523 (f06c064) into main (1f77cc6) will decrease coverage by 0.06%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##             main     #523      +/-   ##
==========================================
- Coverage   86.78%   86.72%   -0.07%     
==========================================
  Files          37       37              
  Lines        2338     2350      +12     
==========================================
+ Hits         2029     2038       +9     
- Misses        309      312       +3     
Impacted Files                                          Coverage Δ
nowcasting_dataset/config/model.py                      91.39% <81.25%> (-1.31%) ⬇️
...asting_dataset/data_sources/gsp/gsp_data_source.py   90.96% <100.00%> (-0.06%) ⬇️
...wcasting_dataset/data_sources/pv/pv_data_source.py   97.14% <100.00%> (-0.02%) ⬇️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1f77cc6...f06c064. Read the comment docs.

@peterdudfield peterdudfield marked this pull request as ready for review December 3, 2021 13:52
Contributor

@JackKelly JackKelly left a comment

Looks good! Thanks!

Please also remove the logger.warning("{GSP,PV}DataSource is using hard-coding start_dt and end_dt") on:

  • line 70 of gsp_data_source.py
  • line 52 of pv_data_source.py

),
)

start_datetime: datetime = Field(
Contributor

Is this needed, given that the StartEndDatetimeMixin will inject start_datetime and end_datetime fields?

Contributor Author

I was thinking that these are the general or default ones, and then separate ones can be set in PV and GSP.
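
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of how a mixin can give each data source config its own date range. It uses stdlib dataclasses as a dependency-free stand-in for the project's pydantic models. The field names `pv_filename` and `gsp_zarr_path`, the default dates, and the validation rule are assumptions made for illustration; only the names `StartEndDatetimeMixin`, `start_datetime`, and `end_datetime` come from this thread.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StartEndDatetimeMixin:
    """Adds a start/end datetime pair to any config class that inherits from it."""

    # Placeholder defaults, chosen only for illustration.
    start_datetime: datetime = datetime(2020, 1, 1)
    end_datetime: datetime = datetime(2021, 9, 1)

    def __post_init__(self) -> None:
        # Reject an empty or inverted range as soon as the config is built.
        if self.start_datetime >= self.end_datetime:
            raise ValueError("start_datetime must be before end_datetime")


@dataclass
class PV(StartEndDatetimeMixin):
    pv_filename: str = "pv.netcdf"  # hypothetical field


@dataclass
class GSP(StartEndDatetimeMixin):
    gsp_zarr_path: str = "gsp.zarr"  # hypothetical field


# Each data source carries its own range, so the two can differ.
pv_config = PV(start_datetime=datetime(2020, 6, 1))
gsp_config = GSP()
```

With this layout, a top-level pair of datetimes would be a separate set of values rather than a default that PV and GSP inherit, which is the duplication the review comment is asking about.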

Contributor

Actually, that raises an interesting question:

What's the use-case where we'd want different start_datetimes and end_datetimes for each data source? Or should start_datetime and end_datetime always be "global"?

Contributor Author

Perhaps if one data source is dodgy before a certain date, but we still want data from before this date for the batches, just with this data source missing. Perhaps this is a bit far-fetched ...

Contributor

That's a good point about avoiding dodgy data...

But Manager starts by computing the intersection of all the t0_datetimes available across all the DataSources. So if one data source starts really late, then all the data sources will start really late.

So, yeah, I'm starting to wonder if maybe we should keep things simple and just have a single, global start_datetime and end_datetime. What do you think?
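
To illustrate the intersection point above, here is a rough sketch of why one late-starting data source delays every other source. It is not the actual Manager code: the helper names `half_hourly` and `intersect_t0_datetimes`, the 30-minute spacing, and the example dates are all assumptions, and plain Python sets stand in for whatever index type the real implementation uses.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def half_hourly(start: datetime, end: datetime) -> List[datetime]:
    """Return timestamps from start to end (inclusive) in 30-minute steps."""
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += timedelta(minutes=30)
    return times


def intersect_t0_datetimes(per_source: Dict[str, List[datetime]]) -> List[datetime]:
    """Keep only the t0 datetimes that every data source can provide."""
    common = set.intersection(*(set(times) for times in per_source.values()))
    return sorted(common)


# Hypothetical availability: PV starts two hours later than satellite.
available = {
    "satellite": half_hourly(datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 1, 3, 0)),
    "pv": half_hourly(datetime(2020, 1, 1, 2, 0), datetime(2020, 1, 1, 4, 0)),
}
common_t0 = intersect_t0_datetimes(available)
```

In this example the usable range is bounded by the latest start and the earliest end among the sources, so a per-source start_datetime effectively acts as a global one after the intersection.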

Contributor Author

Maybe it's simpler to remove this bit; simpler is often better.

Contributor Author

Yeah, there are two different ways to do it:

  1. Currently it's just a mixin in PV and GSP.
  2. The other option is to have a global one which is moved into PV and GSP through a validation method.

I think option 1 is a simple pydantic model. We can always change this in the future if we find we are changing things.
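
For comparison, option 2 could look roughly like the sketch below, where a single global range is copied into the PV and GSP configs during validation. This is only an illustration under stated assumptions: it uses stdlib dataclasses with `__post_init__` in place of a pydantic validator, the class names `DataSourceConfig` and `InputData` and the default dates are hypothetical, and letting a data source keep an explicitly set value is one possible choice rather than something decided in this thread.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class DataSourceConfig:
    """Hypothetical per-source config; None means 'use the global value'."""

    start_datetime: Optional[datetime] = None
    end_datetime: Optional[datetime] = None


@dataclass
class InputData:
    """Hypothetical top-level config holding the global date range."""

    start_datetime: datetime = datetime(2020, 1, 1)
    end_datetime: datetime = datetime(2021, 9, 1)
    pv: DataSourceConfig = field(default_factory=DataSourceConfig)
    gsp: DataSourceConfig = field(default_factory=DataSourceConfig)

    def __post_init__(self) -> None:
        # Validation step: push the global range into any source that
        # did not set its own start or end.
        for source in (self.pv, self.gsp):
            if source.start_datetime is None:
                source.start_datetime = self.start_datetime
            if source.end_datetime is None:
                source.end_datetime = self.end_datetime


config = InputData(pv=DataSourceConfig(start_datetime=datetime(2020, 6, 1)))
```

Here `config.pv` keeps its own start but picks up the global end, while `config.gsp` takes both values from the global range. The cost is an extra propagation step that the mixin-only approach in option 1 avoids.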

Development

Successfully merging this pull request may close these issues.

Implement a start_date and end_date setting in the config YAML

4 participants