Need to request the "updated" (day-behind) PV_Live later in the day #219
Comments
Thanks Jack for all this work. See this code for when the national GSP data is pulled. There is a split between National and GSP, as they are run at different times by Sheffield Solar. You can also see in this code that we only run when the BST hour is correct. This is defined here, so it only runs between 10:00 and 11:00 UK time. This is confirmed in CloudWatch here and here. So National PVlive values are being pulled at 10:45 UK time. My understanding was that PVlive National was run at 10:30, so 10:45 should be fine. I agree there is definitely a problem, and that it is caused by the clock change.
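To illustrate why a gate defined in UK wall-clock time moves relative to UTC at the clock change, here is a minimal sketch. It is not the production code; the helper name `uk_wall_time_as_utc` and the two example dates (the days either side of 2023-03-26) are chosen purely for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

UK = ZoneInfo("Europe/London")


def uk_wall_time_as_utc(year, month, day, hour, minute):
    """Return the UTC instant that corresponds to a UK wall-clock time."""
    local = datetime(year, month, day, hour, minute, tzinfo=UK)
    return local.astimezone(timezone.utc)


# The same "10:45 UK time" pull, the day before and the day after the BST switch.
winter = uk_wall_time_as_utc(2023, 3, 25, 10, 45)  # still GMT
summer = uk_wall_time_as_utc(2023, 3, 27, 10, 45)  # now BST

print(winter.strftime("%H:%M UTC"))  # 10:45 UTC
print(summer.strftime("%H:%M UTC"))  # 09:45 UTC
```

If the upstream update happens at a fixed UTC time, a pull pinned to UK local time lands an hour earlier in UTC terms during BST.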
One quick solution is to pull the National data from PVlive every 5 minutes for one day and see when it changes. This could be done manually one day.
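A minimal sketch of that polling idea, assuming a caller-supplied `fetch` function that returns the latest national values as JSON-serialisable data. The names `fingerprint` and `poll_for_changes` are hypothetical, and the real download call is deliberately left out; the demo at the bottom uses canned responses rather than real PV_Live data.

```python
import hashlib
import json
import time
from datetime import datetime, timezone


def fingerprint(payload):
    """Stable hash of a JSON-serialisable payload."""
    encoded = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def poll_for_changes(fetch, n_polls, interval_s=300, sleep=time.sleep,
                     now=lambda: datetime.now(timezone.utc)):
    """Call fetch() n_polls times and record when its result changes."""
    changes = []
    last = None
    for i in range(n_polls):
        digest = fingerprint(fetch())
        if last is not None and digest != last:
            changes.append(now())  # time at which a change was first seen
        last = digest
        if i < n_polls - 1:
            sleep(interval_s)
    return changes


# Demo with canned responses and no real waiting (illustrative values only).
canned = iter([[100, 200], [100, 200], [100, 250], [100, 250]])
seen = poll_for_changes(lambda: next(canned), n_polls=4, sleep=lambda s: None)
print(len(seen))  # 1
```

Saving each raw response to disk as well would allow the kind of offline comparison described later in this thread.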
That's a great idea! I'll do that (locally on my PC) ASAP (probably tomorrow).
Here's my very simple script for downloading data from PV_Live every 15 minutes and saving locally.
The idea is to change it to 10:45 UTC.
Here's a detailed analysis of the data I've been downloading from PV_Live every 15 mins.
After discussion with Peter in Slack, and after Jack's analysis of the downloaded PV_Live data, we've decided to download the "updated PV_Live national" at 11:00 UTC. This is also when the new PV_Live README suggests downloading the updated estimate.
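A gate pinned to UTC rather than UK local time could look roughly like this. It is a sketch under assumptions, not the deployed code: `UPDATED_PULL_TIME_UTC` and `updated_estimate_ready` are hypothetical names.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

# Assumed constant: the 11:00 UTC pull time agreed in this thread.
UPDATED_PULL_TIME_UTC = time(11, 0)


def updated_estimate_ready(now):
    """True once `now` is at or past 11:00 UTC on its own UTC day."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    return now.astimezone(timezone.utc).time() >= UPDATED_PULL_TIME_UTC


uk = ZoneInfo("Europe/London")
# During BST, 10:45 UK time is 09:45 UTC, so the gate stays closed.
print(updated_estimate_ready(datetime(2023, 4, 10, 10, 45, tzinfo=uk)))  # False
# 12:00 UK time during BST is 11:00 UTC, so the gate opens.
print(updated_estimate_ready(datetime(2023, 4, 10, 12, 0, tzinfo=uk)))   # True
```

Because the comparison is made in UTC, the behaviour does not shift when the clocks change.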
This is on dev now, so at 11:00 UTC we will be able to see the difference, at least for yesterday.
Yeah, this worked, and the values were definitely different: 700 MW instead of 200 MW. I'll deploy this on production.
Sounds good! Will this confuse "the adjuster" for the next few days, while the OCF DB updates itself? Maybe it'll be fine...
It won't confuse the adjuster, but it will take 7 days for the results to fully improve.
SGTM!
This is now done, and we see differences, and it's BST right now.
Describe the bug
Since the start of 2023's British Summer Time (2023-03-26), the OCF production database (in rows where `regime='day-behind'`) contains the "wrong" values. Specifically, the DB (where `regime='day-behind'`) contains values which are not the same as values pulled today directly from PV_Live for that same period.
Instead, since 2023-03-26, the OCF DB contains almost identical values for `regime='day-behind'` and `regime='in-day'` for PV_Live.
I think the problem is fairly simple: I think that, since the 2023 transition to BST, our code runs too early in the day. Our code tries to fetch the "updated" PV_Live before the updated PV_Live is ready from Sheffield. So our code thinks it's getting the "updated" estimates from PV_Live. But it's actually just getting (almost) a duplicate of the "in-day" PV_Live!
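One way to check for this symptom is to measure how often the two regimes agree at the same timestamp. The sketch below assumes each regime has been loaded into a plain dict mapping timestamp to MW; the function name, the 1 MW tolerance, and the sample numbers are all made up for illustration and are not taken from the OCF DB.

```python
def fraction_near_identical(day_behind, in_day, tol_mw=1.0):
    """Fraction of shared timestamps where the two regimes agree within tol_mw."""
    shared = day_behind.keys() & in_day.keys()
    if not shared:
        return 0.0
    close = sum(1 for ts in shared if abs(day_behind[ts] - in_day[ts]) <= tol_mw)
    return close / len(shared)


# Made-up values in MW, keyed by timestamp string.
in_day = {"12:00": 5200.0, "12:30": 5400.0, "13:00": 5300.0}
healthy_day_behind = {"12:00": 6100.0, "12:30": 6350.0, "13:00": 6200.0}
suspect_day_behind = {"12:00": 5200.4, "12:30": 5400.0, "13:00": 5299.8}

print(fraction_near_identical(healthy_day_behind, in_day))  # 0.0
print(fraction_near_identical(suspect_day_behind, in_day))  # 1.0
```

A fraction close to 1.0 over many days would suggest the "day-behind" rows are really a copy of the "in-day" estimate.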
I'll provide a bunch of analysis of the data further down this bug report. But before I get there, I should mention some implications of this bug:
Implications of mistakenly getting "in-day" PV_Live when we think we're getting "updated" PV_Live:
I think this bug could (at least partially) explain why we sometimes get surprised by the size of our forecast errors in production!
Our models are trained to predict "updated" PV_Live (as they should be!). But the production DB contains PV_Live data which is labelled as "updated" but is actually "in-day" PV_Live! And "in-day" PV_Live often underestimates national PV by ~1 GW (at the peak for the day)! So, when we compute our performance metrics in production, we're comparing our forecasts against the wrong version of PV_Live! So our forecasts look like they're > 1 GW wrong, when actually our forecasts might be doing a great job of predicting the updated PV_Live, and the problem is that we're (unfairly) comparing our predictions to "in-day" PV_Live!
And this bug might help explain why "the adjuster" doesn't seem to help in an R&D testing environment, but does help in production. In production, maybe "the adjuster" is currently pulling our forecasts towards what it (falsely) believes is the 'truth', so maybe, since the start of BST, the adjuster is making our predictions more like in-day PV_Live (which is bad, because in-day PV_Live isn't very accurate!)
Data analysis
Sol kindly pulled the entire OCF DB table into a CSV for me yesterday.
And, for comparison, I re-downloaded PV_Live data directly from the PV_Live API this morning.
Below is a good period of data (7 days in August 2022), when the system was behaving itself. This is what the data should look like :). Note that the "Updated national (from OCF DB)" (orange line) and the "Updated national (from PV Live API)" (green line) are perfectly aligned (so perfectly aligned that the only way to see the orange line under the green line is by the orange 'x' marker!)
Below is a bad period of data (7 days in April 2023). Note that the "Updated national (from OCF DB)" no longer aligns with the "updated" PV_Live I downloaded today fresh from the PV_Live API:
Below is a zoom into one of the "bad" days above. I honestly have no idea why the "Updated national (from OCF DB)" is different from both other lines. Maybe Sheffield Solar gradually update their estimate, so we're seeing a partially updated estimate?!
Below is an analysis of the entire timeseries in the OCF DB. This plot shows the maximum absolute error per day of the "updated" estimate in the OCF DB, compared with the data I downloaded from the PV_Live API today. Several things to note from this plot:
Finally, here are the absolute errors for comparisons of the other two pairs of datasets. Perhaps the main thing to note here is the bottom subplot, which shows that the difference between updated PV_Live and actual PV_Live is often ~1 GW, and can be > 1.6 GW.
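The per-day maximum absolute error used in these comparisons can be sketched as follows. This is an illustration only, assuming two dicts that map a `datetime` to a value in MW; the function name and the sample values are hypothetical and are not the data plotted above.

```python
from collections import defaultdict
from datetime import datetime


def max_abs_error_per_day(db_values, api_values):
    """Largest |db - api| per calendar day, over timestamps present in both."""
    per_day = defaultdict(float)
    for ts in db_values.keys() & api_values.keys():
        err = abs(db_values[ts] - api_values[ts])
        per_day[ts.date()] = max(per_day[ts.date()], err)
    return dict(per_day)


# Made-up values in MW for two days.
db = {
    datetime(2023, 4, 10, 12, 0): 5200.0,
    datetime(2023, 4, 10, 12, 30): 5400.0,
    datetime(2023, 4, 11, 12, 0): 4800.0,
}
api = {
    datetime(2023, 4, 10, 12, 0): 5900.0,
    datetime(2023, 4, 10, 12, 30): 6400.0,
    datetime(2023, 4, 11, 12, 0): 4800.0,
}

errors = max_abs_error_per_day(db, api)
for day in sorted(errors):
    print(day, errors[day])
```

Days where the DB matches the API exactly report an error of 0.0, which is what the "good" period above would look like.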
Related