
Need to request the "updated" (day-behind) PV_Live later in the day #219

Closed
JackKelly opened this issue Apr 28, 2023 · 13 comments
Labels
bug Something isn't working

Comments

@JackKelly
Member

JackKelly commented Apr 28, 2023

Describe the bug

Since the start of 2023's British Summer Time (2023-03-26), the OCF production database (in rows where regime='day-behind') contains the "wrong" values. Specifically, the DB (where regime='day-behind') contains values which are not the same as values pulled today directly from PV_Live for that same period.

Instead, since 2023-03-26, the OCF DB contains almost identical values for regime='day-behind' and regime='in-day' for PV_Live.

I think the problem is fairly simple: I think that, since the 2023 transition to BST, our code runs too early in the day. Our code tries to fetch the "updated" PV_Live before the updated PV_Live is ready from Sheffield. So our code thinks it's getting the "updated" estimates from PV_Live. But it's actually just getting (almost) a duplicate of the "in-day" PV_Live!

I'll provide a bunch of analysis of the data further down this bug report. But before I get there, I should mention some implications of this bug:

Implications of mistakenly getting "in-day" PV_Live when we think we're getting "updated" PV_Live:

I think this bug could (at least partially) explain why we sometimes get surprised by the size of our forecast errors in production!

Our models are trained to predict "updated" PV_Live (as they should be!). But the production DB contains PV_Live data labelled as "updated" which is actually "in-day" PV_Live! And "in-day" PV_Live often underestimates national PV by ~1 GW (at the peak of the day)! So, when we compute our performance metrics in production, we're comparing our forecasts against the wrong version of PV_Live. Our forecasts look like they're > 1 GW wrong, when actually they might be doing a great job of predicting the updated PV_Live; the problem is that we're (unfairly) comparing our predictions with "in-day" PV_Live!

And this bug might help explain why "the adjuster" doesn't seem to help in an R&D testing environment, but does help in production. In production, maybe "the adjuster" is currently pushing our forecasts towards what it (falsely) believes is the 'truth'. So maybe, since the start of BST, the adjuster has been making our predictions more like in-day PV_Live (which is bad, because in-day PV_Live isn't very accurate!)

Data analysis

Sol kindly pulled the entire OCF DB table into a CSV for me yesterday.

And, for comparison, I re-downloaded PV_Live data directly from the PV_Live API this morning.

Below is a good period of data (7 days in August 2022), when the system was behaving itself. This is what the data should look like :). Note that the "Updated national (from OCF DB)" (orange line) and the "Updated national (from PV Live API)" (green line) are perfectly aligned (so perfectly aligned that the only way to see the orange line under the green line is by the orange 'x' marker!)

[Figure: national PV over 7 days in August 2022 (good period)]

Below is a bad period of data (7 days in April 2023). Note that the "Updated national (from OCF DB)" no longer aligns with the "updated" PV_Live I downloaded today fresh from the PV_Live API:

[Figure: national PV over 7 days in April 2023 (bad period)]

Below is a zoom into one of the "bad" days above. I honestly have no idea why the "Updated national (from OCF DB)" is different from both other lines. Maybe Sheffield Solar gradually update their estimate, so we're seeing a partially updated estimate?!

[Figure: zoom into one of the bad days in April 2023]

Below is an analysis of the entire timeseries in the OCF DB. This plot shows the maximum absolute error per day of the "updated" estimate in the OCF DB, compared with the data I downloaded from the PV_Live API today. Several things to note from this plot:

  1. The error was high for the first couple of months of the service, when the service grabbed the updated PV_Live at 10:45am UTC.
  2. Then the errors (almost) completely go away after git commit c4ee7cc on 2022-08-03 which pushed the update time to 11:30am UTC.
  3. Things work well for (most of) the remainder of summer.
  4. Things start to get a bit worse after the end of BST on 2022-10-30.
  5. Things get a lot worse after the start of BST on 2023-03-26.

[Figure: maximum absolute error per day of the "updated" estimate in the OCF DB vs. the PV_Live API]
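For reference, here's a minimal sketch of the per-day metric in the plot above. It assumes both series are keyed by UTC settlement-period timestamps with values in MW, and that "day" means the UTC date. The function name is hypothetical; this is not the notebook used to make the plot.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict


def max_abs_error_per_day(
    ocf_db: Dict[datetime, float],
    pv_live_api: Dict[datetime, float],
) -> Dict[date, float]:
    """Max |OCF DB - PV_Live API| per UTC date, over timestamps present in both series."""
    # Absolute errors are >= 0, so 0.0 is a safe starting value for each day.
    per_day: Dict[date, float] = defaultdict(float)
    for ts in ocf_db.keys() & pv_live_api.keys():
        err = abs(ocf_db[ts] - pv_live_api[ts])
        per_day[ts.date()] = max(per_day[ts.date()], err)
    return dict(sorted(per_day.items()))


if __name__ == "__main__":
    # Made-up values in MW, purely to show the call.
    ocf = {datetime(2023, 4, 20, 12, 0): 7000.0, datetime(2023, 4, 20, 12, 30): 7100.0}
    api = {datetime(2023, 4, 20, 12, 0): 7800.0, datetime(2023, 4, 20, 12, 30): 7150.0}
    print(max_abs_error_per_day(ocf, api))
```

Only timestamps present in both series are compared, so gaps in either dataset don't show up as errors.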

Finally, here are the absolute errors for comparisons of the other two pairs of datasets. Perhaps the main thing to note here is the bottom subplot, which shows that the difference between updated PV_Live and actual PV_Live is often ~1 GW, and can be > 1.6 GW.

[Figure: absolute errors for the other two pairs of datasets]

Related

@JackKelly JackKelly added the bug Something isn't working label Apr 28, 2023
@peterdudfield
Contributor

Thanks Jack for all this work.

See this code for when the national GSP data is pulled. There is a split between National and GSP, as they are run at different times by Sheffield Solar.

You can also see in this code that we only run it when the BST hour is correct. This is defined here, so it only runs between 10:00 and 11:00 UK time. This is confirmed in CloudWatch here and here. So the National PV_Live values are being pulled at 10:45 UK time.

My understanding was that PV_Live National was run at 10:30, so 10:45 should be fine. I agree there is definitely a problem, and that it's caused by the clock change.
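To make the clock-change effect concrete, here's a small sketch assuming the check is "UK local hour is in [10, 11)", as described above. The function names are hypothetical and this is not the service's actual code. The same 10:45 UK wall-clock time is 10:45 UTC in winter but 09:45 UTC during BST, so after the clocks go forward the pull happens an hour earlier in UTC terms.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs the system tz database or the tzdata package

UK = ZoneInfo("Europe/London")


def uk_wall_clock_to_utc(year: int, month: int, day: int, hour: int, minute: int) -> datetime:
    """Convert a UK wall-clock time on a given date to UTC."""
    return datetime(year, month, day, hour, minute, tzinfo=UK).astimezone(timezone.utc)


def in_uk_hour_window(now_utc: datetime, start_hour: int = 10, end_hour: int = 11) -> bool:
    """True if now_utc falls in [start_hour, end_hour) on the UK wall clock."""
    return start_hour <= now_utc.astimezone(UK).hour < end_hour


if __name__ == "__main__":
    print(uk_wall_clock_to_utc(2023, 1, 15, 10, 45))  # winter (GMT): 2023-01-15 10:45:00+00:00
    print(uk_wall_clock_to_utc(2023, 4, 20, 10, 45))  # summer (BST): 2023-04-20 09:45:00+00:00
```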

@peterdudfield
Contributor

One quick solution is we just pull the National data every 5 mins from PVlive for one day, and just see when it changes. This could be done manually one day
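If we do that, the "see when it changes" step could look something like this. It assumes each download is stored as a snapshot mapping settlement-period start (UTC) to national PV in MW, compares only the periods present in both snapshots, and uses a hypothetical 1 MW tolerance; none of this is from the existing codebase.

```python
from datetime import datetime
from typing import Dict, Optional, Sequence, Tuple

# (when we downloaded it, {settlement period start (UTC): national PV in MW})
Snapshot = Tuple[datetime, Dict[datetime, float]]


def first_change(snapshots: Sequence[Snapshot], tolerance_mw: float = 1.0) -> Optional[datetime]:
    """Download time of the first snapshot that differs from the one before it, else None.

    Only settlement periods present in both snapshots are compared, so new periods
    appearing during the day are not counted as a change.
    """
    for (_, prev), (fetched_at, curr) in zip(snapshots, snapshots[1:]):
        shared = prev.keys() & curr.keys()
        if any(abs(curr[p] - prev[p]) > tolerance_mw for p in shared):
            return fetched_at
    return None
```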

@JackKelly
Member Author

One quick solution is we just pull the National data every 5 mins from PVlive for one day, and just see when it changes. This could be done manually one day

That's a great idea! I'll do that (locally on my PC) ASAP (probably tomorrow).

@JackKelly
Member Author

Here's my very simple script for downloading data from PV_Live every 15 minutes and saving locally.
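The script itself isn't reproduced here. As a rough illustration of the same idea (not Jack's script), a polling loop that saves every download to its own timestamped CSV might look like this. `fetch` is a placeholder for whatever call actually downloads national PV_Live, and the file naming and column names are assumptions.

```python
import csv
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict, List

# Placeholder: any function returning {settlement period start (UTC): national PV in MW}.
FetchFn = Callable[[], Dict[datetime, float]]


def save_snapshot(values: Dict[datetime, float], out_dir: Path, fetched_at: datetime) -> Path:
    """Write one download to a CSV whose name records when it was fetched."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"pvlive_national_{fetched_at:%Y%m%dT%H%M%S%f}Z.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["datetime_gmt", "generation_mw"])
        for ts in sorted(values):
            writer.writerow([ts.isoformat(), values[ts]])
    return path


def poll(fetch: FetchFn, out_dir: Path, n_polls: int, interval_s: float = 15 * 60) -> List[Path]:
    """Call fetch() n_polls times, interval_s seconds apart, saving each result locally."""
    paths = []
    for i in range(n_polls):
        fetched_at = datetime.now(timezone.utc)
        paths.append(save_snapshot(fetch(), out_dir, fetched_at))
        if i < n_polls - 1:  # no need to sleep after the last download
            time.sleep(interval_s)
    return paths
```

Keeping every download as a separate file (rather than overwriting one) is what makes it possible to see afterwards exactly when the estimate changed.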

@peterdudfield
Contributor

Idea is to change it to 10.45 UTC

@JackKelly
Member Author

Here's a detailed analysis of the data I've been downloading from PV_Live every 15 mins.

@JackKelly
Member Author

Idea is to change it to 10.45 UTC

After discussion with Peter in Slack, and after Jack's analysis of the downloaded PV_Live data, we've decided to download the "updated PV_Live national" at 11:00 UTC. This is also when the new PV_Live README suggests we download the updated estimate.
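A minimal sketch of a trigger that keys off the UTC clock, so it fires at the same instant all year regardless of BST. The 11:00 UTC hour comes from this comment; the function name and the "once per day" bookkeeping via `last_fetched_for` are hypothetical, not the service's code.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional

UPDATED_FETCH_HOUR_UTC = 11  # from the decision in this thread


def should_fetch_updated(
    now: datetime,
    last_fetched_for: Optional[date],
    hour_utc: int = UPDATED_FETCH_HOUR_UTC,
) -> bool:
    """True if it's at/after hour_utc (UTC) and yesterday's updated values haven't been fetched yet.

    Judging the time on the UTC clock means the trigger doesn't shift when BST starts or ends.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    yesterday = (now_utc - timedelta(days=1)).date()
    return now_utc.hour >= hour_utc and last_fetched_for != yesterday
```

For example, 12:05 BST on 2023-05-02 is 11:05 UTC, so this returns True as long as 2023-05-01 hasn't been fetched yet.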

@peterdudfield
Contributor

This is on dev now, so at 11:00 UTC we will be able to see the difference, at least for yesterday.

@peterdudfield
Contributor

Yeah, this worked: the difference was 700 MW instead of 200 MW. I'll deploy this on production.

@JackKelly
Member Author

Sounds good!

Will this confuse "the adjuster" for the next few days, while the OCF DB updates itself? (Maybe it'll be fine...)

@peterdudfield
Contributor

It won't confuse the adjuster, but it will take 7 days for the results to fully improve.
To speed this up, we could update the gsp_yields values in the database for the last seven days.
It's probably not a bad thing to do anyway: have a script that can overwrite the gsp_yield values.
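A rough sketch of such an overwrite script, assuming a hypothetical `gsp_yield` table keyed on (gsp_id, datetime_utc, regime) and using SQLite's upsert (SQLite 3.24+) purely for illustration; the production DB schema and engine may differ. `fresh_values_kw` would be the re-downloaded "updated" PV_Live values for the last seven days.

```python
import sqlite3
from datetime import datetime
from typing import Dict

# Hypothetical schema, for illustration only.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS gsp_yield (
    gsp_id INTEGER NOT NULL,
    datetime_utc TEXT NOT NULL,
    regime TEXT NOT NULL,
    solar_generation_kw REAL,
    PRIMARY KEY (gsp_id, datetime_utc, regime)
)
"""

# Insert a day-behind row, or overwrite its value if that row already exists.
UPSERT_SQL = """
INSERT INTO gsp_yield (gsp_id, datetime_utc, regime, solar_generation_kw)
VALUES (?, ?, 'day-behind', ?)
ON CONFLICT (gsp_id, datetime_utc, regime)
DO UPDATE SET solar_generation_kw = excluded.solar_generation_kw
"""


def overwrite_day_behind(
    conn: sqlite3.Connection, gsp_id: int, fresh_values_kw: Dict[datetime, float]
) -> int:
    """Insert or overwrite regime='day-behind' rows with freshly downloaded values. Returns rows written."""
    rows = [(gsp_id, ts.isoformat(), kw) for ts, kw in sorted(fresh_values_kw.items())]
    with conn:  # commits on success, rolls back if anything fails
        conn.executemany(UPSERT_SQL, rows)
    return len(rows)
```

Using an upsert rather than delete-then-insert means the script can be re-run safely, and it only touches the 'day-behind' rows.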

@JackKelly
Member Author

we could update the gsp_yields values in the database for the last seven days

SGTM!

@peterdudfield
Contributor

This is now done, and we see differences, and it's BST right now.


2 participants