Update OGE to work with PUDL v.2022.11.30 and integrate 2021 data #259

Merged
@grgmiller merged 54 commits into development from pudl_update on Dec 30, 2022


@grgmiller (Collaborator) commented Nov 22, 2022

This PR closes #258 by updating OGE to work with PUDL v2022.11.30 and integrating 2021 data.

NOTE: We should review and merge #246 into this branch before merging this branch into development.

@grgmiller marked this pull request as ready for review December 17, 2022 18:09
@grgmiller changed the title "Update OGE for 2021 data" to "Update OGE to work with PUDL v.2022.11.30 release" Dec 17, 2022
@grgmiller marked this pull request as draft December 17, 2022 18:14
@grgmiller changed the title "Update OGE to work with PUDL v.2022.11.30 release" to "Update OGE to work with PUDL v.2022.11.30 and integrate 2021 data" Dec 17, 2022
@grgmiller (Collaborator, Author) commented Dec 17, 2022

Remaining steps to do:

  • Once Catalyst publishes the data package to Zenodo, update the download link and try running the pipeline from start to finish including the download.
  • Merge #246 (Add hourly data for all individual plants)
  • Fix issue with negative EFs in consumed outputs
  • Try reinstalling the environment and make sure it works given the gridemissions solver issue (#261, "gridemissions seems to be breaking")
  • Add a check that there is output data for all expected BAs and that the output CSVs are not empty
  • Address warning: A:\miniconda3\envs\open_grid_emissions\lib\site-packages\geopandas\_compat.py:106: UserWarning: The Shapely GEOS version (3.11.1-CAPI-1.17.1) is incompatible with the GEOS version PyGEOS was compiled with (3.10.3-CAPI-1.16.1). Conversions between both will be slow.
  • Update documentation
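The output check from the list above could look something like the following minimal sketch. It assumes a hypothetical layout with one CSV per BA named `{ba}.csv` in a single output directory; the real OGE output structure may differ, and `check_outputs` is an illustrative name, not an existing OGE function.

```python
from pathlib import Path

import pandas as pd


def check_outputs(output_dir, expected_bas):
    """Return (missing, empty): expected BAs with no CSV, and BAs whose CSV has no data rows.

    Assumption (hypothetical layout): one file per BA named f"{ba}.csv" in output_dir.
    """
    output_dir = Path(output_dir)
    missing, empty = [], []
    for ba in expected_bas:
        path = output_dir / f"{ba}.csv"
        if not path.exists():
            missing.append(ba)
        # A zero-byte file would make pd.read_csv raise, so test the size first;
        # a header-only file reads as an empty DataFrame.
        elif path.stat().st_size == 0 or pd.read_csv(path).empty:
            empty.append(ba)
    return missing, empty
```

Checking the file size first matters because `pd.read_csv` raises `pandas.errors.EmptyDataError` on a zero-byte file, whereas a CSV containing only a header row reads as an empty DataFrame and is caught by `.empty`.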

@grgmiller (Collaborator, Author) commented:

When running the pipeline with the newly downloaded EIA-930 data, it seems that there may have been some retroactive revisions to the 2020 data: when exporting monthly consumed emission factors, we now get an error message about negative emission factors for SEC. The root cause of this issue is described in #214 and was patched by #221, in which we created a list of BAs with this issue and told the pipeline to use the demand values reported in EIA-930 instead of calculating net demand from generation and interchange. SEC was not previously on this list, but it now seems to exhibit the same symptoms.

The larger fix is captured by #220, but in the meantime, there are a couple of ways we could patch this:

  1. Investigate whether there is a new data quality issue in the EIA-930 timeseries for SEC that we need to correct in eia930.manual_930_adjust().
  2. Add SEC to the BA_930_INCONSISTENCY list in src.consumed. However, since this issue seems to be confined to SEC's 2020 data, I'd like us to update BA_930_INCONSISTENCY so that the patch is applied only in the years where it is necessary, rather than to these BAs in every year. One way to do this would be to make BA_930_INCONSISTENCY a dictionary whose keys are years and whose values are lists of the BAs that need the patch in that year.
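A minimal sketch of the dictionary approach from option 2. Only the SEC/2020 entry comes from this discussion; the BAs already on the list from #221 would still need to be added under the years in which they need the patch, and `uses_reported_demand` is a hypothetical helper name, not an existing OGE function.

```python
# Hypothetical sketch: BA_930_INCONSISTENCY keyed by data year instead of a flat list.
# Only SEC/2020 comes from this discussion; the BAs patched by #221 would be
# added under the years in which they need the patch.
BA_930_INCONSISTENCY = {
    2020: ["SEC"],
}


def uses_reported_demand(ba: str, year: int) -> bool:
    """True if the pipeline should use EIA-930 reported demand for this BA and year."""
    return ba in BA_930_INCONSISTENCY.get(year, [])


# The patch applies to SEC in 2020 only
print(uses_reported_demand("SEC", 2020))  # True
print(uses_reported_demand("SEC", 2021))  # False
```

Using `.get(year, [])` means a year with no entry simply applies no patch, so new years do not need to be registered unless a BA actually shows the problem.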

@grgmiller (Collaborator, Author) commented Dec 23, 2022

We are currently running into an issue where the consumed emissions calculation in step 18 returns missing values for every hour in every region through April 2021 (for the 2021 run). I think I've traced the source of this issue to consumed.HourlyConsumed.run(), specifically to the following section of code:

# Run
try:
    consumed_emissions, _ = consumption_emissions(E, G, ID)
except np.linalg.LinAlgError:
    # These issues happen at boundary hours (beginning and end of year)
    # where we don't have full data for all BAs
    # print(f"WARNING: singular matrix on {date}")
    consumed_emissions = np.full(len(self.regions), np.nan)

For about the first 2,900 datetimes in the for loop (corresponding to the first four months), consumption_emissions() raises the "Singular matrix" LinAlgError that triggers the except clause. Here is the full traceback:

File a:\GitHub\open-grid-emissions\notebooks\work_in_progress\../../../open-grid-emissions/src\consumed.py:179, in consumption_emissions(F, P, ID)
    176             # force this to be zero so the linear system makes sense
    177             b[i] = 0.0
--> 179 X = np.linalg.solve(A, b)
    181 for j in perturbed:
    182     if X[j] != 0.0:

File <__array_function__ internals>:180, in solve(*args, **kwargs)

File a:\miniconda3\envs\open_grid_emissions\lib\site-packages\numpy\linalg\linalg.py:400, in solve(a, b)
    398 signature = 'DD->D' if isComplexType(t) else 'dd->d'
    399 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 400 r = gufunc(a, b, signature=signature, extobj=extobj)
    402 return wrap(r.astype(result_t, copy=False))

File a:\miniconda3\envs\open_grid_emissions\lib\site-packages\numpy\linalg\linalg.py:89, in _raise_linalgerror_singular(err, flag)
     88 def _raise_linalgerror_singular(err, flag):
---> 89     raise LinAlgError("Singular matrix")

LinAlgError: Singular matrix
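A toy reproduction of the pattern in the quoted try/except: if the linear system for a single hour is singular, the fallback fills every region with NaN for that hour, which matches the symptom of all regions going missing at once. The `zero_rows` helper is a hypothetical diagnostic for spotting which region's row makes the system singular; it assumes access to the system matrix `A` that consumption_emissions builds internally, and the matrix and region names are made up.

```python
import numpy as np


def solve_or_nan(A, b):
    """Solve A x = b for one hour; on a singular system, return NaN for every region."""
    try:
        return np.linalg.solve(A, b)
    except np.linalg.LinAlgError:
        return np.full(len(b), np.nan)


def zero_rows(A, regions):
    """Return the regions whose row in A is entirely zero (one way A becomes singular)."""
    A = np.asarray(A, dtype=float)
    return [region for region, row in zip(regions, A) if not row.any()]


# Toy 3-region system in which the second region's row is all zeros
A = np.array([[2.0, -1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, -1.0, 3.0]])
b = np.array([4.0, 0.0, 6.0])

print(solve_or_nan(A, b))                      # [nan nan nan]
print(zero_rows(A, ["BA_1", "BA_2", "BA_3"]))  # ['BA_2']
```

One inconsistent region is enough to blank out the whole hour, so the NaNs say nothing about which BA is at fault; a check like `zero_rows` narrows it down.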

The comment on this code suggests that this error should only be raised when we don't have full data for all of the BAs. However, I don't see why data would be missing now when it wasn't for the current release of OGE. The new issue must result from either 1) a change to the raw data downloaded from EIA or 2) a change to the way we process the data. However, I haven't found any code changes that would explain it, so I'm stumped about the cause.

To further trace the source, I tried running the consumed emissions calculation on the main branch, and I still get the same singular matrix error in the same places. This suggests that the issue lies either in the source EIA-930 data or in our cleaning of it.

Another thing we may want to investigate: could this be a result of our manual timestamp cleaning? Gailin, I know that you already looked into this, but I'm wondering whether something was corrected in the 930 balance files that we aren't catching and that is causing issues with several months of data.

I also have a few questions about how we clean the 930 data (these may or may not be related to the issue above):

  • I notice that the EIA-930 data cleaning for 2021 (step 12 of the data pipeline) is cleaning data all the way back to July 2020. I know that we need some data from before 1/1/2021, but I'm assuming we don't need to go back that far. Would one day (12/31/2020) be sufficient? Filtering further should, at least in theory, speed up the data cleaning step by 33%.
  • It looks like in step 18, we are loading the cleaned file (eia930_elec.csv), but not necessarily removing the imputed 1 values or filtering to only 2021 values (unless I am missing this). I do note this step:
    # In some cases, we have zero generation but non-zero transmission
    # usually due to imputed zeros during physics-based cleaning being set to 1.0
    # but sometimes due to ok values being set to 1.0
    to_fix = (ID.sum(axis=1) > 0) & (G == 0)
    ID[:, to_fix] = 0
    ID[to_fix, :] = 0
    
    but I'm not sure whether this accomplishes the same thing.
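To see what the quoted filter does on its own, here it is wrapped in a small function and run on toy arrays (the function name and data are illustrative). It flags any BA with zero generation whose row of `ID` sums to a positive value, then zeroes both that BA's row and column. It does not check for values equal to 1.0 specifically, so it only overlaps with removing imputed ones when the imputed value happens to sit in a zero-generation BA's row.

```python
import numpy as np


def apply_zero_generation_filter(G, ID):
    """The quoted filter, returning a new matrix instead of editing ID in place.

    G:  generation per BA, shape (n,)
    ID: interchange matrix, shape (n, n); the filter inspects row sums only
    """
    ID = np.array(ID, dtype=float)  # copy so the caller's matrix is unchanged
    to_fix = (ID.sum(axis=1) > 0) & (np.asarray(G) == 0)
    ID[:, to_fix] = 0  # zero the flagged BAs' columns
    ID[to_fix, :] = 0  # zero the flagged BAs' rows
    return ID, to_fix


G = np.array([5.0, 0.0, 3.0])
ID = np.array([[0.0, 0.0, 1.0],
               [2.0, 0.0, 0.0],  # BA 1: no generation but a positive row sum
               [0.0, 0.0, 0.0]])
fixed, flagged = apply_zero_generation_filter(G, ID)
print(flagged)  # [False  True False]
```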

@gailin-p

@grgmiller (Collaborator, Author) commented:

One other thing I'm noticing about this issue: in 2021, it affects all data from 1/1/2021 through 4/30/2021, while in 2020 it only affects data for March. It's strange that the impact is cut off so neatly by month, which makes me wonder whether there is a clue there: is there some step we perform that affects data on a month-by-month basis?
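One quick way to check how neatly the failures line up with month boundaries is to count, per month, the hours in which every region is missing. This sketch assumes the hourly consumed results are in a pandas DataFrame with a naive DatetimeIndex and one column per region; `all_nan_hours_by_month` is a hypothetical helper and the data below is synthetic.

```python
import numpy as np
import pandas as pd


def all_nan_hours_by_month(df: pd.DataFrame) -> pd.Series:
    """Count the hours per calendar month in which every column of df is NaN."""
    all_nan = df.isna().all(axis=1)
    return all_nan.groupby(df.index.to_period("M")).sum()


# Synthetic example: January and February 2021, with January entirely missing
index = pd.date_range("2021-01-01", periods=24 * 59, freq=pd.offsets.Hour())
df = pd.DataFrame({"BA_1": 1.0, "BA_2": 2.0}, index=index)
df.loc[df.index.month == 1, :] = np.nan
print(all_nan_hours_by_month(df))
```

A month whose count equals its full number of hours (744 for January) points to a month-level cause, while a partial count points to specific hours.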

@gailin-p (Collaborator) commented:

The above issue is caused by inconsistent transmission interchange and generation in HGMA from January through April 2021. For BAs with no generation and no export, the consumption_emissions function (from gridemissions) perturbs the matrix to make it invertible. However, it doesn't do the same when a BA has zero generation but non-zero export, a situation that's physically impossible and therefore guaranteed not to happen in the gridemissions pipeline, where consumed emissions calculations come after physics-based data cleaning.

OGE generation for HGMA is zero until May 2021:
[Screenshot: OGE generation for HGMA, 2021]

But in both raw and post-gridemissions 930 data, there is non-zero import/export from HGMA in those months:
[Screenshots (2): HGMA interchange in raw and post-gridemissions EIA-930 data]

Crucially, the raw data shows net import to HGMA, which is physically possible with zero generation, but after physics-based cleaning, the 930 data shows net export from HGMA, which is not possible with zero generation.
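The distinction in the paragraph above can be written as a simple per-hour consistency check. This is a sketch assuming the EIA-930 sign convention in which positive total interchange means net export; `zero_gen_net_export` is a hypothetical helper, and the numbers below are made up rather than the actual HGMA values.

```python
import pandas as pd


def zero_gen_net_export(generation: pd.Series, total_interchange: pd.Series) -> pd.Series:
    """Flag hours with zero generation but positive (net export) interchange.

    Assumes positive total interchange = net export, negative = net import.
    """
    gen, interchange = generation.align(total_interchange, join="inner")
    return (gen == 0) & (interchange > 0)


# Made-up hourly values for one BA with no generation
generation = pd.Series([0.0, 0.0, 0.0])
raw_interchange = pd.Series([-3.0, -2.0, -4.0])    # net import: physically possible
cleaned_interchange = pd.Series([-3.0, 2.0, 1.5])  # net export: impossible with zero generation

print(zero_gen_net_export(generation, raw_interchange).any())      # False
print(zero_gen_net_export(generation, cleaned_interchange).sum())  # 2
```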

There are three easy fixes here: two are general (but might also let problem data sneak through in the future), and one is specific:

  • Expand the perturbation in the calculation to also perturb when the matrix has a zero row
  • Use linalg.lstsq instead of linalg.solve, which gives the same answer when the system has a unique solution but returns a least-squares (best-fit) solution otherwise
  • Use 930 demand data instead of our generation + 930 interchange for HGMA in the problem months; this is the targeted fix
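The two general options can be compared on a toy system with an all-zero row. This is a sketch, not the gridemissions implementation: `perturb_zero_rows` pins each zero row's unknown to 0 (diagonal entry set to 1, right-hand side set to 0, in the spirit of the `b[i] = 0.0` perturbation visible in the traceback above), while `solve_lstsq` calls `np.linalg.lstsq`. Both helper names and the matrix are made up.

```python
import numpy as np


def perturb_zero_rows(A, b):
    """Option 1 (sketch): replace each all-zero row with x_i = 0, then solve exactly."""
    A = np.array(A, dtype=float)  # copies, so the inputs are left untouched
    b = np.array(b, dtype=float)
    rows = np.flatnonzero(~A.any(axis=1))
    A[rows, rows] = 1.0  # set the diagonal entry of each zero row
    b[rows] = 0.0
    return np.linalg.solve(A, b), rows


def solve_lstsq(A, b):
    """Option 2 (sketch): least squares; minimum-norm solution if A is rank-deficient."""
    x, _residuals, _rank, _singular_values = np.linalg.lstsq(A, b, rcond=None)
    return x


A = np.array([[2.0, -1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, -1.0, 3.0]])
b = np.array([4.0, 0.0, 6.0])

x_perturbed, perturbed_rows = perturb_zero_rows(A, b)  # pins region 1 to 0
x_lstsq = solve_lstsq(A, b)                            # minimum-norm fit
print(perturbed_rows, x_perturbed, x_lstsq)
```

Note that `lstsq` does not fail here: the toy system is consistent but rank-deficient, so it silently returns the minimum-norm solution (68/49, -60/49, 78/49), whose middle entry is negative, while the perturbation approach returns (2, 0, 2). A negative value would be a red flag for an emission rate, which illustrates how a general fix could let problem data sneak through.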

@gailin-p (Collaborator) commented Dec 23, 2022

As a side note: because the interchange after physics-based cleaning is greater than 1, it won't be set to zero by our filter for imputed ones. Also, the interchange is 1 in the balance files downloaded directly from EIA, so it isn't noise introduced by gridemissions.

@gailin-p (Collaborator) commented:

> I notice that the EIA-930 data cleaning for 2021 (step 12 of the data pipeline) is cleaning data all the way back to July 2020. I know that we need some data from before 1/1/2021, but I'm assuming we don't need to go back that far. Would one day (12/31/2020) be sufficient? Filtering further should, at least in theory, speed up the data cleaning step by 33%.

The prior six months of data are needed for the rolling filter used by gridemissions.
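The six-month lookback explains the July 2020 start: for a 2021 run, the cleaning window spans 18 months (July 2020 through December 2021), whereas starting on 12/31/2020 would leave about 12, hence the estimated one-third saving. A minimal sketch of the window arithmetic, where `cleaning_window` and its `lookback_months` parameter are hypothetical names:

```python
import pandas as pd


def cleaning_window(year: int, lookback_months: int = 6):
    """Return (start, end) timestamps for cleaning one year of EIA-930 data.

    start is January 1 of `year` minus the rolling-filter lookback;
    end is January 1 of the following year (exclusive).
    """
    start = pd.Timestamp(year=year, month=1, day=1) - pd.DateOffset(months=lookback_months)
    end = pd.Timestamp(year=year + 1, month=1, day=1)
    return start, end


start, end = cleaning_window(2021)
print(start.date(), end.date())  # 2020-07-01 2022-01-01
```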

> It looks like in step 18, we are loading the cleaned file (eia930_elec.csv), but not necessarily removing the imputed 1 values or filtering to only 2021 values (unless I am missing this).

No need to filter to 2021 values, since we only run the computation on dates present in the OGE generation data, which is already limited to the year being run.
We remove imputed ones when we use the 930 data to calculate residual profiles (eia930.remove_imputed_ones), but not for the consumed emissions calculation. I think I just overlooked this, since the imputed ones were causing major issues with the residual profiles but weren't an issue for the consumed calculation. I can add it for consistency.

@gailin-p (Collaborator) commented Dec 23, 2022

> # In some cases, we have zero generation but non-zero transmission
> # usually due to imputed zeros during physics-based cleaning being set to 1.0
> # but sometimes due to ok values being set to 1.0
> to_fix = (ID.sum(axis=1) > 0) & (G == 0)
> ID[:, to_fix] = 0
> ID[to_fix, :] = 0

Actually, rewriting this check to be more general will fix the HGMA bug and seems like the lowest-impact option; I'll do that.

@grgmiller mentioned this pull request Dec 23, 2022
@grgmiller marked this pull request as ready for review December 28, 2022 21:56
@grgmiller (Collaborator, Author) commented:

This PR should close #163

@grgmiller mentioned this pull request Dec 28, 2022
@grgmiller merged commit b0e9319 into development Dec 30, 2022
@grgmiller deleted the pudl_update branch December 30, 2022 17:13