
Remove rows with duplicate query cols and cleanup handling of TFM variants #173

Merged: 11 commits merged into main on Feb 2, 2024

Conversation

siddharth-krishna
Collaborator

This PR makes the following changes:

  • It implements the suggestion of using Config.known_columns of the table's tag instead of always using that of TFM_INS.
  • It adds TFM_INS-TXT to veda-tags.json and modifies Config to use the base tag's known columns if none is specified. (These two changes correspond to the first 3 commits, and introduce no regression.)
  • It removes rows with duplicate query columns (for GAMS parameters). This is needed because #166 (Refactor process_wildcards and add support for TFM_MIG) changed the behaviour of TFM_UPD to add new rows instead of updating rows in place.
  • However, this causes a regression in correct rows. For example, in Demo 9 the ground-truth DD repo defines COM_PROJ in multiple files, and those files contain rows that are duplicates when considering only the query columns. So dd_to_csv.py produces a CSV file for COM_PROJ with duplicate rows, whereas we now keep only the last such row in our output.
  • I tried to resolve this regression by modifying the ground-truth comparison code to first remove duplicates from the ground truth before comparing (see the sketch after this list). I then manually checked that the benchmarks that continue to regress do so only because of additional rows or because ground-truth rows were reduced, not because missing rows increased.
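A rough sketch of that comparison tweak, assuming the ground truth and our output are both loaded as pandas DataFrames and that the query columns of the parameter are known; the function and argument names here are illustrative, not the actual benchmark code:

```python
import pandas as pd

def count_missing_and_additional(
    output: pd.DataFrame, ground_truth: pd.DataFrame, query_cols: list[str]
) -> tuple[int, int]:
    """Compare our output against the ground truth after de-duplicating the
    ground truth on its query columns (keeping the last row, since GAMS only
    keeps the last definition of a record anyway)."""
    gt = ground_truth.drop_duplicates(subset=query_cols, keep="last")
    gt_rows = set(gt.itertuples(index=False, name=None))
    out_rows = set(output.itertuples(index=False, name=None))
    missing = len(gt_rows - out_rows)
    additional = len(out_rows - gt_rows)
    return missing, additional
```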

However, Ireland has new missing rows:
[screenshot: benchmark table showing the new missing rows for Ireland]

I looked into the new missing rows in ACT_EFF in Ireland:
[screenshots: the differing ACT_EFF rows in the ground truth and in our output]

It looks like the problem is the order in which dd_to_csv.py reads the DD files: it puts the row with value 1 last, while we produce the row with value 1.075 last. Perhaps we need to add the DD file order to benchmarks.yml so that we read DD files in the same order that GAMS would?

@olejandro
Member

@siddharth-krishna for the moment I believe we should only be removing duplicate rows for the case described here. (Sorry, I haven't looked at the changes yet, so I'm guessing that the removal is not limited to the table that generates the rows.)

@siddharth-krishna
Collaborator Author

Do you mean we should only remove duplicate rows that are generated by the same TFM_UPD table?

Why not remove the duplicates in all final outputs? I thought GAMS only considers the last one anyway.

@olejandro
Member

Do you mean we should only remove duplicate rows that are generated by the same TFM_UPD table?

Yes, that's what I mean.

Why not remove the duplicates in all final outputs? I thought GAMS only considers the last one anyway.

There are several reasons:

  • to keep our regression tests working properly. Keeping the tables the way they are allows us to take into account diversity in the input that may otherwise be overwritten.
  • I also believe that's the way the original DDs are generated (i.e. duplicates are only removed among rows generated by the same TFM_UPD table).

@olejandro
Member

@siddharth-krishna, in addition to the above, how about we drop all the duplicates but only when printing to GAMS? This won't "overwrite" correctly yet, but the GAMS tests will be working.

@siddharth-krishna
Collaborator Author

remove duplicate rows that are generated by the same TFM_UPD table

Haha, we should be more precise in our sentences. Even this can mean two things now that TFM_UPD adds new rows to the table instead of updating in place. It could mean (a) if a TFM_UPD table generates 4 rows, 2 of which have the same query columns, then add only 1 of the duplicated rows; or (b) if the table generates 4 rows, all of which have corresponding duplicate rows in the original FI_T table, then delete the original rows and add the 4 new rows, but if a second table generates 2 rows that overlap with the previous 4, do not delete those and instead add 2 new rows.

I tried to implement (a), but on Demo 4 that still leaves us with 3 additional rows (the original ACT_COST = 1111 rows).

Implementing (b) without in-place updates is tricky! We'll have to keep track of which rows came from where...

Let's talk more on our call. For now, I've moved the de-duplicating code to the write DD function, and kept the behaviour of TFM_UPD to always add new rows.
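For reference, a minimal sketch of what de-duplicating only at DD-write time could look like; write_parameter, the "value" column name, and the DD syntax below are simplified stand-ins rather than the actual code in this PR:

```python
import pandas as pd

def write_parameter(f, name: str, df: pd.DataFrame, query_cols: list[str]) -> None:
    # Collapse rows that share the same query columns only here, just before
    # serialisation, keeping the last occurrence (mirroring GAMS last-wins
    # semantics). The in-memory table keeps every row that TFM_UPD appended.
    deduped = df.drop_duplicates(subset=query_cols, keep="last")
    f.write(f"PARAMETER {name} /\n")
    for _, row in deduped.iterrows():
        key = ".".join(f"'{row[c]}'" for c in query_cols)
        f.write(f"{key} {row['value']}\n")
    f.write("/;\n")
```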

@Antti-L

Antti-L commented Feb 2, 2024

I tried to implement (a), but on Demo 4 that still leaves us with 3 additional rows (the original ACT_COST = 1111 rows).

I see them hard-coded in your code, in def generate_dummy_processes. 😁 Apparently VEDA2 does not write out such "internal" ACT_COST default values for the dummy imports (VEDA-FE actually did).

@siddharth-krishna
Collaborator Author

@Antti-L ah yes, thanks, so that's where they come from. I guess we should remove them before writing output; I've made an issue.

Based on my discussion with @olejandro, the plan is to merge this PR essentially as it is. Rows with duplicate indices/query columns are now removed just before writing DD files. This means the GDXDIFF tests for Demos 1-4 are at 100% (OK). The other test shows 4 additional rows, but that can be solved by removing the dummy rows and by adding support for sourcescen -- both tasks for future PRs. :)

Co-authored-by: Olexandr Balyk <ob@facilitate.energy>
@siddharth-krishna siddharth-krishna enabled auto-merge (squash) February 2, 2024 19:31
@siddharth-krishna siddharth-krishna merged commit 161b2f0 into main Feb 2, 2024
1 check passed
@siddharth-krishna siddharth-krishna deleted the sidk/tfm-cleanup branch February 2, 2024 19:36
SamRWest added a commit to csiro-energy-systems/xl2times that referenced this pull request Feb 16, 2024