
Remove rows with duplicate query cols and cleanup handling of TFM variants #173

Merged: 11 commits merged into main on Feb 2, 2024

Conversation

siddharth-krishna
Collaborator

This PR makes the following changes:

  • It implements the suggestion of using Config.known_columns of the table's tag instead of always using that of TFM_INS.
  • It adds TFM_INS-TXT to veda-tags.json and modifies Config to use the base tag's known columns if none is specified. (These two changes correspond to the first 3 commits, and introduce no regression.)
  • It removes rows with duplicate query columns (for GAMS parameters). This is needed because #166 (Refactor process_wildcards and add support for TFM_MIG) changed the behaviour of TFM_UPD to add new rows instead of updating rows in place.
  • However, this causes a regression in correct rows. For example, in Demo 9 the ground-truth DD repo defines COM_PROJ in multiple files, and those files contain rows that are duplicates when considering only the query columns. So dd_to_csv.py produces a CSV file for COM_PROJ with duplicate rows, whereas we now keep only the last such row in our output.
  • I tried to resolve this regression by modifying the ground-truth comparison code to first remove duplicates from the ground truth before comparing (see the sketch after this list). I then manually checked that the benchmarks that continue to regress do so only because of additional rows or because ground-truth rows were reduced, not because missing rows increased.
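A rough sketch of that comparison tweak, assuming the ground truth and our output are both loaded as pandas DataFrames and that the query columns of the parameter are known; the function and argument names here are illustrative, not the actual benchmark code:

```python
import pandas as pd

def count_missing_and_additional(
    output: pd.DataFrame, ground_truth: pd.DataFrame, query_cols: list[str]
) -> tuple[int, int]:
    """Compare our output against the ground truth after de-duplicating the
    ground truth on its query columns (keeping the last row, since GAMS only
    keeps the last definition of a record anyway)."""
    gt = ground_truth.drop_duplicates(subset=query_cols, keep="last")
    gt_rows = set(gt.itertuples(index=False, name=None))
    out_rows = set(output.itertuples(index=False, name=None))
    missing = len(gt_rows - out_rows)
    additional = len(out_rows - gt_rows)
    return missing, additional
```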

However, Ireland has new missing rows:
[screenshot: benchmark table showing the new missing rows for Ireland]

I looked into the new missing rows in ACT_EFF in Ireland:
[screenshots: the differing ACT_EFF rows in the ground truth and in our output]

It looks like the problem is the order in which dd_to_csv.py reads the DD files: it puts the row with value 1 last, while we produce the row with value 1.075 last. Perhaps we need to add the DD file order to benchmarks.yml so that we read DD files in the same order that GAMS would?

@olejandro
Member

@siddharth-krishna for the moment I believe we should only be removing duplicate rows for the case described here. (Sorry, I haven't looked at the changes yet, so I'm guessing that the removal is not limited to the table that generates the rows.)

@siddharth-krishna
Collaborator Author

Do you mean we should only remove duplicate rows that are generated by the same TFM_UPD table?

Why not remove the duplicates in all final outputs? I thought GAMS only considers the last one anyway.

@olejandro
Member

Do you mean we should only remove duplicate rows that are generated by the same TFM_UPD table?

Yes, that's what I mean.

Why not remove the duplicates in all final outputs? I thought GAMS only considers the last one anyway.

There are several reasons:

  • to keep our regression tests working properly. Keeping the tables the way they are allows us to take into account diversity in the input that may otherwise be overwritten.
  • I also believe that's the way the original DDs are generated (i.e. duplicates are only removed among rows generated by the same TFM_UPD table).

@olejandro
Member

@siddharth-krishna, in addition to the above, how about we drop all the duplicates but only when printing to GAMS? This won't "overwrite" correctly yet, but the GAMS tests will be working.

@siddharth-krishna
Collaborator Author

remove duplicate rows that are generated by the same TFM_UPD table

Haha, we should be more precise in our sentences. Even this can mean two things now that TFM_UPD adds new rows to the table instead of updating in place. It could mean (a) if a TFM_UPD table generates 4 rows, 2 of which have the same query columns, then add only 1 of the duplicated rows; or (b) if the table generates 4 rows, all of which have corresponding duplicate rows in the original FI_T table, then delete the original rows and add the 4 new rows, but if a second table generates 2 rows that overlap with the previous 4, do not delete those and instead add 2 new rows.

I tried to implement (a), but on Demo 4 that still leaves us with 3 additional rows (the original ACT_COST = 1111 rows).

Implementing (b) without in-place updates is tricky! We'll have to keep track of which rows came from where...

Let's talk more on our call. For now, I've moved the de-duplicating code to the write DD function, and kept the behaviour of TFM_UPD to always add new rows.
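For reference, a minimal sketch of what de-duplicating only at DD-write time could look like; write_parameter, the "value" column name, and the DD syntax below are simplified stand-ins rather than the actual code in this PR:

```python
import pandas as pd

def write_parameter(f, name: str, df: pd.DataFrame, query_cols: list[str]) -> None:
    # Collapse rows that share the same query columns only here, just before
    # serialisation, keeping the last occurrence (mirroring GAMS last-wins
    # semantics). The in-memory table keeps every row that TFM_UPD appended.
    deduped = df.drop_duplicates(subset=query_cols, keep="last")
    f.write(f"PARAMETER {name} /\n")
    for _, row in deduped.iterrows():
        key = ".".join(f"'{row[c]}'" for c in query_cols)
        f.write(f"{key} {row['value']}\n")
    f.write("/;\n")
```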

@Antti-L

Antti-L commented Feb 2, 2024

I tried to implement (a), but on Demo 4 that still leaves us with 3 additional rows (the original ACT_COST = 1111 rows).

I see them hard-coded in your code, in def generate_dummy_processes. 😁 Apparently VEDA2 does not write out such "internal" ACT_COST default values for the dummy imports (VEDA-FE actually did).

@siddharth-krishna
Collaborator Author

@Antti-L ah yes, thanks, so that's where they come from. I guess we should remove them before writing output; I've made an issue.

Based on my discussion with @olejandro, the plan is to merge this PR essentially as it is. Rows with duplicate indices/query columns are now removed just before writing DD files. This means the GDXDIFF tests for Demos 1-4 are at 100% (OK). The other test shows 4 additional rows, but that can be solved by removing the dummy rows and by adding support for sourcescen -- both tasks for future PRs. :)

Co-authored-by: Olexandr Balyk <ob@facilitate.energy>
@siddharth-krishna siddharth-krishna enabled auto-merge (squash) February 2, 2024 19:31
@siddharth-krishna siddharth-krishna merged commit 161b2f0 into main Feb 2, 2024
1 check passed
@siddharth-krishna siddharth-krishna deleted the sidk/tfm-cleanup branch February 2, 2024 19:36
SamRWest added a commit to csiro-energy-systems/xl2times that referenced this pull request Feb 16, 2024