Remove rows with duplicate query cols and cleanup handling of TFM variants #173
Conversation
@siddharth-krishna for the moment I believe we should only remove duplicate rows for the case described here. (Sorry, I haven't looked at the changes yet, so I'm guessing that the removal is not limited to the table that generates the rows.)
Do you mean we should only remove duplicate rows that are generated by the same TFM_UPD table? Why not remove the duplicates in all final outputs? I thought GAMS only considers the last one anyway.
Yes, that's what I mean. Several reasons for it:
@siddharth-krishna, in addition to the above, how about we drop all the duplicates, but only when printing to GAMS? This won't "overwrite" correctly yet, but the GAMS tests will pass.
Haha, we should be more precise in our sentences. Even this can mean two things. Now that TFM_UPD adds new rows to the table instead of updating in place, it could mean (a) if a TFM_UPD table generates 4 rows, out of which 2 have the same query columns, then only add 1 of the duplicated rows; or (b) the table generates 4 rows, all of which have corresponding duplicate rows in the original FI_T table, so delete the original rows and add the 4 new rows in; but if a second table generates 2 rows that overlap with the previous 4, do not delete the original ones but add 2 new rows. I tried to implement (a), but on Demo 4 that still leaves us with 3 additional rows (the original ACT_COST = 1111 rows). Implementing (b) without in-place updates is tricky: we'd have to keep track of which rows came from where. Let's talk more on our call. For now, I've moved the de-duplicating code to the write-DD function, and kept the behaviour of TFM_UPD always adding new rows.
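To make option (a) concrete, here is a minimal pandas sketch. The column names (`attribute`, `process`, `region`, `value`) and the sample data are illustrative only, not the actual xl2times schema:

```python
import pandas as pd

# Hypothetical rows generated by a single TFM_UPD table; "attribute",
# "process" and "region" stand in for the query columns, "value" for the data.
new_rows = pd.DataFrame(
    {
        "attribute": ["ACT_COST", "ACT_COST", "ACT_COST", "FLO_SHAR"],
        "process": ["P1", "P1", "P2", "P1"],
        "region": ["IE", "IE", "IE", "IE"],
        "value": [1111, 2222, 3333, 0.5],
    }
)

# Option (a): de-duplicate only within this table's own output, keeping the
# last row for each combination of query columns (last-one-wins, as in GAMS).
query_cols = ["attribute", "process", "region"]
deduped = new_rows.drop_duplicates(subset=query_cols, keep="last")
```

Note this says nothing about rows already present in FI_T, which is exactly why option (a) alone leaves the original ACT_COST = 1111 rows behind on Demo 4.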
I am seeing them hard-coded in your code.
@Antti-L ah yes, thanks, so that's where it comes from. I guess we should remove them before writing output; I've made an issue. Based on my discussion with @olejandro, the plan is to essentially merge this PR as it is. Rows with duplicate indices/query-cols are now removed just before writing DD files. This means the GDXDIFF tests for Demos 1-4 are at 100% (OK). The other test shows 4 additional rows, but that can be solved by removing dummy rows, and by adding support for …
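A minimal sketch of the merged approach, de-duplicating just before the DD file is written. The function name and the assumption that every non-value column is a query/index column are illustrative, not the actual write-DD code:

```python
import pandas as pd

def drop_duplicate_records(table: pd.DataFrame, value_col: str = "value") -> pd.DataFrame:
    """Keep only the last record for each combination of index/query columns,
    mirroring GAMS's last-one-wins semantics, so the written DD file contains
    no duplicate records. The value-column name is assumed for illustration."""
    index_cols = [c for c in table.columns if c != value_col]
    return table.drop_duplicates(subset=index_cols, keep="last")
```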
Co-authored-by: Olexandr Balyk <ob@facilitate.energy>
This PR makes the following changes:

- Uses the `Config.known_columns` of the table's tag instead of always using that of `TFM_INS`.
- Adds `TFM_INS-TXT` to `veda-tags.json` and modifies `Config` to use the base tag's known columns if none is specified (see the sketch after this list). (These two changes correspond to the first 3 commits, and introduce no regression.)
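A rough sketch of that fallback logic. The tag-splitting convention and the dictionary layout are assumptions based on the description above, not the actual `Config` code:

```python
def known_columns_for(tag: str, tag_defs: dict) -> list[str]:
    # If a tag variant such as "TFM_INS-TXT" does not declare its own
    # known columns in veda-tags.json, fall back to those of its base
    # tag ("TFM_INS"). The dict layout here is an assumption.
    cols = tag_defs.get(tag, {}).get("known_columns")
    if cols is None:
        base_tag = tag.split("-")[0]
        cols = tag_defs.get(base_tag, {}).get("known_columns", [])
    return cols
```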
`dd_to_csv.py` is producing a CSV file for COM_PROJ that has duplicate rows, whereas we now only keep the last row in our output. However, Ireland has new missing rows:
I looked into the new missing rows in ACT_EFF in Ireland. It looks like the problem is that the order in which `dd_to_csv.py` reads DD files puts the `1` row last, while we produce the `1.075` row last. Perhaps we need to add the DD file order to `benchmarks.yml` so that we read DD files in the same order that GAMS will?
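One way that could look, assuming `benchmarks.yml` gained a per-benchmark list of DD files; the `benchmarks` and `dd_files` keys are invented here for illustration:

```python
import yaml

with open("benchmarks.yml") as f:
    config = yaml.safe_load(f)

for benchmark in config["benchmarks"]:
    # Process DD files in the explicit order listed for each benchmark,
    # so the "last row wins" outcome matches the order GAMS would use.
    for dd_path in benchmark.get("dd_files", []):
        print(f"reading {dd_path}")
```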