Include optimization.table #40
Conversation
* Allow ruff to run on tests/ directory
* Note: still rudimentary, requires clean-up and expansion
* Can only create tables without data for now
* Include several tests for tables.add_data()
* Update table docs
* Refactor column docs to own file
While this PR is still a WIP, you can already take a look at what is arguably the most interesting part: adding data in the DB layer to a Table, which requires lots of validation. The following behaviour (and possibly more) can be inferred from the existing tests and the files in the data/abstract and data/db directories.

Current details on adding and validating data

How does it work? Currently, data can be provided as either a dict or a pd.DataFrame. Before validation, though, we use the dict form of the data to merge it with possibly pre-existing Table.data. This is done via the dict union operator (`|`) whenever table.data is set.

At the moment, all of these cases raise `ValueError`.

Some more edge case considerations apply as well.
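The dict-union merge described above can be sketched with plain Python dict semantics (the column names and values here are illustrative, not ixmp4's actual data; note that on key collisions the right-hand operand of `|` wins):

```python
# Sketch of merging incoming data into existing Table.data via the
# dict union operator (Python 3.9+); names are purely illustrative.
existing = {"Column 1": ["foo"], "Column 2": [1]}
incoming = {"Column 2": [2], "Column 3": ["bar"]}

# Keys from both dicts are kept; on collision the incoming value wins,
# so "Column 2" is overwritten rather than extended.
merged = existing | incoming
print(merged)  # {'Column 1': ['foo'], 'Column 2': [2], 'Column 3': ['bar']}
```

This overwrite-on-collision behaviour is what makes the validation discussed below (overwriting vs. expanding column data) relevant.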
* Also expand tests for Indexsets and Scalars
* TEMPORARILY limited to DB layer
Questions and possible ToDos

@meksor and @danielhuppmann, sorry for the long post. While cleaning up this PR, I came across the following questions/notes I took during its implementation. I'd like to clarify them before merging the PR, even if our solution is to open a new issue and include the fix/expansion in another PR.

Adding data to a Table in the data layer

We currently have the following syntax for that in the DB layer (very similar to how elements are added to an Indexset):

```python
table = test_mp.backend.optimization.tables.create(
    run_id=run.id,
    name="Table",
    constrained_to_indexsets=[indexset_1.name, indexset_2.name],
)
test_mp.backend.optimization.tables.add_data(
    table_id=table.id, data=test_data_1
)
table = test_mp.backend.optimization.tables.get(run_id=run.id, name="Table")
assert table.data == test_data_1
```

I'm wondering if we want to keep it that way. In the core layer, we have

```python
table = run.optimization.tables.create(
    "Table",
    constrained_to_indexsets=[indexset.name, indexset_2.name],
)
table.add(data=test_data_1)
assert table.data == test_data_1
```

So the object is updated without another explicit `get()` call.

Linking each Column to a unique Indexset

I'm wondering if different Columns of a Table can be constrained to the same Indexset. Or in other words, I'm wondering if the following should raise an error:

```python
with pytest.raises(ValueError):
    _ = test_mp.backend.optimization.tables.create(
        run_id=run.id,
        name="Table 2",
        constrained_to_indexsets=[indexset_1.name, indexset_1.name],
        column_names=["Column 1", "Column 2"],
    )
```

Raising distinct error messages when validating data

At the moment, we have numerous validation checks for data that is being added to a Table, but all of these checks only raise `ValueError`:

ixmp4/ixmp4/data/db/optimization/table/model.py, lines 48 to 71 in b682f7e
And ixmp4/ixmp4/data/db/optimization/table/repository.py, lines 79 to 97 in b682f7e.

So my question is: are we fine with that, or would we want to have distinct custom errors for all these checks?

Allowing Table.data to be added piece-meal

Currently, we allow adding data for the various Columns piece-meal like so:

```python
table_3 = run.optimization.tables.create(
    name="Table 3",
    constrained_to_indexsets=[indexset.name, indexset_2.name],
    column_names=["Column 1", "Column 2"],
)
table_3.add(data={"Column 1": ["bar"]})
assert table_3.data == {"Column 1": ["bar"]}
table_3.add(data={"Column 2": [2]})
assert table_3.data == {"Column 1": ["bar"], "Column 2": [2]}
table_3.add(
    data=pd.DataFrame({"Column 1": ["foo"], "Column 2": [3]}),
)
assert table_3.data == {"Column 1": ["foo"], "Column 2": [3]}
```

If the data to be added contains data for a Column that already has data, that Column's data is overwritten.
Specifying the type of constrained_to_indexsets

…

Handling Columns of Tables

At the moment, Columns are only added to a Table during the creation of the Table. Do we want to keep it that way?
Thanks @glatterf42, see my responses below:

Adding data to a Table in the data layer

Yes, this approach makes sense to me. You could improve performance by setting a cached `self._table = None` on add and only re-fetching lazily, along these lines:

```python
def add(self):
    test_mp.backend.optimization.tables.add_data(
        table_id=table.id, data=test_data_1
    )
    self._table = None

@property
def table(self):
    if self._table is None:
        self._table = test_mp.backend.optimization.tables.get(
            run_id=run.id, name="Table"
        )
    return self._table
```

Linking each Column to a unique Indexset

Yes, it is absolutely crucial that multiple columns can be foreign-keyed to the same index set. Only the name has to be unique.

Raising distinct error messages when validating data

Good enough for now, but maybe to be improved later.

Allowing Table.data to be added piece-meal

Fine in principle to add data step by step similar to the current behavior, but data should be added row-wise. So if you have a table with two columns, the following line should raise an error:

```python
table_3.add(data={"Column 1": ["bar"]})
# ValueError: Missing entry for 'Column 2'
```

Specifying the type of constrained_to_indexsets

I assume that the …

Handling Columns of Tables

It should not be possible to add columns to a table after creation.
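The row-wise rule could be enforced with a check along these lines. This is only a sketch under stated assumptions, not ixmp4's actual implementation: `validate_row_wise` is a hypothetical helper, and the error messages are illustrative:

```python
def validate_row_wise(columns, data):
    """Raise ValueError if `data` does not cover every column of the
    table, or if its columns have unequal lengths (incomplete rows).
    Hypothetical helper; not part of ixmp4."""
    missing = [name for name in columns if name not in data]
    if missing:
        raise ValueError(f"Missing entry for {missing[0]!r}")
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Unequal number of entries per column: {lengths}")

# A complete row passes silently:
validate_row_wise(["Column 1", "Column 2"],
                  {"Column 1": ["bar"], "Column 2": [2]})
```

Calling it with `{"Column 1": ["bar"]}` against a two-column table would raise `ValueError: Missing entry for 'Column 2'`, matching the desired behaviour above.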
* Remove some outdated TODOs
* Make _add_column() a private function
* Return Indexset.names as constrained_to_indexsets
* Enforce that Table.data be added row-wise
Thanks @glatterf42

No, this is not the expected use case, given the current behavior of ixmp and usual modelling workflows. Imagine a "table" being the investment costs for different power plants in several regions. You often have situations where a modeler has a script e.g. computing parameters for only one type of power plant (or one region). Similar to the current MESSAGE tutorials, I see the need for a workflow like

```python
# get a dataframe of investment cost parameters for a certain technology
inv_cost_wind = pd.DataFrame()
run.parameters.get("inv_cost").add_data(inv_cost_wind)
```

adding (or replacing existing) datapoints, but not removing existing parameter datapoints.
Sorry, I don't quite follow: in your example, would …

And mainly for me, to clarify: I'm imagining a table like this:

…

And a modeler might now want to update …
The usual workflow is to only provide the data that should be changed. For example, IEA publishes new wind turbine cost estimates and a modeller updates the data like

```python
run.parameters.get("inv_cost").add_data(
    {"technology": "wind", "region": "EEU", "value": w, "unit": "EUR/GW"}
)
```
Okay, I've updated the behaviour to this:

```python
table_3 = run.optimization.tables.create(
    name="Table 3",
    constrained_to_indexsets=[indexset.name, indexset_2.name],
    column_names=["Column 1", "Column 2"],
)
table_3.add(data={"Column 1": ["bar"], "Column 2": [2]})
assert table_3.data == {"Column 1": ["bar"], "Column 2": [2]}

# Test data is expanded when Column.name is already present
table_3.add(
    data=pd.DataFrame({"Column 1": ["foo"], "Column 2": [3]}),
)
assert table_3.data == {"Column 1": ["bar", "foo"], "Column 2": [2, 3]}
```

If this is now how it should be, I'll add a DB migration and hopefully we can merge this PR soon :)
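The expand-on-add semantics shown above amount to extending each column's list row-wise rather than replacing it. A minimal sketch, assuming row-wise completeness has already been validated (`add_rows` is a hypothetical helper, not the ixmp4 implementation):

```python
def add_rows(data, new_data):
    """Append new row values column-wise to a dict-of-lists table.
    Hypothetical helper; assumes both dicts cover the same columns,
    with completeness checked elsewhere."""
    return {name: values + list(new_data[name]) for name, values in data.items()}

existing = {"Column 1": ["bar"], "Column 2": [2]}
expanded = add_rows(existing, {"Column 1": ["foo"], "Column 2": [3]})
print(expanded)  # {'Column 1': ['bar', 'foo'], 'Column 2': [2, 3]}
```

Building a new dict (rather than mutating in place) mirrors how the `|`-based merge produces a fresh value for `table.data`, but extends columns instead of overwriting them.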
* Include pagination update for Scalar
* Include pagination update for Table
Hiya, I looked at the current state of this PR code-wise and it looks good.
Thank you @glatterf42, this looks very nice, almost ready to be merged. A few observations inline, and one more general observation (which I made a while ago already): I believe that the order of arguments

table-name -> table-columns -> table-columns-constraints-to-indexsets

would be more intuitive...
Looks good to me, many thanks!
This PR adds the next part of the message_ix/ixmp data model.
Still to be done:
For now, still a WIP.