Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show extended run-"index" from Platform.meta.tabulate() #33

Closed
wants to merge 6 commits into from

Conversation

danielhuppmann
Copy link
Member

While further working on the pyam-imxp4-integration, I noticed an inconsistency:

  • Platform.runs.tabulate() returns a dataframe with the columns run-id, model, scenario, version
  • Platform.iamc.tabulate() returns a dataframe with (only) the "user-readable index" model, scenario, version, ...
  • Platform.meta.tabulate() returns a dataframe only with (only) the run-id, ...

In actual work (like the pyam-integration), the model-scenario-version is more relevant.

So this PR extends the Platform.meta.tabulate() to return the "extended index" of model-name, scenario-name and version number.

As a utility, the PR also adds a map() endpoint to the model- and scenario-repository for easy translation of id's to names.

If there is a need for having (only) the run-id at the facade/core-layer, I'd be happy to implement that as a kwarg to both iamc/meta tabulate methods (and if someone has a good suggestion for the name of that argument...)

@danielhuppmann
Copy link
Member Author

The implementation follows the approach in Platform.runs.tabulate() where only the id's are queried from the backend and expanded in the core layer. I guess that joining the tables in the ORM-layer would be a viable alternative, not sure if there is a performance difference...

@danielhuppmann danielhuppmann marked this pull request as ready for review November 27, 2023 09:57
@meksor
Copy link
Contributor

meksor commented Nov 28, 2023

I think there is a strong argument to be made to do this (optionally) in the database layer. Performance wise there is definitely a difference: for big tables I expect faster execution and less memory usage in the database.

There is already similar functionality in the DataPointRepository when supplying join_parameters:

def select(
self,
*,
join_parameters: bool | None = False,
join_runs: bool = False,
_filter: DataPointFilter | None = None,
**kwargs: Any
) -> db.sql.Select:
exc = (
self.select_joined_parameters(join_runs)
if join_parameters
else select(self.bundle)
)
return super().select(_exc=exc, _filter=_filter, **kwargs)

More detail in select_joined_parameters.

Side note:
Currently there doesnt seem to be the need to implement the map method in both the api and db layer. You can just leave it in the abstract base class since its the same code...

@danielhuppmann
Copy link
Member Author

I think there is a strong argument to be made to do this (optionally) in the database layer. Performance wise there is definitely a difference: for big tables I expect faster execution and less memory usage in the database.

Thanks for the feedback. Just for my understanding, does the faster database-execution outweigh the downside of sending a table twice the size via the Rest API?

@meksor
Copy link
Contributor

meksor commented Nov 28, 2023

That's a good question. My guess: yes, since gzip will hopefully take care of compressing away any repeating patterns and sending a seperate request for the list of all runs also presents some overhead.

@danielhuppmann
Copy link
Member Author

closing in favor of #37

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants