Skip to content

Infilling#43

Merged
mzecc merged 43 commits intoopenscm:mainfrom
mzecc:REMIND_test
Mar 27, 2026
Merged

Infilling#43
mzecc merged 43 commits intoopenscm:mainfrom
mzecc:REMIND_test

Conversation

@mzecc
Copy link
Copy Markdown
Collaborator

@mzecc mzecc commented Mar 4, 2026

Description

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

@mzecc
Copy link
Copy Markdown
Collaborator Author

mzecc commented Mar 12, 2026

@znicholls
Messy PR just to show you a bit the work. I'll clean it a bit more.

I have some troubles whit the gridding/country level harmonisation.

  1. Should the country-history_202511261223_202511040855_202512032146_202512021030_7e32405ade790677a6022ff498395bff00d9792d.csv on Zenodo contain the model regions as well?
  2. On the docs/how-to-guides/how-to-run-the-cmip7-scenariomip-workflow.py we work with model_1 that's fine for global harmonisation but I do not get any match for the gridding history. What should happen in that case?

@znichollscr
Copy link
Copy Markdown
Collaborator

I have some troubles whit the gridding/country level harmonisation

Yep makes sense. For what Keywan needs, we don't need gridding-level harmonisation so forget about this. Go straight to checking the pre-processing or infilling (as you wish which one you start with).

@znichollscr
Copy link
Copy Markdown
Collaborator

  1. Should the country-history_202511261223_202511040855_202512032146_202512021030_7e32405ade790677a6022ff498395bff00d9792d.csv on Zenodo contain the model regions as well?

No they are in gridding-history_202511261223_202511040855_202512032146_202512021030_7e32405ade790677a6022ff498395bff00d9792d.feather

@znichollscr
Copy link
Copy Markdown
Collaborator

2. On the docs/how-to-guides/how-to-run-the-cmip7-scenariomip-workflow.py we work with model_1 that's fine for global harmonisation but I do not get any match for the gridding history. What should happen in that case?

(If/when you come back to this, grab the REMIND history from the file above, rename the model to model_1 and it should work.)

@mzecc
Copy link
Copy Markdown
Collaborator Author

mzecc commented Mar 13, 2026

(If/when you come back to this, grab the REMIND history from the file above, rename the model to model_1 and it should work.)

Yes, indeed when I do this it works fine.

Ok so for the time being I'll skip the gridding-level harmonisation.

@mzecc
Copy link
Copy Markdown
Collaborator Author

mzecc commented Mar 13, 2026

I have cleaned a bit and tests are passing locally. The new tests requires a lot of files for checking.

@znichollscr
Copy link
Copy Markdown
Collaborator

Cool. Let's leave this for now. We can come back to it once we have all the other global-level pieces you'll need working

@mzecc mzecc changed the title WIP Infilling Mar 17, 2026
@mzecc
Copy link
Copy Markdown
Collaborator Author

mzecc commented Mar 19, 2026

Ok made the corrections. I still have to correct the docs/how-to-guides/how-to-run-the-cmip7-scenariomip-workflow.py file. Some questions:

  1. I have created a CMIP7ScenarioMIPInfiller, would that be the preferred way?
  2. SupportedNamingConventions.GCAGES should be the correct convention we want to use internally, correct? so I should make sure that all the dataframes (infilling, ghg_inverse, historical) obey to that convention? And lead and lead_vl_marker as well I guess.
  3. I still have to restore the original pyproject.toml

@znichollscr
Copy link
Copy Markdown
Collaborator

  1. I have created a CMIP7ScenarioMIPInfiller, would that be the preferred way?

Yep that's good

2. SupportedNamingConventions.GCAGES should be the correct convention we want to use internally, correct?

Yep

2. And lead and lead_vl_marker as well I guess

Yep change them as they're loaded (so the original file remains unchanged)

3. I still have to restore the original pyproject.toml

👍

In terms of next steps, two options:

  1. keep going as we are i.e. one PR at a time. Pros: simple Cons: slow (because I can only review once per day)
  2. make a new branch called cmip7-scenariomip-global-integration. Point this PR at that branch. Then, make a new branch cmip7-scenariomip-scm-running. In that branch, add the SCM running stuff on top of this branch. Again, make a PR that points at cmip7-scenariomip-global-integration. Then, for any runs you need to do quickly, you can just use the cmip7-scenariomip-scm-running branch while we do any final clean up before merging into main. Pros: fast (you can just make all the branches and get it all working without needing any feedback from me) Cons: more complex (we'll end up with multiple branches and have to pull them all back together at the end)

Up to you which one you want to go with

@mzecc
Copy link
Copy Markdown
Collaborator Author

mzecc commented Mar 20, 2026

It should be almost done. I am not sure of:

  1. the infilled data frame that is returned is the infilled.complete but the others are not returned.
  2. A lot of warnings are triggered by pint.unit

Random things I was late to reply to:

  1. make a new branch called cmip7-scenariomip-global-integration.

It looks an interesting exercise.

I also wonder whether you think you would have spotted a lot of the changes I made if you re-read it yourself, or whether you've looked at this code so much now that it needed someone else to find the little errors.

I see that I might have abused a bit of your time, apologies for that Zeb. The main reason for all the errors you have spotted is that in many situation I am unsure on the best way to proceed or I do not see very clearly the big picture so I leave stuff behind "just in case". Basically, I content myself to make things pass and move on till I have a first draft. I'll try to be cleaner.

@mzecc mzecc mentioned this pull request Mar 21, 2026
3 tasks
@znichollscr
Copy link
Copy Markdown
Collaborator

  1. the infilled data frame that is returned is the infilled.complete but the others are not returned

Yep perfect (the others were just there to help me with debugging in emissions harmonisation historical). Maybe we'll add in some extra diagnostic layers, but for now we don't need to.

2. A lot of warnings are triggered by pint.unit

Oh yes, just ignore those for now.

I see that I might have abused a bit of your time, apologies for that Zeb. The main reason for all the errors you have spotted is that in many situation I am unsure on the best way to proceed or I do not see very clearly the big picture so I leave stuff behind "just in case". Basically, I content myself to make things pass and move on till I have a first draft. I'll try to be cleaner.

You never abuse my time, don't worry. The point is more that, in my experience, the fewer lines a reviewer has to look at, the better the review they will/can give. I think getting things working first is a good way of working, maybe just add a quick review to get to a second draft that cleans up all the easy stuff before getting reviews. If there are things where you still don't know the right path, then just add a comment (either directly in the code or in the github PR) saying that (e.g. "Not sure which of the two options is better") so reviewers know it's actually a thing to be discussed/considered, not an accident.

@mzecc
Copy link
Copy Markdown
Collaborator Author

mzecc commented Mar 24, 2026

Ready to merge after changelogs addition? Should I squash some commits?

Comment thread src/gcages/cmip7_scenariomip/infilling.py Outdated
@znichollscr
Copy link
Copy Markdown
Collaborator

Ready to merge after changelogs addition? Should I squash some commits?

There are still unresolved conversations and suggestions which we need to fix/resolve first. Once those are done, then yes once the changelogs are added we should be fine to merge.

Don't worry about squashing.

@znichollscr znichollscr mentioned this pull request Mar 25, 2026
3 tasks
Comment on lines +771 to +783
if self.cmip7_scenariomip_output:
# Use revert to cmip7 ScenatioMIP naming convention.
infilled = update_index_levels_func(
infilled,
{
"variable": lambda x: convert_variable_name(
x,
from_convention=SupportedNamingConventions.GCAGES,
to_convention=SupportedNamingConventions.CMIP7_SCENARIOMIP,
)
},
copy=False,
)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what about this?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove it (and remove the cmip7_scenariomip_output attribute) and put the naming conversion in the test function

Comment on lines +10 to +11
# TODO: Not currently working. The hash keeps changing.
# Might be related to embargoed files on Zenodo?
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# TODO: Not currently working. The hash keeps changing.
# Might be related to embargoed files on Zenodo?
# TODO: Not currently working.
# We believe this is because you have to be logged in to retrieve the file,
# and we haven't set that up
# (this should work fine once the record is no longer embargoed).

Comment on lines +818 to +851
# Use gcages naming convention.
infilling_db = update_index_levels_func(
infilling_db,
{
"variable": lambda x: convert_variable_name(
x,
from_convention=SupportedNamingConventions.CMIP7_SCENARIOMIP,
to_convention=SupportedNamingConventions.GCAGES,
)
},
copy=False,
)
cmip7_ghg_inversions = update_index_levels_func(
cmip7_ghg_inversions,
{
"variable": lambda x: convert_variable_name(
x,
from_convention=SupportedNamingConventions.OPENSCM_RUNNER,
to_convention=SupportedNamingConventions.GCAGES,
)
},
copy=False,
)
historical_emissions = update_index_levels_func(
historical_emissions,
{
"variable": lambda x: convert_variable_name(
x,
from_convention=SupportedNamingConventions.CMIP7_SCENARIOMIP,
to_convention=SupportedNamingConventions.GCAGES,
)
},
copy=False,
)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I move these into the loading functions?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's good here (and makes the logic clearer: loading just loads, but if you want to get setup like ScenarioMIP, you have to do this renaming too to make everything work)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok good. I'll add the change logs then

@mzecc
Copy link
Copy Markdown
Collaborator Author

mzecc commented Mar 27, 2026

Can we merge this one?

@znichollscr
Copy link
Copy Markdown
Collaborator

Yep

@mzecc mzecc merged commit ae6cab9 into openscm:main Mar 27, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants