Infilling by mzecc · Pull Request #43 · openscm/gcages

mzecc · 2026-03-04T18:30:08Z

Description

Checklist

Please confirm that this pull request has done the following:

Tests added
Documentation added (where applicable)
Changelog item added to changelog/

mzecc · 2026-03-12T20:41:12Z

@znicholls
Messy PR, just to show you a bit of the work. I'll clean it up a bit more.

I have some trouble with the gridding/country-level harmonisation.

1. Should the country-history_202511261223_202511040855_202512032146_202512021030_7e32405ade790677a6022ff498395bff00d9792d.csv on Zenodo contain the model regions as well?
2. On the docs/how-to-guides/how-to-run-the-cmip7-scenariomip-workflow.py we work with model_1 that's fine for global harmonisation but I do not get any match for the gridding history. What should happen in that case?

znichollscr · 2026-03-13T06:30:08Z

I have some trouble with the gridding/country-level harmonisation

Yep makes sense. For what Keywan needs, we don't need gridding-level harmonisation so forget about this. Go straight to checking the pre-processing or infilling (as you wish which one you start with).

znichollscr · 2026-03-13T06:33:26Z

Should the country-history_202511261223_202511040855_202512032146_202512021030_7e32405ade790677a6022ff498395bff00d9792d.csv on Zenodo contain the model regions as well?

No they are in gridding-history_202511261223_202511040855_202512032146_202512021030_7e32405ade790677a6022ff498395bff00d9792d.feather

znichollscr · 2026-03-13T06:34:08Z

2. On the docs/how-to-guides/how-to-run-the-cmip7-scenariomip-workflow.py we work with model_1 that's fine for global harmonisation but I do not get any match for the gridding history. What should happen in that case?

(If/when you come back to this, grab the REMIND history from the file above, rename the model to model_1 and it should work.)
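The workaround above could be sketched roughly as follows. This is a minimal illustration on toy data, assuming the history is a pandas DataFrame with a `model` index level and that the REMIND rows can be picked out by a model name starting with `REMIND`. The function name, the toy model names and values, and the index layout are all assumptions for illustration, not taken from gcages or from the Zenodo file.

```python
import pandas as pd


def history_as_placeholder_model(
    history: pd.DataFrame, source_prefix: str, placeholder: str
) -> pd.DataFrame:
    """Select one model's history and relabel it with a placeholder model name."""
    models = history.index.get_level_values("model")
    selected = history[models.str.startswith(source_prefix)]
    if selected.empty:
        raise ValueError(f"No history for models starting with {source_prefix!r}")
    # Relabel every selected model name; the input frame is left untouched.
    return selected.rename(index=lambda _: placeholder, level="model")


# Toy stand-in for the gridding history (the real file layout is an assumption).
toy_history = pd.DataFrame(
    {2020: [1.0, 2.0, 3.0], 2021: [1.5, 2.5, 3.5]},
    index=pd.MultiIndex.from_tuples(
        [
            ("REMIND toy", "World", "Emissions|CO2"),
            ("REMIND toy", "World", "Emissions|CH4"),
            ("other toy model", "World", "Emissions|CO2"),
        ],
        names=["model", "region", "variable"],
    ),
)

model_1_history = history_as_placeholder_model(toy_history, "REMIND", "model_1")
```

With the real data, the toy frame would be replaced by whatever is read from the gridding-history feather file.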

mzecc · 2026-03-13T08:27:10Z

(If/when you come back to this, grab the REMIND history from the file above, rename the model to model_1 and it should work.)

Yes, indeed when I do this it works fine.

Ok so for the time being I'll skip the gridding-level harmonisation.

mzecc · 2026-03-13T17:40:08Z

I have cleaned up a bit and tests are passing locally. The new tests require a lot of files for checking.

znichollscr · 2026-03-13T23:12:41Z

Cool. Let's leave this for now. We can come back to it once we have all the other global-level pieces you'll need working

mzecc · 2026-03-19T14:46:53Z

Ok made the corrections. I still have to correct the docs/how-to-guides/how-to-run-the-cmip7-scenariomip-workflow.py file. Some questions:

1. I have created a CMIP7ScenarioMIPInfiller, would that be the preferred way?
2. SupportedNamingConventions.GCAGES should be the correct convention we want to use internally, correct? So I should make sure that all the dataframes (infilling, ghg_inverse, historical) follow that convention? And lead and lead_vl_marker as well I guess.
3. I still have to restore the original pyproject.toml

znichollscr · 2026-03-20T04:17:21Z

I have created a CMIP7ScenarioMIPInfiller, would that be the preferred way?

Yep that's good

2. SupportedNamingConventions.GCAGES should be the correct convention we want to use internally, correct?

Yep

2. And lead and lead_vl_marker as well I guess

Yep change them as they're loaded (so the original file remains unchanged)
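A minimal sketch of "change them as they're loaded": the rename happens on the in-memory frame only, so the file on disk stays as it was. It uses plain pandas on a throwaway CSV. The helper name, the index layout, and the `external|`/`internal|` variable names are hypothetical placeholders; in gcages the mapping would presumably come from its own naming-convention conversion rather than a hand-written dict.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_with_internal_names(path: Path, mapping: dict[str, str]) -> pd.DataFrame:
    """Read a CSV and rename the `variable` index level in memory only."""
    raw = pd.read_csv(path, index_col=["model", "variable"])
    # `rename` returns a new frame; nothing is written back to `path`.
    return raw.rename(index=mapping, level="variable")


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "toy_lead.csv"
    csv_path.write_text("model,variable,2020\nm,external|CO2,1.0\n")

    before = csv_path.read_text()
    loaded = load_with_internal_names(csv_path, {"external|CO2": "internal|CO2"})
    after = csv_path.read_text()
```

Comparing `before` and `after` confirms that only the loaded frame carries the new names.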

3. I still have to restore the original pyproject.toml

👍

In terms of next steps, two options:

1. Keep going as we are, i.e. one PR at a time. Pros: simple. Cons: slow (because I can only review once per day).
2. Make a new branch called cmip7-scenariomip-global-integration. Point this PR at that branch. Then, make a new branch cmip7-scenariomip-scm-running. In that branch, add the SCM running stuff on top of this branch. Again, make a PR that points at cmip7-scenariomip-global-integration. Then, for any runs you need to do quickly, you can just use the cmip7-scenariomip-scm-running branch while we do any final clean up before merging into main. Pros: fast (you can just make all the branches and get it all working without needing any feedback from me). Cons: more complex (we'll end up with multiple branches and have to pull them all back together at the end).

Up to you which one you want to go with

mzecc · 2026-03-20T19:22:08Z

It should be almost done. I am not sure of:

1. The infilled data frame that is returned is the infilled.complete but the others are not returned.
2. A lot of warnings are triggered by pint.unit

Random things I was late to reply to:

make a new branch called cmip7-scenariomip-global-integration.

It looks like an interesting exercise.

I also wonder whether you think you would have spotted a lot of the changes I made if you re-read it yourself, or whether you've looked at this code so much now that it needed someone else to find the little errors.

I see that I might have abused a bit of your time, apologies for that Zeb. The main reason for all the errors you have spotted is that in many situations I am unsure of the best way to proceed, or I do not see the big picture very clearly, so I leave stuff behind "just in case". Basically, I content myself with making things pass and move on until I have a first draft. I'll try to be cleaner.

znichollscr · 2026-03-22T09:31:49Z

the infilled data frame that is returned is the infilled.complete but the others are not returned

Yep perfect (the others were just there to help me with debugging in emissions harmonisation historical). Maybe we'll add in some extra diagnostic layers, but for now we don't need to.

2. A lot of warnings are triggered by pint.unit

Oh yes, just ignore those for now.
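If the warnings get in the way while they are being ignored, one option is to silence them locally rather than globally. The sketch below uses only the standard library. The helper name is hypothetical, and filtering on the module name `pint.unit` assumes the warnings are attributed to that module, which depends on how pint raises them.

```python
import warnings
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def ignore_warnings_from(module_regex: str) -> Iterator[None]:
    """Temporarily ignore warnings whose originating module matches the regex."""
    with warnings.catch_warnings():
        # The filter is removed again when the context manager exits.
        warnings.filterwarnings("ignore", module=module_regex)
        yield


with ignore_warnings_from(r"pint\.unit"):
    # Code that triggers the noisy warnings would run here.
    pass
```

Warnings from any other module still surface as usual inside the block.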

I see that I might have abused a bit of your time, apologies for that Zeb. The main reason for all the errors you have spotted is that in many situations I am unsure of the best way to proceed, or I do not see the big picture very clearly, so I leave stuff behind "just in case". Basically, I content myself with making things pass and move on until I have a first draft. I'll try to be cleaner.

You never abuse my time, don't worry. The point is more that, in my experience, the fewer lines a reviewer has to look at, the better the review they will/can give. I think getting things working first is a good way of working, maybe just add a quick review to get to a second draft that cleans up all the easy stuff before getting reviews. If there are things where you still don't know the right path, then just add a comment (either directly in the code or in the github PR) saying that (e.g. "Not sure which of the two options is better") so reviewers know it's actually a thing to be discussed/considered, not an accident.

mzecc · 2026-03-24T09:25:29Z

Ready to merge after changelogs addition? Should I squash some commits?

znichollscr · 2026-03-25T00:50:47Z

Ready to merge after changelogs addition? Should I squash some commits?

There are still unresolved conversations and suggestions which we need to fix/resolve first. Once those are done, then yes once the changelogs are added we should be fine to merge.

Don't worry about squashing.

mzecc · 2026-03-25T17:24:14Z

+        if self.cmip7_scenariomip_output:
+            # Revert to the CMIP7 ScenarioMIP naming convention.
+            infilled = update_index_levels_func(
+                infilled,
+                {
+                    "variable": lambda x: convert_variable_name(
+                        x,
+                        from_convention=SupportedNamingConventions.GCAGES,
+                        to_convention=SupportedNamingConventions.CMIP7_SCENARIOMIP,
+                    )
+                },
+                copy=False,
+            )


And what about this?

Remove it (and remove the cmip7_scenariomip_output attribute) and put the naming conversion in the test function
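Moving the conversion into the test could look roughly like this. It is a sketch on toy data with a stand-in converter: `to_reference_convention` and the `internal|`/`reference|` names are hypothetical, and the real test would presumably call `convert_variable_name` with the conventions shown in the diff above.

```python
import pandas as pd


def to_reference_convention(name: str) -> str:
    """Stand-in converter (hypothetical), used here instead of the package's own."""
    return name.replace("internal|", "reference|")


def check_against_reference(result: pd.DataFrame, reference: pd.DataFrame) -> None:
    """Convert the result's variable names, then compare with the stored reference."""
    converted = result.rename(index=to_reference_convention, level="variable")
    pd.testing.assert_frame_equal(converted, reference)


index_names = ["scenario", "variable"]
result = pd.DataFrame(
    {2030: [10.0]},
    index=pd.MultiIndex.from_tuples([("s1", "internal|CO2")], names=index_names),
)
reference = pd.DataFrame(
    {2030: [10.0]},
    index=pd.MultiIndex.from_tuples([("s1", "reference|CO2")], names=index_names),
)

# Passes only because the names are converted before the comparison.
check_against_reference(result, reference)
```

This keeps the infiller's output in a single convention and leaves the convention handling to the code that compares against the reference files.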

znichollscr · 2026-03-25T20:54:30Z

+# TODO: Not currently working. The hash keeps changing.
+# Might be related to embargoed files on Zenodo?


Suggested change

-# TODO: Not currently working. The hash keeps changing.
-# Might be related to embargoed files on Zenodo?
+# TODO: Not currently working.
+# We believe this is because you have to be logged in to retrieve the file,
+# and we haven't set that up
+# (this should work fine once the record is no longer embargoed).
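To illustrate the suspected failure mode: if an unauthenticated request returns something other than the file itself (for example a login page), its bytes will not hash to the registered value. The sketch below is standard-library only and uses made-up byte strings. Which hash algorithm the download tooling actually uses is not stated in this thread, so SHA-256 is an assumption, as is the helper name.

```python
import hashlib


def matches_known_hash(content: bytes, expected_sha256: str) -> bool:
    """Return True if the downloaded bytes hash to the expected SHA-256 digest."""
    return hashlib.sha256(content).hexdigest() == expected_sha256


# Made-up stand-ins for the real file and for an unauthenticated response.
real_file = b"model,variable,2020\nm,v,1.0\n"
known_hash = hashlib.sha256(real_file).hexdigest()
login_page = b"<html>Please log in</html>"

file_ok = matches_known_hash(real_file, known_hash)
login_page_ok = matches_known_hash(login_page, known_hash)
```

A response that also embeds something request-specific, such as a session token, would additionally give a different digest on every attempt.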

mzecc · 2026-03-26T10:24:59Z

+        # Use gcages naming convention.
+        infilling_db = update_index_levels_func(
+            infilling_db,
+            {
+                "variable": lambda x: convert_variable_name(
+                    x,
+                    from_convention=SupportedNamingConventions.CMIP7_SCENARIOMIP,
+                    to_convention=SupportedNamingConventions.GCAGES,
+                )
+            },
+            copy=False,
+        )
+        cmip7_ghg_inversions = update_index_levels_func(
+            cmip7_ghg_inversions,
+            {
+                "variable": lambda x: convert_variable_name(
+                    x,
+                    from_convention=SupportedNamingConventions.OPENSCM_RUNNER,
+                    to_convention=SupportedNamingConventions.GCAGES,
+                )
+            },
+            copy=False,
+        )
+        historical_emissions = update_index_levels_func(
+            historical_emissions,
+            {
+                "variable": lambda x: convert_variable_name(
+                    x,
+                    from_convention=SupportedNamingConventions.CMIP7_SCENARIOMIP,
+                    to_convention=SupportedNamingConventions.GCAGES,
+                )
+            },
+            copy=False,
+        )


Should I move these into the loading functions?

No it's good here (and makes the logic clearer: loading just loads, but if you want to get set up like ScenarioMIP, you have to do this renaming too to make everything work)

Ok good. I'll add the change logs then

mzecc · 2026-03-27T08:55:46Z

Can we merge this one?

znichollscr · 2026-03-27T11:27:46Z

Yep

mzecc added 3 commits March 4, 2026 19:28

WIP

8ef7d85

Country gridding + infilling

a9c6dc2

Country gridding + infilling

76ccade

mzecc added 4 commits March 13, 2026 14:17

infilling and global harmonisation

75c5269

cleaned how-to-run-the-cmip7-scenariomip-workflow

4e51f95

Failing tests/integration/harmonisation/test_integration_harmonisation

286487d

Tests passing locally

4064581

Added pandas-indexing into pyproject

6856041

mzecc changed the title ~~WIP~~ Infilling Mar 17, 2026

mzecc added 14 commits March 17, 2026 11:44

Resolve merge conflicts with main

9744763

Updated test files

49c2027

Splitted get_pre_industrial_aware_direct_scaling_infiller out

c82cab3

mypy: 1

a761134

mypy: 2

a5c26d7

mypy: 3

555b5bf

mypy: 4

558371e

mypy: 5

9dd8723

mypy: 5

e0e3bb7

mypy: 7

4164a01

mypy: 9

50f6e85

mypy: 11

9003906

mypy:11

025a523

mypy :11

cf5ce7f

mzecc added 10 commits March 20, 2026 16:05

tests passing

9421466

moved complete_index_gcages_names into infilling.py

416f76e

removed scm_runnig.py

760a76c

cleaned common.py

504affd

Updated how-to-run-the-cmip7-scenariomip-workflow

18ad33d

Revert changes to pyproject.toml

611ee19

Corrected how-to-run-the-cmip7-scenariomip-workflow.py

b5dd418

Small errors mypy

8df4528

Removed year column form get_cmip7_scenariomip_harmonised_emissions

b2ca0b9

Removed year column form get_cmip7_scenariomip_infilled_emissions

eed1e31

mzecc mentioned this pull request Mar 21, 2026

Global integration #48

Closed

3 tasks

mzecc commented Mar 24, 2026


Comment thread src/gcages/cmip7_scenariomip/infilling.py Outdated

znichollscr mentioned this pull request Mar 25, 2026

Scm Run #47

Merged

3 tasks

mzecc added 3 commits March 25, 2026 17:39

Clean up

0563dfc

Clean up

57cf468

Clean up

90e4d15

mzecc commented Mar 25, 2026


znichollscr reviewed Mar 25, 2026


removed cmip7 infilled result option

9a4fec2

mzecc commented Mar 26, 2026


Added changelog

d1e6c97

mzecc merged commit ae6cab9 into openscm:main Mar 27, 2026
21 checks passed

