Fetch and load public base model snapshot #102
Codecov Report
@@            Coverage Diff             @@
##             main     #102      +/-   ##
==========================================
+ Coverage   67.11%   67.87%   +0.76%
==========================================
  Files          58       61       +3
  Lines        4011     4106      +95
==========================================
+ Hits         2692     2787      +95
  Misses       1319     1319
One solution here would be to increase the JVM heap space for the JDBCBackend used in the tests, when on GHA.
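A minimal sketch of that idea, assuming the test setup can pass extra JVM arguments through to the JDBCBackend. GitHub Actions sets the environment variable GITHUB_ACTIONS to "true" on its runners. The helper name gha_jvmargs and its parameters are hypothetical, chosen for illustration only; they are not code from this PR.

```python
import os
from typing import Mapping, Optional


def gha_jvmargs(
    env: Optional[Mapping[str, str]] = None, heap: str = "6G"
) -> Optional[str]:
    """Return a JVM max-heap flag when running on GitHub Actions, else None.

    Hypothetical helper for illustration; `heap` uses the JVM size syntax.
    """
    env = os.environ if env is None else env
    # GitHub Actions sets GITHUB_ACTIONS=true for every workflow step.
    if env.get("GITHUB_ACTIONS") == "true":
        return f"-Xmx{heap}"
    return None


# Assumed usage (needs ixmp and Java, so it is not executed here): pass the
# result as the `jvmargs` keyword when creating a JDBC-backed ixmp.Platform.
print(gha_jvmargs({"GITHUB_ACTIONS": "true"}))
print(gha_jvmargs({}))
```

Off GHA the helper returns None, so local test runs would keep the JVM's default heap size.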
Also, since we are actually working on |
Thanks, that's useful. Looking at This means everything else, i.e.:
…altogether takes up 99 - 73 = 26% of the run-time, i.e. roughly 1/3 of the duration of the Excel read. If there's a faster way of reading large Excel files, we could incorporate that. I have searched repeatedly, but not found anything.
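The arithmetic behind that estimate, written out. The 99% and 73% figures are the ones quoted in the comment; the variable names are only illustrative.

```python
# Shares of total run time quoted in the comment (percent).
total_accounted = 99
excel_read = 73

# Everything other than the Excel read.
everything_else = total_accounted - excel_read

# How long "everything else" takes relative to the Excel read itself.
ratio = everything_else / excel_read

print(everything_else)
print(round(ratio, 2))
```

The ratio comes out slightly above one third, which is why speeding up the Excel read would matter far more than optimizing the remaining steps.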
Track all .csv.gz and .xlsx files using Git LFS.
FYI @awais307: @glatterf42 mentioned that you wanted to write some code to fetch GLOBIOM data from Zenodo. I didn't know that GLOBIOM data was already published there! In any case, please look at the |
The tests are all passing and the CLI command works perfectly, but the docs could use improvement:
Trying to execute the code in doc/api/model-snapshot.rst, I find ImportError: cannot import name 'snapshot' from 'message_ix_models' (/home/fridolin/message-ix-models/message_ix_models/__init__.py) or Module "message_ix_models" has no attribute "snapshot". Also, line 24 should either be scenario = ... or line 26 should be snapshot.load(s, 0), I think.
Thanks for catching those! I will push another commit to fix them, so that you can approve.
Thanks for the fixes, snapshot is importable now. And while I trust that you checked the load function yourself, I actually don't have enough modeling experience to get this code snippet to work (since I fail to provide suitable parameters for Scenario(...)). This might indicate that the example should be expanded depending on who the intended users are, but I also don't doubt that I would be able to get this to run if I spent more time on reading the docs about Scenario().
Great, can you then please approve? ✅ This code is indeed meant to be used by users who have already learned how to create new Scenarios in message_ix, and the links to the documentation of the Scenario class are there if they need to remind themselves.
I see, looks good to me then.
Following the release of https://doi.org/10.5281/zenodo.5793870, this PR adds code to fetch the snapshot from Zenodo and load it into a platform of the user's choice, for various uses including further development and testing other code.
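A hedged sketch of what loading could look like, built around the snapshot.load(s, 0) call discussed in the review. Several things here are assumptions rather than facts from this PR text: the import path message_ix_models.model.snapshot, the use of the default ixmp platform, and the placeholder model and scenario names. The wrapper function itself is hypothetical.

```python
def load_snapshot_into_new_scenario(snapshot_id: int = 0):
    """Sketch: create an empty scenario and load a snapshot into it.

    Assumptions (not confirmed by this PR text): the module is importable as
    message_ix_models.model.snapshot, and the model/scenario names below are
    placeholders chosen for illustration.
    """
    # Imported lazily so that defining this function does not require the
    # packages (or Java) to be installed.
    import ixmp
    from message_ix import Scenario
    from message_ix_models.model import snapshot

    mp = ixmp.Platform()  # the user's default platform
    scenario = Scenario(
        mp, model="snapshot-test", scenario="baseline", version="new"
    )
    # Same call shape as in the review discussion: snapshot.load(s, 0)
    snapshot.load(scenario, snapshot_id)
    return scenario
```

Users who already know how to construct a Scenario would substitute their own platform, model name, and scenario name.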
Notes
Units
The first snapshot contains the unit strings "USD_2005/t" and "USD_2005/t " (with trailing space).
On the ixmp-dev platform (IIASA ECE's internal Oracle database), these two unit strings are already defined (it's unclear how this happened; possibly this was done with an older version of the ixmp_source Java code).
… Scenario.read_excel() fails.
… .snapshot._unpack(), that unpacks or explodes the entire Excel snapshot into 1 compressed CSV file per parameter. These files are then individually added to the scenario.
Testing
… java.lang.OutOfMemoryError: Java heap space.
The --jvmargs pytest option (defined in message_ix_models.testing) is used to increase JVM heap space to 6 GB; this is slightly below the total available on GHA runners.
These files are excluded from packaging (MANIFEST.in).
… unpacked_snapshot_data, which moves these files into the location they would be unpacked to. The files are thus not read from Excel again when the tests execute.
test_snapshot.test_load is limited to the ubuntu-latest runners, i.e. skipped on macOS and Windows. This is because it increases the total job run time from:
Other changes
How to review
Run mix-models snapshot fetch 0 on the branch; confirm the code works.
PR checklist