Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update dependencies #41

Merged
merged 24 commits into from
Feb 12, 2024
Merged

Update dependencies #41

merged 24 commits into from
Feb 12, 2024

Conversation

glatterf42
Copy link
Member

While troubleshooting a message-ix-models CI failure, I noticed that I couldn't update to the latest version of the dependencies because of outdated dependency versions in ixmp4 0.6.0. So this PR will eventually bump these/all dependencies.

At the moment, though, it still faces several warnings that need to be addressed. I'm not sure we can remove all of these warnings, though:

  • pytest 8.0.0 is incompatible with pytest-lazy-fixture, whose main branch has not seen an update in two years. So for now, I've adapted all tests to not use pytest-lazy-fixture, but as you can see, this adds quite a bit of repetition. A friendly coder has posted a different possible fix; please take a look @meksor and let me know which you prefer.
  • pandas is considering to add pyarrow as a mandatory dependency in 3.0.0. While this is still debated, they have already introduced a DeprecationWarning so that users can add pyarrow to their dependencies already. However, simply doing that caused several test failures for me (all related to data/db/base.py:bulk_upsert_chunk() in the meta tests) as detailed here.
  • Mypy is now complaining about lines 189 and 285 of db/filters.py, where the result of field_info.json_schema_extra.get() can be a multitude of types, while we only expect to deal with specific ones (though for line 189, I'm not sure whether we're expecting a dict or a list).
  • There are also several new Future- and DeprecationWarnings from pandas that we should address.

However, once these are clarified, I suggest releasing the next minor version.

Side note: I also want to try if we can support python 3.12; we might need a more recent version of e.g. pluggy, but this is transitive, so I don't know how much work this might be.

@glatterf42 glatterf42 added the enhancement New feature or request label Jan 29, 2024
@glatterf42 glatterf42 self-assigned this Jan 29, 2024
@glatterf42
Copy link
Member Author

glatterf42 commented Jan 30, 2024

@meksor I haven't quite figured out how to best address even the first pandas warnings that I started to look at. In particular, please check out data/db/meta/repository.py: the original solution is now triggering this:

tests/core/test_meta.py::test_run_meta[test_sqlite_mp]
tests/core/test_meta.py::test_run_meta[test_sqlite_mp]
tests/core/test_meta.py::test_run_meta[test_sqlite_mp]
tests/core/test_meta.py::test_run_meta[test_sqlite_mp]
tests/core/test_meta.py::test_run_meta[test_sqlite_mp]
  /home/fridolin/ixmp4/ixmp4/data/db/meta/repository.py:176: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
    return df.groupby("type", group_keys=False).apply(map_value_column)

Until now, I haven't been able to replicate the exact behaviour of this without triggering warnings. The suggested solutions don't work for me; include_groups=False leads to an error because "type" is not available for being changed in map_value_column(), and the only way I found to select a group (get_group()) can only ever select one group, while we want to work on all groups. Maybe looping over groups is still the preferable solution here since the others aren't quite working either.
The other solutions all come from the same question on StackOverflow, specifically these answers:

None of these are quite there yet, I feel the one from numba might even be closest, but numba.jit doesn't seem to support types generic enough to hold our value column, I'm afraid.

Please also note that my solution for the first FutureWarning was to just not use pd.DataFrame.fillna() at all since the test dataframe already contains only np.nan and None, but this might not generally be true for future use cases.

@glatterf42
Copy link
Member Author

We also already receive warnings for the next big changed announced for pandas 3.0.0: Copy-on-Write will be the default on only mode of operation. Fortunately, there already exists a migration guide and we can invest some time to future-proof our code now.

@meksor
Copy link
Contributor

meksor commented Jan 31, 2024

  • lazy fixtures: the solution currently on this branch is a little confusing but its not too bad. My hunch is that its best to reimplement something close to what plf did and keep an eye on the project in case someone picks it back up.
  • not sure if its even wise to update pandas if there is so much commotion and uncertainty over this change. I would propose to wait on this if possible.
    I for one think its probably a good idea to add one of the parallelized data libaries to pandas directly. if this change works out we may be able to drop dask from our deps (although i havent found the right operation yet).
  • Mypy: i think i already fixed those errors in my latest PR
  • hm, unsure how to continue with this. I guess performance isnt the deciding factor here as apply has always been very slow. But if you find a good solution its probably best to use it everywhere apply is used.
  • fillna: Hm not sure about that i think there was a specific reason to replace all the nans with Nones but it may be best to check with @danielhuppmann ? Its probably also wise to bake those requirements into a test once clarified.

@glatterf42
Copy link
Member Author

glatterf42 commented Feb 7, 2024

Thanks for the feedback @meksor, I'll try to incorporate it further soon.

@danielhuppmann, @phackstock: I want to prioritize this PR to support Python 3.12, so we should aim for a release soon after following this PR. Hopefully, pyam-iamc can then follow suit and support 3.12, which would allow message-ix-models and message_data to support it. Currently, new users are running into issues by seeing that message_ix and ixmp do support 3.12 and not seeing that the rest of the stack doesn't necessarily.
Unfortunately, though, I have a small presentation to set up before I can focus on this.

@danielhuppmann
Copy link
Member

Thanks @glatterf42, sounds good to me to support python 3.12, I'll make sure that pyam will follow suit after the ixmp4 release. It would be critical to have the pyam-ixmp4-release before end of February 22 so that users can directly query the ixmp4 ssp-extensions database via our Python tools (and the public launch of that project will be end of February).

@glatterf42
Copy link
Member Author

Wrt lazy_fixtures, I'm actively following that discussion. Pytest might be incorporating lazy_fixtures in its core implementation, but that also might take some time.

I've reverted the pandas update and mypy is happy after the rebase, too.

@glatterf42
Copy link
Member Author

The PR now also checks python 3.12 and it's going surprisingly well. Locally, I ran into another error with test_benchmarks_filter (in the api layer, will try to replicate tomorrow or so), but these don't happen here, so I don't know if we're good. I'll also check if dateutil and multimethod can be updated, but apart from these two warnings (and potentially some documentation), we seem to be almost good to go for 3.12 :)

@glatterf42
Copy link
Member Author

Re: dateutil: This is not a direct dependency of ours and we don't call the deprecated function anywhere in our code. Dependencies like pandas and jupyter do depend on dateutil, though, which has last been released 2.5 years ago. Their current master branch already contains a fix for the DeprecationWarning we are seeing, but they haven't managed to publish a new release yet, despite there being large popular demand:

So for now, if we can live with the DeprecationWarning, I'd suggest keeping track of it with a new issue, but merging this PR anyway (if the other things are dealt with).

@glatterf42
Copy link
Member Author

Re: multimethod: same here, this is a transitive dependency of ours. Pygments and pandera are using it and the issue arises due to an incompatibility of pandera 0.18.0 and multimethod 1.11.0. This issue has already been raised with pandera and a version pin introduced on their main branch, but this was just ten hours ago, so they didn't have time for a new release yet. We might get a notification about a new release by checking this PR:

In this case, it seems more likely that a patch release is coming soon, so we might not need to keep track with a new issue.

@glatterf42

This comment was marked as outdated.

@glatterf42

This comment was marked as outdated.

@glatterf42
Copy link
Member Author

The current solution is to test once if a connection to a postgresql db can be established and skip all corresponding tests if it can't. This fix is now applied for all pqsgl_mp fixtures. My main question for @meksor is: do we need

ixmp4/tests/conftest.py

Lines 31 to 32 in 1013429

# mp.backend.close()
except OperationalError:
to safely disconnect from the db? If yes, mypy would not be happy about that, I think.

@glatterf42 glatterf42 marked this pull request as ready for review February 9, 2024 12:28
@glatterf42

This comment was marked as outdated.

@meksor
Copy link
Contributor

meksor commented Feb 9, 2024

Yes, we need backend.close! Why does mypy care?

@glatterf42
Copy link
Member Author

glatterf42 commented Feb 12, 2024

Mypy is showing "Backend" has no attribute "close" Mypyattr-defined. Curiously, this error is not shown e.g. here:

ixmp4/tests/conftest.py

Lines 115 to 118 in 55a4774

@pytest.fixture
def test_pgsql_mp():
mp = Platform(_backend=PostgresTestBackend())
yield mp

And when I insert a yield mp statement between my mp = and mp.backend.close() lines, the mypy error goes away (even though yield is not allowed outside of a function). So my guess is that mypy assumes something is happening to mp after it's yielded such that it does have .backend.close() at the time we call it. Which is probably not what we want, it should probably just recognize the function the first time around.

In other news, what do you mean with 'we need backend.close'? The tests seem to have been running fine even without it.

* Update badges for python 3.12
* Replace black with ruff
@glatterf42
Copy link
Member Author

This PR should now be good to go; fixed the spelling mistake (that already existed before this PR) and updated all badges/docs in general.

Copy link
Member

@danielhuppmann danielhuppmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment, then good to be merged, as far as I can tell...

ixmp4/data/db/base.py Outdated Show resolved Hide resolved
@glatterf42

This comment was marked as outdated.

@glatterf42
Copy link
Member Author

I'm going to merge this now, please let me know if we need any cleanup and I'll set up a new PR.

@glatterf42 glatterf42 merged commit bff2c8c into main Feb 12, 2024
5 checks passed
@glatterf42 glatterf42 deleted the update/dependencies-2024-1 branch February 12, 2024 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants