Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SortedContainers 2.0 broke compatibility with no warning #92

Closed
ghost opened this issue May 21, 2018 · 20 comments
Closed

SortedContainers 2.0 broke compatibility with no warning #92

ghost opened this issue May 21, 2018 · 20 comments

Comments

@ghost
Copy link

ghost commented May 21, 2018

Hello,
You may not view this as a bug, but it was a big surprise for me! SortedDict() .keys() used to return sets that could do .intersect() operations. It must now be converted from SortedKeysView (or something?) to a set() and this uses tons of memory and time. The iloc method was also removed. I only got that far before I realized the build I was troubleshooting had a newer version than dev and was broken because of it. Please put a "breaking changes" section into the documentation when making these sorts of things! Or better yet, don't go fixing what isn't broken and also remove functionality :-D

@observerss
Copy link

observerss commented May 21, 2018

same problem, big surprise
maybe should at least doc the change and reasoning and an upgrade guide?

Update: found doc in HISTORY.rst

@mrocklin
Copy link

Same problem. I and many users that I support depend on sortedcontainers. I'm happy to see things modernize but would appreciate a deprecation cycle so that downstream projects can respond sensibly. All of my users that are now installing things are getting failures without warning.

I would strongly appreciate the rapid removal of the offending package from PyPI and the upload of a new one that raised deprecation warnings.

@grantjenks
Copy link
Owner

Hello, all. Just woke up. Will look at this immediately. Sorry for the surprise breaks.

I have a busy day/week so the easiest current solution that I can see is to remove the V2 uploads from PyPI. Is there any way to leave them there and make v1.5.10 preferred?

@mrocklin
Copy link

Is there any way to leave them there and make v1.5.10 preferred?

I've seen people host pre-releases on PyPI so there must be a way to have newer-but-not-default packages around.

@grantjenks
Copy link
Owner

grantjenks commented May 21, 2018

Is it just iloc that you need? I can push a quick fix for that.

sorted_dict.iloc = sorted_dict.keys()

@grantjenks
Copy link
Owner

grantjenks commented May 21, 2018

@LipillaiDave Can you be more specific? The SortedKeysView inherits collections.KeysView which does implement a set-like interface.

Also breaking changes are documented in the release history at http://www.grantjenks.com/docs/sortedcontainers/history.html

Also upgrade guide at http://www.grantjenks.com/docs/sortedcontainers/introduction.html#migrating

@mrocklin
Copy link

Yes, iloc is specifically what has broken dask.distributed. However I'm more than happy to evolve towards a newer API that sortedcontainers devs think is cleaner long term. I just need some time to smooth over the transition.

In my ideal world iloc would continue to work for another version or two, but raise a warning the first time it was used (not every time, which would flood users' logs) warning people of the coming change and pointing them towards correct behavior. Probably some user would note this in downstream projects' issue trackers, which would then compel those projects to adapt to the change.

Of course, if you think that iloc is something that is useful to keep then I wouldn't mind that either.

@grantjenks
Copy link
Owner

grantjenks commented May 21, 2018

I get what you're saying. It sounds nice. But it will take me a fair amount of time to learn how to do that. I can probably get there by the end of next week. For today, let's figure out the best way forward.

  1. I think iloc should go away. The functionality is now in SortedKeysView which you can get with sorted_dict.keys(). There are V2 uploads on PyPI that I know folks have started using.
  2. I can put iloc back in as a temporary measure to un-break things. But beware that other methods like .items() and .values() also changed (all return dict views now).
  3. I can alternatively remove the V2 uploads from PyPI. I can't see a setting that marks them pre-release. Can you point to docs on that? This will make another group of users unhappy as they've moved to V2. Hopefully not many. I released V2 on Friday, May 18.
  4. Is there a way to change the sortedcontainers version to "<2" temporarily in your code?

I numbered things above just for reference. Don't think of them sequentially.

@mrocklin
Copy link

  1. I use iloc frequently within inner loops. Is there a performance hit here?

  2. I'm not particularly concerned about changes that reflect python 2/3 changes. In my experience most packages are already robust to this. If there are differences beyond what is typical in dicts when moving from Python 2/3 then I'm not sure how best to handle this.

  3. This might help? https://stackoverflow.com/questions/35131861/what-does-the-pre-option-in-pip-signify?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

    I'm unfortunately just as inexperienced here

  4. Right, so my atlernative solution if you keep the package up is to upload a new package on my end that pins away from sortedcontainers as you suggest. I'm assuming that other projects also depend on sortedcontainers though and so it makes more sense to handle this upstream rather than in all of the downstream packages. This might not be the case though. Sortedcontainers seems like it was built to be an infrastructural library, so I've been treating it like one in the past.

@ghost
Copy link
Author

ghost commented May 21, 2018

Hi Grant,

Sorry if I seemed crabby. I'm extremely grateful for SortedCollections. It was just past 1AM and I was woke up from sleep troubleshooting remote customers.

I'm not opposed to it changing moving forward, although I'd like to present a case for keeping iloc. Is there any real reason to remove it? SortedContainers is a feature- and performance-plus package. It makes sense to emulate Python3 style, sure, but why remove what is working well and fast for those who use it for the right reasons?

In regards to your question about intersections, this line of code was getting the following error:

  File "thingy.py", line 558, in resample_query_sample_time_set
    times = ts.sampled_series_json.keys().intersection(set(xrange(time_start,time_end,60)))
AttributeError: 'SortedKeysView' object has no attribute 'intersection'

@grantjenks
Copy link
Owner

@LipillaiDave Thank you for the kind words. That was an unintended regression. Sorry! I will look into it today.

@mrocklin -- I can add .iloc back with no performance regression. I will try to do so this morning.

I am teaching class all day so I may be less responsive then I would like. I will update at 1pm today (lunch) and see what I can get done this morning.

I appreciate you all as users and your feedback!

@mrocklin
Copy link

mrocklin commented May 21, 2018 via email

@grantjenks
Copy link
Owner

grantjenks commented May 21, 2018

I added back the sorted_dict.iloc attribute. Committed at 0ac84ed and deployed at v2.0.2. @mrocklin Does that fix your problem?

@LipillaiDave The KeysView does implement the set operations but as operators rather than as methods. Can you change your ".intersect" code to use "val1 & val2"?

@grantjenks
Copy link
Owner

@LipillaiDave Looks like you're using Python 2 and executing https://github.com/grantjenks/python-sortedcontainers/blob/v1/sortedcontainers/sorteddict.py#L224 from v1. In v1 on Python 2, the sorted_dict.keys() methods returns a SortedSet which implements the intersect method. Given the source like:

times = ts.sampled_series_json.keys().intersection(set(xrange(time_start,time_end,60)))

I think you can rewrite it as:

times = ts.sampled_series_json.keys() & xrange(time_start, time_end, 60)

Now, the times variable will be a set object rather than SortedSet. Do you need a set or SortedSet? That can be easily changed. You can work around the bug by wrapping the result like:

times = SortedSet(ts.sampled_series_json.keys() & xrange(time_start, time_end, 60))

Now I'm just guessing but I think you might've implemented something like:

times = SortedSet(ts.sampled_series_json.keys()).intersection(xrange(time_start, time_end, 60))

That is indeed slow as iterating the SortedKeysView is currently an O(n*log(n)) operation. That should be made faster (sorry!) by defining an optimized iteration method instead of using Sequence.__iter__.

So I think the necessary changes are:

  1. Set operations for SortedKeysView and SortedItemsView should return a SortedSet.
  2. Iterating a SortedKeysView, SortedValuesView, or SortedItemsView should be as fast as possible.

Would that fix your issues?

@grantjenks
Copy link
Owner

I think (2, above) is already done. Iterating a KeysView iterates the mapping directly which iterates the sorted list which should be extremely fast.

For reference: https://github.com/python/cpython/blob/3.6/Lib/_collections_abc.py#L719

Could you say more about, "and this uses tons of memory and time," in your original report?

@mrocklin
Copy link

I added back the sorted_dict.iloc attribute. Committed at 0ac84ed and deployed at v2.0.2. @mrocklin Does that fix your problem?

Dask's test suite does pass again, yes.

@mrocklin
Copy link

Should we plan to deprecate iloc in the future?

@grantjenks
Copy link
Owner

@mrocklin Yes, consider SortedDict.iloc deprecated. I'm working on changes to emit the warning.

@grantjenks
Copy link
Owner

Deprecation warning emitted at 2a00449.

SortedKeysView and SortedItemsView set-operations returned sorted sets at 28e99c0.

I plan to release the changes in v2.0.3 this week.

@grantjenks
Copy link
Owner

Fixes deployed in v2.0.3 on PyPI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants