Conform to set API, update to pytest, add CI badges #27
Conversation
Thanks for all the improvements! I think most of this sounds like a good idea.

I see the tests marked with "doctest want" where the outputs aren't being checked. I'd say go ahead and make them into real doctests. Having code examples in the docstrings that are automatically checked is pretty nice.

I can see the advantage of using Travis -- a public-facing CI for a public-facing project. I haven't actually used it before, only Jenkins, but it sounds easy enough to use. I think you're going to have to sell me on adding both Travis and Appveyor -- what additional benefit does Appveyor provide? Will I have to get an Appveyor account and manage it? This seems like quite a proliferation of tooling for a simple module.
Actually I meant to remove Appveyor from this PR, but forgot. I think Travis is probably sufficient. The main difference is that one runs code on Linux and the other runs on Windows. I don't see any reason why there would be a difference in this application (although I've been burned by that mindset before). I've gone ahead and removed the Appveyor badge. I've also changed all of the usernames to target your repo.

As for Travis CI, you simply go to their website: https://travis-ci.org/. You don't actually have to create a new account. When you click "Sign Up", it gives you the option to simply link to your existing GitHub account. Then you just find this repository and toggle enable CI testing. That's it. New commits and PRs will automatically trigger the appropriate build as long as your repo contains a `.travis.yml` file.

Something I forgot to mention is that there is also a badge in there for https://codecov.io. You can create an account for that the same way as you would for Travis, but you don't have to enable the project. This is because codecov.io has tight Travis integration and will detect when it performs a CI build that produces a coverage file.

Speaking of coverage, I've gone through and patched a few holes to bring this repo from ~97% coverage to 100% coverage. These were mostly due to not testing that type errors were raised.

I've changed all of the unit tests into doctests. Note, the doctests were originally targeted towards xdoctest (a shameless self plug). I converted them to work with "old-style" builtin doctests so as not to add any new dependencies to this simple lib.

Lastly, I wanted to bring up two other issues I saw when going through the code again: (1) when a method makes a copy, should it return another object of the same class? And (2) the equality comparisons seem inconsistent.

Let me know on the last two points. If you like the idea then I'll make the changes. Otherwise I think the PR is good to go.
I think your view of how doctests should work is different from mine. I don't think that all the tests should be in the docstrings. Seeing doctests with `assert` and pytest decorators is very strange to me. To me, the purpose of doctests is that docstrings should provide usage examples, and when you have usage examples, you should test that they are correct. Also, I'm not sure about this, but I seem to recall that "old-school" doctest doesn't want the doctests to be indented by 4 spaces; instead it wants them to be at the same indentation level and separated by blank lines.
I tend to be fairly liberal with my doctests (in order to develop more rapidly, I usually just paste whatever test code I wrote in IPython into the code as a first-pass test), but I understand your point, especially for a package like this that is not in rapid development. I thought you were saying that I could leave what I had as-is, but I'll simply move the ones that don't provide a simple example back into the tests.py file.

The current doctests in this PR work with the builtin Python doctest module. The indentation before the `>>>` prompt isn't an issue; doctest strips the common leading whitespace from each example.

I'll make this change later tonight. Again, let me know about points (1) and (2).
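To make the distinction concrete, here is a minimal sketch of a docstring example that the builtin doctest module can check automatically. The function and its docstring are hypothetical, purely for illustration; note the example is indented inside the docstring, which doctest handles fine:

```python
import doctest


def pairwise_sum(a, b):
    """
    Add two sequences elementwise.

    Example:
        >>> pairwise_sum([1, 2], [3, 4])
        [4, 6]
    """
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    # doctest finds the indented ">>>" example inside the docstring,
    # runs it, and compares the actual output to the expected output.
    results = doctest.testmod()
    assert results.failed == 0
```

Running the module directly (or via `python -m doctest thefile.py`) verifies the example without any extra dependencies.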
(1) Having the methods return another object of the same class sounds like a good idea.

(2) I agree that equality is inconsistent here, but it wouldn't work to ignore order in all comparisons. OrderedSets in different orders should be unequal. I frequently use OrderedSet when I need a bidirectional mapping between a sparse vocabulary and a dense array, so I can use fast NumPy operations but also keep track of what the data means. This use case is somewhat like an index. I sometimes need to test these vocabularies for equality, when merging data. If you have arrays labeled with the same vocabulary in the same order, you can just concatenate them on that axis. If the vocabularies are in a different order, you can't.

Another person who contributed via GitHub, Andrew Magee, added the case that allows comparing an OrderedSet to a built-in set for equality. I believe the main justification for this is that OrderedSet is supposed to be a subclass of set (actually of MutableSet). So the use cases of OrderedSet include both ordered and unordered equality.

Maybe there could be more cases for lists and tuples; if I did end up comparing an OrderedSet to a list, then the least surprising and most useful thing would be to preserve order. But I can't think of anything more principled than just checking a bunch of types and deciding whether they're ordered or not.
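The vocabulary-as-index use case described above can be sketched with plain lists and a dict (all names here are hypothetical illustrations, not part of the ordered-set API):

```python
# Toy sketch of the "ordered vocabulary" use case: each term maps to a
# dense integer index, so parallel arrays can be labeled by position.

vocab_a = ["cat", "dog", "fish"]      # order defines the array columns
vocab_b = ["cat", "dog", "fish"]      # same items, same order
vocab_c = ["dog", "cat", "fish"]      # same items, different order

# The bidirectional mapping: term -> index (the list itself is index -> term)
index_a = {term: i for i, term in enumerate(vocab_a)}

# Arrays labeled by the same vocabulary *in the same order* can be
# concatenated along that axis; a reordered vocabulary cannot.
assert vocab_a == vocab_b             # order-sensitive: safe to concatenate
assert vocab_a != vocab_c             # order-sensitive: not safe
assert set(vocab_a) == set(vocab_c)   # unordered contents still match
assert index_a["dog"] == 1
```

This is why order-sensitive equality matters for this use case even though the unordered contents are identical.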
All of this makes sense. I can think of a principled approach to the ordered-check problem. Borrowing ideas from PR #24, we can use multiple inheritance to inherit from both `Sequence` and `Set`. I've gone ahead and implemented this because I think it's a nice idea, but I can revert it if you disagree.

One bit of the implementation I was unsure of. Should `OrderedSet([1, 2, 3]) == [1, 1, 2, 2, 3, 3]` return True or False? My instinct is that it should return False. My current implementation uses this behavior. Overall I think the new behavior is more consistent.
I've added 4 new tests to tests.py to make sure all of these cases are handled properly. All tests pass (on my machine) on Python 2.7 and Python 3.6, and the code has 100% coverage. If you like the way that I've addressed the aforementioned issues then I feel this PR is in a mergeable state. If not, let's discuss.
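A hedged sketch of the comparison logic discussed above, using the `collections.abc` base classes to decide whether the other operand is ordered. This is a toy stand-in, not the actual OrderedSet implementation, and the real code may differ in details:

```python
from collections.abc import Sequence, Set


class MiniOrderedSet(Sequence, Set):
    """Tiny stand-in for OrderedSet, just enough to show the __eq__ idea."""

    def __init__(self, iterable=()):
        self._items = []
        seen = set()
        for item in iterable:
            if item not in seen:        # preserve first-seen order, no dups
                seen.add(item)
                self._items.append(item)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __eq__(self, other):
        if isinstance(other, Sequence):
            # Ordered operand: compare elementwise, so a list with
            # duplicated elements is NOT equal to the deduplicated set.
            return list(self) == list(other)
        if isinstance(other, Set):
            # Unordered operand (set, frozenset): compare contents only.
            return set(self) == set(other)
        return NotImplemented


assert MiniOrderedSet([1, 2, 3]) == [1, 2, 3]
assert not (MiniOrderedSet([1, 2, 3]) == [1, 1, 2, 2, 3, 3])  # False, per above
assert not (MiniOrderedSet([1, 2, 3]) == [3, 2, 1])           # order matters
assert MiniOrderedSet([1, 2, 3]) == {3, 2, 1}                 # order ignored
```

The abstract base classes do the "is it ordered?" check: builtin `list` and `tuple` register as `Sequence`, while `set` and `frozenset` register as `Set`, so no hardcoded type list is needed.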
That seems like a pretty good idea. We were basically implementing Sequence anyway; might as well inherit from the base class. When testing the methods that come from the base classes, I noticed an inconsistency from the fact that we have this order-sensitive equality.
But that's been there since before this PR. I need to make |
And, I agree that |
This is a fairly large PR in terms of text, but I think the content is manageable for review.
It seems like this is the best implementation of OrderedSet on PyPI, but unfortunately it is missing functions implemented by the builtin set object like `intersection`, `symmetric_difference_update`, and `issubset`. This PR adds implementations for these functions.

A brief summary of changes is: conform to the `set` API.

As for reasoning behind these changes: the `__getitem__` change may be the most debatable. The previous docs stated that `OrderedSet()[:]` would return an exact self reference, but I want to argue against this behavior. The main point is that this deviates from a standard behavior taken by other mutable Python objects. Hopefully this code snippet illustrates my point.

A potential counter point may be that `tuple([1, 2, 3])[:]` does return a reference to itself, but `tuple` is an immutable object, whereas `OrderedSet` is not.

Unfortunately, even though this change is relatively minor, it is a potentially breaking change that requires a major version update. My PR includes that as well. If I fail to convince you that this is a good idea, it can obviously be taken out and the minor version can be incremented.
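The code snippet referred to above is not reproduced here; a sketch of the same argument using builtin types (CPython behavior):

```python
# A mutable object's full slice returns a *copy*, not the object itself,
# so mutating the slice does not affect the original.
data = [1, 2, 3]
view = data[:]
assert view == data
assert view is not data        # list slicing copies
view.append(4)
assert data == [1, 2, 3]       # original untouched

# By contrast, slicing an immutable tuple can safely return the same
# object, and CPython does exactly that for a full tuple slice.
t = (1, 2, 3)
assert t[:] is t
```

Under this reading, `OrderedSet()[:]` returning `self` breaks the convention that slicing a mutable container yields an independent copy.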
Note: I'm submitting the PR in its current state to gauge interest. Some of the code --- especially the tests --- needs a bit of tweaking to match the style of the other code. I will do this work if the maintainers like the general changes.
I have my own implementation of OrderedSet, also based on Raymond Hettinger's original recipe, in my utility package, but I'd like to rely on a standard PyPI package instead. Also, I agree that nobody wants O(1) deletes in exchange for O(N) lookups. Thus most of this PR was simply copied and pasted from my implementation.
All of the new tests were autogenerated from the doctests in my old package. As such, the stdout way of testing results doesn't work, and all tests would need to be updated to actually check for the right values. Lastly, many of the tests might be redundant with existing tests. Again, I'll do the work to make these fixes if there is any interest in merging this PR.
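The complexity tradeoff mentioned above can be sketched as follows. All names here are illustrative, not the actual ordered-set implementation: keeping an item-to-position dict alongside the list gives O(1) membership and index lookups, while an arbitrary delete costs O(N) to repair the positions:

```python
class IndexedOrderedSet:
    """Toy ordered set: O(1) lookups, O(N) deletes."""

    def __init__(self, iterable=()):
        self.items = []
        self.map = {}                      # item -> index in self.items
        for item in iterable:
            self.add(item)

    def add(self, item):
        if item not in self.map:           # O(1) membership check
            self.map[item] = len(self.items)
            self.items.append(item)

    def index(self, item):
        return self.map[item]              # O(1) index lookup

    def discard(self, item):
        i = self.map.pop(item, None)
        if i is not None:
            del self.items[i]
            # O(N): every item after the removed position must be reindexed
            for j, later in enumerate(self.items[i:], start=i):
                self.map[later] = j


s = IndexedOrderedSet([10, 20, 30])
assert s.index(30) == 2
s.discard(20)
assert s.items == [10, 30]
assert s.index(30) == 1
```

A linked-list design (as in some OrderedSet recipes) inverts this tradeoff: O(1) deletes but O(N) positional lookups, which is the exchange the comment above argues against.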
A summary of the cleanup work that would need to be done is: use `LuminosoInsight` instead of `Erotemic`.