Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

intbitset: harmonise pop() behaviour #626

Closed
tiborsimko opened this Issue · 7 comments

3 participants

@tiborsimko
Owner

Originally on 2011-05-03

As mentioned in #621, we may want to improve a little bit the
description and/or behaviour of intbitset's pop(). Notably, people
may be using search engine's API functions returning intbitsets that
look like lists. Here, pop() has ordered meaning for lists, while
for sets it pops any random element:

In [2]: from invenio.intbitset import intbitset

In [3]: l = [1, 10, 2, 20]

In [4]: s = set(l)

In [5]: i = intbitset(l)

In [6]: l, s, i
Out[6]: ([1, 10, 2, 20], set([1, 10, 20, 2]), intbitset([1, 2, 10, 20]))

In [7]: xl, xs, xi = l.pop(), s.pop(), i.pop()

In [8]: l, s, i
Out[8]: ([1, 10, 2], set([10, 20, 2]), intbitset([2, 10, 20]))

In [9]: xl, xs, xi
Out[9]: (20, 1, 1)

The difference between lists and intbitsets is strictly taken OK,
because intbitsets emulate the API of sets, so pop() removes an
arbitrary set element. However, behind the scenes intbitset's pop()
calls intBitSetGetNext() that does an ordered removal, not an
"arbitrary" removal; so we can document this better for end users.

More to the point, intbitset has a native notion of element order,
being a set of increasing integers; it does resemble ordered lists of
integers in this respect. intbitset can be considered as a kind of
ordered set of increasing integers that emulates set API, so having
some facets of lists and some facets of sets, as it were.

Therefore we may want to improve the docstring of intbitset's pop()
in order to reflect this mixed nature of intbitsets: (i) at least by
documenting the non-arbitrary parts, but (ii) more to the point, we
may want to alter perhaps the meaning of what pop() returns, so that
intbitset would resemble lists more (i.e. returning last, not first
element). It will still be an "arbitrary" removal from the set API
point of view, but it will resemble more to what people may be used to
from the list API point of view, if they think of intbitsets as of
lists of increasing integers.

P.S. Not thinking here about list-specific calls like pop(n).

@kaplun
Collaborator

Originally on 2012-07-05

I agree with you on the non arbitrary implementation of pop() in intbitset. However, for performance reasons, it might be nicer to still return the smaller bit, because in order to find the bit to return, the implementation still has to scroll the whole bitset.

  • If it has to return the smallest bit it can stop the scrolling as soon as possible
  • If it has to return the largest bit it has to stop as late as possible, hence with a performance impact.

So what if we fully document this behavior? Alternatively one can imagine to add a flag to the .pop() function saying:

def pop(last=False):
    pass

In this way the faster implementation is used by default, but one can still use the slower and more stack-friendly one.

@invenio-developers
Collaborator

Originally by Samuele Kaplun samuele.kaplun@cern.ch on 2012-07-06

In [6ad9956]:

#CommitTicketReference repository="" revision="6ad995694e33b9d5d83fe5b09bf988f85050e2b3"
intbitset: pop last element

- When pop is called on a list, the last element is returned. When
  it is called on a set, an arbitrary element is returned. So far
  intbitset was returning the smallest element. In order to expose
  a behaviour more similar to the expectation of an ordered list,
  intbitset now returns the largest element.
  (closes #626)
@invenio-developers
Collaborator

Originally by Nikola Yolov nikola.yolov@cern.ch on 2012-08-03

In 6511a02:

#CommitTicketReference repository="" revision="6511a023ef60bf53a289e13257e8053e939604c6"
intbitset: fixes pop() function

- Fixes pop() function.  (addresses #626)
@invenio-developers
Collaborator

Originally by Samuele Kaplun samuele.kaplun@cern.ch on 2012-08-09

In 6ad9956:

#CommitTicketReference repository="" revision="6ad995694e33b9d5d83fe5b09bf988f85050e2b3"
intbitset: pop last element

- When pop is called on a list, the last element is returned. When
  it is called on a set, an arbitrary element is returned. So far
  intbitset was returning the smallest element. In order to expose
  a behaviour more similar to the expectation of an ordered list,
  intbitset now returns the largest element.
  (closes #626)
@invenio-developers
Collaborator

Originally by Nikola Yolov nikola.yolov@cern.ch on 2012-08-09

In 6511a02:

#CommitTicketReference repository="" revision="6511a023ef60bf53a289e13257e8053e939604c6"
intbitset: fixes pop() function

- Fixes pop() function.  (addresses #626)
@invenio-developers
Collaborator

Originally by Samuele Kaplun samuele.kaplun@cern.ch on 2012-08-09

In 6ad9956:

#CommitTicketReference repository="" revision="6ad995694e33b9d5d83fe5b09bf988f85050e2b3"
intbitset: pop last element

- When pop is called on a list, the last element is returned. When
  it is called on a set, an arbitrary element is returned. So far
  intbitset was returning the smallest element. In order to expose
  a behaviour more similar to the expectation of an ordered list,
  intbitset now returns the largest element.
  (closes #626)
@invenio-developers
Collaborator

Originally by Nikola Yolov nikola.yolov@cern.ch on 2012-08-09

In 6511a02:

#CommitTicketReference repository="" revision="6511a023ef60bf53a289e13257e8053e939604c6"
intbitset: fixes pop() function

- Fixes pop() function.  (addresses #626)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.