Add chunked_filter (#344) #346

Open
wants to merge 2 commits into
base: master
48 changes: 48 additions & 0 deletions boltons/iterutils.py
@@ -374,6 +374,54 @@ def chunked_iter(src, size, **kw):
return


def chunked_filter(iterable, predicate, size):
Owner

oops, based on the docstring, I think you meant to rename predicate here to key.

Author

not sure, see previous comments

    """A version of :func:`filter` which will call *key* with a chunk of the *src*.

    >>> list(chunked_filter(range(10), lambda chunk: (x % 2 == 0 for x in chunk), 5))
    [0, 2, 4, 6, 8]

    In the above example the lambda function is called twice: once with values
    0-4 and then for 5-9.

rafalkrupinski marked this conversation as resolved.
    Args:
        iterable (Iterable): Items to filter
        predicate (Callable): Bulk predicate function that accepts a list of items
Owner

predicate -> key to match description above?

            and returns an iterable of bools
        size (int): The maximum size of chunks that will be passed to the
            predicate function.

    The intended use case for this function is with external APIs,
    for all kinds of validations. Since APIs usually have limitations,
    either explicitly on the number of passed items or at least on the request size,
    large collections have to be passed in chunks.
    """

    if not is_iterable(iterable):
        raise TypeError('expected an iterable')
    size = _validate_positive_int(size, 'chunk size')

    if not callable(predicate):
        raise TypeError('expected callable key')

    def predicate_(src_):
        allow_iter = predicate(src_)
        if not is_iterable(allow_iter):
            raise TypeError('expected an iterable from key(src)')

        allow_list = list(allow_iter)
        if len(allow_list) != len(src_):
            raise ValueError('expected the iterable from key(src) has the same length as the passed chunk of items')
Owner

"has the same" -> "to have the same"

I think it's great you're thinking about this. For exceptions, these days, I usually recommend the following format to maximize debuggability:

raise ValueError("chunked_filter expected key func to return an iterable of length {src_len}, got {allow_list_len}").

Similar changes could be made to other exception messages above just by adding ", not {actually_received_type}"
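
A minimal sketch of the message format suggested above, written as two standalone helpers. The helper names `check_allow_list` and `check_callable` are hypothetical and only illustrate the wording; they are not part of the PR or of boltons.

```python
def check_allow_list(allow_list, chunk):
    """Hypothetical helper: validate the bulk predicate's result for one chunk."""
    if len(allow_list) != len(chunk):
        # Report both the expected and the received length.
        raise ValueError(
            'chunked_filter expected key func to return an iterable of'
            ' length {}, got {}'.format(len(chunk), len(allow_list))
        )
    return allow_list


def check_callable(func):
    """Hypothetical helper: the ", not <type>" suffix applied to a type check."""
    if not callable(func):
        raise TypeError(
            'expected callable key, not {}'.format(type(func).__name__)
        )
    return func
```

With this wording, a failing call states what was received, so the traceback alone is often enough to diagnose the problem.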


        return allow_list

    return (
        item
        for chunk in chunked_iter(iterable, size)
        for item, allow in zip(chunk, predicate_(chunk))
        if allow
    )
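
To illustrate the use case described in the docstring, here is a rough, self-contained sketch. `chunked_filter_sketch` is a simplified stand-in for the proposed function (it uses `itertools.islice` instead of `chunked_iter` and skips the argument checks), and `fake_bulk_validate` is a hypothetical placeholder for one request to an external API that accepts a limited number of items.

```python
from itertools import islice


def chunked_filter_sketch(iterable, predicate, size):
    """Simplified stand-in for the proposed chunked_filter (assumption, not the PR code)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        # One bulk call per chunk; the result must line up with the chunk.
        allow_list = list(predicate(chunk))
        if len(allow_list) != len(chunk):
            raise ValueError(
                'expected {} results, got {}'.format(len(chunk), len(allow_list))
            )
        for item, allow in zip(chunk, allow_list):
            if allow:
                yield item


calls = []


def fake_bulk_validate(emails):
    """Hypothetical stand-in for a single request to an external validation API."""
    calls.append(list(emails))
    return ['@' in e for e in emails]


emails = ['a@example.com', 'broken', 'b@example.com', 'c@example.com', 'nope']
valid = list(chunked_filter_sketch(emails, fake_bulk_validate, 2))
print(valid)       # ['a@example.com', 'b@example.com', 'c@example.com']
print(len(calls))  # 3 bulk calls for 5 items with a chunk size of 2
```

The point of the design is that the number of calls grows with the number of chunks rather than the number of items, while the caller still gets back a flat stream of accepted items.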


def chunk_ranges(input_size, chunk_size, input_offset=0, overlap_size=0, align=False):
    """Generates *chunk_size*-sized chunk ranges for an input with length *input_size*.
    Optionally, a start of the input can be set via *input_offset*, and
1 change: 1 addition & 0 deletions docs/iterutils.rst
@@ -18,6 +18,7 @@ present in the standard library.

.. autofunction:: chunked
.. autofunction:: chunked_iter
.. autofunction:: chunked_filter
.. autofunction:: chunk_ranges
.. autofunction:: pairwise
.. autofunction:: pairwise_iter
20 changes: 20 additions & 0 deletions tests/test_iterutils.py
@@ -511,6 +511,26 @@ def test_chunked_bytes():
    assert chunked(b'123', 2) in (['12', '3'], [b'12', b'3'])


class TestChunkedFilter(object):
    def test_not_iterable(self):
        from boltons.iterutils import chunked_filter

        with pytest.raises(TypeError):
            chunked_filter(7, lambda chunk: (True for x in chunk), 10)

    def test_size_zero(self):
        from boltons.iterutils import chunked_filter

        with pytest.raises(ValueError):
            chunked_filter((1, 2, 3), lambda chunk: (True for x in chunk), 0)

    def test_not_callable(self):
        from boltons.iterutils import chunked_filter

        with pytest.raises(TypeError):
            chunked_filter((1, 2, 3), 'allow odd numbers', 10)
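
The tests above cover only the argument checks. As a possible addition, the sketch below exercises the filtering itself and the length-mismatch path. It is an assumption about what such tests could look like, not part of the PR: the `chunked_filter` defined here is a stand-in that mirrors the generator expression in the diff so the example runs without boltons, `_chunks` is a hypothetical replacement for `chunked_iter`, and plain `try`/`except` is used where the real test file would likely use `pytest.raises`.

```python
from itertools import islice


def _chunks(iterable, size):
    """Hypothetical replacement for chunked_iter: yield lists of up to *size* items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def chunked_filter(iterable, predicate, size):
    """Stand-in mirroring the diff's generator expression (argument checks omitted)."""
    def predicate_(chunk):
        allow_list = list(predicate(chunk))
        if len(allow_list) != len(chunk):
            raise ValueError('length mismatch')
        return allow_list

    return (
        item
        for chunk in _chunks(iterable, size)
        for item, allow in zip(chunk, predicate_(chunk))
        if allow
    )


def test_filters_in_chunks():
    seen = []

    def is_even(chunk):
        seen.append(list(chunk))
        return [x % 2 == 0 for x in chunk]

    assert list(chunked_filter(range(10), is_even, 5)) == [0, 2, 4, 6, 8]
    # The predicate is called once per chunk, not once per item.
    assert seen == [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]


def test_length_mismatch_is_lazy():
    # Creating the generator does not call the predicate yet.
    result = chunked_filter((1, 2, 3), lambda chunk: [True], 10)
    try:
        list(result)
    except ValueError:
        pass
    else:
        raise AssertionError('expected ValueError')
```

Because the function returns a generator expression, the per-chunk checks inside `predicate_` run only when the result is consumed, so a test for them has to iterate over the result rather than just call the function.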


def test_chunk_ranges():
    from boltons.iterutils import chunk_ranges