This package provides a function-based API for filtering collections of heterogeneous, nested dictionaries or complex objects. It has 100% test coverage.
At the core of the API is the q_filter function, which is like the built-in filter function, but takes any number of predicate functions rather than just one.
The remainder of the functions in this package are used to construct predicates that evaluate items or attributes within filtered objects.
Inspired by the more class-based QueryableList.
This package is best suited to nested, heterogeneous data that one might find in a serialised HTTP response body.
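The relationship between q_filter and the built-in filter can be sketched in plain Python. This is only an illustration of the idea, not the package's implementation; filter_all is a hypothetical name chosen for this example.

```python
from typing import Any, Callable, Iterable, Iterator


def filter_all(objects: Iterable[Any], *preds: Callable[[Any], bool]) -> Iterator[Any]:
    """Hypothetical stand-in: keep the items that satisfy every predicate."""
    return filter(lambda obj: all(pred(obj) for pred in preds), objects)


# Two predicates applied in one pass, where the built-in filter takes only one.
evens_over_ten = list(
    filter_all(range(1, 21), lambda n: n % 2 == 0, lambda n: n > 10)
)
print(evens_over_ten)
```

Like the built-in filter, the sketch returns a lazy iterator, so it has to be wrapped in list to see the results.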
pip install query-filter
In the next few examples, we'll be filtering a typical response from boto3, the Python client for Amazon Web Services.
If we want to get data that have AssociatePublicIpAddress set to True, we can do the following:
>>> from query_filter import q_filter, q
>>> results = q_filter(
        versions_data["LaunchTemplateVersions"],
        q["LaunchTemplateData"]["NetworkInterfaces"][0]["AssociatePublicIpAddress"]
    )
>>> results
<filter at 0x7f3515cba240>
>>> list(results)
[{'CreateTime': datetime.datetime(2017, 11, 20, 12, 52, 33),
'DefaultVersion': True,
'LaunchTemplateData': {'ImageId': 'ami-aabbcc11',
'KeyName': 'kp-us-east',
'NetworkInterfaces': [{'AssociatePublicIpAddress': True,
'DeleteOnTermination': False,
'DeviceIndex': 0,
'Groups': ['sg-7c227019'],
'SubnetId': 'subnet-7b16de0c',
'PrivateIpAddress': '80.141.44.12'}],
'UserData': ''},
'CreditSpecification': {'CpuCredits': 'standard'},
'CpuOptions': {'CoreCount': 1, 'ThreadsPerCore': 2},
'LaunchTemplateId': 'lt-068f72b72934aff71',
'VersionNumber': 1}]
The filter above doesn't use == True but rather checks the truthiness of the "AssociatePublicIpAddress" key's value.
The equivalent generator expression for a simple query like this is less readable.
>>> from typing import Collection
>>> results = (
        version for version in versions_data["LaunchTemplateVersions"]
        if version.get("LaunchTemplateData", {}).get("NetworkInterfaces") and
        isinstance(version["LaunchTemplateData"]["NetworkInterfaces"], Collection) and
        version["LaunchTemplateData"]["NetworkInterfaces"][0].get("AssociatePublicIpAddress")
    )
This example is excessively defensive, but hopefully it explains the motivation behind this tool.
A get call is needed in the generator expression above because the item "AssociatePublicIpAddress" is sometimes missing.
The first two conditions aren't strictly needed to filter the example data.
However, they do illustrate the fact that q_item predicates silently return False if "LaunchTemplateData" is not present, or if "NetworkInterfaces" is missing, is not a collection or is an empty collection.
We can combine custom queries with those created with the help of this package. The following predicate can be used to ensure that the launch template versions specify a sufficient number of threads.
def threads_gte(min_threads: int):
    def pred(version: dict):
        cores = version["CpuOptions"]["CoreCount"]
        threads = version["CpuOptions"]["ThreadsPerCore"]
        return cores * threads >= min_threads
    return pred
Here we're using q_any, which combines the predicates passed into it, returning True if at least one of them is satisfied.
>>> from query_filter import q, q_any, q_filter
>>> results = q_filter(
        versions_data["LaunchTemplateVersions"],
        q_any(
            threads_gte(5),
            q["CreditSpecification"]["CpuCredits"] == "unlimited"
        )
    )
>>> list(results)
[{'CreateTime': datetime.datetime(2017, 11, 20, 15, 45, 33),
'DefaultVersion': False,
'LaunchTemplateData': {'ImageId': 'ami-cc3e8abf',
'KeyName': 'kp-us-east',
'NetworkInterfaces': [{'DeviceIndex': 0,
'Groups': ['sg-7c227019'],
'SubnetId': 'subnet-a4579fe6',
'Ipv6Addresses': [{'Ipv6Address': '4f08:ea60:17f9:3e89:4d66:2e8c:259c:d1a9'},
{'Ipv6Address': 'b635:26ad:8fdf:a274:88dc:cf8c:47df:26b7'},
{'Ipv6Address': 'eb7a:5a31:f899:dd8c:e566:3307:a45e:dcf6'}],
'Ipv6AddressCount': 3,
'PrivateIpAddress': '80.141.152.14'}]},
'CpuOptions': {'CoreCount': 4, 'ThreadsPerCore': 1},
'CreditSpecification': {'CpuCredits': 'unlimited'},
'LaunchTemplateId': 'lt-aaa68831cce2a8d91',
'VersionNumber': 4},
{'CreateTime': datetime.datetime(2017, 11, 20, 19, 4, 54),
'DefaultVersion': False,
'LaunchTemplateData': {'ImageId': 'ami-2f7ac02a',
'KeyName': 'kp-us-east',
'NetworkInterfaces': [{'DeviceIndex': 0,
'Groups': ['sg-1c628b25'],
'SubnetId': 'subnet-a4579fe6',
'Ipv6Addresses': [{'Ipv6Address': 'f486:915c:2be9:b0da:7d60:3fae:d65a:e8d8'},
{'Ipv6Address': 'eb7a:5a31:f899:dd8c:e566:3307:a45e:dcf6'}],
'Ipv6AddressCount': 2,
'PrivateIpAddress': '80.141.152.136'}]},
'CpuOptions': {'CoreCount': 3, 'ThreadsPerCore': 2},
'CreditSpecification': {'CpuCredits': 'standard'},
'LaunchTemplateId': 'lt-aaa68831cce2a8d91',
'VersionNumber': 5}]
Filtering on attributes can be useful if you're working with objects that have a lot of "has-a" relationships to other objects. For brevity, a hacky binary tree-like class is used to build a fictional ancestor chart.
>>> class Node:
        instances = []

        def __init__(self, name, mother=None, father=None):
            self.name = name
            self.mother = mother
            self.father = father
            self.instances.append(self)

        def __repr__(self):
            return (f"Node('{self.name}', mother={repr(self.mother)}, "
                    f"father={repr(self.father)})")
>>> Node(name='Tiya Meadows',
         mother=Node('Isobel Meadows (nee Walsh)',
                     mother=Node(name='Laura Walsh (nee Stanton)',
                                 mother=Node('Opal Eastwood (nee Plant)'),
                                 father=Node('Alan Eastwood')),
                     father=Node(name='Jimmy Walsh')),
         father=Node(name='Isaac Meadows',
                     mother=Node('Halle Meadows (nee Perkins)'),
                     father=Node('Wilbur Meadows')))
To demonstrate the syntax, we can filter for the root node by their great-grandmother.
>>> from query_filter import q, q_contains, q_filter
>>> results = q_filter(
        Node.instances,
        q_contains(q.mother.mother.mother.name, "Opal Eastwood")
    )
>>> list(results)
[Node('Tiya Meadows', mother=Node('Isobel Meadows (nee Walsh)', mother=Node('Laura Walsh (nee Stanton)', mother=Node('Opal Eastwood (nee Plant)', mother=None, father=None), father=Node('Alan Eastwood', mother=None, father=None)), father=Node('Jimmy Walsh', mother=None, father=None)), father=Node('Isaac Meadows', mother=Node('Halle Meadows (nee Perkins)', mother=None, father=None), father=Node('Wilbur Meadows', mother=None, father=None)))]
q_contains above is the equivalent of evaluating the expression "Opal Eastwood" in node.mother.mother.mother.name for each node in Node.instances.
It is one of several functions that enable us to create queries based on operators that cannot be overloaded in the same way as the comparison operators.
Here is another example:
>>> from query_filter import q, q_is_not, q_matches_regex, q_filter
>>> results = q_filter(Node.instances,
                       q_matches_regex(q.name, r"Walsh(?! \(nee)"),
                       q_is_not(q.father, None))
>>> list(results)
[Node('Isobel Meadows (nee Walsh)', mother=Node('Laura Walsh (nee Stanton)', mother=Node('Opal Eastwood (nee Plant)', mother=None, father=None), father=Node('Alan Eastwood', mother=None, father=None)), father=Node('Jimmy Walsh', mother=None, father=None))]
query_filter.q_filter
This is an alias for query_filter.q_filter_all.
query_filter.q_filter_all(objects: Iterable, *preds) -> Iterable[Any]
Returns a filter iterator containing objects for which all of the predicates in preds are true.
query_filter.q_filter_any(objects: Iterable, *preds) -> Iterable[Any]
Returns a filter iterator containing objects for which any of the predicates in preds is true.
query_filter.q_filter_not_any(objects: Iterable, *preds) -> Iterable[Any]
Returns a filter iterator containing objects for which none of the predicates in preds is true.
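The difference between the "any" and "not any" variants can be illustrated with the built-in filter. The names filter_any and filter_not_any are hypothetical stand-ins used only in this sketch; they are not the package's functions.

```python
from typing import Any, Callable, Iterable, Iterator


def filter_any(objects: Iterable[Any], *preds: Callable[[Any], bool]) -> Iterator[Any]:
    """Keep the items for which at least one predicate is true."""
    return filter(lambda obj: any(pred(obj) for pred in preds), objects)


def filter_not_any(objects: Iterable[Any], *preds: Callable[[Any], bool]) -> Iterator[Any]:
    """Keep the items for which no predicate is true."""
    return filter(lambda obj: not any(pred(obj) for pred in preds), objects)


words = ["alpha", "beta", "gamma", "delta"]
starts_with_a = lambda word: word.startswith("a")
four_letters = lambda word: len(word) == 4

kept = list(filter_any(words, starts_with_a, four_letters))
rejected = list(filter_not_any(words, starts_with_a, four_letters))
print(kept, rejected)
```

With the same predicates, the two sketches split the input into complementary groups.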
query_filter.q_all(*preds: Callable) -> Callable
Returns a predicate that returns True if all predicates in preds return True.
query_filter.q_any(*preds: Callable) -> Callable
Returns a predicate that returns True if any predicate in preds returns True.
query_filter.q_not(pred: Callable) -> Callable
Returns a predicate that returns True if the predicate pred returns False.
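Predicate combinators like the three above can be written as small closures. The sketch below uses the hypothetical names all_of, any_of and negate to show the general shape; it is not the package's source.

```python
from typing import Any, Callable

Predicate = Callable[[Any], bool]


def all_of(*preds: Predicate) -> Predicate:
    """Combine predicates so that every one of them must hold."""
    return lambda obj: all(pred(obj) for pred in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Combine predicates so that at least one of them must hold."""
    return lambda obj: any(pred(obj) for pred in preds)


def negate(pred: Predicate) -> Predicate:
    """Invert the result of a single predicate."""
    return lambda obj: not pred(obj)


is_even = lambda n: n % 2 == 0
is_large = lambda n: n > 10

numbers = (3, 8, 12, 15)
even_or_large = [n for n in numbers if any_of(is_even, is_large)(n)]
not_even_and_large = [n for n in numbers if negate(all_of(is_even, is_large))(n)]
print(even_or_large, not_even_and_large)
```

Because each combinator returns an ordinary callable, the results can be nested freely or passed to the built-in filter.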
The Query class, an instance of which is always imported as q, is used to specify attribute and item access.
It provides a way of specifying lookups on objects. For instance, this could be used to filter for orders created in May:
>>> results = q_filter(orders, q.metadata['date_created'].month == 5)
The class supports some operators which offer the most convenient API for building queries.
The Query class supports all six comparison operators: <, <=, ==, !=, > and >=.
The bitwise not operator ~ negates the truthiness of the Query object.
For example, q.is_active will produce a predicate that returns True if an object has an attribute named is_active and that attribute's value is truthy.
~q.is_active will produce the opposite result.
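To show how operator overloading can turn an expression such as ~q.is_active into a predicate, here is a much-simplified sketch. MiniQuery is a hypothetical class written for this example and is not the package's Query class; in particular, treating a failed lookup as False before negation is an assumption.

```python
from types import SimpleNamespace
from typing import Any, Callable, Tuple


class MiniQuery:
    """Hypothetical illustration: records a lookup path and replays it on each object."""

    def __init__(self, path: Tuple[Tuple[str, Any], ...] = ()):
        self._path = path

    def __getattr__(self, name: str) -> "MiniQuery":
        if name.startswith("_"):  # leave private and dunder lookups alone
            raise AttributeError(name)
        return MiniQuery(self._path + (("attr", name),))

    def __getitem__(self, key: Any) -> "MiniQuery":
        return MiniQuery(self._path + (("item", key),))

    def _predicate(self, test: Callable[[Any], Any]) -> Callable[[Any], bool]:
        def pred(obj: Any) -> bool:
            try:
                for kind, key in self._path:
                    obj = getattr(obj, key) if kind == "attr" else obj[key]
                return bool(test(obj))
            except (AttributeError, KeyError, IndexError, TypeError):
                return False  # assumption: a failed lookup counts as "no match"
        return pred

    def __call__(self, obj: Any) -> bool:  # a bare query tests truthiness
        return self._predicate(bool)(obj)

    def __invert__(self) -> Callable[[Any], bool]:  # ~q.attr
        return lambda obj: not self(obj)

    def __eq__(self, other: Any):  # q.attr == value builds a predicate
        return self._predicate(lambda value: value == other)


q = MiniQuery()
users = [
    SimpleNamespace(name="ann", is_active=True),
    SimpleNamespace(name="bob", is_active=False),
    SimpleNamespace(name="cy"),  # no is_active attribute at all
]
active = [user.name for user in filter(q.is_active, users)]
inactive = [user.name for user in filter(~q.is_active, users)]

orders = [{"metadata": {"month": 5}}, {"metadata": {"month": 6}}, {}]
may_orders = list(filter(q["metadata"]["month"] == 5, orders))
print(active, inactive, may_orders)
```

The other comparison operators would follow the same pattern as __eq__, each returning a closure rather than a boolean.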
There are some useful operators such as is that cannot be overloaded.
Most of the functions below replace these.
query_filter.q_is_in(query: Query, container: Container) -> Callable[[Any], bool]
Returns a predicate that's true if the queried object is in the container argument.
query_filter.q_contains(query: Query, member: Any) -> Callable[[Container], bool]
Returns a predicate that's true if the queried object contains the member argument.
query_filter.q_is(query: Query, criterion: Any) -> Callable[[Any], bool]
Returns a predicate that's true if the queried object is identical to the criterion object.
query_filter.q_is_not(query, criterion: Any) -> Callable[[Any], bool]
Returns a predicate that's true if the queried object is not identical to the criterion object.
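The four membership and identity predicates above correspond to the in, is and is not operators. The sketch below writes them as plain closures over a getter function; the names is_in, contains, is_ and is_not, the getter-based signature and the sample data are all hypothetical, and unlike the package's predicates this sketch does not handle missing keys.

```python
from operator import itemgetter
from typing import Any, Callable, Container

Getter = Callable[[Any], Any]


def is_in(get: Getter, container: Container) -> Callable[[Any], bool]:
    return lambda obj: get(obj) in container


def contains(get: Getter, member: Any) -> Callable[[Any], bool]:
    return lambda obj: member in get(obj)


def is_(get: Getter, criterion: Any) -> Callable[[Any], bool]:
    return lambda obj: get(obj) is criterion


def is_not(get: Getter, criterion: Any) -> Callable[[Any], bool]:
    return lambda obj: get(obj) is not criterion


# Hypothetical sample records.
people = [
    {"name": "Tiya", "father": "Isaac", "roles": ["child"]},
    {"name": "Isaac", "father": "Wilbur", "roles": ["parent", "child"]},
    {"name": "Wilbur", "father": None, "roles": ["parent"]},
]


def names(pred: Callable[[Any], bool]) -> list:
    return [person["name"] for person in people if pred(person)]


has_father = names(is_not(itemgetter("father"), None))
no_father = names(is_(itemgetter("father"), None))
parents = names(contains(itemgetter("roles"), "parent"))
selected = names(is_in(itemgetter("name"), {"Tiya", "Wilbur"}))
print(has_father, no_father, parents, selected)
```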
query_filter.q_matches_regex(query: Query, pattern: str | bytes) -> Callable[[str | bytes], bool]
This function may be convenient when working with strings and byte strings.
It returns a predicate that's true if the queried object matches the regular expression pattern argument.
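A regular-expression predicate of this kind can be sketched with the standard re module. The helper matches_regex below is hypothetical, and its use of re.search (a match anywhere in the value) is an assumption based on the Walsh example earlier in this document, where the match is not at the start of the name.

```python
import re
from operator import itemgetter
from typing import Any, Callable


def matches_regex(get_value: Callable[[Any], Any], pattern: str) -> Callable[[Any], bool]:
    """Build a predicate that searches the looked-up value for the pattern."""
    compiled = re.compile(pattern)

    def pred(obj: Any) -> bool:
        try:
            return compiled.search(get_value(obj)) is not None
        except (KeyError, TypeError):
            return False  # missing key, or a value that is not a string

    return pred


people = [
    {"name": "Isobel Meadows (nee Walsh)"},
    {"name": "Laura Walsh (nee Stanton)"},
    {"name": "Jimmy Walsh"},
    {"name": None},
    {},
]
# "Walsh" not followed by " (nee", i.e. not used as a married name.
pred = matches_regex(itemgetter("name"), r"Walsh(?! \(nee)")
walshes = [person["name"] for person in people if pred(person)]
print(walshes)
```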
If you want to run tests, you'll first need to install the package from source and make it editable. Ensuring that you're in the root directory of this repo, enter:
pip install -e .
pip install -r requirements/development.txt
pytest
To run tests with coverage:
coverage run --source "query_filter" -m pytest tests
coverage report
- Query all items in an iterable rather than just one using ...
- Build queries out of Query objects using the & and | operators
- Make silent failure when retrieving attributes and items optional