Add dataclass field method to const
Update code metadata and FAIR badges
update badges
add zenodo doi badge
v1.1.3 - Fix: Update regex pattern for instances in `const.py`
piehld
left a comment
Thank you @krish69212! This is great work. I'm still in the process of reviewing everything, but I figured I would leave my "half review" now with some changes that I see at this point.
| list(query()) | ||
| ``` | ||
| Some attributes in the RCSB schema are part of a nested indexing context, meaning they must be queried together to ensure correct matching behavior. These include attributes like rcsb_binding_affinity.type and rcsb_binding_affinity.value, which are associated with the same underlying object (e.g., an EC50 measurement). |
Whenever you reference an attribute or code element, you should wrap it in single backticks so it appears `like this`. Makes things look super polished 😛 .
For a full list of markdown syntax, check out: https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax
Great things to know for any future work and projects you work on, so definitely worth the time to learn.
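As an aside on the documentation passage quoted above, here is a toy illustration of why attributes in a nested indexing context need to be matched together. It uses plain Python with invented data and hypothetical helper names (`matches_flat`, `matches_nested`); it does not call the RCSB API and is only a sketch of the matching semantics.

```python
# Toy record with two independent binding-affinity measurements (invented data).
record = {
    "rcsb_binding_affinity": [
        {"type": "EC50", "value": 7.5},
        {"type": "Kd", "value": 2.0},
    ]
}


def matches_flat(rec, wanted_type, wanted_value):
    """Each condition may be satisfied by a different measurement."""
    rows = rec["rcsb_binding_affinity"]
    return any(r["type"] == wanted_type for r in rows) and any(
        r["value"] == wanted_value for r in rows
    )


def matches_nested(rec, wanted_type, wanted_value):
    """Both conditions must hold on the same measurement."""
    return any(
        r["type"] == wanted_type and r["value"] == wanted_value
        for r in rec["rcsb_binding_affinity"]
    )


print(matches_flat(record, "EC50", 2.0))    # True: type and value come from different rows
print(matches_nested(record, "EC50", 2.0))  # False: no single row has both
```

The flat check reports a hit even though no EC50 measurement equals 2.0, which is the kind of mismatch that querying the two attributes together is meant to avoid.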
piehld
left a comment
Excellent work and bug fixes, @krish69212! I've left a handful of other small comments and suggestions.
While you address those, I'm going to try testing out the package locally a bit and ensure we didn't miss any cracks 😄 .
| if self.operator == "and": | ||
| if isinstance(other, Group): | ||
| # If keep_nested set to True, don't combine groups | ||
| # # If keep_nested set to True, don't combine groups |
| # # If keep_nested set to True, don't combine groups | |
| # If keep_nested set to True, don't combine groups |
| """URL to view this query on the RCSB PDB website query builder""" | ||
| # --- Warning --- | ||
| logging.warning("Warning: For complex queries, Advanced Search links are incompatible and may not work. Please use .get_editor_link to access the Search API Query Editor") |
| logging.warning("Warning: For complex queries, Advanced Search links are incompatible and may not work. Please use .get_editor_link to access the Search API Query Editor") | |
| logging.warning("Warning: For complex queries, the Advanced Search builder page may not be compatible and so links may not render correctly. Please use the `.get_editor_link()` method to access the Search API Query Editor instead.") |
| Args: | ||
| AttributeQuery (AttributeQuery): Two attribute-level queries that must all belong to the same nested context group. |
Can you actually make this two separate lines, to make it more explicit that two `AttributeQuery` arguments are required?
| AttributeQuery (AttributeQuery): Two attribute-level queries that must all belong to the same nested context group. | ||
| Raises: | ||
| Warning: If queries do not all belong to a valid and consistent nested attribute group. |
Does it raise a warning or an error? Can you please confirm whether `logger.error` actually raises an error and interrupts the process, or if it just logs an error and continues? If the latter, we should raise an error.
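For reference, a minimal sketch of the difference using only the standard library. In Python's `logging` module, `Logger.error` emits a log record and returns, so execution continues; an exception has to be raised explicitly. The logger name, function names, and message below are placeholders, not the package's actual code.

```python
import logging

logger = logging.getLogger("nested_query_demo")  # hypothetical logger name


def check_with_log_only(is_valid: bool) -> str:
    """Logs at ERROR level on failure, then keeps going."""
    if not is_valid:
        logger.error("Attributes do not share a nested context")
    return "continued"


def check_with_raise(is_valid: bool) -> str:
    """Raises on failure, so the caller is interrupted."""
    if not is_valid:
        raise ValueError("Attributes do not share a nested context")
    return "continued"


# The log-only version still returns normally.
print(check_with_log_only(False))

# The raising version never reaches its return statement.
try:
    check_with_raise(False)
except ValueError as exc:
    print(f"raised: {exc}")
```

If the docstring keeps a `Raises:` section, it would then name the exception type that is actually raised (here `ValueError`, chosen only for illustration).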
| EXAMPLE QUERY: | ||
| query1 = AttributeQuery(attribute="rcsb_chem_comp_related.resource_name", operator="exact_match", value="DrugBank") | ||
| query2 = AttributeQuery(attribute="rcsb_chem_comp_related.resource_accession_code", operator="exact_match", value="DB00114") | ||
| NestedAttributeQuery(query1,query2) |
| NestedAttributeQuery(query1,query2) | |
| nestedQuery = NestedAttributeQuery(query1, query2) |
| nested = NestedAttributeQuery(q1, q2) | ||
| query = nested & q3 |
| nested = NestedAttributeQuery(q1, q2) | |
| query = nested & q3 | |
| nestedQuery = NestedAttributeQuery(q1, q2) | |
| query = nestedQuery & q3 |
| expected_tuple = ('rcsb_uniprot_annotation.name', 'rcsb_uniprot_annotation.type') | ||
| self.assertIn(expected_tuple, SEARCH_SCHEMA.nested_attribute_schema) | ||
| not_expected_tuple = ('rcsb_uniprot_annotation.name.asdlkfjaskdjfaskldjf', 'rcsb_uniprot_annotation.type') |
I appreciate the creativity lol, but can you use a pair of real attributes (ones that do not share a nested context) for this check?
|
Hi @krish69212, so I did some more testing, and overall things seem to be working as expected. However, I did notice a somewhat peculiar behavior when using parentheses which I'm curious about... For example, the two query constructors below should produce the same query JSON, but the second one ends up looking a bit different:
from rcsbapi.search import AttributeQuery
from rcsbapi.search.search_query import NestedAttributeQuery
q1n = NestedAttributeQuery(AttributeQuery("rcsb_binding_affinity.type", "exact_match", "EC50"), AttributeQuery("rcsb_binding_affinity.value", "equals", 2.0))
q2 = AttributeQuery("rcsb_entry_info.selected_polymer_entity_types", "exists")
q3 = AttributeQuery("rcsb_nonpolymer_entity_container_identifiers.nonpolymer_comp_id", "exists")
q4a = AttributeQuery("rcsb_entry_info.structure_determination_methodology", "exact_match", "experimental")
q5 = AttributeQuery("rcsb_entry_info.deposited_polymer_monomer_count", "greater", 1000)
# Query: q2 & q3 & q4a & q5 & q1n
{
    'type': 'group',
    'logical_operator': 'and',
    'nodes': [
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.selected_polymer_entity_types', 'operator': 'exists', 'negation': False}, 'node_id': 0},
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_nonpolymer_entity_container_identifiers.nonpolymer_comp_id', 'operator': 'exists', 'negation': False}, 'node_id': 0},
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.structure_determination_methodology', 'operator': 'exact_match', 'negation': False, 'value': 'experimental'}, 'node_id': 0},
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.deposited_polymer_monomer_count', 'operator': 'greater', 'negation': False, 'value': 1000}, 'node_id': 0},
        {
            'type': 'group',
            'logical_operator': 'and',
            'nodes': [
                {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_binding_affinity.type', 'operator': 'exact_match', 'negation': False, 'value': 'EC50'}, 'node_id': 0},
                {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_binding_affinity.value', 'operator': 'equals', 'negation': False, 'value': 2.0}, 'node_id': 0}
            ]
        }
    ]
}
# Query: (q2 & q3) & (q4a & (q5 & q1n))
{
    'type': 'group',
    'logical_operator': 'and',
    'nodes': [
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.selected_polymer_entity_types', 'operator': 'exists', 'negation': False}, 'node_id': 0},
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_nonpolymer_entity_container_identifiers.nonpolymer_comp_id', 'operator': 'exists', 'negation': False}, 'node_id': 0},
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.structure_determination_methodology', 'operator': 'exact_match', 'negation': False, 'value': 'experimental'}, 'node_id': 0},
        {
            'type': 'group',
            'logical_operator': 'and',
            'nodes': [
                {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.deposited_polymer_monomer_count', 'operator': 'greater', 'negation': False, 'value': 1000}, 'node_id': 0},
                {
                    'type': 'group',
                    'logical_operator': 'and',
                    'nodes': [
                        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_binding_affinity.type', 'operator': 'exact_match', 'negation': False, 'value': 'EC50'}, 'node_id': 0},
                        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_binding_affinity.value', 'operator': 'equals', 'negation': False, 'value': 2.0}, 'node_id': 0}
                    ]
                }
            ]
        }
    ]
}
Technically, both of those should produce the same JSON query structure. In fact, even the query Although the above still lead to the same result set, I think it would be good to understand why they differ, in case some other use case arises in which this behavior does impact the set of results. Would you be able to look into what's going on here?
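One way a difference like this can arise, sketched as a toy model. This is not the rcsbapi implementation: the `Terminal` and `Group` classes, the `shape` helper, and the merging rules below are assumptions chosen only because they reproduce the two shapes shown above. In this model, a terminal on the left-hand side of `&` always wraps both operands in a new group, while a group on the left merges only the top level of a right-hand group unless that group is flagged `keep_nested`.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Terminal:
    name: str

    def __and__(self, other: "Node") -> "Group":
        # Terminal on the left: wrap both operands, never merge.
        return Group([self, other])


@dataclass
class Group:
    nodes: List["Node"] = field(default_factory=list)
    keep_nested: bool = False

    def __and__(self, other: "Node") -> "Group":
        if isinstance(other, Group) and not other.keep_nested:
            # Merge only the top level of the right-hand group.
            return Group(self.nodes + other.nodes)
        # Terminals and keep_nested groups are appended as single nodes.
        return Group(self.nodes + [other])


Node = Union[Terminal, Group]


def shape(node):
    """Reduce a query tree to nested lists of names for easy comparison."""
    if isinstance(node, Terminal):
        return node.name
    return [shape(child) for child in node.nodes]


q2, q3, q4a, q5 = (Terminal(n) for n in ("q2", "q3", "q4a", "q5"))
q1n = Group([Terminal("type"), Terminal("value")], keep_nested=True)

chained = q2 & q3 & q4a & q5 & q1n
grouped = (q2 & q3) & (q4a & (q5 & q1n))

print(shape(chained))  # ['q2', 'q3', 'q4a', 'q5', ['type', 'value']]
print(shape(grouped))  # ['q2', 'q3', 'q4a', ['q5', ['type', 'value']]]
```

Under these assumed rules, left-to-right chaining always has a group on the left and so stays flat, whereas `q5 & q1n` and `q4a & (...)` start from a terminal and build extra levels that the final merge only partly removes. Whether the real `__and__` logic behaves this way would need to be confirmed against the code.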
|
@krish69212 Oh, also, one other very small thing: would you be able to add Doing so would allow you to simplify your import of
# OLD:
from rcsbapi.search.search_query import NestedAttributeQuery
# NEW:
from rcsbapi.search import NestedAttributeQuery
piehld
left a comment
Thanks again @krish69212! I'm going to merge this now.
Creation of a nested attribute schema: a dictionary whose keys are tuples of adjacent nested attributes and whose values are all `True`.
Creation of a nested attribute class that checks whether two attributes are nested and groups them together.
Creation of a nested attribute checker that flags incorrectly nested attributes.
Switched from request.get to request.post in the search query.
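To make the schema and checker described above concrete, here is a small self-contained sketch. The dictionary is a hand-written stand-in, not the real `SEARCH_SCHEMA.nested_attribute_schema`; the tuple ordering, the `is_nested_pair` helper, and the order-insensitive lookup are all assumptions made for illustration.

```python
# Hypothetical stand-in for the nested attribute schema: tuple keys, True values.
NESTED_ATTRIBUTE_SCHEMA = {
    ("rcsb_binding_affinity.type", "rcsb_binding_affinity.value"): True,
    ("rcsb_uniprot_annotation.name", "rcsb_uniprot_annotation.type"): True,
}


def is_nested_pair(attr_a, attr_b, schema=NESTED_ATTRIBUTE_SCHEMA):
    """Return True if the two attributes are listed together in the schema.

    Assumption: the pair is accepted in either order.
    """
    return (attr_a, attr_b) in schema or (attr_b, attr_a) in schema


# A pair that is listed in the stand-in schema.
print(is_nested_pair("rcsb_binding_affinity.type", "rcsb_binding_affinity.value"))

# A pair of attributes that are not listed together in this stand-in.
print(
    is_nested_pair(
        "rcsb_entry_info.selected_polymer_entity_types",
        "rcsb_entry_info.structure_determination_methodology",
    )
)
```

Tuple keys make the membership test a constant-time dictionary lookup, which is presumably why the values can all simply be `True`.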