Skip to content

Nested Attributes Query and Checker#66

Merged
piehld merged 23 commits intostagingfrom
dev-kp-nested
Jul 8, 2025
Merged

Nested Attributes Query and Checker#66
piehld merged 23 commits intostagingfrom
dev-kp-nested

Conversation

@krish69212
Copy link
Copy Markdown
Member

@krish69212 krish69212 commented Jun 9, 2025

Creation of a nested attribute schema which is a dictionary with a tuple as a key that holds adjacent nested attributes and values as true.

Creation of nested attribute class which checks if two values are nested and groups them together.

Creation of nested attribute checker which checks if there are incorrectly nested attributes.

Switched from request.get to request.post in the search query.

@krish69212 krish69212 requested a review from piehld June 9, 2025 20:31
@krish69212 krish69212 requested a review from brindakv June 12, 2025 15:16
Copy link
Copy Markdown
Collaborator

@piehld piehld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @krish69212! This is great work. I'm still in the process of reviewing everything, but I figured I would leave my "half review" now with some changes that I see at this point.

Comment thread docs/search_api/query_construction.md Outdated
list(query())
```

Some attributes in the RCSB schema are part of a nested indexing context, meaning they must be queried together to ensure correct matching behavior. These include attributes like rcsb_binding_affinity.type and rcsb_binding_affinity.value, which are associated with the same underlying object (e.g., an EC50 measurement).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever you reference an attribute or code element, you should wrap them in single tick marks so they appear like this. Makes things look super polished 😛 .

For a full list of markdown, checkout: https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax

Great things to know for any future work and projects you work on, so definitely worth the time to learn.

Comment thread docs/search_api/query_construction.md Outdated
Comment thread tests/test_search_query.py Outdated
Comment thread tests/test_search_query.py
Copy link
Copy Markdown
Collaborator

@piehld piehld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work and bug fixes, @krish69212! I've left a handful of other small comments and suggestions.

While you address those, I'm going to try testing out the package locally a bit and ensure we didn't miss any cracks 😄 .

Comment thread rcsbapi/search/search_query.py Outdated
if self.operator == "and":
if isinstance(other, Group):
# If keep_nested set to True, don't combine groups
# # If keep_nested set to True, don't combine groups
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# # If keep_nested set to True, don't combine groups
# If keep_nested set to True, don't combine groups

Comment thread docs/search_api/query_construction.md Outdated
Comment thread rcsbapi/search/search_query.py
Comment thread rcsbapi/search/search_query.py Outdated
"""URL to view this query on the RCSB PDB website query builder"""

# --- Warning ---
logging.warning("Warning: For complex queries, Advanced Search links are incompatible and may not work. Please use .get_editor_link to access the Search API Query Editor")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logging.warning("Warning: For complex queries, Advanced Search links are incompatible and may not work. Please use .get_editor_link to access the Search API Query Editor")
logging.warning("Warning: For complex queries, the Advanced Search builder page may not be compatible and so links may not render correctly. Please use the `.get_editor_link()` method to access the Search API Query Editor instead.")

Comment thread docs/search_api/query_construction.md Outdated
Comment thread rcsbapi/search/search_query.py Outdated
Comment on lines +1109 to +1110
Args:
AttributeQuery (AttributeQuery): Two attribute-level queries that must all belong to the same nested context group.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you actually make this two separate lines, to make it more clear/explicit that two AttributeQuery arguments are required?

Comment thread rcsbapi/search/search_query.py Outdated
AttributeQuery (AttributeQuery): Two attribute-level queries that must all belong to the same nested context group.

Raises:
Warning: If queries do not all belong to a valid and consistent nested attribute group.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it raise a warning or an error? Can you please confirm if logger.error actually raises an error and interrupts the process, or if it just logs and error and continues? If the latter, we should raise an error.

Comment thread rcsbapi/search/search_query.py Outdated
EXAMPLE QUERY:
query1 = AttributeQuery(attribute="rcsb_chem_comp_related.resource_name", operator="exact_match", value="DrugBank")
query2 = AttributeQuery(attribute="rcsb_chem_comp_related.resource_accession_code", operator="exact_match", value="DB00114")
NestedAttributeQuery(query1,query2)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NestedAttributeQuery(query1,query2)
nestedQuery = NestedAttributeQuery(query1, query2)

Comment thread docs/search_api/query_construction.md Outdated
Comment on lines +211 to +213
nested = NestedAttributeQuery(q1, q2)

query = nested & q3
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nested = NestedAttributeQuery(q1, q2)
query = nested & q3
nestedQuery = NestedAttributeQuery(q1, q2)
query = nestedQuery & q3

Comment thread tests/test_search_schema.py Outdated
expected_tuple = ('rcsb_uniprot_annotation.name', 'rcsb_uniprot_annotation.type')
self.assertIn(expected_tuple, SEARCH_SCHEMA.nested_attribute_schema)

not_expected_tuple = ('rcsb_uniprot_annotation.name.asdlkfjaskdjfaskldjf', 'rcsb_uniprot_annotation.type')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the creativity lol, but can you use a pair of real attributes (but which do not have nested context) for this check?

@piehld
Copy link
Copy Markdown
Collaborator

piehld commented Jun 19, 2025

Hi @krish69212, so I did some more testing, and overall things seem to be working as expected. However, I did notice a somewhat peculiar behavior when using parentheses which I'm curious about...

For example, the below two query constructors should produce the same query JSON; but the second one ends up looking a bit different:

from rcsbapi.search import AttributeQuery
from rcsbapi.search.search_query import NestedAttributeQuery

q1n = NestedAttributeQuery(AttributeQuery("rcsb_binding_affinity.type", "exact_match", "EC50"), AttributeQuery("rcsb_binding_affinity.value", "equals", 2.0))
q2 = AttributeQuery("rcsb_entry_info.selected_polymer_entity_types", "exists")
q3 = AttributeQuery("rcsb_nonpolymer_entity_container_identifiers.nonpolymer_comp_id", "exists")
q4a = AttributeQuery("rcsb_entry_info.structure_determination_methodology", "exact_match", "experimental")
q5 = AttributeQuery("rcsb_entry_info.deposited_polymer_monomer_count", "greater", 1000)

# Quer: q2 & q3 & q4a & q5 & q1n
{
    'type': 'group',
    'logical_operator': 'and',
    'nodes': [
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.selected_polymer_entity_types', 'operator': 'exists', 'negation': False}, 'node_id': 0},
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_nonpolymer_entity_container_identifiers.nonpolymer_comp_id', 'operator': 'exists', 'negation': False}, 'node_id': 0},
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.structure_determination_methodology', 'operator': 'exact_match', 'negation': False, 'value': 'experimental'}, 'node_id': 0},
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.deposited_polymer_monomer_count', 'operator': 'greater', 'negation': False, 'value': 1000}, 'node_id': 0},
        {
            'type': 'group',
            'logical_operator': 'and',
            'nodes': [
                {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_binding_affinity.type', 'operator': 'exact_match', 'negation': False, 'value': 'EC50'}, 'node_id': 0},
                {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_binding_affinity.value', 'operator': 'equals', 'negation': False, 'value': 2.0}, 'node_id': 0}
            ]
        }
    ]
}


# Query: (q2 & q3) & (q4a & (q5 & q1n))
{
    'type': 'group',
    'logical_operator': 'and',
    'nodes': [
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.selected_polymer_entity_types', 'operator': 'exists', 'negation': False}, 'node_id': 0},
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_nonpolymer_entity_container_identifiers.nonpolymer_comp_id', 'operator': 'exists', 'negation': False}, 'node_id': 0},
        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.structure_determination_methodology', 'operator': 'exact_match', 'negation': False, 'value': 'experimental'}, 'node_id': 0},
        {
            'type': 'group',
            'logical_operator': 'and',
            'nodes': [
                {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_entry_info.deposited_polymer_monomer_count', 'operator': 'greater', 'negation': False, 'value': 1000}, 'node_id': 0},
                {
                    'type': 'group',
                    'logical_operator': 'and',
                    'nodes': [
                        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_binding_affinity.type', 'operator': 'exact_match', 'negation': False, 'value': 'EC50'}, 'node_id': 0},
                        {'type': 'terminal', 'service': 'text', 'parameters': {'attribute': 'rcsb_binding_affinity.value', 'operator': 'equals', 'negation': False, 'value': 2.0}, 'node_id': 0}
                    ]
                }
            ]
        }
    ]
}

Technically, both of those should produce the same JSON query structure. In fact, even the query (q2 & q3) & q4a & (q5 & q1n) produces the same JSON as q2 & q3 & q4a & q5 & q1n; so, it's peculiar to me why (q2 & q3) & (q4a & (q5 & q1n)) does not.

Although the above still lead to the same result set, I think it would be good to understand why they are differing, in case some other use case arises in which this behavior does impact the set of results. Would you be able to look into what's going on here?

@piehld
Copy link
Copy Markdown
Collaborator

piehld commented Jun 19, 2025

@krish69212 Oh, also—one other very small thing—would you be able to add NestedAttributeQuery to the search __init__.py file, similar to how AttributeQuery is added there?

Doing so would allow you to simplify your import of NestedAttributeQuery, i.e.:

# OLD: 
from rcsbapi.search.search_query import NestedAttributeQuery
# NEW:
from rcsbapi.search import NestedAttributeQuery

@piehld piehld changed the base branch from master to staging July 8, 2025 10:02
Copy link
Copy Markdown
Collaborator

@piehld piehld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @krish69212! I'm going to merge this now.

@piehld piehld merged commit 30a1053 into staging Jul 8, 2025
2 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants