In [1]:
from medmodels import MedRecord
medrecord = MedRecord().from_example_dataset()


In [2]:
medrecord

-----------------------------------------------
Nodes Group Count Attribute   Info             
-----------------------------------------------
diagnosis   25    description 25 unique values 
drug        19    description 19 unique values 
patient     5     age         min: 19          
                              max: 96          
                              mean: 43.20      
                  gender      Categories: F, M 
procedure   24    description 24 unique values 
-----------------------------------------------

-------------------------------------------------------------
Edges Group       Count Attribute        Info                
-------------------------------------------------------------
patient_diagnosis 60    diagnosis_time   1962-10-21 00:00:00 
                                         2024-04-12 00:00:00 
                        duration_days    min: 0.00           
                                         max: 3416.00        
                                     

## 1. Node Operands

In [3]:
from medmodels.medrecord.querying import NodeOperand

def query_node_basic(node: NodeOperand):
  node.in_group("patient")

medrecord.select_nodes(query_node_basic)

['pat_4', 'pat_2', 'pat_3', 'pat_1', 'pat_5']

There are often several ways to reach the same result, which makes the query engine versatile and adaptable to your specific needs. For instance, the query above produces the same result as `medrecord.nodes_in_group("patient")`. Let's make it a bit more complex by involving more than one operand.

In [56]:
def query_node_intermmediate(node: NodeOperand):
  node.in_group("patient")
  node.index().contains("pat")
  
  node.has_attribute("age")
  node.attribute("age").less_than(30)

medrecord.select_nodes(query_node_intermmediate)


['pat_4', 'pat_2']

### Note:
The `has_attribute()` function is not needed in this example, since `attribute()` already checks whether the nodes have the attribute. It is there merely for educational purposes.

If, for instance, you do not know whether the `gender` attribute is written consistently across the MedRecord (it may have leading/trailing whitespace or mixed lowercase/uppercase), you can modify the attribute values of a node/edge inside the query before comparing them. You can also perform mathematical calculations like `mean()`, `median()` or `min()` and assign the result to a variable. That result can be manipulated further, as in the following example, where we subtract 5 years from `mean_age` and then query on that value.

In [43]:
def query_node_advanced(node: NodeOperand):
  node.in_group("patient")
  node.index().contains("pat")

  gender = node.attribute("gender")
  gender.lowercase()  # Converts the string to lowercase
  gender.trim()  # Removes leading and trailing whitespaces
  gender.equal_to("m")

  node.has_attribute("age")
  mean_age = node.attribute("age").mean()
  mean_age.subtract(5)  # Subtract 5 from the mean age
  node.attribute("age").less_than(mean_age)

medrecord.select_nodes(query_node_advanced)


['pat_4', 'pat_1', 'pat_5']
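
To make the logic of this query easier to follow, here is a minimal plain-Python sketch of the same idea: normalize a string attribute, compute a threshold from the mean, then filter. It does not use the query engine, and the records, the `normalize` helper and the `pat_a` to `pat_d` indices are hypothetical; they are not values from the example dataset.

```python
from statistics import mean

# Hypothetical records for illustration only (not the example dataset)
patients = {
    "pat_a": {"gender": " M ", "age": 25},
    "pat_b": {"gender": "f", "age": 40},
    "pat_c": {"gender": "m", "age": 70},
    "pat_d": {"gender": "M", "age": 33},
}


def normalize(value: str) -> str:
    """Lowercase and strip whitespace, mirroring lowercase() followed by trim()."""
    return value.lower().strip()


# Threshold: mean age of all records minus 5
threshold = mean(attrs["age"] for attrs in patients.values()) - 5

# Keep the records whose normalized gender is "m" and whose age is below the threshold
selected = [
    index
    for index, attrs in patients.items()
    if normalize(attrs["gender"]) == "m" and attrs["age"] < threshold
]

print(threshold, selected)
```

With these made-up records the mean age is 42, so the threshold is 37 and only `pat_a` and `pat_d` pass both conditions.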

### Important note

Query methods that modify an operand cannot be chained or assigned to variables, since they return `None`. In the following snippet, `gender_lowercase` ends up as `None`, so the query cannot work as intended.
```python
# Wrong implementation
gender_lowercase = node.attribute("gender").lowercase()  # lowercase() returns None
gender_lowercase.equal_to("m")  # gender_lowercase is None here

# Wrong implementation
gender = node.attribute("gender")
gender.lowercase().trim()
gender.equal_to("m")

# Correct implementation
gender = node.attribute("gender")
gender.lowercase()
gender.trim()
gender.equal_to("m")
```

The same applies to the methods that compare operands to other operands or values: they also return `None`, so they cannot be chained either.
```python
# Wrong implementation
gender = node.attribute("gender")
gender.equal_to("M").not_equal_to("F")

# Correct implementation
gender = node.attribute("gender")
gender.equal_to("M")
gender.not_equal_to("F")
```
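
The reason chaining fails can be reproduced in plain Python with a small stand-in class. `ToyOperand` below is hypothetical and is not part of medmodels; it only mimics the pattern of methods that modify an object in place and return `None`.

```python
class ToyOperand:
    """Hypothetical stand-in for a query operand (not a medmodels class)."""

    def __init__(self, value: str) -> None:
        self.value = value

    def lowercase(self) -> None:
        # Modifies the object in place; implicitly returns None
        self.value = self.value.lower()

    def trim(self) -> None:
        # Modifies the object in place; implicitly returns None
        self.value = self.value.strip()


operand = ToyOperand("  M ")

# Assigning the result of an in-place method captures None, not the operand
result = operand.lowercase()

# Chaining calls the second method on None, which raises AttributeError
try:
    ToyOperand(" M ").lowercase().trim()
    chained_ok = True
except AttributeError:
    chained_ok = False

# Calling the methods one statement at a time works as intended
operand.trim()
print(result, chained_ok, repr(operand.value))
```

Keeping a reference to the operand itself and calling each method in its own statement avoids both problems.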

As you can see, the query engine lets you find nodes that fulfill several criteria in an optimized way, using a declarative style of programming. We can also reuse previously defined queries, and use the `neighbors()` function to query the nodes that are neighbors of the selected nodes.

For instance, the following example reuses `query_node_intermmediate` and selects the nodes that have the following characteristics:
- Are in group `"patient"`.
- Their node index contains the string `"pat"`.
- Their attribute `"age"` is less than 30.
- They are connected to other nodes whose attribute `"description"`, converted to lowercase, contains the word "fentanyl".

In [50]:
def query_node_neighbors(node: NodeOperand):
  query_node_intermmediate(node)

  description_neighbors = node.neighbors().attribute("description")
  description_neighbors.lowercase()
  description_neighbors.contains("fentanyl")

medrecord.select_nodes(query_node_neighbors)

['pat_5']

## 2. Edge Operands

In [51]:
from medmodels.medrecord.querying import EdgeOperand

def query_edge_basic(edge: EdgeOperand):
  edge.in_group("patient_drug")

edges = medrecord.select_edges(query_edge_basic)
edges[0:5]

[100, 104, 102, 81, 99]

The edge operand follows the same principles as the node operand, with some extra queries that apply only to edges, like `source_node()` or `target_node()` (which take the place of `neighbors()`).

In [52]:
from medmodels.medrecord.querying import EdgeOperand

def query_edge_intermmediate(edge: EdgeOperand):
    edge.in_group("patient_drug")
    edge.attribute("cost").less_than(200)

    edge.source_node().attribute("age").is_max()
    edge.target_node().attribute("description").contains("insulin")

medrecord.select_edges(query_edge_intermmediate)

[76]

## 3. Combining Node & Edge Queries

The full power of the query engine appears once you combine both operands inside a query. The following query selects nodes that:
- Are in group `"patient"`.
- Have an attribute `"age"` greater than 30 and an attribute `"gender"` equal to "M".
- Have at least one edge in the `"patient_drug"` group whose attribute `"cost"` is less than 200 and whose attribute `"quantity"` is equal to 1.

In [9]:
from medmodels.medrecord.querying import NodeOperand, EdgeOperand

def query_edge_combined(edge: EdgeOperand):
  edge.in_group("patient_drug")
  edge.attribute("cost").less_than(200)
  edge.attribute("quantity").equal_to(1)

def query_node_combined(node: NodeOperand):
  node.in_group("patient")
  node.attribute("age").is_int()
  node.attribute("age").greater_than(30)
  node.attribute("gender").equal_to("M")

  query_edge_combined(node.edges())
    
medrecord.select_nodes(query_node_combined)

['pat_5']
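
The "at least one edge" condition can be sketched in plain Python as follows. This is only an illustration of the selection logic described above, not of how the query engine evaluates it; the node and edge records and the `edge_matches` helper are hypothetical.

```python
# Hypothetical nodes and edges for illustration only
nodes = {
    "p1": {"age": 45, "gender": "M"},
    "p2": {"age": 50, "gender": "M"},
    "p3": {"age": 25, "gender": "M"},
}
edges = [
    {"source": "p1", "group": "patient_drug", "cost": 150, "quantity": 1},
    {"source": "p2", "group": "patient_drug", "cost": 300, "quantity": 1},
    {"source": "p2", "group": "patient_drug", "cost": 100, "quantity": 2},
    {"source": "p3", "group": "patient_drug", "cost": 50, "quantity": 1},
]


def edge_matches(edge: dict) -> bool:
    """All edge conditions must hold on the same edge."""
    return (
        edge["group"] == "patient_drug"
        and edge["cost"] < 200
        and edge["quantity"] == 1
    )


# A node is selected if it passes the node conditions
# and at least one of its own edges passes the edge conditions
selected = [
    index
    for index, attrs in nodes.items()
    if attrs["age"] > 30
    and attrs["gender"] == "M"
    and any(edge_matches(e) for e in edges if e["source"] == index)
]

print(selected)
```

In this toy data `p2` is rejected because no single edge is both cheap enough and has quantity 1, and `p3` is rejected by the age condition even though its edge matches.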

## 4. OR & NOT operations

By default, the statements in a query are combined with logical AND. However, a complete query engine should also support OR and NOT operations to be able to address all scenarios. The functions `either_or()` and `exclude()` serve that purpose.

In [10]:
from medmodels.medrecord.querying import NodeOperand, EdgeOperand

def query_edge_either(edge: EdgeOperand):
    edge.in_group("patient_drug")
    edge.attribute("cost").less_than(200)
    edge.attribute("quantity").equal_to(1)

def query_edge_or(edge: EdgeOperand):
    edge.in_group("patient_drug")
    edge.attribute("cost").less_than(200)
    edge.attribute("quantity").equal_to(12)
    
def query_node_either_or(node: NodeOperand):
    node.in_group("patient")
    node.attribute("age").greater_than(30)

    node.edges().either_or(query_edge_either, query_edge_or)

medrecord.select_nodes(query_node_either_or)

['pat_3', 'pat_5']

The result now also includes `"pat_3"`. It was not selected before because none of its edges satisfies `query_edge_either`, but at least one of them satisfies `query_edge_or`.

In [17]:
from medmodels.medrecord.querying import NodeOperand
    
def query_node_exclude(node: NodeOperand):
    node.in_group("patient")
    node.exclude(query_node_either_or)

medrecord.select_nodes(query_node_exclude)

['pat_4', 'pat_2', 'pat_1']

So this gives us all the patient nodes that were not selected with the previous query (logical NOT applied).
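
One way to reason about these operations is in terms of sets: consecutive statements behave like an intersection (AND), `either_or()` like a union (OR), and `exclude()` like a set difference (NOT). The sketch below uses plain Python sets with hypothetical node names to show that mapping; it is a mental model, not a description of the engine's internals.

```python
# Hypothetical sets of nodes matching individual conditions
universe = {"n1", "n2", "n3", "n4", "n5"}
cond_a = {"n1", "n2", "n3"}
cond_b = {"n2", "n3", "n4"}
cond_c = {"n3", "n5"}

# AND: two statements written one after the other
both = cond_a & cond_b

# OR: either cond_b or cond_c may hold
either = cond_b | cond_c

# AND combined with OR: cond_a and (cond_b or cond_c)
a_and_either = cond_a & either

# NOT: everything in the universe that the previous query did not select
excluded = universe - a_and_either

print(sorted(both), sorted(either), sorted(a_and_either), sorted(excluded))
```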

## 5. Clones

Since the statements in the query engine are additive, we cannot go back to a previous state of a query without rewriting the whole query up to that intermediate step. The `clone()` function was devised for that reason.

In [58]:
def query_node_clone(node: NodeOperand):
  node.in_group("patient")
  node.index().contains("pat")
  
  mean_age_original = node.attribute("age").mean()
  mean_age_clone = mean_age_original.clone()
  mean_age_clone.subtract(5)

  node.attribute("age").greater_than(mean_age_clone)
  node.attribute("age").less_than(mean_age_original)

medrecord.select_nodes(query_node_clone)

['pat_1']

This way, we keep the `mean_age_original` untouched, while we can manipulate the `mean_age_clone` and then use both values to find the node within that age range.
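
The difference between reusing an operand and cloning it is the same as the difference between an alias and a copy of a mutable object in Python. The sketch below illustrates this with a hypothetical `ToyValue` class, which is not part of medmodels and uses `copy.copy` as a stand-in for cloning.

```python
import copy


class ToyValue:
    """Hypothetical mutable value (not a medmodels class)."""

    def __init__(self, value: float) -> None:
        self.value = value

    def subtract(self, amount: float) -> None:
        # Modifies the object in place
        self.value -= amount

    def clone(self) -> "ToyValue":
        # Returns an independent copy of the current state
        return copy.copy(self)


# Without a clone: both names refer to the same object
shared = ToyValue(40.0)
alias = shared
alias.subtract(5)  # also changes `shared`

# With a clone: the original keeps its value
mean_original = ToyValue(40.0)
mean_clone = mean_original.clone()
mean_clone.subtract(5)

print(shared.value, mean_original.value, mean_clone.value)
```

Only the cloned variant leaves two distinct values available, which is what a range query with a lower and an upper bound needs.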