### Introduction
Because an LLM converts statutes to LegalRuleML XML sequentially, successive outputs can overlap. It is therefore useful to identify where a new output needs to be integrated with the previously generated schema. This notebook demonstrates one approach, using string-similarity scores from the TheFuzz library.

### Table of Contents
[Step 1](#Step-1) - Convert all elements and values to a dictionary

[Step 2](#Step-2) - Compare the Elements only using Simple Ratio

[Step 3](#Step-3) - Clean the data and compare all elements and attributes with each other

[Results](#Results) - Some instances which cross a threshold of 0.5

### Conclusion
Using Simple Ratio, it is possible to iterate over the previously generated elements and attributes and find the points at which the LLM's sequentially generated outputs are similar. Similarity scores are a useful starting point for this comparison: they flag candidate overlaps, which can then be integrated. A further step would be to compare the XML trees to ensure consistency.
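
The tree comparison mentioned as a further step could start from something like the sketch below. It is not part of this notebook's pipeline: the helper names `local_name`, `element_paths` and `path_differences` are hypothetical, and the sketch only checks which root-to-element paths appear in one document but not the other, ignoring attributes and element order.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an ElementTree namespace prefix: '{uri}Rule' -> 'Rule'."""
    return tag.rsplit('}', 1)[-1]


def element_paths(xml_text):
    """Return the set of root-to-element paths, using local tag names."""
    paths = set()

    def walk(element, prefix):
        path = f"{prefix}/{local_name(element.tag)}"
        paths.add(path)
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def path_differences(xml_a, xml_b):
    """Return (paths only in xml_a, paths only in xml_b), each sorted."""
    a, b = element_paths(xml_a), element_paths(xml_b)
    return sorted(a - b), sorted(b - a)
```

Paths that appear on only one side point to places where the two generated outputs diverge structurally, which complements the name-level similarity scores.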

### Step 1
Convert all elements and values to a dictionary

In [19]:
import xml.etree.ElementTree as ET

def get_elements_and_attributes(xml_file):
    """Map each element tag to an (attribute, value) tuple, or [] if it has no attributes."""
    tree = ET.parse(xml_file)
    root = tree.getroot()

    elements_and_attributes = {}

    # Function to recursively traverse the XML tree
    def traverse(element):
        # An element without attributes maps to an empty list
        elements_and_attributes[element.tag] = []

        # Note: only the last (attribute, value) pair of an element is kept,
        # and a repeated tag overwrites the earlier entry for that tag
        for attr, value in element.attrib.items():
            elements_and_attributes[element.tag] = (attr, value)

        # Recursively traversing child elements
        for child in element:
            traverse(child)

    traverse(root)

    return elements_and_attributes

# Example usage
xml_file = 'Test-XML1.xml'  # Replace with the path to your XML file
elements_and_attributes = get_elements_and_attributes(xml_file)
print(elements_and_attributes)
for element, attributes in elements_and_attributes.items():
    print(f"Element: {element}")
    print(f"Attributes: {attributes}")


{'{http://www.oasis-open.org/committees/legalruleml}LegalRuleML': [], '{http://www.oasis-open.org/committees/legalruleml}PrescriptiveStatement': ('id', 'AccidentReporting'), '{http://www.oasis-open.org/committees/legalruleml}Rule': ('id', 'AccidentReportingRule'), '{http://www.oasis-open.org/committees/legalruleml}if': [], '{http://www.oasis-open.org/committees/legalruleml}Fact': [], '{http://www.oasis-open.org/committees/legalruleml}Rel': ('iri', '#majorAccident'), '{http://www.oasis-open.org/committees/legalruleml}then': [], '{http://www.oasis-open.org/committees/legalruleml}Obligation': [], '{http://www.oasis-open.org/committees/legalruleml}Action': [], '{http://www.oasis-open.org/committees/legalruleml}Intimation': [], '{http://www.oasis-open.org/committees/legalruleml}To': ('iri', '#prescribedAuthority'), '{http://www.oasis-open.org/committees/legalruleml}Within': ('{http://www.oasis-open.org/committees/legalruleml}TimeUnit', 'days'), '{http://www.oasis-open.org/committees/legalru

In [24]:
xml_file_1 = "Test-XML1.xml"
xml_file_2 = "Test-XML2.xml"

dict1 = get_elements_and_attributes(xml_file_1)
dict2 = get_elements_and_attributes(xml_file_2)

### Step 2

Compare the Elements only using Simple Ratio

In [22]:
from thefuzz import fuzz
def fuzzy_similarity(s1, s2):
    """
    Compute the fuzzy string similarity using TheFuzz library.
    """
    return fuzz.ratio(s1, s2) / 100.0  # Convert to a float [0, 1]
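
To make the score easier to interpret, here is a minimal pure-Python sketch of what Simple Ratio appears to compute. It assumes a recent TheFuzz release, where `fuzz.ratio` is understood to be a normalized similarity of the form 2 × LCS / (len(s1) + len(s2)), with LCS being the length of the longest common subsequence. The names `lcs_length` and `simple_ratio_sketch` are hypothetical and are not part of TheFuzz.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence, by dynamic programming."""
    previous = [0] * (len(s2) + 1)
    for ch1 in s1:
        current = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def simple_ratio_sketch(s1, s2):
    """Approximate fuzz.ratio(s1, s2) / 100.0 as 2 * LCS / (len(s1) + len(s2))."""
    total = len(s1) + len(s2)
    if total == 0:
        return 0.0  # arbitrary choice for this sketch, not checked against TheFuzz
    # Round to a whole percentage first, mirroring the integer score
    return round(200 * lcs_length(s1, s2) / total) / 100.0
```

Under that assumption the sketch agrees with scores reported later in this notebook: 'LegalRuleML' and 'Rule' share the 4-character subsequence "Rule", giving 2 × 4 / 15 ≈ 0.53, and 'AccidentReporting' and 'Reporting' give 2 × 9 / 26 ≈ 0.69.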

In [23]:
def compare_dicts(dict1, dict2):
    """
    Compare elements from two dictionaries and compute fuzzy similarity.
    """
    for key1, value1 in dict1.items():
        for key2, value2 in dict2.items():
            similarity = fuzzy_similarity(key1, key2)
            print(f"Similarity between '{key1}' and '{key2}': {similarity}")

In [25]:
dict1

{'{http://www.oasis-open.org/committees/legalruleml}LegalRuleML': [],
 '{http://www.oasis-open.org/committees/legalruleml}PrescriptiveStatement': ('id',
  'AccidentReporting'),
 '{http://www.oasis-open.org/committees/legalruleml}Rule': ('id',
  'AccidentReportingRule'),
 '{http://www.oasis-open.org/committees/legalruleml}if': [],
 '{http://www.oasis-open.org/committees/legalruleml}Fact': [],
 '{http://www.oasis-open.org/committees/legalruleml}Rel': ('iri',
  '#majorAccident'),
 '{http://www.oasis-open.org/committees/legalruleml}then': [],
 '{http://www.oasis-open.org/committees/legalruleml}Obligation': [],
 '{http://www.oasis-open.org/committees/legalruleml}Action': [],
 '{http://www.oasis-open.org/committees/legalruleml}Intimation': [],
 '{http://www.oasis-open.org/committees/legalruleml}To': ('iri',
  '#prescribedAuthority'),
 '{http://www.oasis-open.org/committees/legalruleml}Within': ('{http://www.oasis-open.org/committees/legalruleml}TimeUnit',
  'days'),
 '{http://www.oasis-open.

In [26]:
dict2

{'{http://www.oasis-open.org/committees/legalruleml}LegalRuleML': [],
 '{http://www.oasis-open.org/committees/legalruleml}PrescriptiveStatement': ('id',
  'AccidentReporting'),
 '{http://www.oasis-open.org/committees/legalruleml}Rule': ('id',
  'AccidentReportingRule'),
 '{http://www.oasis-open.org/committees/legalruleml}if': [],
 '{http://www.oasis-open.org/committees/legalruleml}Fact': [],
 '{http://www.oasis-open.org/committees/legalruleml}Rel': ('iri',
  '#majorAccident'),
 '{http://www.oasis-open.org/committees/legalruleml}then': [],
 '{http://www.oasis-open.org/committees/legalruleml}Obligation': [],
 '{http://www.oasis-open.org/committees/legalruleml}Action': [],
 '{http://www.oasis-open.org/committees/legalruleml}IntimateAuthority': ('iri',
  '#prescribedAuthority'),
 '{http://www.oasis-open.org/committees/legalruleml}Timing': [],
 '{http://www.oasis-open.org/committees/legalruleml}TimeSpan': ('start',
  'within24hours'),
 '{http://www.oasis-open.org/committees/legalruleml}Repo

In [27]:
compare_dicts(dict1, dict2)

Similarity between '{http://www.oasis-open.org/committees/legalruleml}LegalRuleML' and '{http://www.oasis-open.org/committees/legalruleml}LegalRuleML': 1.0
Similarity between '{http://www.oasis-open.org/committees/legalruleml}LegalRuleML' and '{http://www.oasis-open.org/committees/legalruleml}PrescriptiveStatement': 0.8
Similarity between '{http://www.oasis-open.org/committees/legalruleml}LegalRuleML' and '{http://www.oasis-open.org/committees/legalruleml}Rule': 0.94
Similarity between '{http://www.oasis-open.org/committees/legalruleml}LegalRuleML' and '{http://www.oasis-open.org/committees/legalruleml}if': 0.88
Similarity between '{http://www.oasis-open.org/committees/legalruleml}LegalRuleML' and '{http://www.oasis-open.org/committees/legalruleml}Fact': 0.89
Similarity between '{http://www.oasis-open.org/committees/legalruleml}LegalRuleML' and '{http://www.oasis-open.org/committees/legalruleml}Rel': 0.91
Similarity between '{http://www.oasis-open.org/committees/legalruleml}LegalRuleML
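
The uniformly high scores above are consistent with the shared namespace prefix dominating every comparison. As a rough worked example, assume Simple Ratio behaves like $2 \cdot \mathrm{LCS} / (|s_1| + |s_2|)$, where LCS is the length of the longest common subsequence (an assumption about the library's internals, not something this notebook verifies). The prefix `{http://www.oasis-open.org/committees/legalruleml}` is 50 characters long, and 'LegalRuleML' and 'if' share no characters of their own, so:

$$
\text{ratio}(\texttt{LegalRuleML}, \texttt{if}) = \frac{2 \times 50}{(50 + 11) + (50 + 2)} = \frac{100}{113} \approx 0.88
$$

This matches the 0.88 reported above, even though the two local names have nothing in common. Stripping the prefix before comparing removes this inflation.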

### Step 3

Clean the data and compare all elements and attributes with each other

In [53]:
def remove_string_from_tuple_value(value, string_to_remove):
    """
    Remove a specified string from all elements in the tuple value.
    """
    return tuple(item.replace(string_to_remove, '') for item in value)

def clean_dictionary(dictionary, string_to_remove):
    """
    Remove a specified string from all keys and values in the dictionary.
    """
    cleaned_dict = {}
    for key, value in dictionary.items():
        cleaned_key = key.replace(string_to_remove, '')
        if isinstance(value, tuple):
            cleaned_value = remove_string_from_tuple_value(value, string_to_remove)
        else:
            cleaned_value = value.replace(string_to_remove, '') if isinstance(value, str) else value
        cleaned_dict[cleaned_key] = cleaned_value
    return cleaned_dict
    
dict1_cleaned = clean_dictionary(dict1, "{http://www.oasis-open.org/committees/legalruleml}")
dict2_cleaned = clean_dictionary(dict2, "{http://www.oasis-open.org/committees/legalruleml}")

In [54]:
dict1_cleaned


{'LegalRuleML': [],
 'PrescriptiveStatement': ('id', 'AccidentReporting'),
 'Rule': ('id', 'AccidentReportingRule'),
 'if': [],
 'Fact': [],
 'Rel': ('iri', '#majorAccident'),
 'then': [],
 'Obligation': [],
 'Action': [],
 'Intimation': [],
 'To': ('iri', '#prescribedAuthority'),
 'Within': ('TimeUnit', 'days'),
 'Reporting': [],
 'Form': ('iri', '#FormI')}

In [55]:
compare_dicts(dict1_cleaned, dict2_cleaned)

Similarity between 'LegalRuleML' and 'LegalRuleML': 1.0
Similarity between 'LegalRuleML' and 'PrescriptiveStatement': 0.19
Similarity between 'LegalRuleML' and 'Rule': 0.53
Similarity between 'LegalRuleML' and 'if': 0.0
Similarity between 'LegalRuleML' and 'Fact': 0.13
Similarity between 'LegalRuleML' and 'Rel': 0.29
Similarity between 'LegalRuleML' and 'then': 0.13
Similarity between 'LegalRuleML' and 'Obligation': 0.19
Similarity between 'LegalRuleML' and 'Action': 0.0
Similarity between 'LegalRuleML' and 'IntimateAuthority': 0.14
Similarity between 'LegalRuleML' and 'Timing': 0.12
Similarity between 'LegalRuleML' and 'TimeSpan': 0.21
Similarity between 'LegalRuleML' and 'Reporting': 0.2
Similarity between 'LegalRuleML' and 'Form': 0.0
Similarity between 'LegalRuleML' and 'Iri': 0.0
Similarity between 'PrescriptiveStatement' and 'LegalRuleML': 0.19
Similarity between 'PrescriptiveStatement' and 'PrescriptiveStatement': 1.0
Similarity between 'PrescriptiveStatement' and 'Rule': 0.08
S

In [62]:
def compare_dicts(dict1, dict2):
    """
    Compare keys and values between two dictionaries and compute fuzzy similarity.
    """
    for key1, value1 in dict1.items():
        for key2, value2 in dict2.items():
            # Compare key1 with key2
            key_similarity = fuzzy_similarity(key1, key2)
            print(f"Similarity between key '{key1}' and key '{key2}': {key_similarity}")
            
            # Compare key1 with value2
            value_similarity_0 = fuzzy_similarity(key1, value2[0]) if isinstance(value2, tuple) else None
            value_similarity_1 = fuzzy_similarity(key1, value2[1]) if isinstance(value2, tuple) else None
            if value_similarity_0 is not None:
                print(f"Similarity between key '{key1}' and value '{value2[0]}': {value_similarity_0}")
                print(f"Similarity between key '{key1}' and value '{value2[1]}': {value_similarity_1}")
                
        # Compare value1 with each key in dict2
        for key2 in dict2.keys():
            value_similarity_0 = fuzzy_similarity(value1[0], key2) if isinstance(value1, tuple) else None
            value_similarity_1 = fuzzy_similarity(value1[1], key2) if isinstance(value1, tuple) else None
            if value_similarity_0 is not None:
                print(f"Similarity between value '{value1[0]}' and key '{key2}': {value_similarity_0}")
                print(f"Similarity between value '{value1[1]}' and key '{key2}': {value_similarity_1}")

# Compare the cleaned dictionaries from Step 3
compare_dicts(dict1_cleaned, dict2_cleaned)


Similarity between key 'LegalRuleML' and key 'LegalRuleML': 1.0
Similarity between key 'LegalRuleML' and key 'PrescriptiveStatement': 0.19
Similarity between key 'LegalRuleML' and value 'id': 0.0
Similarity between key 'LegalRuleML' and value 'AccidentReporting': 0.21
Similarity between key 'LegalRuleML' and key 'Rule': 0.53
Similarity between key 'LegalRuleML' and value 'id': 0.0
Similarity between key 'LegalRuleML' and value 'AccidentReportingRule': 0.38
Similarity between key 'LegalRuleML' and key 'if': 0.0
Similarity between key 'LegalRuleML' and key 'Fact': 0.13
Similarity between key 'LegalRuleML' and key 'Rel': 0.29
Similarity between key 'LegalRuleML' and value 'iri': 0.0
Similarity between key 'LegalRuleML' and value '#majorAccident': 0.16
Similarity between key 'LegalRuleML' and key 'then': 0.13
Similarity between key 'LegalRuleML' and key 'Obligation': 0.19
Similarity between key 'LegalRuleML' and key 'Action': 0.0
Similarity between key 'LegalRuleML' and key 'IntimateAuthor

### Results

Some pairs whose similarity score exceeds the **threshold of 0.5**:

- Similarity between value 'TimeUnit' and key 'Timing': 0.57
- Similarity between value '#prescribedAuthority' and key 'IntimateAuthority': 0.59
- Similarity between key 'Intimation' and key 'IntimateAuthority': 0.59
- Similarity between value 'AccidentReporting' and key 'Reporting': 0.69
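
Rather than scanning the printed output by eye, the pairs above the threshold could be collected programmatically. The sketch below is one way to do that and is not the code that produced the list above. The names `flatten`, `pairs_above_threshold` and `stdlib_ratio` are hypothetical; `stdlib_ratio` is a standard-library stand-in whose scores can differ from `fuzz.ratio`; and skipping identical strings is an assumption about what counts as an interesting match.

```python
from difflib import SequenceMatcher


def stdlib_ratio(s1, s2):
    """Stand-in scorer in [0, 1]; pass fuzzy_similarity instead to use TheFuzz."""
    return SequenceMatcher(None, s1, s2).ratio()


def flatten(dictionary):
    """Yield (kind, text) for every key and every item of a tuple value."""
    for key, value in dictionary.items():
        yield ("key", key)
        if isinstance(value, tuple):
            for item in value:
                yield ("value", item)


def pairs_above_threshold(dict1, dict2, similarity=stdlib_ratio, threshold=0.5):
    """Return (score, kind1, text1, kind2, text2) tuples scoring above threshold."""
    hits = set()
    for kind1, text1 in flatten(dict1):
        for kind2, text2 in flatten(dict2):
            if kind1 == "value" and kind2 == "value":
                continue  # the notebook compares keys with keys and keys with values only
            if text1 == text2:
                continue  # assumption: exact matches need no review
            score = similarity(text1, text2)
            if score > threshold:
                hits.add((round(score, 2), kind1, text1, kind2, text2))
    return sorted(hits, reverse=True)
```

Calling `pairs_above_threshold(dict1_cleaned, dict2_cleaned, similarity=fuzzy_similarity)` would use the same scorer as the rest of the notebook, and the threshold can be raised or lowered to trade recall for review effort.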