irods/irods_rule_engine_plugins_policy


iRODS Rule Engine Plugins - Policy

Motivation

The process for creating and deploying policy for iRODS requires the rule author to have a complete understanding of the API for iRODS as well as the associated plugin architecture in order to properly leverage the dynamic policy enforcement. The author may need to invoke the same policy across several policy enforcement points in order to cover all possible means to move data into iRODS for both object and POSIX data movement. The goal of this framework is to streamline the crafting and deployment of policy, as well as provide a reusable body of policy that may be easily configured.

Policy should be a matter of configuration rather than hand-crafted code. For most use cases, it should be possible to follow a well-documented deployment pattern already in use by others.


The Policy Interface

All policy (rules) to be invoked by this system must conform to a simple interface of two parameters (both serialized JSON strings) known as the parameters and the configuration. This policy may be implemented in any rule language or as a simple rule engine plugin.

For example, in the iRODS Rule Language:

irods_policy_example_policy_implementation(*parameters, *configuration) {
    writeLine("stdout", "Hello, World!")
}

Or in Python, to be used by the Python Rule Engine Plugin:

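import json

def irods_policy_example_policy_implementation(rule_args, callback, rei):
    parameters = json.loads(rule_args[1])     # Parameters (serialized JSON)
    configuration = json.loads(rule_args[2])  # Configuration (serialized JSON)
    callback.writeLine("stdout", "Hello, World!")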

The parameters contain all the information captured by the event handler, or may be passed in as a prepopulated JSON object when configured. These values may change depending on how the policy is configured and the contract between the policy and the event handlers used. The configuration may contain any other pertinent information that the policy may need at the time of invocation. This is a way to externalize any variables the policy may leverage, such as a specific attribute to use for metadata application.
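As a minimal sketch of this contract, the helpers below decode the two JSON strings a policy receives and read an externalized setting from the configuration, falling back to a default. The function names, the treatment of an empty string as an empty object, and the "attribute" key with its "irods::access_time" default (borrowed from the Access Time policy described later) are illustrative assumptions, not part of the framework's API.

```python
import json


def decode_policy_arguments(parameters_json, configuration_json):
    """Decode the two serialized JSON strings a policy receives.

    Hypothetical helper: an empty string is treated as an empty object.
    """
    parameters = json.loads(parameters_json) if parameters_json else {}
    configuration = json.loads(configuration_json) if configuration_json else {}
    return parameters, configuration


def configured_attribute(configuration, default="irods::access_time"):
    # Externalized variable: prefer the configured attribute, else the default.
    return configuration.get("attribute", default)


if __name__ == "__main__":
    params, config = decode_policy_arguments(
        '{"logical_path": "/tempZone/home/rods/file0"}',
        '{"attribute": "custom_access_time_attribute"}',
    )
    print(params["logical_path"], configured_attribute(config))
```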

Policy may be invoked directly, by an event handler, or by the Query Processor, each of which is discussed below.

Event Handlers

Event handlers are a class of rule engine plugin that consumes the dynamic policy enforcement points related to a noun within iRODS and invokes the policy configured for the events the plugin generates. There is one event handler per noun in the system.

The complete list is:

  • Data Objects
  • Collections
  • Metadata
  • Resources
  • Users and Groups

The policy to be invoked is a matter of the plugin-specific configuration within /etc/irods/server_config.json for a given instance of an event handler. The "plugin_specific_configuration" object for the given instance will look for a JSON array "policies_to_invoke", which itself is a series of JSON objects. These objects are the configuration of a policy to invoke for a given series of events.

The policy objects contain:

  • conditional
  • active_policy_clauses
  • events
  • policy_to_invoke
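The lookup below sketches how the "policies_to_invoke" array of one event handler instance could be read out of a parsed server_config.json, and how each policy object splits into the four parts listed above. The plugin_configuration / rule_engines nesting follows the standard iRODS server configuration layout; the helper names and the trimmed example configuration are hypothetical.

```python
import json

# Trimmed example of server_config.json; real files contain many more keys.
EXAMPLE_SERVER_CONFIG = json.loads("""
{
  "plugin_configuration": {
    "rule_engines": [
      {
        "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
        "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
        "plugin_specific_configuration": {
          "policies_to_invoke": [
            {
              "active_policy_clauses": ["post"],
              "events": ["put", "create"],
              "policy_to_invoke": "irods_policy_access_time",
              "configuration": {}
            }
          ]
        }
      }
    ]
  }
}
""")


def policies_for_instance(server_config, instance_name):
    """Return the "policies_to_invoke" array configured for one instance."""
    engines = server_config.get("plugin_configuration", {}).get("rule_engines", [])
    for engine in engines:
        if engine.get("instance_name") == instance_name:
            specific = engine.get("plugin_specific_configuration", {})
            return specific.get("policies_to_invoke", [])
    return []


def unpack_policy(policy):
    """Split one policy object into its four parts ("conditional" may be absent)."""
    return (
        policy.get("conditional", {}),
        policy.get("active_policy_clauses", []),
        policy.get("events", []),
        policy["policy_to_invoke"],
    )


if __name__ == "__main__":
    name = "irods_rule_engine_plugin-event_handler-data_object_modified-instance"
    for policy in policies_for_instance(EXAMPLE_SERVER_CONFIG, name):
        print(unpack_policy(policy))
```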

Conditional

A conditional describes a set of conditions which must be met in order to invoke the policy. These are a series of regular expressions which match the nouns involved. This could be the logical_path, metadata_applied, metadata_exists, user_name, source_resource, or destination_resource. The metadata conditionals are separated into two flavors, one for the application of metadata and one for invoking policy based on the existence of metadata. The metadata_exists conditional will test for the existence of matching metadata on any entity_type configured. The "entity_type" for the metadata maps to the iRODS nouns which are: "data_object", "collection", "resource", and "user". To match metadata that may exist anywhere in a logical path, a recursive flag may be set to walk the path looking for matching metadata.

An example might be:

"conditional" : {
    "logical_path" : "/tempZone/home/*",
    "metadata_applied" : {
        "attribute" : "foo*",
        "value" : "bar*",
        "units" : "baz*",
        "entity_type" : "data_object"
    }
}

An example for matching metadata in a logical path used to invoke an indexing event:

"conditional" : {
    "logical_path" : "/tempZone/home/*",
    "metadata_exists" : {
        "recursive" : "true",
        "attribute" : "irods::indexing::index",
        "value" : "elasticsearch::full_text",
        "entity_type" : "data_object"
    }
}

An example matching data objects put into a destination root resource:

"conditional" : {
    "logical_path" : "/tempZone/home/*",
    "destination_resource" : "dest_resc_*"
}
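A rough sketch of evaluating the string and "metadata_applied" conditionals against a policy's parameters is shown below. It assumes the parameters use the same key names as the conditional (as the metadata event handler's parameters do) and that a pattern matches when Python's re.match finds it at the start of the value; the plugin's exact matching semantics may differ. "metadata_exists" requires catalog queries and is left out.

```python
import re

# Conditional keys compared against same-named string parameters (assumption).
STRING_KEYS = ("logical_path", "user_name", "source_resource", "destination_resource")


def _matches(pattern, value):
    # Assumption: prefix match via re.match; the plugin's semantics may differ.
    return value is not None and re.match(pattern, value) is not None


def conditional_matches(conditional, parameters):
    """Return True only when every configured condition matches."""
    for key in STRING_KEYS:
        if key in conditional and not _matches(conditional[key], parameters.get(key)):
            return False

    applied = conditional.get("metadata_applied")
    if applied:
        metadata = parameters.get("metadata", {})
        for field, pattern in applied.items():
            if field == "entity_type":
                # entity_type names an iRODS noun, so compare it exactly.
                if metadata.get(field) != pattern:
                    return False
            elif not _matches(pattern, metadata.get(field)):
                return False

    # metadata_exists needs catalog queries and is not sketched here.
    return True


if __name__ == "__main__":
    conditional = {"logical_path": "/tempZone/home/*", "destination_resource": "dest_resc_*"}
    print(conditional_matches(conditional, {
        "logical_path": "/tempZone/home/rods/file0",
        "destination_resource": "dest_resc_0",
    }))
```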

Active Policy Clauses

"active_policy_clauses" is a JSON array of one or more of the following strings: "pre", "post", "except", "finally". These map to which dynamic policy enforcement points are invoked at which point in the operation flow.

For example: "active_policy_clauses" : ["post"],

The policy clauses map directly to the policy enforcement point flow control, where a policy may be invoked at any of these points in the flow for any event configured.
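Because iRODS dynamic policy enforcement point names end with their clause (for example pep_api_data_obj_put_post), the active clause can be recovered from the PEP name. The sketch below assumes exactly that naming convention; the function names are hypothetical.

```python
CLAUSES = ("pre", "post", "except", "finally")


def clause_of(pep_name):
    """Return the flow-control clause encoded as the PEP name's suffix, or None."""
    suffix = pep_name.rsplit("_", 1)[-1]
    return suffix if suffix in CLAUSES else None


def clause_is_active(policy, pep_name):
    # A policy runs only at the clauses listed in its "active_policy_clauses".
    return clause_of(pep_name) in policy.get("active_policy_clauses", [])


if __name__ == "__main__":
    print(clause_of("pep_api_data_obj_put_post"))
```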

Events

"events" is a JSON array of strings which map to the events generated by the event handler. Each event handler has its own set of events, documented below, for which it may invoke policy.

For a data-object-modified example where data is ingested: "events" : ["create", "write", "registration"],
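Putting events and clauses together, the sketch below selects which configured policy objects apply to a given event at a given clause. It assumes event names are compared case-insensitively, since the handler reports names such as "CREATE" while the configurations in this README list "create"; that comparison rule and the function name are assumptions.

```python
def policies_for_event(policies_to_invoke, event, clause):
    """Select the policy objects that apply to one event at one clause.

    Assumption: events compare case-insensitively ("CREATE" matches "create").
    """
    wanted = event.lower()
    selected = []
    for policy in policies_to_invoke:
        events = [name.lower() for name in policy.get("events", [])]
        if wanted in events and clause in policy.get("active_policy_clauses", []):
            selected.append(policy)
    return selected


if __name__ == "__main__":
    policies = [
        {"active_policy_clauses": ["post"], "events": ["create", "write", "registration"],
         "policy_to_invoke": "irods_policy_data_replication"},
        {"active_policy_clauses": ["pre"], "events": ["create"],
         "policy_to_invoke": "irods_policy_example_policy_implementation"},
    ]
    for policy in policies_for_event(policies, "CREATE", "post"):
        print(policy["policy_to_invoke"])
```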

Policy to Invoke

"policy_to_invoke" is a JSON string naming the policy to invoke. It is accompanied by a "configuration" object which contains any information specific to that policy.

An example for data replication:

"policy_to_invoke" : "irods_policy_data_replication",
"configuration" : {
    "source_to_destination_map" : {
        "edge_resource_0" : ["long_term_resource_0"],
        "edge_resource_1" : ["long_term_resource_1"]
    }
}

Data Object Modified Event Handler

The Data Object Modified event handler unifies both the Object and POSIX semantics, as well as other iRODS-specific operations such as registration, into a single point of truth for invoking policy related to data access. The plugin maps policy enforcement points to a specific set of events for which policy may be configured. The event handler provides events for all operations related to data objects:

  • CHECKSUM
  • COPY
  • CREATE
  • GET
  • PUT
  • REGISTER
  • RENAME
  • REPLICATION
  • SEEK
  • TRIM
  • TRUNCATE
  • UNLINK

The data object modified event handler captures all variables within the dataObjInp_t and rsComm_t, which are then serialized to JSON and passed to the invoked policy. Additional information, such as the event and the associated policy enforcement point, is also included.

{
"comm":{
    "auth_scheme":"native","client_addr":"X.X.X.X","proxy_auth_info_auth_flag":"5","proxy_auth_info_auth_scheme":"",
    "proxy_auth_info_auth_str":"","proxy_auth_info_flag":"0","proxy_auth_info_host":"","proxy_auth_info_ppid":"0",
    "proxy_rods_zone":"tempZone","proxy_sys_uid":"0","proxy_user_name":"rods","proxy_user_other_info_user_comments":"",
    "proxy_user_other_info_user_create":"","proxy_user_other_info_user_info":"","proxy_user_other_info_user_modify":"",
    "proxy_user_type":"","user_auth_info_auth_flag":"5","user_auth_info_auth_scheme":"","user_auth_info_auth_str":"",
    "user_auth_info_flag":"0","user_auth_info_host":"","user_auth_info_ppid":"0","user_rods_zone":"tempZone",
    "user_sys_uid":"0","user_user_name":"rods","user_user_other_info_user_comments":"","user_user_other_info_user_create":"",
    "user_user_other_info_user_info":"","user_user_other_info_user_modify":"","user_user_type":""
    },
"cond_input":{
    "dataIncluded":"","dataType":"generic","destRescName":"ufs0","noOpenFlag":"","openType":"1",
    "recursiveOpr":"1", "resc_hier":"ufs0","selObjType":"dataObj","translatedPath":""
    },
"create_mode":"33204",
"data_size":"1",
"event":"CREATE",
"num_threads":"0",
"obj_path":"/tempZone/home/rods/test_put_gt_max_sql_rows/junk0083",
"offset":"0",
"open_flags":"2",
"opr_type":"1",
"policy_enforcement_point":"pep_api_data_obj_put_post"
}
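A policy consuming these parameters usually needs only a handful of fields. The sketch below extracts some of them from an abbreviated copy of the example above; note that every value arrives as a string, so numeric fields must be converted. The output key names and the user#zone formatting are choices made for this illustration.

```python
import json

# Abbreviated copy of the parameters shown above.
SAMPLE = """{
  "comm": {"user_user_name": "rods", "user_rods_zone": "tempZone"},
  "cond_input": {"destRescName": "ufs0", "resc_hier": "ufs0"},
  "data_size": "1",
  "event": "CREATE",
  "obj_path": "/tempZone/home/rods/test_put_gt_max_sql_rows/junk0083",
  "policy_enforcement_point": "pep_api_data_obj_put_post"
}"""


def summarize(parameters_json):
    """Pick frequently used fields out of data object modified parameters."""
    p = json.loads(parameters_json)
    comm = p.get("comm", {})
    cond_input = p.get("cond_input", {})
    return {
        "event": p["event"],
        "logical_path": p["obj_path"],
        "user": "%s#%s" % (comm.get("user_user_name", ""), comm.get("user_rods_zone", "")),
        "resource_hierarchy": cond_input.get("resc_hier", ""),
        # Every value arrives as a string, so numeric fields need converting.
        "data_size": int(p.get("data_size", "0")),
        "pep": p["policy_enforcement_point"],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```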

Metadata Modified Event Handler

The metadata modified event handler reacts to a client's interaction with the user-defined metadata within the catalog and generates one event: METADATA.

It provides a parameter object of the following form:

{
    "metadata" : {
        "operation"   : "",
        "entity_type" : "",
        "attribute"   : "",
        "value"       : "",
        "units"       : ""
    },
    "logical_path" : "",
    "source_resource" : "",
    "user_name" : ""
}

operation may be one of the following: set, add, or remove.

entity_type may be one of data_object, collection, resource, or user.

logical_path, source_resource, and user_name are optional depending on the target of the metadata operation.
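A policy receiving METADATA events may want to validate the parameter object before acting on it. The sketch below checks the allowed values listed above; requiring a logical_path for data_object and collection targets is an assumption about which optional field accompanies which target, and the function name is hypothetical.

```python
OPERATIONS = {"set", "add", "remove"}
ENTITY_TYPES = {"data_object", "collection", "resource", "user"}


def metadata_problems(parameters):
    """List problems with a METADATA event's parameters (empty list means OK)."""
    problems = []
    md = parameters.get("metadata", {})
    if md.get("operation") not in OPERATIONS:
        problems.append("unknown operation: %r" % md.get("operation"))
    if md.get("entity_type") not in ENTITY_TYPES:
        problems.append("unknown entity_type: %r" % md.get("entity_type"))
    if not md.get("attribute"):
        problems.append("missing attribute")
    # Assumption: path-based targets are expected to carry a logical_path.
    if md.get("entity_type") in ("data_object", "collection") and not parameters.get("logical_path"):
        problems.append("missing logical_path")
    return problems


if __name__ == "__main__":
    print(metadata_problems({"metadata": {"operation": "add", "entity_type": "user",
                                          "attribute": "a", "value": "v", "units": ""},
                             "user_name": "alice"}))
```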

Collection Event Handler

The collection event handler emits three events: CREATE, REMOVE, and REGISTER.

For the REGISTER event it provides a parameters object identical to that of the Data Object Modified event handler, including the dataObjInp_t and rsComm_t.

For the CREATE and REMOVE operations the collInp_t is serialized which provides logical_path, flags, opr_type, and the cond_input.

Administration Event Handlers: Resource, User, Group, and Zone

Interaction with the other nouns in the system is performed solely through the general administration API endpoint which allows for these event handlers to provide identical configuration and behavior. The events emitted are CREATE, MODIFY and REMOVE.

The generalAdminInp_t is serialized which provides action, target, and arg2 through arg9 which contain various information depending on the action and target in question. Additionally, depending on the target, user_name, group_name, source_resource or zone will be present.
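The sketch below shows how a consumer might derive an event name and the positional arguments from the serialized generalAdminInp_t. The action strings "add", "modify" and "rm" are those iadmin sends through the general administration API, but mapping them onto CREATE, MODIFY and REMOVE here is a hypothetical illustration, not the handler's documented behavior.

```python
# Hypothetical mapping from generalAdminInp_t action strings to events.
ACTION_TO_EVENT = {"add": "CREATE", "modify": "MODIFY", "rm": "REMOVE"}


def admin_event(parameters):
    """Derive an event name from serialized generalAdminInp_t parameters."""
    return ACTION_TO_EVENT.get(parameters.get("action"))


def admin_args(parameters):
    # arg2 through arg9 carry action- and target-specific values.
    return [parameters.get("arg%d" % i, "") for i in range(2, 10)]


if __name__ == "__main__":
    params = {"action": "add", "target": "resource", "arg2": "newResc"}
    print(admin_event(params), admin_args(params)[0])
```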

Policy Engines

The policy engine framework provides a set of utilities for creating a lightweight rule engine plugin which implements a policy conforming to the interface invoked by the event handlers. The goal is for the community to continue capturing policy as reusable components within this framework.

Query Processor

The irods_policy_query_processor policy engine wraps the query_processor library within the iRODS development environment. This policy engine will invoke a configured policy for every resulting row from the given query. Each resulting row is passed to the invoked policy via the parameters as a JSON array query_results. The data within the array arrives in the same order as the columns selected within the query.

Example:

"query_string" : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE COLL_NAME = '/tempZone/home/rods' AND DATA_NAME = 'test_put_file'",
"query_limit" : 1,
"query_type" : "general",
"number_of_threads" : 1,
"policy_to_invoke" : "irods_policy_testing_policy",
"configuration" : {
}
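The per-row behavior described above can be sketched as follows: each result row becomes a parameters object holding a "query_results" array in column order, and the configured policy is called once per row, with number_of_threads bounding concurrency. The invoke_policy callable is a hypothetical stand-in for however the host actually calls a policy, and ThreadPoolExecutor is a swapped-in concurrency mechanism, not necessarily what the plugin uses.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def process_query_results(rows, invoke_policy, policy_name, configuration, number_of_threads=1):
    """Invoke a policy once per result row, preserving row order in the results.

    invoke_policy(name, parameters_json, configuration_json) is hypothetical.
    """
    configuration_json = json.dumps(configuration)

    def invoke_row(row):
        # Columns arrive in the same order as they were selected in the query.
        parameters_json = json.dumps({"query_results": list(row)})
        return invoke_policy(policy_name, parameters_json, configuration_json)

    with ThreadPoolExecutor(max_workers=max(1, number_of_threads)) as pool:
        return list(pool.map(invoke_row, rows))


if __name__ == "__main__":
    def print_policy(name, parameters_json, configuration_json):
        print(name, parameters_json)
        return name

    rows = [("rods", "/tempZone/home/rods", "test_put_file", "demoResc")]
    process_query_results(rows, print_policy, "irods_policy_testing_policy", {}, 1)
```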

Access Time

The access_time policy engine will annotate a data object with the last access time, which is useful for other policies such as data movement. By default, a metadata attribute of irods::access_time is utilized. This can be overridden with an "attribute" string in the "configuration" of the policy.

Example:

            {
                "instance_name": "irods_rule_engine_plugin-policy_engine-access_time-instance",
                "plugin_name": "irods_rule_engine_plugin-policy_engine-access_time",
                "plugin_specific_configuration": {
                    "log_errors" : "true"
                }
            },
            {
                "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
                "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
                "plugin_specific_configuration": {
                    "policies_to_invoke" : [
                        {
                            "active_policy_clauses" : ["post"],
                            "events" : ["put", "get", "create", "read", "write", "rename", "registration", "replication"],
                            "policy_to_invoke"    : "irods_policy_access_time",
                            "configuration" : {
                                "attribute" : "custom_access_time_attribute"
                            }
                        }
                    ]
                }
            }
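The attribute selection described above can be sketched as a small function returning the attribute/value pair to apply. Recording the value as integer seconds since the epoch is an assumption made for this illustration; the plugin's actual value format is not stated here.

```python
import time

DEFAULT_ATTRIBUTE = "irods::access_time"


def access_time_avu(configuration, now=None):
    """Return the (attribute, value) pair to apply to the data object.

    Assumption: the value is the access time as integer seconds since the epoch.
    """
    attribute = configuration.get("attribute", DEFAULT_ATTRIBUTE)
    timestamp = int(time.time() if now is None else now)
    return attribute, str(timestamp)


if __name__ == "__main__":
    print(access_time_avu({"attribute": "custom_access_time_attribute"}))
```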

Data Replication

The data_replication policy engine will replicate data from a resource to a configured destination resource, or use a mapping from source resource to an array of destination resources.
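Resolving the destination list from either form of configuration might look like the sketch below. Giving "source_to_destination_map" precedence when both keys are present is an assumption, as is returning an empty list when nothing is configured for the source.

```python
def destinations_for(configuration, source_resource):
    """Resolve replication targets from a data_replication configuration."""
    mapping = configuration.get("source_to_destination_map")
    if mapping is not None:
        # Assumption: the map wins when both forms are configured.
        return list(mapping.get(source_resource, []))
    single = configuration.get("destination_resource")
    return [single] if single else []


if __name__ == "__main__":
    cfg = {"source_to_destination_map": {
        "source_resource_0": ["destination_resource_0", "destination_resource_1"]}}
    print(destinations_for(cfg, "source_resource_0"))
```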

Synchronous Replication

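Example (the two event handler entries are alternative configurations: one replicates to a single "destination_resource", the other uses a "source_to_destination_map"):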

           {
                "instance_name": "irods_rule_engine_plugin-policy_engine-data_replication-instance",
                "plugin_name": "irods_rule_engine_plugin-policy_engine-data_replication",
                "plugin_specific_configuration": {
                    "log_errors" : "true"
                }
           },
           {
                "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
                "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
                "plugin_specific_configuration": {
                    "policies_to_invoke" : [
                        {
                            "active_policy_clauses" : ["post"],
                            "events" : ["put", "create", "write", "registration"],
                            "policy_to_invoke"    : "irods_policy_data_replication",
                            "configuration" : {
                                "destination_resource" : "AnotherResc"
                            }
                        }
                    ]
                }
           },
           {
                "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
                "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
                "plugin_specific_configuration": {
                    "policies_to_invoke" : [
                        {
                            "active_policy_clauses" : ["post"],
                            "events" : ["put", "create", "write", "registration"],
                            "policy_to_invoke"    : "irods_policy_data_replication",
                            "configuration" : {
                                "source_to_destination_map" : {
                                     "source_resource_0" : ["destination_resource_0", "destination_resource_1"],
                                     "source_resource_1" : ["destination_resource_2", "destination_resource_3"]                                     
                                }
                            }
                        }
                    ]
                }
           }               

Asynchronous Replication

Example:

           {
                "instance_name": "irods_rule_engine_plugin-policy_engine-data_replication-instance",
                "plugin_name": "irods_rule_engine_plugin-policy_engine-data_replication",
                "plugin_specific_configuration": {
                    "log_errors" : "true"
                }
           },
           {
                "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
                "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
                "plugin_specific_configuration": {
                    "policies_to_invoke" : [
                        {
                            "active_policy_clauses" : ["post"],
                            "events" : ["put", "create", "write", "registration"],
                            "policy_to_invoke" : "irods_policy_enqueue_rule",
                            "parameters" : {
                                "delay_conditions" : "<PLUSET>1s</PLUSET>",
                                "policy_to_invoke" : "irods_policy_execute_rule",
                                "parameters" : {
                                    "policy_to_invoke"    : "irods_policy_data_replication",
                                    "configuration" : {
                                        "source_to_destination_map" : {
                                            "source_resource_0" : ["destination_resource_0", "destination_resource_1"],
                                            "source_resource_1" : ["destination_resource_2", "destination_resource_3"]
                                        }
                                    }
                                }
                            }
                        }
                    ]
                }
           }
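The asynchronous form wraps the synchronous policy object in two layers: irods_policy_enqueue_rule queues a delayed rule using "delay_conditions", and irods_policy_execute_rule later runs the inner policy. The helper below builds that nesting programmatically; its name is hypothetical, and the caller still adds "active_policy_clauses" and "events" at the outer level.

```python
import json


def wrap_for_delayed_execution(inner_policy, delay_conditions):
    """Nest a policy object inside enqueue_rule / execute_rule as shown above."""
    return {
        "policy_to_invoke": "irods_policy_enqueue_rule",
        "parameters": {
            "delay_conditions": delay_conditions,
            "policy_to_invoke": "irods_policy_execute_rule",
            "parameters": inner_policy,
        },
    }


if __name__ == "__main__":
    inner = {
        "policy_to_invoke": "irods_policy_data_replication",
        "configuration": {"destination_resource": "AnotherResc"},
    }
    print(json.dumps(wrap_for_delayed_execution(inner, "<PLUSET>1s</PLUSET>"), indent=4))
```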

Data Retention

The data_retention policy engine will either remove a given data object or trim a single replica of it, depending on the mode. The mode may be either "trim_single_replica" or "remove_all_replicas". The configuration also supports a "resource_white_list", an array of resource names that defines which resources may have their data removed.

Synchronous Data Retention

            {
                "instance_name": "irods_rule_engine_plugin-policy_engine-data_retention-instance",
                "plugin_name": "irods_rule_engine_plugin-policy_engine-data_retention",
                "plugin_specific_configuration": {
                    "log_errors" : "true"
                }
            },
            {
                "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
                "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
                "plugin_specific_configuration": {
                    "policies_to_invoke" : [
                        {
                            "active_policy_clauses" : ["post"],
                            "events" : ["replication"],
                            "policy_to_invoke"    : "irods_policy_data_retention",
                            "configuration" : {
                                "mode" : "trim_single_replica"
                            }
                        }
                    ]
                }
            }

Asynchronous Data Retention

            {
                "instance_name": "irods_rule_engine_plugin-policy_engine-data_retention-instance",
                "plugin_name": "irods_rule_engine_plugin-policy_engine-data_retention",
                "plugin_specific_configuration": {
                    "log_errors" : "true"
                }
            },
            {
                "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
                "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
                "plugin_specific_configuration": {
                    "policies_to_invoke" : [
                        {
                            "active_policy_clauses" : ["post"],
                            "events" : ["replication"],
                            "policy_to_invoke" : "irods_policy_enqueue_rule",
                            "parameters" : {
                                "delay_conditions" : "<PLUSET>1s</PLUSET>",                            
                                "policy_to_invoke" : "irods_policy_execute_rule",
                                "parameters" : {
                                    "policy_to_invoke"    : "irods_policy_data_retention",
                                    "configuration" : {
                                        "mode" : "trim_single_replica",
                                        "resource_white_list" : ["demoResc", "AnotherResc"]
                                    }
                                }
                            }                            
                        }
                    ]
                }
            }
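The decision the retention configuration encodes can be sketched as below: validate the mode, then honor the white list. Treating an absent "resource_white_list" as no restriction is an assumption, as is raising an error for an unknown mode; the function name is hypothetical.

```python
MODES = ("trim_single_replica", "remove_all_replicas")


def retention_action(configuration, source_resource):
    """Return the configured mode, or None when the resource is not permitted.

    Assumption: an absent "resource_white_list" places no restriction.
    """
    mode = configuration.get("mode", "")
    if mode not in MODES:
        raise ValueError("unsupported mode: %r" % mode)
    white_list = configuration.get("resource_white_list")
    if white_list is not None and source_resource not in white_list:
        return None
    return mode


if __name__ == "__main__":
    cfg = {"mode": "trim_single_replica", "resource_white_list": ["demoResc", "AnotherResc"]}
    print(retention_action(cfg, "demoResc"), retention_action(cfg, "tapeResc"))
```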

Data Verification

The data_verification policy engine is used to determine whether a replica of a data object is correct at rest. Verification uses one of three methods, selected by the administrative metadata attribute irods::verification::type annotating the replica's root resource. Should another attribute be desired, it may be configured using the "attribute" setting.

Verification types include: "catalog", "filesystem", and "checksum".

The "catalog" mode will stat the object within the catalog to determine that it is properly registered.

The "filesystem" configuration will stat the object in the catalog, then stat the object at rest within the storage resource, and compare the two sizes to determine whether they match.

The "checksum" configuration will compute a checksum of the replica at rest and compare it with the catalog. Should no checksum exist in the catalog, another good replica will be used to compute the checksum.
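The three methods can be summarized in a sketch that first reads the verification type from the root resource's metadata and then applies the matching comparison. The shapes of the inputs (a list of attribute/value pairs, a catalog dictionary, precomputed sizes and checksums) are hypothetical stand-ins for the catalog queries and storage operations the plugin performs.

```python
def verification_type_for(resource_avus, attribute="irods::verification::type"):
    """Read the verification type from (attribute, value) pairs on the root resource."""
    for name, value in resource_avus:
        if name == attribute:
            return value
    return None


def replica_is_valid(verification_type, catalog, at_rest_size=None,
                     computed_checksum=None, good_replica_checksum=None):
    """Apply one verification type to a replica.

    catalog: {"registered": bool, "size": int, "checksum": str or None}
    (a hypothetical shape for whatever the catalog query returns).
    """
    if not catalog.get("registered"):
        return False
    if verification_type == "catalog":
        return True
    if verification_type == "filesystem":
        return at_rest_size is not None and at_rest_size == catalog["size"]
    if verification_type == "checksum":
        # Fall back to a checksum from another good replica when the catalog has none.
        expected = catalog.get("checksum") or good_replica_checksum
        return expected is not None and computed_checksum == expected
    raise ValueError("unknown verification type: %r" % verification_type)


if __name__ == "__main__":
    vtype = verification_type_for([("irods::verification::type", "filesystem")])
    print(vtype, replica_is_valid(vtype, {"registered": True, "size": 10}, at_rest_size=10))
```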

Example:

            {
                "instance_name": "irods_rule_engine_plugin-policy_engine-data_verification-instance",
                "plugin_name": "irods_rule_engine_plugin-policy_engine-data_verification",
                "plugin_specific_configuration": {
                    "log_errors" : "true"
                }
            },
            {
                "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
                "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
                "plugin_specific_configuration": {
                    "policies_to_invoke" : [
                        {
                            "active_policy_clauses" : ["post"],
                            "events" : ["replication"],
                            "policy_to_invoke"    : "irods_policy_data_verification",
                            "configuration" : {
                                "log_errors" : "true",
                                "attribute"  : "event_handler_attribute"
                            }
                        }
                    ]
                }
            }

Checksum Verification

Data integrity may be verified directly with a computation of the replica's checksum, and comparison with the assumed existing catalog value. This policy requires a logical_path and a source_resource parameter in order to be invoked correctly.

Example Configuration

Within the server configuration:

            {
                "instance_name": "irods_rule_engine_plugin-policy_engine-verify_checksum-instance",
                "plugin_name": "irods_rule_engine_plugin-policy_engine-verify_checksum",
                "plugin_specific_configuration": {
                }
            },

An implementation of periodic checksum verification:

{
    "policy_to_invoke" : "irods_policy_enqueue_rule",
    "parameters" : {
        "delay_conditions" : "<PLUSET>1s</PLUSET><EF>REPEAT FOR EVER</EF><INST_NAME>irods_rule_engine_plugin-cpp_default_policy-instance</INST_NAME>",
        "policy_to_invoke" : "irods_policy_execute_rule",
        "parameters" : {
            "policy_to_invoke"    : "irods_policy_query_processor",
            "parameters" : {
                "query_string"  : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE RESC_NAME like 'tier_%'",
                "query_limit"   : 0,
                "query_type"    : "general",
                "number_of_threads" : 1,
                "policies_to_invoke" : [
                    {
                        "policy_to_invoke" : "irods_policy_verify_checksum",
                        "configuration" : {
                            "log_errors" : "true"
                        }
                    }
                ]
            }
        }
    }
}
INPUT null
OUTPUT ruleExecOut
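The "delay_conditions" strings in these examples combine the PLUSET, EF and INST_NAME tags. A small builder, sketched below with a hypothetical function name, makes the interval, repetition and target instance explicit and reproduces the string used above.

```python
def delay_conditions(interval, repeat_forever=False, instance_name=None):
    """Build a delay_conditions string from the tags used in these examples."""
    parts = ["<PLUSET>%s</PLUSET>" % interval]
    if repeat_forever:
        parts.append("<EF>REPEAT FOR EVER</EF>")
    if instance_name:
        parts.append("<INST_NAME>%s</INST_NAME>" % instance_name)
    return "".join(parts)


if __name__ == "__main__":
    print(delay_conditions("1s", True, "irods_rule_engine_plugin-cpp_default_policy-instance"))
```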

Filesystem Usage

The filesystem_usage policy engine operates on a single storage resource, named by the source_resource parameter. The example below invokes it periodically as a delayed rule.

Example Configuration

           {
                "instance_name": "irods_rule_engine_plugin-policy_engine-filesystem_usage-instance",
                "plugin_name": "irods_rule_engine_plugin-policy_engine-filesystem_usage",
                "plugin_specific_configuration": {
                    "log_errors" : "true"
                }
           }

An implementation of a periodic rule to invoke the policy:

{
    "policy_to_invoke" : "irods_policy_enqueue_rule",
    "parameters" : {
        "comment"          : "Set the PLUSET value to the interval desired to run the rule",
        "delay_conditions" : "<PLUSET>10s</PLUSET><EF>REPEAT FOR EVER</EF><INST_NAME>irods_rule_engine_plugin-cpp_default_policy-instance</INST_NAME>",
        "policy_to_invoke" : "irods_policy_execute_rule",
        "parameters" : {
            "policy_to_invoke"    : "irods_policy_filesystem_usage",
            "parameters" : {
                "source_resource" : "demoResc"
            }
        }
    }
}
INPUT null
OUTPUT ruleExecOut
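Rule files like the two above consist of a JSON body followed by the INPUT and OUTPUT lines. The sketch below renders such a file from a Python dictionary; the function name is hypothetical, and how the file is submitted to the server is left to the reader's deployment.

```python
import json


def render_rule_file(rule):
    """Render a JSON rule body followed by the INPUT/OUTPUT lines shown above."""
    return json.dumps(rule, indent=4) + "\nINPUT null\nOUTPUT ruleExecOut\n"


if __name__ == "__main__":
    rule = {
        "policy_to_invoke": "irods_policy_enqueue_rule",
        "parameters": {
            "delay_conditions": "<PLUSET>10s</PLUSET><EF>REPEAT FOR EVER</EF>",
            "policy_to_invoke": "irods_policy_execute_rule",
            "parameters": {
                "policy_to_invoke": "irods_policy_filesystem_usage",
                "parameters": {"source_resource": "demoResc"},
            },
        },
    }
    print(render_rule_file(rule), end="")
```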
