The process for creating and deploying policy for iRODS requires the rule author to have a complete understanding of the iRODS API and its associated plugin architecture in order to properly leverage dynamic policy enforcement. The author may need to invoke the same policy across several policy enforcement points to cover every means of moving data into iRODS, for both object and POSIX data movement. The goal of this framework is to streamline the crafting and deployment of policy and to provide a reusable body of policy that may be easily configured.
Policy should be a matter of configuration rather than hand-crafted code. For most use cases, a deployment should be able to follow a well-documented pattern already in use by others.
All policy (rules) to be invoked by this system must conform to a simple interface of two parameters, both serialized JSON strings, known as the parameters and the configuration. This policy may be implemented in any rule language or as a simple rule engine plugin.
For example, in the iRODS Rule Language:
irods_policy_example_policy_implementation(*parameters, *configuration) {
    writeLine("stdout", "Hello, World!");
}
Or in Python, to be used by the Python Rule Engine Plugin:
def irods_policy_example_policy_implementation(rule_args, callback, rei):
    # Parameters: rule_args[1]
    # Configuration: rule_args[2]
    callback.writeLine("stdout", "Hello, World!")
The parameters contain all the information captured by the event handler, or may be passed in as a prepopulated JSON object when configured. These values may change depending on how the policy is configured and on the contract between the policy and the event handlers used. The configuration may contain any other pertinent information that the policy may need at the time of invocation. This is a way to externalize any variables the policy may leverage, such as a specific attribute to use for metadata application.
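As a minimal Python sketch of the receiving side of this interface, assuming a policy that needs only the standard json module, a policy can decode both strings and read an externalized attribute with a fallback. The helper read_policy_inputs and its default-attribute argument are hypothetical, not part of the framework.

```python
import json


def read_policy_inputs(parameters_json, configuration_json, default_attribute):
    """Decode the two serialized JSON strings handed to a policy.

    Hypothetical helper: returns the decoded parameters, the decoded
    configuration, and the metadata attribute to use, falling back to
    default_attribute when the configuration does not override it.
    Empty strings decode to empty dicts.
    """
    parameters = json.loads(parameters_json) if parameters_json else {}
    configuration = json.loads(configuration_json) if configuration_json else {}
    attribute = configuration.get("attribute", default_attribute)
    return parameters, configuration, attribute


params, config, attr = read_policy_inputs(
    '{"obj_path": "/tempZone/home/rods/file.txt", "event": "CREATE"}',
    '{"attribute": "custom_access_time_attribute"}',
    "irods::access_time",
)
print(params["event"], attr)  # CREATE custom_access_time_attribute
```

Keeping the attribute name in the configuration rather than in the policy body is what lets one policy implementation be reused across deployments.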
Policy may be invoked directly, by an event handler or by the Query Processor. This will be discussed later.
Event handlers are a class of rule engine plugin that consume the dynamic policy enforcement points related to a noun within iRODS and invoke the policy configured for the events they generate. There is one event handler per noun in the system.
The complete list is:
- Data Objects
- Collections
- Metadata
- Resources
- Users and Groups
The policy to be invoked is determined by the plugin-specific configuration within /etc/irods/server_config.json for a given instance of an event handler. The "plugin_specific_configuration" object for that instance is searched for a JSON array, "policies_to_invoke", whose elements are JSON objects. Each object configures a policy to invoke for a given set of events.
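To show where a handler finds its policy list, a minimal Python sketch can look up an instance by name in a decoded server configuration. It assumes the iRODS 4.2 layout, in which rule engine instances are listed under plugin_configuration.rule_engines; both function names are hypothetical.

```python
import json


def load_server_config(path="/etc/irods/server_config.json"):
    """Read and decode the server configuration file."""
    with open(path) as f:
        return json.load(f)


def policies_for_instance(server_config, instance_name):
    """Return the "policies_to_invoke" array configured for one instance.

    Assumption: rule engine instances live under
    plugin_configuration.rule_engines (the iRODS 4.2 layout).
    """
    rule_engines = server_config.get("plugin_configuration", {}).get("rule_engines", [])
    for engine in rule_engines:
        if engine.get("instance_name") == instance_name:
            specific = engine.get("plugin_specific_configuration", {})
            return specific.get("policies_to_invoke", [])
    return []


# In-memory stand-in for load_server_config() so the sketch runs anywhere.
sample_config = {
    "plugin_configuration": {
        "rule_engines": [
            {
                "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
                "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
                "plugin_specific_configuration": {
                    "policies_to_invoke": [
                        {
                            "active_policy_clauses": ["post"],
                            "events": ["create", "write"],
                            "policy_to_invoke": "irods_policy_access_time",
                        }
                    ]
                },
            }
        ]
    }
}

for policy in policies_for_instance(
    sample_config, "irods_rule_engine_plugin-event_handler-data_object_modified-instance"
):
    print(policy["policy_to_invoke"])  # irods_policy_access_time
```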
The policy objects contain:
- conditional
- active_policy_clauses
- events
- policy_to_invoke
A conditional describes a set of conditions which must be met in order to invoke the policy. These are a series of regular expressions which match the nouns involved: logical_path, metadata_applied, metadata_exists, user_name, source_resource, or destination_resource. The metadata conditionals come in two flavors: metadata_applied matches the application of metadata, and metadata_exists invokes policy based on the existence of metadata. The metadata_exists conditional tests for the existence of matching metadata on any configured entity_type. The "entity_type" for the metadata maps to the iRODS nouns, which are: "data_object", "collection", "resource", and "user". To match metadata that may exist anywhere in a logical path, a recursive flag may be set to walk the path looking for matching metadata.
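A minimal Python sketch of evaluating a conditional against an event's parameters follows. It assumes the parameters expose the same key names as the conditional (as the metadata event parameters do), treats each pattern as a Python regular expression matched at the start of the value with re.match, and covers only the simple keys and metadata_applied. The framework's own matching rules may differ, and all names here are hypothetical.

```python
import re

# Keys whose patterns are matched against the same key in the event parameters.
SIMPLE_KEYS = ("logical_path", "user_name", "source_resource", "destination_resource")
METADATA_FIELDS = ("attribute", "value", "units")


def pattern_matches(pattern, value):
    """Assumption: a pattern matches when re.match succeeds at the start of value."""
    return value is not None and re.match(pattern, value) is not None


def conditional_met(conditional, parameters):
    """Return True only if every condition present in `conditional` matches."""
    for key in SIMPLE_KEYS:
        if key in conditional and not pattern_matches(conditional[key], parameters.get(key)):
            return False
    wanted = conditional.get("metadata_applied")
    if wanted is not None:
        applied = parameters.get("metadata", {})
        # entity_type is compared exactly; the AVU fields are patterns.
        if "entity_type" in wanted and wanted["entity_type"] != applied.get("entity_type"):
            return False
        for field in METADATA_FIELDS:
            if field in wanted and not pattern_matches(wanted[field], applied.get(field)):
                return False
    return True


conditional = {
    "logical_path": "/tempZone/home/*",
    "metadata_applied": {
        "attribute": "foo*", "value": "bar*", "units": "baz*",
        "entity_type": "data_object",
    },
}
event = {
    "logical_path": "/tempZone/home/rods/file.txt",
    "metadata": {"operation": "add", "entity_type": "data_object",
                 "attribute": "foo", "value": "bar", "units": "baz"},
}
print(conditional_met(conditional, event))  # True
```

Because re.match anchors only at the start, "/tempZone/home/*" also accepts "/tempZone/homework"; a pattern such as "/tempZone/home/.*" used with re.fullmatch would be stricter.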
An example might be:
"conditional" : {
"logical_path" : "/tempZone/home/*",
"metadata_applied" : {
"attribute" : "foo*",
"value" : "bar*",
"units" : "baz*",
"entity_type" : "data_object"
}
}
An example for matching metadata in a logical path used to invoke an indexing event:
"conditional" : {
"logical_path" : "/tempZone/home/*",
"metadata_exists" : {
"recursive" : "true",
"attribute" : "irods::indexing::index",
"value" : "elasticsearch::full_text",
"entity_type" : "data_object"
}
}
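The recursive flag in the example above can be pictured as walking from the data object up through each parent collection. The Python sketch below assumes a hypothetical in-memory dict standing in for catalog lookups, and for brevity compares attribute and value exactly rather than as regular expressions; function names are hypothetical.

```python
import posixpath


def paths_to_check(logical_path, recursive):
    """List the logical paths a metadata_exists check inspects.

    With recursive set, walk from the object up through each parent
    collection to the zone collection; otherwise check only the object.
    """
    if not recursive:
        return [logical_path]
    paths = []
    current = logical_path.rstrip("/")
    while current and current != "/":
        paths.append(current)
        current = posixpath.dirname(current)
    return paths


def metadata_exists(logical_path, recursive, avus_by_path, attribute, value):
    """Return True if any inspected path carries a matching attribute/value pair.

    avus_by_path is a hypothetical stand-in for catalog queries: it maps a
    logical path to a list of (attribute, value, units) tuples.
    """
    for path in paths_to_check(logical_path, recursive):
        for avu_attribute, avu_value, _units in avus_by_path.get(path, []):
            if avu_attribute == attribute and avu_value == value:
                return True
    return False


catalog = {
    "/tempZone/home/rods": [("irods::indexing::index", "elasticsearch::full_text", "")],
}
print(metadata_exists("/tempZone/home/rods/a.txt", True, catalog,
                      "irods::indexing::index", "elasticsearch::full_text"))  # True
```

Here the metadata sits on the parent collection, so only the recursive walk finds it.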
An example matching data objects put into a destination root resource:
"conditional" : {
"logical_path" : "/tempZone/home/*",
"destination_resource" : "dest_resc_*"
}
"active_policy_clauses" is a JSON array of one or more of the following strings: "pre", "post", "except", and "finally". These map directly to the policy enforcement point flow control, determining at which points in an operation's flow (before it runs, after it succeeds, when it fails, or unconditionally at the end) the policy is invoked for any configured event.
For example: "active_policy_clauses" : ["post"]
"events" is a JSON array of strings that map to the events generated by the event handler. Each event handler has its own set of events for which it may invoke policy, documented below.
For a data-object-modified example where data is ingested: "events" : ["create", "write", "registration"]
"policy_to_invoke" is a JSON string naming the policy to invoke. It is accompanied by a "configuration" object containing any information specific to that policy.
An example for data replication:
"policy_to_invoke" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"edge_resource_0" : ["long_term_resource_0"],
"edge_resource_1" : ["long_term_resource_1"]
}
}
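The active_policy_clauses and events fields described above together decide whether a configured policy fires. A minimal Python sketch of that filter, assuming the clause is the suffix of the dynamic PEP name (as in pep_api_data_obj_put_post) and that event names are compared case-insensitively; how the framework actually maps configured event names to reported events is not spelled out here, and the function names are hypothetical:

```python
CLAUSES = ("pre", "post", "except", "finally")


def clause_of(pep_name):
    """Return the clause suffix of a dynamic PEP name, or None."""
    for clause in CLAUSES:
        if pep_name.endswith("_" + clause):
            return clause
    return None


def should_invoke(policy, pep_name, event):
    """True if `policy` is configured for this PEP's clause and this event.

    Assumption: event names are compared case-insensitively, so a
    configured "create" matches a reported "CREATE".
    """
    clause = clause_of(pep_name)
    configured_events = {e.lower() for e in policy.get("events", [])}
    return clause in policy.get("active_policy_clauses", []) and event.lower() in configured_events


policy = {
    "active_policy_clauses": ["post"],
    "events": ["create", "write", "registration"],
    "policy_to_invoke": "irods_policy_data_replication",
}
print(should_invoke(policy, "pep_api_data_obj_put_post", "CREATE"))  # True
print(should_invoke(policy, "pep_api_data_obj_put_pre", "CREATE"))   # False
```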
The Data Object Modified event handler unifies both the object and POSIX semantics, as well as other iRODS-specific operations such as registration, into a single point of truth for invoking policy related to data access. The plugin maps policy enforcement points to a specific set of events for which policy may be configured. The event handler provides events for all operations related to data objects:
- CHECKSUM
- COPY
- CREATE
- GET
- PUT
- REGISTER
- RENAME
- REPLICATION
- SEEK
- TRIM
- TRUNCATE
- UNLINK
The data object modified event handler captures all variables within the dataObjInp_t and rsComm_t, which are then serialized to JSON and passed to the invoked policy. Additional information, such as the event and the associated policy enforcement point, is also included.
{
"comm":{
"auth_scheme":"native","client_addr":"X.X.X.X","proxy_auth_info_auth_flag":"5","proxy_auth_info_auth_scheme":"",
"proxy_auth_info_auth_str":"","proxy_auth_info_flag":"0","proxy_auth_info_host":"","proxy_auth_info_ppid":"0",
"proxy_rods_zone":"tempZone","proxy_sys_uid":"0","proxy_user_name":"rods","proxy_user_other_info_user_comments":"",
"proxy_user_other_info_user_create":"","proxy_user_other_info_user_info":"","proxy_user_other_info_user_modify":"",
"proxy_user_type":"","user_auth_info_auth_flag":"5","user_auth_info_auth_scheme":"","user_auth_info_auth_str":"",
"user_auth_info_flag":"0","user_auth_info_host":"","user_auth_info_ppid":"0","user_rods_zone":"tempZone",
"user_sys_uid":"0","user_user_name":"rods","user_user_other_info_user_comments":"","user_user_other_info_user_create":"",
"user_user_other_info_user_info":"","user_user_other_info_user_modify":"","user_user_type":""
},
"cond_input":{
"dataIncluded":"","dataType":"generic","destRescName":"ufs0","noOpenFlag":"","openType":"1",
"recursiveOpr":"1", "resc_hier":"ufs0","selObjType":"dataObj","translatedPath":""
},
"create_mode":"33204",
"data_size":"1",
"event":"CREATE",
"num_threads":"0",
"obj_path":"/tempZone/home/rods/test_put_gt_max_sql_rows/junk0083",
"offset":"0",
"open_flags":"2",
"opr_type":"1",
"policy_enforcement_point":"pep_api_data_obj_put_post"
}
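A policy typically needs only a few of these fields. A minimal Python sketch that pulls them out of the serialized parameters; the keys it reads come from the example above, while the summary's own key names and the function name are hypothetical:

```python
import json


def summarize_event(parameters_json):
    """Pick commonly used fields out of data object modified parameters."""
    p = json.loads(parameters_json)
    return {
        "event": p.get("event"),
        "logical_path": p.get("obj_path"),
        "user_name": p.get("comm", {}).get("user_user_name"),
        "destination_resource": p.get("cond_input", {}).get("destRescName"),
        "policy_enforcement_point": p.get("policy_enforcement_point"),
    }


# Trimmed copy of the example parameters above.
sample = json.dumps({
    "comm": {"user_user_name": "rods", "user_rods_zone": "tempZone"},
    "cond_input": {"destRescName": "ufs0", "resc_hier": "ufs0"},
    "event": "CREATE",
    "obj_path": "/tempZone/home/rods/test_put_gt_max_sql_rows/junk0083",
    "policy_enforcement_point": "pep_api_data_obj_put_post",
})
summary = summarize_event(sample)
print(summary["event"], summary["destination_resource"])  # CREATE ufs0
```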
The metadata modified event handler reacts to a client's interaction with the user-defined metadata in the catalog and generates one event: METADATA.
It provides a parameter object of the following form:
{
"metadata" : {
"operation" : "",
"entity_type" : "",
"attribute" : "",
"value" : "",
"units" : ""
},
"logical_path" : "",
"source_resource" : "",
"user_name" : ""
}
operation may be one of the following: set, add, or remove.
entity_type may be one of data_object, collection, resource, or user.
logical_path, source_resource, and user_name are optional, depending on the target of the metadata operation.
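A policy consuming METADATA events might first validate the parameter object against the values listed above. A minimal Python sketch; the function name and the shape of its return value are hypothetical, and it simply reports whichever optional target keys are present and non-empty:

```python
VALID_OPERATIONS = {"set", "add", "remove"}
VALID_ENTITY_TYPES = {"data_object", "collection", "resource", "user"}
OPTIONAL_KEYS = ("logical_path", "source_resource", "user_name")


def describe_metadata_event(parameters):
    """Validate a METADATA event's parameters and return its parts.

    Returns (operation, entity_type, (attribute, value, units), targets),
    where targets holds only the optional keys that are present and non-empty.
    """
    md = parameters["metadata"]
    if md["operation"] not in VALID_OPERATIONS:
        raise ValueError("unexpected operation: %r" % md["operation"])
    if md["entity_type"] not in VALID_ENTITY_TYPES:
        raise ValueError("unexpected entity_type: %r" % md["entity_type"])
    avu = (md["attribute"], md["value"], md.get("units", ""))
    targets = {k: parameters[k] for k in OPTIONAL_KEYS if parameters.get(k)}
    return md["operation"], md["entity_type"], avu, targets


event = {
    "metadata": {"operation": "add", "entity_type": "data_object",
                 "attribute": "foo", "value": "bar", "units": ""},
    "logical_path": "/tempZone/home/rods/file.txt",
    "source_resource": "",
    "user_name": "rods",
}
print(describe_metadata_event(event))
```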
The collection event handler emits three events: CREATE, REMOVE, and REGISTER.
For the REGISTER event it provides a parameters object identical to that of the Data Object Modified event handler, including the dataObjInp_t and rsComm_t.
For the CREATE and REMOVE events the collInp_t is serialized, which provides logical_path, flags, opr_type, and cond_input.
Interaction with the remaining nouns in the system (resources, users, and groups) is performed solely through the general administration API endpoint, which allows these event handlers to provide identical configuration and behavior. The events emitted are CREATE, MODIFY, and REMOVE.
The generalAdminInp_t is serialized, which provides action, target, and arg2 through arg9, containing various information depending on the action and target in question. Additionally, depending on the target, user_name, group_name, source_resource, or zone will be present.
The policy engine framework provides a set of utilities for creating a lightweight rule engine plugin that implements a policy conforming to the Event Handler interface. The goal is for the community to continue capturing policy as reusable components within this framework.
The irods_policy_query_processor policy engine wraps the query_processor library within the iRODS development environment. This policy engine invokes a configured policy for every row returned by the given query. Each row is passed to the invoked policy in the parameters as a JSON array, query_results, whose elements arrive in the same order as the columns selected in the query.
Example:
"query_string" : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE COLL_NAME = '/tempZone/home/rods' AND DATA_NAME = 'test_put_file'",
"query_limit" : 1,
"query_type" : "general",
"number_of_threads" : 1,
"policy_to_invoke" : "irods_policy_testing_policy",
"configuration" : {
}
The access_time policy engine annotates a data object with its last access time, which is useful for other policies such as data movement. By default, the metadata attribute irods::access_time is used. This can be overridden with an "attribute" string in the "configuration" of the policy.
Example:
{
"instance_name": "irods_rule_engine_plugin-policy_engine-access_time-instance",
"plugin_name": "irods_rule_engine_plugin-policy_engine-access_time",
"plugin_specific_configuration": {
"log_errors" : "true"
}
},
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["put", "get", "create", "read", "write", "rename", "registration", "replication"],
"policy_to_invoke" : "irods_policy_access_time",
"configuration" : {
"attribute" : "custom_access_time_attribute"
}
}
]
}
}
The data_replication policy engine replicates data from a resource to a configured "destination_resource", or uses a "source_to_destination_map" that maps each source resource to an array of destination resources.
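How the two configuration styles resolve to destinations can be sketched in Python. Giving "destination_resource" precedence when both keys are present is an assumption of this sketch, as is the function name.

```python
def replication_destinations(configuration, source_resource):
    """Return the resources a replica on source_resource should be copied to.

    A single "destination_resource" applies to every source; otherwise the
    "source_to_destination_map" is consulted and an unmapped source yields
    no destinations.
    """
    if "destination_resource" in configuration:
        return [configuration["destination_resource"]]
    mapping = configuration.get("source_to_destination_map", {})
    return list(mapping.get(source_resource, []))


mapped = {
    "source_to_destination_map": {
        "source_resource_0": ["destination_resource_0", "destination_resource_1"],
        "source_resource_1": ["destination_resource_2", "destination_resource_3"],
    }
}
print(replication_destinations(mapped, "source_resource_1"))
# ['destination_resource_2', 'destination_resource_3']
print(replication_destinations({"destination_resource": "AnotherResc"}, "demoResc"))
# ['AnotherResc']
```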
Example:
{
"instance_name": "irods_rule_engine_plugin-policy_engine-data_replication-instance",
"plugin_name": "irods_rule_engine_plugin-policy_engine-data_replication",
"plugin_specific_configuration": {
"log_errors" : "true"
}
},
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["put", "create", "write", "registration"],
"policy_to_invoke" : "irods_policy_data_replication",
"configuration" : {
"destination_resource" : "AnotherResc"
}
}
]
}
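}
Alternatively, the event handler may be configured with a mapping from each source resource to an array of destination resources:
{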
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["put", "create", "write", "registration"],
"policy_to_invoke" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"source_resource_0" : ["destination_resource_0", "destination_resource_1"],
"source_resource_1" : ["destination_resource_2", "destination_resource_3"]
}
}
}
]
}
}
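Replication may also be deferred to the delayed execution queue by wrapping the policy in irods_policy_enqueue_rule and irods_policy_execute_rule: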
{
"instance_name": "irods_rule_engine_plugin-policy_engine-data_replication-instance",
"plugin_name": "irods_rule_engine_plugin-policy_engine-data_replication",
"plugin_specific_configuration": {
"log_errors" : "true"
}
},
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["put", "create", "write", "registration"],
"policy_to_invoke" : "irods_policy_enqueue_rule",
"parameters" : {
"delay_conditions" : "<PLUSET>1s</PLUSET>",
"policy_to_invoke" : "irods_policy_execute_rule",
"parameters" : {
"policy_to_invoke" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"source_resource_0" : ["destination_resource_0", "destination_resource_1"],
"source_resource_1" : ["destination_resource_2", "destination_resource_3"]
}
}
}
}
}
]
}
}
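The nesting used in the example above can be generated rather than written by hand. A minimal Python sketch, assuming only the key layout shown in that example; the helper name deferred is hypothetical.

```python
import json


def deferred(policy_to_invoke, configuration, delay_conditions="<PLUSET>1s</PLUSET>"):
    """Wrap a policy entry so it runs later from the delayed execution queue.

    Reproduces the nesting used in the example:
    irods_policy_enqueue_rule -> irods_policy_execute_rule -> policy_to_invoke.
    """
    return {
        "policy_to_invoke": "irods_policy_enqueue_rule",
        "parameters": {
            "delay_conditions": delay_conditions,
            "policy_to_invoke": "irods_policy_execute_rule",
            "parameters": {
                "policy_to_invoke": policy_to_invoke,
                "configuration": configuration,
            },
        },
    }


entry = {
    "active_policy_clauses": ["post"],
    "events": ["put", "create", "write", "registration"],
    **deferred("irods_policy_data_replication", {
        "source_to_destination_map": {
            "source_resource_0": ["destination_resource_0", "destination_resource_1"],
        },
    }),
}
print(json.dumps(entry, indent=4))
```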
The data_retention policy engine either removes a given data object or trims a single replica of it, depending on the "mode", which may be either "trim_single_replica" or "remove_all_replicas". The configuration also supports a "resource_white_list", an array of resource names defining which resources may have their data removed.
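The decision the mode and white list drive can be sketched in Python. Treating a missing "resource_white_list" as "every resource is eligible" is an assumption of this sketch, as are the returned action names and the function name.

```python
def retention_action(configuration, source_resource):
    """Decide what a retention policy would do for a replica on source_resource.

    Returns "trim", "remove", or None when the resource is not white-listed.
    Assumption: with no resource_white_list configured, every resource is eligible.
    """
    white_list = configuration.get("resource_white_list")
    if white_list is not None and source_resource not in white_list:
        return None
    mode = configuration.get("mode")
    if mode == "trim_single_replica":
        return "trim"
    if mode == "remove_all_replicas":
        return "remove"
    raise ValueError("unknown mode: %r" % mode)


config = {"mode": "trim_single_replica", "resource_white_list": ["demoResc", "AnotherResc"]}
print(retention_action(config, "demoResc"))   # trim
print(retention_action(config, "otherResc"))  # None
```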
{
"instance_name": "irods_rule_engine_plugin-policy_engine-data_retention-instance",
"plugin_name": "irods_rule_engine_plugin-policy_engine-data_retention",
"plugin_specific_configuration": {
"log_errors" : "true"
}
},
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["replication"],
"policy_to_invoke" : "irods_policy_data_retention",
"configuration" : {
"mode" : "trim_single_replica"
}
}
]
}
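}
The trim may also be deferred to the delayed execution queue and restricted to the resources in a "resource_white_list":
{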
"instance_name": "irods_rule_engine_plugin-policy_engine-data_retention-instance",
"plugin_name": "irods_rule_engine_plugin-policy_engine-data_retention",
"plugin_specific_configuration": {
"log_errors" : "true"
}
},
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["replication"],
"policy_to_invoke" : "irods_policy_enqueue_rule",
"parameters" : {
"delay_conditions" : "<PLUSET>1s</PLUSET>",
"policy_to_invoke" : "irods_policy_execute_rule",
"parameters" : {
"policy_to_invoke" : "irods_policy_data_retention",
"configuration" : {
"mode" : "trim_single_replica",
"resource_white_list" : ["demoResc", "AnotherResc"]
}
}
}
}
]
}
}
The data_verification policy engine determines whether a replica of a data object is correct at rest. The verification method is one of three, selected by the administrative metadata attribute irods::verification::type annotating the replica's root resource. Should another attribute be desired, it may be configured using the "attribute" setting.
Verification types include: "catalog", "filesystem", and "checksum".
The "catalog" mode stats the object within the catalog to determine that it is properly registered.
The "filesystem" mode stats the object in the catalog, then stats the replica at rest within the storage resource and compares the two sizes to determine whether they match.
The "checksum" mode computes a checksum of the replica at rest and compares it with the catalog. Should no checksum exist in the catalog, another good replica is used to compute the checksum.
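The three modes can be summarized as a Python sketch operating on hypothetical dicts that describe what the catalog records and what was observed in storage. The key names (registered, size, checksum, good_replica_checksum) and function names are assumptions of this sketch, not the plugin's data model.

```python
DEFAULT_ATTRIBUTE = "irods::verification::type"


def verification_type_for(resource_metadata, configuration):
    """Read the verification type from the root resource's metadata.

    resource_metadata is a hypothetical dict of attribute -> value; the
    "attribute" setting overrides the default attribute name.
    """
    attribute = configuration.get("attribute", DEFAULT_ATTRIBUTE)
    return resource_metadata.get(attribute)


def verify_replica(verification_type, catalog, at_rest):
    """Apply one verification type to a replica.

    "good_replica_checksum" stands in for a checksum computed from another
    good replica when the catalog has none.
    """
    if verification_type == "catalog":
        return bool(catalog.get("registered"))
    if verification_type == "filesystem":
        return bool(catalog.get("registered")) and catalog.get("size") == at_rest.get("size")
    if verification_type == "checksum":
        expected = catalog.get("checksum") or catalog.get("good_replica_checksum")
        return expected is not None and expected == at_rest.get("checksum")
    raise ValueError("unknown verification type: %r" % verification_type)


resource_metadata = {"irods::verification::type": "filesystem"}
vtype = verification_type_for(resource_metadata, {})
print(vtype, verify_replica(vtype, {"registered": True, "size": 1}, {"size": 1}))
# filesystem True
```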
Example:
{
"instance_name": "irods_rule_engine_plugin-policy_engine-data_verification-instance",
"plugin_name": "irods_rule_engine_plugin-policy_engine-data_verification",
"plugin_specific_configuration": {
"log_errors" : "true"
}
},
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["replication"],
"policy_to_invoke" : "irods_policy_data_verification",
"configuration" : {
"log_errors" : "true",
"attribute" : "event_handler_attribute"
}
}
]
}
}
The verify_checksum policy engine verifies data integrity directly by computing the replica's checksum and comparing it with the existing catalog value, which is assumed to be present. This policy requires a logical_path and a source_resource parameter in order to be invoked correctly.
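The comparison can be sketched in Python, assuming the replica uses iRODS's default SHA-256 scheme, whose catalog value has the form "sha2:" followed by the base64-encoded digest, and that the replica's physical path is readable locally. The function names are hypothetical and this does not reproduce the plugin.

```python
import base64
import hashlib


def irods_sha2_checksum(path, chunk_size=1024 * 1024):
    """Compute a "sha2:<base64 SHA-256 digest>" checksum, streaming the file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def checksum_matches(path, catalog_checksum):
    """True if the replica at rest matches the checksum recorded in the catalog."""
    return irods_sha2_checksum(path) == catalog_checksum
```

Reading in fixed-size chunks keeps memory use flat for large replicas.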
Within the server configuration:
{
"instance_name": "irods_rule_engine_plugin-policy_engine-verify_checksum-instance",
"plugin_name": "irods_rule_engine_plugin-policy_engine-verify_checksum",
"plugin_specific_configuration": {
}
},
An implementation of periodic checksum verification:
{
"policy_to_invoke" : "irods_policy_enqueue_rule",
"parameters" : {
"delay_conditions" : "<PLUSET>1s</PLUSET><EF>REPEAT FOR EVER</EF><INST_NAME>irods_rule_engine_plugin-cpp_default_policy-instance</INST_NAME>",
"policy_to_invoke" : "irods_policy_execute_rule",
"parameters" : {
"policy_to_invoke" : "irods_policy_query_processor",
"parameters" : {
"query_string" : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE RESC_NAME like 'tier_%'",
"query_limit" : 0,
"query_type" : "general",
"number_of_threads" : 1,
"policies_to_invoke" : [
{
"policy_to_invoke" : "irods_policy_verify_checksum",
"configuration" : {
"log_errors" : "true"
}
}
]
}
}
}
}
INPUT null
OUTPUT ruleExecOut
The filesystem_usage policy engine measures the filesystem usage of a storage resource. The resource to inspect is given by the source_resource parameter.
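The exact computation the plugin performs is not described here. Purely as an illustration of the kind of measurement involved, assuming a unixfilesystem resource whose vault is a local directory, Python's shutil.disk_usage reports how much of the underlying filesystem is in use; percent_used is a hypothetical name.

```python
import shutil


def percent_used(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


# In practice `path` would be the resource's vault path; "/" keeps the sketch runnable.
print("%.1f%% used" % percent_used("/"))
```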
{
"instance_name": "irods_rule_engine_plugin-policy_engine-filesystem_usage-instance",
"plugin_name": "irods_rule_engine_plugin-policy_engine-filesystem_usage",
"plugin_specific_configuration": {
"log_errors" : "true"
}
}
An implementation of a periodic rule to invoke the policy:
{
"policy_to_invoke" : "irods_policy_enqueue_rule",
"parameters" : {
"comment" : "Set the PLUSET value to the interval desired to run the rule",
"delay_conditions" : "<PLUSET>10s</PLUSET><EF>REPEAT FOR EVER</EF><INST_NAME>irods_rule_engine_plugin-cpp_default_policy-instance</INST_NAME>",
"policy_to_invoke" : "irods_policy_execute_rule",
"parameters" : {
"policy_to_invoke" : "irods_policy_filesystem_usage",
"parameters" : {
"source_resource" : "demoResc"
}
}
}
}
INPUT null
OUTPUT ruleExecOut