Configuration

povimd9 edited this page May 14, 2023 · 24 revisions

FileChampion4j configurations are defined in a JSON object that is passed to the FileChampion4j class at initialization. This approach allows you to choose any design pattern that fits your use case, such as loading the configuration from a file, environment, remote location, or other source.
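As an illustration of the "any source" idea, the sketch below resolves the configuration JSON text from an inline value (for example, an environment variable) and falls back to a file. The class name `ConfigSource`, the environment variable name, and the file name are hypothetical and not part of FileChampion4j; handing the resulting text to the library is left out because the constructor signature is not described on this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ConfigSource {
    // Hypothetical environment variable name, chosen for this sketch only.
    static final String ENV_VAR = "FILECHAMPION_CONFIG_JSON";

    /** Returns the inline JSON text if present, otherwise the content of the fallback file. */
    static String resolve(String inlineJson, Path fallbackFile) {
        if (inlineJson != null && !inlineJson.trim().isEmpty()) {
            return inlineJson;
        }
        try {
            return new String(Files.readAllBytes(fallbackFile), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read configuration file " + fallbackFile, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a minimal configuration skeleton to a temporary file for the demo.
        Path file = Files.createTempFile("filechampion-config", ".json");
        String skeleton = "{\"General\": {}, \"Validations\": {}, \"Plugins\": {}}";
        Files.write(file, skeleton.getBytes(StandardCharsets.UTF_8));

        // The environment variable wins when set; otherwise the file is used.
        String json = resolve(System.getenv(ENV_VAR), file);
        System.out.println(json);
        Files.delete(file);
    }
}
```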

The configurations include 3 sections:

  • General - configurations impacting multiple objects
  • Validations - configurations of validation categories and related definitions
  • Plugins - configurations of plugins that can be used in validations

JSON Structure

FileChampion configurations are defined in a JSON object with the following structure:

{
  "General": {
    "--- Options impacting multiple objects ---"
  },
  "Validations": {
    "File Category 1": {
      "File Extension 1": {
        "--- Validation methods for file extension 1 ---"
      },
      "File Extension 2": {
        "--- Validation methods for file extension 2 ---"
      }
    },
    "File Category 2": {
      "File Extension 1": {
        "--- Validation methods for file extension 1 ---"
      },
      "File Extension 2": {
        "--- Validation methods for file extension 2 ---"
      }
    }
  },
  "Plugins": {
    "Plugin 1": {
      "step1.step": {
        "--- Step 1 of plugin 1 ---"
      },
      "step2.step": {
        "--- Step 2 of plugin 1 ---"
      }
    },
    "Plugin 2": {
      "some_step.step": {...}
    }
  }
}

General Options

Configurations defined in this section impact all validations/plugins.

Checksums

A file checksum is a hash-based signature that, for practical purposes, uniquely identifies the file content.

Checksum values can be used for multiple purposes, including:

  • Verifying the integrity of file content, ensuring the file has not changed since the signature was generated
  • Tracking files by unique identifiers
  • Querying internal/external services with the signature for identification (such as ensuring 'single time' upload of a file, or checking malicious-file databases for known threats)

FileChampion calculates a SHA-256 checksum for validated files by default. This is in accordance with current industry practices, balancing between performance and uniqueness of calculated hashes.

While this may be sufficient for most use cases, some instances might require use of algorithms other than SHA-256. Such needs are supported by defining the required hash algorithms in the "General" section of the configurations.

Supported Algorithms:

  • MD5 - Considered broken, use with caution
  • SHA-1 - Considered weak, use with caution
  • SHA-256
  • SHA-512

Note that choosing the appropriate algorithm is a tradeoff between uniqueness and performance: stronger algorithms can have a direct impact on hashing performance.

One or more algorithms may be defined. 'getFileChecksums()' returns a HashMap of "Algorithm": "Checksum" entries, one for each defined algorithm. Note that defining multiple algorithms increases the processing time of a file, since each hash has to be calculated.

Example JSON with all supported algorithms:

"General": {
  "checksums": ["MD5", "SHA-1", "SHA-256", "SHA-512"]
}

With this configuration, getFileChecksums() returns 4 key:value entries, one algorithm:checksum pair for each defined algorithm.
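To make the shape of the result concrete, here is a minimal sketch of computing one checksum per configured algorithm with the standard `java.security.MessageDigest` API. It is not the library's implementation; the class and method names (`ChecksumSketch`, `checksums`) are hypothetical, and lowercase hex output is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ChecksumSketch {
    /** Returns an "Algorithm" -> "Checksum" map (lowercase hex) for the given content. */
    static Map<String, String> checksums(byte[] content, List<String> algorithms) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String algorithm : algorithms) {
            try {
                byte[] digest = MessageDigest.getInstance(algorithm).digest(content);
                StringBuilder hex = new StringBuilder();
                for (byte b : digest) {
                    hex.append(String.format("%02x", b));
                }
                result.put(algorithm, hex.toString());
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        // Same algorithm names as in the "checksums" array of the "General" section.
        List<String> configured = Arrays.asList("MD5", "SHA-1", "SHA-256", "SHA-512");
        checksums(content, configured)
                .forEach((algorithm, checksum) -> System.out.println(algorithm + ": " + checksum));
    }
}
```

Each additional algorithm adds one more pass over the content, which is the processing-time cost mentioned above.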

Defining Validations

The "Validations" section of the configurations includes all validation-related definitions, and has the following structure:

"Validations": {
  "File Category 1": {
    "File Extension 1": {
      "--- Validation methods for file extension 1 ---"
    },
    "File Extension 2": {
      "--- Validation methods for file extension 2 ---"
    }
  },
  "File Category 2": {
    "File Extension 1": {
      "--- Validation methods for file extension 1 ---"
    },
    "File Extension 2": {
      "--- Validation methods for file extension 2 ---"
    }
  }
}

File types can be grouped by category (such as "Documents" or "Images"), or placed under a single category object such as "Any".

Each category section includes a validation definition for each file extension. The "key" is the file type extension, such as "pdf", "docx", or any other file extension that is expected to pass validation.

At least one validation must be defined for each extension section; otherwise an error is thrown at initialization.

Validations include size, MIME types, magic bytes, and header/footer signatures (see Validation Options for details).

For each extension section, the library also supports the following file processing tasks:

  • Setting file ownership and permissions
  • Base64 encoding of the file name (if an output directory was set as part of the doValidation() call)

In addition to built-in library functionality, FileChampion supports execution of custom tasks for every file type, as defined under the "Plugins" section (since version 0.9.8). (See the Plugins page for more details and the Roadmap for enhanced plugin definition timelines)

It is important to note that FileChampion does not include any out-of-the-box file type definitions. This is intended to support any potential file types, including custom file types.

To find definitions for existing/common file types, you can use the following resources or any equivalent source:

Validation Options

| Validation | Type | Key | Value | Example |
| --- | --- | --- | --- | --- |
| Check that file matches expected MIME type for extension | String | "mime_type" | MIME type associated with this file type | "mime_type": "application/pdf" |
| Check that file contains extension type magic bytes | String | "magic_bytes" | 'magic bytes' associated with this file type | "magic_bytes": "25504446" |
| Check that file contains extension header bytes signature | String | "header_signatures" | header bytes associated with this file type | "header_signatures": "25504446" |
| Check that file contains extension footer bytes signature | String | "footer_signatures" | footer bytes associated with this file type | "footer_signatures": "2525454f46" |
| Should file permissions be changed from default? | Boolean | "change_ownership" | true/false if file saving should perform ownership/permissions tasks on file | "change_ownership": true |
| Username that should be set for the file if change_ownership is true | String | "change_ownership_user" | system user/account name to set for file ownership | "change_ownership_user": "lowPrivAcnt" |
| Permissions to be set for the file if change_ownership is true | String | "change_ownership_mode" | file permissions to set in 'rwx' format | "change_ownership_mode": "r" |
| Should the file name be base64 encoded? | Boolean | "name_encoding" | true/false if file name should be base64 encoded | "name_encoding": true |
| Max size allowed for the file type | String | "max_size" | max file size in KB for validation to pass | "max_size": "126000" |
| Disable file checksum calculation | Boolean | "add_checksum" | optional disabling of checksum calculation (for performance improvement when not required) | "add_checksum": false |
| Return results on first failure | Boolean | "fail_fast" | return validation results upon first failure to improve performance | "fail_fast": true |
| Extensions to be executed as part of file extension validations | String Array | "extension_plugins" | list of plugins that should be executed as part of file type validations | "extension_plugins": ["PLUGIN_NAME.STEP_NAME", "handle_pdf_documents.step1", "handle_pdf_documents.step2"] |
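To show how the hex values in "magic_bytes", "header_signatures", "footer_signatures", and "max_size" relate to file content, here is a rough sketch of such checks. It is not the library's code: the names (`SignatureSketch`, `startsWith`, `endsWith`, `withinMaxSize`) are hypothetical, strict prefix/suffix matching is an assumption, and so is treating 1 KB as 1024 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SignatureSketch {
    /** Converts a hex string such as "25504446" into its bytes. */
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** True when the content begins with the given hex signature. */
    static boolean startsWith(byte[] content, String hexSignature) {
        byte[] sig = hexToBytes(hexSignature);
        return content.length >= sig.length
                && Arrays.equals(Arrays.copyOfRange(content, 0, sig.length), sig);
    }

    /** True when the content ends with the given hex signature. */
    static boolean endsWith(byte[] content, String hexSignature) {
        byte[] sig = hexToBytes(hexSignature);
        return content.length >= sig.length
                && Arrays.equals(
                        Arrays.copyOfRange(content, content.length - sig.length, content.length), sig);
    }

    /** True when the size does not exceed the configured KB limit (assumes 1 KB = 1024 bytes). */
    static boolean withinMaxSize(long sizeInBytes, String maxSizeKb) {
        return sizeInBytes <= Long.parseLong(maxSizeKb) * 1024L;
    }

    public static void main(String[] args) {
        // "25504446" is "%PDF" and "2525454f46" is "%%EOF" in ASCII.
        byte[] pdfLike = "%PDF-1.7 ...body... %%EOF".getBytes(StandardCharsets.US_ASCII);
        System.out.println("header ok: " + startsWith(pdfLike, "25504446"));
        System.out.println("footer ok: " + endsWith(pdfLike, "2525454f46"));
        System.out.println("size ok:   " + withinMaxSize(pdfLike.length, "126000"));
    }
}
```

Real files may carry trailing bytes (for example, a newline after a PDF's end marker), so a production check may need to be more tolerant than the strict suffix match used here.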

Defining Plugins

Introduction

The FileChampion plugin engine lets you define custom tasks that are carried out before or after the file validation controls are executed.

Currently, only the 'cli' plugin type is supported, which allows running any target process available on the system.

FileChampion4j v0.9.9 will support defining 'http' plugins, to perform http based requests as a plugin definition (GET/POST/PUT).

Plugins are defined in the "Plugins" section of the JSON (root), with every plugin defining relevant job steps. These sections can contain any number of plugins/steps.

Every step must include 'type', 'endpoint', 'timeout', 'on_timeout_or_fail', and 'response'. Respectively, these define the plugin type (cli/http), the target command as a string, the timeout in seconds after which the process is terminated, whether the step counts as 'fail' or 'pass' when execution fails or times out, and the response/output pattern expected from a successful step.

To execute a defined step, either '"run_before": true' or '"run_after": true' must be set for it. This determines whether the step runs before or after the file type validations.

Note!

  • 'run_before' steps will execute prior to any file type validations.
  • 'run_after' steps will only execute if file type validations are passed.
  • 'run_before' and 'run_after' should not be defined for the same step.
  • If neither 'run_before' nor 'run_after' is set, the step will not execute.
  • Plugins will only run for extensions that define the correct step in 'extension_plugins'.
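The interaction between 'timeout' and 'on_timeout_or_fail' for a 'cli' step can be sketched roughly as follows. This is an illustration only, not the library's implementation: `CliStepSketch` and `runStep` are hypothetical names, and treating a non-zero exit code or a command that cannot be started as a failure is an assumption.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

class CliStepSketch {
    /**
     * Runs a command with a timeout and returns true when the validation flow may continue.
     * A failed or timed-out step still returns true when onTimeoutOrFail is "pass".
     */
    static boolean runStep(List<String> command, long timeoutSeconds, String onTimeoutOrFail) {
        boolean succeeded;
        try {
            Process process = new ProcessBuilder(command).inheritIO().start();
            if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                // Assumption: exit code 0 means the step succeeded.
                succeeded = process.exitValue() == 0;
            } else {
                // Timeout reached: terminate the process, as described for 'timeout'.
                process.destroyForcibly();
                succeeded = false;
            }
        } catch (IOException e) {
            // The command could not be started at all.
            succeeded = false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            succeeded = false;
        }
        return succeeded || "pass".equals(onTimeoutOrFail);
    }

    public static void main(String[] args) {
        // A command that is expected not to exist, to exercise the failure path.
        List<String> missing = Arrays.asList("no-such-command-for-this-sketch");
        System.out.println("policy fail -> continue? " + runStep(missing, 5, "fail"));
        System.out.println("policy pass -> continue? " + runStep(missing, 5, "pass"));
    }
}
```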

Variables for injection/extraction of steps fields

FileChampion supports injection and extraction of values during plugin steps execution, supporting passing/receiving of relevant values to/from other services.

Injection variables can be defined in 'endpoint', 'header', and 'body' arguments, and include:

  • ${filePath} - Injects a temporary file path which contains the file to be processed
  • ${fileContent} - Injects the file content as a base64 value to the request
  • ${fileChecksum.md5} - Injects the file MD5 checksum
  • ${fileChecksum.sha1} - Injects the file SHA-1 checksum
  • ${fileChecksum.sha256} - Injects the file SHA-256 checksum
  • ${fileChecksum.sha512} - Injects the file SHA-512 checksum

Example usage of injections:

  • "curl -X POST -F "file=@${filePath}" https://avscanner:8080/scan"
  • "java -jar remove_pdf_dangerous_objects.jar ${filePath}"
  • "vt scan file ${fileContent}"
  • "vt analysis file ${fileContent}"
  • "vt file ${fileChecksum.sha1}"
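Conceptually, injection is a literal substitution of each `${...}` variable in the step field. The sketch below illustrates that idea under the assumption of plain string replacement; `InjectionSketch` and `inject` are hypothetical names, and the path and checksum values are sample data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InjectionSketch {
    /** Replaces every ${name} occurrence in the template with its value (literal replacement). */
    static String inject(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("filePath", "/tmp/upload_123.pdf");                               // sample value
        values.put("fileChecksum.sha1", "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d"); // sample value

        System.out.println(inject("java -jar remove_pdf_dangerous_objects.jar ${filePath}", values));
        System.out.println(inject("vt file ${fileChecksum.sha1}", values));
    }
}
```

Because the values end up inside a command line, a real implementation would also need to consider quoting and escaping, which this sketch does not attempt.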

Extraction variables can be defined in 'response' argument of a step, and include:

  • ${STEP_NAME.filePath} - Extract file path from position in response
  • ${STEP_NAME.fileContent} - Extract base64 encoded file content from position in response

Example usage of extractions from response:

  • "remove_pdf_dangerous_objects results: Success. File: ${STEP_NAME.filePath}. additional information: lorem ipsum" - extracts the new file path from the position of the variable in the response. The new file is then used for the remaining validations or for storage, depending on whether the step is defined as run_before or run_after.

Following any content extraction, checksum/s will be calculated again for the new content, allowing usage of the content/checksum/s in follow up steps.

Note! Variables can be combined and concatenated with other values to achieve required patterns.

Examples:

  • endpoint: "clean_pdf_file.sh ${filePath}_clean_file.pdf" - will execute "clean_pdf_file.sh /full_path/TARGET_PDF_FILE_clean_file.pdf"
  • response: "results: Success. File: ${STEP_NAME.filePath}." - will extract and read "/full_path/TARGET_PDF_FILE_clean_file.pdf" if the response is "results: Success. File: /full_path/TARGET_PDF_FILE_clean_file.pdf.", or fail if the response does not contain this pattern.
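One way to picture extraction is to turn the 'response' pattern into a regular expression in which the variable becomes a capture group. The sketch below does this with `java.util.regex`; it is an assumption about the mechanism rather than the library's code, `ExtractionSketch` and `extract` are hypothetical names, and the extracted value is assumed to contain no whitespace.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractionSketch {
    /**
     * Extracts the value found at the placeholder's position in the response pattern.
     * Returns empty when the response does not contain the pattern.
     */
    static Optional<String> extract(String responsePattern, String placeholder, String response) {
        int at = responsePattern.indexOf(placeholder);
        if (at < 0) {
            throw new IllegalArgumentException("Placeholder not found in pattern: " + placeholder);
        }
        String before = responsePattern.substring(0, at);
        String after = responsePattern.substring(at + placeholder.length());
        // Literal text is quoted; the placeholder becomes a group of non-whitespace characters.
        Pattern regex = Pattern.compile(Pattern.quote(before) + "(\\S+)" + Pattern.quote(after));
        Matcher matcher = regex.matcher(response);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String pattern = "results: Success. File: ${step1.filePath}.";
        String response = "results: Success. File: /full_path/TARGET_PDF_FILE_clean_file.pdf.";
        System.out.println(extract(pattern, "${step1.filePath}", response).orElse("<no match>"));
        System.out.println(extract(pattern, "${step1.filePath}", "results: Error").orElse("<no match>"));
    }
}
```

An empty result corresponds to the "fail if the response does not contain this pattern" case described above.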

Plugins Options

| Description | Required | Type | Key | Value | Example |
| --- | --- | --- | --- | --- | --- |
| The plugin type | No | String | "type" | cli / http | "type": "cli" |
| Whether plugin step should be executed before file validations | Yes | Boolean | "run_before" | true / false | "run_before": true |
| Whether plugin step should be executed after file validations | Yes | Boolean | "run_after" | true / false | "run_after": true |
| Step timeout in seconds after which step process will be terminated | No | int | "timeout" | seconds integer | "timeout": 320 |
| How the library should behave upon timeout or failure | No | String | "on_timeout_or_fail" | pass / fail | "on_timeout_or_fail": "fail" or "on_timeout_or_fail": "pass" |
| Step process cli command or http url | No | String | "endpoint" | valid command / URL | "endpoint": "remove_all_active_pdf_objects.sh ${filePath}" |
| Expected response pattern for successful step execution | No | String | "response" | pattern expected in return | "response": "Success: ${step1.filePath}" |

Defining Logging Level

FileChampion4j uses Java's default logging library (java.util.logging). By default, the logging level and properties are loaded from the default locations defined for the environment; they can also be set with a JVM argument:

-Djava.util.logging.config.file=src\test\resources\logging.properties

These properties can be overridden by using a dedicated logging library, or by running the application with the system property java.util.logging.config.file=resources/logging.properties.

Normal use should define 'INFO' for logging level, while 'FINE' log level can be used for detailed information, intended for troubleshooting purposes.
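As an alternative to a properties file, the level can also be set programmatically through the standard java.util.logging API. The sketch below is a generic example of that API; the logger name used in `main` is hypothetical, since the logger names used by FileChampion4j are not listed on this page.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggingSketch {
    /** Sets the level on the named logger and attaches a console handler at the same level. */
    static Logger configure(String loggerName, Level level) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(level);
        // Use a dedicated handler so FINE records are not filtered by the root handler's level.
        logger.setUseParentHandlers(false);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(level);
        logger.addHandler(handler);
        return logger;
    }

    public static void main(String[] args) {
        // Hypothetical logger name; replace it with the package you want to troubleshoot.
        Logger logger = configure("com.example.filevalidation", Level.FINE);
        logger.info("Normal operation message");
        logger.fine("Detailed troubleshooting message");
    }
}
```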

Example JSON

{
  "General": {
    "checksums": ["MD5", "SHA-1", "SHA-256", "SHA-512"]
  },
  "Validations": {
    "SmallDocuments": {
      "pdf": {
        "mime_type": "application/pdf",
        "magic_bytes": "25504446",
        "header_signatures": "25504446",
        "footer_signatures": "2525454f46",
        "change_ownership": true,
        "change_ownership_user": "User1",
        "change_ownership_mode": "r",
        "name_encoding": true,
        "max_size": "5000",
        "add_checksum": true,
        "extension_plugins": ["clean_pdf_documents1.step1", "clean_pdf_documents2.step1"]
      },
      "doc": {
        "mime_type": "application/msword",
        "magic_bytes": "D0CF11E0A1B11AE1",
        "header_signatures": "D0CF11E0A1B11AE1",
        "footer_signatures": "0000000000000000",
        "change_ownership": true,
        "change_ownership_user": "User1",
        "change_ownership_mode": "r",
        "name_encoding": true,
        "max_size": "5000"
      }
    },
    "LargeDocuments": {
      "pdf": {
        "mime_type": "application/pdf",
        "magic_bytes": "25504446",
        "header_signatures": "25504446",
        "footer_signatures": "2525454f46",
        "change_ownership": true,
        "change_ownership_user": "User1",
        "change_ownership_mode": "r",
        "name_encoding": true,
        "max_size": "150000",
        "add_checksum": false,
        "fail_fast": true,
        "extension_plugins": ["clean_pdf_documents1.step1", "clean_pdf_documents2.step1"]
      },
      "doc": {
        "mime_type": "application/msword",
        "magic_bytes": "D0CF11E0A1B11AE1",
        "header_signatures": "D0CF11E0A1B11AE1",
        "footer_signatures": "0000000000000000",
        "change_ownership": true,
        "change_ownership_user": "User1",
        "change_ownership_mode": "r",
        "name_encoding": true,
        "max_size": "150000",
        "add_checksum": false,
        "fail_fast": true
      }
    },
    "Images": {
      "jpg": {
        "mime_type": "image/jpeg",
        "magic_bytes": "FFD8",
        "header_signatures": "FFD8FF",
        "footer_signatures": "FFD9",
        "change_ownership": true,
        "change_ownership_user": "User1",
        "change_ownership_mode": "r",
        "name_encoding": true,
        "max_size": "4000"
      },
      "png": {
        "mime_type": "image/png",
        "magic_bytes": "89504E470D0A1A0A",
        "header_signatures": "89504E470D0A1A0A0000000D49484452",
        "footer_signatures": "49454E44AE426082",
        "change_ownership": true,
        "change_ownership_user": "User1",
        "change_ownership_mode": "r",
        "name_encoding": true,
        "max_size": "4000"
      }
    }
  },
  "Plugins": {
    "clean_pdf_documents1": {
      "step1.step": {
        "type": "cli",
        "run_after": true,
        "endpoint": "java -jar plugins\\java_echo.jar Success: ${filePath}",
        "timeout": 320,
        "on_timeout_or_fail": "fail",
        "response": "Success: ${step1.filePath}"
      }
    },
    "clean_pdf_documents2": {
      "step1.step": {
        "type": "cli",
        "run_after": true,
        "endpoint": "java -jar plugins\\java_echo.jar Success: MTIzNDU2IA0K",
        "timeout": 320,
        "on_timeout_or_fail": "fail",
        "response": "Success: ${step1.fileContent}"
      }
    }
  }
}