When masking is enabled, every column is masked by default unless it is explicitly exempted.
The columns to unmask are listed in a configuration file. Masking must first be enabled in the redshiftbatcher configuration:
mask: true
maskSalt: sample-salt
maskFile: "/usr/inventory.yaml"
Mask all the columns in all the tables in the inventory database except the column id in the customers table.
/usr/inventory.yaml
non_pii_keys:
  customers:
    - id
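The default-mask rule can be sketched in Python as follows. This is only an illustration: the `mask_value` and `mask_row` helpers are hypothetical, and the salted SHA-256 digest is an assumption standing in for whatever masking function redshiftbatcher actually applies.

```python
import hashlib

# Mirrors the non_pii_keys section of the mask file.
NON_PII_KEYS = {"customers": ["id"]}


def mask_value(value, salt):
    # Assumption: a salted SHA-256 hex digest stands in for the real masking function.
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()


def mask_row(table, row, salt, non_pii_keys=NON_PII_KEYS):
    """Mask every column except those listed under non_pii_keys for the table."""
    unmasked = set(non_pii_keys.get(table, []))
    return {
        column: value if column in unmasked else mask_value(value, salt)
        for column, value in row.items()
    }
```

With this sketch, `mask_row("customers", {"id": 7, "email": "a@example.com"}, "sample-salt")` leaves `id` untouched and replaces `email` with a digest. A table with no entry in `non_pii_keys` has all of its columns masked.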
Conditional NonPiiKeys unmask a column when its value matches any of the patterns in the pattern list.
conditional_non_pii_keys:
  customers:
    email:
      - '%example.com'
      - '%exampledev.com'
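A minimal Python sketch of the pattern check, assuming `%` behaves like the SQL LIKE wildcard (any run of characters) and that matching is case sensitive. Neither assumption is confirmed here, and the helper names are hypothetical.

```python
import re

# Mirrors the conditional_non_pii_keys section of the mask file.
CONDITIONAL_NON_PII_KEYS = {
    "customers": {"email": ["%example.com", "%exampledev.com"]},
}


def like_to_regex(pattern):
    # Assumption: '%' matches any run of characters, as in SQL LIKE.
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(parts) + "$", re.DOTALL)


def is_conditionally_unmasked(table, column, value, config=CONDITIONAL_NON_PII_KEYS):
    """Return True if value matches any pattern configured for table.column."""
    patterns = config.get(table, {}).get(column, [])
    return any(like_to_regex(p).match(value) for p in patterns)
```

Under these assumptions, `alice@example.com` would stay unmasked while `alice@gmail.com` would still be masked.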
Dependent NonPiiKeys unmask a column based on the values of other columns.
dependent_non_pii_keys:
  customers:
    # dependentColumnName
    first_name:
      # providerColumn
      last_name:
        - 'Jones'
        - 'Dhoni'
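The dependency can be sketched in Python as below. The helper name is hypothetical. The sketch assumes exact, case-sensitive equality against the listed values, and that one matching provider column is enough to unmask.

```python
# Mirrors the dependent_non_pii_keys section of the mask file.
DEPENDENT_NON_PII_KEYS = {
    "customers": {"first_name": {"last_name": ["Jones", "Dhoni"]}},
}


def is_dependently_unmasked(table, column, row, config=DEPENDENT_NON_PII_KEYS):
    """Return True if any provider column of `column` holds one of its listed values."""
    providers = config.get(table, {}).get(column, {})
    return any(row.get(provider) in values for provider, values in providers.items())
```

Here `first_name` would be unmasked for a row whose `last_name` is `Dhoni`, and masked for a row whose `last_name` is `Smith`.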
Length keys create an extra column containing the length of the original column's value. For example, email_length is created containing the length of the data in the email column.
length_keys:
  customers:
    - email
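As a rough illustration, the extra column could be derived as in the Python sketch below. The helper name is hypothetical. Counting characters rather than bytes, and skipping missing or null values, are assumptions.

```python
# Mirrors the length_keys section of the mask file.
LENGTH_KEYS = {"customers": ["email"]}


def add_length_columns(table, row, config=LENGTH_KEYS):
    """Return a copy of row with a <column>_length entry for each configured column."""
    out = dict(row)
    for column in config.get(table, []):
        # Assumption: missing or null values get no length column.
        if row.get(column) is not None:
            out[column + "_length"] = len(str(row[column]))
    return out
```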
If mobile keys are specified, the first 4 digits (5 characters including the leading +) of E.164-formatted mobile numbers are copied into an additional column.
E.g., if mobile_number is +919812345678, then +9198 is stored in mobile_number_init5.
mobile_keys:
  customers:
    - mobile_number
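The prefix extraction can be sketched in Python as follows. The helper name is hypothetical, and ignoring values that do not start with `+` is an assumption about how non-E.164 input is handled.

```python
# Mirrors the mobile_keys section of the mask file.
MOBILE_KEYS = {"customers": ["mobile_number"]}


def add_mobile_prefix_columns(table, row, config=MOBILE_KEYS):
    """Return a copy of row with a <column>_init5 entry for each configured column."""
    out = dict(row)
    for column in config.get(table, []):
        value = row.get(column)
        # Assumption: values that are not E.164 strings (no leading '+') are skipped.
        if isinstance(value, str) and value.startswith("+"):
            # Five characters: the '+' plus the first 4 digits.
            out[column + "_init5"] = value[:5]
    return out
```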
Mapping PII Keys, if specified, add new columns containing the masked values. When this key is specified, it overrides all the other keys and unmasks all the other columns.
E.g., id is kept as it is (unmasked) and hashed_id is added with the masked values.
mapping_pii_keys:
  establishments:
    - id
Specify one or more columns in a table as Redshift Sort Key.
sort_keys:
  customers:
    - created_at
Specify one or more columns in a table as Redshift Dist (distribution) Key.
dist_keys:
  customers:
    - account_id
Include tables restrict which tables are allowed to be sinked. The operator further narrows the tables matched by kafkaTopicRegex using include_tables. This feature is supported only if you are using the RedshiftSink operator.
For example, if kafkaTopicRegex: ts.inventory.* matches 10 tables, then the include_tables below narrows them to two tables.
include_tables:
- customers
- orders
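The narrowing step can be illustrated with a short Python sketch. The `select_topics` helper is hypothetical. It assumes the table name is the last dot-separated part of the topic name, which may not match how the operator actually derives table names.

```python
import re


def select_topics(topics, kafka_topic_regex, include_tables):
    """Keep topics matching the regex whose table name is in include_tables."""
    pattern = re.compile(kafka_topic_regex)
    allowed = set(include_tables)
    selected = []
    for topic in topics:
        # Assumption: the table name is the last dot-separated part of the topic.
        table = topic.rsplit(".", 1)[-1]
        if pattern.fullmatch(topic) and table in allowed:
            selected.append(topic)
    return selected
```

For instance, given the topics `ts.inventory.customers`, `ts.inventory.orders` and `ts.inventory.products`, the regex `ts.inventory.*` matches all three, and `include_tables` of `customers` and `orders` keeps only the first two.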
Regex pattern boolean keys keep free-text columns masked while adding boolean columns that describe the kind of value in the free-text column.
For example, a boolean column favourite_quote_has_philosphy is added: if the value in the favourite_quote column matches the regex 'life|time', the value in favourite_quote_has_philosphy is true, otherwise false.
The regex match is case insensitive.
regex_pattern_boolean_keys:
  customers:
    favourite_quote:
      has_philosphy: 'life|time'
      has_text_funny: 'funny'
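A minimal Python sketch of how the boolean columns could be derived. The helper name is hypothetical, and matching anywhere in the value (rather than requiring the whole value to match) is an assumption. Masking of the original free-text column is not shown.

```python
import re

# Mirrors the regex_pattern_boolean_keys section of the mask file.
REGEX_PATTERN_BOOLEAN_KEYS = {
    "customers": {
        "favourite_quote": {
            "has_philosphy": "life|time",
            "has_text_funny": "funny",
        },
    },
}


def add_regex_boolean_columns(table, row, config=REGEX_PATTERN_BOOLEAN_KEYS):
    """Return a copy of row with a <column>_<suffix> boolean per configured regex."""
    out = dict(row)
    for column, suffixes in config.get(table, {}).items():
        value = row.get(column)
        for suffix, pattern in suffixes.items():
            # Case-insensitive search; a null value never matches.
            matched = value is not None and (
                re.search(pattern, str(value), re.IGNORECASE) is not None
            )
            out[column + "_" + suffix] = matched
    return out
```

A row whose `favourite_quote` is `Life is short` would get `favourite_quote_has_philosphy` set to true and `favourite_quote_has_text_funny` set to false.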