Skip to content

sealink/sensitive_data_filter

Repository files navigation

SensitiveDataFilter

Gem Version Build Status Coverage Status

A Rack Middleware filter for sensitive data

Installation

Add this line to your application's Gemfile:

gem 'sensitive_data_filter'

And then execute:

$ bundle

Or install it yourself as:

$ gem install sensitive_data_filter

Usage

Enable the middleware

Insert the middleware in the stack before any parameter parsing is performed.

E.g. for Rails, add the following in application.rb

# --- Sensitive Data Filtering ---
config.middleware.insert_before 'ActionDispatch::ParamsParser', SensitiveDataFilter::Middleware::Filter

To ensure that no sensitive data is accessed at any level of the stack, insert the middleware at the top of the stack.

E.g.

# --- Sensitive Data Filtering ---
config.middleware.insert_before 0, SensitiveDataFilter::Middleware::Filter

Important note for Rails

Rails logs the URI of the request in Rails::Rack::Logger. At this point of the stack, Rails generally has not yet set the session in the env. If you insert the sensitive data filtering middleware before this middleware you will prevent sensitive data from appearing in the logs, but you will not have access to the session via the occurrence or the env in the occurrence handling block.

Configuration

SensitiveDataFilter.config do |config|
  config.enable_types :credit_card # Already defaults to :credit_card if not specified
  config.on_occurrence do |occurrence|
    # Report occurrence
  end
  config.whitelist pattern1, pattern2 # Allows specifying patterns to whitelist matches
  config.whitelist_key key_pattern1, key_pattern2 # Allows specifying patterns to whitelist hash values based on their keys
  config.register_parser('yaml', -> params { YAML.load params }, -> params { YAML.dump params })
end

An occurrence object has the following properties:

  • origin_ip: the IP address that originated the request
  • request_method: the HTTP method for the request (GET, POST, etc.)
  • url: the URL of the request
  • content_type: the Content-Type of the request
  • original_query_params: the query parameters sent with the request
  • original_body_params: the body parameters sent with the request
  • filtered_query_params: the query parameters sent with the request, with sensitive data filtered
  • filtered_body_params: the body parameters sent with the request, with sensitive data filtered
  • session: the session properties for the request
  • matches: the matched sensitive data
  • matches_count: the number of matches per data type, e.g. { 'CreditCard' => 1 }
  • original_env: the original unfiltered Rack env
  • changeset: the modified rack env variables

It also exposes to_h and to_s methods for hash and string representation respectively.
Please note that these representations omit sensitive data, i.e. original_query_params, original_body_params and matches are not included.

Important Notes

Body parameters will not be parsed if a parser for the request's content type is not defined.

You might want to filter sensitive parameters (e.g: passwords). In Rails you can do something like:

filters = Rails.application.config.filter_parameters
filter  = ActionDispatch::Http::ParameterFilter.new filters
filtered_query_params = filter.filter @occurrence.filtered_query_params
filtered_body_params = if @occurrence.filtered_body_params.is_a? Hash
                         filter.filter @occurrence.filtered_body_params
                       else
                         @occurrence.filtered_body_params
                       end

Whitelisting

A list of whitelisting patterns can be passed to config.whitelist. Any sensitive data match which also matches any of these patterns will be ignored.

A list of whitelisting patterns can be passed to config.whitelist_key. When scanning and matching hashes, any value whose key matches any of these patterns will be ignored.

Parameter Parsing

Parsers for parameters encoded for a specific content type can be defined. The arguments for config.register_parser are:

  • a pattern to match the content type
  • a parser for the parameters
  • an unparser to convert parameters back to the encoded format

The parser and unparser must be objects that respond to call and accept the parameters as an argument (e.g. procs or lambdas).
The parser should handle parsing exceptions gracefully by returning the arguments. This ensures that sensitive data scanning and masking is applied on the raw parameters.

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Release

To publish a new version of this gem the following steps must be taken.

  • Update the version in the following files
      CHANGELOG.md
      lib/sensitive_data_filter/version.rb
    
  • Create a tag using the format v0.1.0
  • Follow build progress in GitHub actions

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/sealink/sensitive_data_filter. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.