jeffvestal/pii_redaction


Elastic PII Redaction on Ingest Implementation Guide

What is this

This guide shows you how to set up an ingest pipeline that redacts specific information at ingest time, keeping PII from being stored in an Elasticsearch index.

Short Self-Guided Demo

ela.st/pii-redaction-demo

Main Components

  • NER model
    • An NER model is used to identify information (entities) that does not follow a standard pattern or structure. The most common entities identified by these models are people, organizations, and locations.
  • Grok patterns (similar to regex)
    • A list of grok patterns can be configured to identify data that does follow a standard pattern (SSNs, credit card numbers, etc.)
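Conceptually, a named grok pattern behaves like a named regular expression whose matches are replaced with the pattern's name so the raw value is never stored. A minimal Python sketch of that masking idea (the `US_SSN` pattern and name below are illustrative, not taken from the repository's config):

```python
import re

# Illustrative pattern for a US SSN; real grok patterns are defined
# in the redact processor's configuration, not in application code.
SSN = r"\b\d{3}-\d{2}-\d{4}\b"

def redact(text: str, pattern: str, name: str) -> str:
    """Replace every match of `pattern` in `text` with the pattern's name."""
    return re.sub(pattern, f"<{name}>", text)

print(redact("Caller SSN is 123-45-6789.", SSN, "US_SSN"))
# -> Caller SSN is <US_SSN>.
```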

Requirements

  • Elastic Platinum or Enterprise license
    • This is required to run the NER model
  • Machine Learning node(s)

Installation

Option 1 - Python Script

You can run this Jupyter Python notebook to load the Hugging Face model and the ingest pipeline.

Alternatively, see the steps below for an overview of a manual installation.

Option 2 - Self Install

1. Load NER Model

A compatible NER model can be loaded from the Hugging Face model hub using eland.
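As a sketch, loading a model with eland's import CLI might look like the following (the host is a placeholder, and dslim/bert-base-NER is just one example of a compatible NER model from the hub):

```shell
eland_import_hub_model \
  --url https://<elastic-host>:9200 \
  --hub-model-id dslim/bert-base-NER \
  --task-type ner \
  --start
```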

2. Load Ingest Pipeline

An example ingest pipeline config is provided in this repository.

PUT _ingest/pipeline/pii_script-redact
{
... ingest pipeline json from example
}
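As a rough illustration only (the model ID, field names, and pattern below are assumptions, not the repository's actual config), a pipeline of this kind typically chains an inference processor, which runs the NER model, with a redact processor, which masks grok-pattern matches:

```json
{
  "processors": [
    {
      "inference": {
        "model_id": "dslim__bert-base-ner",
        "field_map": { "message": "text_field" }
      }
    },
    {
      "redact": {
        "field": "message",
        "patterns": ["%{US_SSN:US_SSN}"],
        "pattern_definitions": { "US_SSN": "\\d{3}-\\d{2}-\\d{4}" }
      }
    }
  ]
}
```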

Configuration

  1. Inference Processor
    1. Set model_id to the ID the model is stored under in Elastic
      1. Kibana -> Machine Learning -> Trained Models -> listed in the ID column
      2. Alternatively, use the GET Trained Models API
  2. Redact Processor
    1. Add new grok patterns to match the patterns in your data
      1. Create one grok pattern per value you want to match and give it a name. This name will be used as the mask value.
  3. Configure data to use the pipeline through one of these approaches
    1. Configure the process sending data to Elastic to include the ingest pipeline as part of the indexing request
    2. Configure the default pipeline in the index settings
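The default-pipeline approach can be set with an index settings update; for example (my-index is a placeholder, and pii_script-redact is the pipeline name used above):

```
PUT my-index/_settings
{
  "index.default_pipeline": "pii_script-redact"
}
```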

Starting the Pipeline

  1. Start the NER model
    1. This will deploy the model to ML nodes and make it available for the inference processor
  2. Ingest Data
    1. Data configured to use the ingest pipeline will now be processed
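These two steps map onto standard Elastic APIs; as a sketch (the model ID and index name are placeholders):

```
POST _ml/trained_models/<model_id>/deployment/_start

POST my-index/_doc?pipeline=pii_script-redact
{ "message": "Caller SSN is 123-45-6789" }
```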
