This guide shows how to set up an ingest pipeline that redacts specific information at ingest time, keeping PII from being stored in an Elasticsearch index.
- NER model
  - An NER model is used to identify information (entities) that does not have a standard pattern or structure. The most common entities identified by these models are people, organizations, and locations.
- Grok patterns (similar to regex)
  - A list of Grok patterns can be configured to identify data that has a standard pattern (SSNs, credit card numbers, etc.)
- Elastic Platinum or Enterprise license
  - This is required to run the NER model
- Machine Learning node(s)
You can run this Jupyter Python notebook to load the Hugging Face model and create the ingest pipeline.
Alternatively, see the steps below for an overview of the installation.
A compatible NER model can be loaded from the Hugging Face model hub using eland.
- The model we used in testing is dslim/bert-base-NER.
- Any Elastic-compatible NER model can be used.
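As an example, the model can be imported with eland's `eland_import_hub_model` command (the URL and credentials below are placeholders for your own cluster):

```
eland_import_hub_model \
  --url https://localhost:9200 \
  --hub-model-id dslim/bert-base-NER \
  --task-type ner
```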
An example ingest pipeline configuration is provided here:
```
PUT _ingest/pipeline/pii_script-redact
{
  ... ingest pipeline json from example
}
```
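As a rough sketch of what such a pipeline body can contain (the field names and patterns below are illustrative, not the exact example config), an inference processor runs the NER model and a redact processor masks Grok matches:

```
PUT _ingest/pipeline/pii_script-redact
{
  "processors": [
    {
      "inference": {
        "model_id": "dslim__bert-base-ner",
        "field_map": { "message": "text_field" }
      }
    },
    {
      "redact": {
        "field": "message",
        "patterns": ["%{EMAILADDRESS:EMAIL}", "%{SSN:SSN}"],
        "pattern_definitions": { "SSN": "\\d{3}-\\d{2}-\\d{4}" }
      }
    }
  ]
}
```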
- Inference Processor
  - Set `model_id` to the ID under which the model is stored in Elastic. You can find it in Kibana -> Machine Learning -> Trained Models, listed in the `id` column, or use the GET Trained Models API.
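If you prefer the API over Kibana, the stored model IDs can be listed with:

```
GET _ml/trained_models
```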
- Redact Processor
  - Add new Grok patterns to match the patterns in your data.
  - Create one Grok pattern per value you want to match and give it a name. This name will be used as the mask.
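To illustrate how a named pattern becomes the mask label, here is a minimal Python sketch of the redact behavior (the `PATTERNS` table and `redact` helper are hypothetical stand-ins, not part of Elasticsearch, which does this server-side with Grok):

```python
import re

# Hypothetical patterns mirroring Grok definitions in the redact processor.
# Each name becomes the mask label: matches are replaced with <NAME>.
PATTERNS = {
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "CREDIT_CARD": r"\b(?:\d{4}[ -]?){3}\d{4}\b",
}

def redact(text: str) -> str:
    """Replace each pattern match with its pattern name in angle brackets."""
    for name, pattern in PATTERNS.items():
        text = re.sub(pattern, f"<{name}>", text)
    return text

print(redact("SSN 123-45-6789 card 4111 1111 1111 1111"))
# → SSN <SSN> card <CREDIT_CARD>
```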
- Configure data to use the pipeline through one of these approaches:
  - Configure the process sending data to Elastic to use the ingest pipeline as part of the indexing request
  - Configure the default pipeline in the index settings
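For the second approach, the default pipeline can be set on the index, for example (the index name is a placeholder):

```
PUT my-index/_settings
{
  "index.default_pipeline": "pii_script-redact"
}
```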
- Start the NER model
- This will deploy the model to ML nodes and make it available for the inference processor
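The deployment can be started from Kibana or via the API, for example (the model ID assumes the dslim/bert-base-NER import above):

```
POST _ml/trained_models/dslim__bert-base-ner/deployment/_start
```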
- Ingest Data
- Data configured to use the ingest pipeline will now be processed