<a href="https://colab.research.google.com/github/jeffvestal/pii_redaction/blob/main/assets/pii_redact_pipeline_setup.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Automating PII redaction pipeline setup
Code to install the components for the PII redaction pipeline. Details are in [this repo](https://github.com/jeffvestal/pii_redaction).

---
---
# Setup
---
---

### Install and import required packages
Elastic uses the [eland Python library](https://github.com/elastic/eland) to download models from the Hugging Face Hub and load them into Elasticsearch.

In [None]:
pip install eland elasticsearch transformers sentence_transformers torch==1.11


In [None]:
from pathlib import Path
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import TransformerModel
from elasticsearch import Elasticsearch
from elasticsearch.client import MlClient
from elasticsearch.exceptions import NotFoundError

import getpass
import requests
import json

### Configure Elastic Cloud Authentication and Connect
The recommended authentication approach is to use the [Elastic Cloud ID](https://www.elastic.co/guide/en/cloud/current/ec-cloud-id.html) and a [cluster-level API key](https://www.elastic.co/guide/en/kibana/current/api-keys.html).

You can use any method you wish to set the required credentials. This example uses `getpass` to prompt for them, which avoids storing credentials in GitHub.

In [None]:
es_cloud_id = getpass.getpass('Enter Elastic Cloud ID:  ')
es_api_id = getpass.getpass('Enter cluster API key ID:  ') 
es_api_key = getpass.getpass('Enter cluster API key:  ')
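
As one alternative to prompting every time, the credentials could be read from environment variables with a `getpass` fallback. This is a minimal sketch, not part of the original setup: the helper `read_credential` and the variable name `ES_CLOUD_ID` are hypothetical and chosen only for illustration.

```python
import getpass
import os


def read_credential(env_var: str, prompt: str) -> str:
    """Return the value of env_var if it is set and non-empty, else prompt for it."""
    value = os.environ.get(env_var)
    if value:
        return value
    # Fall back to an interactive prompt that does not echo the input
    return getpass.getpass(prompt)


# Hypothetical usage in the notebook (variable name is an assumption):
# es_cloud_id = read_credential("ES_CLOUD_ID", "Enter Elastic Cloud ID:  ")
```

Either way, the secret values never need to be written into the notebook itself.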

In [None]:
es = Elasticsearch(cloud_id=es_cloud_id, 
                   api_key=(es_api_id, es_api_key)
                   )
es.info() # should return cluster info

---
---
# Loading and Starting the model
---
---

### Loading the model into Elastic
The model we used in testing is [dslim/bert-base-NER](https://huggingface.co/dslim/bert-base-NER).

Any [Elastic-compatible NER](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-model-ref.html#ml-nlp-model-ref-ner) model can be used.

The model is downloaded from Hugging Face and then loaded into Elasticsearch for use in the inference processor.

In [None]:
hf_model_id='dslim/bert-base-NER'
tm = TransformerModel(hf_model_id, "ner")

es_model_id = tm.elasticsearch_model_id()
print(es_model_id)  # a bare expression mid-cell is not displayed, so print it

try:
  # Check whether the model already exists in the cluster
  m = es.ml.get_trained_models(model_id=es_model_id)
  print('Model already loaded -- skipping import')

except NotFoundError:
  print('Model NOT currently loaded, loading into Elastic cluster...')

  # Save the model, config, and vocabulary to a local directory
  tmp_path = "models"
  Path(tmp_path).mkdir(parents=True, exist_ok=True)
  model_path, config, vocab_path = tm.save(tmp_path)

  # Upload the saved model files to Elasticsearch
  ptm = PyTorchModel(es, es_model_id)
  ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)

  # TODO: add confirmation that loading was successful
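
One way the confirmation step could look is sketched below. It inspects the body returned by the get trained models API and reports whether the model ID is present. The helper name `model_is_loaded` is hypothetical, and the handling of the `fully_defined` flag assumes it is only returned when `include="definition_status"` is requested, so a missing flag is treated as acceptable.

```python
def model_is_loaded(models_body: dict, model_id: str) -> bool:
    """Return True if model_id appears in a get-trained-models response body."""
    for cfg in models_body.get("trained_model_configs", []):
        if cfg.get("model_id") == model_id:
            # Assumption: fully_defined is optional; absent means "not reported"
            return bool(cfg.get("fully_defined", True))
    return False


# Hypothetical usage in the notebook (requires a live cluster):
# body = es.ml.get_trained_models(model_id=es_model_id,
#                                 include="definition_status").body
# print(model_is_loaded(body, es_model_id))
```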

### Starting the model for use

In [None]:
# Optional: to see the model information in Elastic before starting, uncomment and run
#m = es.ml.get_trained_models(model_id=es_model_id)
#m.body

In [None]:
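s = es.ml.start_trained_model_deployment(model_id=es_model_id)
print(s.body)  # a bare expression mid-cell is not displayed, so print it

# Check the routing state reported by the first node running the deployment
stats = es.ml.get_trained_models_stats(model_id=es_model_id)
stats.body['trained_model_stats'][0]['deployment_stats']['nodes'][0]['routing_state']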

---
---
# Loading Ingest Pipeline Config

---
---

Load the JSON pipeline definition and confirm it was loaded.

In [None]:
url = "https://raw.githubusercontent.com/jeffvestal/pii_redaction/main/configuration/ingest_pipeline.pii_redact.json"

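response = requests.get(url)
response.raise_for_status()  # stop here if the download failed
pipeline_definition = response.json()


pipeline_id = 'pii_redaction_pipeline'

# put_pipeline raises on HTTP errors, so check the acknowledgement explicitly
result = es.ingest.put_pipeline(id=pipeline_id, body=pipeline_definition)
if result.body.get('acknowledged'):
    print("Pipeline created successfully")
else:
    print("Failed to create pipeline")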


pipeline = es.ingest.get_pipeline(id=pipeline_id)
pipeline.body
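
Beyond confirming that the pipeline was stored, it can be exercised with the ingest simulate API before any real data is sent through it. The sketch below only builds the `docs` payload. The helper name `build_simulate_docs` and the field name `message` are assumptions for illustration; use whichever field the loaded pipeline definition actually reads.

```python
from typing import Dict, List


def build_simulate_docs(texts: List[str], field: str = "message") -> List[Dict]:
    """Wrap each text in the {'_source': {...}} shape the simulate API expects."""
    return [{"_source": {field: text}} for text in texts]


# Hypothetical usage in the notebook (requires a live cluster and started model):
# docs = build_simulate_docs(["My name is Jane Doe and I live in Paris."])
# result = es.ingest.simulate(id=pipeline_id, docs=docs)
# result.body
```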

---
---
# Post Install Configuration
---
---

After the model has been started and the ingest pipeline has been loaded, follow the steps in the [configuration section of the README](https://).