Commit
1 parent
61a5405
commit 24a76a8
Showing
10 changed files
with
324 additions
and
7 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "gothic-trademark", | ||
"metadata": {}, | ||
"source": [ | ||
"# Keeping some PIIs from being anonymized\n", | ||
"\n", | ||
"This sample shows how to use Presidio's `keep` anonymizer to keep some of the identified PIIs in the output string." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "roman-allergy", | ||
"metadata": {}, | ||
"source": [ | ||
"### Set up imports" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"id": "extensive-greensboro", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from presidio_anonymizer import AnonymizerEngine\n", | ||
"from presidio_anonymizer.entities import RecognizerResult, OperatorConfig" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "metropolitan-atlantic", | ||
"metadata": {}, | ||
"source": [ | ||
"### Presidio Anonymizer: Keep person names\n", | ||
"\n", | ||
"This example input has 2 PIIs, a person name and a location. We configure the anonymizer to replace the location name with a placeholder, but keep the person name unmodified." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"id": "medium-ridge", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"engine = AnonymizerEngine()\n", | ||
"\n", | ||
"# Invoke the anonymize function with the text,\n", | ||
"# analyzer results (potentially coming from presidio-analyzer)\n", | ||
"# and 'keep' operator on <PERSON> PIIs\n", | ||
"anonymize_result = engine.anonymize(\n", | ||
" text=\"My name is James Bond, I live in London\",\n", | ||
" analyzer_results=[\n", | ||
" RecognizerResult(entity_type=\"PERSON\", start=11, end=21, score=0.8),\n", | ||
" RecognizerResult(entity_type=\"LOCATION\", start=33, end=39, score=0.8),\n", | ||
" ],\n", | ||
" operators={\n", | ||
" \"PERSON\": OperatorConfig(\"keep\"),\n", | ||
" \"DEFAULT\": OperatorConfig(\"replace\"),\n", | ||
" },\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "1d2cabaa-4aa6-49cf-875d-4bdf407215b4", | ||
"metadata": {}, | ||
"source": [ | ||
"### Result: Name unmodified, but tracked\n", | ||
"\n", | ||
"The person name is preserved in the result text, but remains tracked in the items list." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"id": "421c2914-9b75-4c33-a270-e410d91d036b", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"text: My name is James Bond, I live in <LOCATION>\n", | ||
"items:\n", | ||
"[\n", | ||
" {'start': 33, 'end': 43, 'entity_type': 'LOCATION', 'text': '<LOCATION>', 'operator': 'replace'},\n", | ||
" {'start': 11, 'end': 21, 'entity_type': 'PERSON', 'text': 'James Bond', 'operator': 'keep'}\n", | ||
"]" | ||
] | ||
}, | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"anonymize_result" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.10" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
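The notebook output above shows that a kept entity still appears in the items list, and that the replaced entity's end offset moves from 39 to 43 because `<LOCATION>` is longer than `London`. The following is a minimal sketch of that bookkeeping in plain Python. It does not use or reproduce Presidio's implementation: `Span`, `OPERATORS`, and `apply_operators` are hypothetical names introduced only for illustration, spans are assumed not to overlap, and items are returned left to right rather than in the order the notebook output shows.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Span:
    """Hypothetical stand-in for an analyzer result (not Presidio's class)."""
    entity_type: str
    start: int
    end: int


# Hypothetical operator table: each function maps the matched text to new text.
OPERATORS: Dict[str, Callable[[str, str], str]] = {
    "keep": lambda text, entity_type: text,            # no-op, text unchanged
    "replace": lambda text, entity_type: f"<{entity_type}>",
}


def apply_operators(
    text: str, spans: List[Span], operators: Dict[str, str]
) -> Tuple[str, List[dict]]:
    """Apply one operator per span and report offsets in the output text.

    Assumes spans do not overlap and that `operators` has a "DEFAULT" key.
    """
    pieces: List[str] = []
    items: List[dict] = []
    cursor = 0    # position in the input text
    out_pos = 0   # length of the output built so far
    for span in sorted(spans, key=lambda s: s.start):
        name = operators.get(span.entity_type, operators["DEFAULT"])
        untouched = text[cursor:span.start]
        new_text = OPERATORS[name](text[span.start:span.end], span.entity_type)
        pieces.extend([untouched, new_text])
        out_pos += len(untouched)
        items.append({
            "start": out_pos,
            "end": out_pos + len(new_text),
            "entity_type": span.entity_type,
            "text": new_text,
            "operator": name,
        })
        out_pos += len(new_text)
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces), items


new_text, items = apply_operators(
    "My name is James Bond, I live in London",
    [Span("PERSON", 11, 21), Span("LOCATION", 33, 39)],
    {"PERSON": "keep", "DEFAULT": "replace"},
)
print(new_text)
for item in items:
    print(item)
```

Because `keep` returns its input unchanged, the PERSON span keeps its original offsets while still being recorded, which is the behavior the notebook demonstrates.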
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
"""Keeps the PII text unmodified.""" | ||
from typing import Dict | ||
|
||
from presidio_anonymizer.operators import Operator, OperatorType | ||
|
||
|
||
class Keep(Operator): | ||
"""No-op anonymizer that keeps the PII text unmodified. | ||
This is useful when you don't want to anonymize some types of PII, | ||
but want to keep track of them alongside the other PIIs. | ||
""" | ||
|
||
def operate(self, text: str = None, params: Dict = None) -> str: | ||
""":return: original text.""" | ||
return text | ||
|
||
def validate(self, params: Dict = None) -> None: | ||
"""Keep does not require any parameters so no validation is needed.""" | ||
pass | ||
|
||
def operator_name(self) -> str: | ||
"""Return operator name.""" | ||
return "keep" | ||
|
||
def operator_type(self) -> OperatorType: | ||
"""Return operator type.""" | ||
return OperatorType.Anonymize |
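The class above follows a small contract: an operator transforms text, validates its parameters, and reports a name that configurations can refer to (here `"keep"`). As a rough sketch of how name-based dispatch over such a contract could work, the example below uses stand-in classes only. `OperatorSketch`, `KeepSketch`, `RedactSketch`, `REGISTRY`, and `run_operator` are hypothetical and are not Presidio's base class or lookup mechanism.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class OperatorSketch(ABC):
    """Hypothetical stand-in for an operator base class (not Presidio's)."""

    @abstractmethod
    def operate(self, text: str, params: Optional[Dict] = None) -> str:
        """Return the transformed text."""

    @abstractmethod
    def validate(self, params: Optional[Dict] = None) -> None:
        """Raise if the parameters are invalid."""

    @abstractmethod
    def operator_name(self) -> str:
        """Return the name used to select this operator."""


class KeepSketch(OperatorSketch):
    """No-op: returns the text unchanged and ignores any parameters."""

    def operate(self, text: str, params: Optional[Dict] = None) -> str:
        return text

    def validate(self, params: Optional[Dict] = None) -> None:
        return None

    def operator_name(self) -> str:
        return "keep"


class RedactSketch(OperatorSketch):
    """Contrast case: removes the text entirely."""

    def operate(self, text: str, params: Optional[Dict] = None) -> str:
        return ""

    def validate(self, params: Optional[Dict] = None) -> None:
        return None

    def operator_name(self) -> str:
        return "redact"


# Hypothetical registry keyed by operator name.
REGISTRY = {op.operator_name(): op for op in (KeepSketch(), RedactSketch())}


def run_operator(name: str, text: str, params: Optional[Dict] = None) -> str:
    """Look up an operator by name, validate its params, then apply it."""
    operator = REGISTRY[name]
    operator.validate(params)
    return operator.operate(text, params)


print(run_operator("keep", "bla", {"new_value": ""}))
```

The point of the no-op design is that a kept entity flows through the same path as every other operator, so downstream code does not need a special case for "do nothing".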
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
import pytest | ||
|
||
from presidio_anonymizer.operators import Keep | ||
|
||
|
||
@pytest.mark.parametrize( | ||
# fmt: off | ||
"params", | ||
[ | ||
{"new_value": ""}, | ||
{}, | ||
], | ||
# fmt: on | ||
) | ||
def test_when_given_valid_value_then_same_string_returned(params): | ||
text = Keep().operate("bla", params) | ||
assert text == "bla" | ||
|
||
|
||
def test_when_validate_anonymizer_then_correct_name(): | ||
assert Keep().operator_name() == "keep" |