Deidentify people's names along with pronoun substitution
This is a command-line program that replaces a person's given name and/or surname, along with any gender-specific pronouns. A Windows GUI for this program is also available.
Input:
I think John Smith likes programming. You can tell he enjoys using Python.
Output:
I think PERSON likes programming. You can tell HE/SHE enjoys using Python.
- This program relies on Spacy for named-entity recognition and pronoun substitution.
- For best results, set up a Python virtual environment and install Spacy with the settings below. Spacy can also be installed with other configuration options.
git clone https://github.com/jftuga/deidentify.git
python -m venv deidentify
cd deidentify
(Windows) - scripts\activate
(Linux/MacOS) - source bin/activate
python -m pip install --upgrade pip
pip install setuptools wheel
pip install spacy
python -m spacy download en_core_web_trf
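Once the steps above finish, a quick way to confirm that Spacy and the en_core_web_trf model are visible to the active interpreter is to look them up without importing them. This is a small helper sketch, not part of the project. The function name check_install is hypothetical, and it relies on the fact that Spacy models are installed as ordinary Python packages.

```python
import importlib.util


def check_install(packages=("spacy", "en_core_web_trf")) -> dict[str, bool]:
    """Report whether each top-level package can be found by the current interpreter."""
    # find_spec returns None for a missing top-level module instead of raising.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_install().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it with the virtual environment activated. A MISSING line usually means the packages were installed into a different interpreter.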
usage: deidentify.py [-h] -r REPLACEMENT [-o OUTPUT_FILE] [-H] input_file
positional arguments:
input_file text file to deidentify
optional arguments:
-h, --help show this help message and exit
-r REPLACEMENT, --replacement REPLACEMENT
a word/phrase to replace identified names with
-o OUTPUT_FILE, --output_file OUTPUT_FILE
output file
-H, --html output in HTML format
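The usage text maps directly onto a standard argparse definition. The parser below is a hypothetical reconstruction from the help output, not the project's source. The flag names, metavars, and help strings are copied from the usage text above, and argparse adds -h/--help automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser reconstructed from the documented usage text."""
    parser = argparse.ArgumentParser(prog="deidentify.py")
    parser.add_argument("input_file", help="text file to deidentify")
    parser.add_argument("-r", "--replacement", required=True,
                        help="a word/phrase to replace identified names with")
    parser.add_argument("-o", "--output_file", help="output file")
    parser.add_argument("-H", "--html", action="store_true",
                        help="output in HTML format")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-r", "PERSON", "-o", "output.txt", "input.txt"])
    print(args.input_file, args.replacement, args.output_file, args.html)
```

Python 3.10 and later print the heading "options:" where older versions print "optional arguments:", so the help text may differ slightly depending on the interpreter.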
-- Windows
cd deidentify
scripts\activate
python deidentify.py -r PERSON -o output.txt input.txt
diff input.txt output.txt
-- Linux
cd deidentify
source bin/activate
python deidentify.py -r PERSON -o output.txt input.txt
diff input.txt output.txt
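`diff` may not be available in a default Windows shell (in PowerShell, `diff` is an alias for Compare-Object, which does not compare file contents). A portable alternative is Python's difflib. The sketch below assumes both files are UTF-8 text, and the function names are hypothetical.

```python
import difflib
import sys
from pathlib import Path


def diff_text(before: str, after: str,
              before_name: str = "input.txt", after_name: str = "output.txt") -> list[str]:
    """Return unified-diff lines between two strings, similar to `diff -u`."""
    return list(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=before_name,
        tofile=after_name,
    ))


def diff_files(before_path: str, after_path: str) -> list[str]:
    """Read both files (assumed UTF-8) and diff their contents."""
    return diff_text(
        Path(before_path).read_text(encoding="utf-8"),
        Path(after_path).read_text(encoding="utf-8"),
        before_path,
        after_path,
    )


if __name__ == "__main__":
    sys.stdout.writelines(diff_files("input.txt", "output.txt"))
```

Lines starting with `-` come from input.txt and lines starting with `+` come from output.txt, so every replaced name and pronoun shows up as a changed line.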
-- HTML Output
python deidentify.py -H -r PERSON -o output.htm input.txt
Names that may have been missed are listed as possible_misses in an intermediate JSON file named input--tokens.json when input.txt is the input file.
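The possible misses can be reviewed programmatically. The sketch below follows the naming pattern described above (input.txt becomes input--tokens.json). It also assumes that possible_misses is a top-level key of a JSON object, which the README does not confirm, so check the actual file before relying on it. Both function names are hypothetical.

```python
import json
from pathlib import Path


def tokens_file_for(input_file: str) -> Path:
    """Map input.txt -> input--tokens.json, in the same directory as the input."""
    p = Path(input_file)
    return p.with_name(f"{p.stem}--tokens.json")


def load_possible_misses(input_file: str) -> list:
    """Return the possible_misses entry (ASSUMED to be a top-level JSON key)."""
    with tokens_file_for(input_file).open(encoding="utf-8") as fh:
        data = json.load(fh)
    return data.get("possible_misses", [])


if __name__ == "__main__":
    for item in load_possible_misses("input.txt"):
        print(item)
```

Reviewing this list after each run is a reasonable final check before sharing the deidentified output.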