multilingual PII tool

Extension of the BigScience PII-manager data tool (https://github.com/bigscience-workshop/data_tooling/tree/master/pii-manager). It adds phone number recognition tasks for over 20 languages. Testing was done with separate scripts rather than the unit-testing feature; unit tests are still to be implemented.

See Building below for installation.

A new mode, convert, is also implemented: it converts detected PII to a placeholder, e.g. example@email.com. convert is implemented for PHONE_NUMBER, EMAIL_ADDRESS, and IP_ADDRESS. See src/api/manager.py, from line 36 onward.

Example usage:

from pii_manager import PiiEnum
from pii_manager.api import PiiManager
from pii_manager.lang import COUNTRY_ANY #, LANG_ANY

lang = "en" # or any other implemented one
country = COUNTRY_ANY  # this uses all rules for English: US, UK, CA, AU, IN
tasklist = (PiiEnum.IP_ADDRESS, PiiEnum.EMAIL_ADDRESS, PiiEnum.PHONE_NUMBER)   # Define here which tasks are to be used

# Define the detector:
proc = PiiManager(lang, country, tasks=tasklist, mode="convert")  # mode can be "tag", "replace" (default) or "convert"
# or debug (all_tasks=True needed for this)
#proc = PiiManager(lang, country, all_tasks=True, debug=True)

# get info of tasks, if you so desire:
print(proc.task_info())

text = "..."
redacted = proc(text)
print(redacted)
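
To make the idea behind convert mode concrete without installing the package, here is a minimal standard-library sketch. It is not the package's implementation: the regular expressions are deliberately simplified, the function name convert_sketch is hypothetical, and the IP placeholder 0.0.0.0 is an assumption (only the email placeholder appears in this README).

```python
import re

# Simplified patterns for illustration only; the package's own rules are more thorough.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Placeholders used by this sketch. The email value comes from the README;
# the IP value is an assumption made for the example.
PLACEHOLDERS = {
    "EMAIL_ADDRESS": "example@email.com",
    "IP_ADDRESS": "0.0.0.0",
}


def convert_sketch(text: str) -> str:
    """Replace each detected email or IPv4 address with a fixed placeholder."""
    text = EMAIL_RE.sub(PLACEHOLDERS["EMAIL_ADDRESS"], text)
    text = IPV4_RE.sub(PLACEHOLDERS["IP_ADDRESS"], text)
    return text


print(convert_sketch("Mail jane.doe@corp.org from 192.168.1.10"))
```

The point of a placeholder that still looks like the original PII type is that the converted text keeps its shape, so it stays readable and usable as ordinary text.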

bigscience-workshop/data-tooling/pii-manager

This repository builds a Python package that performs PII processing for text data, i.e. replacement, tagging, or extraction of PII items (Personally Identifiable Information, also known as Personal Data) found in the text.

The PII tasks in the package are structured by language and country, since many PII elements are language- and/or country-dependent.
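
One way to picture this structure is a lookup keyed first by language and then by country. The sketch below is purely illustrative: TASK_REGISTRY, select_tasks, and the value of COUNTRY_ANY are hypothetical stand-ins, not the package's internal layout, and the entries only mirror the English countries named in the usage example.

```python
from typing import Dict, List, Tuple

# Stand-in for pii_manager.lang.COUNTRY_ANY; the real constant's value may differ.
COUNTRY_ANY = "any"

# Hypothetical registry: language -> country -> task names.
TASK_REGISTRY: Dict[str, Dict[str, List[str]]] = {
    "en": {
        "us": ["PHONE_NUMBER"],
        "uk": ["PHONE_NUMBER"],
        "ca": ["PHONE_NUMBER"],
        "au": ["PHONE_NUMBER"],
        "in": ["PHONE_NUMBER"],
    },
}


def select_tasks(lang: str, country: str = COUNTRY_ANY) -> List[Tuple[str, str]]:
    """Return (country, task) pairs for a language, optionally limited to one country."""
    by_country = TASK_REGISTRY.get(lang, {})
    if country == COUNTRY_ANY:
        # COUNTRY_ANY gathers the rules of every country registered for the language.
        return [(c, t) for c, tasks in by_country.items() for t in tasks]
    return [(country, t) for t in by_country.get(country, [])]


print(select_tasks("en"))        # all English-speaking countries in the registry
print(select_tasks("en", "us"))  # a single country
```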

Requirements

The package needs at least Python 3.8, and uses the python-stdnum package to validate identifiers.

Usage

The package can be used:

- As an API, in two flavors: function-based API and object-based API
- As a command-line tool

NOTE: The command-line tool has no "convert" mode implemented.

For details, see the usage document.

Building

The provided Makefile can be used to process the package:

- make pkg will build the Python package, creating a file that can be installed with pip
- make unit will launch all unit tests (using pytest, so pytest must be available)
- make install will install the package in a Python virtualenv. The virtualenv will be chosen as, in this order:
  - the one defined in the VENV environment variable, if it is defined
  - if there is a virtualenv activated in the shell, it will be used
  - otherwise, a default is chosen as /opt/venv/bigscience (it will be created if it does not exist)
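
The selection order used by make install can be sketched in Python as follows. This is an illustration of the precedence only, not the Makefile's actual logic: choose_venv is a hypothetical helper, it assumes an activated virtualenv is detected through the VIRTUAL_ENV variable that activation scripts export, and it treats an empty variable as undefined.

```python
import os
from typing import Mapping, Optional

DEFAULT_VENV = "/opt/venv/bigscience"


def choose_venv(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the virtualenv path following the documented order of precedence."""
    env = os.environ if env is None else env
    # 1. An explicit VENV variable wins.
    if env.get("VENV"):
        return env["VENV"]
    # 2. Otherwise use the virtualenv currently activated in the shell, if any.
    if env.get("VIRTUAL_ENV"):
        return env["VIRTUAL_ENV"]
    # 3. Fall back to the default location.
    return DEFAULT_VENV


print(choose_venv({"VENV": "/tmp/myenv", "VIRTUAL_ENV": "/tmp/active"}))
```

Passing the environment as an argument keeps the helper easy to check without modifying the real process environment.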

Contributing

To add a new PII processing task, please see the contributing instructions.

mmanteli/multilingual-PII-tool
