piisa/pii-extract-base

Pii Extract Base

This repository builds a Python package that provides a base library for PII detection in Source Documents, i.e. the extraction of PII (Personally Identifiable Information, also known as Personal Data) items present in a document.

The package itself does not implement any PII detection tasks; it only provides the base infrastructure for the process. Detection tasks must be supplied externally.

Requirements

The package needs:

  • Python 3.8 or later
  • the pii-data base package
  • one or more pii-extract plugins (to perform the actual detection work)
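
The first two requirements can be checked before installing. The following is a minimal sketch, not part of the package: the helper name `check_requirements` is hypothetical, and the import name `pii_data` for the pii-data package is an assumption.

```python
import sys
from importlib.util import find_spec


def check_requirements(min_version=(3, 8), modules=("pii_data",)):
    """Return a list of human-readable problems; an empty list means all good.

    `modules` holds top-level import names. The default "pii_data" is an
    assumed import name for the pii-data package.
    """
    problems = []
    # Compare only (major, minor) against the minimum supported version
    if sys.version_info[:2] < tuple(min_version):
        wanted = ".".join(str(n) for n in min_version)
        problems.append(f"Python {wanted} or later is required")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        if find_spec(name) is None:
            problems.append(f"module {name!r} is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Plugins are not covered here, since how they are discovered is not described in this document.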

Usage

The package can be used:

  • As an API, in two flavors: function-based API and object-based API
  • As a command-line tool

For details, see the usage document.

Building

The provided Makefile can be used to build, test and install the package:

  • make pkg will build the Python package, creating a file that can be installed with pip
  • make unit will launch all unit tests (using pytest, so pytest must be available)
  • make install will install the package in a Python virtualenv. The virtualenv is chosen in this order:
    • the one defined in the VENV environment variable, if it is defined
    • the virtualenv currently activated in the shell, if there is one
    • otherwise, the default /opt/venv/pii (it will be created if it does not exist)
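
The selection order for make install can be sketched in Python as below. This is an illustration of the order described above, not the Makefile's actual logic: the function name `choose_venv` is hypothetical, and detecting an activated virtualenv through the `VIRTUAL_ENV` variable (which the standard activation scripts set) is an assumption about how the Makefile does it.

```python
import os
from pathlib import Path

# Default location named in the Building section
DEFAULT_VENV = Path("/opt/venv/pii")


def choose_venv(environ=None):
    """Pick the target virtualenv directory, following the documented order."""
    env = os.environ if environ is None else environ
    # 1. An explicit VENV variable wins (an empty value is treated as undefined,
    #    which is an assumption of this sketch)
    explicit = env.get("VENV")
    if explicit:
        return Path(explicit)
    # 2. Otherwise use the virtualenv activated in the shell, assumed to be
    #    visible through VIRTUAL_ENV
    active = env.get("VIRTUAL_ENV")
    if active:
        return Path(active)
    # 3. Fall back to the default path (creating it is left to the installer)
    return DEFAULT_VENV


if __name__ == "__main__":
    print(choose_venv())
```

Passing a plain dictionary instead of the real environment, for example `choose_venv({"VENV": "/tmp/myenv"})`, makes the precedence easy to try out without changing the shell.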
