This tool monitors a specified directory for newly dropped files and removes or masks any PII (personally identifiable information) found in them.
The program has the following components:
- main.py
  - entry point of the program
- config.py
  - configuration file that holds the run configurations. A config entry typically contains the following:
    - watch directory
    - setup action (optional)
    - response actions
- config_vars
  - contains constant values
- response_actions.py
  - contains all response actions, implemented as functions
- watcher.py
  - contains the function watchman(), which listens for new files in a given folder
- setup_action
  - task to execute before watchman is triggered. For this use case, the setup action creates a directory named todecode; if the directory already exists, it deletes all files in it instead.
  - only one setup action is allowed per watchman instance
- pii_regex.py
  - contains compiled regular expressions that detect PII
  - only the regexes in pii_regex_list are used for detection
  - to add a new regex, compile it in this file with re.compile() and also add it to pii_regex_list
- Abbreviations used:
  - WdT: Watch Directory Thread
  - CdT: Compressed Directory Thread
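The pii_regex.py entry above describes a list of compiled patterns that drives detection. A minimal sketch of that idea follows. The two patterns and the helper names mask_pii and contains_pii are assumptions made for illustration; only re.compile() and the name pii_regex_list come from the description above, and the real file may look different.

```python
import re

# Hypothetical patterns for illustration; the real pii_regex.py may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Only patterns listed here take part in detection.
pii_regex_list = [EMAIL_RE, US_SSN_RE]


def mask_pii(text, mask="****"):
    """Replace every match of every listed regex with `mask` (assumed helper)."""
    for pattern in pii_regex_list:
        text = pattern.sub(mask, text)
    return text


def contains_pii(text):
    """Return True if any listed regex matches somewhere in `text` (assumed helper)."""
    return any(pattern.search(text) for pattern in pii_regex_list)
```

With this layout, a pattern that is compiled but not appended to pii_regex_list is simply never consulted, which matches the rule that only listed regexes are used.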
First create a virtualenv and run
pip install -r requirements.txt
pyminizip must be installed; it is used to password-protect zip files.
After cloning this repo, cd into the pii_filter directory and run:
$ python3 main.py
watchman running for /path/to/pii_filter/watch_dir
watchman running for /path/to/pii_filter/todecode
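The output above shows one watchman per watched directory. As a rough sketch of how one thread per config entry could be arranged, the example below polls each directory with os.listdir. Polling is an assumption, and the actual watcher.py may detect new files differently. The dictionary keys watch_dir, setup_action and response_actions, the start_watchers helper, and the stop_event and poll_interval parameters are all hypothetical names.

```python
import os
import threading
import time


def watchman(watch_dir, response_actions, stop_event, poll_interval=0.5):
    """Poll `watch_dir` and run every response action on each new file."""
    print(f"watchman running for {watch_dir}")
    seen = set(os.listdir(watch_dir))  # files already present are ignored
    while not stop_event.is_set():
        current = set(os.listdir(watch_dir))
        for name in sorted(current - seen):
            path = os.path.join(watch_dir, name)
            for action in response_actions:
                action(path)
        seen = current
        stop_event.wait(poll_interval)  # sleep, but wake early on stop


def start_watchers(configs, stop_event, poll_interval=0.5):
    """Start one daemon thread per config entry and return the threads."""
    threads = []
    for entry in configs:
        setup = entry.get("setup_action")
        if setup is not None:  # the setup action is optional
            setup()
        thread = threading.Thread(
            target=watchman,
            args=(entry["watch_dir"], entry["response_actions"],
                  stop_event, poll_interval),
            daemon=True,
        )
        thread.start()
        threads.append(thread)
    return threads
```

Calling stop_event.set() ends every loop after its current pass, so the threads can be joined cleanly.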
$ python3 -m unittest tests/test_*.py
test_compress_files
.test_extract_all
.test_setup_action_delete_if_exists
.test_setup_action_directory_does_not_exist
.test_setup_action_no_delete
.
----------------------------------------------------------------------
Ran 5 tests in 0.010s
OK
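The test_setup_action_* names in the run above suggest three cases for the setup action: the directory does not exist, it exists and is emptied, and it exists and is left alone. A minimal sketch covering those cases is shown here. The function name setup_directory and the delete_if_exists flag are assumptions, not the project's actual signature.

```python
import os


def setup_directory(path, delete_if_exists=True):
    """Create `path`; if it already exists, optionally delete the files in it."""
    if not os.path.isdir(path):
        os.makedirs(path)
        return
    if delete_if_exists:
        for name in os.listdir(path):
            full = os.path.join(path, name)
            if os.path.isfile(full):  # only plain files; subdirectories are kept
                os.remove(full)
```

For the use case in this README, the call would be something like setup_directory("todecode"), executed once before the corresponding watchman starts.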