diff --git a/README.md b/README.md
index f2d24c4..62681a7 100644
--- a/README.md
+++ b/README.md
@@ -3,19 +3,29 @@
 [![Node CI](https://github.com/ioncache/data-sanitization/actions/workflows/ci.yml/badge.svg)](https://github.com/ioncache/data-sanitization/actions/workflows/ci.yml)
 [![Coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/ioncache/e2afdd1c4942b8c99362ceb3853a331e/raw/coverage.json)](https://gist.github.com/ioncache/e2afdd1c4942b8c99362ceb3853a331e)
 
-Pattern-based sanitization for sensitive data in objects and strings. Masks or removes fields matching configurable patterns, making data safe for logging or external exposure.
+Pattern-based sanitization for sensitive data in objects and strings. Use it to
+mask or remove fields before logging, debugging, or sending data to systems that
+should not receive sensitive values such as secrets, PII, PHI, credentials, or
+other private data.
 
-Works with both JavaScript and TypeScript — ships with compiled JS, TypeScript declarations, and source maps.
+Works with JavaScript and TypeScript. The package ships compiled JavaScript,
+TypeScript declarations, and source maps.
 
 ## Table of Contents
 
 - [data-sanitization](#data-sanitization)
   - [Table of Contents](#table-of-contents)
   - [Installation](#installation)
+    - [npm](#npm)
+    - [Yarn](#yarn)
+    - [pnpm](#pnpm)
+    - [Bun](#bun)
+  - [Importing](#importing)
   - [Usage](#usage)
-    - [Sanitize an object](#sanitize-an-object)
+    - [Quick start](#quick-start)
     - [Sanitize a string](#sanitize-a-string)
     - [Remove fields instead of masking](#remove-fields-instead-of-masking)
+    - [Sanitize PII and PHI with custom patterns](#sanitize-pii-and-phi-with-custom-patterns)
   - [Options](#options)
   - [Default patterns](#default-patterns)
   - [Default matchers](#default-matchers)
@@ -27,17 +37,55 @@ Works with both JavaScript and TypeScript — ships with compiled JS, TypeScript
 
 ## Installation
 
+Install with the package manager used by your project.
+
+### npm
+
 ```bash
 npm install data-sanitization
 ```
 
+### Yarn
+
 ```bash
 yarn add data-sanitization
 ```
 
+### pnpm
+
+```bash
+pnpm add data-sanitization
+```
+
+### Bun
+
+```bash
+bun add data-sanitization
+```
+
+## Importing
+
+The named export is recommended:
+
+```typescript
+import { sanitizeData, DataSanitizationError } from 'data-sanitization';
+```
+
+The sanitizer is also available as the default export:
+
+```typescript
+import sanitizeData from 'data-sanitization';
+```
+
+CommonJS consumers can require the compiled package:
+
+```javascript
+const { sanitizeData } = require('data-sanitization');
+```
+
 ## Usage
 
-### Sanitize an object
+### Quick start
 
 ```typescript
 import { sanitizeData } from 'data-sanitization';
@@ -54,7 +102,8 @@ const result = sanitizeData(input);
 
 ### Sanitize a string
 
-Works with JSON strings and form-encoded strings:
+String sanitization works with JSON-like strings, escaped JSON-like strings, and
+form-encoded strings:
 
 ```typescript
 sanitizeData('{"password":"secret","username":"mark"}');
@@ -74,20 +123,80 @@ sanitizeData(
 // => { username: 'mark' }
 ```
 
+### Sanitize PII and PHI with custom patterns
+
+Use `customPatterns` to mask fields that are sensitive for your domain, such as
+PII or PHI fields.
+
+```typescript
+import { sanitizeData } from 'data-sanitization';
+
+const sensitivePatterns = [
+  'address',
+  'date_of_birth',
+  'email',
+  'emergency_contact',
+  'full_name',
+  'health_card',
+  'ip_address',
+  'medications',
+  'phone',
+  'postal_code',
+  'ssn',
+];
+
+const patient = {
+  accountId: 'acct_123',
+  full_name: 'Avery Example',
+  email: 'avery@example.com',
+  phone: '+1-555-0100',
+  date_of_birth: '1989-04-12',
+  health_card: 'HC-1234-5678',
+  medications: ['example-medication'],
+};
+
+sanitizeData(patient, {
+  customPatterns: sensitivePatterns,
+  useDefaultPatterns: false,
+});
+// => {
+//   accountId: 'acct_123',
+//   full_name: '**********',
+//   email: '**********',
+//   phone: '**********',
+//   date_of_birth: '**********',
+//   health_card: '**********',
+//   medications: '**********',
+// }
+```
+
+Use `removeMatches` with the same patterns to remove those fields instead of
+masking them.
+
+```typescript
+sanitizeData(patient, {
+  customPatterns: sensitivePatterns,
+  useDefaultPatterns: false,
+  removeMatches: true,
+});
+// => { accountId: 'acct_123' }
+```
+
 ## Options
 
-| Option               | Type                        | Default      | Description                                       |
-| -------------------- | --------------------------- | ------------ | ------------------------------------------------- |
-| `patternMask`        | `string`                    | `**********` | String used to replace matched field values       |
-| `removeMatches`      | `boolean`                   | `false`      | Remove matched fields entirely instead of masking |
-| `customPatterns`     | `string[]`                  |              | Additional field name patterns to match           |
-| `customMatchers`     | `DataSanitizationMatcher[]` |              | Additional regex matchers for custom data formats |
-| `useDefaultPatterns` | `boolean`                   | `true`       | Whether to include the built-in default patterns  |
-| `useDefaultMatchers` | `boolean`                   | `true`       | Whether to include the built-in default matchers  |
+| Option               | Type                        | Default      | Description                                         |
+| -------------------- | --------------------------- | ------------ |
--------------------------------------------------- |
+| `patternMask`        | `string`                    | `**********` | String used to replace matched field values         |
+| `removeMatches`      | `boolean`                   | `false`      | Remove matched fields entirely instead of masking   |
+| `customPatterns`     | `string[]`                  | `[]`         | Additional field name patterns to match             |
+| `customMatchers`     | `DataSanitizationMatcher[]` | `[]`         | Additional regex matchers for custom string formats |
+| `useDefaultPatterns` | `boolean`                   | `true`       | Whether to include the built-in default patterns    |
+| `useDefaultMatchers` | `boolean`                   | `true`       | Whether to include the built-in default matchers    |
 
 ## Default patterns
 
-The following field name patterns are matched by default (case-insensitive, substring match):
+The following field name patterns are matched by default using a
+case-insensitive substring match:
 
 - `apikey`
 - `api_key`
@@ -102,27 +211,33 @@ these patterns match as substrings.
 
 Three matchers are included by default:
 
-- **JSON matcher** — matches `"fieldName":"value"` patterns in JSON and JSON-like strings
-- **Escaped JSON matcher** — matches `\"fieldName\":\"value\"` patterns in JSON embedded inside JSON string values
-- **Form-encoded matcher** — matches `fieldName=value` and `fieldName:value` patterns in URL-encoded and similarly delimited strings
+- **JSON matcher** — matches `"fieldName":"value"` patterns in JSON and
+  JSON-like strings
+- **Escaped JSON matcher** — matches `\"fieldName\":\"value\"` patterns in
+  JSON embedded inside JSON string values
+- **Form-encoded matcher** — matches `fieldName=value` and `fieldName:value`
+  patterns in URL-encoded and similarly delimited strings
 
 ## Custom patterns and matchers
 
 ```typescript
 import { sanitizeData } from 'data-sanitization';
 
-// Add a custom pattern alongside defaults
+const data = {
+  username: 'mark',
+  ssn: '123-45-6789',
+  credit_card: '4111111111111111',
+};
+
 sanitizeData(data, {
   customPatterns: ['ssn', 'credit_card'],
 });
 
-// Use only custom patterns, no
defaults
 sanitizeData(data, {
   customPatterns: ['ssn'],
   useDefaultPatterns: false,
 });
 
-// Use a custom mask
 sanitizeData(data, {
   patternMask: '[REDACTED]',
 });
@@ -133,12 +248,24 @@ takes a pattern string and returns a global, case-insensitive `RegExp`. The
 regex must use capture groups `$1` and `$2` to preserve the field name and
 trailing delimiter while replacing the value.
 
+```typescript
+const headerMatcher = (pattern: string) =>
+  new RegExp(`(${pattern}:\\s*).+?(\\n|$)`, 'gi');
+
+sanitizeData('authorization: Bearer abc123\nuser: mark', {
+  customPatterns: ['authorization'],
+  customMatchers: [headerMatcher],
+  useDefaultMatchers: false,
+});
+// => 'authorization: **********\nuser: mark'
+```
+
 ## Error handling
 
 `sanitizeData` throws a `DataSanitizationError` when:
 
-- The input is not a `string` or `object` (e.g., `number`, `boolean`, `undefined`)
-- An unexpected error occurs during sanitization (e.g., malformed JSON that cannot be re-parsed)
+- The input is not a `string`, `object`, or `null`.
+- An unexpected error occurs during sanitization.
 
 ```typescript
 import { sanitizeData, DataSanitizationError } from 'data-sanitization';
@@ -159,13 +286,22 @@ original input payload.
 ## How it works
 
 1. **String input** is sanitized directly via regex replacement with the configured matchers.
-2. **Object input** is sanitized recursively by key name without JSON serialization. Sensitive keys are masked or removed regardless of whether their values are strings, numbers, arrays, objects, or other primitives.
-3. **Plain nested objects and arrays** are cloned as they are sanitized. Non-plain object instances are preserved without modification to avoid corrupting their prototypes.
-4. Each configured pattern is matched case-insensitively against object keys. For string input, each configured pattern is tested against each matcher to produce regex instances that find and replace sensitive field values.
+2.
**Object input** is sanitized recursively by key name without JSON
+   serialization. Sensitive keys are masked or removed regardless of whether
+   their values are strings, numbers, arrays, objects, or other primitives.
+3. **Plain nested objects and arrays** are cloned as they are sanitized.
+   Non-plain object instances are preserved without modification to avoid
+   corrupting their prototypes.
+4. **Null input** is accepted and returns `null`.
+5. Each configured pattern is matched case-insensitively against object keys.
+   For string input, each configured pattern is tested against each matcher to
+   produce regex instances that find and replace sensitive field values.
 
 ## Contributing
 
-For development setup, testing, and release process, see [docs/development.md](docs/development.md).
+For development setup, testing, and release process, see
+[docs/development.md](docs/development.md). For future direction, see
+[docs/ROADMAP.md](docs/ROADMAP.md).
 
 ## License
 
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index dcc4899..8542779 100644
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -13,8 +13,8 @@ on configurable patterns; ships TypeScript declarations; and avoids exposing
 input payloads in sanitizer error details.
 
 The project should continue to prioritize a small public API, predictable
-behavior, safe logging use cases, and low-friction adoption in JavaScript and
-TypeScript projects.
+behavior, sensitive-data sanitization for logging and debugging workflows, and
+low-friction adoption in JavaScript and TypeScript projects.
 
 ## Near-Term v1.x Work
 
diff --git a/docs/development.md b/docs/development.md
index 891eac3..726b587 100644
--- a/docs/development.md
+++ b/docs/development.md
@@ -2,15 +2,18 @@
 
 ## Setup
 
-This repository uses Yarn and Husky hooks.
+This repository uses Yarn, Husky hooks, and Volta-pinned tool versions. Install
+Volta or use compatible local versions of Node and Yarn before installing
+dependencies.
 
 ```bash
 yarn install
 ```
 
-Common commands:
+Common package scripts:
 
 ```bash
+yarn build
 yarn format
 yarn format:check
 yarn lint
@@ -26,7 +29,9 @@ Build artifacts are emitted to `dist/`:
 yarn build
 ```
 
-`prepack` runs the build automatically to ensure published packages use compiled output.
+The build emits compiled JavaScript, TypeScript declarations, and source maps.
+`prepack` runs the build automatically to ensure published packages use compiled
+output.
 
 ## Testing
 
@@ -93,6 +98,15 @@ yarn release --bump patch
 
 Supported bump values: `major`, `minor`, `patch`.
 
+Before publishing or cutting a release, run the local validation scripts:
+
+```bash
+yarn format:check
+yarn lint
+yarn build
+yarn test:coverage
+```
+
 Live release behavior:
 
 1. Generates release notes from conventional commits.
diff --git a/docs/plans/001-coverage-tracking.md b/docs/plans/001-coverage-tracking.md
index aa87eb5..9d34e70 100644
--- a/docs/plans/001-coverage-tracking.md
+++ b/docs/plans/001-coverage-tracking.md
@@ -11,7 +11,8 @@ accounts are required beyond GitHub.
 ## Pre-implementation
 
 - Create a public GitHub Gist with a file named `coverage.json`
-- Create a classic PAT with `gist` scope at https://github.com/settings/tokens
+- Create a classic PAT with `gist` scope at
+  [github.com/settings/tokens](https://github.com/settings/tokens)
 - Add the PAT as repository secret `GIST_SECRET`
 - Add the Gist ID as repository variable `COVERAGE_GIST_ID`
 - Create GitHub issue #274
diff --git a/docs/plans/004-readme-release-polish.md b/docs/plans/004-readme-release-polish.md
new file mode 100644
index 0000000..b6e54bf
--- /dev/null
+++ b/docs/plans/004-readme-release-polish.md
@@ -0,0 +1,70 @@
+# README Release Polish
+
+## Approach
+
+Update the package-facing documentation and registry metadata so the v1 library
+is easier to evaluate, install, and use without changing runtime behavior.
This
+is the first concrete implementation slice from the post-v1 roadmap, focused on
+README clarity, package-manager installation guidance, TypeScript/import
+expectations, and lightweight release metadata.
+
+## Pre-implementation
+
+Create branch `docs/readme-release-polish` from `main` after committing the
+standalone roadmap.
+
+## Steps
+
+1. `docs/plans/004-readme-release-polish.md` - add this plan for the README and
+   release polish work.
+2. `README.md` - refresh the opening, quick-start flow, installation section,
+   import examples, usage examples, and wording around current behavior.
+3. `package.json` - add non-promotional registry metadata such as `homepage`
+   and `bugs` while leaving funding metadata omitted.
+4. `docs/development.md` - clarify contributor setup and release validation
+   expectations, including Volta and the package scripts used for checks.
+5. `docs/ROADMAP.md` - keep roadmap wording aligned with the README's
+   sensitive-data positioning.
+6. `docs/plans/001-coverage-tracking.md` - fix the existing markdown bare URL
+   diagnostic if this branch runs documentation linting against plan files.
+
+## Relevant Files
+
+- `docs/plans/004-readme-release-polish.md` - new plan for this documentation
+  and metadata slice.
+- `README.md` - updated user-facing documentation for install, imports, usage,
+  options, and behavior.
+- `package.json` - updated package metadata for registry discoverability.
+- `docs/development.md` - updated contributor and release validation notes.
+- `docs/ROADMAP.md` - updated roadmap wording for sensitive-data positioning.
+- `docs/plans/001-coverage-tracking.md` - existing plan with a markdown
+  diagnostic that may be corrected if validation requires it.
+
+## Verification
+
+1. Run `yarn format:check`.
+2. Run `yarn lint`.
+3. Run `yarn build`.
+4. Re-run workspace diagnostics for touched Markdown and JSON files.
+5.
Manually review `README.md` for readable GitHub rendering, accurate anchors,
+   and examples that do not imply unsupported package ecosystems.
+
+## Decisions
+
+**Roadmap before implementation plan** - the long-term post-v1 direction lives
+in `docs/ROADMAP.md`, while this numbered plan records the first concrete
+branch-sized execution slice.
+
+**Documentation and metadata only** - this branch avoids runtime code changes so
+README polish, package metadata, and contributor documentation can be reviewed
+without behavior risk.
+
+**Install docs cover package managers where the npm package is usable** - npm,
+yarn, pnpm, and bun are included. Deno or JSR instructions are omitted unless
+the package is actually published for those ecosystems.
+
+**Funding metadata remains omitted** - the package should gain `homepage` and
+`bugs` metadata, but no funding link is added for this pass.
+
+**Changelog remains optional** - release history can be revisited later; it is
+not part of this first release-polish branch.
diff --git a/package.json b/package.json
index a58109a..2b0ae40 100644
--- a/package.json
+++ b/package.json
@@ -1,7 +1,7 @@
 {
   "name": "data-sanitization",
   "version": "1.0.1",
-  "description": "Sanitization library for obfuscating/removing/securing data.",
+  "description": "Sanitization library for masking or removing sensitive data.",
   "keywords": [
     "sanitize",
     "sanitization",
@@ -11,6 +11,8 @@
     "mask",
     "masking",
     "sensitive-data",
+    "pii",
+    "phi",
     "secrets",
     "logging",
     "typescript"
@@ -19,6 +21,10 @@
     "type": "git",
     "url": "https://github.com/ioncache/data-sanitization.git"
   },
+  "homepage": "https://github.com/ioncache/data-sanitization#readme",
+  "bugs": {
+    "url": "https://github.com/ioncache/data-sanitization/issues"
+  },
   "main": "./dist/index.js",
   "types": "./dist/index.d.ts",
   "exports": {