sensitive

sensitive is a Go library that detects sensitive data in text. It scans for credit card numbers, email addresses, Japanese phone numbers, Japanese My Number, JWTs, AWS access keys, IBANs, IP addresses, Bitcoin addresses, and Ethereum addresses, returning the position, type, and confidence level of each match. It also includes international and fintech-focused detectors such as SWIFT/BIC, US ABA routing numbers, UK sort codes, payment tokens, card CVV/expiry, and ACH trace numbers. Masking is available as an optional helper, but detection is the core focus.

The library has zero external dependencies and relies only on the Go standard library.

Requirements

Go Version: 1.24 or later
Operating Systems (tested on):
- Linux
- macOS
- Windows

Installation

go get github.com/nao1215/sensitive

Quick Start

Create a Scanner, choose which detectors to enable, call ScanString, and optionally mask findings:

package main

import (
    "fmt"

    "github.com/nao1215/sensitive"
    "github.com/nao1215/sensitive/detector"
    "github.com/nao1215/sensitive/mask"
)

func main() {
    scanner := sensitive.NewScanner(sensitive.WithAll())
    text := "user tanaka@example.com paid with 4532015112830366"
    findings := scanner.ScanString(text)

    for _, f := range findings {
        fmt.Printf("type=%s raw=%s confidence=%.2f\n",
            f.DetectorName, f.RawValue, f.Confidence)
    }

    masked := mask.Mask(text, findings, map[sensitive.DetectorName]mask.Strategy{
        detector.NamePAN:   mask.Last4,
        detector.NameEmail: mask.Partial,
    })
    fmt.Println(masked)
}

Output (order may vary):

type=pan raw=4532015112830366 confidence=1.00
type=email raw=tanaka@example.com confidence=1.00
user t*****@example.com paid with ************0366

WithAll() turns on every built-in detector. If you only care about specific types, pick them individually:

scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())

Caution on WithAll(): WithAll() enables all built-in detectors, including context-based weak detectors (WithBankAccount, WithACHTrace, WithMerchantID, WithCVV, WithCardExpiry). These detectors rely on nearby keywords rather than checksums and may produce false positives. In strict/financial-audit scenarios where false positive cost is high, avoid WithAll() and enable only the specific detectors you need.

Note: NewScanner() with no options creates a scanner with zero detectors, so Scan will always return an empty result. You must pass at least one With*() option to enable detection.

Common mistakes:

// Mistake 1: No detectors — always returns empty results.
scanner := sensitive.NewScanner()
findings := scanner.ScanString("4532015112830366") // findings is empty!

// Mistake 2: WithAll() in a strict/financial-audit scenario produces noise from weak detectors.
scanner = sensitive.NewScanner(sensitive.WithAll())

// Fix: enable only the specific detectors you need.
scanner = sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())

Supported Detectors

Option	Detects	Validation
`WithPAN()`	Credit card numbers (Visa, Mastercard, Amex, JCB, Discover, Diners, UnionPay)	BIN prefix + Luhn algorithm
`WithEmail()`	Email addresses	Structure + known TLD check
`WithJPPhone()`	Japanese phone numbers (mobile, landline, IP phone, toll-free, M2M/IoT, service)	Prefix classification + digit count
`WithMyNumber()`	Japanese My Number (12-digit individual number)	MOD 11 check digit
`WithJWT()`	JSON Web Tokens	Header decode + `alg` key check
`WithAWSKey()`	AWS Access Key IDs (`AKIA...` / `ASIA...`)	Prefix + 20-char alphanumeric
`WithIBAN()`	International Bank Account Numbers	Country code + MOD 97 check digit
`WithIPAddr()`	IPv4 and IPv6 addresses	`net.ParseIP` + octet range
`WithSWIFTBIC()`	SWIFT/BIC codes	Format + country code validation
`WithABARouting()`	US ABA routing numbers	Prefix range + checksum
`WithUKSortCode()`	UK sort codes (XX-XX-XX)	Pattern + boundary checks
`WithCVV()`	Card verification values (CVV/CVC/CID)	Context keyword + digit length (context-based, weaker)
`WithCardExpiry()`	Card expiration dates	Context keyword + MM/YY validation (context-based, weaker)
`WithPaymentToken()`	Payment processor tokens (Stripe/PayPal/Square)	Prefix + minimum body length
`WithBankAccount()`	Bank account numbers (context-based)	Context keyword + digit range (context-based, weaker)
`WithACHTrace()`	ACH trace numbers	Context keyword + prefix range (context-based, weaker)
`WithMerchantID()`	Merchant/terminal IDs	Context keyword + format (context-based, weaker)
`WithBTC()`	Bitcoin addresses (P2PKH, P2SH, Bech32, Bech32m/Taproot)	Base58Check (double SHA-256) / Bech32 polynomial checksum
`WithETH()`	Ethereum addresses (0x + 40 hex)	EIP-55 mixed-case checksum (Keccak-256)
`WithAll()`	All of the above
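
To make the Validation column more concrete, here is a minimal standalone sketch of two of the checksums named in the table: the Luhn algorithm (used for card numbers) and the MOD 97 check (used for IBANs). It uses only the standard library and is not the library's own code; the function names `luhnValid` and `ibanMod97Valid` are hypothetical, and the sketch skips the BIN prefix and country-specific length checks that the table also mentions.

```go
package main

import (
	"fmt"
	"strings"
)

// luhnValid reports whether a string of ASCII digits passes the Luhn check.
// Hypothetical helper for illustration only.
func luhnValid(digits string) bool {
	if len(digits) == 0 {
		return false
	}
	sum := 0
	double := false // the rightmost digit is never doubled
	for i := len(digits) - 1; i >= 0; i-- {
		c := digits[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9 // same as adding the two digits of the product
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

// ibanMod97Valid reports whether an IBAN passes the MOD 97 check.
// It does not verify the country code or the per-country length.
func ibanMod97Valid(iban string) bool {
	s := strings.ToUpper(strings.ReplaceAll(iban, " ", ""))
	if len(s) < 5 {
		return false
	}
	rearranged := s[4:] + s[:4] // move country code and check digits to the end
	rem := 0
	for i := 0; i < len(rearranged); i++ {
		c := rearranged[i]
		switch {
		case c >= '0' && c <= '9':
			rem = (rem*10 + int(c-'0')) % 97
		case c >= 'A' && c <= 'Z':
			// Letters expand to two decimal digits: A=10 ... Z=35.
			rem = (rem*100 + int(c-'A') + 10) % 97
		default:
			return false
		}
	}
	return rem == 1
}

func main() {
	fmt.Println(luhnValid("4532015112830366"))                 // true
	fmt.Println(ibanMod97Valid("GB82 WEST 1234 5698 7654 32")) // true
}
```

Checksum validation of this kind is what separates the stronger detectors from the context-based ones, which have no arithmetic property to verify.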

Benchmarks

Measurement conditions:

Command: go test -bench BenchmarkScanner -benchmem -benchtime 3s -count 5 -run '^$'
Go version: 1.24 (linux/amd64)
GOMAXPROCS: 16
CPU: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
Commit: b7e0cdc

To reproduce, run the command above. Use -count 5 and take the median for stable results. Benchmark numbers are environment-sensitive. Expect variation across Go versions, CPUs, and background load, and refresh results periodically if you publish them for compliance or audit purposes.

Per-detector benchmarks (single detector enabled)

Benchmark	ns/op	B/op	allocs/op
PAN	286.7	944	16
Email	188.2	288	9
JPPhone	171.3	464	8
MyNumber	142.0	392	6
JWT	1001	1208	25
AWSKey	147.1	280	8
IBAN	205.7	226	6
IPAddr	209.8	312	10
SWIFTBIC	176.1	288	9
ABARouting	132.7	376	6
UKSortCode	128.4	248	8
CVV	289.6	568	18
CardExpiry	261.4	456	16
PaymentToken	276.7	688	20
BankAccount	435.1	760	22
ACHTrace	325.9	480	17
MerchantID	343.4	568	18
BTC	514.5	328	7
ETH	2118	329	7

Multi-detector and edge-case benchmarks

Benchmark	Description
`BenchmarkScannerNoMatch`	All detectors enabled, input with no sensitive data. Note: detectors with nil hints (IBAN, SWIFT/BIC, ABA, MyNumber) always run regardless of input content.
`BenchmarkScannerAllDetectors`	All detectors enabled, input containing email + PAN + IP
`BenchmarkScannerEmptyInput`	All detectors enabled, nil input
`BenchmarkScannerLargeInput`	All detectors enabled, ~4KB log block with no sensitive data
`BenchmarkScannerHintMatchNoDetection`	All detectors enabled, hints match but no valid sensitive data found
`BenchmarkScannerFullWidthInput`	All detectors enabled, full-width digit input requiring normalization

Scanning Streams

For log files and other line-oriented input, use ScanLines to process data incrementally without loading the entire content into memory. The callback is invoked only for lines that contain findings:

f, err := os.Open("access.log")
if err != nil {
    log.Fatal(err)
}
defer f.Close()

scanner := sensitive.NewScanner(sensitive.WithAll())
err = scanner.ScanLines(f, func(lineNum int, line []byte, findings []sensitive.Finding) {
    for _, finding := range findings {
        fmt.Printf("line %d: %s (%s)\n", lineNum, finding.DetectorName, finding.RawValue)
    }
})
if err != nil {
    log.Fatal(err)
}

If the entire content fits in memory, ScanReader is a simpler alternative:

f, err := os.Open("data.txt")
if err != nil {
    log.Fatal(err)
}
defer f.Close()

findings, err := scanner.ScanReader(f) // check err before using findings

Confidence Filtering

Use WithMinConfidence to control the strictness of detection. Findings below the threshold are filtered out:

// Strict mode: only high-confidence findings (>= 0.8).
scanner := sensitive.NewScanner(sensitive.WithAll(), sensitive.WithMinConfidence(0.8))

// Loose mode: include medium-confidence and above (>= 0.4).
scanner = sensitive.NewScanner(sensitive.WithAll(), sensitive.WithMinConfidence(0.4))

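This is useful for suppressing noise from lower-scoring context-based detectors while keeping strong checksum-validated results. Going by the confidence ranges listed under Working with Findings, a 0.8 threshold drops BankAccount (0.50--0.65), MerchantID, and ACHTrace (0.70--0.75) findings, but CVV and CardExpiry findings (0.85) still pass. To exclude those as well, raise the threshold or leave those detectors disabled.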
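
The filtering step itself is simple. The sketch below shows the idea with a hypothetical `finding` struct standing in for the library's Finding type; the detector labels are illustrative, and the scores are taken from the ranges this README quotes for each detector.

```go
package main

import "fmt"

// finding is a hypothetical stand-in for the library's Finding type.
type finding struct {
	Detector   string
	Confidence float64
}

// filterByConfidence keeps findings whose confidence is at least threshold.
func filterByConfidence(in []finding, threshold float64) []finding {
	out := make([]finding, 0, len(in))
	for _, f := range in {
		if f.Confidence >= threshold {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	findings := []finding{
		{"pan", 1.00},
		{"bank account", 0.65},
		{"merchant id", 0.75},
		{"cvv", 0.85},
	}
	// With a 0.8 threshold, only the first and last entries remain.
	for _, f := range filterByConfidence(findings, 0.8) {
		fmt.Printf("%s %.2f\n", f.Detector, f.Confidence)
	}
}
```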

Classifying Findings by Kind

Each finding has a Kind() method that returns a broad semantic category (financial, pii, or credential), enabling downstream classification without switching on all detector names:

for _, f := range findings {
    switch f.Kind() {
    case detector.KindFinancial:
        // PAN, IBAN, ABA routing, sort code, CVV, card expiry, etc.
    case detector.KindPII:
        // email, phone, My Number, IP address
    case detector.KindCredential:
        // JWT, AWS key, payment token
    }
}

Working with Findings

Each Finding contains the detector name, byte offsets, confidence score (0.0--1.0), the raw matched string, and a Detail struct with detector-specific information.

Note: Start and End are byte offsets, not rune (character) offsets. For multi-byte UTF-8 text (e.g., Japanese), use the byte positions directly when slicing []byte data.

Context-based detectors (WithBankAccount, WithACHTrace, WithMerchantID, WithCVV, WithCardExpiry) rely on nearby keywords rather than checksums, so they are more prone to false positives than checksum-validated detectors. Confidence scores vary by detector: WithBankAccount returns 0.50--0.65, WithMerchantID and WithACHTrace return 0.70--0.75, and WithCVV and WithCardExpiry return 0.85.

Checking the detector type

for _, f := range findings {
    if f.IsPAN() {
        // handle credit card
    }
    if f.IsEmail() {
        // handle email
    }
}

There is also a generic Is method that takes a detector name constant:

if f.Is(detector.NamePAN) { ... }

Confidence levels

Confidence is a float between 0.0 and 1.0. When you do not need the exact score, use Level() to get a categorical assessment:

switch f.Level() {
case detector.ConfidenceHigh:   // >= 0.8
case detector.ConfidenceMedium: // >= 0.4
case detector.ConfidenceLow:    // < 0.4
}
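
As a quick illustration of where the boundaries fall, this sketch maps a score to a category using the thresholds in the comments above. The `level` function and its string labels are hypothetical and are not the library's Level() implementation.

```go
package main

import "fmt"

// level maps a confidence score to a category. Boundaries are inclusive
// on the lower edge: 0.8 is high and 0.4 is medium.
func level(confidence float64) string {
	switch {
	case confidence >= 0.8:
		return "high"
	case confidence >= 0.4:
		return "medium"
	default:
		return "low"
	}
}

func main() {
	for _, c := range []float64{1.0, 0.8, 0.75, 0.4, 0.39} {
		fmt.Printf("%.2f -> %s\n", c, level(c))
	}
}
```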

Getting detector-specific details

Every finding carries a Detail field. Instead of type-asserting it yourself, use the typed accessor methods. Each returns a pointer and a boolean indicating success:

scanner := sensitive.NewScanner(sensitive.WithPAN())
findings := scanner.ScanString("4532015112830366")

if len(findings) > 0 { // guard against an empty result before indexing
    if detail, ok := findings[0].PANDetail(); ok {
        fmt.Println(detail.Brand)  // "Visa"
        fmt.Println(detail.Last4)  // "0366"
        fmt.Println(detail.Luhn)   // true
    }
}

The available accessors and their fields:

Method	Fields
`PANDetail()`	Brand, BIN, Last4, Luhn, Length
`EmailDetail()`	Local, Domain
`JPPhoneDetail()`	PhoneType (`JPPhoneTypeMobile`, `JPPhoneTypeLandline`, `JPPhoneTypeIPPhone`, `JPPhoneTypeTollFree`, `JPPhoneTypeM2M`, `JPPhoneTypeService`)
`JWTDetail()`	Algorithm (e.g. `HS256`, `RS256`)
`AWSKeyDetail()`	KeyType (`AWSKeyTypeLongTerm` or `AWSKeyTypeTemporary`)
`IBANDetail()`	CountryCode (ISO 3166-1 alpha-2)
`IPAddrDetail()`	Version (4 or 6)
`MyNumberDetail()`	CheckDigitValid
`BTCDetail()`	AddressType (`BTCAddressP2PKH`, `BTCAddressP2SH`, `BTCAddressBech32`, `BTCAddressBech32m`)
`ETHDetail()`	EIP55 (bool, whether EIP-55 checksum validated)

Masking

The mask sub-package provides five masking strategies:

Strategy	Example
`Redact`	`4532015112830366` -> `****************`
`Last4`	`4532015112830366` -> `************0366`
`First1Last4`	`4532015112830366` -> `4***********0366`
`Partial`	`tanaka@example.com` -> `t*****@example.com`
`Hash`	`4532015112830366` -> `a8f5f167` (SHA-256 prefix)

Use mask.Mask to apply different strategies per detector:

import (
    "github.com/nao1215/sensitive"
    "github.com/nao1215/sensitive/detector"
    "github.com/nao1215/sensitive/mask"
)

scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
text := "user tanaka@example.com paid with 4532015112830366"
findings := scanner.ScanString(text)

masked := mask.Mask(text, findings, map[sensitive.DetectorName]mask.Strategy{
    detector.NamePAN:   mask.Last4,
    detector.NameEmail: mask.Partial,
})

fmt.Println(masked)
// user t*****@example.com paid with ************0366

If you want the same strategy for everything, use mask.MaskAll:

masked := mask.MaskAll(text, findings, mask.Redact)
// user ****************** paid with ****************
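
To show what the strategies do at the string level, here is a standalone sketch that reproduces the examples from the table with plain standard-library code. None of these functions are the mask package's API: the names, the `span` type, the choice to fully redact very short inputs, and the 8-character hash length (inferred from the table's example) are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// redact replaces every byte with '*'.
func redact(s string) string { return strings.Repeat("*", len(s)) }

// last4 keeps the final four bytes; shorter inputs are fully redacted.
func last4(s string) string {
	if len(s) <= 4 {
		return redact(s)
	}
	return strings.Repeat("*", len(s)-4) + s[len(s)-4:]
}

// first1Last4 keeps the first byte and the final four bytes.
func first1Last4(s string) string {
	if len(s) <= 5 {
		return redact(s)
	}
	return s[:1] + strings.Repeat("*", len(s)-5) + s[len(s)-4:]
}

// partialEmail keeps the first byte of the local part and the whole domain.
func partialEmail(s string) string {
	at := strings.LastIndexByte(s, '@')
	if at < 1 {
		return redact(s)
	}
	return s[:1] + strings.Repeat("*", at-1) + s[at:]
}

// hashPrefix returns the first 8 hex characters of the SHA-256 digest.
func hashPrefix(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])[:8]
}

// span is a hypothetical byte range plus the masking function to apply.
type span struct {
	Start, End int
	Mask       func(string) string
}

// applyMasks rewrites text. Spans must not overlap and must be sorted by Start.
func applyMasks(text string, spans []span) string {
	var b strings.Builder
	prev := 0
	for _, sp := range spans {
		b.WriteString(text[prev:sp.Start])
		b.WriteString(sp.Mask(text[sp.Start:sp.End]))
		prev = sp.End
	}
	b.WriteString(text[prev:])
	return b.String()
}

func main() {
	text := "user tanaka@example.com paid with 4532015112830366"
	email, pan := "tanaka@example.com", "4532015112830366"
	e, p := strings.Index(text, email), strings.Index(text, pan)

	fmt.Println(applyMasks(text, []span{
		{e, e + len(email), partialEmail},
		{p, p + len(pan), last4},
	}))
	// user t*****@example.com paid with ************0366
}
```

Building the output left to right from the original byte offsets avoids the offset drift that would occur if replacements of a different length were applied in place.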

Custom Detectors

You can add your own detectors. The simplest way is detector.NewRegex, which wraps a compiled regular expression:

import (
    "regexp"

    "github.com/nao1215/sensitive"
    "github.com/nao1215/sensitive/detector"
)

ticketDetector := detector.NewRegex(
    detector.DetectorName("ticket_id"),
    regexp.MustCompile(`TICKET-\d{4}`),
    [][]byte{[]byte("TICKET-")},   // hint for pre-filtering
    0.9,                            // fixed confidence
)

scanner := sensitive.NewScanner(
    sensitive.WithPAN(),
    sensitive.WithDetector(ticketDetector),
)

The hints parameter is important for performance. The scanner uses bytes.Contains to check hints before calling Scan, so a good hint lets the scanner skip the regex entirely for inputs that cannot match.

For more complex logic, implement the Detector interface directly:

type Detector interface {
    Name() detector.DetectorName
    Hints() [][]byte
    Scan(data []byte) []detector.Finding
}

Full-Width Digit Support

Japanese text often uses full-width digits (０-９). Detectors that parse digit sequences directly (PAN, JPPhone, MyNumber, ABA routing, BankAccount) normalize full-width digits to half-width before detection, so a phone number written as ０９０－１２３４－５６７８ or a bank account number written as 口座番号１２３４５６７８ is correctly recognized. IBAN and UK sort code do not normalize full-width digits because their formats are primarily used in Western contexts where full-width encoding is uncommon. Context-based detectors (CVV, CardExpiry, ACHTrace, MerchantID) also do not normalize full-width digits. The utility function is also available for direct use:

// The second return value (the position map) is discarded here because Go rejects unused variables.
normalized, _ := detector.NormalizeFullWidthDigits([]byte("０９０－１２３４－５６７８"))
fmt.Println(string(normalized)) // 090-1234-5678
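
For intuition about why a position map is needed, here is a standalone sketch of this kind of normalization. Each full-width digit takes three bytes in UTF-8 but one byte after conversion, so offsets in the normalized buffer no longer line up with the original. The function `normalizeFullWidth`, the shape of the returned map (one original byte offset per output byte), and the handling of the full-width hyphen-minus (U+FF0D) are assumptions for illustration and may differ from detector.NormalizeFullWidthDigits.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// normalizeFullWidth converts full-width digits (U+FF10 to U+FF19) to ASCII
// digits and U+FF0D to '-'. pos[i] is the byte offset in data of the
// character that produced output byte i.
func normalizeFullWidth(data []byte) (out []byte, pos []int) {
	out = make([]byte, 0, len(data))
	pos = make([]int, 0, len(data))
	for i := 0; i < len(data); {
		r, size := utf8.DecodeRune(data[i:])
		switch {
		case r >= '\uFF10' && r <= '\uFF19':
			out = append(out, byte('0'+(r-'\uFF10')))
			pos = append(pos, i)
		case r == '\uFF0D':
			out = append(out, '-')
			pos = append(pos, i)
		default:
			// Copy all other bytes unchanged, one map entry per byte.
			for j := 0; j < size; j++ {
				out = append(out, data[i+j])
				pos = append(pos, i+j)
			}
		}
		i += size
	}
	return out, pos
}

func main() {
	in := []byte("０９０\uFF0D１２３４\uFF0D５６７８")
	out, pos := normalizeFullWidth(in)
	fmt.Println(string(out)) // 090-1234-5678
	fmt.Println(pos[1])      // 3: the second digit starts at byte 3 of the input
}
```

With such a map, a match found at normalized offsets can be translated back so that reported positions refer to the caller's original bytes.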

How It Works

The scanner runs a multi-stage filtering pipeline to keep scan cost low.

sequenceDiagram
    participant Caller
    participant Scanner
    participant HintFilter as Hint Filter
    participant Detector
    participant Dedup as Dedup & Sort

    Caller->>Scanner: Scan(data)
    alt input is empty
        Scanner-->>Caller: nil
    end

    loop for each registered Detector
        Scanner->>HintFilter: bytes.Contains(data, hint) (~15 ns, SIMD)
        alt no hint matched
            HintFilter-->>Scanner: skip
        else hint matched
            HintFilter-->>Scanner: pass
            Scanner->>Detector: Scan(data)
            Note right of Detector: domain-specific validation<br/>(BIN, Luhn, MOD 97, etc.)
            Detector-->>Scanner: []Finding
        end
    end

    Scanner->>Dedup: merge all findings
    Note right of Dedup: dedup overlapping (keep highest confidence)<br/>sort by confidence desc
    Dedup-->>Scanner: []Finding
    Scanner-->>Caller: []Finding
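
The final stage of the diagram, deduplicating overlapping findings and sorting by confidence, could look roughly like the sketch below. It is one plausible way to do it, not the scanner's actual code: the `finding` struct, the exclusive End offset, and the greedy keep-highest-first approach are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// finding is a hypothetical stand-in for the library's Finding type.
type finding struct {
	Detector   string
	Start, End int // byte offsets; End is treated as exclusive here
	Confidence float64
}

// overlaps reports whether two byte ranges share at least one byte.
func overlaps(a, b finding) bool { return a.Start < b.End && b.Start < a.End }

// dedupe sorts by confidence (highest first) and drops any finding that
// overlaps one already kept, so the highest-confidence match wins.
func dedupe(in []finding) []finding {
	sorted := append([]finding(nil), in...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Confidence > sorted[j].Confidence
	})
	var kept []finding
	for _, f := range sorted {
		clash := false
		for _, k := range kept {
			if overlaps(f, k) {
				clash = true
				break
			}
		}
		if !clash {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	out := dedupe([]finding{
		{"bank account", 10, 26, 0.60},
		{"pan", 10, 26, 1.00},
		{"email", 30, 48, 1.00},
	})
	for _, f := range out {
		fmt.Printf("%s [%d,%d) %.2f\n", f.Detector, f.Start, f.End, f.Confidence)
	}
}
```

In this example the weaker finding over bytes 10 to 26 is dropped because a higher-confidence finding covers the same range.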

Contributing

Contributions are welcome!

To report a bug or request a feature, please use the following contact:

GitHub Issue

License

MIT LICENSE
