sensitive is a Go library that detects sensitive data in text. It scans for credit card numbers, email addresses, Japanese phone numbers, Japanese My Number, JWTs, AWS access keys, IBANs, IP addresses, Bitcoin addresses, and Ethereum addresses, returning the position, type, and confidence level of each match. It also includes international and fintech-focused detectors such as SWIFT/BIC, US ABA routing numbers, UK sort codes, payment tokens, card CVV/expiry, and ACH trace numbers. Masking is available as an optional helper, but detection is the core focus.
The library has zero external dependencies and relies only on the Go standard library.
- Go Version: 1.24 or later
- Operating Systems (tested on):
  - Linux
  - macOS
  - Windows
go get github.com/nao1215/sensitive
Create a Scanner, choose which detectors to enable, call ScanString, and optionally mask findings:
package main
import (
	"fmt"
	"github.com/nao1215/sensitive"
	"github.com/nao1215/sensitive/detector"
	"github.com/nao1215/sensitive/mask"
)
func main() {
	scanner := sensitive.NewScanner(sensitive.WithAll())
	text := "user tanaka@example.com paid with 4532015112830366"
	findings := scanner.ScanString(text)
	for _, f := range findings {
		fmt.Printf("type=%s raw=%s confidence=%.2f\n",
			f.DetectorName, f.RawValue, f.Confidence)
	}
	masked := mask.Mask(text, findings, map[sensitive.DetectorName]mask.Strategy{
		detector.NamePAN:   mask.Last4,
		detector.NameEmail: mask.Partial,
	})
	fmt.Println(masked)
}
Output (order may vary):
type=pan raw=4532015112830366 confidence=1.00
type=email raw=tanaka@example.com confidence=1.00
user t*****@example.com paid with ************0366
WithAll() turns on every built-in detector. If you only care about specific types, pick them individually:
scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
Caution on WithAll(): WithAll() enables all built-in detectors, including context-based weak detectors (WithBankAccount, WithACHTrace, WithMerchantID, WithCVV, WithCardExpiry). These detectors rely on nearby keywords rather than checksums and may produce false positives. In strict or financial-audit scenarios where the cost of a false positive is high, avoid WithAll() and enable only the specific detectors you need.
Note: NewScanner() with no options creates a scanner with zero detectors, so Scan will always return an empty result. You must pass at least one With*() option to enable detection.
Common mistakes:
// Mistake 1: No detectors — always returns empty results.
scanner := sensitive.NewScanner()
findings := scanner.ScanString("4532015112830366") // findings is empty!
// Mistake 2: WithAll() in strict mode produces noise from weak detectors.
// Use specific options instead.
scanner = sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
| Option | Detects | Validation |
|---|---|---|
| WithPAN() | Credit card numbers (Visa, Mastercard, Amex, JCB, Discover, Diners, UnionPay) | BIN prefix + Luhn algorithm |
| WithEmail() | Email addresses | Structure + known TLD check |
| WithJPPhone() | Japanese phone numbers (mobile, landline, IP phone, toll-free, M2M/IoT, service) | Prefix classification + digit count |
| WithMyNumber() | Japanese My Number (12-digit individual number) | MOD 11 check digit |
| WithJWT() | JSON Web Tokens | Header decode + alg key check |
| WithAWSKey() | AWS Access Key IDs (AKIA... / ASIA...) | Prefix + 20-char alphanumeric |
| WithIBAN() | International Bank Account Numbers | Country code + MOD 97 check digit |
| WithIPAddr() | IPv4 and IPv6 addresses | net.ParseIP + octet range |
| WithSWIFTBIC() | SWIFT/BIC codes | Format + country code validation |
| WithABARouting() | US ABA routing numbers | Prefix range + checksum |
| WithUKSortCode() | UK sort codes (XX-XX-XX) | Pattern + boundary checks |
| WithCVV() | Card verification values (CVV/CVC/CID) | Context keyword + digit length (context-based, weaker) |
| WithCardExpiry() | Card expiration dates | Context keyword + MM/YY validation (context-based, weaker) |
| WithPaymentToken() | Payment processor tokens (Stripe/PayPal/Square) | Prefix + minimum body length |
| WithBankAccount() | Bank account numbers (context-based) | Context keyword + digit range (context-based, weaker) |
| WithACHTrace() | ACH trace numbers | Context keyword + prefix range (context-based, weaker) |
| WithMerchantID() | Merchant/terminal IDs | Context keyword + format (context-based, weaker) |
| WithBTC() | Bitcoin addresses (P2PKH, P2SH, Bech32, Bech32m/Taproot) | Base58Check (double SHA-256) / Bech32 polynomial checksum |
| WithETH() | Ethereum addresses (0x + 40 hex) | EIP-55 mixed-case checksum (Keccak-256) |
| WithAll() | All of the above | |
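To make two of the checksum validations named in the table concrete, here is a minimal standard-library sketch of the Luhn check (WithPAN row) and the MOD 97 check (WithIBAN row). The function names `luhnValid` and `ibanMod97Valid` are hypothetical and are not part of the library's API; the library's own detectors also apply BIN prefix and country code checks that this sketch omits.

```go
package main

import "fmt"

// luhnValid reports whether a string of ASCII digits passes the Luhn check.
// Hypothetical helper for illustration; not the library's implementation.
func luhnValid(number string) bool {
	if number == "" {
		return false
	}
	sum := 0
	double := false // the rightmost digit is not doubled
	for i := len(number) - 1; i >= 0; i-- {
		c := number[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

// ibanMod97Valid reports whether an upper-case IBAN without spaces passes
// the MOD 97 check. It does not validate country-specific lengths.
func ibanMod97Valid(iban string) bool {
	if len(iban) < 5 {
		return false
	}
	// Move the country code and check digits to the end.
	rearranged := iban[4:] + iban[:4]
	rem := 0
	for i := 0; i < len(rearranged); i++ {
		c := rearranged[i]
		switch {
		case c >= '0' && c <= '9':
			rem = (rem*10 + int(c-'0')) % 97
		case c >= 'A' && c <= 'Z':
			// Letters map to 10..35, which occupy two decimal digits.
			rem = (rem*100 + int(c-'A') + 10) % 97
		default:
			return false
		}
	}
	return rem == 1
}

func main() {
	fmt.Println(luhnValid("4532015112830366"))
	fmt.Println(ibanMod97Valid("GB82WEST12345698765432"))
}
```

Both checks catch single-digit typos, which is why the checksum-validated detectors can report higher confidence than the keyword-based ones.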
Measurement conditions:
- Command: go test -bench BenchmarkScanner -benchmem -benchtime 3s -count 5 -run '^$'
- Go version: 1.24 (linux/amd64)
- GOMAXPROCS: 16
- CPU: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
- Commit: b7e0cdc
To reproduce, run the command above. Use -count 5 and take the median for stable results.
Benchmark numbers are environment-sensitive. Expect variation across Go versions, CPUs, and background load, and refresh results periodically if you publish them for compliance or audit purposes.
| Benchmark | ns/op | B/op | allocs/op |
|---|---|---|---|
| PAN | 286.7 | 944 | 16 |
| Email | 188.2 | 288 | 9 |
| JPPhone | 171.3 | 464 | 8 |
| MyNumber | 142.0 | 392 | 6 |
| JWT | 1001 | 1208 | 25 |
| AWSKey | 147.1 | 280 | 8 |
| IBAN | 205.7 | 226 | 6 |
| IPAddr | 209.8 | 312 | 10 |
| SWIFTBIC | 176.1 | 288 | 9 |
| ABARouting | 132.7 | 376 | 6 |
| UKSortCode | 128.4 | 248 | 8 |
| CVV | 289.6 | 568 | 18 |
| CardExpiry | 261.4 | 456 | 16 |
| PaymentToken | 276.7 | 688 | 20 |
| BankAccount | 435.1 | 760 | 22 |
| ACHTrace | 325.9 | 480 | 17 |
| MerchantID | 343.4 | 568 | 18 |
| BTC | 514.5 | 328 | 7 |
| ETH | 2118 | 329 | 7 |
| Benchmark | Description |
|---|---|
| BenchmarkScannerNoMatch | All detectors enabled, input with no sensitive data. Note: detectors with nil hints (IBAN, SWIFT/BIC, ABA, MyNumber) always run regardless of input content. |
| BenchmarkScannerAllDetectors | All detectors enabled, input containing email + PAN + IP |
| BenchmarkScannerEmptyInput | All detectors enabled, nil input |
| BenchmarkScannerLargeInput | All detectors enabled, ~4KB log block with no sensitive data |
| BenchmarkScannerHintMatchNoDetection | All detectors enabled, hints match but no valid sensitive data found |
| BenchmarkScannerFullWidthInput | All detectors enabled, full-width digit input requiring normalization |
For log files and other line-oriented input, use ScanLines to process data incrementally without loading the entire content into memory. The callback is invoked only for lines that contain findings:
f, err := os.Open("access.log")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
scanner := sensitive.NewScanner(sensitive.WithAll())
err = scanner.ScanLines(f, func(lineNum int, line []byte, findings []sensitive.Finding) {
	for _, finding := range findings {
		fmt.Printf("line %d: %s (%s)\n", lineNum, finding.DetectorName, finding.RawValue)
	}
})
if err != nil {
	log.Fatal(err)
}
If the entire content fits in memory, ScanReader is a simpler alternative:
f, err := os.Open("data.txt")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
findings, err := scanner.ScanReader(f)
Use WithMinConfidence to control the strictness of detection. Findings below the threshold are filtered out:
// Strict mode: only high-confidence findings (>= 0.8).
scanner := sensitive.NewScanner(sensitive.WithAll(), sensitive.WithMinConfidence(0.8))
// Loose mode: include medium-confidence and above (>= 0.4).
scanner = sensitive.NewScanner(sensitive.WithAll(), sensitive.WithMinConfidence(0.4))
This is useful for suppressing noise from context-based weak detectors (BankAccount, CVV, CardExpiry, etc.) while keeping strong checksum-validated results.
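The effect of a minimum-confidence threshold can be sketched with plain Go. The `finding` struct, the detector name strings, and the helper names below are hypothetical stand-ins rather than the library's types; the 0.8 and 0.4 cut-offs are the ones used in the strict and loose examples above.

```go
package main

import "fmt"

// finding is a hypothetical stand-in for the library's Finding type.
type finding struct {
	Detector   string
	Confidence float64
}

// level maps a score to a category using the 0.8 and 0.4 cut-offs.
func level(c float64) string {
	switch {
	case c >= 0.8:
		return "high"
	case c >= 0.4:
		return "medium"
	default:
		return "low"
	}
}

// filterByMinConfidence keeps findings whose score is at or above threshold.
func filterByMinConfidence(fs []finding, threshold float64) []finding {
	kept := make([]finding, 0, len(fs))
	for _, f := range fs {
		if f.Confidence >= threshold {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	fs := []finding{
		{"pan", 1.0},
		{"bank_account", 0.55},
		{"merchant_id", 0.70},
	}
	// With a strict threshold only the checksum-validated finding survives.
	for _, f := range filterByMinConfidence(fs, 0.8) {
		fmt.Println(f.Detector, level(f.Confidence)) // pan high
	}
}
```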
Each finding has a Kind() method that returns a broad semantic category (financial, pii, or credential), enabling downstream classification without switching on all detector names:
for _, f := range findings {
	switch f.Kind() {
	case detector.KindFinancial:
		// PAN, IBAN, ABA routing, sort code, CVV, card expiry, etc.
	case detector.KindPII:
		// email, phone, My Number, IP address
	case detector.KindCredential:
		// JWT, AWS key, payment token
	}
}
Each Finding contains the detector name, byte offsets, confidence score (0.0--1.0), the raw matched string, and a Detail struct with detector-specific information.
Note: Start and End are byte offsets, not rune (character) offsets. For multi-byte UTF-8 text (e.g., Japanese), use the byte positions directly when slicing []byte data.
Context-based detectors (WithBankAccount, WithACHTrace, WithMerchantID, WithCVV, WithCardExpiry) rely on nearby keywords rather than checksums, so they are more prone to false positives than checksum-validated detectors. Confidence scores vary by detector: WithBankAccount returns 0.50--0.65, WithMerchantID and WithACHTrace return 0.70--0.75, and WithCVV and WithCardExpiry return 0.85.
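The difference between byte offsets and rune offsets mentioned in the note is easy to see with a short standard-library example. The helper `byteSpan` is hypothetical and only mimics the kind of start/end pair a finding carries; treating the end offset as exclusive is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// byteSpan returns the byte offsets [start, end) of needle within text.
// Hypothetical helper for illustration; not part of the library.
func byteSpan(text, needle string) (start, end int, ok bool) {
	start = strings.Index(text, needle)
	if start < 0 {
		return 0, 0, false
	}
	return start, start + len(needle), true
}

func main() {
	text := "電話 090-1234-5678"
	data := []byte(text)

	start, end, ok := byteSpan(text, "090-1234-5678")
	if !ok {
		return
	}
	fmt.Println(start, end)              // 7 20 (each kanji is 3 bytes in UTF-8)
	fmt.Println(string(data[start:end])) // 090-1234-5678

	// The same position counted in runes is smaller, so rune indexes
	// must not be used to slice the byte data.
	fmt.Println(utf8.RuneCountInString(text[:start])) // 3
}
```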
To branch on a finding's type, use the typed helper methods:
for _, f := range findings {
	if f.IsPAN() {
		// handle credit card
	}
	if f.IsEmail() {
		// handle email
	}
}
There is also a generic Is method that takes a detector name constant:
if f.Is(detector.NamePAN) { ... }
Confidence is a float between 0.0 and 1.0. When you do not need the exact score, use Level() to get a categorical assessment:
switch f.Level() {
case detector.ConfidenceHigh: // >= 0.8
case detector.ConfidenceMedium: // >= 0.4
case detector.ConfidenceLow: // < 0.4
}
Every finding carries a Detail field. Instead of type-asserting it yourself, use the typed accessor methods. Each returns a pointer and a boolean indicating success:
scanner := sensitive.NewScanner(sensitive.WithPAN())
findings := scanner.ScanString("4532015112830366")
if detail, ok := findings[0].PANDetail(); ok {
	fmt.Println(detail.Brand) // "Visa"
	fmt.Println(detail.Last4) // "0366"
	fmt.Println(detail.Luhn)  // true
}
The available accessors and their fields:
| Method | Fields |
|---|---|
| PANDetail() | Brand, BIN, Last4, Luhn, Length |
| EmailDetail() | Local, Domain |
| JPPhoneDetail() | PhoneType (JPPhoneTypeMobile, JPPhoneTypeLandline, JPPhoneTypeIPPhone, JPPhoneTypeTollFree, JPPhoneTypeM2M, JPPhoneTypeService) |
| JWTDetail() | Algorithm (e.g. HS256, RS256) |
| AWSKeyDetail() | KeyType (AWSKeyTypeLongTerm or AWSKeyTypeTemporary) |
| IBANDetail() | CountryCode (ISO 3166-1 alpha-2) |
| IPAddrDetail() | Version (4 or 6) |
| MyNumberDetail() | CheckDigitValid |
| BTCDetail() | AddressType (BTCAddressP2PKH, BTCAddressP2SH, BTCAddressBech32, BTCAddressBech32m) |
| ETHDetail() | EIP55 (bool, whether EIP-55 checksum validated) |
The mask sub-package provides five masking strategies:
| Strategy | Example |
|---|---|
| Redact | 4532015112830366 -> **************** |
| Last4 | 4532015112830366 -> ************0366 |
| First1Last4 | 4532015112830366 -> 4***********0366 |
| Partial | tanaka@example.com -> t*****@example.com |
| Hash | 4532015112830366 -> a8f5f167 (SHA-256 prefix) |
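As a rough illustration of what the first four strategies do to a matched value, the following standard-library sketch reproduces the table's examples. The function names are hypothetical, the code assumes ASCII input, and the handling of very short values (fully redacting them) is an assumption of this sketch, not documented library behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// redact replaces every byte with an asterisk.
func redact(s string) string {
	return strings.Repeat("*", len(s))
}

// last4 keeps only the final four characters.
func last4(s string) string {
	if len(s) <= 4 {
		return redact(s) // assumption: too short to reveal anything safely
	}
	return strings.Repeat("*", len(s)-4) + s[len(s)-4:]
}

// first1Last4 keeps the first character and the final four.
func first1Last4(s string) string {
	if len(s) <= 5 {
		return redact(s) // assumption: too short to reveal anything safely
	}
	return s[:1] + strings.Repeat("*", len(s)-5) + s[len(s)-4:]
}

// partialEmail keeps the first character of the local part and the domain.
func partialEmail(s string) string {
	at := strings.LastIndex(s, "@")
	if at <= 0 {
		return redact(s) // not an email-shaped value
	}
	return s[:1] + strings.Repeat("*", at-1) + s[at:]
}

func main() {
	pan := "4532015112830366"
	fmt.Println(redact(pan))                        // ****************
	fmt.Println(last4(pan))                         // ************0366
	fmt.Println(first1Last4(pan))                   // 4***********0366
	fmt.Println(partialEmail("tanaka@example.com")) // t*****@example.com
}
```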
Use mask.Mask to apply different strategies per detector:
import (
	"fmt"
	"github.com/nao1215/sensitive"
	"github.com/nao1215/sensitive/detector"
	"github.com/nao1215/sensitive/mask"
)
scanner := sensitive.NewScanner(sensitive.WithPAN(), sensitive.WithEmail())
text := "user tanaka@example.com paid with 4532015112830366"
findings := scanner.ScanString(text)
masked := mask.Mask(text, findings, map[sensitive.DetectorName]mask.Strategy{
	detector.NamePAN:   mask.Last4,
	detector.NameEmail: mask.Partial,
})
fmt.Println(masked)
// user t*****@example.com paid with ************0366
If you want the same strategy for everything, use mask.MaskAll:
masked := mask.MaskAll(text, findings, mask.Redact)
// user ****************** paid with ****************
You can add your own detectors. The simplest way is detector.NewRegex, which wraps a compiled regular expression:
import (
	"regexp"
	"github.com/nao1215/sensitive"
	"github.com/nao1215/sensitive/detector"
)
ticketDetector := detector.NewRegex(
	detector.DetectorName("ticket_id"),
	regexp.MustCompile(`TICKET-\d{4}`),
	[][]byte{[]byte("TICKET-")}, // hint for pre-filtering
	0.9,                         // fixed confidence
)
scanner := sensitive.NewScanner(
	sensitive.WithPAN(),
	sensitive.WithDetector(ticketDetector),
)
The hints parameter is important for performance. The scanner uses bytes.Contains to check hints before calling Scan, so a good hint lets the scanner skip the regex entirely for inputs that cannot match.
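The pre-filtering idea can be sketched independently of the library: check cheap literal hints with bytes.Contains first, and run the regular expression only when one of them is present. The names `hasAnyHint` and `scanTickets` are hypothetical, and the `regexRan` return value exists only to make the skipped work visible.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

var (
	ticketRe    = regexp.MustCompile(`TICKET-\d{4}`)
	ticketHints = [][]byte{[]byte("TICKET-")}
)

// hasAnyHint reports whether data contains at least one of the hints.
func hasAnyHint(data []byte, hints [][]byte) bool {
	for _, h := range hints {
		if bytes.Contains(data, h) {
			return true
		}
	}
	return false
}

// scanTickets runs the regex only when a hint is present. regexRan tells the
// caller whether the more expensive regex step was executed.
func scanTickets(data []byte) (matches []string, regexRan bool) {
	if !hasAnyHint(data, ticketHints) {
		return nil, false
	}
	for _, m := range ticketRe.FindAll(data, -1) {
		matches = append(matches, string(m))
	}
	return matches, true
}

func main() {
	inputs := []string{"GET /index.html 200", "closed TICKET-1234 today"}
	for _, input := range inputs {
		matches, ran := scanTickets([]byte(input))
		fmt.Printf("regexRan=%v matches=%v\n", ran, matches)
	}
}
```

A hint should be a literal that every valid match must contain; a hint that is too generic passes most inputs through and saves little.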
For more complex logic, implement the Detector interface directly:
type Detector interface {
	Name() detector.DetectorName
	Hints() [][]byte
	Scan(data []byte) []detector.Finding
}
Japanese text often uses full-width digits (０-９). Detectors that parse digit sequences directly (PAN, JPPhone, MyNumber, ABA routing, BankAccount) normalize full-width digits to half-width before detection, so a phone number written as ０９０-１２３４-５６７８ or a bank account number written as 口座番号 １２３４５６７８ is correctly recognized. IBAN and UK sort code do not normalize full-width digits because their formats are primarily used in Western contexts where full-width encoding is uncommon. Context-based detectors (CVV, CardExpiry, ACHTrace, MerchantID) also do not normalize full-width digits. The utility function is also available for direct use:
normalized, _ := detector.NormalizeFullWidthDigits([]byte("０９０-１２３４-５６７８")) // second return value (posMap) is unused here
fmt.Println(string(normalized)) // 090-1234-5678
The scanner runs a multi-stage filtering pipeline to keep scan cost low.
sequenceDiagram
participant Caller
participant Scanner
participant HintFilter as Hint Filter
participant Detector
participant Dedup as Dedup & Sort
Caller->>Scanner: Scan(data)
alt input is empty
Scanner-->>Caller: nil
end
loop for each registered Detector
Scanner->>HintFilter: bytes.Contains(data, hint) (~15 ns, SIMD)
alt no hint matched
HintFilter-->>Scanner: skip
else hint matched
HintFilter-->>Scanner: pass
Scanner->>Detector: Scan(data)
Note right of Detector: domain-specific validation<br/>(BIN, Luhn, MOD 97, etc.)
Detector-->>Scanner: []Finding
end
end
Scanner->>Dedup: merge all findings
Note right of Dedup: dedup overlapping (keep highest confidence)<br/>sort by confidence desc
Dedup-->>Scanner: []Finding
Scanner-->>Caller: []Finding
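The final "Dedup & Sort" stage of the diagram can be approximated as follows. This is a sketch under stated assumptions, not the library's implementation: the `finding` struct is a hypothetical stand-in, end offsets are treated as exclusive, and overlap is resolved greedily by visiting findings in descending confidence order.

```go
package main

import (
	"fmt"
	"sort"
)

// finding is a hypothetical stand-in for the library's Finding type.
type finding struct {
	Detector   string
	Start, End int // byte offsets; End is treated as exclusive here
	Confidence float64
}

// overlaps reports whether two half-open byte ranges intersect.
func overlaps(a, b finding) bool {
	return a.Start < b.End && b.Start < a.End
}

// dedupAndSort keeps the highest-confidence finding among overlapping ones
// and returns the survivors sorted by confidence, highest first.
func dedupAndSort(fs []finding) []finding {
	sorted := append([]finding(nil), fs...) // do not mutate the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Confidence > sorted[j].Confidence
	})
	var kept []finding
	for _, f := range sorted {
		clash := false
		for _, k := range kept {
			if overlaps(f, k) {
				clash = true
				break
			}
		}
		if !clash {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	fs := []finding{
		{"bank_account", 34, 50, 0.55}, // weaker match on the same bytes as the PAN
		{"pan", 34, 50, 1.0},
		{"email", 5, 23, 1.0},
	}
	for _, f := range dedupAndSort(fs) {
		fmt.Printf("%s [%d,%d) %.2f\n", f.Detector, f.Start, f.End, f.Confidence)
	}
}
```

Because the stable sort preserves input order among equal scores, findings with the same confidence can appear in either order depending on how detectors are registered, which is consistent with the "order may vary" remark on the quick-start output.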
Contributions are welcome!
To report a bug or request a feature, please use one of the following contacts.
