# Name & Address Parser — Usage Examples

This notebook demonstrates how to use the `NameAddressParser` to extract structured data from OCR-scanned customer records.

**Prerequisites:** Make sure you have:
1. Set up the conda environment: `conda env create -f environment.yaml && conda activate name-parsing`
2. Installed the package: `pip install -e .`
3. A trained model in `models/onnx/quantized/` (either included in the repo or generated via the training pipeline)
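
Before loading the parser, it can help to confirm that the model directory from prerequisite 3 is populated. The snippet below is a minimal sketch: `find_onnx_files` is a hypothetical helper, and it assumes (this notebook does not confirm it) that the quantized model is stored as one or more `.onnx` files directly inside that directory.

```python
from pathlib import Path


def find_onnx_files(model_dir):
    """Return the sorted .onnx files in model_dir (empty list if the directory is missing)."""
    path = Path(model_dir)
    if not path.is_dir():
        return []
    return sorted(path.glob("*.onnx"))


# Assumption: the path is relative to the notebook's working directory
onnx_files = find_onnx_files("../models/onnx/quantized")
if not onnx_files:
    print("No .onnx files found; check prerequisite 3 before loading the parser.")
```

If the list comes back empty, either the working directory differs from the notebook's location or the model still needs to be generated via the training pipeline.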

## 1. Basic Usage

In [None]:
from name_parsing import NameAddressParser

# Load the model (this takes ~1 second the first time)
parser = NameAddressParser("../models/onnx/quantized")

In [None]:
# Simple single-name input
parser.parse("John Smith, 500 Oak Ave, Denver CO 80201")

In [None]:
# Multiple names with shared last name — extracts only the first person
parser.parse("Alex or Mary Doe, 1201 Braddock Ave, Richmond VA 22312")

In [None]:
# Separate full names — still extracts only the first
parser.parse("Alex Doe or Mary Smith, 500 Oak Ave, Denver CO 80201")

## 2. Handling OCR Noise

Real OCR output is messy. The model is trained on synthetic OCR noise, so it tolerates common errors such as merged tokens, inconsistent casing, and missing punctuation.
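
To give a feel for what synthetic OCR noise can look like, here is a small illustrative sketch. It is not the project's training pipeline: `add_ocr_noise`, the `OCR_CONFUSIONS` table, and the noise rates are all hypothetical and only mimic two typical effects, look-alike character swaps and tokens merged by a stray separator.

```python
import random

# Hypothetical confusion table: look-alike characters that OCR engines often mix up
OCR_CONFUSIONS = {"0": "O", "O": "0", "1": "l", "l": "1", "5": "S", "S": "5"}


def add_ocr_noise(text, rate=0.1, seed=None):
    """Return text with random look-alike swaps and merged tokens."""
    rng = random.Random(seed)  # seeded generator keeps the noise reproducible
    out = []
    for ch in text:
        if ch in OCR_CONFUSIONS and rng.random() < rate:
            out.append(OCR_CONFUSIONS[ch])  # swap a look-alike character
        elif ch == " " and rng.random() < rate:
            out.append("/")  # merge two tokens, as in "37/harbor"
        else:
            out.append(ch)
    return "".join(out)


print(add_ocr_noise("Sarah Martinez 37 Harbor Way, Coastal City CA 90210", rate=0.2, seed=0))
```

Noised strings like this can be passed to `parser.parse` to probe how robust the extraction is on your own data.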

In [None]:
# OCR-merged tokens: "37/harbor" instead of "37 Harbor"
# The model correctly identifies "harbor" as the street name
parser.parse("sarah martinez 37/harbor way coastal city, ca 90210")

In [None]:
# Middle initials and abbreviations
parser.parse("James R. Wilson, 742 Evergreen Ter, Springfield IL 62704")

In [None]:
# With email appended (common in scanned contact records)
parser.parse("Alex or Mary Doe, 1201 Braddock Ave, Richmond VA, 22312, contact@email.com")

## 3. Batch Processing

In [None]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

texts = [
    "Robert Chen 450 Maplewood Dr, Springfield IL 62704",
    "sarah martinez 37/harbor way coastal city, ca 90210",
    "Kim Y. Wong David R. Wong 7607 270th Street, New Hyde Park NY 11040",
    "James T. Parker Lisa R. Parker 90 Silver Lake Dr, Port Richmond NY 10301",
    "Mike Reynolds DBA Reynolds Consulting 2100 Clearwater Blvd, Tampa FL 33601",
]

results = parser.parse_batch(texts)
pd.DataFrame({"input": texts, "output": results})
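
For inputs much larger than the five records above, it may be preferable to feed the parser in fixed-size chunks so memory use stays bounded. The following is a sketch under assumptions: `parse_in_chunks` is a hypothetical helper, the chunk size is arbitrary, and it relies on the batch function returning one result per input in the same order. A stand-in function replaces `parser.parse_batch` so the example runs on its own.

```python
def parse_in_chunks(parse_batch, texts, chunk_size=64):
    """Call parse_batch on successive slices of texts and concatenate the results."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results = []
    for start in range(0, len(texts), chunk_size):
        results.extend(parse_batch(texts[start:start + chunk_size]))
    return results


def fake_parse_batch(batch):
    """Stand-in for parser.parse_batch; it only upper-cases each input."""
    return [text.upper() for text in batch]


# With the real parser this would be: parse_in_chunks(parser.parse_batch, texts)
print(parse_in_chunks(fake_parse_batch, ["a b", "c d", "e f"], chunk_size=2))
```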

## 4. Latency Benchmark

In [None]:
import time

text = "Alex or Mary Doe, 1201 Braddock Ave, Richmond VA, 22312, contact@email.com"

# Warmup
for _ in range(10):
    parser.parse(text)

# Measure
latencies = []
for _ in range(200):
    start = time.perf_counter()
    parser.parse(text)
    latencies.append((time.perf_counter() - start) * 1000)

latencies.sort()
# Nearest-rank percentiles over the 200 sorted samples: index = ceil(p / 100 * 200) - 1
print("Latency over 200 runs:")
print(f"  p50:  {latencies[99]:.1f} ms")
print(f"  p95:  {latencies[189]:.1f} ms")
print(f"  p99:  {latencies[197]:.1f} ms")
print(f"  max:  {latencies[-1]:.1f} ms")
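
The hard-coded indices in the benchmark are only valid for exactly 200 samples. If the run count changes, a small helper can derive them instead. This is a sketch using the nearest-rank definition of a percentile; `nearest_rank_percentile` is a hypothetical name, not part of the package.

```python
import math


def nearest_rank_percentile(sorted_values, p):
    """Return the p-th percentile (0 < p <= 100) of an ascending list, nearest-rank method."""
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(sorted_values) / 100)  # 1-based rank of the percentile
    return sorted_values[rank - 1]


# Demo on 200 synthetic values 1..200 so the selected rank is easy to read off
demo = list(range(1, 201))
for p in (50, 95, 99):
    print(f"p{p}: {nearest_rank_percentile(demo, p)}")
```

For 200 samples this selects the 100th, 190th, and 198th smallest values, which are the same positions as indices 99, 189, and 197 used in the benchmark cell.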

## 5. Edge Cases

In [None]:
# Empty input
print("Empty:", parser.parse(""))

# Whitespace only
print("Whitespace:", parser.parse("   "))

# All lowercase, no commas
print("No commas:", parser.parse("john smith 100 main street anytown ny 10001"))

# Business name pattern (DBA)
print("DBA:", parser.parse("mike reynolds dba reynolds consulting 2100 clearwater blvd tampa fl 33601"))
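
If blank records are frequent in your data, one option is to filter them out before they reach the model. The sketch below is an optional guard, not a description of how `NameAddressParser` treats blank input: `normalize_input` and `parse_or_none` are hypothetical helpers, and a stand-in function replaces `parser.parse` so the example runs on its own.

```python
def normalize_input(text):
    """Collapse runs of whitespace and strip the ends; whitespace-only input becomes ''."""
    return " ".join(text.split())


def parse_or_none(parse, text):
    """Return None for blank input, otherwise the result of parse on the normalized text."""
    cleaned = normalize_input(text)
    return parse(cleaned) if cleaned else None


def fake_parse(text):
    """Stand-in for parser.parse; it only echoes the text it receives."""
    return {"raw": text}


# With the real parser this would be: parse_or_none(parser.parse, record)
for record in ["", "   ", "john  smith 100 main street\tanytown ny 10001"]:
    print(repr(record), "->", parse_or_none(fake_parse, record))
```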