l8e-beam is a Python SDK that provides simple, powerful tools for preparing data for AI agent contexts.
Its primary features are robust PII (Personally Identifiable Information) protection through redaction and anonymization.
It is designed to be a flexible solution for developers who need to ensure data privacy before passing data to language models or other AI systems.
This package is not yet available on the Python Package Index (PyPI) and must be installed from the source code.
If you are actively developing the package, an "editable" install is recommended. This allows your changes to be reflected immediately without needing to reinstall.
- Clone the repository.
- Run the following command from the project's root directory:
pip install -e .
Alternatively, build the package into a wheel file and install that, which is closer to how a user would install it from PyPI.
First, ensure you have the build tool installed (pip install build). Then, run the build command from the project root:
python -m build
This will create a dist/ directory containing the installable wheel file (e.g., l8e_beam-0.6.0-py3-none-any.whl).
Install the package using the wheel file from the dist/ directory:
pip install dist/l8e_beam-*.whl
This library offers two powerful methods for handling PII, catering to different needs:
1. The Decorator (@redact_pii): The easiest way to protect data. Simply add it to your functions for automatic, "set-and-forget" PII protection on all inputs and outputs. Best for simple, comprehensive coverage.
2. The Direct API (sanitize_pii): A flexible function for advanced use cases. It gives you granular control to process specific data blobs, add custom rules, or disable default ones on the fly. Best for complex, dynamic, or non-decorator workflows.
The decorator is the quickest way to secure a function.
You can control the behavior by setting the action parameter.
Redaction replaces detected PII with a placeholder label.
from l8e_beam import redact_pii, PiiAction
@redact_pii(action=PiiAction.REDACT)
def process_incident_report(report: dict):
    return report
incident_report = {"details": "Client Susan Miller reported an outage in Berlin."}
redacted_report = process_incident_report(incident_report)
# {'details': 'Client [PERSON] reported an outage in [GPE].'}
Anonymization replaces detected PII with realistic-looking fake data.
from l8e_beam import redact_pii, PiiAction
@redact_pii(action=PiiAction.ANONYMIZE)
def create_test_user(profile: dict):
    return profile
user_profile = {"name": "John Doe", "email": "john.doe@example.com"}
anonymized_profile = create_test_user(user_profile)
# {'name': 'Mary Smith', 'email': 'robertholmes@example.org'}
For full control, use the sanitize_pii function directly. This is ideal when you need to process a piece of data dynamically or apply custom rules.
from l8e_beam import sanitize_pii, PiiAction
data = "Send the report to jane.doe@example.com"
processed_data = sanitize_pii(data, action=PiiAction.ANONYMIZE)
# 'Send the report to sarahjones@example.com'
You can easily disable default recognizers by passing a list of DEFAULT_RECOGNIZERS enums.
This is useful for preventing certain PII types from being processed.
from l8e_beam import sanitize_pii, PiiAction, DEFAULT_RECOGNIZERS
report = "Contact support at help@example.com about the issue with John Smith."
# Redact the person's name, but leave the email address untouched
processed_report = sanitize_pii(
    report,
    action=PiiAction.REDACT,
    disabled_recognizers=[DEFAULT_RECOGNIZERS.EMAIL]
)
# 'Contact support at help@example.com about the issue with [PERSON].'
The most powerful feature of the API is the ability to add your own recognizers on the fly.
import re
import uuid
from l8e_beam import sanitize_pii, RegexRecognizer, PiiAction
# 1. Define your custom recognizer class
class UuidRecognizer(RegexRecognizer):
    name = "UUID"
    regex = re.compile(r"[a-f0-9]{8}-([a-f0-9]{4}-){3}[a-f0-9]{12}", re.I)
# 2. Create an instance of it
my_uuid_recognizer = UuidRecognizer()
log_entry = f"Request failed for user_id: {uuid.uuid4()}"
# 3. Pass it to the API
processed_log = sanitize_pii(
    log_entry,
    action=PiiAction.REDACT,
    custom_recognizers=[my_uuid_recognizer]
)
# 'Request failed for user_id: [UUID]'
The system uses a hybrid approach of machine learning models and regular expressions.
You can disable any of the following recognizers using the disabled_recognizers parameter and the DEFAULT_RECOGNIZERS enum:
- EMAIL, PHONE, CREDIT_CARD (from Regex)
- PERSON, ORG, GPE, LOC, DATE (from spaCy NER)
For certain PII types like credit cards, a validation step (e.g., the Luhn algorithm) is performed after detection. This reduces false positives and increases accuracy.
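As an illustration of such a validation step, here is a minimal sketch of the Luhn checksum in plain Python. The function name luhn_valid is hypothetical, and this is not necessarily how l8e-beam implements its check. It shows why a 16-digit string that merely looks like a card number can be rejected.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the decimal digits in `number` pass the Luhn checksum."""
    # Keep digits only, so spaces and dashes in the candidate are ignored.
    digits = [int(ch) for ch in number if ch.isdecimal()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```

A detector that matches any 16-digit run would flag both strings; running the checksum afterwards keeps only the first, which is how validation cuts false positives.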
For NER-based detection, you can choose a model that best fits your use case.
| Model | ModelType Member | Accuracy | Size | Speed | Best For |
|---|---|---|---|---|---|
| Small | ModelType.SM | ~0.86 | ~12 MB | Fast | General-purpose tasks |
| Transformer | ModelType.TRF | >0.90 | ~400 MB | Slow | High-accuracy, complex text |
# The model can be specified in both the decorator and the direct API
from l8e_beam import sanitize_pii, ModelType
text = "Client Susan Miller reported an outage in Berlin."
processed_text = sanitize_pii(text, model=ModelType.TRF)
The l8e-beam processor can recursively traverse and sanitize nested data structures.
- Primitives: str
- Collections: dict, list, tuple
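The recursive traversal over these types can be pictured roughly as follows. This is a sketch under stated assumptions, not the library's code: sanitize_nested and scrub_text are hypothetical names, the email regex stands in for real PII detection, and dictionary keys are left untouched here (the source does not say whether l8e-beam processes keys).

```python
import re
from typing import Any

# Hypothetical stand-in detector: matches only simple email addresses.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def scrub_text(text: str) -> str:
    """Replace every email address in text with a placeholder label."""
    return _EMAIL.sub("[EMAIL]", text)


def sanitize_nested(value: Any) -> Any:
    """Recursively sanitize str, dict, list, and tuple values."""
    if isinstance(value, str):
        return scrub_text(value)
    if isinstance(value, dict):
        return {key: sanitize_nested(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize_nested(item) for item in value]
    if isinstance(value, tuple):
        return tuple(sanitize_nested(item) for item in value)
    # Anything else (int, None, ...) is returned unchanged.
    return value


payload = {
    "ticket": 42,
    "contacts": ["ops@example.com", ("backup", "oncall@example.org")],
}
cleaned = sanitize_nested(payload)
print(cleaned)
# {'ticket': 42, 'contacts': ['[EMAIL]', ('backup', '[EMAIL]')]}
```

The sketch builds new containers instead of mutating the input, so the caller's original data is left as it was.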
The processor will automatically handle any object that has a .dict() method, such as a Pydantic model. It converts the object to a dictionary, sanitizes its values, and then reconstructs the original object, preserving its type.
from pydantic import BaseModel
from l8e_beam import redact_pii, PiiAction
class UserContext(BaseModel):
    user_id: str
    full_name: str
@redact_pii(action=PiiAction.REDACT)
def process_context(context: UserContext):
    # The 'context' object is sanitized before this code runs,
    # but it remains a UserContext instance.
    return context
user_data = UserContext(user_id="abc-123", full_name="Jane Doe")
processed_context = process_context(user_data)
# The returned object is still a UserContext, with sanitized data:
# UserContext(user_id='abc-123', full_name='[PERSON]')
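The convert, sanitize, and reconstruct cycle described above might look roughly like this. The sketch avoids a Pydantic dependency by using a small hypothetical Profile class with a .dict() method, and fake_scrub is a hard-coded stand-in for real detection. It also assumes the constructor accepts the field names as keyword arguments, as Pydantic models do.

```python
class Profile:
    """Hypothetical stand-in for a model object that exposes .dict()."""

    def __init__(self, user_id: str, full_name: str):
        self.user_id = user_id
        self.full_name = full_name

    def dict(self) -> dict:
        return {"user_id": self.user_id, "full_name": self.full_name}


def fake_scrub(text: str) -> str:
    # Stand-in for real PII detection: redacts one hard-coded name.
    return text.replace("Jane Doe", "[PERSON]")


def sanitize_object(obj):
    """Convert obj to a dict, scrub string values, rebuild the same type."""
    fields = obj.dict()
    cleaned = {
        key: fake_scrub(val) if isinstance(val, str) else val
        for key, val in fields.items()
    }
    # Assumes the constructor takes the field names as keyword arguments.
    return type(obj)(**cleaned)


result = sanitize_object(Profile(user_id="abc-123", full_name="Jane Doe"))
print(type(result).__name__, result.dict())
# Profile {'user_id': 'abc-123', 'full_name': '[PERSON]'}
```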
- Clone the repository.
- Install dependencies:
pip install -r requirements.txt
- Run tests:
pytest
- PII Redaction & Anonymization
- Hybrid Detection (NER + Regex) with Validation
- Decorator & Advanced API Interfaces
- Audit Logging: Add a @log_audit decorator.
- Granular Control: Allow specifying which PII types to enable, not just disable.
Contributions are welcome! To set up a development environment, use the build.sh script:
chmod +x build.sh
./build.sh
MIT License © 2025 l8e