# Leksara End-to-End Playbook
This notebook weds the quickstart walkthrough with a deeper feature tour. Run it top-to-bottom to explore ingestion dashboards, cleaning primitives, PII redaction, review normalisation, custom pipelines, presets, benchmarking, runtime tweaks, and logging hooks.

## Quick Start: Install Leksara
Install the package from PyPI before running any examples:

```bash
pip install leksara
```

Tip: activate a virtual environment to keep project dependencies isolated, for example `python -m venv .venv` followed by `.\.venv\Scripts\activate` on Windows or `source .venv/bin/activate` on macOS/Linux.

## How to Use This Playbook
1. Ensure Leksara and its optional dependencies (`regex`, `emoji`, `Sastrawi`, `pandas`) are installed in your environment.
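2. Run the Environment Setup cell first, then execute the remaining cells in order. Most scenarios are self-contained and can be rerun individually, but Section 8 reuses the `reviews` series defined in Section 3.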
3. Treat the quickstart portion as a template for pipelines, and the deeper sections as validation references you can adapt to your own datasets.

## Environment Setup
Run the next cell once to load every helper used across the playbook—`pandas` for tabular demos plus the Leksara presets, frames, and logging utilities.

In [1]:
# Core imports for every section
import json
from pathlib import Path

import pandas as pd

from leksara import leksara, ReviewChain, get_preset
from leksara.frames.cartboard import CartBoard, get_flags, get_stats, noise_detect
from leksara.function import (
    remove_tags,
    case_normal,
    remove_stopwords,
    remove_whitespace,
    remove_punctuation,
    remove_digits,
    remove_emoji,
    replace_url,
    replace_rating,
    shorten_elongation,
    replace_acronym,
    normalize_slangs,
    expand_contraction,
    word_normalization,
)
from leksara.pattern import (
    replace_phone,
    replace_address,
    replace_email,
    replace_id,
)
from leksara.core.logging import setup_logging, log_pipeline_step

## 1. CartBoard Dashboards
Spot anomalies before cleaning by profiling raw reviews—PII hot spots, rating noise, and general text hygiene—using the CartBoard utilities.

### 1.1 Sample Review Batch
Creates a tiny multi-channel dataframe and passes it to the CartBoard flag, stat, and noise scanners. The outputs highlight which rows need further redaction or normalisation.

In [2]:
raw_reviews = pd.DataFrame(
    {
        "review_id": [101, 102, 103],
        "channel": ["Tokopedia", "Shopee", "WhatsApp"],
        "text": [
            "Barang mantul!!! Email: user@example.com ⭐⭐⭐⭐⭐",
            "Pengiriman lambat :( Hubungi 0812-3456-7890 segera",
            "Halo admin, alamat saya Jl. Melati No. 8 RT 02 RW 04, Bandung",
        ],
    }
)

cartboard_flags = get_flags(raw_reviews, text_column="text")
cartboard_stats = get_stats(raw_reviews, text_column="text")
cartboard_noise = noise_detect(raw_reviews, text_column="text", include_normalized=False)

display(cartboard_flags[["review_id", "pii_flag", "rating_flag", "non_alphabetical_flag"]])
display(cartboard_stats[["review_id", "stats"]])
display(cartboard_noise[["review_id", "detect_noise"]])

single_card = CartBoard(raw_text=raw_reviews.loc[0, "text"], rating=5)
single_card.to_dict()

Unnamed: 0,review_id,pii_flag,rating_flag,non_alphabetical_flag
0,101,True,True,True
1,102,True,False,False
2,103,True,False,False


Unnamed: 0,review_id,stats
0,101,"{'length': 46, 'word_count': 6, 'stopwords': 0..."
1,102,"{'length': 50, 'word_count': 4, 'stopwords': 1..."
2,103,"{'length': 61, 'word_count': 10, 'stopwords': ..."


Unnamed: 0,review_id,detect_noise
0,101,"{'urls': [], 'html_tags': [], 'emails': ['user..."
1,102,"{'urls': [], 'html_tags': [], 'emails': [], 'p..."
2,103,"{'urls': [], 'html_tags': [], 'emails': [], 'p..."


{'original_text': 'Barang mantul!!! Email: user@example.com ⭐⭐⭐⭐⭐',
 'rating': 5,
 'pii_flag': True,
 'non_alphabetical_flag': True}

## 2. ReviewChain Pipelines
Walk through increasingly opinionated pipelines—from a handcrafted chain to preset shortcuts—so you can decide how much control you need.

### 2.1 Sample Chat Snippet
Sets up the reference dataframe used by every pipeline variant below so you can compare how each transformation behaves on identical inputs.

In [3]:
chat_df = pd.DataFrame({
    "chat_id": [1, 2],
    "chat_message": [
        "Halo! Nomor saya 0812-3456-7890. Email: x@y.com, Alamat: Jakarta",
        "Hubungi +6281234567890 ya — EMAIL saya: test@mail.co.id! Alamat saya di Bandung",
    ],
})
chat_df

Unnamed: 0,chat_id,chat_message
0,1,Halo! Nomor saya 0812-3456-7890. Email: x@y.co...
1,2,Hubungi +6281234567890 ya — EMAIL saya: test@m...


### 2.2 Custom Chat Sanitiser Pipeline
Explicitly strings together PII masking patterns followed by lightweight formatting functions. Useful when you must dictate exact ordering or parameter overrides.

In [4]:
custom_pipeline = {
    "patterns": [
        (replace_phone, {"mode": "replace"}),
        (replace_email, {"mode": "replace"}),
        (replace_address, {"mode": "replace"}),
        (replace_id, {"mode": "replace"}),
    ],
    "functions": [
        case_normal,
        remove_punctuation,
        remove_whitespace,
    ],
}

custom_chat_df = chat_df.copy()
custom_chat_df["safe_message"] = leksara(custom_chat_df["chat_message"], pipeline=custom_pipeline)
custom_chat_df[["chat_id", "safe_message"]]

Unnamed: 0,chat_id,safe_message
0,1,halo nomor saya [PHONE_NUMBER] email [EMAIL] a...
1,2,hubungi [PHONE_NUMBER] ya email saya [EMAIL] a...
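
To make the pipeline shape concrete, here is a minimal sketch of how a dict with `patterns` and `functions` keys could be applied to a single string. It is an illustration only, not Leksara's internals: `apply_pipeline`, `mask_digits`, and `lower` are hypothetical stand-ins, and running patterns before functions is an assumption based on the step order printed in section 2.7.

```python
import re
from typing import Any, Callable, Dict, List, Tuple, Union

Step = Union[Callable[..., str], Tuple[Callable[..., str], Dict[str, Any]]]


def apply_pipeline(text: str, pipeline: Dict[str, List[Step]]) -> str:
    """Run every pattern step, then every function step, in list order."""
    for stage in ("patterns", "functions"):
        for step in pipeline.get(stage, []):
            # A step is either a bare callable or a (callable, kwargs) tuple.
            func, kwargs = step if isinstance(step, tuple) else (step, {})
            text = func(text, **kwargs)
    return text


# Hypothetical stand-ins for real cleaning helpers.
def mask_digits(text: str, token: str = "[NUM]") -> str:
    return re.sub(r"\d+", token, text)


def lower(text: str) -> str:
    return text.lower()


demo_pipeline = {
    "patterns": [(mask_digits, {"token": "[NUM]"})],
    "functions": [lower, str.strip],
}
print(apply_pipeline("  Order 123 SIAP  ", demo_pipeline))
```

Note that in this naive version the lowercase step also lowercases the `[NUM]` placeholder, which is one reason step ordering matters.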


### 2.3 Ecommerce Preset Shortcut
Switches to the out-of-the-box `ecommerce_review` preset. Compare the outputs with the custom chain to spot differences: here the preset also drops stopwords (`saya`) and stems words (`hubungi` becomes `hubung`).

In [5]:
preset_chat_df = chat_df.copy()
preset_chat_df["preset_clean"] = leksara(preset_chat_df["chat_message"], preset="ecommerce_review")
preset_chat_df[["chat_id", "preset_clean"]]

Unnamed: 0,chat_id,preset_clean
0,1,halo nomor [PHONE_NUMBER] email [EMAIL] alamat...
1,2,hubung [PHONE_NUMBER] ya email [EMAIL] alamat ...


### 2.4 Default Pipeline
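Calls `leksara` with only the input column (no `pipeline` or `preset` argument) to show the baseline cleaning stack you get for free. Use this as a smoke test before customising anything. In the output below, the default stack strips punctuation but does not mask PII: the phone digits and email text remain.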

In [6]:
default_chat_df = chat_df.copy()
default_chat_df["default_clean"] = leksara(default_chat_df["chat_message"])
default_chat_df[["chat_id", "default_clean"]]

Unnamed: 0,chat_id,default_clean
0,1,halo nomor saya 081234567890 email xycom alama...
1,2,hubungi 6281234567890 ya email saya testmailco...


### 2.5 Word Normalisation Dialled In
Demonstrates injecting a single function with custom arguments to fine-tune vocabulary without rebuilding the preset. Here `word_normalization` stems each word but, as the output shows, leaves the `word_list` entry (`Bagus`) untouched when `mode="keep"`.

In [7]:
normalisation_data = ["Produk Bagus sekali", "Saya membeli peralatan rumah tangga"]
normalisation_pipeline = {
    "patterns": [],
    "functions": [
        (word_normalization, {"word_list": ["Bagus"], "mode": "keep"}),
    ],
}
leksara(normalisation_data, pipeline=normalisation_pipeline)

['produk Bagus sekali', 'saya beli alat rumah tangga']

### 2.6 Minimal Case-Stopwords Pipeline
Illustrates a sentiment-friendly pipeline that keeps only text normalisation helpers—great for quick experiments where PII is already handled upstream.

In [8]:
feedback_df = pd.DataFrame({
    "chat_id": [1, 2, 3],
    "chat_message": [
        "Saya sangat suka produk ini, dan saya akan beli lagi!",
        "Produk ini bagus sekali untuk dipakai di rumah.",
        "Namun, harga-nya agak mahal ya...",
    ],
})

text_only_pipeline = {
    "functions": [
        case_normal,
        remove_stopwords,
        remove_punctuation,
        remove_whitespace,
    ],
}

feedback_df["cleaned"] = leksara(feedback_df["chat_message"], pipeline=text_only_pipeline)

print("=== Data Asli ===")
display(feedback_df[["chat_id", "chat_message"]])
print("\n=== Setelah Cleaning ===")
display(feedback_df[["chat_id", "cleaned"]])

=== Data Asli ===


Unnamed: 0,chat_id,chat_message
0,1,"Saya sangat suka produk ini, dan saya akan bel..."
1,2,Produk ini bagus sekali untuk dipakai di rumah.
2,3,"Namun, harga-nya agak mahal ya..."



=== Setelah Cleaning ===


Unnamed: 0,chat_id,cleaned
0,1,suka produk beli
1,2,produk bagus dipakai rumah
2,3,harganya mahal ya


### 2.7 Inspect Preset via ReviewChain
Rebuilds the preset through `ReviewChain.from_steps`, times every stage, and prints the assembled steps so you can audit or extend them.

In [9]:
preset_steps = get_preset("ecommerce_review")
review_chain = ReviewChain.from_steps(
    patterns=preset_steps["patterns"],
    functions=preset_steps["functions"],
)

probe_data = [
    "Produk baru saya: iphone12, harga 12 juta. Hubungi 0812-3456-7890.",
    "Email saya: test@example.com. Produk sangat berkualitas!",
]

timed_output, stage_metrics = review_chain.transform(probe_data, benchmark=True)

for original, cleaned in zip(probe_data, timed_output):
    print(f"Original: {original}")
    print(f"Cleaned : {cleaned}\n")

display(stage_metrics)
display(timed_output)

for idx, (name, step) in enumerate(review_chain.named_steps.items(), start=1):
    print(f"{idx}. {step}")

Original: Produk baru saya: iphone12, harga 12 juta. Hubungi 0812-3456-7890.
Cleaned : produk iphone12 harga 12 juta hubung [PHONE_NUMBER]

Original: Email saya: test@example.com. Produk sangat berkualitas!
Cleaned : email [EMAIL] produk kualitas



{'n_steps': 20,
 'total_time_sec': 0.09763129999191733,
 'per_step': [('word_normalization', 0.09705319999920903),
  ('wrapped', 0.00031820000731386244),
  ('remove_stopwords', 4.060000355821103e-05),
  ('expand_contraction', 3.5899996873922646e-05),
  ('unmask_whitelist', 3.210000431863591e-05),
  ('mask_whitelist', 3.029999788850546e-05),
  ('replace_phone', 2.680000034160912e-05),
  ('remove_punctuation', 1.799999881768599e-05),
  ('replace_id', 1.7799997294787318e-05),
  ('replace_address', 1.5399993571918458e-05),
  ('shorten_elongation', 1.5199999324977398e-05),
  ('replace_email', 8.39999847812578e-06),
  ('remove_whitespace', 8.100003469735384e-06),
  ('remove_tags', 6.799993570894003e-06),
  ('case_normal', 2.7999994927085936e-06),
  ('unmask_rating_tokens', 1.6999983927235007e-06)]}

['produk iphone12 harga 12 juta hubung [PHONE_NUMBER]',
 'email [EMAIL] produk kualitas']

1. replace_id
2. replace_phone
3. replace_email
4. replace_address
5. mask_whitelist
6. remove_tags
7. case_normal
8. wrapped
9. wrapped
10. wrapped
11. expand_contraction
12. wrapped
13. wrapped
14. word_normalization
15. remove_stopwords
16. shorten_elongation
17. remove_punctuation
18. remove_whitespace
19. unmask_whitelist
20. unmask_rating_tokens
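
The metrics dictionary above has a simple shape (`n_steps`, `total_time_sec`, and `per_step` sorted slowest first). As a rough sketch of how such per-step timings could be collected with the standard library, assuming each step is a plain `str -> str` callable; `timed_transform` is a hypothetical helper, not part of Leksara:

```python
import time
from typing import Callable, Dict, List, Tuple


def timed_transform(
    texts: List[str], steps: List[Tuple[str, Callable[[str], str]]]
) -> Tuple[List[str], Dict[str, object]]:
    """Apply each named step to every text and record wall-clock time per step."""
    per_step: List[Tuple[str, float]] = []
    current = list(texts)
    for name, func in steps:
        start = time.perf_counter()
        current = [func(text) for text in current]
        per_step.append((name, time.perf_counter() - start))
    metrics = {
        "n_steps": len(steps),
        "total_time_sec": sum(elapsed for _, elapsed in per_step),
        # Slowest step first, mirroring the ordering of the output above.
        "per_step": sorted(per_step, key=lambda item: item[1], reverse=True),
    }
    return current, metrics


cleaned, metrics = timed_transform(
    ["  Produk BAGUS  "],
    [("strip", str.strip), ("lower", str.lower)],
)
print(cleaned, metrics["n_steps"])
```

Timing each step over the whole batch, rather than per row, keeps the measurement overhead small relative to the work being timed.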


## 3. Benchmark Pipelines
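Run preset and custom chains side by side to compare outputs and timing metrics before you standardise on one approach. In the comparison below, the custom chain's `remove_digits` step strips the rating value (leaving `rating .`), and without `replace_address` it leaves the street address unmasked.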

In [10]:
reviews = pd.Series([
    "Email saya customer@mart.id, rating 5/5, kurir ramah.",
    "Alamat pengiriman: Jl. Durian No. 3 RT 05 RW 07, Bandung.",
])

preset_results, preset_metrics = leksara(reviews, preset="ecommerce_review", benchmark=True)

custom_benchmark_pipeline = {
    "patterns": [
        (replace_phone, {"mode": "replace"}),
        (replace_email, {"mode": "replace"}),
    ],
    "functions": [
        case_normal,
        replace_rating,
        remove_digits,
        remove_stopwords,
        remove_punctuation,
        remove_whitespace,
    ],
}

benchmark_chain = ReviewChain.from_steps(**custom_benchmark_pipeline)
chain_results, chain_metrics = benchmark_chain.transform(reviews, benchmark=True)

display(pd.DataFrame({
    "preset_output": preset_results,
    "custom_output": chain_results,
}))

print("Preset timings:", preset_metrics)
print("Custom timings:", chain_metrics)

Unnamed: 0,preset_output,custom_output
0,email [EMAIL] rating 5.0 kurir ramah,email [EMAIL] rating . kurir ramah
1,alamat kirim [ADDRESS],alamat pengiriman jl durian no rt rw bandung


Preset timings: {'n_steps': 20, 'total_time_sec': 0.11500969997723587, 'per_step': [('word_normalization', 0.11429829999542562), ('replace_rating', 0.00017859999934444204), ('replace_acronym', 0.00010949999705189839), ('replace_address', 7.94000006862916e-05), ('normalize_slangs', 6.420000136131421e-05), ('remove_stopwords', 4.159999662078917e-05), ('unmask_whitelist', 3.579999611247331e-05), ('expand_contraction', 3.5299992305226624e-05), ('mask_whitelist', 3.290000313427299e-05), ('remove_emoji', 2.6200003048870713e-05), ('remove_punctuation', 2.040000254055485e-05), ('replace_id', 1.6999998479150236e-05), ('replace_url', 1.5199999324977398e-05), ('shorten_elongation', 1.3899996702093631e-05), ('replace_phone', 1.1900003300979733e-05), ('replace_email', 9.099996532313526e-06), ('remove_whitespace', 7.200003892648965e-06), ('remove_tags', 6.599992047995329e-06), ('unmask_rating_tokens', 3.5999983083456755e-06), ('case_normal', 3.0000010156072676e-06)]}
Custom timings: {'n_steps': 11, 

## 4. Logging Hooks
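Enable file-and-console logging, then emit per-step messages so observability stays intact when pipelines run in production. The log below also exposes an ordering effect: because `case_normal` and `remove_punctuation` run after masking in this hand-rolled loop, the `[PHONE_NUMBER]` token degrades to `phonenumber`.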

In [11]:
setup_logging("pipeline_demo.log")

log_sample = "Hubungi saya di 0812-3456-7890 ya! Produk ini BAGUS banget???"
pipeline_steps = [
    ("replace_phone", lambda text: replace_phone(text, mode="replace")),
    ("case_normal", case_normal),
    ("remove_punctuation", remove_punctuation),
]

current_text = log_sample
for name, step in pipeline_steps:
    next_text = step(current_text)
    log_pipeline_step(name, current_text, next_text)
    current_text = next_text

print("Final output:", current_text)

2025-10-25 12:03:16,270 - INFO - replace_phone: Input: Hubungi saya di 0812-3456-7890 ya! Produk ini BAGUS banget??? | Output: Hubungi saya di [PHONE_NUMBER] ya! Produk ini BAGUS banget???
2025-10-25 12:03:16,270 - INFO - case_normal: Input: Hubungi saya di [PHONE_NUMBER] ya! Produk ini BAGUS banget??? | Output: hubungi saya di [phone_number] ya! produk ini bagus banget???
2025-10-25 12:03:16,270 - INFO - remove_punctuation: Input: hubungi saya di [phone_number] ya! produk ini bagus banget??? | Output: hubungi saya di phonenumber ya produk ini bagus banget


Final output: hubungi saya di phonenumber ya produk ini bagus banget
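
The final output shows `[PHONE_NUMBER]` turning into `phonenumber` once lowercasing and punctuation removal run after masking. The preset step list in section 2.7 includes `mask_whitelist` and `unmask_whitelist`, but their implementation is not shown in this playbook. The following is only a sketch of one way to protect placeholders in a hand-rolled loop, assuming tokens have the form `[UPPER_CASE]`; `protect_tokens` and `restore_tokens` are hypothetical names.

```python
import re
import string
from typing import Dict, Tuple

TOKEN_RE = re.compile(r"\[[A-Z_]+\]")


def protect_tokens(text: str) -> Tuple[str, Dict[str, str]]:
    """Swap each [TOKEN] for a lowercase, punctuation-free key."""
    mapping: Dict[str, str] = {}

    def _swap(match):
        key = f"zzmask{len(mapping)}zz"
        mapping[key] = match.group(0)
        return key

    return TOKEN_RE.sub(_swap, text), mapping


def restore_tokens(text: str, mapping: Dict[str, str]) -> str:
    """Put the original placeholders back after cleaning."""
    for key, token in mapping.items():
        text = text.replace(key, token)
    return text


masked = "Hubungi saya di [PHONE_NUMBER] ya! Produk ini BAGUS banget???"
safe, mapping = protect_tokens(masked)
# Stand-ins for case_normal and remove_punctuation.
safe = safe.lower().translate(str.maketrans("", "", string.punctuation))
print(restore_tokens(safe, mapping))
```

The keys are already lowercase and contain no punctuation, so they pass through both cleaning steps unchanged.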


## 5. Basic Cleaning Primitives
Each helper below prints the original string alongside the cleaned version so you can quickly verify behaviour before composing them into a pipeline.

### remove_tags
Strips basic HTML markup without disturbing the surrounding text—perfect for templated channels that wrap copy in `<div>` elements.

In [12]:
tags_text = "<div>Halo <strong>GAN</strong>!!!</div>"
print("Original:", tags_text)
print("remove_tags:", remove_tags(tags_text))

Original: <div>Halo <strong>GAN</strong>!!!</div>
remove_tags: Halo GAN!!!


### case_normal
Lowercases strings while preserving accents, giving consistent downstream tokens for stopword or sentiment models.

In [13]:
case_text = "Produk INI BAGUS BANGET!!!"
print("Original:", case_text)
print("case_normal:", case_normal(case_text))

Original: Produk INI BAGUS BANGET!!!
case_normal: produk ini bagus banget!!!


### replace_url
Normalises or masks URLs so review content focuses on the message rather than outbound links.

In [14]:
url_text = "Detail promo di https://shop.id dan http://toko.co.id."
print("Original:", url_text)
print("replace_url:", replace_url(url_text, mode="replace"))

Original: Detail promo di https://shop.id dan http://toko.co.id.
replace_url: Detail promo di [URL] dan [URL]


### remove_emoji
Useful when downstream models expect plain text. The helper can drop emoji or, with `mode="replace"` as in the cell below, swap each one for a descriptive Indonesian phrase.

In [15]:
emoji_text = "Pelayanan mantap 😍🔥👍"
print("Original:", emoji_text)
print("remove_emoji:", remove_emoji(emoji_text, mode="replace"))

Original: Pelayanan mantap 😍🔥👍
remove_emoji: Pelayanan mantap  suka banget  keren / hebat / mantap  bagus 


### remove_stopwords
Drops common Indonesian function words to tighten the signal before vectorisation.

In [16]:
stopwords_text = "Produk ini dan itu sangat dan benar-benar bagus"
print("Original:", stopwords_text)
print("remove_stopwords:", remove_stopwords(stopwords_text))

Original: Produk ini dan itu sangat dan benar-benar bagus
remove_stopwords: Produk      - bagus
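
The output above keeps the gaps and the stray hyphen left behind by the removed words, which is why the pipelines in this playbook follow `remove_stopwords` with `remove_punctuation` and `remove_whitespace`. For comparison, here is a token-based sketch that avoids the gaps in one pass. The stopword set is a tiny hypothetical sample, not Leksara's resource, and `drop_stopwords` is an illustrative name.

```python
import re

# Hypothetical mini stopword list; Leksara ships its own resource.
STOPWORDS = {"ini", "dan", "itu", "sangat", "benar"}


def drop_stopwords(text: str) -> str:
    """Remove stopwords token by token, then rejoin with single spaces."""
    tokens = re.findall(r"\w+", text)  # also discards punctuation
    kept = [tok for tok in tokens if tok.lower() not in STOPWORDS]
    return " ".join(kept)


print(drop_stopwords("Produk ini dan itu sangat dan benar-benar bagus"))
```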


### remove_punctuation
Strips punctuation marks, including repeated runs such as `!!!`, while leaving alphanumeric characters untouched to simplify tokenisation.

In [17]:
punct_text = "Halo!!! Bagus, kan???"
print("Original:", punct_text)
print("remove_punctuation:", remove_punctuation(punct_text))

Original: Halo!!! Bagus, kan???
remove_punctuation: Halo Bagus kan


### remove_digits
Eliminates numeric artefacts like invoice numbers when analytics focuses on textual sentiment.

In [18]:
digits_text = "Order number 12345 akan dikirim 67890"
print("Original:", digits_text)
print("remove_digits:", remove_digits(digits_text))

Original: Order number 12345 akan dikirim 67890
remove_digits: Order number  akan dikirim 


### remove_whitespace
Condenses stray tabs, newlines, and double spaces to ensure consistent token boundaries.

In [19]:
whitespace_text = "  produk   ini\n\tbagus   banget  "
print("Original:", repr(whitespace_text))
print("remove_whitespace:", remove_whitespace(whitespace_text))

Original: '  produk   ini\n\tbagus   banget  '
remove_whitespace: produk ini bagus banget


## 6. Pattern-Based PII Masking
Validate each PII redaction helper in isolation.

### replace_id
Masks Indonesian ID numbers (NIK/KTP) using the pattern pack to keep compliance teams happy.

In [20]:
id_text = "NIK pelanggan: 3276120705010003 akan diverifikasi."
print("Original:", id_text)
print("replace_id:", replace_id(id_text, mode="replace"))

Original: NIK pelanggan: 3276120705010003 akan diverifikasi.
replace_id: NIK pelanggan: [NIK] akan diverifikasi.
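
A NIK is a 16-digit number, so the core of this kind of masking can be sketched with a single regular expression. This is an assumption-laden illustration rather than Leksara's pattern pack: `mask_nik` is a hypothetical helper, and it does not validate the region or birth-date fields encoded in the number.

```python
import re

# Assumption: exactly 16 consecutive digits, not embedded in a longer number.
NIK_RE = re.compile(r"(?<!\d)\d{16}(?!\d)")


def mask_nik(text: str, token: str = "[NIK]") -> str:
    """Replace every standalone 16-digit run with the given token."""
    return NIK_RE.sub(token, text)


print(mask_nik("NIK pelanggan: 3276120705010003 akan diverifikasi."))
```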


### replace_phone
Redacts phone numbers while keeping surrounding context intact.

In [21]:
phone_text = "Hubungi saya di 0812-3456-7890 untuk info lebih lanjut."
print("Original:", phone_text)
print("replace_phone:", replace_phone(phone_text, mode="replace"))

Original: Hubungi saya di 0812-3456-7890 untuk info lebih lanjut.
replace_phone: Hubungi saya di [PHONE_NUMBER] untuk info lebih lanjut.
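
To show the general idea behind phone redaction, the sketch below matches Indonesian mobile numbers written with a `0`, `62`, or `+62` prefix and optional dashes or spaces. The pattern is a simplified assumption for illustration and will miss formats that the real helper may cover (landlines, parentheses, dots); `mask_phone` is a hypothetical name.

```python
import re

# Assumption: mobile numbers start with 0/62/+62 followed by 8, then 2-3 digit groups.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?62|0)8\d{1,2}[-\s]?\d{3,4}[-\s]?\d{3,5}(?!\d)"
)


def mask_phone(text: str, token: str = "[PHONE_NUMBER]") -> str:
    """Replace every matched phone number with the given token."""
    return PHONE_RE.sub(token, text)


print(mask_phone("Hubungi saya di 0812-3456-7890 untuk info lebih lanjut."))
print(mask_phone("Hubungi +6281234567890 ya"))
```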


### replace_email
Finds email addresses via a regex bundle and replaces them with a neutral token.

In [22]:
email_text = "Silakan kirim pertanyaan ke support@contoh.id."
print("Original:", email_text)
print("replace_email:", replace_email(email_text, mode="replace"))

Original: Silakan kirim pertanyaan ke support@contoh.id.
replace_email: Silakan kirim pertanyaan ke [EMAIL].


### replace_address
Uses a multi-line ruleset to scrub street-level details from free-form addresses.

In [23]:
address_text = "Alamat lengkap: Jl. Melati No. 8 RT 02 RW 04, Bandung."
print("Original:", address_text)
print("replace_address:", replace_address(address_text, mode="replace"))

Original: Alamat lengkap: Jl. Melati No. 8 RT 02 RW 04, Bandung.
replace_address: Alamat lengkap: [ADDRESS].


## 7. Advanced Normalisation Helpers
Go beyond basic cleaning with rating extraction, acronym expansion, and word-level tweaks.

### replace_rating
Extracts numeric review scores, normalises decimal separators, and rewrites them as plain decimal values (in the sample below, `4,5/5` becomes `4.5`).

In [24]:
rating_text = "Film ini dapet rating 4,5/5 di review pelanggan."
print("Original:", rating_text)
print("replace_rating:", replace_rating(rating_text))

Original: Film ini dapet rating 4,5/5 di review pelanggan.
replace_rating: Film ini dapet rating 4.5 di review pelanggan.
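
The transformation above can be approximated with one regular expression. The sketch assumes ratings are written as `<score>/<scale>` with either `,` or `.` as the decimal mark, and it simply drops the scale. It is not Leksara's implementation, `normalise_rating` is a hypothetical name, and a pattern this loose would also rewrite date-like strings such as `12/05`.

```python
import re

# Assumption: "<score>/<scale>", decimal mark may be "," or ".".
RATING_RE = re.compile(r"(?<![\d.,])(\d+(?:[.,]\d+)?)\s*/\s*(\d+)(?!\d)")


def normalise_rating(text: str) -> str:
    """Rewrite 'score/scale' as a float string with '.' as decimal mark."""

    def _to_float(match):
        score = float(match.group(1).replace(",", "."))
        return str(score)

    return RATING_RE.sub(_to_float, text)


print(normalise_rating("Film ini dapet rating 4,5/5 di review pelanggan."))
```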


### shorten_elongation
Caps runs of a repeated character at `max_repeat`, so with `max_repeat=2` the word "kereeeen" becomes "kereen".

In [25]:
elongation_text = "Woooooow kereeeen bangetttt!!!"
print("Original:", elongation_text)
print("shorten_elongation:", shorten_elongation(elongation_text, max_repeat=2))

Original: Woooooow kereeeen bangetttt!!!
shorten_elongation: Woow kereen bangett!!
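
Capping repeated characters is a classic backreference exercise. The sketch below reproduces the sample output with the standard `re` module. It is an independent illustration (the hypothetical `cap_repeats` is not Leksara's function) and assumes `max_repeat` is at least 1.

```python
import re


def cap_repeats(text: str, max_repeat: int = 2) -> str:
    """Shorten any run of one character longer than max_repeat."""
    # (.)\1{n,} matches a character followed by n or more copies of itself.
    pattern = re.compile(r"(.)\1{" + str(max_repeat) + r",}")
    return pattern.sub(lambda m: m.group(1) * max_repeat, text)


print(cap_repeats("Woooooow kereeeen bangetttt!!!", max_repeat=2))
```

Because the pattern applies to any character, punctuation runs such as `!!!` are capped too, which matches the sample output above.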


### normalize_slangs
Maps colloquial Indonesian chat shorthand onto dictionary terms. In the sample below only `sokap` is rewritten; `Gw`, `lg`, and `brb` pass through unchanged.

In [26]:
slang_text = "Gw lg brb ya, sokap mau join call?"
print("Original:", slang_text)
print("normalize_slangs:", normalize_slangs(slang_text, mode="replace"))

Original: Gw lg brb ya, sokap mau join call?
normalize_slangs: Gw lg brb ya, tingkah mau join call?


### replace_acronym
Expands common abbreviations and unit acronyms (e.g. `hp`, `m`) so downstream NLP sees explicit terms. In the sample output below, `jt` is left unchanged.

In [27]:
acronym_text = "Harga hp baru itu 5 jt, ukurannya 5 m panjangnya."
print("Original:", acronym_text)
print("replace_acronym:", replace_acronym(acronym_text, mode="replace"))

Original: Harga hp baru itu 5 jt, ukurannya 5 m panjangnya.
replace_acronym: Harga handphone baru itu 5 jt, ukurannya 5 meter panjangnya.


### expand_contraction
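Expands informal Indonesian shortenings (in the sample below, `gak` becomes `tidak` and `dgn` becomes `dengan`) so colloquial text normalises cleanly. Note that `tdk` is left unchanged in this output.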

In [28]:
contraction_text = "Saya tdk percaya ini gak bekerja dgn baik"
print("Original:", contraction_text)
print("expand_contraction:", expand_contraction(contraction_text))

Original: Saya tdk percaya ini gak bekerja dgn baik
expand_contraction: Saya tdk percaya ini tidak bekerja dengan baik


### word_normalization
Switchable stemming/lemmatisation that tunes token granularity. As the output below shows, the `stem` method also lowercases the text and drops the trailing punctuation.

In [29]:
normalization_text = "Pengiriman barang cepat dan penggunaannya bagus sekali."
print("Original:", normalization_text)
print("word_normalization:", word_normalization(normalization_text, method="stem", mode="keep"))

Original: Pengiriman barang cepat dan penggunaannya bagus sekali.
word_normalization: kirim barang cepat dan guna bagus sekali


## 8. Preset Listing and Extension
Inspect and extend the preset configuration that backs the earlier pipelines.

In [30]:
ecommerce_preset = get_preset("ecommerce_review")
print("Patterns:", ecommerce_preset["patterns"])
print("Functions:", ecommerce_preset["functions"])

# Extend preset with additional address masking depth
extended = get_preset("ecommerce_review")
extended["patterns"].append((replace_address, {"mode": "replace", "street": True, "city": True}))
extended_results = leksara(reviews, pipeline=extended)
extended_results

Patterns: [(<function replace_id at 0x000002B2CCA067A0>, {'mode': 'replace'}), (<function replace_phone at 0x000002B2CCA06520>, {'mode': 'replace'}), (<function replace_email at 0x000002B2CCA06700>, {'mode': 'replace'}), (<function replace_address at 0x000002B2CCA06660>, {'mode': 'replace'})]
Functions: [<function remove_tags at 0x000002B2CCA06B60>, <function case_normal at 0x000002B2CCA06C00>, (<function replace_url at 0x000002B2CCA06F20>, {'mode': 'remove'}), (<function remove_emoji at 0x000002B2CCA06FC0>, {'mode': 'replace'}), <function replace_rating at 0x000002B2CCA058A0>, <function expand_contraction at 0x000002B2CCA05EE0>, (<function normalize_slangs at 0x000002B2CCA05DA0>, {'mode': 'replace'}), (<function replace_acronym at 0x000002B2CCA05D00>, {'mode': 'replace'}), <function word_normalization at 0x000002B2CCA060C0>, <function remove_stopwords at 0x000002B2CCA06CA0>, <function shorten_elongation at 0x000002B2CCA05C60>, <function remove_punctuation at 0x000002B2CCA06E80>, <func

0    email [EMAIL] rating 5.0 kurir ramah
1                  alamat kirim [ADDRESS]
dtype: object
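
When extending a preset in place, it is worth knowing whether `get_preset` returns a fresh copy or a shared object; this playbook does not say. A defensive habit, sketched below with a hypothetical stand-in registry (`_PRESETS` and `get_demo_preset` are not Leksara names), is to deep-copy the preset before appending to it.

```python
import copy

# Hypothetical stand-in for a preset registry; not Leksara's data structure.
_PRESETS = {"demo": {"patterns": ["replace_phone"], "functions": ["case_normal"]}}


def get_demo_preset(name: str) -> dict:
    """Return the stored preset object itself (worst case for accidental mutation)."""
    return _PRESETS[name]


# Copy first, then extend: the registry entry stays untouched.
extended_demo = copy.deepcopy(get_demo_preset("demo"))
extended_demo["patterns"].append("replace_address")

print(get_demo_preset("demo")["patterns"])
print(extended_demo["patterns"])
```

`copy.deepcopy` duplicates the nested lists and dicts while leaving function objects shared, so the same approach works when the steps are callables rather than strings.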

## Closing Insights
This playbook captures the end-to-end journey: monitoring data quality with CartBoard, assembling ReviewChain pipelines, benchmarking them, and logging each step. Here are ways to extend it and where to deploy the results:
- **Deeper customisation**: add domain dictionaries, bespoke PII patterns, or new normalisation utilities to serve verticals such as healthcare, fintech, and logistics.
- **Production integration**: embed the pipeline into content moderation services, real-time sentiment analytics, customer-support chatbots, or internal QA systems, backed by logging and recurring benchmarks.
- **Team operationalisation**: use presets as experiment baselines, document tweaks in a shared repository, and schedule regular data-cleanliness reviews to keep quality on track.
Happy experimenting—treat this notebook as the foundation for resilient Indonesian-language pipelines in your environment.