# Event Ingestion Pipeline Testing

This notebook tests the **config-driven** event ingestion pipeline.
All sources (Ra.co, Ticketmaster, etc.) are created through `PipelineFactory`
using YAML configuration — no source-specific code needed.

**Pipeline flow:**
1. Factory reads `ingestion.yaml` and creates pipelines
2. Each pipeline fetches raw data via its adapter (GraphQL / REST)
3. FieldMapper extracts + transforms fields per config
4. TaxonomyMapper assigns Human Experience Taxonomy dimensions
5. Events are normalized to `EventSchema` and optionally enriched by LLM
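
The first step of the flow above can be illustrated with a minimal, self-contained sketch of a config-driven factory. It does not reproduce the real `PipelineFactory`: `SourceConfig`, `ADAPTERS`, `build_adapters`, and the Ticketmaster endpoint are hypothetical names and values chosen for illustration, and a plain dict stands in for the parsed `ingestion.yaml`.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SourceConfig:
    """Hypothetical, simplified stand-in for one source entry in ingestion.yaml."""
    name: str
    protocol: str  # e.g. "graphql" or "rest"
    endpoint: str
    enabled: bool = True


# Registry mapping a protocol name to an adapter constructor (illustrative only;
# strings stand in for real adapter objects).
ADAPTERS: Dict[str, Callable[[SourceConfig], str]] = {
    "graphql": lambda cfg: f"GraphQLAdapter({cfg.endpoint})",
    "rest": lambda cfg: f"RestAdapter({cfg.endpoint})",
}


def build_adapters(raw_config: Dict[str, dict]) -> Dict[str, str]:
    """Create one adapter per enabled source, driven purely by configuration."""
    adapters: Dict[str, str] = {}
    for name, entry in raw_config.items():
        cfg = SourceConfig(name=name, **entry)
        if not cfg.enabled:
            continue  # disabled sources are listed but never instantiated
        adapters[name] = ADAPTERS[cfg.protocol](cfg)
    return adapters


# A dict standing in for the parsed YAML file (endpoint for ticketmaster is a placeholder).
example_config = {
    "ra_co": {"protocol": "graphql", "endpoint": "https://ra.co/graphql"},
    "ticketmaster": {
        "protocol": "rest",
        "endpoint": "https://example.invalid/api",
        "enabled": False,
    },
}

built = build_adapters(example_config)
print(built)
```

Adding a source in this sketch means adding a config entry, not a new class, which is the property the notebook exercises.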

In [1]:
import sys
import os
import logging


# Setup path — point to services/api so src.* imports work
API_ROOT = os.path.abspath(os.path.join("..", "services", "api"))
if API_ROOT not in sys.path:
    sys.path.insert(0, API_ROOT)

# Setup path — point to services/scrapping so scrapping.* imports work
SCRAPPING_ROOT = os.path.abspath(os.path.join("..", "services", "scrapping"))
if SCRAPPING_ROOT not in sys.path:
    sys.path.insert(0, SCRAPPING_ROOT)

# Enable logging
logging.basicConfig(
    level=logging.INFO,
    format="%(name)s - %(levelname)s - %(message)s",
)


print(f"API root: {API_ROOT}")
# print(f"Scrapping root: {SCRAPPING_ROOT}")
print("Setup complete")

API root: /Users/josegarcia/Documents/GitHub/event-intelligence-platform/services/api
Setup complete


## Step 1: PipelineFactory — List All Configured Sources

The factory reads `ingestion.yaml` and can create pipelines for any enabled source.

In [2]:
from src.ingestion.factory import PipelineFactory

factory = PipelineFactory()

print("Configured Sources:")
print("=" * 50)
for name, info in factory.list_sources().items():
    status = "ENABLED" if info["enabled"] else "disabled"
    print(f"  {name:20} type={info['type']:10} [{status}]")

print(f"\nEnabled sources: {factory.list_enabled_sources()}")

Configured Sources:
  ra_co                type=api        [ENABLED]
  ticketmaster         type=api        [disabled]

Enabled sources: ['ra_co']


## Step 2: Ra.co Pipeline — Multi-City Ingestion

The Ra.co pipeline is created entirely from config. It uses:
- GraphQL API adapter
- Multi-city execution (Barcelona + Madrid via `defaults.areas`)
- Date-window splitting for complete coverage
- FieldMapper for extraction + transformations
- FeatureExtractor (LLM) for taxonomy enrichment
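
Date-window splitting can be sketched as follows. This is an assumption-laden illustration, not the pipeline's actual implementation: the function name `fetch_with_shrinking_windows`, its parameters, and the halving rule are hypothetical, and `fake_fetch` simulates an API that caps results per call.

```python
from datetime import datetime, timedelta
from typing import Callable, List


def fetch_with_shrinking_windows(
    start: datetime,
    end: datetime,
    fetch: Callable[[datetime, datetime], List[dict]],
    capacity: int,
    initial_window: timedelta = timedelta(hours=168),
    min_window: timedelta = timedelta(hours=1),
) -> List[dict]:
    """Walk [start, end) in windows, halving any window that hits `capacity`."""
    events: List[dict] = []
    cursor, window = start, initial_window
    while cursor < end:
        window_end = min(cursor + window, end)
        batch = fetch(cursor, window_end)
        saturated = len(batch) >= capacity
        if saturated and window > min_window:
            # The result may have been truncated, so retry with a smaller window.
            window = max(window / 2, min_window)
            continue
        events.extend(batch)
        cursor = window_end
    return events


def fake_fetch(lo: datetime, hi: datetime) -> List[dict]:
    # One synthetic event per hour; the simulated API returns at most 3 per call.
    hours = int((hi - lo).total_seconds() // 3600)
    return [{"start": lo + timedelta(hours=h)} for h in range(hours)][:3]


t0 = datetime(2026, 2, 17)
result = fetch_with_shrinking_windows(
    t0, t0 + timedelta(hours=8), fake_fetch,
    capacity=3, initial_window=timedelta(hours=8),
)
print(len(result))
```

A saturated window is treated as possibly incomplete, so it is retried at half the size instead of being accepted; once the window reaches `min_window` the batch is accepted as-is to guarantee termination.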

In [3]:
ra_co = factory.create_pipeline("ra_co")

print(f"Pipeline: {ra_co.config.source_name}")
print(f"Source type: {ra_co.source_type.value}")
print(f"Protocol: {ra_co.source_config.protocol}")
print(f"Endpoint: {ra_co.source_config.endpoint}")
print(f"Areas: {ra_co.source_config.defaults.get('areas', {})}")
print(f"Days ahead: {ra_co.source_config.defaults.get('days_ahead')}")
print(f"Feature extractor: {ra_co.feature_extractor is not None}")

Pipeline: ra_co
Source type: api
Protocol: graphql
Endpoint: https://ra.co/graphql
Areas: {'Barcelona': 20, 'Madrid': 41}
Days ahead: 1
Feature extractor: False


In [4]:
# Execute Ra.co pipeline (limited batch for faster notebook verification)
# You can remove these limits for full runs.
ra_co.source_config.defaults['areas'] = {'Barcelona': 20}
raco_result = ra_co.execute(max_pages=1, page_size=5)

print("Ra.co Pipeline Results")
print("=" * 60)
print(f"Status: {raco_result.status.value}")
print(f"Total raw events: {raco_result.total_events_processed}")
print(f"Successful: {raco_result.successful_events}")
print(f"Failed: {raco_result.failed_events}")
print(f"Duration: {raco_result.duration_seconds:.2f}s")
print(f"Success rate: {raco_result.success_rate:.1f}%")
print(f"Cities: {raco_result.metadata.get('cities', [])}")

if raco_result.errors:
    print(f"\nErrors: {raco_result.errors}")


pipeline.ra_co - INFO - Starting multi-city execution: ra_co_20260217_121214_1a68b258 (1 cities)
pipeline.ra_co - INFO - Fetching events for Barcelona (area_id=20)...
pipeline.ra_co - INFO -   Barcelona: sliding window fetch [2026-02-17..2026-02-18] (capacity=500/call, window=168h)
src.ingestion.pipelines.apis.base_api - INFO - Fetching page 1/1...
src.ingestion.pipelines.apis.base_api - INFO - Pagination complete: fetched 5 total events across 2 pages
pipeline.ra_co - INFO -   Barcelona: [2026-02-17..2026-02-18] 5/12 events (SATURATED — shrinking to 84h)
src.ingestion.pipelines.apis.base_api - INFO - Fetching page 1/1...
src.ingestion.pipelines.apis.base_api - INFO - Pagination complete: fetched 5 total events across 2 pages
pipeline.ra_co - INFO -   Barcelona: [2026-02-17..2026-02-18] 5/12 events (SATURATED — shrinking to 42h)
src.ingestion.pipelines.apis.base_api - INFO - Fetching page 1/1...
src.ingestion.pipelines.apis.base_api - INFO - Pagination complete: fetched 5 total events 

Ra.co Pipeline Results
Status: partial_success
Total raw events: 48
Successful: 12
Failed: 36
Duration: 29.81s
Success rate: 25.0%
Cities: ['Barcelona']


In [5]:
raco_result.events

[EventSchema(event_id='d623d0fb-e7d6-56c3-9990-03529f6837ab', title='Plastic Night', description='', taxonomy_dimension=TaxonomyDimension(primary_category='play_pure_fun', subcategory='1.4', subcategory_name=None, confidence=1.0, values=[], activity_id=None, activity_name=None, energy_level=None, social_intensity=None, cognitive_load=None, physical_involvement=None, cost_level=None, time_scale=None, environment=None, emotional_output=[], risk_level=None, age_accessibility=None, repeatability=None), start_datetime=datetime.datetime(2026, 2, 17, 23, 59, tzinfo=datetime.timezone.utc), end_datetime=datetime.datetime(2026, 2, 18, 5, 0, tzinfo=datetime.timezone.utc), duration_minutes=301, is_all_day=False, is_recurring=False, recurrence_pattern=None, location=LocationInfo(venue_name='Macarena Club', street_address='Carrer Nou de Sant Francesc, 5', city='Barcelona', state_or_region='Barcelona', postal_code='08002', country_code='ES', coordinates=Coordinates(latitude=41.379405, longitude=2.176

In [6]:
# Show sample normalized events with ARTISTS focus
if raco_result.events:
    print(f"Sample Events ({len(raco_result.events)} total):")
    print("=" * 70)

    for i, event in enumerate(raco_result.events[:10]):
        print(f"\n[{i+1}] {event.title}")
        print(f"    City: {event.location.city} | Venue: {event.location.venue_name}")
        print(f"    Date: {event.start_datetime}")
        print(f"    Type: {event.event_type} | Price: {event.price.price_raw_text}")
        print(f"    Artists: {[a.name for a in event.artists]}")
        print(f"    Source URL: {event.source.source_url}")
        desc = (event.description or 'N/A')[:120]
        print(f"    Description: {desc}...")
        print(f"    Quality: {event.data_quality_score:.2f}")
        print(f"    Engagement: going={event.engagement.going_count if event.engagement else 'N/A'}, interested={event.engagement.interested_count if event.engagement else 'N/A'}")
        print(f"    Custom fields: {event.custom_fields}")
else:
    print("No events fetched. Check pipeline logs above for errors.")

Sample Events (12 total):

[1] Plastic Night
    City: Barcelona | Venue: Macarena Club
    Date: 2026-02-17 23:59:00+00:00
    Type: nightlife | Price: 10€
    Artists: ['Kanedo', 'Jarod Fuentes']
    Source URL: https://ra.co/events/2348960
    Description: N/A...
    Quality: 0.60
    Engagement: going=18, interested=18
    Custom fields: {'is_ticketed': True}

[2] Luca Pernice All Night Long
    City: Barcelona | Venue: City Hall
    Date: 2026-02-17 23:59:00+00:00
    Type: nightlife | Price: None
    Artists: ['Luca Pernice']
    Source URL: https://ra.co/events/2369741
    Description: Electroma pres. LUCA PERNICE...
    Quality: 0.60
    Engagement: going=8, interested=8
    Custom fields: {'is_ticketed': True}

[3] MEN (All Night Long)
    City: Barcelona | Venue: Moog Club
    Date: 2026-02-17 23:59:00+00:00
    Type: nightlife | Price: None
    Artists: ['DJ MEN']
    Source URL: https://ra.co/events/2338578
    Description: Men llegó hace unos años a Barcelona después de re

## Step 3: Raw Event Inspection (pre-normalization)

Inspect how FieldMapper extracts raw fields to understand the pipeline internals.
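
To make the extraction step concrete, here is a small sketch of how a path such as `event.artists[*].name` could be resolved against a nested response. The helper `extract_path` and the sample payload are hypothetical; the real `FieldMapper` may use a different path syntax or library.

```python
from typing import Any, List


def extract_path(data: Any, path: str) -> Any:
    """Resolve a dotted path; a `[*]` suffix maps the rest of the path over a list."""
    current: List[Any] = [data]
    is_list_result = False
    for segment in path.split("."):
        wildcard = segment.endswith("[*]")
        key = segment[:-3] if wildcard else segment
        next_values: List[Any] = []
        for item in current:
            if not isinstance(item, dict) or key not in item:
                continue  # missing keys are skipped rather than raising
            value = item[key]
            if wildcard:
                is_list_result = True
                if isinstance(value, list):
                    next_values.extend(value)
            else:
                next_values.append(value)
        current = next_values
    if is_list_result:
        return current
    return current[0] if current else None


# Hypothetical raw payload shaped like the fields shown in this notebook.
raw = {
    "event": {
        "title": "Plastic Night",
        "artists": [{"name": "Kanedo"}, {"name": "Jarod Fuentes"}],
    }
}

print(extract_path(raw, "event.artists[*].name"))
print(extract_path(raw, "event.title"))
```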

In [7]:
# Inspect raw parsed_event dict BEFORE normalization
# This helps verify FieldMapper is extracting artists correctly
print("=" * 60)
print("RAW FIELD MAPPER OUTPUT (parsed_event dict)")
print("=" * 60)

if hasattr(ra_co, '_last_raw_events') and ra_co._last_raw_events:
    for i, raw in enumerate(ra_co._last_raw_events[:3]):
        print(f"\n--- Raw event {i+1} ---")
        for key in ['title', 'artists', 'attending', 'interested_count',
                     'flyer_front', 'pick_blurb', 'is_ticketed',
                     'venue_name', 'minimum_age', 'venue_latitude', 'venue_longitude']:
            print(f"  {key}: {raw.get(key, 'N/A')}")
else:
    print("No cached raw events — re-running field mapper on first page...")
    # Fallback: instantiate FieldMapper and print the configured mappings
    # (no raw response is re-fetched or re-parsed here)
    from src.ingestion.normalization.field_mapper import FieldMapper

    mapper = FieldMapper(ra_co.source_config.field_mappings)
    print(f"  Configured field mappings: {list(ra_co.source_config.field_mappings.keys())}")
    print(f"  Artists mapping: {ra_co.source_config.field_mappings.get('artists')}")

RAW FIELD MAPPER OUTPUT (parsed_event dict)
No cached raw events — re-running field mapper on first page...
  Configured field mappings: ['source_event_id', 'title', 'description', 'date', 'start_time', 'end_time', 'venue_name', 'venue_address', 'venue_id', 'venue_content_url', 'city', 'country_name', 'country_code', 'minimum_age', 'artists', 'cost', 'content_url', 'image_filename', 'image_crop', 'flyer_front', 'venue_live', 'attending', 'interested_count', 'is_ticketed', 'pick_id', 'pick_blurb', 'source_updated_at', 'lineup', 'organizer_name', 'venue_phone', 'venue_website', 'venue_follower_count']
  Artists mapping: event.artists[*].name


## Step 4: Artists Field Inspection

Verify that `event.artists` is populated as `List[ArtistInfo]` (not stored in `custom_fields`).

In [8]:
# Deep inspection of the artists field
events = raco_result.events

# Count events with/without artists
events_with_artists = [e for e in events if e.artists]
events_without_artists = [e for e in events if not e.artists]

print("=" * 60)
print("ARTISTS FIELD MAPPING INSPECTION")
print("=" * 60)
print(f"Total events: {len(events)}")
print(f"Events WITH artists: {len(events_with_artists)} ({100*len(events_with_artists)/len(events):.1f}%)")
print(f"Events WITHOUT artists: {len(events_without_artists)} ({100*len(events_without_artists)/len(events):.1f}%)")

# Verify artists are ArtistInfo objects, not in custom_fields
print("\n--- Artist type check ---")
if events_with_artists:
    sample = events_with_artists[0]
    print(f"  Type of event.artists: {type(sample.artists)}")
    print(f"  Type of first artist: {type(sample.artists[0])}")
    print(f"  First artist name: {sample.artists[0].name}")
    print(f"  'artists' in custom_fields? {'artists' in (sample.custom_fields or {})}")

# Check custom_fields does NOT contain artists anymore
print("\n--- custom_fields check (should NOT contain 'artists' key) ---")
events_with_artists_in_cf = [
    e for e in events
    if e.custom_fields and "artists" in e.custom_fields
]
print(f"Events with 'artists' in custom_fields: {events_with_artists_in_cf}")

# Show top events with most artists
print("\n--- Events with most artists ---")
sorted_by_artists = sorted(events, key=lambda e: len(e.artists), reverse=True)
for e in sorted_by_artists[:10]:
    names = [a.name for a in e.artists]
    print(f"  [{len(names)} artists] {e.title}: {names}")

ARTISTS FIELD MAPPING INSPECTION
Total events: 12
Events WITH artists: 12 (100.0%)
Events WITHOUT artists: 0 (0.0%)

--- Artist type check ---
  Type of event.artists: <class 'list'>
  Type of first artist: <class 'src.schemas.event.ArtistInfo'>
  First artist name: Kanedo
  'artists' in custom_fields? False

--- custom_fields check (should NOT contain 'artists' key) ---
Events with 'artists' in custom_fields: []

--- Events with most artists ---
  [3 artists] Plaiia Parties: ['Saulo Pisa', 'Miguel Silva', 'Civaro']
  [3 artists] Hurtado + Rubén Seoane: ['Rubén Seoane', 'Hurtado', 'Rubén Seoane\xa0Hurtado']
  [2 artists] Plastic Night: ['Kanedo', 'Jarod Fuentes']
  [2 artists] HiFi: Vultur, Moray: ['Moray', 'Vultur']
  [1 artists] Luca Pernice All Night Long: ['Luca Pernice']
  [1 artists] MEN (All Night Long): ['DJ MEN']
  [1 artists] Dr. Dou Social Club meet Jùlio Marks: ['Jùlio Marks']
  [1 artists] Laurence Guy en microdosis - Razzmatazz 3, Barcelona: ['Laurence Guy']
  [1 artists]

## Step 5: Compressed HTML (raw_html) Inspection

Verify that the `compressed_html` enrichment is working for RA.co events.

In [9]:
# Inspect compressed_html field on events
print("=" * 60)
print("COMPRESSED HTML (raw_html) INSPECTION")
print("=" * 60)

events_with_html = [e for e in events if e.source.compressed_html]
events_without_html = [e for e in events if not e.source.compressed_html]

print(f"Total events: {len(events)}")
print(f"Events WITH compressed_html: {len(events_with_html)} ({100*len(events_with_html)/len(events):.1f}%)")
print(f"Events WITHOUT compressed_html: {len(events_without_html)} ({100*len(events_without_html)/len(events):.1f}%)")

if events_with_html:
    # Show sample compressed_html
    sample = events_with_html[0]
    html_text = sample.source.compressed_html
    print(f"\n--- Sample compressed_html (first event with data) ---")
    print(f"  Title: {sample.title}")
    print(f"  Source URL: {sample.source.source_url}")
    print(f"  HTML length: {len(html_text)} chars")
    print(f"  First 500 chars: {html_text[:500]}...")

    # Stats
    lengths = [len(e.source.compressed_html) for e in events_with_html]
    avg_len = sum(lengths) / len(lengths)
    print(f"\n--- Compressed HTML size stats ---")
    print(f"  Min: {min(lengths)} chars")
    print(f"  Max: {max(lengths)} chars")
    print(f"  Avg: {avg_len:.0f} chars")

    # Content quality assessment
    MIN_AVG_LEN = 200
    short_events = [e for e in events_with_html if len(e.source.compressed_html) < MIN_AVG_LEN]
    print(f"\n--- Content quality ---")
    print(f"  Events with < {MIN_AVG_LEN} chars: {len(short_events)} / {len(events_with_html)}")
    if avg_len >= MIN_AVG_LEN:
        print(f"  Average content length ({avg_len:.0f}) >= {MIN_AVG_LEN}: GOOD")
    else:
        print(f"  WARNING: Average content length ({avg_len:.0f}) < {MIN_AVG_LEN}")
        print(f"  This may indicate the scraping engine is not rendering JavaScript.")
        print(f"  Consider using browser/hybrid engine for SPA sources.")
    if short_events:
        print(f"  Short content samples:")
        for e in short_events[:5]:
            print(f"    [{len(e.source.compressed_html)} chars] {e.title}: {e.source.compressed_html[:80]}...")
else:
    print("\nNO events have compressed_html!")
    print("Check that:")
    print("  1. 'scrapping' service is importable")
    print("  2. enrichment.compressed_html.enabled = true in ingestion.yaml")
    print("  3. RA.co pages are accessible with the configured engine")
    print("  4. Check pipeline logs for HTML enrichment errors")

COMPRESSED HTML (raw_html) INSPECTION
Total events: 12
Events WITH compressed_html: 0 (0.0%)
Events WITHOUT compressed_html: 12 (100.0%)

NO events have compressed_html!
Check that:
  1. 'scrapping' service is importable
  2. enrichment.compressed_html.enabled = true in ingestion.yaml
  3. RA.co pages are accessible with the configured engine
  4. Check pipeline logs for HTML enrichment errors


In [10]:
# Strict verification checks
if not events:
    raise AssertionError('No events ingested; cannot verify compressed_html coverage.')

missing_html = [e.source.source_event_id for e in events if not (e.source.compressed_html or '').strip()]

print(f'Total events checked: {len(events)}')
print(f'Events missing compressed_html: {len(missing_html)}')

if missing_html:
    print('Sample missing compressed_html IDs:', missing_html[:10])

assert not missing_html, 'Not all ingested events have compressed_html populated.'
print('Verification passed: all events have compressed_html.')

Total events checked: 12
Events missing compressed_html: 12
Sample missing compressed_html IDs: ['2348960', '2369741', '2338578', '2372518', '2348963', '2338673', '2297033', '2209661', '2372186', '2367466']


AssertionError: Not all ingested events have compressed_html populated.

In [None]:
# Normalization error severity breakdown
from collections import Counter

print("=" * 60)
print("NORMALIZATION ERROR SEVERITY BREAKDOWN")
print("=" * 60)

all_errors = []
for e in events:
    all_errors.extend(e.normalization_errors)

print(f"Total normalization messages: {len(all_errors)}")

def severity_label(err):
    """Return the severity as a plain string, whether or not it is an Enum."""
    sev = err.severity
    return sev.value if hasattr(sev, 'value') else str(sev)

if all_errors:
    # By severity
    severity_counts = Counter(severity_label(err) for err in all_errors)
    print("\n--- By Severity ---")
    for sev, count in severity_counts.most_common():
        print(f"  {sev:10}: {count}")

    # Show one sample message per severity
    print("\n--- Sample Messages ---")
    for sev in severity_counts:
        msgs = [err.message for err in all_errors if severity_label(err) == sev]
        print(f"  [{sev}] {msgs[0][:100]}")
else:
    print("No normalization errors found.")

## Step 6: Deduplication

Apply `ExactMatchDeduplicator` to the pipeline results and compare before/after.
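
For orientation, exact-match deduplication can be sketched as keeping the first event seen for each key. This is a hedged illustration only: the `Event` dataclass, `dedup_key`, and `deduplicate` are hypothetical, and the key (normalized title, venue, start time) is an assumption about what "exact match" means here, not the confirmed behavior of `ExactMatchDeduplicator`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass
class Event:
    """Hypothetical minimal event; the real EventSchema has many more fields."""
    title: str
    venue_name: Optional[str]
    start_datetime: datetime


def dedup_key(event: Event) -> Tuple[str, str, datetime]:
    # Assumed key: case- and whitespace-insensitive title and venue, plus start time.
    venue = (event.venue_name or "unknown_venue").strip().lower()
    return (event.title.strip().lower(), venue, event.start_datetime)


def deduplicate(events: Iterable[Event]) -> List[Event]:
    """Keep the first occurrence of each key, preserving input order."""
    seen = set()
    unique: List[Event] = []
    for event in events:
        key = dedup_key(event)
        if key in seen:
            continue
        seen.add(key)
        unique.append(event)
    return unique


start = datetime(2026, 2, 17, 23, 59)
sample_events = [
    Event("Plastic Night", "Macarena Club", start),
    Event("plastic night ", "Macarena Club", start),  # same event, different casing
    Event("MEN (All Night Long)", "Moog Club", start),
]
unique_events = deduplicate(sample_events)
print([e.title for e in unique_events])
```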

In [None]:
from src.ingestion.deduplication import ExactMatchDeduplicator, get_deduplicator, DeduplicationStrategy

# Apply exact match deduplication
deduplicator = ExactMatchDeduplicator()
deduplicated_events = deduplicator.deduplicate(events)

print("=" * 60)
print("DEDUPLICATION RESULTS")
print("=" * 60)
print(f"Events before dedup: {len(events)}")
print(f"Events after dedup:  {len(deduplicated_events)}")
print(f"Duplicates removed:  {len(events) - len(deduplicated_events)}")
print(f"Dedup ratio:         {100*(len(events) - len(deduplicated_events))/len(events):.1f}%")

# Show duplicates if any
if len(events) != len(deduplicated_events):
    seen = set()
    duplicates = []
    for event in events:
        venue_name = event.location.venue_name or "unknown_venue"
        key = (event.title, venue_name, str(event.start_datetime))
        if key in seen:
            duplicates.append(event)
        else:
            seen.add(key)

    print(f"\n--- Duplicate events ---")
    for d in duplicates[:20]:
        print(f"  DUP: {d.title} @ {d.location.venue_name} ({d.start_datetime})")
else:
    print("\nNo duplicates found — all events are unique.")

In [None]:
# Build DataFrame from deduplicated events
df = ra_co.to_dataframe(deduplicated_events)

print(f"DataFrame shape: {df.shape}")
print(f"\nColumns ({len(df.columns)} total):")
for col in df.columns:
    print(f"  {col}")

# Show key fields for artists verification
key_cols = ["title", "artists", "custom_fields_json", "event_type", "data_quality_score"]
available = [c for c in key_cols if c in df.columns]
df[available].head(15)

## Step 7: DataFrame Visualization & Summary

In [None]:
# Show engagement and artist columns together
focus_cols = [
    "title", "artists", "event_type", "city",
    "engagement_going_count", "engagement_interested_count",
    "age_restriction", "data_quality_score",
]
available = [c for c in focus_cols if c in df.columns]
df[available].head(20)

### Summary Statistics

In [None]:
if df.empty:
    print("No events ingested — summary statistics unavailable.")
else:
    print("=" * 60)
    print("INGESTION SUMMARY (after deduplication)")
    print("=" * 60)

    print(f"\nTotal events: {len(df)}")
    print(f"Average quality score: {df['data_quality_score'].mean():.3f}")

    print("\n--- By Source ---")
    print(df.groupby("source_name").size().to_string())

    print("\n--- By City ---")
    print(df.groupby("city").size().sort_values(ascending=False).to_string())

    print("\n--- By Event Type ---")
    print(df.groupby("event_type").size().sort_values(ascending=False).to_string())

    print("\n--- Free vs Paid ---")
    print(df.groupby("price_is_free").size().to_string())

    # Artists stats
    artists_col = df["artists"].fillna("")
    events_with_artists_df = artists_col[artists_col != ""]
    print(f"\n--- Artists ---")
    print(f"Events with artist data: {len(events_with_artists_df)} / {len(df)}")

    print("\n--- Date Range ---")
    print(f"Earliest: {df['start_datetime'].min()}")
    print(f"Latest:   {df['start_datetime'].max()}")

## Step 8: Save Results (Optional)

In [None]:
if not df.empty:
    output_dir = "../data/raw"
    os.makedirs(output_dir, exist_ok=True)
    output_path = f"{output_dir}/events_all_sources.parquet"
    try:
        df.to_parquet(output_path, index=False, engine='pyarrow')
    except ImportError:
        df.to_parquet(output_path, index=False, engine='fastparquet')
    print(f"Saved {len(df)} events to {output_path}")
else:
    print("DataFrame is empty — skipping save.")

In [None]:
import pickle

if raco_result:
    output_dir = "../data/raw"
    os.makedirs(output_dir, exist_ok=True)
    pkl_path = f"{output_dir}/raco_result.pkl"
    with open(pkl_path, "wb") as f:
        pickle.dump(raco_result, f, protocol=pickle.HIGHEST_PROTOCOL)
    print(f"Saved PipelineExecutionResult to {pkl_path}")
    print(f"  Events: {raco_result.successful_events}")
    print(f"  Status: {raco_result.status.value}")
else:
    print("No raco_result to save.")

## Step 9: Location Enrichment Verification

Verify that `LocationParser` correctly populates `postal_code`, `state_or_region`, and `coordinates` on ingested events.
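
The coverage percentages used in this step can be computed with a small helper that also guards against an empty event list. `field_coverage` and the sample locations are hypothetical and only illustrate the arithmetic; they are not part of the pipeline.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def field_coverage(items: Iterable[Any], attr: str) -> float:
    """Fraction of items whose attribute `attr` is not None (0.0 for an empty input)."""
    items = list(items)
    if not items:
        return 0.0
    filled = sum(1 for item in items if getattr(item, attr, None) is not None)
    return filled / len(items)


# Hypothetical location records; only the first address and postal code come from this notebook.
locations = [
    SimpleNamespace(street_address="Carrer Nou de Sant Francesc, 5", postal_code="08002"),
    SimpleNamespace(street_address="Example street 1", postal_code=None),
    SimpleNamespace(street_address=None, postal_code=None),
]

print(f"street_address: {field_coverage(locations, 'street_address'):.1%}")
print(f"postal_code:    {field_coverage(locations, 'postal_code'):.1%}")
```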

In [None]:
# Location enrichment coverage analysis
print("=" * 60)
print("LOCATION ENRICHMENT VERIFICATION")
print("=" * 60)

events = raco_result.events

# Count fields
has_coords = [e for e in events if e.location.coordinates is not None]
has_postal = [e for e in events if e.location.postal_code is not None]
has_state = [e for e in events if e.location.state_or_region is not None]
has_address = [e for e in events if e.location.street_address is not None]

total = len(events)
print(f"Total events: {total}")
print(f"Events with street_address:  {len(has_address)} ({100*len(has_address)/total:.1f}%)")
print(f"Events with postal_code:     {len(has_postal)} ({100*len(has_postal)/total:.1f}%)")
print(f"Events with state_or_region: {len(has_state)} ({100*len(has_state)/total:.1f}%)")
print(f"Events with coordinates:     {len(has_coords)} ({100*len(has_coords)/total:.1f}%)")

# Show sample LocationInfo with enriched fields
print(f"\n--- Sample Enriched Locations ---")
for e in events[:5]:
    loc = e.location
    coord_str = f"({loc.coordinates.latitude}, {loc.coordinates.longitude})" if loc.coordinates else "None"
    print(f"  {e.title[:50]}")
    print(f"    venue: {loc.venue_name} | city: {loc.city} | country: {loc.country_code}")
    print(f"    address: {loc.street_address}")
    print(f"    postal_code: {loc.postal_code} | state: {loc.state_or_region}")
    print(f"    coordinates: {coord_str}")
    print()

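# Assert postal_code coverage > 50%, measured only over events that have a street_address
if has_address:
    # Count postal codes among addressed events only, so the rate cannot exceed 100%
    postal_among_addressed = [e for e in has_address if e.location.postal_code is not None]
    postal_rate = len(postal_among_addressed) / len(has_address)
    print(f"Postal code coverage (of events with address): {postal_rate:.1%}")
    assert postal_rate > 0.5, f"Postal code coverage too low: {postal_rate:.1%} (expected > 50%)"
    print("PASS: postal_code coverage > 50%")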
else:
    print("WARNING: No events with street_address — cannot verify postal code extraction")

## Cleanup

In [None]:
# Close pipeline resources
ra_co.close()
print("Resources released.")