feat: TableauUpload (new Odin job for publishing data to Tableau) #78
Conversation
* Adding incrementality and batching to pipeline
* Adding handling for partitions (#88)
* Refactoring as Odin job
* Adding Tableau job to Odin
* Adding Tableau variables to required startup vars
runkelcorey left a comment:
This looks more or less fine, but it leaves me wondering: is there a reason you didn't include unit tests for these functions? The functions that don't do any uploading or downloading could be tested pretty easily, and I think mocking the S3 and TSC clients would be pretty straightforward. Speaking from LAMP's experience, I would love to have a test suite for our Tableau modules.
Co-authored-by: Corey Runkel <39202587+runkelcorey@users.noreply.github.com>
Just a shortcut to get this out sooner, but it's a good point. Working on some basic coverage.
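The mocking approach suggested in review could look roughly like this. This is a minimal sketch with `unittest.mock`; `publish_datasource` and its signature are hypothetical stand-ins, not the job's real API:

```python
# Sketch of testing publish logic without a live Tableau server by mocking a
# TSC-like client. The helper below is hypothetical, not the job's real code.
from unittest.mock import MagicMock

def publish_datasource(server, hyper_path: str, project_id: str) -> str:
    """Hypothetical helper: publish a hyper file and return the datasource id."""
    datasource = server.datasources.publish(hyper_path, project_id)
    return datasource.id

# Stand in for the TSC server client with a MagicMock.
server = MagicMock()
server.datasources.publish.return_value.id = "ds-123"

result = publish_datasource(server, "extract.hyper", "proj-1")
assert result == "ds-123"
server.datasources.publish.assert_called_once_with("extract.hyper", "proj-1")
```

The same pattern applies to the S3 client: functions that only transform or route data can be exercised with mocked clients and plain assertions.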
…nversion, nested project id resolution, table config structure
@runkelcorey I added unit tests for the major things I think could get disrupted, and updated the logging. Could you give the new changes a look when you have a chance?
…removing defaults in code
Summary
This PR introduces a new Odin job class (`TableauUpload`), which coordinates incremental batch uploading of Cubic ODS data to Tableau. During a run, Odin iterates through the `TABLES_TO_SYNC` list and sets up a `TableauUpload` job for each in the schedule. Each job then loads its watermark from `s3://{bucket}/odin/state/tableau_checkpoints.json`, if available. Each job runs on a recurring schedule based on its exit status.
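The incremental flow described above can be sketched roughly as follows. Function and variable names here are illustrative assumptions, not the job's real API:

```python
# Rough sketch of incremental batch selection under an assumed watermark model.
from typing import Iterator, Optional, Sequence

BATCH_SIZE = 500_000  # matches the batch size mentioned in this PR

def new_rows(rows: Sequence[dict], index_column: str, watermark: Optional[int]) -> list:
    """Keep only rows above the recorded watermark (all rows if none is set)."""
    if watermark is None:
        return list(rows)
    return [r for r in rows if r[index_column] > watermark]

def batches(rows: Sequence[dict], size: int = BATCH_SIZE) -> Iterator[list]:
    """Yield rows in upload-sized chunks."""
    for start in range(0, len(rows), size):
        yield list(rows[start : start + size])

# Usage: with a watermark of 2, only rows 3 and 4 are selected for upload.
rows = [{"tap_id": i} for i in range(1, 5)]
pending = new_rows(rows, "tap_id", watermark=2)
assert [r["tap_id"] for r in pending] == [3, 4]
new_watermark = max(r["tap_id"] for r in pending)  # would be recorded to the checkpoint file
```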
Adding and configuring tables
To start syncing a new table, add the table name to
TABLES_TO_SYNCand, if applicable, toTABLE_CONFIG.TABLE_CONFIGincludes:casts: Column type overrides (e.g., force token_id to Int64)drops: Columns to exclude (e.g., sensitive data like restricted_purse_id, or just unneeded columns to cut size)index_column: The monotonically increasing column used for watermarkingIf you have included an
index_columnfor your table, you can also specify a minimum value ins3://{bucket}/odin/state/tableau_checkpoints.jsonprior to the first sync so that this job will ignore all data before that point.tableau_checkpoints.jsonis a file containing table names and high watermarks, e.g.{"EDW.ABP_TAP": 2404993, "EDW.TRAINS": 94128}Dependencies
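A sketch of the two shapes described above. The exact key names in `TABLE_CONFIG` are assumptions, and a local file stands in for the S3 checkpoint object:

```python
import json
from pathlib import Path

# Hypothetical shape of a TABLE_CONFIG entry; real key names may differ.
TABLE_CONFIG = {
    "EDW.ABP_TAP": {
        "casts": {"token_id": "Int64"},    # column type overrides
        "drops": ["restricted_purse_id"],  # sensitive/unneeded columns to exclude
        "index_column": "tap_id",          # monotonically increasing watermark column
    },
}

def seed_checkpoint(path: Path, table: str, watermark: int) -> dict:
    """Seed a minimum watermark for `table` so the first sync skips older rows.

    The real file lives at s3://{bucket}/odin/state/tableau_checkpoints.json;
    a local path stands in for S3 here.
    """
    checkpoints = json.loads(path.read_text()) if path.exists() else {}
    checkpoints[table] = watermark
    path.write_text(json.dumps(checkpoints))
    return checkpoints

# Usage: only rows with index_column > 2404993 will be synced on the first run.
ckpt = seed_checkpoint(Path("tableau_checkpoints.json"), "EDW.ABP_TAP", 2404993)
```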
This job introduces 2 additional dependencies:

* `tableauhyperapi` (for building Hyper extract files)
* `tableauserverclient` (for publishing to the Tableau server)

Unit tests
These tests assert that the job correctly handles:

* the `TABLE_CONFIG` structure

Change history
Change log for 4364f75
Rather than overwriting by default, the pipeline now checks the state of the table on S3 and appends only new data to the existing Tableau data source. This works as follows:

* the latest synced value of `index_column` is recorded to `s3://{bucket}/odin/state/tableau_checkpoints.json` under the table name
* only data above that value (per `index_column`) is uploaded

Also, the pipeline now points to tables rather than individual files. For each table, it finds all contained partitions and files and scans parquet metadata to determine whether each file contains relevant data based on `index_column`.

Additionally:

* uploads are batched by `BATCH_SIZE` (500_000)
* `index_column` can now be defined in `TABLE_CONFIG`. Index must be numeric and monotonically increasing
* `--overwrite-table` can be used to force a whole table to resync
* you can overwrite `s3://{bucket}/odin/state/tableau_checkpoints.json` to start syncing after that point

Change log for e075e27
Implements `TableauUpload`, which extends `OdinJob`, allowing Odin to run Tableau uploads on a schedule. Structure is based on `ods_fact.py`. Key features:

Change log for a93d078
Overhaul of logging functions to use `ProcessLog` consistently (and in line with other Odin jobs).

* search by `uuid` in Splunk to get all logs relating to a given iteration of a function
* `log.add_metadata`/`log.failed` for the respective circumstances

Testing plan:
* run with `table=EDW.JOURNAL_ENTRY`, allow to complete initial transfer of data
* set `s3://mbta-ctd-dataplatform-dev-springboard/odin/state/tableau_checkpoints.json` to a lower value

Logs for testing run 1: Testing job rerun and incremental updating
Logs for testing run 2: Deleting Tableau table and watermark and rerunning
Results
Remaining features:
* `--table` arg (also redundant)
* use `ProcessLog` consistently (and in line with other Odin jobs) and unify `uuid` across all logs relating to a given iteration of a function

Out of scope
* `TABLE_CONFIG` to more easily limit synced columns to only those needed
* making `TABLE_CONFIG` configurable per table
* `--overwrite-table` argument (not usable in prod, and redundant with watermark)

Notes
Do not upload to dev or prod yet. Need to coordinate with infra to get new required Tableau variables: