
Postgres Canary Test Env#19

Merged
kmacrow merged 7 commits into master from data-transfer-dates on Jul 29, 2025

Conversation

@alexdriedger
Collaborator

  • Moved docker-compose.test-pg.yml to its own test-env directory

  • Added 3 tables that automatically get created for the source database in the test docker compose

  • Added a mock data generator to the test docker compose that writes 100 records to each of the 3 tables every 5 minutes

  • Fixed a variety of SQL errors with reading from/writing to Postgres sources/destinations in the canary

  • Fixed an issue with booleans in the SQLite cache

  • Fixed an issue with date column types not being compatible on the second write

  • Updated schema checking to allow for schemas to have different column order

  • Added a lot of unit tests for the sql connectors
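The order-insensitive schema check mentioned above can be sketched roughly as follows; this is a hypothetical illustration where schemas are plain lists of (column_name, type) pairs, not the canary's actual schema objects:

```python
# Hypothetical sketch of an order-insensitive schema comparison.
# Schemas here are lists of (column_name, type_name) pairs for illustration.
def schemas_match(a, b):
    """True when both schemas have the same columns and types, in any order."""
    if len(a) != len(b):
        return False
    # Sorting normalizes column order before comparing
    return sorted(a) == sorted(b)

existing = [("id", "int64"), ("name", "string"), ("created_at", "timestamp")]
incoming = [("name", "string"), ("created_at", "timestamp"), ("id", "int64")]
assert schemas_match(existing, incoming)
assert not schemas_match(existing, incoming[:-1])
```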

Collaborator

@kmacrow kmacrow left a comment


Left some comments but approving - test env looks great. The AI unit tests are kind of slop imho, but it's fine for now. The only thing that might be worth a follow-up look is the datetime.time mapping issue, but it's not worse than it already was.

bytes: pa.binary(),
datetime.date: pa.date32(),
date: pa.date32(),
datetime.time: pa.time64('us'),
Collaborator


I think there's an issue here that's similar to the previous bug I had with the date type:

>>> import datetime
>>> datetime.time
<class 'datetime.time'> # we want this
>>> from datetime import datetime
>>> datetime.time
<method 'time' of 'datetime.datetime' objects> # we're getting this

The workaround I had was to do from datetime import datetime, timedelta, timezone, date, time as dt_time so the class doesn't conflict with time; that way you get the type, not the method.
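The shadowing and the aliasing workaround can be verified with stdlib-only code (the dt alias here is just for the demonstration, so the module name stays visible):

```python
# The module-level name is the time type, which is what the type map needs:
import datetime
assert isinstance(datetime.time, type)  # <class 'datetime.time'>

# After importing the datetime class, .time is an instance method, not a type:
from datetime import datetime as dt
assert not isinstance(dt.time, type)  # method of datetime objects

# The aliasing workaround: import the type under a non-conflicting name.
from datetime import time as dt_time
assert dt_time is datetime.time  # the real time type, safe to use as a map key
```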

UUID: str,
Decimal: float,
dict: str, # JSONB and other dict-like types get converted to string
date: date, # Date type maps to itself (already in PY_TO_PYARROW_MAP)
Collaborator


This 1:1 mapping doesn't hurt anything, but it's kind of redundant and maybe confusing.

if py_type is datetime:
return val.astimezone(timezone.utc)
elif py_type is date:
return val # date objects should be passed through as-is
Collaborator


I'm not sure this special case is needed for date? The val would be returned as-is for date anyway if that 1:1 mapping weren't in the conversion table. The datetime special case exists because it needs the timezone adjustment, and I didn't have a generalized way to mutate the val on the way through.
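A minimal sketch of the point being made, with illustrative names rather than the connector's real API: with no entry for date in the conversion logic, date values fall through unchanged, so the explicit branch is redundant.

```python
from datetime import date, datetime, timezone

def convert_value(val, py_type):
    """Illustrative value-conversion step (not the connector's actual code).

    Only datetime needs mutation (UTC normalization); everything else,
    including date, passes through as-is with no special case.
    """
    if py_type is datetime:
        return val.astimezone(timezone.utc)
    return val

d = date(2025, 7, 29)
assert convert_value(d, date) is d  # dates pass through untouched
```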


# Check if there are any records to process
stream_size = ds.size(stream)
if stream_size == 0:
Collaborator


We should really add this to all of the destinations.
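Generalizing that guard to a destination's write path might look like this sketch; write_stream and the size callback are illustrative names, not the real connector API:

```python
def write_stream(destination, stream, size_fn=len):
    """Skip per-batch setup and writes entirely when the stream is empty.

    Illustrative sketch: `destination` is anything with .append(), and
    `size_fn` stands in for the real ds.size() helper.
    """
    if size_fn(stream) == 0:
        return 0  # nothing to write; avoid opening transactions etc.
    written = 0
    for record in stream:
        destination.append(record)
        written += 1
    return written

sink = []
assert write_stream(sink, []) == 0          # empty stream short-circuits
assert write_stream(sink, [{"id": 1}]) == 1
```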

Boolean: pa.bool_(),
- Date: pa.timestamp('us', tz='UTC'),
+ Date: pa.date32(),
Time: pa.timestamp('us', tz='UTC'),
Collaborator


I think Time needs to be mapped to pa.time64('us') here (and the inverse in the PYARROW_TO_ALCHEMY table?).

from sqlalchemy.dialects.postgresql import UUID, JSONB, TIMESTAMP


class TestPostgresSource:
Collaborator


There are some good test ideas in here, but I'd probably want to refactor it at some point:

  • I think a bunch of these test cases should be in the Stream tests
  • the SqliteCache cases should probably be in a class of their own, independent of any specific connector
  • there's some duplication between the code/logic being tested and actually calling the real code
  • some of the cases are Postgres specific (JSONB) but most could probably be generic SQLSource tests
  • there's quite a bit of SQL destination testing that should be in a different class
  • modules are being imported inside methods: there's sometimes a valid use for that, but I'm not sure it's needed here, and it kind of obscures what's being tested from just the filename and header

@kmacrow kmacrow merged commit 6c2ff2b into master Jul 29, 2025
@kmacrow kmacrow deleted the data-transfer-dates branch July 29, 2025 15:47