Conversation
kmacrow
left a comment
Left some comments but approving - test env looks great. The AI unit tests are kind of slop imho, but it's fine for now. The only thing that might be worth a follow-up look is the datetime.time mapping issue, but it's not worse than it already was.
    bytes: pa.binary(),
    datetime.date: pa.date32(),
    date: pa.date32(),
    datetime.time: pa.time64('us'),
I think there's an issue here that's similar to the previous bug I had with the date type:
>>> import datetime
>>> datetime.time
<class 'datetime.time'> # we want this
>>> from datetime import datetime
>>> datetime.time
<method 'time' of 'datetime.datetime' objects> # we're getting this
The workaround I had was to do from datetime import datetime, timedelta, timezone, date, time as dt_time, so the alias doesn't conflict with time and you get the type, not the method.
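The shadowing is easy to reproduce; a minimal sketch of the failure mode and the aliased-import workaround (dt_time is the alias from the import above):

```python
import datetime
from datetime import time as dt_time  # alias avoids the name clash

# Module access gives the class we want in a type-mapping table:
assert datetime.time is dt_time  # <class 'datetime.time'>

# But after `from datetime import datetime`, the name `datetime` is the
# class, and `datetime.time` resolves to its instance method instead:
from datetime import datetime as datetime_cls
assert datetime_cls.time is not dt_time  # a method, not a type
assert isinstance(dt_time(12, 30), datetime.time)
```

Keys built from the aliased name always refer to the class, so the mapping table stays correct regardless of what other datetime imports the module does.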
    UUID: str,
    Decimal: float,
    dict: str,  # JSONB and other dict-like types get converted to string
    date: date,  # Date type maps to itself (already in PY_TO_PYARROW_MAP)
This 1:1 mapping doesn't hurt anything, but it's kind of redundant and maybe confusing.
    if py_type is datetime:
        return val.astimezone(timezone.utc)
    elif py_type is date:
        return val  # date objects should be passed through as-is
I'm not sure this special case is needed for date? The val would be returned as-is for date anyway if that 1:1 mapping weren't in the conversion table. The datetime case is special because it needs the timezone adjustment, and I didn't have a generalized way to mutate the val on the way through.
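One generalized way to mutate values on the way through would be a per-type converter table with an identity default; a hypothetical sketch (CONVERTERS and convert_value are illustrative names, not the PR's actual code):

```python
from datetime import datetime, timezone

# Hypothetical converter table: only types that need mutation get an entry.
CONVERTERS = {
    datetime: lambda v: v.astimezone(timezone.utc),
}

def convert_value(val, py_type):
    # Identity by default, so date (and any other 1:1 type) needs no
    # special case and no redundant entry in the conversion table.
    return CONVERTERS.get(py_type, lambda v: v)(val)
```

With this shape, both the date branch and the 1:1 date entry could disappear: anything without a converter just passes through.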
    # Check if there are any records to process
    stream_size = ds.size(stream)
    if stream_size == 0:
We should really add this check to all of the destinations.
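A minimal sketch of that guard factored into a shared helper, so each destination doesn't duplicate the size check (write_records and the record-list interface are hypothetical stand-ins for the real destination API):

```python
def write_records(records, write_fn):
    # Shared empty-stream guard: skip the destination write path entirely
    # when there is nothing to process, mirroring `if stream_size == 0`.
    if len(records) == 0:
        return 0
    write_fn(records)
    return len(records)
```

Each destination could then call write_records(batch, self._do_write) instead of repeating the emptiness check inline.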
    Boolean: pa.bool_(),
    Date: pa.timestamp('us', tz='UTC'),
    Date: pa.date32(),
    Time: pa.timestamp('us', tz='UTC'),
I think Time needs to be mapped to pa.time64('us') here (and the inverse mapping added in the PYARROW_TO_ALCHEMY table?).
    from sqlalchemy.dialects.postgresql import UUID, JSONB, TIMESTAMP
    class TestPostgresSource:
There are some good test ideas in here, but I'd probably want to refactor it at some point:
- I think a bunch of these test cases should be in the Stream tests
- the SqliteCache cases should probably be in a class of their own, independent of any specific connector
- there's some kind of duplication of the code/logic being tested vs. actually calling the real code
- some of the cases are Postgres specific (JSONB) but most could probably be generic SQLSource tests
- there's quite a bit of SQL destination testing that should be in a different class
- modules being imported in methods - sometimes a valid use for that, but I'm not sure it's needed here, and it kind of obscures what's being tested looking at the filename and header
- Moved docker-compose.test-pg.yml to its own test-env directory
- Added 3 tables that automatically get created for the source database in the test docker compose
- Added a mock data generator to the test docker compose that writes 100 records to each of the 3 tables every 5 minutes
- Fixed a variety of SQL errors with reading/writing to Postgres sources/destinations in the canary
- Fixed an issue with booleans in the SQLite cache
- Fixed an issue with date column types not being compatible on the second write
- Updated schema checking to allow schemas to have different column order
- Added a lot of unit tests for the SQL connectors