implement Trino source #585

domnikl · 2024-03-15T12:51:02Z

I'd really like to see Trino implemented as a source in ConnectorX, so I have begun working on it. It uses prusto as a client and currently supports all basic types. Support for tuple, array, row and uuid isn't implemented yet, and there's a bug with time columns not being mapped correctly. I also want to look into fetching results more efficiently instead of fetching everything at once.

What do you think, is it worth continuing work on it? Is there anything else I need to consider, e.g. regarding mapping the results?

wangxiaoying · 2024-03-18T23:50:21Z

Thanks @domnikl for the PR. In general, I think it looks great and complete!

> Also I want to look into fetching results more efficiently and not everything at once.

Yes, I think this might be a performance issue. I haven't used prusto before, but it seems the get and get_next APIs can fetch the results gradually. I'm not sure how easy it is to switch from get_all to these two.
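To illustrate the difference between fetching everything at once and fetching gradually, here is a minimal Python sketch of the paging pattern. It is not prusto's API: `FakePagedClient`, its `get`/`get_next` methods, the page token, and `iter_rows` are all hypothetical names that only mimic the idea of a first call followed by repeated "next page" calls.

```python
from typing import Iterator, List, Optional, Tuple

# (rows in this page, token for the next page or None when finished)
Page = Tuple[List[tuple], Optional[str]]


class FakePagedClient:
    """Hypothetical stand-in for a client that returns results page by page."""

    def __init__(self, rows: List[tuple], page_size: int) -> None:
        self._rows = list(rows)
        self._page_size = page_size

    def get(self, sql: str) -> Page:
        # The first call starts the query and returns the first page.
        return self._page(0)

    def get_next(self, token: str) -> Page:
        # Later calls follow the token handed back by the previous page.
        return self._page(int(token))

    def _page(self, start: int) -> Page:
        end = start + self._page_size
        next_token = str(end) if end < len(self._rows) else None
        return self._rows[start:end], next_token


def iter_rows(client: FakePagedClient, sql: str) -> Iterator[tuple]:
    """Yield rows one page at a time instead of materializing all of them."""
    rows, token = client.get(sql)
    yield from rows
    while token is not None:
        rows, token = client.get_next(token)
        yield from rows


if __name__ == "__main__":
    client = FakePagedClient([(i,) for i in range(5)], page_size=2)
    for row in iter_rows(client, "SELECT * FROM test_table"):
        print(row)
```

With this shape, only one page is held in memory at a time, and a consumer can start processing rows before the query has finished.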

> time columns not being mapped correctly

I'm not sure what exactly the bug is. Currently we convert the NaiveTime type in Rust to the String type in pandas in general (I also see this in your PR); this is done by the function here. If you want to customize the conversion or map it to another pandas type, you can do so by modifying the mapping in the macro as well as this function.

It would be great to also submit the corresponding seed database script here so others can run the python test_trino locally.

wangxiaoying · 2024-03-18T23:50:56Z

connectorx-python/connectorx/tests/test_trino.py

+
+
+@pytest.fixture(scope="module")  # type: ignore
+def mysql_url() -> str:


Should these functions be renamed from test_mysql_xxx to test_trino_xxx?

Copy & paste error, I'll rename it 👍🏻

domnikl · 2024-03-22T13:36:53Z

@wangxiaoying thanks for looking into it!

> I'm not sure what exactly the bug is

The problem is with reading timestamp with time zone types: by the looks of it, prusto does not support them (yet). Instead it returns EmptyData, and the result just appears to be empty. But I don't think that's important for getting started. I'll create an issue in their project.

> It would be great to also submit the corresponding seed database script

Will do in the next few days. I'll also rework fetching from the database to use the get/get_next calls if possible.

domnikl · 2024-04-19T14:30:50Z

@wangxiaoying I implemented the missing partitioning, switched to get/get_next for prusto, fixed the tests, added test data, and squashed some bugs along the way. Could you have another look at it, please?

wangxiaoying · 2024-04-19T23:40:45Z

connectorx-python/connectorx/tests/test_trino.py

Since we are probably not going to run Trino in our CI on the GitHub workflow, can you make all these tests skip if TRINO_URL is not set? Something similar to what we do for Oracle here.
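A minimal sketch of what that could look like, assuming a module-level `pytestmark` so that every test in test_trino.py is skipped together. The fixture name `trino_url`, the placeholder test, and the reason string are illustrative and not taken from the PR.

```python
import os

import pytest

# Applies to every test in this module: skip unless TRINO_URL is set.
pytestmark = pytest.mark.skipif(
    not os.environ.get("TRINO_URL"),
    reason="Test Trino only when `TRINO_URL` is set",
)


@pytest.fixture(scope="module")
def trino_url() -> str:
    # Only evaluated when the module is not skipped, so the key exists.
    return os.environ["TRINO_URL"]


def test_trino_url_is_set(trino_url: str) -> None:
    # Hypothetical placeholder; the real tests would query Trino here.
    assert trino_url
```

Running `pytest -rs` lists the skipped tests together with the reason, which makes it clear why nothing ran when the variable is missing.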

wangxiaoying · 2024-04-19T23:55:58Z

connectorx/src/sources/trino/mod.rs

+
+#[throws(TrinoSourceError)]
+fn get_total_rows(rt: Arc<Runtime>, client: Arc<Client>, query: &CXQuery<String>) -> usize {
+    rt.block_on(client.get_all::<Row>(query.to_string()))


It seems like this will execute the entire query and fetch all the results, which may become a performance bottleneck (since the query will be run twice). Can we use a SELECT COUNT(*) query to get the total rows instead? We have a util function named count_query. An example can be found here.
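The idea can be sketched as wrapping the user query in a counting subquery, so the engine returns a single number instead of every row. The Python helper below is a hypothetical illustration of that rewrite only; it is not the ConnectorX count_query implementation, and the alias name is an assumption.

```python
def count_query(sql: str, alias: str = "cx_count_tmp") -> str:
    """Wrap `sql` in SELECT COUNT(*) so only one row is returned.

    Hypothetical helper for illustration; it does plain string wrapping
    and does not parse or validate the SQL.
    """
    # Drop surrounding whitespace and any trailing semicolons, which
    # would be invalid inside a subquery.
    inner = sql.strip().rstrip(";").strip()
    if not inner:
        raise ValueError("query must not be empty")
    return f"SELECT COUNT(*) FROM ({inner}) AS {alias}"


if __name__ == "__main__":
    print(count_query("SELECT * FROM test_table WHERE test_int > 1;"))
```

The total row count then comes from the single value in the result of the wrapped query, and the original query only has to be executed once, for the actual data.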

wangxiaoying · 2024-04-20T00:02:46Z

Hi @domnikl , thanks so much for completing the PR!

I've run the test cases locally and they seem to work well. I have two minor comments above, but in general I think the code looks good and complete. We probably also need to add a documentation page for Trino here.

I think once the minor issues above are fixed, we can merge it and include it in our next release!

domnikl · 2024-04-22T14:03:55Z

@wangxiaoying thanks for having a look! Added the missing pieces and a bit of documentation for it.

wangxiaoying

Thanks @domnikl, the update looks good to me! We can merge this PR once CI has passed :)

domnikl added 5 commits January 19, 2024 16:06

implemented type system for Trino

925423a

hard-coded schema

95630aa

WIP Trino in Python

b338022

Merge remote-tracking branch 'upstream/main' into trino-source

c81077d

update deps for Trino source

5fca5ca

wangxiaoying reviewed Mar 18, 2024

domnikl added 5 commits April 12, 2024 14:23

added tests for Trino and connecting without auth

d374400

fixed copy/paste for trino tests

3099395

Merge remote-tracking branch 'upstream/main' into trino-source

aaab7de

implemented partitioning for Trino

6979c65

fetch results for Trino more efficiently

52da49a

domnikl force-pushed the trino-source branch from 599e317 to 52da49a Compare April 19, 2024 14:27

domnikl changed the title ~~Draft: implement Trino source~~ implement Trino source Apr 19, 2024

wangxiaoying reviewed Apr 19, 2024

domnikl added 2 commits April 22, 2024 15:42

Trino use count_query for get_total_rows

b279cdb

added Trino documentation

cbe16b5

wangxiaoying approved these changes Apr 22, 2024

wangxiaoying merged commit d20ddc5 into sfu-db:main Apr 22, 2024
1 check passed
