Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a data source for Trino/Presto #1201

Merged
merged 9 commits into from Sep 25, 2023

Conversation

maxstreese
Copy link
Contributor

@maxstreese maxstreese commented Sep 21, 2023

Description

This PR adds a data source for Trino/Presto.

Resolves #1095.

Adding Trino/Presto significantly increases the number of data sources supported by Evidence, as they function as data source aggregators.

Checklist

  • For UI or styling changes, I have added a screenshot or gif showing before & after
  • I have added a changeset

Screenshots

The new connection form with a valid connection

image

The new connection form with an invalid connection

image

A query running against the new data source

image

@changeset-bot
Copy link

changeset-bot bot commented Sep 21, 2023

🦋 Changeset detected

Latest commit: c931cdc

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages
Name Type
@evidence-dev/core-components Patch
@evidence-dev/db-orchestrator Patch
@evidence-dev/evidence Patch
@evidence-dev/postgres Patch
@evidence-dev/trino Patch
evidence-docs Patch
@evidence-dev/components Patch
evidence-test-environment Patch
@evidence-dev/redshift Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link

vercel bot commented Sep 21, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
evidence-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 25, 2023 0:25am

@netlify
Copy link

netlify bot commented Sep 21, 2023

Deploy Preview for evidence-development-workspace ready!

Name Link
🔨 Latest commit c931cdc
🔍 Latest deploy log https://app.netlify.com/sites/evidence-development-workspace/deploys/65117bdfc4029400080fb860
😎 Deploy Preview https://deploy-preview-1201--evidence-development-workspace.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@maxstreese maxstreese changed the title WIP: Add a datasource for Trino (formerly PrestoDB) Add a datasource for Trino (formerly PrestoDB) Sep 21, 2023
@maxstreese
Copy link
Contributor Author

Hi all!

Let me know if there is anything else I can do here at this point. For now I would consider this as final from my side

Best,
Max

@maxstreese
Copy link
Contributor Author

Just thinking out loud, it may also be and idea to ask the Trino folks to be included in their list of supported clients once the datasource is supported by Evidence. Might attract some attention.

@archiewood
Copy link
Member

archiewood commented Sep 22, 2023

Thanks for the contribution! 🙏🏼

I don't have any experience with presto trino, so I set up a free account on Starburst to give this a test with.

I ran into issues connecting:

CleanShot 2023-09-21 at 22 49 42@2x
  1. The error message when connecting does not seem to get unpacked. Do we need to unpack this somehow?
  2. Checking in the console, the error I'm specifically getting is around sending an HTTP request to an HTTPS port:
Object { message: "execution error:<html>\r\n<head><title>400 The plain HTTP request was sent to HTTPS port</title></head>\r\n<body>\r\n<center><h1>400 Bad Request</h1></center>\r\n<center>The plain HTTP request was sent to HTTPS port</center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>\r\n", code: 400 }

Are there other config options I should be passing in here? I feel like starburst should be requiring some kind of password, but adding my starburst login pw didn't change the above behaviour.

Apologies if I'm being slow here!

@maxstreese
Copy link
Contributor Author

maxstreese commented Sep 22, 2023

Hey, awesome! You need to set SSL to "true" in this case. Then things should work (you also need to provide user and password). I tried this against my company's Trino installation and that worked.

My understanding is that HTTPS uses SSL/TLS so therefore requiring the user to set the SSL option for an HTTPS connection seems reasonable. It's also in line with how the client library underneath works. Additionally this allows for the freedom to run Trino in HTTPS mode but on a different port than 443. That being said, if you'd like me to silently set the SSL option for the user in case the port is 443, I could also do that.

Let me know 😀

@maxstreese maxstreese changed the title Add a datasource for Trino (formerly PrestoDB) Add a datasource for Trino/PrestoDB Sep 22, 2023
@maxstreese maxstreese changed the title Add a datasource for Trino/PrestoDB Add a datasource for Trino/Presto Sep 22, 2023
@maxstreese
Copy link
Contributor Author

maxstreese commented Sep 22, 2023

Hi @archiewood,

I have made three changes now:

  1. Unpack the error message as you suggested
  2. Provide user instructions in the form for the SSL parameter (stating that it must be set for HTTPS connections)
  3. Include an engine option which should allow this data source to be used for both Trino and Presto. The underlying client library allows for this. I have not tested against a Presto installation, but only against both a local (HTTP) as well as a remote (HTTPS) Trino installation

I have updated the PR title and description accordingly.

@maxstreese maxstreese changed the title Add a datasource for Trino/Presto Add a data source for Trino/Presto Sep 22, 2023
@archiewood
Copy link
Member

archiewood commented Sep 22, 2023

Include an engine option which should allow this data source to be used for both Trino and Presto.

Sorry I misspoke in my above comment. Starburst is cloud hosted trino. I wasn't aware they were different aside from the naming tbh

So I've made progress, I now have a new error message after setting ssl=true

CleanShot 2023-09-22 at 10 15 21@2x

Again, this feels like a fair point, there seems to be no password or auth. So I threw in my starburst password. Still no dice:

CleanShot 2023-09-22 at 10 20 06@2x

I'm not really sure. Maybe I should be using a service account for this?
EDIT: No luck with a service account either

@maxstreese
Copy link
Contributor Author

Sorry I misspoke in my above comment. Starburst is cloud hosted trino. I wasn't aware they were different aside from the naming tbh

No problem and you didn't. I was aware that Starburst is Trino. But I realized that it was silly to exclude Presto. So I added the engine option. This was not related to your comment at all.

So I threw in my starburst password. Still no dice

Ah my bad, that's embarrassing. I had a bug in the auth function. Unfortunately I do not have a local test setup for this but I hope with the changes I just pushed things should work. Please let me know!

@archiewood
Copy link
Member

Hmm, still not getting there my side.
Now I'm getting a rather unhelpful, unspecified "execution error"

Unfortunately I do not have a local test setup for this but I hope with the changes I just pushed things should work. Please let me know!

FWIW you can set up a free starburst account in about 10 mins, which might help with testing

@archiewood
Copy link
Member

archiewood commented Sep 22, 2023

my main sticking point is that I feel like trino should be authing me somehow. I can see if you have a local setup then maybe that's not an issue, but presumably any prod connection will need to be authed?

So why does starburst not give me any auth params - all these things are very guessable:
CleanShot 2023-09-22 at 12 08 21@2x

I'm guessing something to do with HTTPS/SSL/TLS solves this?

I feel like I'm missing something?

Maybe some clues

CleanShot 2023-09-22 at 12 14 18@2x

@maxstreese
Copy link
Contributor Author

Let me check later this weekend. Right now when user and password are both provided, I set the basic_auth property of the client library I use (presto-client-node). I believe this should be supported by Starburst somehow. I'll sign myself up for a free account as you suggested.

@maxstreese
Copy link
Contributor Author

Hi @archiewood !

I actually think the implementation is working as is right now. I signed up with Starburst and this is what I got:

image

The password I provided is the regular password of my account.

If I then go ahead and create a sample site in the example project I get:

image

Finally, if I check the query history on Starburst I can see both, the query that Evidence uses to test the connection initially, as well as my sample query:

image

@archiewood
Copy link
Member

archiewood commented Sep 22, 2023

Nice! It's also working for me!

I hadn't realised but the free trino clusters auto suspend after 15 mins. Weird that the error didn't tell me this. Feels like it should.

CleanShot 2023-09-22 at 15 45 38@2x

@archiewood
Copy link
Member

Okay so having been through this process, I think it would be great to add a bit of docs help / explanation to go alongside this.

  1. IIUC your connector only supports password authentication? (which is fine if so) - we should say this in the docs section for trino, as Trino has many auth methods: https://trino.io/docs/current/security/authentication-types.html
  2. The other gotchas I had were using starburst specifically. I have no idea how popular starburst is vs other ways of running trino, but if its a meaningful percentage, we should maybe include a note on this too:
  • knowing that the password was required AND that the password was the one i use to login to the UI client, rather than some other key
  • knowing that I should set SSL=true
  1. more generally - should the password be required? Or are there circumstances where it is not (eg local dev)

@maxstreese
Copy link
Contributor Author

maxstreese commented Sep 25, 2023

1. RE: Authentication Methods

Correct, my implementation only supports password (i.e. basic) authentication. Two comments about this below.

The limitation here is in the available Trino client library for Node/JS, namely presto-client-node. As this does currently only support password authentication, so does my connector. One could theoretically either implement a fully custom Trino client for Evidence, or extend presto-client-node to support more authentication methods. I am currently not willing/able to do this.

Looking at the data consumer list provided by Starburst, you will find that quite a few of the available Trino clients actually only support the password based authentication types. So I think that Evidence would be in good company here.

2. RE: Should The Password Be Required?

No. There are setups besides local development where you do not need a password. You can for instance self host Trino and give access via VPN.

3. RE: Starburst Specific Documentation

I have no clue how many companies use Starburst vs how many self host but it certainly can't hurt to include information about Starburst in the docs.

The password is not necessarily the one you use to log in to the Starburst UI by the way. You could also create a dedicated service account and use that one for connecting:

image

4. Providing Documentation

Let me try to come up with something

@maxstreese
Copy link
Contributor Author

Done. Let me know if I should add/change something.

@archiewood
Copy link
Member

Looks great - thanks for the contribution!

@archiewood archiewood merged commit 84cf521 into evidence-dev:main Sep 25, 2023
19 checks passed
@mcrascal
Copy link
Member

Amazing! Thanks @maxstreese @archiewood 🎉

@maxstreese
Copy link
Contributor Author

Awesome, thanks! In case anything turns out to be amiss just let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trino/Presto
3 participants