New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect URL and Email in unspecified strings #4503
Conversation
@@ -109,7 +109,7 @@ export function formatEmail(value, { jsx } = {}) { | |||
if (jsx && EMAIL_WHITELIST_REGEX.test(value)) { | |||
return <ExternalLink href={"mailto:" + value}>{value}</ExternalLink>; | |||
} else { | |||
value; | |||
return value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this fixes an existing bug with email addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks for doing this. I think the URL and email regexes we use when we know the column should be a URL or email are way too lax for when we don't know the type. e.x. the email regex will match any string with a If we were to do this I think we'd want much more strict regexes for the fallback version. Also, in the not so distant future you'll also be able to mark columns in SQL query with the Email/URL (and other) types. @salsakran What are your thoughts on whether we should do this? |
Also, I'm wondering if we should just run the same or similar metadata detection on native/SQL query results that we run on the db schema sync? |
long term we'll be allowing the use of a query to be used as the source table for another query (#1822), and we should try to guess subsequent type info if we can. In the interim, I'm not opposed to treating result urls as urls in situations where we can guess them. Are there any security issues opened up by this? The only thing I can think of is someone crafting a weird sql generated url that someone elses clicks. This would be a GET, and to the best of my recollection we don't have side effecting GET api endpoints do we? @tlrobinson @camsaul ? |
I can't think of any additional security implications. We're already URL-ifying user data when the column is marked as a URL or email, and if Metabase is vulnerable to CSRF that should be fixed regardless. Our If anyone can think of any other issues let me know. |
Thanks for the feedback! Good point regarding the lax regexes. I'd be happy to dig up some stricter (and hopefully RFC-compliant) regexes for URLs and email addresses. |
@salsakran Our GET endpoints don't have any side effects besides adding records to the ViewLog and QueryExecution tables |
cool, like I said, I'm 👍 on the approach (for now, until we figure out a table analysis story for #1822), haven't inspected the code yet. |
To check performance impact of stricter regexes, I ran a simple benchmark (using
I think this is acceptable. |
@wvengen Thanks for finding some better regexes. I think we should keep the same limited set of protocols for the URL regex, though. Just replace |
@tlrobinson thanks for your reply. I'm still for the more extended protocol list, but will put that in a different PR. |
@wvengen I'm open to it, we just wanted to be start out pretty conservative. @metabase/core-developers What are your thoughts on the expanded list of protocols? |
@tlrobinson I'm with you, I don't think a lot of people are using |
Since I've amended the commit: this protocol list comes from MediaWiki. |
Can we merge the current PR, without the longer list of protocols? It looks like we agree on that. |
FWIW I think the regexes, e.g. const URL_WHITELIST_REGEX = /^(https?|mailto):\/*(?:[^:@]+(?::[^@]+)?@)?(?:[^\s:/?#]+|\[[a-f\d:]+])(?::\d+)?(?:\/[^?#]*)?(?:\?[^#]*)?(?:#.*)?$/i; would be a good thing to have some tests around 😺 It's not really feasible to verify that it's correct just by looking at it |
Sounds good. There were some tests already, I've expanded them a bit more. |
implements #4493
(if you like this feature, please add a thumbs-up reaction here and in the issue)