Add option to do more precise ACL checks #15269

Merged 1 commit on Oct 17, 2020

@prithvip prithvip commented Oct 5, 2020

Before this change, Presto would check for column access permission on
all columns referenced in any part of the query.
This behavior can sometimes be undesirable, for example, in this query:

SELECT name FROM (SELECT * FROM nation)

During execution of this query, access checks would be performed on all
columns in the table nation, even though only the column name would
actually be read during the execution of the query, and the other
columns in the table have no impact on the query results.

This change introduces a new session property,
check_access_control_on_utilized_columns_only, which, when
enabled, will only perform access control checks on columns that would
actually be required to produce the query output, ignoring columns that
are referenced in the query, but are not required to compute the query
results.
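
A minimal sketch of how the property might be exercised from a client session, using Presto's `SET SESSION` statement. It assumes `nation` resolves in the session's current catalog and schema, which is not stated above.

```sql
-- Enable the more precise check for this session only.
SET SESSION check_access_control_on_utilized_columns_only = true;

-- With the property enabled, only nation.name should need column access;
-- the other columns pulled in by SELECT * do not reach the output.
SELECT name FROM (SELECT * FROM nation);
```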

Test plan -

  1. Unit tests to cover a good range of query shapes.
  2. Test on production workload queries.

== RELEASE NOTES ==

General Changes
* Add session property ``check_access_control_on_utilized_columns_only``, which, when enabled, only performs access control checks on columns that would actually be required to produce the query output, ignoring columns that are referenced in the query, but are not required to compute the query results.

@kaikalur kaikalur left a comment

First set of comments.

kaikalur commented Oct 6, 2020

Will be good to have more expression coverage - look at the g4 file and add all non-standard places that can have expressions - patterns like lambdas, using, grouping sets, order by of aggs etc.

kaikalur commented Oct 6, 2020

Also update the title to say make the ACL checks more precise to include any and all that impact the final output.

@rongrong rongrong left a comment

Test failures seem to be related?

@prithvip prithvip force-pushed the fix-column-access branch 3 times, most recently from cbd8b0f to bbc4184 Compare October 10, 2020 04:02
@prithvip prithvip changed the title Add option to do ACL checks only on those columns that won't be pruned Add option to do more precise ACL checks Oct 10, 2020
@prithvip prithvip force-pushed the fix-column-access branch 2 times, most recently from f641d38 to b8e6165 Compare October 12, 2020 17:24
@prithvip prithvip force-pushed the fix-column-access branch 2 times, most recently from e0b6bf9 to a600db5 Compare October 12, 2020 22:48
@rongrong rongrong marked this pull request as ready for review October 13, 2020 19:05
@prithvip

Will be good to have more expression coverage - look at the g4 file and add all non-standard places that can have expressions - patterns like lambdas, using, grouping sets, order by of aggs etc.

Done, added a bunch of tests to cover expressions

Also update the title to say make the ACL checks more precise to include any and all that impact the final output.

Done

@kaikalur kaikalur left a comment

Thanks for the extensive set of tests! A few more widely used cases are missing; please add those.

@kaikalur kaikalur left a comment

Also add window functions - both in aggs as well as partition by and order by without referring to them in the select list.

@prithvip

Also add window functions - both in aggs as well as partition by and order by without referring to them in the select list.

I have a testWindowFunction which covers column references in a PARTITION BY and ORDER BY clause in a window function

@kaikalur

Also add window functions - both in aggs as well as partition by and order by without referring to them in the select list.

I have a testWindowFunction which covers column references in a PARTITION BY and ORDER BY clause in a window function

+1

@kaikalur kaikalur left a comment

Just add one more test for struct/row types - I know we do only top level structs but good to have coverage. Similarly, add cases that are potentially short-circuited, like:

IF(FALSE, x, y) -> ACL both x and y
WHERE x > 0 AND false AND y < 10 -> ACL both x,y

@prithvip

Just add one more test for struct/row types - I know we do only top level structs but good to have coverage. Similarly, add cases that are potentially short-circuited, like:

IF(FALSE, x, y) -> ACL both x and y
WHERE x > 0 AND false AND y < 10 -> ACL both x,y

I have a test for ROW in testConstructors, and I added tests covering these short-circuit cases.
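
For illustration, the two short-circuit shapes might look like the following as full queries. The table `t` and the columns `x` and `y` are hypothetical, and these are not the tests added in this PR; the comments only restate the expectation from the review comment.

```sql
SET SESSION check_access_control_on_utilized_columns_only = true;

-- The condition is constant, but both branches are still expected
-- to be access-checked: x and y.
SELECT IF(FALSE, x, y) FROM t;

-- The predicate can never be true, but both x and y are still expected
-- to be access-checked.
SELECT 1 FROM t WHERE x > 0 AND false AND y < 10;
```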

@rongrong rongrong merged commit bef17e6 into prestodb:master Oct 17, 2020
@caithagoras caithagoras mentioned this pull request Oct 19, 2020