Add column name suggestions to presto validator #1330

Merged
merged 6 commits into from Sep 26, 2023

kgopal492
Contributor

Add column name suggestions to the Presto optimizing validator:

  1. Determine whether the error message returned by the Presto explain validator is a column name error by matching it against a regex.
  2. If so, extract the invalid column name and use Elasticsearch fuzzy search to look for the correct column name across all of the tables used in the query.
  3. If the Elasticsearch query returns exactly one result (meaning there is a single plausible correct column name), return that column name as a suggestion.
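
Steps 2 and 3 can be sketched roughly as follows. This is only an illustration, not the code in this PR: `fuzzy_search_columns` and `suggest_column` are hypothetical names, and Python's `difflib` stands in for the Elasticsearch fuzzy search so the example runs on its own.

```python
from difflib import get_close_matches
from typing import Dict, List, Optional


def fuzzy_search_columns(name: str, columns: List[str]) -> List[str]:
    """Stand-in for the Elasticsearch fuzzy search (assumption: difflib instead)."""
    # Compare in lowercase, but remember the original spelling of each column.
    lowered = {column.lower(): column for column in columns}
    matches = get_close_matches(name.lower(), list(lowered), n=5, cutoff=0.8)
    return [lowered[match] for match in matches]


def suggest_column(
    invalid_name: str, table_columns: Dict[str, List[str]]
) -> Optional[str]:
    """Return a suggestion only when exactly one candidate exists across all tables."""
    candidates = set()
    for columns in table_columns.values():
        candidates.update(fuzzy_search_columns(invalid_name, columns))
    # Zero or several candidates means we cannot make a confident suggestion.
    return next(iter(candidates)) if len(candidates) == 1 else None
```

Returning nothing when several candidates match keeps the validator from guessing between equally plausible columns.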

class PrestoOptimizingValidator(BaseQueryValidator):
    def languages(self):
        return ["presto", "trino"]
class ColumnNameSuggester(BasePrestoSQLGlotDecorator):
Collaborator

I guess the column name suggester is not specific to Presto; other engines, like SparkSQL, could also benefit from it?

Contributor Author

Refactored so that we can use it with other query engines


        return validation_suggestions
    def _get_column_name_from_position(
Collaborator

Why not just get the column name from the error message "Column .* cannot be resolved"?

Contributor Author

Updated to use that regex
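
A minimal sketch of extracting the column name with that regex, assuming the error message wraps the name in single quotes (for example `line 1:8: Column 'user_idd' cannot be resolved`). The constant and function names are hypothetical, not the ones used in this PR.

```python
import re
from typing import Optional

# Assumption: the name may or may not be quoted, so the quotes are optional here.
COLUMN_ERROR_REGEX = re.compile(r"Column '?(.*?)'? cannot be resolved")


def get_invalid_column_name(error_message: str) -> Optional[str]:
    """Return the unresolved column name, or None for any other kind of error."""
    match = COLUMN_ERROR_REGEX.search(error_message)
    return match.group(1) if match else None
```

Reading the name from the message avoids mapping a line and character position back onto the query text.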

    ) -> List[QueryValidationResult]:
        return self._get_explain_validator().validate(query, uid, engine_id)
    def _search_columns_for_suggestion(self, columns: List[str], suggestion: str):
        """Return the case-sensitive column name by searching the table's columns for the suggestion text"""
Collaborator

Isn't the highlighted column name always one of the columns? Wondering if this function is necessary.

Contributor Author

The highlighted column is always lowercase; this function makes sure we get the case-sensitive version of the column name.
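
The lookup described here could look something like the sketch below. It is a standalone, hypothetical version (the method in the PR is `_search_columns_for_suggestion` on the validator class, and its body is not shown in this conversation), and it assumes a plain case-insensitive comparison is enough.

```python
from typing import List, Optional


def search_columns_for_suggestion(
    columns: List[str], suggestion: str
) -> Optional[str]:
    """Map a lowercase suggestion back to the column's original casing."""
    target = suggestion.lower()
    for column in columns:
        if column.lower() == target:
            return column
    # The suggestion did not correspond to any known column.
    return None
```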

        return PrestoExplainValidator("")

    def _get_decorated_validator(self) -> BaseQueryValidator:
        return UnionAllValidator(
Collaborator

What's the reason for changing from a list of validators to a chain of validators?

Contributor Author

Using the decorator pattern so that we can add suggestions on top of the validation messages
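
As a rough illustration of the decorator pattern in this setting: each validator wraps another one, calls it, and then adds to its results. The classes below are hypothetical stand-ins, not Querybook's `BaseQueryValidator`, `UnionAllValidator`, or `ColumnNameSuggester`, and messages are plain strings for brevity.

```python
from typing import Callable, List, Optional


class Validator:
    """Hypothetical minimal interface for this sketch."""

    def validate(self, query: str) -> List[str]:
        raise NotImplementedError


class ExplainValidator(Validator):
    """Innermost validator: pretends the engine rejects one misspelled column."""

    def validate(self, query: str) -> List[str]:
        if "user_idd" in query:
            return ["Column 'user_idd' cannot be resolved"]
        return []


class SuggestionDecorator(Validator):
    """Wraps another validator and appends a suggestion to its messages."""

    def __init__(self, inner: Validator, suggest: Callable[[str], Optional[str]]):
        self._inner = inner
        self._suggest = suggest

    def validate(self, query: str) -> List[str]:
        decorated = []
        for message in self._inner.validate(query):
            suggestion = self._suggest(message)
            if suggestion:
                message = f"{message} (did you mean '{suggestion}'?)"
            decorated.append(message)
        return decorated
```

With a flat list of validators, each one only sees the query; with a chain, an outer validator also sees the inner validator's messages and can enrich them.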

Collaborator

jczhong84 left a comment

Thanks for adding test cases!

kgopal492 merged commit 5ffcb26 into pinterest:master on Sep 26, 2023
3 checks passed
kgopal492 deleted the query-metadata-opt branch on September 26, 2023 19:41
aidenprice pushed a commit to arrowtail-precision/querybook that referenced this pull request Jan 3, 2024
* Add table & column name suggestions to presto validator