Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SQLQueryUtils to extract multiple tables #2784

Merged
merged 2 commits into from
Jun 28, 2024

Conversation

ykmr1224
Copy link
Collaborator

@ykmr1224 ykmr1224 commented Jun 27, 2024

Description

  • Fix SQLQueryUtils to extract multiple tables
    • Originally had some issue where alias was also extracted as part of table name, and this change fixes it.
    • This method will be used to identify datasources from the query.

Issues Resolved

n/a

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • [-] New functionality has been documented.
    • [-] New functionality has javadoc added
    • [-] New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ykmr1224 ykmr1224 added backport 2.x maintenance Improves code quality, but not the product labels Jun 27, 2024
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Copy link

codecov bot commented Jun 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.72%. Comparing base (8eae36f) to head (6cba94b).
Report is 1 commits behind head on main.

Current head 6cba94b differs from pull request most recent head f240702

Please upload reports for the commit f240702 to get more accurate results.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2784      +/-   ##
============================================
- Coverage     92.72%   92.72%   -0.01%     
  Complexity     4987     4987              
============================================
  Files           498      498              
  Lines         14350    14349       -1     
  Branches        942      942              
============================================
- Hits          13306    13305       -1     
  Misses         1010     1010              
  Partials         34       34              
Flag Coverage Δ
sql-engine 92.72% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
@@ -32,16 +35,15 @@
@UtilityClass
public class SQLQueryUtils {

// TODO Handle cases where the query has multiple table Names.
public static FullyQualifiedTableName extractFullyQualifiedTableName(String sqlQuery) {
public static List<FullyQualifiedTableName> extractFullyQualifiedTableNames(String sqlQuery) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we show an example SQL query where there is a bug? What is the usecase for multiple table names.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug was in case the identifier has some more elements such as alias, version, etc.
I added test cases for that (ref).

Simplest example is:

SELECT * FROM my_glue.default.http_logs alias;

Where table name was extracted as "http_logsalias".

DQS wants to extract datasource names from the input query and verify them, and that would require all the table names to be extracted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we solving the use case with multiple table names?

Copy link
Collaborator Author

@ykmr1224 ykmr1224 Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, we collect table names to a list in the visitor and return all.
I've added test cases for multi table queries.

@ykmr1224 ykmr1224 merged commit 883cc7e into opensearch-project:main Jun 28, 2024
13 of 20 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 28, 2024
* Fix SQLQueryUtils to extract multiple tables

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Improve test coverage

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
(cherry picked from commit 883cc7e)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@ykmr1224 ykmr1224 deleted the dqs/parse-multiple-table branch June 28, 2024 23:10
ykmr1224 pushed a commit that referenced this pull request Jul 11, 2024
* Fix SQLQueryUtils to extract multiple tables



* Improve test coverage



---------


(cherry picked from commit 883cc7e)

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x maintenance Improves code quality, but not the product
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants