

@gorskysd
Contributor

Sometimes the SQL Submission or Completion event is missing from the event log, which causes an error. However, these events are not critical for the predictor run. This change works around the problem by discarding the SQL data for any query whose Submission or Completion event is missing.
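A minimal sketch of the approach described above. It assumes the SQL Submission and Completion events correspond to Spark's SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd event-log entries, each carrying an executionId and a time field. The helper names collect_sql_times and drop_incomplete are hypothetical and not taken from the PR.

```python
import json
from typing import Dict, Iterable

# Event names as they usually appear in a Spark event log; treat these as
# assumptions about the log format, not as the project's own constants.
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"
SQL_END = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionEnd"


def collect_sql_times(lines: Iterable[str]) -> Dict[int, dict]:
    """Hypothetical helper: gather start/end times per SQL executionId."""
    sql_by_id: Dict[int, dict] = {}
    for line in lines:
        event = json.loads(line)
        name = event.get("Event")
        if name == SQL_START:
            sql_by_id.setdefault(event["executionId"], {})["start_time"] = event["time"]
        elif name == SQL_END:
            sql_by_id.setdefault(event["executionId"], {})["end_time"] = event["time"]
    return sql_by_id


def drop_incomplete(sql_by_id: Dict[int, dict]) -> Dict[int, dict]:
    """Keep only executions that have both a start and an end timestamp."""
    return {
        eid: sql
        for eid, sql in sql_by_id.items()
        if "start_time" in sql and "end_time" in sql
    }


log = [
    json.dumps({"Event": SQL_START, "executionId": 0, "time": 1000}),
    json.dumps({"Event": SQL_END, "executionId": 0, "time": 1500}),
    json.dumps({"Event": SQL_START, "executionId": 1, "time": 2000}),  # no end event
]
complete = drop_incomplete(collect_sql_times(log))
print(complete)  # {0: {'start_time': 1000, 'end_time': 1500}}
```

Query 1 is silently dropped instead of causing an error later, which matches the trade-off described in the PR: the SQL data is informative but not critical.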

Also added a test for this case, which shows the effect on the sqlData dataframe.

Also updated the version.

@gorskysd requested review from NKSync, malino, and rmoneys on August 19, 2022 01:13
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No coverage information
0.0% Duplication

# Sometimes an SQL event will be missing. To be informative, both
# events must be present. But this information is not critical, so
# if either event is missing then simply reject the SQL data
if "start_time" not in sql.keys() or "end_time" not in sql.keys():
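For context, the check above might sit inside a loop like the following. This is a hypothetical reconstruction, not the project's code: the build_sql_rows helper and the sql_id and duration columns are assumptions. Without the guard, a record missing end_time would raise KeyError at sql["end_time"].

```python
from typing import Dict, List


def build_sql_rows(sql_by_id: Dict[int, dict]) -> List[dict]:
    """Hypothetical helper that turns per-query SQL records into table rows."""
    rows = []
    for sql_id, sql in sorted(sql_by_id.items()):
        # Sometimes an SQL event will be missing. To be informative, both
        # events must be present. But this information is not critical, so
        # if either event is missing then simply reject the SQL data
        if "start_time" not in sql.keys() or "end_time" not in sql.keys():
            continue
        rows.append(
            {
                "sql_id": sql_id,
                "start_time": sql["start_time"],
                "end_time": sql["end_time"],
                "duration": sql["end_time"] - sql["start_time"],
            }
        )
    return rows


sql_by_id = {
    0: {"start_time": 1000, "end_time": 1500},
    1: {"start_time": 2000},  # Completion event missing
}
rows = build_sql_rows(sql_by_id)
print(rows)  # [{'sql_id': 0, 'start_time': 1000, 'end_time': 1500, 'duration': 500}]
```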

Not critical, but just to understand:

Could you just check if "end_time" not in sql.keys()? This assumes that if an end_time exists, then a start_time should as well.

Contributor Author


This won't work because "start_time" and "end_time" come from separate events in the Spark eventlog. So it's possible for one, the other, or both to be missing.
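A small illustration of this point, using hypothetical per-query records: because the two timestamps come from different events, a record can have end_time without start_time, and the single end_time check would let it through.

```python
# Hypothetical per-query records after merging the two kinds of events.
records = {
    0: {"start_time": 1000, "end_time": 1500},  # both events present
    1: {"start_time": 2000},                    # Completion event lost
    2: {"end_time": 3500},                      # Submission event lost
}


def keep_end_only(sql: dict) -> bool:
    """The suggested single check: only require end_time."""
    return "end_time" in sql


def keep_both(sql: dict) -> bool:
    """The merged check: require both timestamps."""
    return "start_time" in sql and "end_time" in sql


print(sorted(k for k, v in records.items() if keep_end_only(v)))  # [0, 2]
print(sorted(k for k, v in records.items() if keep_both(v)))      # [0]
```

With the single check, query 2 passes, and any later calculation such as sql["end_time"] - sql["start_time"] would raise KeyError for it.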


Ah, I see.


@NKSync left a comment


Looks good to me. Nice 👍

Contributor

@rmoneys left a comment


Looks great!

@gorskysd merged commit 2c25284 into main on Aug 19, 2022
@gorskysd deleted the PROD-411-hui-adobe-parser-error-databricks-on-azure branch on August 19, 2022 18:42

Labels: None yet

Projects: None yet


4 participants