[PROD-399] Handle edge case of single rollover file with index > 0 #9
Kudos, SonarCloud Quality Gate passed!
```diff
-if not os.path.isdir(args.result_dir):
+if not args.result_dir.is_dir():
     logger.error("%s is not a directory", args.result_dir)
     sys.exit(1)
```
I think we should avoid calling sys.exit() here. If spark_log_parser is used anywhere in the backend or in a Celery task, this would terminate the process and likely cause unintended side effects.
I think we're OK here. This code is only executed by users on the command line, e.g. `python -m spark_log_parser`.
OK, makes sense. It caught my eye because in some places we raise a ValueError, but here we exit the process.
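One common way to reconcile the two approaches mentioned here (raising ValueError in some places, exiting in others) is to raise in library code and convert the exception into an exit status only at the command-line entry point. The sketch below assumes that split; `validate_result_dir`, `main`, and `run_cli` are hypothetical names, not functions from spark_log_parser.

```python
import logging
import sys
from pathlib import Path

logger = logging.getLogger(__name__)


def validate_result_dir(result_dir: Path) -> Path:
    """Hypothetical library-level check: raise instead of exiting so callers
    such as a backend service or Celery task can catch and handle the error."""
    if not result_dir.is_dir():
        raise ValueError(f"{result_dir} is not a directory")
    return result_dir


def main(result_dir: str) -> int:
    """Hypothetical CLI wrapper: log the error and return an exit status."""
    try:
        validate_result_dir(Path(result_dir))
    except ValueError as exc:
        logger.error("%s", exc)
        return 1
    return 0


def run_cli() -> None:
    """Hypothetical `python -m` entry point: the only place that exits."""
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "."))
```

With this split, a package's `__main__.py` would call only `run_cli()`, while importers use `validate_result_dir` and never see the process terminate.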
I assume these scenarios cover the different possibilities we can encounter. Is it still the case that we're unable to detect whether the last rollover file is missing?
a.) The first rollover file is missing

Yeah, we can't tell if we're missing the last rollover log files.
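To illustrate why a trailing gap is invisible, here is a minimal pure-Python sketch that applies the two checks discussed in this PR to a list of rollover indices: a first index above 0 means leading files are missing, and a jump greater than 1 means a file in the middle is missing. The helper `detect_rollover_gaps` is hypothetical and assumes non-negative integer indices.

```python
def detect_rollover_gaps(indices):
    """Report problems that can be inferred from rollover indices alone.

    Hypothetical helper for illustration; assumes non-negative int indices.
    """
    ordered = sorted(indices)
    if not ordered:
        return ["no rollover files found"]
    problems = []
    # A first index above 0 means one or more leading files are absent.
    if ordered[0] > 0:
        problems.append(f"leading file(s) missing: first index is {ordered[0]}")
    # A jump of more than 1 between neighbours means a middle file is absent.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            problems.append(f"gap between index {prev} and {cur}")
    # Nothing here records how many files there *should* be, so a lost
    # trailing file (0, 1, 2 present, 3 missing) produces no problem at all.
    return problems


print(detect_rollover_gaps([2]))        # single file with index > 0
print(detect_rollover_gaps([0, 2]))     # middle file missing
print(detect_rollover_gaps([0, 1, 2]))  # complete, or trailing file(s) lost
```

The first two calls each report one problem; the third returns an empty list whether or not a later file was lost, which is exactly the limitation described above.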
NKSync left a comment
At first glance, the PR looks good to me. Nice work handling the different cases!
```python
diffs = df.rollover_index.diff()[1:]
if any(diffs > 1) or df.rollover_index[0] > 0:
```
I remember seeing this check in the last PR, for the case this PR is addressing. Did this logic simply not run when there was only one log file?
Exactly. With only one log file, rollover validation was skipped.
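Once the check actually runs for a single file, pandas explains which clause catches this PR's edge case: for a one-row Series, `diff()` yields a lone NaN, so `diff()[1:]` is empty and `any(diffs > 1)` is `False`; only `df.rollover_index[0] > 0` can flag the file. A minimal sketch, assuming a DataFrame with a `rollover_index` column sorted ascending and a default `RangeIndex` (the wrapper name `has_missing_rollover` is hypothetical):

```python
import pandas as pd


def has_missing_rollover(df: pd.DataFrame) -> bool:
    """Hypothetical wrapper around the condition quoted in the review.

    Assumes `rollover_index` is sorted ascending and `df` has a default
    RangeIndex, so `df.rollover_index[0]` is the first row's value.
    """
    # Consecutive differences; the first element is always NaN, so drop it.
    diffs = df.rollover_index.diff()[1:]
    # A jump > 1 means a middle file is missing; a first index > 0 means
    # leading files are missing. With one row, `diffs` is empty and only
    # the second test can fire.
    return bool(any(diffs > 1) or df.rollover_index[0] > 0)


for indices in ([3], [0], [0, 2], [0, 1, 2]):
    frame = pd.DataFrame({"rollover_index": indices})
    print(indices, has_missing_rollover(frame))
```

Here `[3]` and `[0, 2]` are flagged, while `[0]` and `[0, 1, 2]` pass, since a missing trailing file leaves no trace in the indices.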
gorskysd left a comment
Looks good to me.
I can confirm that we can't directly detect whether a log file is missing at the end. However, in that case another error will be raised downstream during parsing, and it indicates missing rollover files as a possible cause.

https://synccomputing.atlassian.net/browse/PROD-399
Handles the edge case in which a single rollover log file is provided and it has a non-zero index.