Skip to content

Conversation

@rmoneys
Copy link
Contributor

@rmoneys rmoneys commented Aug 4, 2022

https://synccomputing.atlassian.net/browse/PROD-399

Handles edge case in which a single rollover log file is provide and it has a non-zero index

@sonarqubecloud
Copy link

sonarqubecloud bot commented Aug 4, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@rmoneys rmoneys requested review from NKSync and gorskysd August 4, 2022 03:19
if not os.path.isdir(args.result_dir):
if not args.result_dir.is_dir():
logger.error("%s is not a directory", args.result_dir)
sys.exit(1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should refrain from using sys.exit() here. In the case the spark_log_parser is used anywhere in the backend or celery task, it would terminate the process, and likely cause unintended side effects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're ok here. This code is only executed by users on the command line, e.g.

python -m spark_log_parser

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, makes sense. It caught my eye because some areas we raise a ValueError but here we exit the process.

@NKSync
Copy link

NKSync commented Aug 4, 2022

I assume these scenarios encompass the different possibilities we can see. Is it still the case where we're unable to detect if the last rollover file is missing?

a.) The first rollover file is missing
b.) a rollover file in-between is missing
c.) the last rollover file is missing

@rmoneys
Copy link
Contributor Author

rmoneys commented Aug 4, 2022

Yeah, we can't tell if we're missing the last rollover log files

@NKSync NKSync self-requested a review August 4, 2022 18:50
Copy link

@NKSync NKSync left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my initial glance, the PR looks good to me. Nice work on handling the different cases!


diffs = df.rollover_index.diff()[1:]

if any(diffs > 1) or df.rollover_index[0] > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember seeing this in the last PR to check for the case this PR is addressing. Did this logic just not get run if there was only one log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. With only 1 log file rollover validation was skipped.

Copy link
Contributor

@gorskysd gorskysd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I can confirm that we can't outright detect if a log is missing at the end. However, if that is the case another error will be thrown downstream during the parsing which indicates missing rollover as a possibility.

@rmoneys rmoneys merged commit d4525b9 into main Aug 4, 2022
@rmoneys rmoneys deleted the PROD-399-validation branch August 4, 2022 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants