Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-18305 Avoid follow on crash after splitter prepare exception #10441

Merged
merged 1 commit into from Sep 27, 2017

Conversation

@jakesmith
Copy link
Member

jakesmith commented Sep 21, 2017

Signed-off-by: Jake Smith jake.smith@lexisnexisrisk.com

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Testing:

The original report, was a query failing because physical parts were missing.
The error was logged, but the splitter crashed due to this issue.
Reproduced the crash by deleting the physical parts and running their example archive.
Verified fix and correct error reported.

@hpcc-jirabot

This comment has been minimized.

Copy link

hpcc-jirabot commented Sep 21, 2017

@jakesmith jakesmith force-pushed the jakesmith:hpcc-18305 branch from 0ef24a4 to fca9ce4 Sep 21, 2017
@jakesmith jakesmith changed the title HPCC-18305 Avoid follow crash after splitter prepare exception HPCC-18305 Avoid follow on crash after splitter prepare exception Sep 21, 2017
@jakesmith

This comment has been minimized.

Copy link
Member Author

jakesmith commented Sep 21, 2017

@ghalliday - please review

}
if (writeAheadException)

This comment has been minimized.

Copy link
@ghalliday

ghalliday Sep 22, 2017

Member

What happens if writeAheadException is set by reading rows ahead - isn't this failing too early? It would be a problem if the exception was being caught on one branch.

Similarly I'm not sure the nextrow logic is correct - it may throw the exception too early. E.g. with code something like:

ds := LIMIT(something, 2);
output(choose(ds,1));
output(catch(ds,...));

This comment has been minimized.

Copy link
@jakesmith

jakesmith Sep 22, 2017

Author Member

What happens if writeAheadException is set by reading rows ahead

well, yes I think there's a problem if the async thread (CWriter) hits an exception whilst reading rows (or preparing input)
Not a new problem though..
It should probably tuck the exception behind it's ear and only throw it when the smart look ahead buffer is exhausted by a reader.

Similarly I'm not sure the nextrow logic is correct - it may throw the exception too early.

So this change would cause the exception to be return on both branches of the splitter, even if e.g. the choosen(ds,1) wasn't the cause..
I guess similarly, it should only throw it on a branch if it 'reaches' the exception previous caught/tucked away.

@ghalliday

This comment has been minimized.

Copy link
Member

ghalliday commented Sep 22, 2017

@jakesmith see comment.

@ghalliday

This comment has been minimized.

Copy link
Member

ghalliday commented Sep 22, 2017

Also is this the correct target, or should it be 6.4.4?

@jakesmith

This comment has been minimized.

Copy link
Member Author

jakesmith commented Sep 22, 2017

Also is this the correct target, or should it be 6.4.4?

Wasn't sure there was going to be one, but yes probably.

Signed-off-by: Jake Smith <jake.smith@lexisnexisrisk.com>
@jakesmith jakesmith force-pushed the jakesmith:hpcc-18305 branch from fca9ce4 to f7e9fdd Sep 25, 2017
@jakesmith

This comment has been minimized.

Copy link
Member Author

jakesmith commented Sep 25, 2017

@ghalliday - please re-review

@jakesmith jakesmith closed this Sep 25, 2017
@jakesmith jakesmith reopened this Sep 25, 2017
@hpcc-jirabot

This comment has been minimized.

Copy link

hpcc-jirabot commented Sep 25, 2017

https://track.hpccsystems.com/browse/HPCC-18305
Jira not updated (pull request already registered)

@jakesmith

This comment has been minimized.

Copy link
Member Author

jakesmith commented Sep 25, 2017

accidentally closed.
@ghalliday - please re-review

Left at targeting master for now. It could be retargeted at 6.4.4, but it's not a regression.

@HPCCSmoketest

This comment has been minimized.

Copy link
Contributor

HPCCSmoketest commented Sep 25, 2017

Automated Smoketest:
Sha: f7e9fdd
Build: success
Build: success
ECL Watch: Rebuilding Site

errors warnings build time
0 75 43.333 seconds

Install hpccsystems-platform-community_6.5.0-trunk0.el7.x86_64.rpm
HPCC Start: OK

Unit tests result:

Test total passed failed errors timeout
unittest 92 92 0 0 0
wutoolTest(Dali) 19 19 0 0 0
wutoolTest(Cassandra) 19 19 0 0 0

Regression test result:

phase total pass fail
setup (hthor) 11 11 0
setup (thor) 11 11 0
setup (roxie) 11 11 0
test (hthor) 744 744 0
test (thor) 639 639 0
test (roxie) 772 772 0

HPCC Stop: OK
HPCC Uninstall: OK

}
if (!spill)
writer.start(); // writer keeps writing ahead as much as possible, the readahead impl. will block when has too much
}
return true;

This comment has been minimized.

Copy link
@ghalliday

ghalliday Sep 25, 2017

Member

@jakesmith Should this be returning false if the first instance failed to start? Should it be using a different exception member to distinguish between a problem starting and a problem when reading rows.

This comment has been minimized.

Copy link
@jakesmith

jakesmith Sep 26, 2017

Author Member

Should this be returning false if the first instance failed to start?

The failure in parepareInput triggers eofHit and any subsequent caller to writeahead will exit at start.

Should it be using a different exception member to distinguish between a problem starting and a problem when reading rows.

I considered it, but a failure on start is a failure at 0 == recsReady, i.e. the condition met by the common test in NSplitterSlaveActivity::nextRow, so think it's cleaner like it is.

@jakesmith

This comment has been minimized.

Copy link
Member Author

jakesmith commented Sep 26, 2017

@ghalliday - see my replies.

@ghalliday

This comment has been minimized.

Copy link
Member

ghalliday commented Sep 27, 2017

@richardkchapman ready to merge - but should it target 6.4.4?

@richardkchapman richardkchapman merged commit 85c5e2a into hpcc-systems:master Sep 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.