HPCC-17967 Prevent XREF scanning duplicate nodes and misreporting #10407

Merged
merged 1 commit into hpcc-systems:master from jakesmith:hpcc-17967 Sep 11, 2017

Member

jakesmith commented Sep 7, 2017

XREF was treating every entry in the cluster groups as a separate
node to scan. As a consequence, on clusters with slavesPerNode>1 it
would scan the same node directories N times.
As a side effect, the overscanning made some reports inaccurate,
e.g. the number of orphan file parts was reported as N times
larger than it actually was.
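
The idea behind the fix can be illustrated with a minimal sketch. This is not the code from the commit (the platform itself is C++); it only assumes that a cluster group can be modelled as a list of (ip, port) entries, one per slave, so that a node running several slaves appears several times. The name `unique_nodes`, the addresses, and the port numbers are all hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical model: one (ip, port) entry per slave process in the group.
Endpoint = Tuple[str, int]


def unique_nodes(group: Iterable[Endpoint]) -> List[str]:
    """Return each node IP once, in first-seen order, ignoring the port."""
    seen = set()
    nodes = []
    for ip, _port in group:
        if ip not in seen:
            seen.add(ip)
            nodes.append(ip)
    return nodes


# Made-up group: two physical nodes with two slaves each gives four entries.
group = [
    ("10.0.0.1", 20100),
    ("10.0.0.1", 20101),
    ("10.0.0.2", 20100),
    ("10.0.0.2", 20101),
]

# Scanning per entry would visit 4 "nodes"; scanning per unique IP visits 2.
nodes_to_scan = unique_nodes(group)
```

With the made-up group above, `nodes_to_scan` holds two addresses instead of four, so each node's directories would be scanned once.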

Signed-off-by: Jake Smith <jake.smith@lexisnexisrisk.com>

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Testing:

Tested by creating fake issues for XREF to find, e.g. spurious orphan physical files.
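
As a rough illustration of this kind of check, the toy model below plants spurious files in a temporary directory and counts them twice: once per group entry (the overscanning behaviour described above) and once per distinct directory. It does not run XREF or use HPCC's real part-file naming; `plant_orphans`, `count_files`, `demo`, and the file names are hypothetical stand-ins.

```python
import os
import tempfile


def plant_orphans(directory, count):
    """Create `count` spurious files (made-up names) in `directory`."""
    for i in range(count):
        with open(os.path.join(directory, f"spurious_{i}.part"), "w") as f:
            f.write("x")


def count_files(directories):
    """Count regular files, scanning every listed directory (repeats included)."""
    total = 0
    for d in directories:
        total += sum(
            1 for name in os.listdir(d) if os.path.isfile(os.path.join(d, name))
        )
    return total


def demo(slaves_per_node=3, orphans=2):
    """Return (count when scanning per group entry, count when scanning per node)."""
    with tempfile.TemporaryDirectory() as node_dir:
        plant_orphans(node_dir, orphans)
        per_entry = [node_dir] * slaves_per_node   # same directory listed N times
        per_node = list(dict.fromkeys(per_entry))  # each directory listed once
        return count_files(per_entry), count_files(per_node)
```

In this model the per-entry count is the planted count multiplied by `slaves_per_node`, while the per-node count equals the number of files actually planted.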


jakesmith force-pushed the jakesmith:hpcc-17967 branch from b9d1d31 to 013a769 Sep 7, 2017

jakesmith changed the title from "HPCC-17967 Prevent XREF scanning duplicates nodes and misreporting" to "HPCC-17967 Prevent XREF scanning duplicate nodes and misreporting" Sep 7, 2017

Member Author

jakesmith commented Sep 7, 2017

@richardkchapman - this is targeting master, please move JIRA fix version to 7.0.0

Member Author

jakesmith commented Sep 7, 2017

@AttilaVamos - please review

Contributor

HPCCSmoketest commented Sep 7, 2017

Automated Smoketest:
Sha: 013a769
Build: success
Build: success
ECL Watch: Rebuilding Site

errors  warnings  build time
0       75        43.483 seconds

Install hpccsystems-platform-community_6.5.0-trunk0.el7.x86_64.rpm
HPCC Start: OK

Unit tests result:

Test                   total  passed  failed  errors  timeout
unittest               92     92      0       0       0
wutoolTest(Dali)       19     19      0       0       0
wutoolTest(Cassandra)  19     19      0       0       0

Regression test result:

phase          total  pass  fail
setup (hthor)  11     11    0
setup (thor)   11     11    0
setup (roxie)  11     11    0
test (hthor)   744    744   0
test (thor)    639    639   0
test (roxie)   772    772   0

HPCC Stop: OK
HPCC Uninstall: OK

Contributor

AttilaVamos left a comment

It seems good but is there a way to test it? I mean can we add a test into Regression Suite to test this functionality?

Member Author

jakesmith commented Sep 8, 2017

It seems good but is there a way to test it? I mean can we add a test into Regression Suite to test this functionality?

It would be a lot of work to do so. I could open a separate JIRA to work on a test suite for XREF, but it would be far from trivial.

Contributor

AttilaVamos commented Sep 8, 2017

@jakesmith all right, your changes seem good.

richardkchapman merged commit a165d1a into hpcc-systems:master Sep 11, 2017
