HPCC-16965 Performance warnings from Cassandra workunit code #9542

Merged

richardkchapman
Member

No description provided.

Fix "Aggregation query used without partition key" warning when counting total
number of workunits. Execute counts independently and asynchronously on all
partitions instead.

Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
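
The commit message above describes replacing one cluster-wide aggregate with one count per partition, issued without waiting for the others. A minimal sketch of that fan-out and gather pattern follows. It does not use the Cassandra driver: `Partition`, `countPartition`, and `countAllWorkunits` are hypothetical stand-ins, and `std::async` is only an assumed substitute for the driver's asynchronous calls.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for one Cassandra partition: just the rows it holds.
using Partition = std::vector<std::string>;

// Stand-in for a count query restricted to a single partition key.
std::size_t countPartition(const Partition &partition)
{
    return partition.size();
}

// Start one count per partition without waiting, then sum the results.
std::size_t countAllWorkunits(const std::vector<Partition> &partitions)
{
    std::vector<std::future<std::size_t>> pending;
    pending.reserve(partitions.size());
    for (const Partition &p : partitions)
        pending.push_back(std::async(std::launch::async,
                                     [&p] { return countPartition(p); }));

    std::size_t total = 0;
    for (auto &f : pending)
        total += f.get();   // waits only for this partition's result
    return total;
}
```

Because every request is in flight before the first result is read, the total wait is roughly that of the slowest partition rather than the sum of all of them.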
@hpcc-jirabot

@richardkchapman
Member Author

@ghalliday Please review

@@ -3670,39 +3712,31 @@ class CCasssandraWorkUnitFactory : public CWorkUnitFactory, implements ICassandr
    unsigned validateRepository(bool fix)
    {
        unsigned errCount = 0;
        // MORE - if the batch gets too big you may need to flush it occasionally
        CassandraBatch batch(fix ? cass_batch_new(CASS_BATCH_TYPE_LOGGED) : NULL);
        CIArrayOf<CassandraStatement> secondaryBatch;
minor: unused I think

@ghalliday
Member

@richardkchapman all looks good to me. One comment about an unused variable.

Change partitioning so that all tables use same partitioning column as far as
possible. Updates to the parent and all children can thus be done efficiently
in a single unlogged batch.

Changes to secondary tables used for searching are moved into a separate
batch, implemented as independent async Cassandra calls for best performance.

Change validateRepository code to use much smaller and more appropriate
batches.

Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
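
The split described in this commit (parent and child rows sharing a partitioning column go into one batch, search-table updates are issued separately) can be sketched as a planning step. This is an illustration only, not the code in the PR: `Write`, `Plan`, `planWrites`, and any table names used with them are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one pending write.
struct Write
{
    std::string table;      // parent, child, or search table name
    std::string partition;  // value of the shared partitioning column
    bool secondary;         // true for tables used only for searching
};

struct Plan
{
    // One batch per partition value: parent and child rows that share it.
    std::map<std::string, std::vector<Write>> batches;
    // Search-table writes, each meant to be sent as its own async call.
    std::vector<Write> independent;
};

// Route each write either into its partition's batch or onto the
// independent list, depending on whether it targets a search table.
Plan planWrites(const std::vector<Write> &writes)
{
    Plan plan;
    for (const Write &w : writes)
    {
        if (w.secondary)
            plan.independent.push_back(w);
        else
            plan.batches[w.partition].push_back(w);
    }
    return plan;
}
```

Keeping each batch to a single partition is what makes an unlogged batch a reasonable choice here, since all of its statements land on the same replicas.
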
Deleting and then adding some exceptions within a single workunit commit would
end up losing the newly added ones - this is because of how Cassandra batches
work when rows are both deleted and updated within the same batch. I doubt
this would cause any issues in practice outside of test suite code.
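
One way to picture the lost rows, assuming the usual explanation that statements in a batch share a write timestamp and that a delete wins a timestamp tie, is the simplified model below. `Cell` and `reconcile` are hypothetical names, and the tie rule between two live values is deliberately not modelled.

```cpp
#include <cstdint>
#include <optional>
#include <string>

struct Cell
{
    std::int64_t timestamp = -1;       // write time; -1 means never written
    std::optional<std::string> value;  // an empty optional models a tombstone
};

// Simplified reconciliation: the newer timestamp wins, and on a tie the
// tombstone wins. Ties between two live values are not modelled here.
Cell reconcile(const Cell &a, const Cell &b)
{
    if (a.timestamp != b.timestamp)
        return a.timestamp > b.timestamp ? a : b;
    return !a.value ? a : b;
}
```

With a delete and an insert both stamped 100, `reconcile` returns the tombstone whichever order they arrive in; stamping the insert 101 lets the new value survive.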

Also, removing certain information from a workunit - specifically the file
associations but there may be other instances - would result in this
information not being properly committed. We need to bind NULL to any columns
that we want cleared in an update.
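
The difference between leaving a column out of an update and binding NULL to it can be modelled as follows. This is a sketch with hypothetical names (`Row`, `Bindings`, `applyUpdate`), not driver code; in the DataStax C/C++ driver the explicit NULL would be bound with a call such as `cass_statement_bind_null`.

```cpp
#include <map>
#include <optional>
#include <string>

using Row = std::map<std::string, std::string>;
// A bound column: a value, or std::nullopt for an explicit NULL.
using Bindings = std::map<std::string, std::optional<std::string>>;

// Columns missing from `bindings` keep their stored value; a NULL binding
// clears the column.
void applyUpdate(Row &row, const Bindings &bindings)
{
    for (const auto &[column, value] : bindings)
    {
        if (value)
            row[column] = *value;
        else
            row.erase(column);
    }
}
```

An update that simply omits a removed column leaves the old value in place, which matches the symptom described above; only the explicit NULL removes it.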

Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
@richardkchapman
Member Author

@ghalliday I have removed the unused variable and repushed

@HPCCSmoketest
Contributor

Automated Smoketest
Sha: dab7202
Build: success
ECL Watch: Rebuilding Site

| errors | warnings | build time |
| --- | --- | --- |
| 0 | 65 | 93.334 seconds |

Install hpccsystems-platform-community_6.3.0-trunk0.el7.x86_64.rpm
HPCC Start: OK

Unittest result:

| total | passed | failed | timeout |
| --- | --- | --- | --- |
| 86 | 86 | 0 | 0 |

HPCC Stop: OK
HPCC Uninstall: OK

@ghalliday ghalliday merged commit a1de081 into hpcc-systems:candidate-6.4.0 Feb 1, 2017
@richardkchapman richardkchapman deleted the cassandra-warn4 branch February 10, 2017 13:16