Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented logic to prevent tasks from having more than specified number of partitions #837

Merged
merged 9 commits into from Jun 22, 2021

Conversation

jzakaryan
Copy link
Collaborator

Important: DO NOT REPORT SECURITY ISSUES DIRECTLY ON GITHUB.
For reporting security issues and contributing security fixes,
please, email security@linkedin.com instead, as described in
the contribution guidelines.

Please, take a minute to review the contribution guidelines at:
https://github.com/linkedin/Brooklin/blob/master/CONTRIBUTING.md

Copy link
Collaborator

@vmaheshw vmaheshw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the review for the code modification, pointed one TODO item and one change in already committed code.

int findTaskWithRoomForAPartition(List<String> tasks, Map<String, Set<String>> partitionMap, int startIndex,
int maxPartitionsPerTask) {
for (int i = 0; i < tasks.size(); i++) {
int currentIndex = (startIndex + i) % tasks.size();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a scope of improvement here, especially if the task count is large and most of them are full. We can improve this piece in future if needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to suggestions on how to improve it.

@@ -117,7 +117,7 @@ public LoadBasedPartitionAssignmentStrategy(PartitionThroughputProvider throughp
// Doing assignment
LoadBasedPartitionAssigner partitionAssigner = new LoadBasedPartitionAssigner();
return partitionAssigner.assignPartitions(clusterThroughputInfo, currentAssignment,
unassignedPartitions, datastreamPartitions);
unassignedPartitions, datastreamPartitions, _partitionsPerTask);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_partitionsPerTask represents the ideal number of partition distribution per task which is used along with _partitionFullnessFactorPct which represent how much percent to fill. This count is not a hard check, but a desired count. _maxPartitionsPerTask is the ultimate count is enforced. If you enforce based on _partitionsPerTask and numTasks are getting limited by smaller maxTask configuration, the entire pipeline will be stuck till the maxTasks configuration is not corrected. One of the most important purpose of enforcing _maxPartitionsPerTask is to avoid zk node size going > 1MB.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add to above point, If _enablePartitionNumBasedTaskCountEstimation is enabled, then we can consider honoring _partitionsPerTask and _partitionFullnessFactorPct along with _maxPartitionsPerTask, which means we will first try to fill in upto _partitionsPerTask if we can, and if we cannot then will stretch upto _maxPartitionsPerTask.

There is a scope of reusing the alert here PARTITIONS_PER_TASK_NEEDS_ADJUSTMENT

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, I'm simply using _maxPartitionsPerTask. We can discuss usage of partitionsPerTask and partitionFullnessFactorPct offline.

@jzakaryan jzakaryan requested a review from vmaheshw June 18, 2021 22:15
Copy link
Collaborator

@vmaheshw vmaheshw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One javadoc nit and keeping maxPartitionsPerTask and partitionsPerTask discussion open. But, this PR does not have to be blocked on that and can be incremental fix.

@@ -40,9 +44,10 @@
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new param missing in the javadoc

@jzakaryan jzakaryan merged commit 3b16a2f into linkedin:master Jun 22, 2021
vmaheshw pushed a commit to vmaheshw/brooklin that referenced this pull request Mar 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants