Fix scheduling for splits with locality requirement in Tardigrade #11581

arhimondr · 2022-03-19T19:41:11Z

Description

Fixes several problems related to the scheduling of splits that are not remotely accessible.

Is this change a fix, improvement, new feature, refactoring, or other?

Fix

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Core engine (Tardigrade)

How would you describe this change to a non-technical end user or system administrator?

Fixes scheduling of splits that are not remotely accessible in certain corner cases. Before this fix, some queries scanning such splits could fail.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Tardigrade
* Fix scheduling for non remotely accessible splits

...ino-main/src/main/java/io/trino/execution/scheduler/FullNodeCapableNodeAllocatorService.java

plugin/trino-tpcds/src/main/java/io/trino/plugin/tpcds/TpcdsNodePartitioningProvider.java

core/trino-main/src/main/java/io/trino/execution/scheduler/SqlQueryScheduler.java

losipiuk · 2022-03-21T19:45:40Z

core/trino-main/src/main/java/io/trino/execution/scheduler/SqlQueryScheduler.java

+                        InternalNode node = bucketNodeMap.getAssignedNode(bucket)
+                                .orElseThrow(() -> new IllegalStateException("Nodes are expected to be assigned for non dynamic BucketNodeMap"));
+                        Integer partitionId = nodeToPartition.get(node);
+                        if (partitionId == null) {


Would it make sense to have more than one partition on a single node?

Hmm, interesting question. For example to make tasks smaller for more granular retries?

Yeah for example. For partitions, we should probably opt for those to be similarly sized. Building partitions on top of the enforced bucket<->node mapping does not necessarily imply that. Not something that we need to address here. Just a random thought.

core/trino-main/src/main/java/io/trino/memory/ClusterMemoryManager.java

linzebing · 2022-03-21T21:26:08Z

core/trino-main/src/main/java/io/trino/execution/scheduler/StageTaskSourceFactory.java

-                            HostAddress existingValue = partitionToNodeMap.put(partition, bucketNodeMap.getAssignedNode(split).get().getHostAndPort());
-                            checkState(existingValue == null, "host already assigned for partition %s: %s", partition, existingValue);
+                            HostAddress requiredAddress = bucketNodeMap.getAssignedNode(split).get().getHostAndPort();
+                            Set<HostAddress> existingRequirement = partitionToNodeMap.get(partition);


Seems existingRequirement will have at most one element. Then we don't really need a set?

A split has a list of hosts in its requirement. The set is needed to support split-specific requirements.
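To illustrate why a set is useful here, the following is a minimal sketch of accumulating per-partition host requirements when each split may name several acceptable hosts. It is not the code from this PR: the class `HostRequirementSketch`, its methods, and the choice to narrow requirements by intersection are assumptions made for illustration, and hosts are plain strings rather than `HostAddress` values.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Trino: tracks which hosts can run each partition.
final class HostRequirementSketch
{
    private final Map<Integer, Set<String>> partitionToHosts = new HashMap<>();

    // Record that a split in the given partition can only run on one of splitHosts.
    // Assumption: requirements of splits in the same partition are combined by intersection.
    void require(int partition, Set<String> splitHosts)
    {
        if (splitHosts.isEmpty()) {
            throw new IllegalArgumentException("split must name at least one host");
        }
        Set<String> existing = partitionToHosts.computeIfAbsent(partition, key -> new HashSet<>());
        if (existing.isEmpty()) {
            // First requirement seen for this partition: accept all of the split's hosts.
            existing.addAll(splitHosts);
            return;
        }
        Set<String> narrowed = new HashSet<>(existing);
        narrowed.retainAll(splitHosts);
        if (narrowed.isEmpty()) {
            // No host satisfies both the existing requirement and the new split.
            throw new IllegalStateException(
                    "Unable to satisfy host requirement for partition " + partition
                            + ": existing " + existing + ", split " + splitHosts);
        }
        existing.retainAll(splitHosts);
    }

    // Immutable snapshot of the hosts that can currently run the partition.
    Set<String> hostsFor(int partition)
    {
        return Set.copyOf(partitionToHosts.getOrDefault(partition, Set.of()));
    }

    public static void main(String[] args)
    {
        HostRequirementSketch requirements = new HostRequirementSketch();
        requirements.require(0, Set.of("host-a:8080", "host-b:8080"));
        requirements.require(0, Set.of("host-b:8080", "host-c:8080"));
        System.out.println(requirements.hostsFor(0)); // prints [host-b:8080]
    }
}
```

With a single `HostAddress` per partition, the second call above could not be expressed; with a set, the requirement narrows to the hosts every split in the partition accepts.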

linzebing · 2022-03-21T22:07:01Z

core/trino-main/src/main/java/io/trino/execution/scheduler/SqlQueryScheduler.java

+                    }
+                }
+                else {
+                    BiMap<InternalNode, Integer> nodeToPartition = HashBiMap.create();


I think it's beneficial to add a comment explaining the logic, e.g. "make sure all buckets mapped to the same node map to the same partition, such that locality requirements are respected in scheduling".
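The logic described in that suggested comment can be sketched in isolation as follows. This is a simplified illustration, not the PR's implementation: the class `BucketPartitionSketch` and the method `assignPartitions` are hypothetical, nodes are represented as plain strings instead of `InternalNode`, and a regular map stands in for the `BiMap` used in the diff.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Trino: derives partitions from a static bucket-to-node mapping.
final class BucketPartitionSketch
{
    private BucketPartitionSketch() {}

    // bucketToNode.get(i) is the node assigned to bucket i.
    // Returns an array where element i is the partition id of bucket i.
    static int[] assignPartitions(List<String> bucketToNode)
    {
        Map<String, Integer> nodeToPartition = new LinkedHashMap<>();
        int[] bucketToPartition = new int[bucketToNode.size()];
        int nextPartitionId = 0;
        for (int bucket = 0; bucket < bucketToNode.size(); bucket++) {
            String node = bucketToNode.get(bucket);
            if (node == null) {
                throw new IllegalStateException("Nodes are expected to be assigned for non dynamic BucketNodeMap");
            }
            Integer partitionId = nodeToPartition.get(node);
            if (partitionId == null) {
                // First bucket seen for this node: open a new partition for it.
                partitionId = nextPartitionId++;
                nodeToPartition.put(node, partitionId);
            }
            // Every later bucket on the same node reuses that partition,
            // so a partition never spans more than one node.
            bucketToPartition[bucket] = partitionId;
        }
        return bucketToPartition;
    }

    public static void main(String[] args)
    {
        int[] partitions = assignPartitions(List.of("node-a", "node-b", "node-a", "node-c"));
        System.out.println(Arrays.toString(partitions)); // prints [0, 1, 0, 2]
    }
}
```

Under this scheme there is exactly one partition per node, which is what the earlier question about multiple partitions on a single node refers to.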

cla-bot bot added the cla-signed label Mar 19, 2022

arhimondr force-pushed the scheduling-fixes branch 2 times, most recently from 3b17f93 to 473aea1 March 21, 2022 17:56

arhimondr requested review from losipiuk and linzebing March 21, 2022 17:59

arhimondr changed the title ~~[WIP] Various scheduling related fixes for Tardigrade~~ Fix scheduling for splits with locality requirement in Tardigrade Mar 21, 2022

losipiuk approved these changes Mar 21, 2022

linzebing reviewed Mar 21, 2022

arhimondr added 5 commits March 21, 2022 19:04

Allow scheduling of coordinator only splits

6242974

Otherwise queries like "SHOW TABLES" won't work

Respect host requirements for bucketed splits

938d048

Fix ConnectorBucketNodeMap for TPC-H and TPC-DS connectors

a9b9e13

Make it consistent with locality requirements defined in splits

Respect static BucketNodeMap in SqlQueryScheduler

0c6a923

Always update coordinator memory pools

c587f05

To allow scheduling of coordinator only tasks and splits

arhimondr force-pushed the scheduling-fixes branch from 473aea1 to c587f05 March 21, 2022 23:17

linzebing approved these changes Mar 22, 2022

arhimondr merged commit d877d73 into trinodb:master Mar 22, 2022

mosabua mentioned this pull request Mar 22, 2022

Add Trino 375 release notes #11528

Merged
