Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce tidb_tiflash_node_selection_policy to support different policies of TiFlash Node Selection #44106

Closed
XuHuaiyu opened this issue May 23, 2023 · 0 comments · Fixed by #44107
Labels
type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@XuHuaiyu
Copy link
Contributor

Background

  • Big customers always have some IDCs. They want to deploy the cluster in multiple IDCs to improve HA. Network traffic between IDC needs to be avoided, but now TiFlash always tries to use all the nodes to compute to run faster. This will create lots of network traffic in mutli-IDC scenarios.
  • If the customer can distribute two TiFlash replicas in two IDCs through the placement rule, the TiFlash can only choose the TiFlash nodes in the same zone as TiDB node, and there will be no network traffic in the cluster.
  • There is no work around.

Goals (Must)
Support different policies of TiFlash Node Selection with variable tidb_tiflash_node_selection_policy
This feature is only for On-Primise.

tidb_tiflash_node_selection_policy

  • Scope: GLOBAL | SESSION
  • Type: Enumeration
  • Default value: all_nodes
  • Value options: all_nodes, priority_local_zone_nodes, only_local_zone_nodes
  • This variable is used to set the policy of TiFlash node selection when the query needs the TiFlash engine.
    • all_nodes means using all the available nodes to do analytic computing, regardless of local zone or other zones.
    • priority_local_zone_nodes means using the nodes in the same zone as the entry TiDB. If not all the tiflash data can be accessed, the query will involve the tiflash nodes from other zones.
    • only_local_zone_nodes means using only the nodes in the same zone as the entry TiDB. If not all the tiflash data can be accessed, the query will report an error.

Target user / role (Must)

  • The big customers whose cluster is in multi-IDCs.
  • Run TiFlash in every IDC and avoid network traffic between IDCs.

Scenarios (Option)

  1. Deploy one cluster in multi-IDCs, the TiFlash nodes are set to different zone attributes according to TiUP settings.
  2. Load tiflash data and set placement rules, make sure that each zone has a whole TiFlash replica.
    Steps above, see https://docs.pingcap.com/tidb/stable/configure-placement-rules
    If there's some data disobeying the placement rule, we can move the replica data to target nodes in the zone. See https://docs.pingcap.com/tidb/stable/pd-control#operator-check--show--add--remove
  3. Set the TiFlash node selection policy
  4. Query the table involves tiflash.
    set @@tidb_tiflash_node_selection_policy=priority_local_zone_nodes;
    select ... from table_tiflash;

Functional Specs
Add a new variable to specify the policy of TiFlash node selection when the query needs the TiFlash engine.

  • all_nodes means using all the available nodes to do analytic computing, regardless of local zone or other zones. It is the same as the default policy now.

  • priority_local_zone_nodes means using the nodes in the same zone as the entry TiDB. If not all the tiflash data can be accessed, the query will involve the tiflash nodes from other zones. And show the data can not be accessed in the local zone in the warning message.

    • RISK: If the local zone does not have all TiFlash replicas needed in the query, the execution will involve the nodes from other zones, and will cause lots of network traffic between zones.
    • Warning message: Total xx region(s) can not be accessed by TiFlash in the zone [zone_name]: region1, region2, ... (list no more than 3 regions)
  • only_local_zone_nodes means using only the nodes in the same zone as the entry TiDB. If not all the tiflash data can be accessed, the query will report an error, and show an error message. Because of the feature of TiFlash remote read, a small number of regions in other zones is acceptable, but performance will be affected. And show the data can not be accessed in the local zone in the warning message.
    Note: The threshold is fixed, 3 regions per tiflash node.
    If each tiflash node needs to remote read no more than 3 regions, we will plan the execution in the local zone.

    • RISK: If the local zone does not have all TiFlash replicas needed in the query, the execution will fail. This may affect business stability.
    • If the query reports an error because of zones, there will be an error message: No less than xx region(s) can not be accessed by TiFlash in the zone [zone_name]: region1, region2, ...(list no more than 3 regions).
    • If the query is executed successfully and activates TiFlash Remote Read, there will be a warning message: Total xx region(s) can not be accessed by TiFlash in the zone [zone_name]: region1, region2, ...(list no more than 3 regions).

Corner case:

  • If TiDB nodes do not set zone attributes and the policy of TiFlash node selection is not all_nodes, the policy of TiFlash node selection will be ignored, all the tiflash nodes will be used in the tiflash query. And there will be a warning message: The variable tidb_tiflash_node_selection_policy is ignored.
  • If TiFlash nodes do not set zone attributes, these nodes will be treated as nodes not in any zone.

Other Risk

  • Network traffic
    Because the regions may be moving, and the tiflash nodes will have to do remote read. The remote read may choose the nodes from the other zone. So if we choose only_local_zone_nodes, there may be some network traffic between zones.
    1. Remote read of TiFlash only affects the scan of regions, and will not involve more nodes to do MPP computing.
    2. Remote read is very slow, we will try to avoid this.

This traffic is very small. In normal cases, the cross-zone traffic is less than 1% of all the regions scanned, most of the time it's nearly 0.

@XuHuaiyu XuHuaiyu added the type/feature-request Categorizes issue or PR as related to a new feature. label May 23, 2023
ti-chi-bot bot pushed a commit that referenced this issue Jul 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature-request Categorizes issue or PR as related to a new feature.
Projects
None yet
1 participant