
Make TiSpark's Explain clearer and easier to read #2439

Merged 46 commits on Jul 20, 2022

Commits
8742286 change string() in TiDAGRequest (qidi1, Jun 22, 2022)
0ba3dd5 change name of scan (qidi1, Jun 28, 2022)
d8fe1c8 merge master (qidi1, Jul 1, 2022)
f04abcb repaire test error (qidi1, Jul 1, 2022)
dcdae3c format code (qidi1, Jul 1, 2022)
46cb114 repair word error (qidi1, Jul 1, 2022)
4441c17 repair test error (qidi1, Jul 2, 2022)
9a38567 add doc of execution plan in tispark (qidi1, Jul 4, 2022)
0a508cf add github action of alter primary key false (qidi1, Jul 2, 2022)
e637e81 set alter-primary-key true (qidi1, Jul 4, 2022)
41dc92b delete mutit jdk (qidi1, Jul 4, 2022)
bdf1a42 repaire TLS test (qidi1, Jul 4, 2022)
1479efe empty line in alter-primary-key-false-test.yml (qidi1, Jul 4, 2022)
ff3bdff delete delete-test in alter-primary-key-false-test.yml (qidi1, Jul 4, 2022)
fca8686 change doc (qidi1, Jul 4, 2022)
699e8e8 use seq in tidb-alter-primary-key-false change name in IndexScanType (qidi1, Jul 4, 2022)
5ada226 rename indexscan to indexLookup (qidi1, Jul 4, 2022)
293bcce repair license error (qidi1, Jul 4, 2022)
0fabebe repair tikv cant run (qidi1, Jul 5, 2022)
e04b3fd change to use docker (qidi1, Jul 5, 2022)
8957629 change docker name (qidi1, Jul 5, 2022)
42f1b01 change to tidb4.0 compose (qidi1, Jul 5, 2022)
aaf2687 change compose-4.0 tidb version (qidi1, Jul 5, 2022)
a3eacde change compose-4.0 tidb version (qidi1, Jul 5, 2022)
4929d65 remove matrix of yml (qidi1, Jul 5, 2022)
82e1acf rename scan table to scan data (qidi1, Jul 6, 2022)
29b468a simple index scan (qidi1, Jul 11, 2022)
7fc70dd remove unused function (qidi1, Jul 11, 2022)
f8cfb8e format code (qidi1, Jul 11, 2022)
b8e7bb9 rename function (qidi1, Jul 11, 2022)
9e679c3 rename function (qidi1, Jul 11, 2022)
27e9d3f rename function of buildScan (qidi1, Jul 13, 2022)
9cfafb4 rewrite buildScan function (qidi1, Jul 14, 2022)
8b385a7 format code (qidi1, Jul 14, 2022)
212ab5e format code (qidi1, Jul 14, 2022)
482ff2d format code (qidi1, Jul 14, 2022)
2679207 resolve bug (qidi1, Jul 14, 2022)
00fa67d resolve grammer error (qidi1, Jul 14, 2022)
cad259f repaire bug (qidi1, Jul 17, 2022)
48eeb9e repaire bug (qidi1, Jul 18, 2022)
5120ff4 repaire bug (qidi1, Jul 18, 2022)
fa70f87 repaire bug (qidi1, Jul 18, 2022)
3d9d4d0 repaire bug (qidi1, Jul 19, 2022)
ee6a767 repaire bug (qidi1, Jul 19, 2022)
ba16a70 Merge branch 'master' into phyicalplanexplain (shiyuhang0, Jul 19, 2022)
dce8c31 repaire bug (qidi1, Jul 19, 2022)
43 changes: 43 additions & 0 deletions .github/workflows/alter-primary-key-false-test.yml
@@ -0,0 +1,43 @@
name: alter-primary-key-false-test

on:
push:
branches:
- master
pull_request:
branches:
- master

jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
distribution: [ 'adopt' ]
name: Java ${{ matrix.distribution }} sample
steps:

- name: checkout
uses: actions/checkout@v2

- name: set up JDK
uses: actions/setup-java@v3
with:
java-version: '8'
distribution: ${{ matrix.distribution }}
cache: maven

- name: add host and copy properties
run: |
echo -e "127.0.0.1 pd \n127.0.0.1 tikv" | sudo tee -a /etc/hosts
sudo cp -r config /config
cp core/src/test/resources/tidb_config_alter_primary_key_true.properties.template core/src/test/resources/tidb_config.properties

- name: build docker
run: docker-compose -f docker-compose-TiDB-TLS.yaml up -d

- name: build
run: mvn clean package -Dmaven.test.skip=true -B

- name: test
run: mvn test -am -pl core -Dtest=moo -DwildcardSuites=org.apache.spark.sql.catalyst.plans.logical.LogicalPlanTestSuite,com.pingcap.tispark.delete -DfailIfNoTests=false
228 changes: 228 additions & 0 deletions config/tidb-alter-primary-key-false.toml
@@ -0,0 +1,228 @@
# TiDB Configuration.

# TiDB server host.
host = "0.0.0.0"

# TiDB server port.
port = 4000

# Registered store name, [memory, goleveldb, boltdb, tikv, mocktikv]
store = "mocktikv"

# TiDB storage path.
path = "/tmp/tidb"

# The socket file to use for connection.
#socket = ""

# Socket file to write binlog.
#binlog-socket = ""

# Run ddl worker on this tidb-server.
run-ddl = true

# Schema lease duration. Dangerous to change; modify it only if you know what you are doing.
lease = "10s"

# When creating a table, split a separate region for it. It is recommended to
# turn off this option if a large number of tables will be created.
split-table = true

# delay-clean-table-lock controls whether to delay releasing the table lock in abnormal situations. (Milliseconds)
delay-clean-table-lock = 60000

# The limit of concurrent executed sessions.
# token-limit = 1000

# Enable chunk executors.
enable-chunk = true

# enable-table-lock is used to control the table lock feature. Default is false, indicating that the table lock feature is disabled.
enable-table-lock = true

# alter-primary-key is used to control the alter primary key feature. Default is false, indicating that the feature is disabled.
# If it is true, the primary key can be added with "alter table", but it may not be possible to drop the primary key.
# To support the "drop primary key" operation, this flag must be true and the table must not have the pkIsHandle flag.
alter-primary-key = false

# index-limit is used to deal with compatibility issues. It can only be in [64, 64*8].
index-limit = 512

# Whether new collations are enabled. As indicated by its name, this configuration entry takes effect ONLY when a TiDB cluster bootstraps for the first time.
new_collations_enabled_on_first_bootstrap = false

[log]
# Log level: info, debug, warn, error, fatal.
level = "info"

# Log format, one of json, text, console.
format = "text"

# Disable automatic timestamps in output
disable-timestamp = false

# Stores slow query log into separate files.
#slow-query-file = ""

# Queries with execution time greater than this value will be logged. (Milliseconds)
slow-threshold = 300

# Maximum query length recorded in log.
query-log-max-len = 2048

# File logging.
[log.file]
# Log file name.
filename = ""

# Max log file size in MB.
#max-size = 300

# Max log file keep days.
#max-days = 28

# Maximum number of old log files to retain.
#max-backups = 7

# Rotate log by day
log-rotate = true

[security]
# Path of file that contains list of trusted SSL CAs for connection with mysql client.
ssl-ca = ""

# Path of file that contains X509 certificate in PEM format for connection with mysql client.
ssl-cert = ""

# Path of file that contains X509 key in PEM format for connection with mysql client.
ssl-key = ""

# Path of file that contains list of trusted SSL CAs for connection with cluster components.
cluster-ssl-ca = ""

# Path of file that contains X509 certificate in PEM format for connection with cluster components.
cluster-ssl-cert = ""

# Path of file that contains X509 key in PEM format for connection with cluster components.
cluster-ssl-key = ""

[status]
# Whether to enable the status report HTTP service.
report-status = true

# TiDB status port.
status-port = 10080

# Prometheus pushgateway address; leave it empty to disable Prometheus push.
# metrics-addr = "pushgateway:9091"

# Prometheus client push interval in seconds; set "0" to disable Prometheus push.
metrics-interval = 0

[performance]
# Set keep alive option for tcp connection.
tcp-keep-alive = true

# feedback probability of statistics
# turn it off to fix https://github.com/pingcap/tispark/issues/1183
feedback-probability = 0.0

# The maximum number of retries when committing a transaction.
retry-limit = 10

# The number of goroutines that participate in join operations.
join-concurrency = 5

# Whether to support the Cartesian product.
cross-join = true

# Stats lease duration, which influences the time of analyze and stats load.
stats-lease = "3s"

# Run auto analyze worker on this tidb-server.
run-auto-analyze = true

[xprotocol]
# Start TiDB x server.
xserver = false

# TiDB x protocol server host.
xhost = "0.0.0.0"

# TiDB x protocol server port.
xport = 14000

# The socket file to use for x protocol connection.
xsocket = ""

[proxy-protocol]
# PROXY protocol acceptable client networks.
# Empty string means disable PROXY protocol, * means all networks.
networks = ""

# PROXY protocol header read timeout, unit is second
header-timeout = 5

[plan-cache]
enabled = false
capacity = 2560
shards = 256

[prepared-plan-cache]
enabled = false
capacity = 100

[opentracing]
# Enable opentracing.
enable = false

# Whether to enable the rpc metrics.
rpc-metrics = false

[opentracing.sampler]
# Type specifies the type of the sampler: const, probabilistic, rateLimiting, or remote
type = "const"

# Param is a value passed to the sampler.
# Valid values for Param field are:
# - for "const" sampler, 0 or 1 for always false/true respectively
# - for "probabilistic" sampler, a probability between 0 and 1
# - for "rateLimiting" sampler, the number of spans per second
# - for "remote" sampler, param is the same as for "probabilistic"
# and indicates the initial sampling rate before the actual one
# is received from the mothership
param = 1.0

# SamplingServerURL is the address of jaeger-agent's HTTP sampling server
sampling-server-url = ""

# MaxOperations is the maximum number of operations that the sampler
# will keep track of. If an operation is not tracked, a default probabilistic
# sampler will be used rather than the per operation specific sampler.
max-operations = 0

# SamplingRefreshInterval controls how often the remotely controlled sampler will poll
# jaeger-agent for the appropriate sampling strategy.
sampling-refresh-interval = 0

[opentracing.reporter]
# QueueSize controls how many spans the reporter can keep in memory before it starts dropping
# new spans. The queue is continuously drained by a background go-routine, as fast as spans
# can be sent out of process.
queue-size = 0

# BufferFlushInterval controls how often the buffer is force-flushed, even if it's not full.
# It is generally not useful, as it only matters for very low traffic services.
buffer-flush-interval = 0

# LogSpans, when true, enables LoggingReporter that runs in parallel with the main reporter
# and logs all submitted spans. Main Configuration.Logger must be initialized in the code
# for this option to have any effect.
log-spans = false

# LocalAgentHostPort instructs reporter to send spans to jaeger-agent at this address
local-agent-host-port = ""

[tikv-client]
# Max gRPC connections that will be established with each tikv-server.
grpc-connection-count = 16
@@ -139,7 +139,7 @@ case class ColumnarRegionTaskExec(
override def simpleString(maxFields: Int): String = verboseString(maxFields)

override def verboseString(maxFields: Int): String =
s"TiSpark $nodeName{downgradeThreshold=$downgradeThreshold,downgradeFilter=${dagRequest.getFilters}"
s"TiSpark $nodeName{downgradeThreshold=$downgradeThreshold,downgradeFilter=${dagRequest.getDowngradeFilters}"

private def inputRDD(): RDD[InternalRow] = {
val numOutputRows = longMetric("numOutputRows")
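The one-line change above swaps `getFilters` for `getDowngradeFilters`, so the node's verbose EXPLAIN string reports the filters actually applied when a region task downgrades, not the scan filters. A minimal, hypothetical Java sketch of the string shape being built (class and parameter names are stand-ins, not TiSpark's real API):

```java
public class VerboseStringSketch {
    // Mirrors the interpolation pattern in ColumnarRegionTaskExec.verboseString,
    // taking the downgrade filters as an explicit argument.
    static String verboseString(String nodeName, long downgradeThreshold, String downgradeFilters) {
        return "TiSpark " + nodeName
                + "{downgradeThreshold=" + downgradeThreshold
                + ",downgradeFilter=" + downgradeFilters + "}";
    }

    public static void main(String[] args) {
        System.out.println(verboseString("RegionTaskExec", 1000000L, "[id > 10]"));
    }
}
```

Printing the downgrade filters here is exactly what makes the downgrade path distinguishable from the normal scan path when reading a plan.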
@@ -0,0 +1,33 @@
# TiDB address
tidb.addr=127.0.0.1
# TiDB port
tidb.port=4000
# TiDB login user
# tidb.user=root
# TiDB login password
# tidb.password=
# TPC-H database name. If you already have a TPC-H database in TiDB, specify its name so that TPC-H tests run on it.
# The TPC-H test is enabled by default; you may turn it off by setting tpch.db=
# tpch.db=tpch_test
# TPC-DS database name. If you already have a TPC-DS database in TiDB, specify its name so that TPC-DS tests run on it.
# The TPC-DS test is disabled by default
# tpcds.db=tpcds_test
# Placement Driver address:port
spark.tispark.pd.addresses=127.0.0.1:2379
# Whether to allow index reads in tests; this must be set to true to run index tests.
#spark.tispark.plan.allow_index_read=true
# Whether to load test data before running tests. If you haven't loaded tispark_test or tpch_test data, set this to true. The next time you run tests, you can set it back to false.
# If you do not want to change this value, set it to auto; test data will then be loaded only if it does not exist in TiDB.
#test.data.load=auto
# Whether to generate test data. Enabling test data generation may change the data of all tests.
# test.data.generate=true
# The seed used to generate test data (0 means random).
# test.data.generate.seed=1234
# Whether to enable TiFlash tests
# test.tiflash.enable=false
# DB prefix for TiDB databases, in case names conflict with Hive databases
#spark.tispark.db_prefix=tidb_
# Whether to use the new Catalog feature provided by Spark 3.0
spark.sql.catalog.tidb_catalog=org.apache.spark.sql.catalyst.catalog.TiCatalog
# Whether to enable authorization when using Spark SQL
# spark.sql.auth.enable=true
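The template above is plain `java.util.Properties` format. A small, hypothetical sketch of how a test harness could read the keys it defines (the inlined strings below stand in for the real `tidb_config.properties` file; only the key names come from the template):

```java
import java.io.StringReader;
import java.util.Properties;

public class TidbConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        // Inline a few keys from the template instead of reading the real file.
        p.load(new StringReader(
                "tidb.addr=127.0.0.1\n"
              + "tidb.port=4000\n"
              + "spark.tispark.pd.addresses=127.0.0.1:2379\n"));
        // Keys left commented out in the template simply come back null here.
        System.out.println(p.getProperty("spark.tispark.pd.addresses"));
        System.out.println(p.getProperty("tpch.db"));
    }
}
```

This is why the commented-out entries (tpch.db, tidb.user, and the rest) are safe defaults: an absent key just yields null, and the harness can fall back accordingly.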
@@ -32,24 +32,38 @@ class BasePlanTest extends BaseTiSparkTest {
case plan: ColumnarCoprocessorRDD => plan
case plan: ColumnarRegionTaskExec => plan
}
val extractDAGRequest: PartialFunction[SparkPlan, TiDAGRequest] = {
case plan: ColumnarRegionTaskExec => plan.dagRequest
case plan: ColumnarCoprocessorRDD => plan.dagRequest
val extractDAGRequest: PartialFunction[SparkPlan, Seq[TiDAGRequest]] = {
case plan: ColumnarRegionTaskExec => {
List(plan.dagRequest)
}
case plan: ColumnarCoprocessorRDD => {
plan.tiRDDs.map(x => {
x.dagRequest
})
}
}

def explain[T](df: Dataset[T]): Unit = df.explain

def extractDAGRequests[T](df: Dataset[T]): Seq[TiDAGRequest] =
toPlan(df).collect { extractDAGRequest }
toPlan(df).collect {
extractDAGRequest
}.flatten

def extractTiSparkPlans[T](df: Dataset[T]): Seq[SparkPlan] =
toPlan(df).collect { extractTiSparkPlan }
toPlan(df).collect {
extractTiSparkPlan
}

def extractCoprocessorRDDs[T](df: Dataset[T]): Seq[ColumnarCoprocessorRDD] =
toPlan(df).collect { extractCoprocessorRDD }
toPlan(df).collect {
extractCoprocessorRDD
}

def extractRegionTaskExecs[T](df: Dataset[T]): List[ColumnarRegionTaskExec] =
toPlan(df).collect { extractRegionTaskExec }.toList
toPlan(df).collect {
extractRegionTaskExec
}.toList

def checkIndex[T](df: Dataset[T], index: String): Unit = {
if (!extractCoprocessorRDDs(df).exists(checkIndexName(_, index))) {
@@ -75,7 +89,7 @@ class BasePlanTest extends BaseTiSparkTest {
if (tiSparkPlans.isEmpty) {
fail(df, "No TiSpark plans found in Dataset")
}
val filteredRequests = tiSparkPlans.collect { extractDAGRequest }.filter {
val filteredRequests = tiSparkPlans.collect { extractDAGRequest }.flatten.filter {
_.getTableInfo.getName.equalsIgnoreCase(tableName)
}
if (filteredRequests.isEmpty) {
@@ -121,7 +135,7 @@
}

def getEstimatedRowCount[T](df: Dataset[T], tableName: String): Double =
extractTiSparkPlans(df).collect { extractDAGRequest }.head.getEstimatedCount
extractDAGRequests(df).head.getEstimatedCount

def toPlan[T](df: Dataset[T]): SparkPlan = df.queryExecution.sparkPlan

Expand Down