Skip to content

Commit

Permalink
[SPARK-31406][SQL][TEST] ThriftServerQueryTestSuite: Sharing test dat…
Browse files Browse the repository at this point in the history
…a and test tables among multiple test cases

### What changes were proposed in this pull request?
This PR is related to apache#28060.
`ThriftServerQueryTestSuite` spend 17 minutes time to test.
I checked the code and found `ThriftServerQueryTestSuite` load test data repeatedly.
I've listed all the test cases order by time with desc in the `hive-thriftserver` module below.

Class | Spend time  ↑ | Failure | Skip | Pass | Total test case
-- | -- | -- | -- | -- | --
ThriftServerQueryTestSuite | 17 minutes | 0 | 15 | 140 | 155
CliSuite | 8 minutes 24 seconds | 0 | 0 | 24 | 24
SparkThriftServerProtocolVersionsSuite | 59 seconds | 0 | 0 | 210 | 210
HiveThriftBinaryServerSuite | 36 seconds | 0 | 1 | 21 | 22
SparkMetadataOperationSuite | 19 seconds | 0 | 0 | 7 | 7
HiveCliSessionStateSuite | 16 seconds | 0 | 0 | 2 | 2
SparkSQLEnvSuite | 16 seconds | 0 | 0 | 1 | 1
HiveThriftHttpServerSuite | 15 seconds | 0 | 0 | 3 | 3
SingleSessionSuite | 14 seconds | 0 | 0 | 3 | 3
JdbcConnectionUriSuite | 2.1 seconds | 0 | 0 | 1 | 1
ThriftServerWithSparkContextSuite | 1.4 seconds | 0 | 0 | 1 | 1
SparkExecuteStatementOperationSuite | 63 millseconds | 0 | 0 | 2 | 2
UISeleniumSuite | -1 millseconds | 0 | 1 | 0 | 1

I checked the code of `ThriftServerQueryTestSuite` and found `ThriftServerQueryTestSuite` load test data repeatedly.
This PR will improve the performance of `ThriftServerQueryTestSuite`.
Because apache#28060 provides `createTestTables`(https://github.com/apache/spark/blob/e42a3945acd614a26c7941a9eed161b500fb4520/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L574) and `removeTestTables`(https://github.com/apache/spark/blob/e42a3945acd614a26c7941a9eed161b500fb4520/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L666), this PR will still uses them.
The total time run `ThriftServerQueryTestSuite` before and after this PR show below.
Before
No | Time
-- | --
1 | 18 minutes, 8 seconds
2 | 22 minutes, 44 seconds
3 | 17 minutes, 48 seconds
4 | 18 minutes, 30 seconds

After
No | Time
-- | --
1 | 16 minutes, 11 seconds
2 | 17 minutes, 19 seconds
3 | 18 minutes, 15 seconds
4 | 17 minutes, 27 seconds

### Why are the changes needed?
Improve the performance of `ThriftServerQueryTestSuite`.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Jenkins test

Closes apache#28180 from beliefer/avoid-load-thrift-test-data-repeatedly.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
  • Loading branch information
beliefer authored and sjincho committed Apr 14, 2020
1 parent 13986c9 commit 7fc746d
Showing 1 changed file with 0 additions and 57 deletions.
Expand Up @@ -80,8 +80,6 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite with SharedThriftServ
// We do not test with configSet.
withJdbcStatement { statement =>

loadTestData(statement)

configSet.foreach { case (k, v) =>
statement.execute(s"SET $k = $v")
}
Expand Down Expand Up @@ -262,61 +260,6 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite with SharedThriftServ
}
}

/** Load built-in test tables. */
private def loadTestData(statement: Statement): Unit = {
// Prepare the data
statement.execute(
"""
|CREATE OR REPLACE TEMPORARY VIEW testdata as
|SELECT id AS key, CAST(id AS string) AS value FROM range(1, 101)
""".stripMargin)
statement.execute(
"""
|CREATE OR REPLACE TEMPORARY VIEW arraydata as
|SELECT * FROM VALUES
|(ARRAY(1, 2, 3), ARRAY(ARRAY(1, 2, 3))),
|(ARRAY(2, 3, 4), ARRAY(ARRAY(2, 3, 4))) AS v(arraycol, nestedarraycol)
""".stripMargin)
statement.execute(
"""
|CREATE OR REPLACE TEMPORARY VIEW mapdata as
|SELECT * FROM VALUES
|MAP(1, 'a1', 2, 'b1', 3, 'c1', 4, 'd1', 5, 'e1'),
|MAP(1, 'a2', 2, 'b2', 3, 'c2', 4, 'd2'),
|MAP(1, 'a3', 2, 'b3', 3, 'c3'),
|MAP(1, 'a4', 2, 'b4'),
|MAP(1, 'a5') AS v(mapcol)
""".stripMargin)
statement.execute(
s"""
|CREATE TEMPORARY VIEW aggtest
| (a int, b float)
|USING csv
|OPTIONS (path '${baseResourcePath.getParent}/test-data/postgresql/agg.data',
| header 'false', delimiter '\t')
""".stripMargin)
statement.execute(
s"""
|CREATE OR REPLACE TEMPORARY VIEW onek
| (unique1 int, unique2 int, two int, four int, ten int, twenty int, hundred int,
| thousand int, twothousand int, fivethous int, tenthous int, odd int, even int,
| stringu1 string, stringu2 string, string4 string)
|USING csv
|OPTIONS (path '${baseResourcePath.getParent}/test-data/postgresql/onek.data',
| header 'false', delimiter '\t')
""".stripMargin)
statement.execute(
s"""
|CREATE OR REPLACE TEMPORARY VIEW tenk1
| (unique1 int, unique2 int, two int, four int, ten int, twenty int, hundred int,
| thousand int, twothousand int, fivethous int, tenthous int, odd int, even int,
| stringu1 string, stringu2 string, string4 string)
|USING csv
| OPTIONS (path '${baseResourcePath.getParent}/test-data/postgresql/tenk.data',
| header 'false', delimiter '\t')
""".stripMargin)
}

// Returns true if sql is retrieving data.
private def isNeedSort(sql: String): Boolean = {
val upperCase = sql.toUpperCase(Locale.ROOT)
Expand Down

0 comments on commit 7fc746d

Please sign in to comment.