Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Sql test bench #335

Merged
merged 38 commits into from
Jan 8, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
cea903f
Initial commit with clean code from PoC branch
dai-chen Dec 13, 2019
a69def2
Refactor code
dai-chen Dec 16, 2019
34d951c
Fix float number parse issue
dai-chen Dec 17, 2019
68655bb
Fix sqlite memory db issue
dai-chen Dec 19, 2019
e3f0603
Save reports
dai-chen Dec 19, 2019
239ce2d
Handle date type
dai-chen Dec 26, 2019
fba4a6c
Load ecommerce test data
dai-chen Dec 30, 2019
fb48771
Refactor comparison test class
dai-chen Dec 30, 2019
e9c7e0f
Support command line args from gradlew
dai-chen Dec 31, 2019
2fcb8a6
Support command line args from gradlew
dai-chen Dec 31, 2019
e6b5df5
Close connection
dai-chen Dec 31, 2019
bcfb680
Use ES rest client only
dai-chen Jan 1, 2020
5fc017f
Rename es host url argument
dai-chen Jan 1, 2020
08b7771
Refactor test set
dai-chen Jan 2, 2020
d1e2aac
Print log to console
dai-chen Jan 2, 2020
19d4d9f
Add unit test for es connection
dai-chen Jan 3, 2020
a8cd0c1
Add unit test for jdbc connection
dai-chen Jan 3, 2020
2f0d672
Fix connect issue
dai-chen Jan 3, 2020
15deeef
Add more unit test
dai-chen Jan 3, 2020
c3d6d38
Refactor DBResult and Row
dai-chen Jan 3, 2020
c8cc035
Fix unit tests
dai-chen Jan 3, 2020
b99deae
Refactor
dai-chen Jan 3, 2020
dfd3bef
Support command line args via gradle
dai-chen Jan 4, 2020
6d42379
Retry different database if mismatch on first one
dai-chen Jan 4, 2020
088b44f
Add more test data
dai-chen Jan 4, 2020
4b2d9f8
Merge branch 'master' into sql-test-bench
dai-chen Jan 4, 2020
1947a3a
Merge branch 'master' into sql-test-bench
dai-chen Jan 6, 2020
e4dabbd
Add documentation
dai-chen Jan 6, 2020
f1fe0c2
Add documentation
dai-chen Jan 6, 2020
525cb3d
Add documentation
dai-chen Jan 6, 2020
01a579f
Add documentation
dai-chen Jan 6, 2020
1eb33c8
Add documentation
dai-chen Jan 6, 2020
ee18634
Clean up old report files
dai-chen Jan 6, 2020
ff242be
Add missing java doc
dai-chen Jan 7, 2020
3d2934f
Create report folder if not exist
dai-chen Jan 7, 2020
6b9642f
Upgrade our JDBC dependency to 1.3
dai-chen Jan 7, 2020
2feab8c
Generate JSON by serializing java bean and use Lombok
dai-chen Jan 7, 2020
d81b165
Run doctest by default
dai-chen Jan 7, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
34 changes: 34 additions & 0 deletions build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,7 @@ compileJava {
// TODO: Similarly, need to fix compiling errors in test source code
compileTestJava.options.warnings = false
compileTestJava {
options.compilerArgs.addAll(["-processor", 'lombok.launch.AnnotationProcessorHider$AnnotationProcessor'])
doFirst {
options.compilerArgs.remove('-Werror')
options.compilerArgs.remove('-Xdoclint:all')
Expand All @@ -132,6 +133,33 @@ integTestRunner {
systemProperty 'tests.security.manager', 'false'
// allows integration test classes to access test resource from project root path
systemProperty('project.root', project.rootDir.absolutePath)

// Run different task based on test type. "exclude" is required for each task.
def testType = System.getProperty("testType")
if (testType == 'doctest') { // Doctest to generate documentation
include 'com/amazon/opendistroforelasticsearch/sql/doctest/**/*IT.class'
exclude 'com/amazon/opendistroforelasticsearch/sql/correctness/**'
exclude 'com/amazon/opendistroforelasticsearch/sql/esintgtest/**'

dai-chen marked this conversation as resolved.
Show resolved Hide resolved
} else if (testType == 'comparison') { // Comparision testing
include 'com/amazon/opendistroforelasticsearch/sql/correctness/CorrectnessIT.class'
exclude 'com/amazon/opendistroforelasticsearch/sql/doctest/**'
exclude 'com/amazon/opendistroforelasticsearch/sql/esintgtest/**'

// Enable logging output to console
testLogging.showStandardStreams true

// Pass down system properties to IT class
systemProperty "esHost", System.getProperty("esHost")
systemProperty "dbUrl", System.getProperty("dbUrl")
systemProperty "otherDbUrls", System.getProperty("otherDbUrls")
systemProperty "queries", System.getProperty("queries")

} else { // Run all other integration tests and doctest by default
include 'com/amazon/opendistroforelasticsearch/sql/esintgtest/**/*IT.class'
include 'com/amazon/opendistroforelasticsearch/sql/doctest/**/*IT.class'
exclude 'com/amazon/opendistroforelasticsearch/sql/correctness/**'
}
}

integTestCluster {
Expand Down Expand Up @@ -214,6 +242,12 @@ dependencies {
// testCompile group: 'com.alibaba', name: 'fastjson', version:'1.2.56'
// testCompile group: 'org.mockito', name: 'mockito-core', version:'2.23.4'
testCompile group: "org.elasticsearch.client", name: 'transport', version: "${es_version}"

// JDBC drivers for comparison test. Somehow Apache Derby throws security permission exception.
testCompile group: 'com.amazon.opendistroforelasticsearch.client', name: 'opendistro-sql-jdbc', version: '1.3.0.0'
dai-chen marked this conversation as resolved.
Show resolved Hide resolved
testCompile group: 'com.h2database', name: 'h2', version: '1.4.200'
testCompile group: 'org.xerial', name: 'sqlite-jdbc', version: '3.28.0'
//testCompile group: 'org.apache.derby', name: 'derby', version: '10.15.1.3'
}

apply plugin: 'nebula.ospackage'
Expand Down
289 changes: 289 additions & 0 deletions docs/dev/Testing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,289 @@
# How OpenDistro SQL Is Tested

## 1.Introduction

### 1.1 Problem Statement

Currently there are quite a few unit tests and integration tests in the codebase. However, there are still problems to rely on these test cases only to ensure the correctness of our plugin:

1. **Test coverage**: Although the amount of test cases seems large, we’ve found that more than one important case was missed, for example use table alias in FROM clause, GROUP BY field alias etc.
2. **Test correctness**: Some test case make assertion on the result set too simple to capture the issue if any. For example, some assertion is as loose as checking if result set is not empty. Unfortunately with strict assertion, we’re still not sure if the assertion is correct or not. For example, there were issue [#124](https://github.com/opendistro-for-elasticsearch/sql/issues/124) and [#226](https://github.com/opendistro-for-elasticsearch/sql/issues/226) regarding LEFT JOIN with WHERE clause and SUBSTRING. The correct semantic was actually different from what we thought. We did have MySQL reference docs at hands but it’s possible to miss some cases or misunderstand the correct use. So some mechanism is required to replace the loose assertion and be able to enforce the verification of our understanding.
3. **Test Bench**: We want to run this new test on a regular basis to improving our implementation continuously.

### 1.2 Our Goals

In this work, we want to address the problems above for our testing to get more confidence of the correctness of our plugin. To achieve this goal, we need to improve both the quantity and quality of our tests and also keep it running on a regular basis. Although the search space of SQL is infinite, we can cover typical use cases and corner cases and most importantly make sure it does demonstrate the correctness of our implementation if it can pass.

> Note that performance is also another important area that we want to test and understand how efficient or where is the bottleneck of typical queries. Since Elasticsearch provides ESRally benchmark tool which is totally different from our homemade test harness, we will be focuses on the correctness and won't cover performance testing here.

## 2.Design

### 2.1 Approaches

First we can improve the test coverage by improving the diversity of our test case set, including diversity of test schema, test data and test cases. Secondly although it's difficult to provide rigorous mathematical proof for the result set of each query, we can make our best to ensure relative correctness by comparing with a few other database implementations. Of course the comparison testing doesn't guarantee 100% correctness and heavily depends on the correctness of reference implementation. But it should be much more reliable than hand written assertions after reading specification documentation previously.

### 2.2 Components

At this stage we don’t want to spend too much efforts on setting up a complicated infrastructure for testing. So we can take full advantage of capabilities that GitHub provides:

1. **Test Data & Cases**: Use test case set with Kibana flights and ecommerce sample index.
2. **Trigger**: Set up another GitHub Action workflow.
3. **Test Runner**: Use embedded Elasticsearch and other IMDBs.
4. **Reporting**: Use standard JUnit report or simple custom json format.
5. **Visualization**: Enable GitHub Pages for viewing or feed into Elasticsearch.

![Test Framework Components](img/test-framework-components.png)

## 3.Implementation

### 3.1 Test Data

For schema, we can just use Elasticsearch mapping as the format of schema and convert it to `INSERT` statement. For data we use CSV format simply.

```
{
"_doc": {
"properties": {
"AvgTicketPrice": {
"type": "float"
},
"Cancelled": {
"type": "boolean"
},
"Carrier": {
"type": "keyword"
},
...
...
"dayOfWeek": {
"type": "integer"
}
}
}
}

FlightNum,Origin,FlightDelay,DistanceMiles,...,DestCityName
9HY9SWR,Frankfurt am Main Airport,false,10247.856675613455,...,Sydney
X98CCZO,Cape Town International Airport,false,5482.606664853586,...,Venice
......
```

### 3.2 Test Cases

For now we don't implement a Fuzzer to generate test queries because test queries prepared manually is sufficient to help identify issues. So we just put all queries in a text file. In future, a fuzzer can generate queries and save to file to integrate with Test Runner smoothly.

```
SELECT 1 AS `empty` FROM `kibana_sample_data_flights`
SELECT substring(OriginWeather, 1, 2) AS OriginWeather FROM kibana_sample_data_flights
SELECT SUM(FlightDelayMin) AS sum_FlightDelayMin_ok FROM kibana_sample_data_flights
SELECT SUM(FlightDelay) AS sum_FlightDelay_ok FROM kibana_sample_data_flights
SELECT SUM(DistanceMiles) AS sum_DistanceMiles_ok FROM kibana_sample_data_flights
```

### 3.3 Test Runner

To simplify the test and be confident about the test result, Test Runner runs query by JDBC driver of OpenDistro SQL and other databases. In this case we don’t need to parse the data format returned from our plugin. And obviously another benefit is the correctness of JDBC driver is also covered.

![How We Do Comparison Test](img/how-we-do-comparison-test.png)

### 3.4 Trigger

GitHub Action can be set up to trigger Gradle task to generate test report.

### 3.5 Reporting

Elasticsearch integration test is still using JUnit 4 which has many problems, such as dynamic test cases. So we can define our own report format for flexibility:

```
{
"summary": {
"total": 3,
"success": 1,
"failure": 2
},
"tests": [
{
"id": 1,
"sql": "...",
"result": "Success"
},
{
"id": 2,
"sql": "...",
"result": "Failed",
"resultSets": [
{
"database": "Elasticsearch",
"resultSet": {
"schema": [{"name":"","type":""},...],
"dataRows": [[...],...,[...]]
}
},
{
"database": "Other database",
"resultSet": {
"schema": [...],
"dataRows": [[...],...,[...]]
}
}
]
},
{
"id": 3,
"sql": "...",
"result": "Failed",
"reason": "..."
},
]
}
```

The workflow of generating test result is:

![The Workflow of Comparison Test](img/the-workflow-of-comparison-test.png)

### 3.6 Visualization

TODO

---

## Appendix

### I.Sample Usage

Use default test set and reference databases by `testType` argument given only. Because `integTestRunner` triggers quite a few integration tests which take long time and runs with `gradlew build` every time, `testType` is added to run doctest for documentation and comparison test separately.

Note that for now test data set argument is not supported because it often requires code changes to map more ES data type to JDBC type as well as convert data.

```
$ ./gradlew integTestRunner -DtestType=comparison

[2020-01-06T11:37:57,437][INFO ][c.a.o.s.c.CorrectnessIT ] [performComparisonTest] Starting comparison test
=================================
Tested Database : (Use internal Elasticsearch in workspace)
Other Databases :
SQLite = jdbc:sqlite::memory:
H2 = jdbc:h2:mem:test;DB_CLOSE_DELAY=-1
Test data set(s) :
Test data set :
Table name: kibana_sample_data_flights
Schema: {
"mappings": {
"properties": {
"AvgTicketPrice": {
"type": "float"
},
"Cancelled": {
"type": "boolean"
},
"Carrier": {
"type": "keyword"
},
...
}
}
}

Data rows (first 5 in 21):
[FlightNum, Origin, FlightDelay, DistanceMiles, FlightTimeMin, OriginWeather, dayOfWeek, AvgTicketPrice, Carrier, FlightDelayMin, OriginRegion, FlightDelayType, DestAirportID, Dest, FlightTimeHour, Cancelled, DistanceKilometers, OriginCityName, DestWeather, OriginCountry, DestCountry, DestRegion, OriginAirportID, DestCityName, timestamp]
[RGXY9H5, Chubu Centrair International Airport, false, 1619.970725161303, 124.1471507959044, Heavy Fog, 0, 626.1297405910661, Kibana Airlines, 0, SE-BD, No Delay, CAN, Guangzhou Baiyun International Airport, 2.06911917993174, true, 2607.0901667139924, Tokoname, Clear, JP, CN, SE-BD, NGO, Guangzhou, 2019-12-23T11:19:32]
[WOPNZEP, Munich Airport, true, 198.57903689856937, 34.9738738474057, Sunny, 0, 681.9911763989377, Kibana Airlines, 15, DE-BY, Carrier Delay, VE05, Venice Marco Polo Airport, 0.5828978974567617, false, 319.58198155849124, Munich, Cloudy, DE, IT, IT-34, MUC, Venice, 2019-12-23T12:32:26]
[G9J5O2V, Frankfurt am Main Airport, false, 4857.154739888458, 651.402736475921, Clear, 0, 868.0507463122127, Kibana Airlines, 0, DE-HE, No Delay, XIY, Xi'an Xianyang International Airport, 10.856712274598683, false, 7816.832837711051, Frankfurt am Main, Thunder & Lightning, DE, CN, SE-BD, FRA, Xi'an, 2019-12-23T03:48:33]
[HM80A5V, Itami Airport, false, 5862.6666599206, 555.0027890084269, Heavy Fog, 0, 765.0413127727119, Logstash Airways, 0, SE-BD, No Delay, TV01, Treviso-Sant'Angelo Airport, 9.250046483473783, true, 9435.047413143258, Osaka, Clear, JP, IT, IT-34, ITM, Treviso, 2019-12-23T19:50:48]

Test data set :
Table name: kibana_sample_data_ecommerce
Schema: {
"mappings": {
"properties": {
"category": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"currency": {
"type": "keyword"
},
"customer_birth_date": {
"type": "date"
},
...
}
}
}
Data rows (first 5 in 21):
[customer_first_name, customer_phone, type, manufacturer, customer_full_name, order_date, customer_last_name, day_of_week_i, total_quantity, currency, taxless_total_price, total_unique_products, category, customer_id, sku, order_id, user, customer_gender, email, day_of_week, taxful_total_price]
[Irwin, , order, [Elitelligence, Microlutions], Irwin Mcdonald, 2019-12-19T23:21:07+00:00, Mcdonald, 3, 2, EUR, 26.98, 2, [Men's Clothing], 14, [ZO0564605646, ZO0117401174], 551689, irwin, MALE, irwin@mcdonald-family.zzz, Thursday, 26.98]
[Wilhemina St., , order, [Spherecords Maternity, Oceanavigations], Wilhemina St. Washington, 2019-12-19T08:03:50+00:00, Washington, 3, 2, EUR, 72.98, 2, [Women's Clothing], 17, [ZO0705107051, ZO0270302703], 550817, wilhemina, FEMALE, wilhemina st.@washington-family.zzz, Thursday, 72.98]
[Kamal, , order, [Elitelligence, Oceanavigations], Kamal Foster, 2019-12-19T08:47:02+00:00, Foster, 3, 2, EUR, 45.98, 2, [Men's Clothing], 39, [ZO0532905329, ZO0277802778], 550864, kamal, MALE, kamal@foster-family.zzz, Thursday, 45.98]
[Diane, , order, [Tigress Enterprises, Low Tide Media], Diane Turner, 2019-12-22T13:45:07+00:00, Turner, 6, 2, EUR, 107.94, 2, [Women's Clothing, Women's Shoes], 22, [ZO0059900599, ZO0381103811], 555222, diane, FEMALE, diane@turner-family.zzz, Sunday, 107.94]

Test query set : SQL queries (first 5 in 215):
SELECT SUBSTRING(`kibana_sample_data_flights`.`OriginWeather`, 1, 1024) AS `OriginWeather` FROM `kibana_sample_data_flights` GROUP BY 1
SELECT SUM(`kibana_sample_data_flights`.`FlightDelayMin`) AS `sum_Offset_ok` FROM `kibana_sample_data_flights` GROUP BY 1
SELECT SUM(`kibana_sample_data_flights`.`FlightDelay`) AS `sum_FlightDelay_ok` FROM `kibana_sample_data_flights` GROUP BY 1
SELECT SUM(`kibana_sample_data_flights`.`DistanceMiles`) AS `sum_DistanceMiles_ok` FROM `kibana_sample_data_flights` GROUP BY 1
SELECT YEAR(`kibana_sample_data_flights`.`timestamp`) AS `yr_timestamp_ok` FROM `kibana_sample_data_flights` GROUP BY 1

=================================

[2020-01-06T11:37:57,996][INFO ][c.a.o.s.c.CorrectnessIT ] [performComparisonTest] Loading test data set...
[2020-01-06T11:38:06,308][INFO ][c.a.o.s.c.CorrectnessIT ] [performComparisonTest] Verifying test queries...
[2020-01-06T11:38:21,180][INFO ][c.a.o.s.c.CorrectnessIT ] [performComparisonTest] Saving test report to disk...
[2020-01-06T11:38:21,202][INFO ][c.a.o.s.c.CorrectnessIT ] [performComparisonTest] Report file location is /Users/xxx/Workspace/sql/reports/report_2020-01-06-19.json
[2020-01-06T11:38:21,204][INFO ][c.a.o.s.c.CorrectnessIT ] [performComparisonTest] Cleaning up test data...
[2020-01-06T11:38:21,849][INFO ][c.a.o.s.c.CorrectnessIT ] [performComparisonTest] Completed comparison test.
```

Specify different test case set by `queries` argument:

```
$ ./gradlew integTestRunner -DtestType=comparison -Dqueries=sanity_integration_tests.txt

...
Test query set : SQL queries (first 5 in 7):
SELECT AvgTicketPrice, Cancelled, Carrier, FlightDelayMin, timestamp FROM kibana_sample_data_flights
SELECT AvgTicketPrice AS avg, Cancelled AS cancel, Carrier AS carrier, FlightDelayMin AS delay, timestamp AS ts FROM kibana_sample_data_flights
SELECT Carrier, AVG(FlightDelayMin) FROM kibana_sample_data_flights GROUP BY Carrier
SELECT Carrier, AVG(FlightDelayMin) FROM kibana_sample_data_flights GROUP BY Carrier HAVING AVG(FlightDelayMin) > 5
SELECT YEAR(timestamp) FROM kibana_sample_data_flights
...
```

Specify external Elasticsearch cluster by `esHost` argument, otherwise an internal Elasticsearch in workspace is in use by default.

```
$ ./gradlew integTestRunner -DtestType=comparison -DesHost=localhost:9200

=================================
Tested Database : localhost:9200
Other Databases :
SQLite = jdbc:sqlite::memory:
H2 = jdbc:h2:mem:test;DB_CLOSE_DELAY=-1
...
```

Specify different databases for comparison. `dbUrl` is for database to be tested. `otherDbUrls` is for other databases whose result set be referenced and compared.

```
$ ./gradlew integTestRunner -DtestType=comparison -Dqueries=sanity_integration_tests.txt -DdbUrl=jdbc:sqlite::memory:

=================================
Tested Database : jdbc:sqlite::memory:
Other Databases :
SQLite = jdbc:sqlite::memory:
H2 = jdbc:h2:mem:test;DB_CLOSE_DELAY=-1
...

$ ./gradlew integTestRunner -DtestType=comparison -Dqueries=sanity_integration_tests.txt -DdbUrl=jdbc:sqlite::memory: -DotherDbUrls=Unknown=jdbc:h2:mem:test;DB_CLOSE_DELAY=-1

=================================
Tested Database : jdbc:sqlite::memory:
Other Databases :
Unknown = jdbc:h2:mem:test
...
```
Binary file added docs/dev/img/how-we-do-comparison-test.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/dev/img/test-framework-components.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/dev/img/the-workflow-of-comparison-test.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.