Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PrestoQueryRunner #7628

Closed
wants to merge 6 commits into from
Closed

Conversation

duanmeng
Copy link
Collaborator

No description provided.

Copy link

netlify bot commented Nov 17, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 083114b
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/655cbb839ac1bd0008cccbbd

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 17, 2023
@duanmeng duanmeng marked this pull request as draft November 17, 2023 07:57
@duanmeng duanmeng marked this pull request as ready for review November 17, 2023 08:47
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Thank you for bringing in PrestoQueryRunner to Velox. Looks good % a question about type parsing.

CC: @majetideepak @pedroerp @kgpai

std::vector<TypePtr> types;
for (const auto& column : response_["columns"]) {
names.push_back(column["name"].asString());
// TODO:types.push_back(parseTypeSignature(column["type"].asString()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe parse the JSON and use logic similar to Type::create to create Type from it?

            "typeSignature": {
                "rawType": "row",
                "typeArguments": [
                    {
                        "rawType": "array",
                        "typeArguments": [
                            {
                                "rawType": "map",
                                "typeArguments": [
                                    {
                                        "rawType": "varchar",
                                        "typeArguments": [],

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type parser #7568 from @majetideepak is about to land, I will add parseType(column["type"].asString()) here to get the Velox type once it is merged.

VELOX_CHECK_EQ(
response.status_code,
200,
"Post to {} failed: {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post -> POST

VELOX_CHECK_EQ(
response.status_code,
200,
"Get from {} failed: {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get -> GET

@duanmeng
Copy link
Collaborator Author

@mbasmanova Hi Masha, I've resolved the comments and used the typeParse API. I started a PrestoHiveRunner instance, enabled the PrestoRunnerTest, and ran, all passed locally. Could you please help to review PR? Thanks.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng Thank you for working on this. Looks good % a question below. Let's add an .md file to explain how to use PrestoQueryRunner, e.g. provide instructions for someone who would want to try running the test.

CC: @kgpai

PrestoQueryRunner(
std::string coordinatorUri,
std::string user,
const int32_t timeout = 10'000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Drop const.
  • Document this parameter.
  • Perhaps, add units to the name for clarity: timeoutMillis or use an std::chrono type.

Why 10'000? Is this 10 seconds?

Why does the test need to override this to 20'000?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the const, documented this parameter, and used std::chrono type.

Why 10'000? Is this 10 seconds?

I followed the default value used in PrestoDB. I just changed it to 1000ms.

Why does the test need to override this to 20'000?

Removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova Could we add the document file in the fellow-up PR? I found that I need to make code changes to show how to use PrestoQueryRunner in Velox. I want to add a follow-up PR to make the runner configurable through the gflags along with the document on how to use PrestoQueryRunner. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add the document file in the fellow-up PR?

Sure.

I want to add a follow-up PR to make the runner configurable through the gflags

Wondering why would we want to do that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering why would we want to do that.

I found that I need to make code changes to show how to use PrestoQueryRunner in Velox, say velox_aggregation_fuzzer_test, when I am trying to write the document of how to use PrestoQueryRunner, something like the following. With the follow-up, we could control the creation of the query runner instance to switch runner only through program parameters.

diff --git a/velox/exec/tests/AggregationFuzzerTest.cpp b/velox/exec/tests/AggregationFuzzerTest.cpp
--- a/velox/exec/tests/AggregationFuzzerTest.cpp
+++ b/velox/exec/tests/AggregationFuzzerTest.cpp

+#include "velox/exec/tests/utils/PrestoQueryRunner.h"
-  auto duckQueryRunner =

+  auto queryRunner =
+      std::make_unique<facebook::velox::exec::test::PrestoQueryRunner>("http://127.0.0.1:8080", "hive");

@@ -343,5 +337,5 @@ int main(int argc, char** argv) {
       facebook::velox::exec::test::getCustomInputGenerators();
   options.timestampPrecision =
       facebook::velox::VectorFuzzer::Options::TimestampPrecision::kMilliSeconds;
-  return Runner::run(initialSeed, std::move(duckQueryRunner), options);
+  return Runner::run(initialSeed, std::move(queryRunner), options);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to allow for switching between Duck and Presto query runners in the Fuzzer. Once Presto query runner works, we want to always use that and remove Duck query runner.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once Presto query runner works, we want to always use that and remove Duck query runner.

Got, it. I will highlight this in the follow-up.

@duanmeng
Copy link
Collaborator Author

@duanmeng Thank you for working on this. Looks good % a question below. Let's add an .md file to explain how to use PrestoQueryRunner, e.g. provide instructions for someone who would want to try running the test.

@mbasmanova Hi Masha, where should we add this .md file?

@mbasmanova
Copy link
Contributor

where should we add this .md file?

Perhaps, next to PrestoQueryRunner.h, i.e. PrestoQueryRunner.md.

@@ -288,3 +288,11 @@ add_test(
COMMAND cpr_http_client_test
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(cpr_http_client_test cpr::cpr gtest gtest_main)

add_executable(presto_query_runner_test PrestoQueryRunnerTest.cpp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a separate executable for his test? We used to have a problem with CI ran out of disk space because all these executables are very large. CC: @kgpai

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova Removed this executable, and linked the runner test into velox_exec_infra_test.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgpai Krishna, would you help merge this PR?

Copy link
Contributor

@kgpai kgpai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this work @duanmeng

}
};

TEST_F(PrestoQueryRunnerTest, DISABLED_basic) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you add a comment here that this is disabled because it requires a presto co-ordinator running at localhost etc ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments.

/// method.
/// @param resultType Expected type of the results.
/// @return Data received from Presto.
std::multiset<std::vector<velox::variant>> execute(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe in a future pr we can add in the option to pass in table name, so we can potentially have multiple queries running at same time. This can only run single threaded atm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, got it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgpai How about generating a random table name when constructing the PrestoQueryRunner, say, tmp_$randonStringName_$threadId?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep things simple and first change the CI to allow running PrestoQueryRunnerTest, then replace Duck runner in the fuzzer.

@facebook-github-bot
Copy link
Contributor

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@kgpai merged this pull request in 19b29f7.

Copy link

Conbench analyzed the 1 benchmark run on commit 19b29f7b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

facebook-github-bot pushed a commit that referenced this pull request Jan 16, 2024
…tal jobs. (#8314)

Summary:
**What ?**
  Recently support for [PrestoQueryRunner ](#7628 added to AggregationFuzzer, so that it can use Presto as a source of truth.  This PR sets up an experimental job that uses the Presto Java container to run Aggregation Fuzzer with Presto as source of truth.

**Why ?**
  Running the aggregation fuzzer against Presto gives us greater confidence on the correctness of our aggregate functions.

**How ?**
   1. [Previously we created a docker image with support for java and Presto](#7920)
   2. This still requires some changes to get Presto to work with AggregationFuzzer. These changes are listed below.
   3. Adds support for Hive metastore - we do this via a file based metastore  by adding the hive.properties file to $PRESTO_HOME/etc/catalog (see #8111)
   4. We also set the pid file on the launcher to ensure there is only one run of the Presto server
   5. Another thing to note is that we set the timezone to be ['Americas/Los_Angeles' due to details described here ](#8127).

**Testing:**

There is no automated way to test this. Here are manual steps I follow:

1. Create presto-java-container by running `docker-compose build presto-java`
6. Ssh into the presto-java-container
7. Copy Velox sources into the presto-java-container and build Velox
8. Launch Presto service: /opt/start-prestojava.sh
9. Run Aggregation fuzzer with --presto_url=http://127.0.0.1:8080 flag

Example Run: Here is an example run https://github.com/facebookincubator/velox/actions/runs/7508139395/job/20443035814

Pull Request resolved: #8314

Reviewed By: kagamiori

Differential Revision: D52701237

Pulled By: kgpai

fbshipit-source-id: e4ba284aed469bc69cc95c90f708a0c92a040205
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants