forked from apache/kyuubi
Upstream updates #6
Open
nousot-cloud-guy wants to merge 761 commits into master from upstream-updates
…e and related issues

### _Why are the changes needed?_

Currently, the `KyuubiOperationWithEngineSecuritySuite` is not valid, because:

1. `InternalSecurityAccessor` is a singleton and only the first initialized instance takes effect, which means some tests may fail if the test order changes.
2. `discoveryClient.startSecretNode` calls `PersistentNode#start` under the hood, which is asynchronous, so we should call `waitForInitialCreate` to ensure the node is created before running the test. Based on my analysis, the wait may take 30s (mtime - ctime).

```
[zk: 10.221.106.196:55408(CONNECTED) 2] get /SECRET
_ENGINE_SECRET_
cZxid = 0x5
ctime = Wed Jul 19 23:01:57 CST 2023
mZxid = 0x7
mtime = Wed Jul 19 23:02:17 CST 2023
pZxid = 0x5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 15
numChildren = 0
```

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5072 from pan3793/security.

Closes apache#5072

69cce29 [Cheng Pan] fix
2d62355 [Cheng Pan] fix
74eb2cb [Cheng Pan] fix
6d8f4ce [Cheng Pan] KyuubiOperationWithEngineSecurity

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
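The second point above (an asynchronous `start` followed by an explicit wait) can be illustrated with a minimal, standard-library-only sketch. `AsyncNode` is a hypothetical stand-in, not Curator's `PersistentNode`; only the method name `waitForInitialCreate` is borrowed from the commit message, and its signature here is an assumption made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a node whose creation completes asynchronously.
class AsyncNode {
    private final CountDownLatch created = new CountDownLatch(1);
    private volatile String data;

    // Returns immediately; the "node" is created later on a background thread.
    void start(String value, long delayMillis) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(delayMillis); // simulate the creation latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            data = value;
            created.countDown();
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Blocks until the node exists or the timeout elapses; true means created.
    boolean waitForInitialCreate(long timeoutMillis) {
        try {
            return created.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // May return null if called before creation has finished.
    String read() {
        return data;
    }
}
```

A test that reads the node right after `start` races with the background creation; waiting first removes that race.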
### _Why are the changes needed?_ to fix ``` SparkDeltaOperationSuite: org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite *** ABORTED *** java.lang.RuntimeException: Unable to load a Suite class org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite that was discovered in the runpath: Not Support spark version (4,0) at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:80) at org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at scala.collection.TraversableLike.map(TraversableLike.scala:286) ... Cause: java.lang.IllegalArgumentException: Not Support spark version (4,0) at org.apache.kyuubi.engine.spark.WithSparkSQLEngine.$init$(WithSparkSQLEngine.scala:42) at org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite.<init>(SparkDeltaOperationSuite.scala:25) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at java.lang.Class.newInstance(Class.java:442) at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:66) at org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) ... 
``` ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5075 from cfmcgrady/spark-4.0. Closes apache#5075 ad38c0d [Fu Chen] refine test to adapt Spark 4.0 Authored-by: Fu Chen <cfmcgrady@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
…ess to allows it release temp files ### _Why are the changes needed?_ fix bug apache#5065 ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5066 from ASiegeLion/master. Closes apache#5065 08d1ac0 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala bf908f5 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala 9144582 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala f1c95e4 [liupeiyue] [KYUUBI-apache#5065] Call destroy first on killing Spark startup process to allows it release temp files 907123a [liupeiyue] [KYUUBI-apache#5065] Call destroy first on killing Spark startup process to allows it release temp files f30a9fc [liupeiyue] [KYUUBI-apache#5065] Call destroy first on killing Spark startup process to allows it release temp files 449be44 [文艺攻城狮] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala 987ffc7 [文艺攻城狮] Update kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala 995386f [文艺攻城狮] Update kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala ad3d111 [liupeiyue] [KYUUBI-apache#5065]destroy the spark engine release the submitted temp files Lead-authored-by: liupeiyue <liupeiyue@yy.com> Co-authored-by: 文艺攻城狮 <945076608@qq.com> Co-authored-by: Cheng Pan <pan3793@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
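As a rough sketch of the ordering named in the title above (ask the process to exit first, force-kill only as a fallback), assuming nothing about the actual `ProcBuilder` code: `GracefulStop` and its grace-period parameter are hypothetical names chosen for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: polite termination first, forced termination as fallback.
class GracefulStop {
    // Returns true if the process exited within the grace period after destroy().
    static boolean stop(Process process, long graceMillis) {
        // destroy() requests normal termination (typically SIGTERM on Unix-like
        // systems), giving the child a chance to run cleanup such as deleting temp files.
        process.destroy();
        try {
            if (process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // The process ignored the request or the wait was interrupted: force it.
        process.destroyForcibly();
        return false;
    }
}
```

Whether `destroy()` really allows cleanup is platform dependent, so the grace period is best treated as a tunable rather than a guarantee.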
### _Why are the changes needed?_

Close apache#5009

When the Kyuubi server log is huge, it is difficult to find the Spark engine log path in it. This change passes the path into the Spark conf, so users can find the engine log path in the Spark UI or the Spark History Server.

The submit command looks like:

```shell
XXXX/bin/spark-submit \
  --class org.apache.kyuubi.engine.spark.SparkSQLEngine \
  --conf spark.kyuubi.engine.engineLog.path=XXXX/kyuubi-spark-sql-engine.log.0 \
  --proxy-user kyuubi XXXX/target/kyuubi-spark-sql-engine_2.12-1.8.0-SNAPSHOT.jar
```

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5011 from zwangsheng/KYUUBI_5009.

Closes apache#5009

36c7722 [zwangsheng] fix compile
1c20f92 [zwangsheng] retest
70568c7 [zwangsheng] Fix Unit Test
2bc4657 [zwangsheng] try to fix unit test
2197b35 [zwangsheng] Narrow the scope of access
a44eefc [zwangsheng] [KYUUBI apache#5009]Pass Spark Engine Log Path to Spark COnf

Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…e schema ### _Why are the changes needed?_ This is required by Batch V2, as it allows the batch job queued in metastore before being picked by Kyuubi Server for scheduling. ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [x] Add screenshots for manual tests if appropriate ``` mysql> CREATE TABLE IF NOT EXISTS metadata( -> key_id bigint PRIMARY KEY AUTO_INCREMENT COMMENT 'the auto increment key id', -> identifier varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID', -> session_type varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH', -> real_user varchar(255) NOT NULL COMMENT 'the real user', -> user_name varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user', -> ip_address varchar(128) COMMENT 'the client ip address', -> kyuubi_instance varchar(1024) NOT NULL COMMENT 'the kyuubi instance that creates this', -> state varchar(128) NOT NULL COMMENT 'the session state', -> resource varchar(1024) COMMENT 'the main resource', -> class_name varchar(1024) COMMENT 'the main class name', -> request_name varchar(1024) COMMENT 'the request name', -> request_conf mediumtext COMMENT 'the request config map', -> request_args mediumtext COMMENT 'the request arguments', -> create_time BIGINT NOT NULL COMMENT 'the metadata create time', -> engine_type varchar(32) NOT NULL COMMENT 'the engine type', -> cluster_manager varchar(128) COMMENT 'the engine cluster manager', -> engine_open_time bigint COMMENT 'the engine open time', -> engine_id varchar(128) COMMENT 'the engine application id', -> engine_name mediumtext COMMENT 'the engine application name', -> engine_url varchar(1024) COMMENT 'the engine tracking url', -> engine_state varchar(32) COMMENT 'the engine application state', -> engine_error mediumtext COMMENT 'the engine application diagnose', -> end_time bigint COMMENT 'the metadata end time', -> peer_instance_closed boolean default '0' COMMENT 
'closed by peer kyuubi instance', -> UNIQUE INDEX unique_identifier_index(identifier), -> INDEX user_name_index(user_name), -> INDEX engine_type_index(engine_type) -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; Query OK, 0 rows affected (0.04 sec) mysql> ALTER TABLE metadata MODIFY kyuubi_instance varchar(1024) COMMENT 'the kyuubi instance that creates this'; Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> SHOW CREATE TABLE metadata; mysql> SHOW CREATE TABLE metadata; +----------+---------------------------------------------------------------------------+ | Table | Create Table | +----------+---------------------------------------------------------------------------+ | metadata | CREATE TABLE `metadata` ( `key_id` bigint NOT NULL AUTO_INCREMENT COMMENT 'the auto increment key id', `identifier` varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID', `session_type` varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH', `real_user` varchar(255) NOT NULL COMMENT 'the real user', `user_name` varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user', `ip_address` varchar(128) DEFAULT NULL COMMENT 'the client ip address', `kyuubi_instance` varchar(1024) DEFAULT NULL COMMENT 'the kyuubi instance that creates this', `state` varchar(128) NOT NULL COMMENT 'the session state', `resource` varchar(1024) DEFAULT NULL COMMENT 'the main resource', `class_name` varchar(1024) DEFAULT NULL COMMENT 'the main class name', `request_name` varchar(1024) DEFAULT NULL COMMENT 'the request name', `request_conf` mediumtext COMMENT 'the request config map', `request_args` mediumtext COMMENT 'the request arguments', `create_time` bigint NOT NULL COMMENT 'the metadata create time', `engine_type` varchar(32) NOT NULL COMMENT 'the engine type', `cluster_manager` varchar(128) DEFAULT NULL COMMENT 'the engine cluster manager', `engine_open_time` bigint DEFAULT NULL COMMENT 'the engine open time', `engine_id` varchar(128) DEFAULT NULL 
COMMENT 'the engine application id', `engine_name` mediumtext COMMENT 'the engine application name', `engine_url` varchar(1024) DEFAULT NULL COMMENT 'the engine tracking url', `engine_state` varchar(32) DEFAULT NULL COMMENT 'the engine application state', `engine_error` mediumtext COMMENT 'the engine application diagnose', `end_time` bigint DEFAULT NULL COMMENT 'the metadata end time', `peer_instance_closed` tinyint(1) DEFAULT '0' COMMENT 'closed by peer kyuubi instance', PRIMARY KEY (`key_id`), UNIQUE KEY `unique_identifier_index` (`identifier`), KEY `user_name_index` (`user_name`), KEY `engine_type_index` (`engine_type`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci | +----------+---------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> ``` The derby SQL also is tested <img width="1330" alt="image" src="https://github.com/apache/kyuubi/assets/26535726/4eef0742-05dd-4bd6-a77e-e9de0238375e"> - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5078 from pan3793/nullable. Closes apache#5078 0c5dec8 [Cheng Pan] Make kyuubi_instance nullable in metadata table schema Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_ This is a pure code refactor extracted from apache#4790 to reduce the diff. ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5081 from pan3793/dialect. Closes apache#5081 537d623 [Cheng Pan] Minor refactor JDBCMetadataStore Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>
…fe during bootstrap ### _Why are the changes needed?_ As titled. ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5082 from link3280/KYUUBI-5080. Closes apache#5080 e8026b8 [Paul Lin] [KYUUBI apache#4806][FLINK] Improve logs fd78f32 [Paul Lin] [KYUUBI apache#4806][FLINK] Fix gateway NPE a0a7c44 [Cheng Pan] Update externals/kyuubi-flink-sql-engine/src/main/java/org/apache/flink/client/deployment/application/executors/EmbeddedExecutorFactory.java 50830d4 [Paul Lin] [KYUUBI apache#5080][FLINK] Fix EmbeddedExecutorFactory not thread-safe during bootstrap Lead-authored-by: Paul Lin <paullin3280@gmail.com> Co-authored-by: Cheng Pan <pan3793@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_ Updated [kyuubi on kubernetes config section](https://kyuubi.readthedocs.io/en/master/deployment/kyuubi_on_kubernetes.html#config) to state <code> Kyuubi **does** not recommend using this way on Kubernetes</code> ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5086 from mans2singh/ISSUE-5085. Closes apache#5086 5faf0df [mans2singh] [KYUUBI # 5085] Update config section based on review comments df9f62f [mans2singh] [KYUUBI # 5085] Update config section of deploy on kubernetes Authored-by: mans2singh <mans2singh@yahoo.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
… blank lines in Windows ### _Why are the changes needed?_ close apache#5090 ### _How was this patch tested?_ After this PR it generates normal settings file in windows. - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5091 from wForget/KYUUBI-5090. Closes apache#5090 9e974c7 [wforget] fix dc1ebfc [wforget] fix 2cbec60 [wforget] [KYUUBI-5090] Fix AllKyuubiConfiguration to generate redundant blank lines in Windows ecc3b4a [mans2singh] [KYUUBI apache#5086] [KYUUBI # 5085] Update config section of deploy on kubernetes Lead-authored-by: wforget <643348094@qq.com> Co-authored-by: mans2singh <mans2singh@yahoo.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
…edundant version comparison methods ### _Why are the changes needed?_ - Support initializing or comparing version with major version only, e.g "3" equivalent to "3.0" - Remove redundant version comparison methods by using semantic versions of Spark, Flink and Kyuubi - adding common `toDouble` method ### _How was this patch tested?_ - [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5039 from bowenliang123/improve-semanticversion. Closes apache#5039 b686826 [liangbowen] nit d39646b [liangbowen] SPARK_ENGINE_RUNTIME_VERSION 9148caa [liangbowen] use semantic versions ecc3b4a [mans2singh] [KYUUBI apache#5086] [KYUUBI # 5085] Update config section of deploy on kubernetes Lead-authored-by: liangbowen <liangbowen@gf.com.cn> Co-authored-by: mans2singh <mans2singh@yahoo.com> Signed-off-by: liangbowen <liangbowen@gf.com.cn>
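The "major version only" behavior described above can be sketched as follows. This is a simplified illustration, not Kyuubi's `SemanticVersion`: the class name `SemVer` is hypothetical, and it assumes the major and minor parts are plain integers.

```java
// Hypothetical, simplified version type: only major and minor are compared.
class SemVer implements Comparable<SemVer> {
    final int major;
    final int minor;

    private SemVer(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Accepts "3", "3.4" or "1.8.0-SNAPSHOT"; a missing minor part defaults to 0.
    // Assumes the first two dot-separated parts are numeric.
    static SemVer parse(String version) {
        String[] parts = version.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return new SemVer(major, minor);
    }

    @Override
    public int compareTo(SemVer other) {
        int byMajor = Integer.compare(major, other.major);
        return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SemVer && compareTo((SemVer) o) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * major + minor;
    }
}
```

With a single comparable type, call sites can write `SemVer.parse(v).compareTo(SemVer.parse("3")) >= 0` instead of keeping separate helper methods per engine.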
### _Why are the changes needed?_ ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5094 from dev-lpq/add_python_doc. Closes apache#5094 c7d50d7 [pengqli] upgrade Python-JayDeBeApi doc 41f96fc [pengqli] upgrade Python-JayDeBeApi doc dd0f91b [pengqli] upgrade Python-JayDeBeApi doc ae1b7bc [pengqli] upgrade Python-JayDeBeApi doc 189d7c8 [pengqli] upgrade Python-JayDeBeApi doc 2e1e7b4 [pengqli] upgrade Python-JayDeBeApi doc 362a432 [pengqli] add Python-JayDeBeApi doc Authored-by: pengqli <pengqli@cisco.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_ - Remove the existing single quote in message format which causes the argument 0 is not used - `A single quote itself must be represented by doubled single quotes '' throughout a String.` https://docs.oracle.com/javase/8/docs/api/java/text/MessageFormat.html ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5100 from bowenliang123/datatype-msg. Closes apache#5100 8135ff1 [liangbowen] fix Authored-by: liangbowen <liangbowen@gf.com.cn> Signed-off-by: liangbowen <liangbowen@gf.com.cn>
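The quoting rule cited above is easy to reproduce with `java.text.MessageFormat` alone. The message text below is made up for illustration and is not the string changed by the patch.

```java
import java.text.MessageFormat;

class QuoteDemo {
    // A lone single quote opens a quoted section, so {0} is treated as literal text.
    static String broken(String type) {
        return MessageFormat.format("Can't convert type {0}", type);
    }

    // A doubled single quote yields one literal quote and keeps {0} active.
    static String fixed(String type) {
        return MessageFormat.format("Can''t convert type {0}", type);
    }

    public static void main(String[] args) {
        System.out.println(broken("ARRAY")); // prints: Cant convert type {0}
        System.out.println(fixed("ARRAY"));  // prints: Can't convert type ARRAY
    }
}
```

Removing the quote from the pattern, as the patch does, avoids the problem without needing the doubled form.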
### _Why are the changes needed?_

- Remove 2 unused string builders in `KyuubiQueryResultSet` and `KyuubiArrowQueryResultSet`, which only have separators appended and are never read again

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5101 from bowenliang123/unused-sb.

Closes apache#5101

ccb6fb7 [liangbowen] remove never queried StringBuilders

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_ close apache#5099 ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5103 from lsm1/features/kyuubi_5099. Closes apache#5099 84a1eca [senmiaoliu] fix doc Authored-by: senmiaoliu <senmiaoliu@trip.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
… engine timeout ### _Why are the changes needed?_ apache#5065 ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5097 from ASiegeLion/master. Closes apache#5065 d50a388 [Cheng Pan] followup 80861dd [liupeiyue] [KYUUBI apache#5065][FOLLOWUP] Graceful close the process when launch engine timeout Lead-authored-by: liupeiyue <liupeiyue@yy.com> Co-authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>
…bi server

### _Why are the changes needed?_

As reported by apache#4825, a large number of engine builder processes may cause high machine load on the Kyuubi server, so this change adds a config to limit engine creation concurrency.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5089 from wForget/engine_builder_limit.

Closes apache#5089

7750700 [wforget] comment
774a859 [wforget] comments
373640f [wforget] Limit maximum engine creation concurrency of kyuubi server
ecc3b4a [mans2singh] [KYUUBI apache#5086] [KYUUBI # 5085] Update config section of deploy on kubernetes

Lead-authored-by: wforget <643348094@qq.com>
Co-authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
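One common way to cap concurrent launches is a counting semaphore. The sketch below assumes that approach purely for illustration; the commit message does not say how the limit is implemented, and `EngineStartLimiter`, its constructor argument and its timeout are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical limiter: at most maxConcurrent launch actions run at once.
class EngineStartLimiter {
    private final Semaphore permits;

    EngineStartLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the launch action once a permit is available; fails if none frees up in time.
    <T> T launch(Supplier<T> action, long timeoutMillis) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a launch permit", e);
        }
        if (!acquired) {
            throw new IllegalStateException("timed out waiting for a launch permit");
        }
        try {
            return action.get();
        } finally {
            permits.release(); // always return the permit, even if the launch fails
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```

Bounding the wait with a timeout keeps a burst of requests from queueing indefinitely behind slow launches.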
### _Why are the changes needed?_ close apache#5076 ### _How was this patch tested?_ - [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5102 from lsm1/features/kyuubi_5076. Closes apache#5076 ce7cfe6 [senmiaoliu] kdf support engine url Authored-by: senmiaoliu <senmiaoliu@trip.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_ As titled. ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5107 from link3280/engine_fatal_log. Closes apache#5106 db45392 [Paul Lin] [KYUUBI apache#5106][Flink] Improve logs for fatal errors Authored-by: Paul Lin <paullin3280@gmail.com> Signed-off-by: Paul Lin <paullin3280@gmail.com>
### _Why are the changes needed?_

#### How is it done today?

The current procedure of the Batch Job API, called V1

##### CREATE batch job procedure in Batch V1

```mermaid
sequenceDiagram
  participant Client
  participant Server
  participant Metastore
  participant RM

  Client ->> Server : Create Batch Job
  Server ->> Server : Create Batch Operator
  Server ->> Metastore : Persist Job metadata (PENDING)
  Server ->> Server : Put Batch Operator into Execution thread pool
  Server ->> Client : Batch Job Info
  Server ->> RM : Submit Application (in Execution thread pool)
  loop Application Check
    Server ->> RM : Query Application Status
    Server ->> Metastore : Update Batch Status
  end
```

##### GET batch job info procedure in Batch V1

```mermaid
sequenceDiagram
  participant Client
  participant Server
  participant Metastore
  participant RM

  Client ->> Server : Query Batch Job Info
  alt KyuubiInstance matched
    Server ->> Client : Batch Job Info
  else
    Server ->> Server : Forward Request to expected KyuubiInstance
  end
```

<!--
```mermaid
sequenceDiagram
  participant Client
  participant Server
  participant Metastore
  participant RM

  Client ->> Server : Fetch Batch Job logs
  alt KyuubiInstance matched
    Server ->> Client : Batch Job logs
  else
    Server ->> Server : Forward Request to expected KyuubiInstance
  end

  Client ->> Server : Close Batch Job
  alt KyuubiInstance matched
    Server ->> RM : Close the Application
    Server ->> Metastore : Update Batch Status
    Server ->> Client : Closed Batch Job Info
  else
    Server ->> Server : Forward Request to expected KyuubiInstance
  end
```
-->

#### What is new in your approach?

This PR proposes a new way for batch job submission, called V2

##### CREATE batch job procedure in Batch V2

```mermaid
sequenceDiagram
  participant Client
  participant Server
  participant Metastore
  participant RM

  Client ->> Server : Create Batch Job
  Server ->> Metastore : Persist Job metadata (INITIALIZED)
  Server ->> Client : Batch Job Info
  loop Forever in dedicated thread pool
    Server ->> Metastore : Pick up and lock INITIALIZED job
    Server ->> RM : Submit Application
    Server ->> RM : Query Application Status
    Server ->> Metastore : Update Batch Status
  end
```

##### GET batch job info procedure in Batch V2

```mermaid
sequenceDiagram
  participant Client
  participant Server
  participant Metastore
  participant RM

  Client ->> Server : Query Batch Job Info
  Server ->> Metastore : Query Batch Job Info
  Server ->> Client : Batch Job Info
```

<!--
```mermaid
sequenceDiagram
  participant Client
  participant Server
  participant Metastore
  participant RM

  Client ->> Server : Fetch Batch Job logs
  alt KyuubiInstance matched
    Server ->> Client : Batch Job logs
  else
    Server ->> Server : Forward Request to expected KyuubiInstance
  end

  Client ->> Server : Close Batch Job
  alt KyuubiInstance matched
    Server ->> RM : Close the Application
    Server ->> Metastore : Update Batch Status
    Server ->> Client : Closed Batch Job Info
  else
    Server ->> Server : Forward Request to expected KyuubiInstance
  end
```
-->

#### What are the limits of current practice, and why do you think it will be successful?

Pros:

1. The CREATE request becomes lightweight and returns faster. In V1, we struggled with whether the response should wait for the engine to be submitted to RM, and how to report the un-submitted job status to the client; in V2, the CREATE request simply inserts a new record into the metastore and returns with the INITIALIZED state.
2. In common practice, a Kyuubi server cluster is deployed behind a load balancer that does not know the real load of each Kyuubi server. Suppose it uses a Random/RoundRobin/IPHash policy to forward requests: the existing Batch V1 implementation may leave some Kyuubi servers under high load while others are lightly loaded, because it always uses the Kyuubi server that received the request to do the batch submission. In V2, each Kyuubi server can easily measure its own load, e.g. by CPU/memory usage or active batch sessions, and then decide whether to pick up new batch jobs. Besides, when all Kyuubi servers are overloaded, V1 cannot benefit immediately even if the admin scales up the cluster.
3. In V1, the metrics are almost independent in each Kyuubi server; in V2, it's easy to expose global metrics of batch jobs when using shareable storage as the metastore backend, e.g. we can easily get how many batches are queued in the metastore and how many batches are managed by each Kyuubi server, by querying the metastore backend directly or via metrics exposed by each Kyuubi server.

Cons:

1. V1 assumes the Kyuubi server tolerates a long outage of the metastore, whereas V2 strictly depends on the availability of the metastore. But we can move the existing forwarding logic and async retry logic into the implementation of `Metastore` to overcome this regression.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes apache#4790 from pan3793/batch-v2.

Closes apache#4790

860698a [Cheng Pan] BATCH_IMPL_VERSION
b9c68aa [Cheng Pan] kyuubi.batch.impl.version
17e4f19 [Cheng Pan] submitter.threads=100
7c0bdb0 [Cheng Pan] Initial implement Batch v2

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
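The "pick up and lock INITIALIZED job" step in the V2 flow amounts to an atomic state transition that only one server can win. A minimal in-memory sketch of that idea follows; `InMemoryMetastore` and its methods are hypothetical and are not Kyuubi's `Metastore` API, and a real shared backend would need an equivalent conditional update.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a shared metastore.
class InMemoryMetastore {
    private final Map<String, String> states = new ConcurrentHashMap<>(); // batchId -> state
    private final Map<String, String> owners = new ConcurrentHashMap<>(); // batchId -> server

    // CREATE in V2: only record the job; nothing is submitted yet.
    void create(String batchId) {
        states.put(batchId, "INITIALIZED");
    }

    // Claims one INITIALIZED job for the given server, or returns empty if none is left.
    // replace(key, old, new) is atomic, so two servers cannot claim the same job.
    Optional<String> pickUp(String serverId) {
        for (String batchId : states.keySet()) {
            if (states.replace(batchId, "INITIALIZED", "PENDING")) {
                owners.put(batchId, serverId);
                return Optional.of(batchId);
            }
        }
        return Optional.empty();
    }

    String state(String batchId) {
        return states.get(batchId);
    }

    String owner(String batchId) {
        return owners.get(batchId);
    }
}
```

Because a server calls `pickUp` only when it chooses to, it can skip polling while it is overloaded, which is the load-awareness argument made in point 2 above.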
### _Why are the changes needed?_

I'd like to update the LDAP doc to guide users through setting up LDAP authentication in Kyuubi.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate

<img width="1395" alt="image" src="https://github.com/apache/kyuubi/assets/26535726/6925a8e3-dfaf-48ad-a442-bb635fe75830">

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5083 from zhaohehuhu/Improvement-0721.

Closes apache#5083

8c0e149 [Cheng Pan] polish
22f8d3a [Cheng Pan] nit
822fa66 [hezhao2] sync
78ae123 [hezhao2] further explanation for LDAP filters
7ebc61a [Cheng Pan] Update docs/security/ldap.md
bb06810 [Cheng Pan] Update docs/security/ldap.md
8d19fdf [Cheng Pan] Update docs/security/ldap.md
c2fa280 [Cheng Pan] Update docs/security/ldap.md
2acbb87 [hezhao2] update LDAP doc
22027e1 [hezhao2] update LDAP doc

Lead-authored-by: hezhao2 <hezhao2@cisco.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…gine bootstrap ### _Why are the changes needed?_ As titled. ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5109 from link3280/bootstrap_file_not_found. Closes apache#5108 318199f [Paul Lin] [KYUUBI apache#5108][Flink] Fix iFileNotFoundException during Flink engine bootstrap Authored-by: Paul Lin <paullin3280@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_ Fix apache#3920 https://github.com/apache/kyuubi/actions/runs/5711863703/job/15474230690?pr=4790 ``` DockerizedZkServiceDiscoverySuite: - distribute lock *** FAILED *** Expected exception org.apache.kyuubi.KyuubiSQLException to be thrown, but no exception was thrown (DiscoveryClientTests.scala:147) ``` ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes apache#5112 from pan3793/test-lock. Closes apache#3920 d980f87 [Cheng Pan] Fix flaky test - distribute lock Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

- remove the duplicated assignment of the same variable on adjacent lines in `FastHiveDecimalImpl`
- replace the redundant `putAll` call with collection initialization in `BatchRestApi`
- use a `try-with-resources` statement for the reader and avoid declaring two variables on the same line in `KyuubiCommands`
- fix the `warning: Tag '@return:' is not recognised` compilation warning in `KyuubiGetSqlClassification:L53`

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5117 from bowenliang123/fastsignum.

Closes apache#5117

595b574 [liangbowen] simplify
be530fa [liangbowen] fix warning: Tag '@return:' is not recognised compilation warning in KyuubiGetSqlClassification:L53
2497069 [liangbowen] use try-with-resources in KyuubiCommands
a54a97f [liangbowen] remove redundant addAll call to collection initialization
cc76d5d [liangbowen] remove repeated assignment

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

The module was planned but has been delayed. Remove this dummy module to save CI time and avoid confusing users and release managers.

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5113 from pan3793/remove-kudu.

Closes apache#5113

ff8fd2e [Cheng Pan] Remove Spark Kudu connector

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

close apache#4940

### _How was this patch tested?_

- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5110 from lsm1/features/kyuubi_4940.

Closes apache#4940

6c0a9a3 [senmiaoliu] add kdf for hive engine

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

https://hadoop.apache.org/release/3.3.6.html

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5116 from pan3793/hadoop-3.3.6.

Closes apache#5116

c3717e7 [Cheng Pan] Bump Hadoop 3.3.6

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Use StatefulSet instead of Deployment, and add a headless service for the StatefulSet.

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

![image](https://github.com/apache/kyuubi/assets/3177898/0991c287-cf1a-40f1-8e50-3934bd2886ca)
![image](https://github.com/apache/kyuubi/assets/3177898/9a5d11a5-2ac9-468e-bfcb-9a070f54c6b4)

Closes apache#5062 from camper42/statefulset.

Closes apache#4788

a1a7f1b [camper42] style: remove redudant Global variable `$`
5286f4f [camper42] fix: set statefulset podManagementPolicy
ed83ae2 [camper42] style: move headless service to separate file
97b76ea [camper42] use `clusterIP: None` for headless serivce
d2078ff [Cheng Pan] Update charts/kyuubi/templates/kyuubi-statefulset.yaml
35c7e0f [Cheng Pan] Update charts/kyuubi/templates/kyuubi-statefulset.yaml
8d970d2 [camper42] style: indent
3cf2274 [camper42] [KYUUBI apache#4788][K8S][HELM] Use StatefulSet instead of Deployment

Lead-authored-by: camper42 <camper.xlii@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

close apache#5122

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5125 from lsm1/features/kyuubi_5122.

Closes apache#5122

02d0769 [senmiaoliu] add hive kdf docs

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_

In Batch implementation v2, the following query is frequently executed to pick the job.

```
SELECT identifier FROM metadata WHERE state='INITIALIZED' ORDER BY create_time DESC LIMIT 1
```

Creating an index on `create_time` could speed up the query and reduce the pressure on the MySQL server.

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate

Test the MySQL upgrade SQL statements:

```
mysql> CREATE TABLE IF NOT EXISTS metadata(
    -> key_id bigint PRIMARY KEY AUTO_INCREMENT COMMENT 'the auto increment key id',
    -> identifier varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
    -> session_type varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
    -> real_user varchar(255) NOT NULL COMMENT 'the real user',
    -> user_name varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
    -> ip_address varchar(128) COMMENT 'the client ip address',
    -> kyuubi_instance varchar(1024) NOT NULL COMMENT 'the kyuubi instance that creates this',
    -> state varchar(128) NOT NULL COMMENT 'the session state',
    -> resource varchar(1024) COMMENT 'the main resource',
    -> class_name varchar(1024) COMMENT 'the main class name',
    -> request_name varchar(1024) COMMENT 'the request name',
    -> request_conf mediumtext COMMENT 'the request config map',
    -> request_args mediumtext COMMENT 'the request arguments',
    -> create_time BIGINT NOT NULL COMMENT 'the metadata create time',
    -> engine_type varchar(32) NOT NULL COMMENT 'the engine type',
    -> cluster_manager varchar(128) COMMENT 'the engine cluster manager',
    -> engine_open_time bigint COMMENT 'the engine open time',
    -> engine_id varchar(128) COMMENT 'the engine application id',
    -> engine_name mediumtext COMMENT 'the engine application name',
    -> engine_url varchar(1024) COMMENT 'the engine tracking url',
    -> engine_state varchar(32) COMMENT 'the engine application state',
    -> engine_error mediumtext COMMENT 'the engine application diagnose',
    -> end_time bigint COMMENT 'the metadata end time',
    -> peer_instance_closed boolean default '0' COMMENT 'closed by peer kyuubi instance',
    -> UNIQUE INDEX unique_identifier_index(identifier),
    -> INDEX user_name_index(user_name),
    -> INDEX engine_type_index(engine_type)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.03 sec)

mysql> ALTER TABLE metadata MODIFY kyuubi_instance varchar(1024) COMMENT 'the kyuubi instance that creates this';
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE metadata ADD INDEX create_time_index(create_time);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table metadata;
+----------+--------------------------------------------------------------------------------+
| Table    | Create Table                                                                   |
+----------+--------------------------------------------------------------------------------+
| metadata | CREATE TABLE `metadata` (
  `key_id` bigint NOT NULL AUTO_INCREMENT COMMENT 'the auto increment key id',
  `identifier` varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
  `session_type` varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
  `real_user` varchar(255) NOT NULL COMMENT 'the real user',
  `user_name` varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
  `ip_address` varchar(128) DEFAULT NULL COMMENT 'the client ip address',
  `kyuubi_instance` varchar(1024) DEFAULT NULL COMMENT 'the kyuubi instance that creates this',
  `state` varchar(128) NOT NULL COMMENT 'the session state',
  `resource` varchar(1024) DEFAULT NULL COMMENT 'the main resource',
  `class_name` varchar(1024) DEFAULT NULL COMMENT 'the main class name',
  `request_name` varchar(1024) DEFAULT NULL COMMENT 'the request name',
  `request_conf` mediumtext COMMENT 'the request config map',
  `request_args` mediumtext COMMENT 'the request arguments',
  `create_time` bigint NOT NULL COMMENT 'the metadata create time',
  `engine_type` varchar(32) NOT NULL COMMENT 'the engine type',
  `cluster_manager` varchar(128) DEFAULT NULL COMMENT 'the engine cluster manager',
  `engine_open_time` bigint DEFAULT NULL COMMENT 'the engine open time',
  `engine_id` varchar(128) DEFAULT NULL COMMENT 'the engine application id',
  `engine_name` mediumtext COMMENT 'the engine application name',
  `engine_url` varchar(1024) DEFAULT NULL COMMENT 'the engine tracking url',
  `engine_state` varchar(32) DEFAULT NULL COMMENT 'the engine application state',
  `engine_error` mediumtext COMMENT 'the engine application diagnose',
  `end_time` bigint DEFAULT NULL COMMENT 'the metadata end time',
  `peer_instance_closed` tinyint(1) DEFAULT '0' COMMENT 'closed by peer kyuubi instance',
  PRIMARY KEY (`key_id`),
  UNIQUE KEY `unique_identifier_index` (`identifier`),
  KEY `user_name_index` (`user_name`),
  KEY `engine_type_index` (`engine_type`),
  KEY `create_time_index` (`create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+--------------------------------------------------------------------------------+
```

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5131 from pan3793/metastore-create-time-index.

Closes apache#5131

fc18041 [Cheng Pan] ALTER TABLE ADD INDEX
c2261ed [Cheng Pan] update upgrade script
4f94be5 [Cheng Pan] Create index on metastore.create_time

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
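The effect of the new index on the job-picking query can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for MySQL (an assumption made so the example runs anywhere), with a cut-down `metadata` table that keeps only the columns the query touches. The function names are hypothetical, and the SQLite planner is not the MySQL optimizer, so treat this as an illustration of the idea rather than a measurement.

```python
import sqlite3

PICK_QUERY = (
    "SELECT identifier FROM metadata "
    "WHERE state='INITIALIZED' ORDER BY create_time DESC LIMIT 1"
)


def open_metadata_db(with_index: bool) -> sqlite3.Connection:
    """Create an in-memory, simplified `metadata` table (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata ("
        " key_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " identifier TEXT NOT NULL,"
        " state TEXT NOT NULL,"
        " create_time INTEGER NOT NULL)"
    )
    if with_index:
        # Same index name as in the upgrade script above.
        conn.execute("CREATE INDEX create_time_index ON metadata(create_time)")
    return conn


def query_plan(conn: sqlite3.Connection) -> str:
    """Return the planner's description of how the pick query would run."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + PICK_QUERY).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(str(row[-1]) for row in rows)


def pick_job(conn: sqlite3.Connection):
    """Run the pick query and return the chosen identifier, or None."""
    row = conn.execute(PICK_QUERY).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    for with_index in (False, True):
        db = open_metadata_db(with_index)
        print(f"index={with_index}: {query_plan(db)}")
```

With the index in place the planner can walk `create_time_index` in descending order instead of sorting the whole table; the query result itself is unchanged.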
### _Why are the changes needed?_

Otherwise we cannot see JDK logs such as those from Krb5.

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes apache#5129 from pan3793/beeline-log.

Closes apache#5129

1000948 [Cheng Pan] KyuubiBeeline should redirect JDK logging

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

After packaging the binary distribution artifacts for 1.8.0-rc0, the lock file changed:

```patch
diff --git a/kyuubi-server/web-ui/pnpm-lock.yaml b/kyuubi-server/web-ui/pnpm-lock.yaml
index 83754291b..f25c02de7 100644
--- a/kyuubi-server/web-ui/pnpm-lock.yaml
+++ b/kyuubi-server/web-ui/pnpm-lock.yaml
@@ -1,4 +1,4 @@
-lockfileVersion: '6.0'
+lockfileVersion: '6.1'
 
 settings:
   autoInstallPeers: true
```

The inconsistency may be caused by a difference between the pnpm version installed in the local environment and the one defined in `pom.xml`. I'm not sure if there is a version management system for pnpm.

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes apache#5569 from pan3793/pnpm-lock.

Closes apache#5569

8a09870 [Cheng Pan] Fix pnpm-lock file version

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…ity.enabled` to add HTTP auth header

### _Why are the changes needed?_

`kyuubi.engine.security.enabled` controls whether the security mechanism for internal communication is enabled, but the current implementation is not symmetrical: the auth generator ignores the conf and always produces the auth header, while the auth header handler is only activated when the conf is enabled. That causes authentication failure when `kyuubi.engine.security.enabled=false` (the default value).

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No.

Closes apache#5566 from pan3793/none-auth.

Closes apache#5566

d42a4c3 [Cheng Pan] Revert "Extract AnonymousAuthenticationHandler from BasicAuthenticationHandler"
b544343 [Cheng Pan] Extract AnonymousAuthenticationHandler from BasicAuthenticationHandler
75c4b7d [Cheng Pan] InternalRestClient respects `kyuubi.engine.security.enabled` to add HTTP auth header

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
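The asymmetry described above can be modelled in a few lines. The sketch below is not Kyuubi code: `build_headers`, `accept_request`, and the `HYPOTHETICAL-INTERNAL` scheme are made-up names, and the rule that a server with security disabled rejects an unexpected internal auth header is an assumption chosen to mirror the reported failure. The point is only that the generator and the handler should consult the same flag.

```python
import base64
from typing import Dict

# Hypothetical scheme name, used only for this illustration.
SCHEME = "HYPOTHETICAL-INTERNAL"


def build_headers(security_enabled: bool, secret: str) -> Dict[str, str]:
    """Client side: attach the internal auth header only when the flag is on."""
    if not security_enabled:
        return {}
    token = base64.b64encode(secret.encode("utf-8")).decode("ascii")
    return {"Authorization": f"{SCHEME} {token}"}


def accept_request(security_enabled: bool, headers: Dict[str, str], secret: str) -> bool:
    """Server side: the internal handler is only active when the flag is on.

    Assumption for this sketch: with the flag off, a request that still
    carries the internal header is rejected, because no handler understands it.
    """
    auth = headers.get("Authorization")
    if not security_enabled:
        return auth is None
    if auth is None or not auth.startswith(SCHEME + " "):
        return False
    token = auth[len(SCHEME) + 1:]
    return base64.b64decode(token).decode("utf-8") == secret


if __name__ == "__main__":
    # Symmetric behaviour: both sides read the same flag.
    print(accept_request(False, build_headers(False, "s3cret"), "s3cret"))
    # Asymmetric behaviour: header always produced, handler disabled.
    print(accept_request(False, build_headers(True, "s3cret"), "s3cret"))
```

Keeping both decisions behind one configuration value means a deployment with the default setting never sends a header that the receiving side is not prepared to interpret.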
…edup

### _Why are the changes needed?_

1. This PR fixes the precision loss issue in `xx_gmt_offset`. Please note that since `xx_gmt_offset` is of integer type, there is no actual loss of precision.

   ```
   trino:tiny> select cc_gmt_offset from call_center ;
    cc_gmt_offset
   ---------------
            -5.00
            -5.00
   ```

   Before this PR:

   ```scala
   scala> spark.sql("select cc_gmt_offset from tpcds.tiny.call_center").show
   +-------------+
   |cc_gmt_offset|
   +-------------+
   |           -5|
   |           -5|
   +-------------+
   ```

   After this PR:

   ```scala
   scala> spark.sql("select cc_gmt_offset from tpcds.tiny.call_center").show
   +-------------+
   |cc_gmt_offset|
   +-------------+
   |        -5.00|
   |        -5.00|
   +-------------+
   ```

2. This PR accelerates the generation of the TPC-DS dataset by optimizing the way Rows are generated. Before this PR, the process converted a **Trino TableRow** into a **String Row** and then into a **Spark InternalRow**. After this PR, the process converts a **Trino TableRow** directly into a **Spark InternalRow**, eliminating unnecessary toString operations. This change significantly improves the speed of TPC-DS dataset generation.

   ```scala
   spark.table("tpcds.sf1000.catalog_sales").foreach(r => ())
   ```

   Task Duration before this PR:

   ![截屏2023-10-30 下午4 04 12](https://github.com/apache/kyuubi/assets/8537877/69bd9938-2886-4044-99b8-79ed20d4791c)

   Task Duration after this PR:

   ![截屏2023-10-30 下午4 02 08](https://github.com/apache/kyuubi/assets/8537877/ddfe01a9-081c-41b5-b82c-a0934dd8686c)

### _How was this patch tested?_

- New UT `tpcds.tiny count and checksum`
- Compare checksum values before and after this PR on the 1TB dataset

| table_name             | count      | checksum            |
|------------------------|------------|---------------------|
| call_center            | 42         | 95607401475         |
| catalog_page           | 30000      | 64470199469085      |
| catalog_returns        | 143996756  | 309202327050775220  |
| catalog_sales          | 1439980416 | 3092267266923848000 |
| customer               | 12000000   | 25769069905636795   |
| customer_address       | 6000000    | 12889423380880973   |
| customer_demographics  | 1920800    | 4124183189708148    |
| date_dim               | 73049      | 156926081012862     |
| household_demographics | 7200       | 15494873325812      |
| income_band            | 20         | 41180951007         |
| inventory              | 783000000  | 1681487454682584456 |
| item                   | 300000     | 643000708260945     |
| promotion              | 1500       | 3270935493709       |
| reason                 | 65         | 118806664977        |
| ship_mode              | 20         | 52349078860         |
| store                  | 1002       | 2096408105720       |
| store_returns          | 287999764  | 618451374856897114  |
| store_sales            | 2879987999 | 6184670571185100839 |
| time_dim               | 86400      | 186045071019485     |
| warehouse              | 20         | 31374161844         |
| web_page               | 3000       | 6502456139647       |
| web_returns            | 71997522   | 154614570845312413  |
| web_sales              | 720000376  | 1546188452223821591 |
| web_site               | 54         | 107485781738        |

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes apache#5562 from cfmcgrady/tpcds-perf.

Closes apache#5550

a789b9e [Fu Chen] maxPartitionBytes=384m
659e209 [Fu Chen] style
916f6d2 [Fu Chen] unnecessary change
75981af [Fu Chen] tpcds perf

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
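The `-5` versus `-5.00` difference in the first item is a difference of scale, not of value. A minimal sketch with Python's `decimal` module makes that concrete; the scale of 2 is an assumption read off the Trino output above, and `render_with_scale` is a hypothetical helper, not part of the connector.

```python
from decimal import Decimal


def render_with_scale(value: str, scale: int = 2) -> str:
    """Render a decimal string with a fixed number of fractional digits."""
    # Build the quantum 10^-scale, e.g. Decimal("0.01") for scale=2.
    quantum = Decimal(1).scaleb(-scale)
    return str(Decimal(value).quantize(quantum))


if __name__ == "__main__":
    print(render_with_scale("-5"))             # value rendered with two fractional digits
    print(Decimal("-5") == Decimal("-5.00"))   # numerically the same value
```

The numeric value is identical either way, which matches the note that no precision is actually lost; only the declared scale of the column is now carried through to the output.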
### _Why are the changes needed?_

Usually, we can use `spark.sql.shuffle.partitions` to configure the number of shuffle partitions (or `spark.sql.adaptive.coalescePartitions.initialPartitionNum` for AQE). However, it is difficult to find a universal value for all SQL jobs. Although Spark AQE can dynamically merge and split partitions based on partition size, an inappropriate number of shuffle partitions may still cause some problems:

+ When there are too few shuffle partitions, the join skew optimization threshold is large and the skewed partitions will not be split.
+ When using RemoteShuffleService, an inappropriate number of shuffle partitions may result in partitions that are too large or too numerous, which puts high pressure on the shuffle server.

So I want to provide an optimization rule that dynamically adjusts the number of partitions based on the size of the input data.

Calculate the number of partitions based on the input data size:

```
targetShufflePartitions = sum(scanSize|shuffleReadSize) / advisoryPartitionSizeInBytes
```

then replace the number of partitions for all `ShuffleExchangeExec` nodes.

### _How was this patch tested?_

- [X] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes apache#5489 from wForget/dynamic_shuffle_partitions.

Closes apache#5489

5a2bb6c [wforget] only takes effect when aqe is enabled
038b7bb [wforget] moved behind InsertShuffleNodeBeforeJoin
7ca87d8 [wforget] comment
d65047f [wforget] sum scanSizes
e4d8f33 [wforget] comments
4f0f25d [wforget] configurable
f77d1d6 [wforget] code style
0bf572f [wforget] use partition stats
8d251c3 [wforget] Adjust shuffle partitions dynamically

Authored-by: wforget <643348094@qq.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
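The partition formula in this commit message can be worked through with a short sketch. The function below is a standalone illustration, not the Spark rule itself: the name `target_shuffle_partitions` is hypothetical, and rounding up and enforcing a lower bound of one partition are assumptions, since the message only gives the division.

```python
from typing import Iterable

MIB = 1024 * 1024


def target_shuffle_partitions(
    input_sizes_bytes: Iterable[int],
    advisory_partition_size_bytes: int,
    min_partitions: int = 1,
) -> int:
    """sum(scanSize | shuffleReadSize) / advisoryPartitionSizeInBytes.

    Assumptions of this sketch: the quotient is rounded up, and the result
    never drops below `min_partitions`.
    """
    if advisory_partition_size_bytes <= 0:
        raise ValueError("advisory partition size must be positive")
    total = sum(input_sizes_bytes)
    # Integer ceiling division avoids floating-point rounding on large sizes.
    partitions = -(-total // advisory_partition_size_bytes)
    return max(min_partitions, partitions)


if __name__ == "__main__":
    # Two inputs of 100 MiB and 156 MiB with a 64 MiB advisory size.
    print(target_shuffle_partitions([100 * MIB, 156 * MIB], 64 * MIB))
```

For example, 256 MiB of input with a 64 MiB advisory size yields 4 partitions, while a job with almost no input collapses to the minimum instead of inheriting a large static `spark.sql.shuffle.partitions` value.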
…at have been verified

### _Why are the changes needed?_

To close apache#5503

For SQL such as the lateral join in the test `[KYUUBI apache#5503][AUTHZ] Check plan auth checked should not set tag to all child nodes`, Authz first verifies the subquery in `lateral` and then verifies the whole plan. If there is a view, the `PermanentViewMarker` will have been removed by Spark's optimizer by the time the whole plan is verified, so both source tables, `table1` and `table2`, are verified.

So I think we need to do 3 things:

1. Mark all nodes under a `PermanentViewMarker` as checked, and mark all child nodes of a subquery as checked.
2. `isAuthChecked` should only check the first level of the plan, to avoid skipping the check of the whole plan in the demo test.
3. In `buildQuery`, if the current node has the tag, we just skip it.

Without this PR, the SQL in the test checks both `table1` and `table2`.

### _How was this patch tested?_

- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes apache#5563 from AngersZhuuuu/KYUUBI-5503-FOLLOWUP.

Closes apache#5503

c1a427f [Angerszhuuuu] Update Authorization.scala
d6b2899 [Angerszhuuuu] update
633bc91 [Angerszhuuuu] Update Authorization.scala
7a006b1 [Angerszhuuuu] [KYUUBI apache#5503][FOLLOWUP][AUTHZ] Authz should skip inner plan that have been verified

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

To close apache#5575

Fix wrong code in test case of dir command

### _How was this patch tested?_

- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes apache#5577 from AngersZhuuuu/KYUUBI-5576.

Closes apache#5576

60e2cb8 [Angerszhuuuu] [KYUUBI apache#5576][Bug] Fix wrong code in test case of dir command

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…seful

### _Why are the changes needed?_

As titled; this makes the Web UI cleaner. The Contact Us page and the Overview page will be refactored later, so they remain for now.

### _How was this patch tested?_

- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

![截屏2023-10-31 15 30 37](https://github.com/apache/kyuubi/assets/52876270/443feaf5-2d9a-4683-9214-6b7f5b5769cd)

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes apache#5574 from zwangsheng/KYUUBI#5573.

Closes apache#5573

462f9f6 [zwangsheng] fix comments
d321010 [zwangsheng] [KYUUBI apache#5573][Improvement] Delete parts of the Kyuubi Web UI that are not useful

Authored-by: zwangsheng <binjieyang@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>