Upstream updates #6

Open. Wants to merge 761 commits into base: master


nousot-cloud-guy

Why are the changes needed?

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run tests locally before making a pull request

pan3793 and others added 30 commits July 20, 2023 17:02
…e and related issues

### _Why are the changes needed?_

Currently, the `KyuubiOperationWithEngineSecuritySuite` is not reliable, because:

1. `InternalSecurityAccessor` is a singleton and only the first initialized instance takes effect, so if the test order changes, some tests may fail.
2. `discoveryClient.startSecretNode` calls `PersistentNode#start` under the hood, which is asynchronous, so we should call `waitForInitialCreate` to ensure the node is created before running the test. Based on my analysis, the wait may take about 30s (compare `mtime` and `ctime` below, which are 20s apart in this capture).
   ```
   [zk: 10.221.106.196:55408(CONNECTED) 2] get /SECRET
   _ENGINE_SECRET_
   cZxid = 0x5
   ctime = Wed Jul 19 23:01:57 CST 2023
   mZxid = 0x7
   mtime = Wed Jul 19 23:02:17 CST 2023
   pZxid = 0x5
   cversion = 0
   dataVersion = 1
   aclVersion = 0
   ephemeralOwner = 0x0
   dataLength = 15
   numChildren = 0
   ```
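
The second point is about not racing an asynchronous create. As a rough, self-contained analogue (not Curator itself), the sketch below models a node whose creation completes on a background thread, plus a `waitForInitialCreate(timeout, unit)` method in the spirit of Curator's `PersistentNode#waitForInitialCreate`. The `AsyncSecretNode` class and its configurable delay are hypothetical, and unlike Curator this version swallows `InterruptedException` to keep the example short.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an asynchronously created ZooKeeper node.
// It only mimics the start()/waitForInitialCreate() contract; no ZooKeeper is involved.
final class AsyncSecretNode {
    private final CountDownLatch created = new CountDownLatch(1);
    private final long createDelayMillis;
    private volatile String data;

    AsyncSecretNode(long createDelayMillis) {
        this.createDelayMillis = createDelayMillis;
    }

    // Returns immediately; the node is "created" later on a background thread.
    void start(String payload) {
        Thread creator = new Thread(() -> {
            try {
                Thread.sleep(createDelayMillis);
                data = payload;
                created.countDown(); // signal that the initial create finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        creator.setDaemon(true);
        creator.start();
    }

    // Blocks until the initial create completes or the timeout elapses.
    // Returns true if the node exists, false on timeout or interruption.
    boolean waitForInitialCreate(long timeout, TimeUnit unit) {
        try {
            return created.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    String getData() {
        return data; // null if read before the create completes
    }
}
```

A test that reads the secret should block on `waitForInitialCreate` right after `start`, instead of assuming the node already exists.
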
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5072 from pan3793/security.

Closes apache#5072

69cce29 [Cheng Pan] fix
2d62355 [Cheng Pan] fix
74eb2cb [Cheng Pan] fix
6d8f4ce [Cheng Pan] KyuubiOperationWithEngineSecurity

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

To fix the following failure:

```
SparkDeltaOperationSuite:
org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite *** ABORTED ***
  java.lang.RuntimeException: Unable to load a Suite class org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite that was discovered in the runpath: Not Support spark version (4,0)
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:80)
  at org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.Iterator.foreach(Iterator.scala:943)
  at scala.collection.Iterator.foreach$(Iterator.scala:943)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  ...
  Cause: java.lang.IllegalArgumentException: Not Support spark version (4,0)
  at org.apache.kyuubi.engine.spark.WithSparkSQLEngine.$init$(WithSparkSQLEngine.scala:42)
  at org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite.<init>(SparkDeltaOperationSuite.scala:25)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at java.lang.Class.newInstance(Class.java:442)
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:66)
  at org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  ...
```

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5075 from cfmcgrady/spark-4.0.

Closes apache#5075

ad38c0d [Fu Chen] refine test to adapt Spark 4.0

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…ess to allow it to release temp files

### _Why are the changes needed?_

fix bug apache#5065
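
The fix calls `destroy` first when killing the Spark startup process so that it gets a chance to clean up its temp files. A minimal sketch of that graceful-then-forced pattern using only the JDK `Process` API follows; the `GracefulKiller` name and the grace period are illustrative assumptions, not Kyuubi's `ProcBuilder` code.

```java
import java.util.concurrent.TimeUnit;

final class GracefulKiller {
    private GracefulKiller() {}

    /**
     * Asks the process to exit, waits up to the grace period, then force-kills it.
     * Returns true if the process exited within the grace period without a forced kill.
     */
    static boolean stop(Process process, long gracePeriod, TimeUnit unit) {
        if (!process.isAlive()) {
            return true;
        }
        // destroy() requests normal termination (SIGTERM on Unix-like systems in OpenJDK),
        // giving the child a chance to run shutdown hooks and delete temp files.
        process.destroy();
        try {
            if (process.waitFor(gracePeriod, unit)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Fall back to a forced kill (SIGKILL on Unix-like systems in OpenJDK).
        process.destroyForcibly();
        return false;
    }
}
```

The grace period bounds how long a stuck launcher can delay the caller; only when it expires does the forced kill skip the child's cleanup.
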

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5066 from ASiegeLion/master.

Closes apache#5065

08d1ac0 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala
bf908f5 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala
9144582 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala
f1c95e4 [liupeiyue] [KYUUBI-apache#5065] Call destroy first on killing Spark startup process to allows it release temp files
907123a [liupeiyue] [KYUUBI-apache#5065] Call destroy first on killing Spark startup process to allows it release temp files
f30a9fc [liupeiyue] [KYUUBI-apache#5065] Call destroy first on killing Spark startup process to allows it release temp files
449be44 [文艺攻城狮] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala
987ffc7 [文艺攻城狮] Update kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala
995386f [文艺攻城狮] Update kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala
ad3d111 [liupeiyue] [KYUUBI-apache#5065]destroy the spark engine release the submitted temp files

Lead-authored-by: liupeiyue <liupeiyue@yy.com>
Co-authored-by: 文艺攻城狮 <945076608@qq.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Close apache#5009

When the Kyuubi server log is huge, it is difficult to find the Spark engine log path in it.

This PR passes the path to the Spark conf, so users can find the engine log path in the Spark UI or the Spark History Server.

The submit command looks like:
```shell
XXXX/bin/spark-submit \
  --class org.apache.kyuubi.engine.spark.SparkSQLEngine \
  --conf spark.kyuubi.engine.engineLog.path=XXXX/kyuubi-spark-sql-engine.log.0 \
  --proxy-user kyuubi XXXX/target/kyuubi-spark-sql-engine_2.12-1.8.0-SNAPSHOT.jar
```
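
On the launcher side this amounts to one more `--conf` pair in the submit command. A hypothetical sketch of that step follows; the `SubmitCommand` helper is illustrative and is not Kyuubi's actual Spark process builder, only the conf key is taken from the command above.

```java
import java.util.ArrayList;
import java.util.List;

final class SubmitCommand {
    // Conf key taken from the submit command shown above.
    static final String ENGINE_LOG_PATH_KEY = "spark.kyuubi.engine.engineLog.path";

    private SubmitCommand() {}

    // Builds a spark-submit argument list that carries the engine log path as a Spark conf,
    // so it shows up under Spark Properties in the Environment tab of the Spark UI.
    static List<String> build(String sparkHome, String engineLogPath, String proxyUser, String engineJar) {
        List<String> cmd = new ArrayList<>();
        cmd.add(sparkHome + "/bin/spark-submit");
        cmd.add("--class");
        cmd.add("org.apache.kyuubi.engine.spark.SparkSQLEngine");
        cmd.add("--conf");
        cmd.add(ENGINE_LOG_PATH_KEY + "=" + engineLogPath);
        cmd.add("--proxy-user");
        cmd.add(proxyUser);
        cmd.add(engineJar);
        return cmd;
    }
}
```
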

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5011 from zwangsheng/KYUUBI_5009.

Closes apache#5009

36c7722 [zwangsheng] fix compile
1c20f92 [zwangsheng] retest
70568c7 [zwangsheng] Fix Unit Test
2bc4657 [zwangsheng] try to fix unit test
2197b35 [zwangsheng] Narrow the scope of access
a44eefc [zwangsheng] [KYUUBI apache#5009]Pass Spark Engine Log Path to Spark COnf

Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…e schema

### _Why are the changes needed?_

This is required by Batch V2, as it allows a batch job to be queued in the metastore before it is picked up by a Kyuubi server for scheduling.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

```
mysql> CREATE TABLE IF NOT EXISTS metadata(
    ->     key_id bigint PRIMARY KEY AUTO_INCREMENT COMMENT 'the auto increment key id',
    ->     identifier varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
    ->     session_type varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
    ->     real_user varchar(255) NOT NULL COMMENT 'the real user',
    ->     user_name varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
    ->     ip_address varchar(128) COMMENT 'the client ip address',
    ->     kyuubi_instance varchar(1024) NOT NULL COMMENT 'the kyuubi instance that creates this',
    ->     state varchar(128) NOT NULL COMMENT 'the session state',
    ->     resource varchar(1024) COMMENT 'the main resource',
    ->     class_name varchar(1024) COMMENT 'the main class name',
    ->     request_name varchar(1024) COMMENT 'the request name',
    ->     request_conf mediumtext COMMENT 'the request config map',
    ->     request_args mediumtext COMMENT 'the request arguments',
    ->     create_time BIGINT NOT NULL COMMENT 'the metadata create time',
    ->     engine_type varchar(32) NOT NULL COMMENT 'the engine type',
    ->     cluster_manager varchar(128) COMMENT 'the engine cluster manager',
    ->     engine_open_time bigint COMMENT 'the engine open time',
    ->     engine_id varchar(128) COMMENT 'the engine application id',
    ->     engine_name mediumtext COMMENT 'the engine application name',
    ->     engine_url varchar(1024) COMMENT 'the engine tracking url',
    ->     engine_state varchar(32) COMMENT 'the engine application state',
    ->     engine_error mediumtext COMMENT 'the engine application diagnose',
    ->     end_time bigint COMMENT 'the metadata end time',
    ->     peer_instance_closed boolean default '0' COMMENT 'closed by peer kyuubi instance',
    ->     UNIQUE INDEX unique_identifier_index(identifier),
    ->     INDEX user_name_index(user_name),
    ->     INDEX engine_type_index(engine_type)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.04 sec)

mysql> ALTER TABLE metadata MODIFY kyuubi_instance varchar(1024) COMMENT 'the kyuubi instance that creates this';
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE metadata;
+----------+---------------------------------------------------------------------------+
| Table    | Create Table |
+----------+---------------------------------------------------------------------------+
| metadata | CREATE TABLE `metadata` (
  `key_id` bigint NOT NULL AUTO_INCREMENT COMMENT 'the auto increment key id',
  `identifier` varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
  `session_type` varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
  `real_user` varchar(255) NOT NULL COMMENT 'the real user',
  `user_name` varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
  `ip_address` varchar(128) DEFAULT NULL COMMENT 'the client ip address',
  `kyuubi_instance` varchar(1024) DEFAULT NULL COMMENT 'the kyuubi instance that creates this',
  `state` varchar(128) NOT NULL COMMENT 'the session state',
  `resource` varchar(1024) DEFAULT NULL COMMENT 'the main resource',
  `class_name` varchar(1024) DEFAULT NULL COMMENT 'the main class name',
  `request_name` varchar(1024) DEFAULT NULL COMMENT 'the request name',
  `request_conf` mediumtext COMMENT 'the request config map',
  `request_args` mediumtext COMMENT 'the request arguments',
  `create_time` bigint NOT NULL COMMENT 'the metadata create time',
  `engine_type` varchar(32) NOT NULL COMMENT 'the engine type',
  `cluster_manager` varchar(128) DEFAULT NULL COMMENT 'the engine cluster manager',
  `engine_open_time` bigint DEFAULT NULL COMMENT 'the engine open time',
  `engine_id` varchar(128) DEFAULT NULL COMMENT 'the engine application id',
  `engine_name` mediumtext COMMENT 'the engine application name',
  `engine_url` varchar(1024) DEFAULT NULL COMMENT 'the engine tracking url',
  `engine_state` varchar(32) DEFAULT NULL COMMENT 'the engine application state',
  `engine_error` mediumtext COMMENT 'the engine application diagnose',
  `end_time` bigint DEFAULT NULL COMMENT 'the metadata end time',
  `peer_instance_closed` tinyint(1) DEFAULT '0' COMMENT 'closed by peer kyuubi instance',
  PRIMARY KEY (`key_id`),
  UNIQUE KEY `unique_identifier_index` (`identifier`),
  KEY `user_name_index` (`user_name`),
  KEY `engine_type_index` (`engine_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+---------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql>
```

The Derby SQL is also tested:

<img width="1330" alt="image" src="https://github.com/apache/kyuubi/assets/26535726/4eef0742-05dd-4bd6-a77e-e9de0238375e">

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5078 from pan3793/nullable.

Closes apache#5078

0c5dec8 [Cheng Pan] Make kyuubi_instance nullable in metadata table schema

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

This is a pure code refactor extracted from apache#4790 to reduce the diff.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5081 from pan3793/dialect.

Closes apache#5081

537d623 [Cheng Pan] Minor refactor JDBCMetadataStore

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…fe during bootstrap

### _Why are the changes needed?_
As titled.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5082 from link3280/KYUUBI-5080.

Closes apache#5080

e8026b8 [Paul Lin] [KYUUBI apache#4806][FLINK] Improve logs
fd78f32 [Paul Lin] [KYUUBI apache#4806][FLINK] Fix gateway NPE
a0a7c44 [Cheng Pan] Update externals/kyuubi-flink-sql-engine/src/main/java/org/apache/flink/client/deployment/application/executors/EmbeddedExecutorFactory.java
50830d4 [Paul Lin] [KYUUBI apache#5080][FLINK] Fix EmbeddedExecutorFactory not thread-safe during bootstrap

Lead-authored-by: Paul Lin <paullin3280@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Updated the [Kyuubi on Kubernetes config section](https://kyuubi.readthedocs.io/en/master/deployment/kyuubi_on_kubernetes.html#config) to state: "Kyuubi **does** not recommend using this way on Kubernetes".

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5086 from mans2singh/ISSUE-5085.

Closes apache#5086

5faf0df [mans2singh] [KYUUBI # 5085] Update config section based on review comments
df9f62f [mans2singh] [KYUUBI # 5085] Update config section of deploy on kubernetes

Authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
… blank lines in Windows

### _Why are the changes needed?_

close apache#5090

### _How was this patch tested?_

After this PR, it generates a normal settings file on Windows.
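
The PR does not spell out the root cause. One common way to get extra blank lines on Windows is mixing `\r\n` input with a different line separator on output. Assuming that is what happens here, a minimal normalization sketch looks like this; `LineEndings.normalize` is a hypothetical helper, not the code in `AllKyuubiConfiguration`.

```java
import java.util.stream.Collectors;

final class LineEndings {
    private LineEndings() {}

    // Splits on any line terminator (\n, \r\n or \r) and re-joins with "\n",
    // so the output does not depend on the platform that produced the input.
    static String normalize(String text) {
        String joined = text.lines().collect(Collectors.joining("\n"));
        // String.lines() drops the final terminator; keep one if the input had it.
        boolean endsWithNewline = text.endsWith("\n") || text.endsWith("\r");
        return endsWithNewline ? joined + "\n" : joined;
    }
}
```

Intentional blank lines survive, since an empty line between two terminators is still emitted as an empty element by `String.lines()`.
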

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5091 from wForget/KYUUBI-5090.

Closes apache#5090

9e974c7 [wforget] fix
dc1ebfc [wforget] fix
2cbec60 [wforget] [KYUUBI-5090] Fix AllKyuubiConfiguration to generate redundant blank lines in Windows
ecc3b4a [mans2singh] [KYUUBI apache#5086] [KYUUBI # 5085] Update config section of deploy on kubernetes

Lead-authored-by: wforget <643348094@qq.com>
Co-authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…edundant version comparison methods

### _Why are the changes needed?_

- Support initializing or comparing versions with the major version only, e.g. "3" is equivalent to "3.0"
- Remove redundant version comparison methods by using semantic versions of Spark, Flink and Kyuubi
- Add a common `toDouble` method
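
The first and last bullets can be illustrated with a small Java sketch. Kyuubi's own `SemanticVersion` is written in Scala and its exact API may differ; the `SemVer` class and its `isAtLeast` and `toDouble` methods here are assumptions. A string with only a major part such as `"3"` parses as `3.0`, comparison works on (major, minor), and `toDouble` folds both into one number.

```java
final class SemVer implements Comparable<SemVer> {
    final int major;
    final int minor;

    private SemVer(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Accepts "3", "3.4" or "3.4.1-SNAPSHOT"; a missing minor part defaults to 0.
    static SemVer parse(String version) {
        String[] parts = version.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(leadingDigits(parts[1])) : 0;
        return new SemVer(major, minor);
    }

    // Keeps the numeric prefix, e.g. "4-SNAPSHOT" -> "4".
    private static String leadingDigits(String s) {
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("Invalid minor version: " + s);
        }
        return s.substring(0, end);
    }

    @Override
    public int compareTo(SemVer other) {
        int byMajor = Integer.compare(major, other.major);
        return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
    }

    boolean isAtLeast(String other) {
        return compareTo(parse(other)) >= 0;
    }

    // e.g. 3.4 -> 3.4; only unambiguous while the minor version stays below 10.
    double toDouble() {
        return Double.parseDouble(major + "." + minor);
    }
}
```
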

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5039 from bowenliang123/improve-semanticversion.

Closes apache#5039

b686826 [liangbowen] nit
d39646b [liangbowen] SPARK_ENGINE_RUNTIME_VERSION
9148caa [liangbowen] use semantic versions
ecc3b4a [mans2singh] [KYUUBI apache#5086] [KYUUBI # 5085] Update config section of deploy on kubernetes

Lead-authored-by: liangbowen <liangbowen@gf.com.cn>
Co-authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5094 from dev-lpq/add_python_doc.

Closes apache#5094

c7d50d7 [pengqli] upgrade Python-JayDeBeApi doc
41f96fc [pengqli] upgrade Python-JayDeBeApi doc
dd0f91b [pengqli] upgrade Python-JayDeBeApi doc
ae1b7bc [pengqli] upgrade Python-JayDeBeApi doc
189d7c8 [pengqli] upgrade Python-JayDeBeApi doc
2e1e7b4 [pengqli] upgrade Python-JayDeBeApi doc
362a432 [pengqli] add Python-JayDeBeApi doc

Authored-by: pengqli <pengqli@cisco.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

- Remove the stray single quote in the message format, which caused argument 0 to be ignored
- `A single quote itself must be represented by doubled single quotes '' throughout a String.` https://docs.oracle.com/javase/8/docs/api/java/text/MessageFormat.html
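
A quick demonstration of the `MessageFormat` quoting rule cited above, using only the JDK. The message text is made up for illustration and is not the actual Kyuubi message.

```java
import java.text.MessageFormat;

final class QuoteDemo {
    private QuoteDemo() {}

    // A lone apostrophe starts a quoted section, so the "{0}" after it is emitted
    // literally and the argument is silently dropped.
    static String broken(String typeName) {
        return MessageFormat.format("Can't recognize data type {0}", typeName);
    }

    // Doubling the apostrophe yields a literal ' and keeps {0} as a placeholder.
    static String fixed(String typeName) {
        return MessageFormat.format("Can''t recognize data type {0}", typeName);
    }
}
```

With the argument `"INT"`, `fixed` returns `Can't recognize data type INT`, while `broken` loses both the apostrophe and the argument.
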

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5100 from bowenliang123/datatype-msg.

Closes apache#5100

8135ff1 [liangbowen] fix

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_

- remove 2 unused string builders in `KyuubiQueryResultSet` and `KyuubiArrowQueryResultSet`, which only have separators appended and are never read again

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5101 from bowenliang123/unused-sb.

Closes apache#5101

ccb6fb7 [liangbowen] remove never queried StringBuilders

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_

close apache#5099

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5103 from lsm1/features/kyuubi_5099.

Closes apache#5099

84a1eca [senmiaoliu] fix doc

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
… engine timeout

### _Why are the changes needed?_
apache#5065

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5097 from ASiegeLion/master.

Closes apache#5065

d50a388 [Cheng Pan] followup
80861dd [liupeiyue] [KYUUBI apache#5065][FOLLOWUP] Graceful close the process when launch engine timeout

Lead-authored-by: liupeiyue <liupeiyue@yy.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…bi server

### _Why are the changes needed?_

As reported in apache#4825, a large number of engine builder processes may cause high machine load on the Kyuubi server, so I want to add a config to limit engine creation concurrency.
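
One way to cap concurrent engine launches is a counting semaphore shared by all launch requests. The sketch below is a hypothetical `EngineLaunchLimiter`; the actual config key and its wiring in Kyuubi are not shown here, and the fair semaphore with a bounded wait is an assumed design choice.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class EngineLaunchLimiter {
    private final Semaphore permits;

    // maxConcurrent: at most this many engine builder processes run at once.
    EngineLaunchLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent, true); // fair: FIFO among waiters
    }

    // Runs the launch action once a permit is free; fails if none frees up in time.
    <T> T launch(Supplier<T> action, long timeout, TimeUnit unit) throws TimeoutException {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new TimeoutException("Interrupted while waiting for an engine launch permit");
        }
        if (!acquired) {
            throw new TimeoutException("Too many concurrent engine launches");
        }
        try {
            return action.get();
        } finally {
            permits.release(); // always return the permit, even if the launch fails
        }
    }
}
```

Failing with a timeout rather than blocking forever keeps client connections from hanging indefinitely when the server is saturated.
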

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5089 from wForget/engine_builder_limit.

Closes apache#5089

7750700 [wforget] comment
774a859 [wforget] comments
373640f [wforget] Limit maximum engine creation concurrency of kyuubi server
ecc3b4a [mans2singh] [KYUUBI apache#5086] [KYUUBI # 5085] Update config section of deploy on kubernetes

Lead-authored-by: wforget <643348094@qq.com>
Co-authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

close apache#5076

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5102 from lsm1/features/kyuubi_5076.

Closes apache#5076

ce7cfe6 [senmiaoliu] kdf support engine url

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
As titled.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5107 from link3280/engine_fatal_log.

Closes apache#5106

db45392 [Paul Lin] [KYUUBI apache#5106][Flink] Improve logs for fatal errors

Authored-by: Paul Lin <paullin3280@gmail.com>
Signed-off-by: Paul Lin <paullin3280@gmail.com>
### _Why are the changes needed?_

#### How is it done today?

The current procedure of the Batch Job API, called V1:

##### CREATE batch job procedure in Batch V1

```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Create Batch Job
Server ->> Server : Create Batch Operator
Server ->> Metastore : Persist Job metadata (PENDING)
Server ->> Server : Put Batch Operator into Execution thread pool
Server ->> Client : Batch Job Info
Server ->> RM : Submit Application (in Execution thread pool)
loop Application Check
    Server ->> RM : Query Application Status
    Server ->> Metastore : Update Batch Status
end
```

##### GET batch job info procedure in Batch V1

```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Query Batch Job Info
alt KyuubiInstance matched
    Server ->> Client : Batch Job Info
else
    Server ->> Server : Forward Request to expected KyuubiInstance
end
```

<!--
```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Fetch Batch Job logs
alt KyuubiInstance matched
    Server ->> Client : Batch Job logs
else
    Server ->> Server : Forward Request to expected KyuubiInstance
end

Client ->> Server : Close Batch Job
alt KyuubiInstance matched
    Server ->> RM : Close the Application
    Server ->> Metastore : Update Batch Status
    Server ->> Client : Closed Batch Job Info
else
    Server ->> Server : Forward Request to expected KyuubiInstance
end
```
-->

#### What is new in your approach?

This PR proposes a new way for batch job submission, called V2

##### CREATE batch job procedure in Batch V2

```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Create Batch Job
Server ->> Metastore : Persist Job metadata (INITIALIZED)
Server ->> Client : Batch Job Info

loop Forever in dedicated thread pool
    Server ->> Metastore : Pick up and lock INITIALIZED job
    Server ->> RM : Submit Application
    Server ->> RM : Query Application Status
    Server ->> Metastore : Update Batch Status
end
```
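
The "Pick up and lock INITIALIZED job" step is essentially a compare-and-set on the job record, so that two servers polling the same metastore never claim the same batch. The in-memory sketch below only illustrates that idea; with a JDBC metastore, a conditional `UPDATE ... WHERE state = 'INITIALIZED' AND kyuubi_instance IS NULL` that succeeds for exactly one server could play the same role. `InMemoryMetastore`, `tryClaim`, `pickUp`, the `PENDING` state after claiming, and the free-slot capacity check are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class InMemoryMetastore {
    // An immutable batch record: state plus the owning server (null while queued).
    static final class Batch {
        final String id;
        final String state;
        final String kyuubiInstance;

        Batch(String id, String state, String kyuubiInstance) {
            this.id = id;
            this.state = state;
            this.kyuubiInstance = kyuubiInstance;
        }
    }

    private final Map<String, Batch> batches = new ConcurrentHashMap<>();

    // CREATE in V2: persist the record as INITIALIZED and return right away.
    void insertInitialized(String id) {
        batches.put(id, new Batch(id, "INITIALIZED", null));
    }

    // Atomically claims one queued batch for this server; returns false if another server won.
    boolean tryClaim(String id, String instance) {
        Batch current = batches.get(id);
        if (current == null || !"INITIALIZED".equals(current.state) || current.kyuubiInstance != null) {
            return false;
        }
        Batch claimed = new Batch(id, "PENDING", instance);
        // Batch does not override equals, so replace() succeeds only if the stored
        // record is still the exact instance we read (a compare-and-set).
        return batches.replace(id, current, claimed);
    }

    // One polling round: claim queued batches while this server still has free slots.
    List<String> pickUp(String instance, int freeSlots) {
        List<String> picked = new ArrayList<>();
        for (String id : batches.keySet()) {
            if (picked.size() >= freeSlots) {
                break;
            }
            if (tryClaim(id, instance)) {
                picked.add(id);
            }
        }
        return picked;
    }

    Batch get(String id) {
        return batches.get(id);
    }
}
```

Passing `freeSlots` from the server's own load measurement is what lets each server decide whether to pick up more work, instead of the load balancer deciding for it.
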

##### GET batch job info procedure in Batch V2

```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Query Batch Job Info
Server ->> Metastore : Query Batch Job Info
Server ->> Client : Batch Job Info
```

<!--
```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Fetch Batch Job logs
alt KyuubiInstance matched
    Server ->> Client : Batch Job logs
else
    Server ->> Server : Forward Request to expected KyuubiInstance
end

Client ->> Server : Close Batch Job
alt KyuubiInstance matched
    Server ->> RM : Close the Application
    Server ->> Metastore : Update Batch Status
    Server ->> Client : Closed Batch Job Info
else
    Server ->> Server : Forward Request to expected KyuubiInstance
end
```
-->

#### What are the limits of current practice, and why do you think it will be successful?

Pros:

1. The CREATE request becomes lighter and returns faster. In V1, we have struggled with whether the response should wait for the engine to be submitted to RM, and how to report the status of an un-submitted job to the client; in V2, the CREATE request simply inserts a new record into the metastore and returns with the INITIALIZED state.
2. In common practice, the Kyuubi server cluster is deployed behind a load balancer that does not know the real load of each Kyuubi server. Suppose it uses Random/RoundRobin/IPHash policies to forward requests: the existing Batch V1 implementation may leave some Kyuubi servers under high load while others stay lightly loaded, because it always uses the Kyuubi server that received the request to do the batch submission. In V2, each Kyuubi server can easily know its own load, e.g. measured by CPU/memory usage or active batch sessions, and then decide whether to pick up new batch jobs. Besides, when all Kyuubi servers are overloaded, V1 does not benefit immediately even if the admin scales up the cluster.
3. In V1, the metrics are almost independent in each Kyuubi server; in V2, it's easy to expose global metrics of batch jobs when using shared storage as the metastore backend, e.g. we can easily get how many batches are queued in the metastore and how many batches are managed by each Kyuubi server, either by querying the metastore backend directly or via the metrics exposed by each Kyuubi server.

Cons:

1. V1 lets the Kyuubi server tolerate a long metastore outage, while V2 strictly depends on the availability of the metastore. But we can move the existing forwarding logic and async retry logic into the implementation of `Metastore` to overcome this regression.
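
As a rough sketch of that mitigation, a metastore call can be wrapped in a bounded retry with exponential backoff, so that a short outage does not fail a request immediately. `RetryingCall` and its parameters are hypothetical and do not describe Kyuubi's existing retry logic.

```java
import java.util.concurrent.Callable;

final class RetryingCall {
    private RetryingCall() {}

    // Calls the action up to maxAttempts times, sleeping initialBackoffMillis, then 2x, 4x, ...
    // between attempts. Rethrows the last failure if every attempt fails.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialBackoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff); // wait before the next attempt
                    backoff *= 2;
                }
            }
        }
        throw last;
    }
}
```

Bounding the attempts keeps a long outage visible to callers instead of hiding it behind unbounded retries.
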

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes apache#4790 from pan3793/batch-v2.

Closes apache#4790

860698a [Cheng Pan] BATCH_IMPL_VERSION
b9c68aa [Cheng Pan] kyuubi.batch.impl.version
17e4f19 [Cheng Pan] submitter.threads=100
7c0bdb0 [Cheng Pan] Initial implement Batch v2

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

I'd like to update the LDAP doc to guide users in setting up LDAP authentication in Kyuubi.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

<img width="1395" alt="image" src="https://github.com/apache/kyuubi/assets/26535726/6925a8e3-dfaf-48ad-a442-bb635fe75830">

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5083 from zhaohehuhu/Improvement-0721.

Closes apache#5083

8c0e149 [Cheng Pan] polish
22f8d3a [Cheng Pan] nit
822fa66 [hezhao2] sync
78ae123 [hezhao2] further explanation for LDAP filters
7ebc61a [Cheng Pan] Update docs/security/ldap.md
bb06810 [Cheng Pan] Update docs/security/ldap.md
8d19fdf [Cheng Pan] Update docs/security/ldap.md
c2fa280 [Cheng Pan] Update docs/security/ldap.md
2acbb87 [hezhao2] update LDAP doc
22027e1 [hezhao2] update LDAP doc

Lead-authored-by: hezhao2 <hezhao2@cisco.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…gine bootstrap

### _Why are the changes needed?_
As titled.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5109 from link3280/bootstrap_file_not_found.

Closes apache#5108

318199f [Paul Lin] [KYUUBI apache#5108][Flink] Fix FileNotFoundException during Flink engine bootstrap

Authored-by: Paul Lin <paullin3280@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Fix apache#3920

https://github.com/apache/kyuubi/actions/runs/5711863703/job/15474230690?pr=4790

```
DockerizedZkServiceDiscoverySuite:
- distribute lock *** FAILED ***
  Expected exception org.apache.kyuubi.KyuubiSQLException to be thrown, but no exception was thrown (DiscoveryClientTests.scala:147)
```

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5112 from pan3793/test-lock.

Closes apache#3920

d980f87 [Cheng Pan] Fix flaky test - distribute lock

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

- remove duplicated assignment for the same variable in adjacent lines in `FastHiveDecimalImpl`
- replace redundant `putAll` with collection initialization in `BatchRestApi`
- use `try-with-resources` statement with the reader and avoid declaring two variables in the same line of code in `KyuubiCommands`
- fix `warning: Tag '@return:' is not recognised` compilation warning in `KyuubiGetSqlClassification:L53`

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5117 from bowenliang123/fastsignum.

Closes apache#5117

595b574 [liangbowen] simplify
be530fa [liangbowen] fix warning: Tag '@return:' is not recognised compilation warning in KyuubiGetSqlClassification:L53
2497069 [liangbowen] use try-with-resources in KyuubiCommands
a54a97f [liangbowen] remove redundant addAll call to collection initialization
cc76d5d [liangbowen] remove repeated assignment

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

It was planned but has been delayed. Remove this dummy module to save CI resources and to avoid confusing users and release managers.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5113 from pan3793/remove-kudu.

Closes apache#5113

ff8fd2e [Cheng Pan] Remove Spark Kudu connector

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

close apache#4940

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5110 from lsm1/features/kyuubi_4940.

Closes apache#4940

6c0a9a3 [senmiaoliu] add kdf for hive engine

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

https://hadoop.apache.org/release/3.3.6.html

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5116 from pan3793/hadoop-3.3.6.

Closes apache#5116

c3717e7 [Cheng Pan] Bump Hadoop 3.3.6

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Use a StatefulSet instead of a Deployment, and add a headless service for the StatefulSet.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

![image](https://github.com/apache/kyuubi/assets/3177898/0991c287-cf1a-40f1-8e50-3934bd2886ca)
![image](https://github.com/apache/kyuubi/assets/3177898/9a5d11a5-2ac9-468e-bfcb-9a070f54c6b4)

Closes apache#5062 from camper42/statefulset.

Closes apache#4788

a1a7f1b [camper42] style: remove redundant Global variable `$`
5286f4f [camper42] fix: set statefulset podManagementPolicy
ed83ae2 [camper42] style: move headless service to separate file
97b76ea [camper42] use `clusterIP: None` for headless service
d2078ff [Cheng Pan] Update charts/kyuubi/templates/kyuubi-statefulset.yaml
35c7e0f [Cheng Pan] Update charts/kyuubi/templates/kyuubi-statefulset.yaml
8d970d2 [camper42] style: indent
3cf2274 [camper42] [KYUUBI apache#4788][K8S][HELM] Use StatefulSet instead of Deployment

Lead-authored-by: camper42 <camper.xlii@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

close apache#5122

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5125 from lsm1/features/kyuubi_5122.

Closes apache#5122

02d0769 [senmiaoliu] add hive kdf docs

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_

In Batch implementation v2, the following query is executed frequently to pick up a job:
```
SELECT identifier FROM metadata WHERE state='INITIALIZED' ORDER BY create_time DESC LIMIT 1
```
Creating an index on `create_time` could speed up the query and reduce the pressure on the MySQL server.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

Tested the MySQL upgrade SQL statements:

```
mysql> CREATE TABLE IF NOT EXISTS metadata(
    ->     key_id bigint PRIMARY KEY AUTO_INCREMENT COMMENT 'the auto increment key id',
    ->     identifier varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
    ->     session_type varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
    ->     real_user varchar(255) NOT NULL COMMENT 'the real user',
    ->     user_name varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
    ->     ip_address varchar(128) COMMENT 'the client ip address',
    ->     kyuubi_instance varchar(1024) NOT NULL COMMENT 'the kyuubi instance that creates this',
    ->     state varchar(128) NOT NULL COMMENT 'the session state',
    ->     resource varchar(1024) COMMENT 'the main resource',
    ->     class_name varchar(1024) COMMENT 'the main class name',
    ->     request_name varchar(1024) COMMENT 'the request name',
    ->     request_conf mediumtext COMMENT 'the request config map',
    ->     request_args mediumtext COMMENT 'the request arguments',
    ->     create_time BIGINT NOT NULL COMMENT 'the metadata create time',
    ->     engine_type varchar(32) NOT NULL COMMENT 'the engine type',
    ->     cluster_manager varchar(128) COMMENT 'the engine cluster manager',
    ->     engine_open_time bigint COMMENT 'the engine open time',
    ->     engine_id varchar(128) COMMENT 'the engine application id',
    ->     engine_name mediumtext COMMENT 'the engine application name',
    ->     engine_url varchar(1024) COMMENT 'the engine tracking url',
    ->     engine_state varchar(32) COMMENT 'the engine application state',
    ->     engine_error mediumtext COMMENT 'the engine application diagnose',
    ->     end_time bigint COMMENT 'the metadata end time',
    ->     peer_instance_closed boolean default '0' COMMENT 'closed by peer kyuubi instance',
    ->     UNIQUE INDEX unique_identifier_index(identifier),
    ->     INDEX user_name_index(user_name),
    ->     INDEX engine_type_index(engine_type)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.03 sec)

mysql> ALTER TABLE metadata MODIFY kyuubi_instance varchar(1024) COMMENT 'the kyuubi instance that creates this';
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE metadata ADD INDEX create_time_index(create_time);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table metadata;
+----------+--------------------------------------------------------------------------------+
| Table    | Create Table                                                                   |
+----------+--------------------------------------------------------------------------------+
| metadata | CREATE TABLE `metadata` (
  `key_id` bigint NOT NULL AUTO_INCREMENT COMMENT 'the auto increment key id',
  `identifier` varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
  `session_type` varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
  `real_user` varchar(255) NOT NULL COMMENT 'the real user',
  `user_name` varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
  `ip_address` varchar(128) DEFAULT NULL COMMENT 'the client ip address',
  `kyuubi_instance` varchar(1024) DEFAULT NULL COMMENT 'the kyuubi instance that creates this',
  `state` varchar(128) NOT NULL COMMENT 'the session state',
  `resource` varchar(1024) DEFAULT NULL COMMENT 'the main resource',
  `class_name` varchar(1024) DEFAULT NULL COMMENT 'the main class name',
  `request_name` varchar(1024) DEFAULT NULL COMMENT 'the request name',
  `request_conf` mediumtext COMMENT 'the request config map',
  `request_args` mediumtext COMMENT 'the request arguments',
  `create_time` bigint NOT NULL COMMENT 'the metadata create time',
  `engine_type` varchar(32) NOT NULL COMMENT 'the engine type',
  `cluster_manager` varchar(128) DEFAULT NULL COMMENT 'the engine cluster manager',
  `engine_open_time` bigint DEFAULT NULL COMMENT 'the engine open time',
  `engine_id` varchar(128) DEFAULT NULL COMMENT 'the engine application id',
  `engine_name` mediumtext COMMENT 'the engine application name',
  `engine_url` varchar(1024) DEFAULT NULL COMMENT 'the engine tracking url',
  `engine_state` varchar(32) DEFAULT NULL COMMENT 'the engine application state',
  `engine_error` mediumtext COMMENT 'the engine application diagnose',
  `end_time` bigint DEFAULT NULL COMMENT 'the metadata end time',
  `peer_instance_closed` tinyint(1) DEFAULT '0' COMMENT 'closed by peer kyuubi instance',
  PRIMARY KEY (`key_id`),
  UNIQUE KEY `unique_identifier_index` (`identifier`),
  KEY `user_name_index` (`user_name`),
  KEY `engine_type_index` (`engine_type`),
  KEY `create_time_index` (`create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+--------------------------------------------------------------------------------+
```

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5131 from pan3793/metastore-create-time-index.

Closes apache#5131

fc18041 [Cheng Pan] ALTER TABLE ADD INDEX
c2261ed [Cheng Pan] update upgrade script
4f94be5 [Cheng Pan] Create index on metastore.create_time

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Otherwise, we cannot see JDK logs, such as those from Krb5.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes apache#5129 from pan3793/beeline-log.

Closes apache#5129

1000948 [Cheng Pan] KyuubiBeeline should redirect JDK logging

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
pan3793 and others added 8 commits October 31, 2023 12:09
### _Why are the changes needed?_

After packaging the binary distribution artifacts during 1.8.0-rc0, `pnpm-lock.yaml` was modified:

```patch
diff --git a/kyuubi-server/web-ui/pnpm-lock.yaml b/kyuubi-server/web-ui/pnpm-lock.yaml
index 83754291b..f25c02de7 100644
--- a/kyuubi-server/web-ui/pnpm-lock.yaml
+++ b/kyuubi-server/web-ui/pnpm-lock.yaml
@@ -1,4 +1,4 @@
-lockfileVersion: '6.0'
+lockfileVersion: '6.1'

 settings:
   autoInstallPeers: true
```

The inconsistency may be caused by a mismatch between the pnpm version installed in the local environment and the version defined in `pom.xml`. I'm not sure whether there is a version management mechanism for pnpm.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes apache#5569 from pan3793/pnpm-lock.

Closes apache#5569

8a09870 [Cheng Pan] Fix pnpm-lock file version

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…ity.enabled` to add HTTP auth header

### _Why are the changes needed?_

`kyuubi.engine.security.enabled` controls whether the security mechanism for internal communication is enabled, but the current implementation is not symmetrical: the auth generator ignores the conf and always produces the auth header, while the auth header handler is only activated when the conf is enabled. This causes authentication failures when `kyuubi.engine.security.enabled=false` (the default value).
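A minimal sketch of the symmetry the fix restores on the client side: the auth header is only attached when the same flag that activates the server-side handler is enabled. `InternalAuthHeaderSketch`, `requestHeaders`, and the `Basic <token>` header format are hypothetical and only for illustration. This is not the actual `InternalRestClient` code.

```scala
object InternalAuthHeaderSketch {
  // Hypothetical: returns the auth header only when internal security is on.
  // `token` is by-name, so no token is generated when security is disabled.
  def authHeader(securityEnabled: Boolean, token: => String): Option[(String, String)] =
    if (securityEnabled) Some("Authorization" -> s"Basic $token") else None

  // Hypothetical request header builder that respects the same flag
  // (mirroring kyuubi.engine.security.enabled) as the server-side handler.
  def requestHeaders(securityEnabled: Boolean, token: => String): Map[String, String] =
    Map("Content-Type" -> "application/json") ++ authHeader(securityEnabled, token)
}
```

Because `token` is passed by name, the disabled path never evaluates it. This matches the intent that no internal credential is produced at all when `kyuubi.engine.security.enabled=false`.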

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No.

Closes apache#5566 from pan3793/none-auth.

Closes apache#5566

d42a4c3 [Cheng Pan] Revert "Extract AnonymousAuthenticationHandler from BasicAuthenticationHandler"
b544343 [Cheng Pan] Extract AnonymousAuthenticationHandler from BasicAuthenticationHandler
75c4b7d [Cheng Pan] InternalRestClient respects `kyuubi.engine.security.enabled` to add HTTP auth header

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…edup

### _Why are the changes needed?_

1. This PR fixes the precision loss issue in the `xx_gmt_offset` columns. Please note that since the `xx_gmt_offset` values are whole numbers, no value is actually lost; only the scale is dropped (for example, `-5.00` is shown as `-5`).

```
trino:tiny> select cc_gmt_offset from call_center ;
 cc_gmt_offset
---------------
         -5.00
         -5.00
```

Before this PR:

```scala
scala> spark.sql("select cc_gmt_offset from tpcds.tiny.call_center").show
+-------------+
|cc_gmt_offset|
+-------------+
|           -5|
|           -5|
+-------------+
```

After this PR:
```scala
scala> spark.sql("select cc_gmt_offset from tpcds.tiny.call_center").show
+-------------+
|cc_gmt_offset|
+-------------+
|        -5.00|
|        -5.00|
+-------------+
```

2. This PR accelerates the generation of the TPC-DS dataset by optimizing the way Rows are generated.

Before this PR, the process involved converting **Trino TableRow** into **String Row** and then further into **Spark InternalRow**.

After this PR, we have streamlined the process by directly converting **Trino TableRow** into **Spark InternalRow**, eliminating unnecessary toString operations. This change significantly improves the speed of TPC-DS dataset generation.

```scala
spark.table("tpcds.sf1000.catalog_sales").foreach(r => ())
```

Task Duration before this PR:

![Screenshot 2023-10-30 4:04:12 PM](https://github.com/apache/kyuubi/assets/8537877/69bd9938-2886-4044-99b8-79ed20d4791c)

Task Duration after this PR:

![Screenshot 2023-10-30 4:02:08 PM](https://github.com/apache/kyuubi/assets/8537877/ddfe01a9-081c-41b5-b82c-a0934dd8686c)
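A minimal sketch of the idea behind both changes: typed column values are converted directly instead of being round-tripped through strings, and decimals are built with their declared scale so `-5.00` keeps its trailing zeros. `ColumnType`, `convert`, and `convertRow` are hypothetical. The sketch also assumes the generator hands over decimals as unscaled longs. So that it runs without Spark, `java.math.BigDecimal` stands in for Spark's `Decimal` and `Array[Any]` stands in for `InternalRow`.

```scala
import java.math.{BigDecimal => JBigDecimal}

object DirectRowConversionSketch {
  // Hypothetical column types standing in for the real schema metadata.
  sealed trait ColumnType
  case object IntegerCol extends ColumnType
  final case class DecimalCol(scale: Int) extends ColumnType
  case object VarcharCol extends ColumnType

  // Converts one typed value without any intermediate toString/parse step.
  def convert(tpe: ColumnType, raw: Any): Any = (tpe, raw) match {
    case (IntegerCol, v: Int) => v
    // Building from the unscaled value and the declared scale keeps trailing zeros.
    case (DecimalCol(scale), unscaled: Long) => JBigDecimal.valueOf(unscaled, scale)
    case (VarcharCol, s: String) => s
    case (t, v) => throw new IllegalArgumentException(s"unexpected value $v for column type $t")
  }

  // Stand-in for building an InternalRow from a generated row.
  def convertRow(schema: Seq[ColumnType], values: Seq[Any]): Array[Any] =
    schema.zip(values).map { case (t, v) => convert(t, v) }.toArray
}
```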

### _How was this patch tested?_

- New unit test `tpcds.tiny count and checksum`
- Compare checksum values before and after this PR on the 1TB dataset

| table_name             | count           | checksum                  |
|------------------------|-----------------|---------------------------|
| call_center            | 42              | 95607401475               |
| catalog_page           | 30000           | 64470199469085            |
| catalog_returns        | 143996756       | 309202327050775220        |
| catalog_sales          | 1439980416      | 3092267266923848000       |
| customer               | 12000000        | 25769069905636795         |
| customer_address       | 6000000         | 12889423380880973         |
| customer_demographics  | 1920800         | 4124183189708148          |
| date_dim               | 73049           | 156926081012862           |
| household_demographics | 7200            | 15494873325812            |
| income_band            | 20              | 41180951007               |
| inventory              | 783000000       | 1681487454682584456       |
| item                   | 300000          | 643000708260945           |
| promotion              | 1500            | 3270935493709             |
| reason                 | 65              | 118806664977              |
| ship_mode              | 20              | 52349078860               |
| store                  | 1002            | 2096408105720             |
| store_returns          | 287999764       | 618451374856897114        |
| store_sales            | 2879987999      | 6184670571185100839       |
| time_dim               | 86400           | 186045071019485           |
| warehouse              | 20              | 31374161844               |
| web_page               | 3000            | 6502456139647             |
| web_returns            | 71997522        | 154614570845312413        |
| web_sales              | 720000376       | 1546188452223821591       |
| web_site               | 54              | 107485781738              |

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes apache#5562 from cfmcgrady/tpcds-perf.

Closes apache#5550

a789b9e [Fu Chen] maxPartitionBytes=384m
659e209 [Fu Chen] style
916f6d2 [Fu Chen] unnecessary change
75981af [Fu Chen] tpcds perf

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Usually, we can use `spark.sql.shuffle.partitions` to configure the number of shuffle partitions (or `spark.sql.adaptive.coalescePartitions.initialPartitionNum` for AQE). However, it seems difficult to find a universal value for all SQL jobs.

Although Spark AQE can dynamically merge and split partitions based on partition size, inappropriate shuffle partitions may still cause some problems:

+ When there are too few shuffle partitions, the join skew optimization threshold becomes large, so skewed partitions are not split.
+ When using a RemoteShuffleService, an inappropriate number of shuffle partitions may produce partitions that are too large or too numerous, which puts high pressure on the shuffle server.

So I want to provide an optimization rule to dynamically adjust the number of partitions based on the size of the input data.

Calculate the number of partitions based on input data size:

```
targetShufflePartitions = sum(scanSize|shuffleReadSize) / advisoryPartitionSizeInBytes
```

then replace the number of partitions for all `ShuffleExchangeExec` nodes.
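The formula above can be sketched as a small function. The rounding, the clamping, and all names here are assumptions made for illustration: the function rounds up, never returns fewer than 1 partition, and applies an optional `maxPartitions` cap. The rule added by this PR may round or clamp differently.

```scala
object DynamicShufflePartitionsSketch {
  /**
   * targetShufflePartitions = sum(scanSize | shuffleReadSize) / advisoryPartitionSizeInBytes,
   * rounded up (assumption) and clamped to [1, maxPartitions] (assumption).
   */
  def targetShufflePartitions(
      inputSizesInBytes: Seq[Long],
      advisoryPartitionSizeInBytes: Long,
      maxPartitions: Int = Int.MaxValue): Int = {
    require(advisoryPartitionSizeInBytes > 0, "advisory partition size must be positive")
    val totalBytes = inputSizesInBytes.sum
    // Ceiling division on Longs avoids floating-point rounding.
    val partitions = (totalBytes + advisoryPartitionSizeInBytes - 1) / advisoryPartitionSizeInBytes
    math.max(1L, math.min(partitions, maxPartitions.toLong)).toInt
  }
}
```

For example, 100 MiB plus 156 MiB of input with a 64 MiB advisory size gives 256 MiB / 64 MiB = 4 shuffle partitions.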

### _How was this patch tested?_
- [X] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes apache#5489 from wForget/dynamic_shuffle_partitions.

Closes apache#5489

5a2bb6c [wforget] only takes effect when aqe is enabled
038b7bb [wforget] moved behind InsertShuffleNodeBeforeJoin
7ca87d8 [wforget] comment
d65047f [wforget] sum scanSizes
e4d8f33 [wforget] comments
4f0f25d [wforget] configurable
f77d1d6 [wforget] code style
0bf572f [wforget] use partition stats
8d251c3 [wforget] Adjust shuffle partitions dynamically

Authored-by: wforget <643348094@qq.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
…at have been verified

### _Why are the changes needed?_
To close apache#5503
For SQL such as the lateral join in test `[KYUUBI apache#5503][AUTHZ] Check plan auth checked should not set tag to all child nodes`, Authz first verifies the subquery in `lateral` and then verifies the whole plan. If there is a view, the `PermanentViewMarker` has already been removed by Spark's optimizer by the time the whole plan is verified.
As a result, both source tables `table1` and `table2` get verified.
So I think we need to do 3 things:

1. Mark all nodes under each `PermanentViewMarker`, and all child nodes of each subquery, as checked.
2. `isAuthChecked` should only check the first level of the plan, to avoid skipping the check of the whole plan in the demo test.
3. In `buildQuery`, if the current node has the tag, skip it.

Without this PR, the SQL in the test checks both `table1` and `table2`.
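A toy model of the three steps above. Once a subtree has been verified, every node in it is tagged as checked. `isAuthChecked` only inspects the top-level node, and table collection skips tagged subtrees. `Node`, `Table`, `ViewMarker`, `Project`, `Join`, `markChecked`, and `tablesToCheck` are hypothetical simplifications of Spark's `LogicalPlan` tree tags and the Authz plugin's `buildQuery`, not the plugin's real code.

```scala
object AuthzTagSketch {
  // Hypothetical, simplified plan tree with a mutable "auth checked" tag.
  sealed abstract class Node(val children: Seq[Node]) {
    var checked: Boolean = false
    // Pre-order traversal over this node and all descendants.
    def foreach(f: Node => Unit): Unit = { f(this); children.foreach(_.foreach(f)) }
  }
  final class Table(val name: String) extends Node(Nil)
  final class ViewMarker(child: Node) extends Node(Seq(child))
  final class Project(child: Node) extends Node(Seq(child))
  final class Join(left: Node, right: Node) extends Node(Seq(left, right))

  // Step 1: after verifying a view or subquery, tag every node under it.
  def markChecked(root: Node): Unit = root.foreach(n => n.checked = true)

  // Step 2: only the first level decides whether the whole plan is skipped.
  def isAuthChecked(plan: Node): Boolean = plan.checked

  // Step 3: collect tables to verify, skipping subtrees that are already tagged.
  def tablesToCheck(plan: Node): Seq[String] =
    if (plan.checked) Nil
    else plan match {
      case t: Table => Seq(t.name)
      case other => other.children.flatMap(tablesToCheck)
    }
}
```

In the usage below, the view body stays tagged even after the marker node is gone from the outer plan. Only the untagged table is collected.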

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

### _Was this patch authored or co-authored using generative AI tooling?_
No

Closes apache#5563 from AngersZhuuuu/KYUUBI-5503-FOLLOWUP.

Closes apache#5503

c1a427f [Angerszhuuuu] Update Authorization.scala
d6b2899 [Angerszhuuuu] update
633bc91 [Angerszhuuuu] Update Authorization.scala
7a006b1 [Angerszhuuuu] [KYUUBI apache#5503][FOLLOWUP][AUTHZ] Authz should skip inner plan that have been verified

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
To close apache#5575
Fix wrong code in the test case of the dir command.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

### _Was this patch authored or co-authored using generative AI tooling?_
No

Closes apache#5577 from AngersZhuuuu/KYUUBI-5576.

Closes apache#5576

60e2cb8 [Angerszhuuuu] [KYUUBI apache#5576][Bug] Fix wrong code in test case of dir command

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…seful

### _Why are the changes needed?_

As titled, and to make the Web UI cleaner.

The Contact Us and Overview pages are kept for now, since they will be refactored later.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

![Screenshot 2023-10-31 15:30:37](https://github.com/apache/kyuubi/assets/52876270/443feaf5-2d9a-4683-9214-6b7f5b5769cd)

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes apache#5574 from zwangsheng/KYUUBI#5573.

Closes apache#5573

462f9f6 [zwangsheng] fix comments
d321010 [zwangsheng] [KYUUBI apache#5573][Improvement] Delete parts of the Kyuubi Web UI that are not useful

Authored-by: zwangsheng <binjieyang@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>