feat(sink): enable delta lake sink #10374
should we mark it as user-facing?
Codecov Report
@@           Coverage Diff           @@
##             main   #10374   +/-   ##
=======================================
  Coverage   70.48%   70.48%
=======================================
  Files        1257     1257
  Lines      214185   214185
=======================================
+ Hits       150959   150960    +1
+ Misses      63226    63225    -1
Flags with carried forward coverage won't be shown.
... and 1 file with indirect coverage changes
commit ec637af4f5458b1a951d591a3dd7fc6994192e8f Author: Little-Wallace <bupt2013211450@gmail.com> Date: Tue Jun 20 12:52:47 2023 +0800 fix config Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
commit 14641c2 Author: Little-Wallace <bupt2013211450@gmail.com> Date: Mon Jun 19 20:47:43 2023 +0800 fix config Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
commit bc252ee Author: Little-Wallace <bupt2013211450@gmail.com> Date: Mon Jun 19 20:10:51 2023 +0800 fix busy loop Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
commit 5b816a6 Merge: 1059c15 02dfee5 Author: Wallace <bupt2013211450@gmail.com> Date: Mon Jun 19 13:59:04 2023 +0800 Merge branch 'main' into scheduler-split
commit 02dfee5 Author: William Wen <44139337+wenym1@users.noreply.github.com> Date: Mon Jun 19 13:52:03 2023 +0800 feat(log-store): implement a merge stream of kv-log-store (#10090)
commit a6c9c39 Author: lmatz <lmatz823@gmail.com> Date: Mon Jun 19 13:28:28 2023 +0800 chore: use github action to auto cherry pick pr to release branch (#10383)
commit 608e183 Author: Bohan Zhang <tabvision@bupt.icu> Date: Mon Jun 19 12:18:28 2023 +0800 fix: support variable scale decimal in avro (#10368) Co-authored-by: idx0-dev <124041366+idx0-dev@users.noreply.github.com>
commit 75f6025 Author: zwang28 <70626450+zwang28@users.noreply.github.com> Date: Sun Jun 18 17:41:15 2023 +0800 feat(trace): enable await tree trace for compactor (#10381)
commit 321d376 Author: wu <f_dogs@protonmail.com> Date: Sun Jun 18 15:59:38 2023 +0800 feat(connector): sink support for elasticsearch (#10357)
commit d13d862 Author: Eric Fu <eric@singularity-data.com> Date: Sun Jun 18 00:26:47 2023 +0800 feat: add debug profile tools in docker image (#10380)
commit 1059c15 Merge: 9ac9ed4 d26f4bb Author: Wallace <bupt2013211450@gmail.com> Date: Fri Jun 16 20:49:21 2023 +0800 Merge branch 'main' into scheduler-split
commit d26f4bb Author: Yuhao Su <31772373+yuhao-su@users.noreply.github.com> Date: Fri Jun 16 18:36:27 2023 +0800 feat(metrics): add metrics for the evicted watermark for each executors (#10379)
commit 3dd1393 Author: William Wen <44139337+wenym1@users.noreply.github.com> Date: Fri Jun 16 17:34:34 2023 +0800 feat(sink): enable delta lake sink (#10374)
commit 9ac9ed4 Merge: 58d8562 5c6b25c Author: Wallace <bupt2013211450@gmail.com> Date: Fri Jun 16 17:08:38 2023 +0800 Merge branch 'main' into scheduler-split
commit 7b66d55 Author: William Wen <44139337+wenym1@users.noreply.github.com> Date: Fri Jun 16 16:49:57 2023 +0800 fix(docker): install sasl library in docker (#10365) Co-authored-by: Eric Fu <eric@singularity-data.com>
commit 5c6b25c Author: zwang28 <70626450+zwang28@users.noreply.github.com> Date: Fri Jun 16 16:10:23 2023 +0800 feat(ctl): list serving fragment mappings (#10331)
commit 2c2a2b7 Author: Renjie Liu <liurenjie2008@gmail.com> Date: Fri Jun 16 15:49:00 2023 +0800 fix: Memory counter leak (#10358)
commit 1c1354c Author: lmatz <lmatz823@gmail.com> Date: Fri Jun 16 15:36:00 2023 +0800 chore: return a warning message when creating sink with order by (#10239)
commit 558cef5 Author: zwang28 <70626450+zwang28@users.noreply.github.com> Date: Fri Jun 16 13:55:08 2023 +0800 feat(frontend): support mask failed serving worker temporarily (#10328)
commit 7dccfa3 Author: Bohan Zhang <tabvision@bupt.icu> Date: Fri Jun 16 13:03:21 2023 +0800 chore: fix kafka download path in risedev (#10363)
commit 58d8562 Author: Little-Wallace <bupt2013211450@gmail.com> Date: Fri Jun 16 12:53:47 2023 +0800 fix config test Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
commit e77b76b Author: Little-Wallace <bupt2013211450@gmail.com> Date: Fri Jun 16 12:21:35 2023 +0800 fix space reclaim miss Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
commit 2e5a907 Author: Little-Wallace <bupt2013211450@gmail.com> Date: Fri Jun 16 11:10:19 2023 +0800 merge conflict Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
commit 1af4ea1 Author: Little-Wallace <bupt2013211450@gmail.com> Date: Wed Jun 14 16:41:54 2023 +0800 do not check table size for large throughput Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
commit ccc47a2 Merge: 35199c4 9d83f88 Author: Wallace <bupt2013211450@gmail.com> Date: Fri Jun 16 10:59:19 2023 +0800 Merge branch 'main' into scheduler-split
commit 9d83f88 Author: idx0-dev <124041366+idx0-dev@users.noreply.github.com> Date: Thu Jun 15 18:39:25 2023 +0800 refactor(source): unified message parser (#10096) Co-authored-by: Eric Fu <eric@singularity-data.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
commit 171e212 Author: William Wen <44139337+wenym1@users.noreply.github.com> Date: Thu Jun 15 16:35:33 2023 +0800 feat(pinot-demo): add demo for sink to pinot via kafka (#10294)
commit 11d3092 Author: William Wen <44139337+wenym1@users.noreply.github.com> Date: Thu Jun 15 16:32:54 2023 +0800 feat(java-binding): bundle jni library to jar (#10229)
commit 56f4011 Author: Yuhao Su <31772373+yuhao-su@users.noreply.github.com> Date: Thu Jun 15 16:27:28 2023 +0800 feat(metrics): add memory usage metrics for more executor (#10351)
commit ea7f95b Author: Yuanxin Cao <60498509+xx01cyx@users.noreply.github.com> Date: Thu Jun 15 15:30:31 2023 +0800 refactor(sink): prune out hidden columns within sink executor (#10276)
commit d818a00 Author: Tesla Zhang <ice1000kotlin@foxmail.com> Date: Thu Jun 15 02:58:47 2023 -0400 refactor(plan_node_fmt): 6 more impls for Distill, refactor all `columns_name` functions (#10344)
commit 26750c9 Author: xxchan <xxchan22f@gmail.com> Date: Thu Jun 15 08:34:38 2023 +0200 build: use debug=1 back for release (#10345)
commit ca41717 Author: Renjie Liu <liurenjie2008@gmail.com> Date: Thu Jun 15 14:28:25 2023 +0800 fix: Batch memory maybe negative (#10338)
commit d95d3a2 Author: zwang28 <70626450+zwang28@users.noreply.github.com> Date: Thu Jun 15 14:09:33 2023 +0800 chore(metric): add metric for hummock full GC (#10264)
commit 65f05dd Author: StrikeW <wangsiyuanse@gmail.com> Date: Thu Jun 15 13:10:06 2023 +0800 test(integration-test): jdbc sink data type tests (#10202)
commit a164ab7 Author: xxchan <xxchan22f@gmail.com> Date: Thu Jun 15 06:06:18 2023 +0200 chore: bump typos version and fix typos (#10342)
commit 5cf94c9 Author: xxchan <xxchan22f@gmail.com> Date: Wed Jun 14 16:18:53 2023 +0200 feat: support scalar function in FROM clause (#10317)
commit 9593d1b Author: Tesla Zhang <ice1000kotlin@foxmail.com> Date: Wed Jun 14 08:40:29 2023 -0400 refactor(plan_node_fmt): 4 more impls for Distill (#10296)
commit 5b38239 Author: xxchan <xxchan22f@gmail.com> Date: Wed Jun 14 13:20:21 2023 +0200 fix: replace ouroboros with self_cell (#10316)
commit 90ee868 Author: Xinjing Hu <honeta@qq.com> Date: Wed Jun 14 19:00:27 2023 +0800 feat(expr, agg): support `PERCENTILE_CONT`, `PERCENTILE_DISC` and `MODE` aggregation (#10252) Signed-off-by: Richard Chien <stdrc@outlook.com> Co-authored-by: Richard Chien <stdrc@outlook.com> Co-authored-by: Noel Kwan <47273164+kwannoel@users.noreply.github.com>
commit e3fe51b Author: congyi wang <58715567+wcy-fdu@users.noreply.github.com> Date: Wed Jun 14 17:41:39 2023 +0800 refactor(log): change `aws_credential_types::cache::lazy_caching` log level to WARN (#10333)
commit 33694b1 Author: stonepage <40830455+st1page@users.noreply.github.com> Date: Wed Jun 14 17:11:22 2023 +0800 refactor(binder): bind create table (#10307)
commit 02a110c Author: Noel Kwan <47273164+kwannoel@users.noreply.github.com> Date: Wed Jun 14 16:18:10 2023 +0800 feat(storage): support replicated `LocalHummockStorage` (#10226)
commit ede3278 Author: Richard Chien <stdrc@outlook.com> Date: Wed Jun 14 16:02:34 2023 +0800 refactor(common): add `MemcmpEncoded` struct to represent memcmp encoded data (#10319) Signed-off-by: Richard Chien <stdrc@outlook.com>
commit ff91a4a Author: Li0k <yuli@singularity-data.com> Date: Wed Jun 14 15:56:21 2023 +0800 refactor(storage): refactor hummock timer loop (#10164)
commit 353da76 Author: Richard Chien <stdrc@outlook.com> Date: Wed Jun 14 15:06:07 2023 +0800 fix(macro): support `derive(EstimateSize)` on tuple struct (#10318) Signed-off-by: Richard Chien <stdrc@outlook.com> Co-authored-by: Yuhao Su <31772373+yuhao-su@users.noreply.github.com>
commit 7dd388b Author: Runji Wang <wangrunji0408@163.com> Date: Wed Jun 14 14:52:48 2023 +0800 doc(udf): document Java UDF (#10320)
commit e4aec8b Author: xiangjinwu <17769960+xiangjinwu@users.noreply.github.com> Date: Wed Jun 14 14:15:36 2023 +0800 feat(binder): support `group by` output alias or index (#10305)
commit 8eb0e43 Author: Huangjw <1223644280@qq.com> Date: Wed Jun 14 11:01:28 2023 +0800 fix(ci): fix release script (#10325)
commit 86f734c Author: Shanicky Chen <peng@singularity-data.com> Date: Wed Jun 14 03:45:42 2023 +0800 fix: Increase timeout for end-to-end test (parallel) (dev mode) (#10308) Co-authored-by: xxchan <xxchan22f@gmail.com>
commit e02ef6c Author: Eric Fu <eric@singularity-data.com> Date: Wed Jun 14 03:42:56 2023 +0800 fix: jemalloc profiling (#10314) Co-authored-by: xxchan <xxchan22f@gmail.com>
commit 07f6b52 Author: xxchan <xxchan22f@gmail.com> Date: Tue Jun 13 21:31:21 2023 +0200 fix: use alias as table function's column name (#10311)
commit 3017aa2 Author: xxchan <xxchan22f@gmail.com> Date: Tue Jun 13 20:58:37 2023 +0200 ci: download dependencies from s3 (#9782)
commit f971965 Author: zwang28 <70626450+zwang28@users.noreply.github.com> Date: Tue Jun 13 19:24:45 2023 +0800 refactor(batch): maintain serving vnode mapping in meta node (#10004)
commit 2b2950d Author: Zhanxiang (Patrick) Huang <hzxa21@hotmail.com> Date: Tue Jun 13 19:07:40 2023 +0800 refactor: replace minstant/minitrace with tokio instant/tracing (#10302)
commit 9177034 Author: congyi wang <58715567+wcy-fdu@users.noreply.github.com> Date: Tue Jun 13 18:08:51 2023 +0800 feat(metrics): monitor s3 sdk retry (#9790)
commit 16a0efc Author: Runji Wang <wangrunji0408@163.com> Date: Tue Jun 13 17:58:34 2023 +0800 feat(udf): Java UDF SDK (#10095)
commit 2b2ea49 Author: Eric Fu <eric@singularity-data.com> Date: Tue Jun 13 17:19:35 2023 +0800 fix(metrics): incorrect FP rate (#10300)
commit 54c660b Author: lmatz <lmatz823@gmail.com> Date: Tue Jun 13 16:57:07 2023 +0800 chore: remove enable_stream_row_count config (#10261)
commit a6f38d9 Author: Shanicky Chen <peng@singularity-data.com> Date: Tue Jun 13 16:30:40 2023 +0800 feat: Add revision for rescheduling process (#10199) Signed-off-by: Shanicky Chen <peng@risingwave-labs.com> Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Previously, the Delta Lake Java connector was implemented, but users could not yet create a Delta Lake sink via SQL. This PR enables users to create a Delta Lake sink via SQL.
The following demo runs the Delta Lake sink. The Delta Lake data is stored in MinIO, but it should also be possible to store it in AWS S3.
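As a rough sketch of what the source-and-sink part of this demo might look like, the script below prints (and optionally runs) SQL for a datagen source, a Delta Lake sink, and a flush. The object names `demo_source` and `demo_sink`, the table path under the `deltalake` bucket, the credential placeholders, and every option in the sink's `WITH` clause are assumptions for illustration and are not taken from this PR. The `ROW FORMAT JSON` clause may also differ between RisingWave versions.

```shell
#!/usr/bin/env bash
# Sketch only. Names, paths, credentials, and the sink's WITH options are
# hypothetical; check them against the RisingWave version you are running.
set -eu

# Print the demo SQL: a datagen source, a Delta Lake sink, and a flush.
demo_sql() {
  cat <<'SQL'
CREATE SOURCE demo_source (id int, name varchar) WITH (
    connector = 'datagen',
    fields.id.kind = 'sequence',
    fields.id.start = '1',
    fields.id.end = '10000',
    fields.name.kind = 'random',
    fields.name.length = '10',
    datagen.rows.per.second = '200'
) ROW FORMAT JSON;

-- Assumed option names; the MinIO endpoint comes from the demo description.
CREATE SINK demo_sink FROM demo_source WITH (
    connector = 'deltalake',
    type = 'append-only',
    location = 's3a://deltalake/demo_table',
    s3.endpoint = 'http://localhost:9301',
    s3.access.key = '<minio-access-key>',
    s3.secret.key = '<minio-secret-key>'
);

FLUSH;
SQL
}

# By default only print the SQL; set RUN_DEMO=1 to send it to the local cluster.
if [ "${RUN_DEMO:-0}" = "1" ]; then
  demo_sql | psql -h localhost -p 4566 -d dev -U root
else
  demo_sql
fi
```

Running the script without `RUN_DEMO=1` only prints the statements, which makes it easy to review or adjust the assumed options before executing them against a cluster.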
1. Start the cluster with `./risedev d ci-iceberg-test`. The MinIO endpoint is at localhost:9301.
2. Create the `deltalake` bucket in the MinIO portal. The portal endpoint is at http://localhost:9400.
3. Connect with `psql -h localhost -p 4566 -d dev -U root` and then create the source and sink. Data is randomly generated by the datagen source.
4. Run `flush` in the psql client to ensure that some data is flushed to the sink. Then query the Delta Lake data from the Spark SQL shell.
Checklist
`./risedev check` (or alias, `./risedev c`)
Documentation
Types of user-facing changes
Please keep the types that apply to your changes, and remove the others.
Release note