[spark3] support insert && append on keyed table #16

Closed
baiyangtx opened this issue Jul 19, 2022 · 0 comments

@baiyangtx
Contributor

Support `INSERT INTO` SQL and DataFrame append for keyed tables.

For a keyed table, both the `INSERT INTO` statement and the DataFrame append action write the new rows to the table's change store.
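
To make the intended behavior concrete, here is a minimal conceptual sketch in plain Python. It is not Arctic or Spark code: the `KeyedTable` class, its `base_store` and `change_store` fields, and the `"INSERT"` operation tag are all hypothetical names chosen for illustration. The only thing taken from the description above is that an insert or append on a keyed table lands in the change store.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class KeyedTable:
    """Toy model of a keyed table (hypothetical, not the Arctic API)."""

    primary_key: str
    # Assumed layout: previously merged data lives in the base store.
    base_store: List[Row] = field(default_factory=list)
    # Assumed layout: row-level changes are recorded in the change store.
    change_store: List[Tuple[str, Row]] = field(default_factory=list)

    def append(self, rows: List[Row]) -> None:
        """Model INSERT INTO / DataFrame append: write only to the change store."""
        for row in rows:
            if self.primary_key not in row:
                raise ValueError(f"row is missing primary key {self.primary_key!r}")
            # The "INSERT" tag is an illustrative assumption.
            self.change_store.append(("INSERT", dict(row)))


table = KeyedTable(primary_key="id")
table.append([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
```

In this model the base store is never touched by `append`; merging the change store into the base store is left out because the issue does not describe it.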

@baiyangtx baiyangtx added the module:mixed-spark Spark module for Mixed Format label Jul 19, 2022
@baiyangtx baiyangtx added this to the Release 0.3.1 milestone Jul 19, 2022
@baiyangtx baiyangtx added the type:enhancement New feature or request label Jul 19, 2022
shidayang pushed a commit to shidayang/arctic that referenced this issue Nov 26, 2022
…g-optimize

close writer/reader safely in iceberg optimize executor
zhoujinsong added a commit that referenced this issue Nov 29, 2022
* add API of get iceberg sequenceNumber

* resolve #741 add optimize task content

* resolve #741 fix task content

* resolve #741 add method in IcebergContentFile

* remove template definition from IcebergContentFile

* read/write interface

* add iceberg minor executor

* refactor optimize executor

* implement SequenceNumberFetcher and add unit test

* resolve #741 iceberg optimize plan and commit

* add more SequenceNumberFetcherTest

* fix #747 fix local test

* fix #747 fix local test

* refactor ManifestReader to TableEntriesScan

* Add iceberg combined read/write

* Add iceberg combined read/write

* add unit test for TableEntriesScan

* fix #747 add partition table unit test

* iceberg minor optimize only return tasks with at least 2 files

* Add iceberg combined read/write

* add unit tests for iceberg optimize executor

* fix logger name

* implement iceberg optimize executor

* fix #747 rename full optimize for iceberg table

* fix format properties parse error

* Add iceberg combined read/write

* fix unit test error

* Add iceberg combined read/write

* fix #636 fix local test

* add native iceberg test case into OptimizeIntegrationTest

* close reader/writer safely in optimize executor

* fix positional delete file duplicate error

* Set table properties to writer in executor

* fix check style (#8)

* add plan log

* fix check style

Co-authored-by: luting <dylzlt93299@gmail.com>

* Add native iceberg table optimize test case (#9)

* add native iceberg full optimize unit test

* add native iceberg optimize test case for
1.Minor Optimize: compact small files
2.Minor Optimize: compact small file with eq-delete file
3.Minor Optimize: compact big file with eq-delete files (get pos-delete file)
4.Full Optimize: compact big file with pos-delete file and eq-delete file

Co-authored-by: wangtao <wangtao3@corp.netease.com>

* fix repeat data files (#10)

* add plan log

* fix check style

* fix repeat data files

Co-authored-by: luting <dylzlt93299@gmail.com>

* fix native iceberg full optimize not remove old datafile (#11)

* add native iceberg full optimize unit test

* add native iceberg optimize test case for
1.Minor Optimize: compact small files
2.Minor Optimize: compact small file with eq-delete file
3.Minor Optimize: compact big file with eq-delete files (get pos-delete file)
4.Full Optimize: compact big file with pos-delete file and eq-delete file

* fix native iceberg full optimize not remove old datafile

Co-authored-by: wangtao <wangtao3@corp.netease.com>

* Add iceberg combined read/write (#12)

* add unit test for IcebergFanoutPosDeleteWriter

* Modify Optimize properties and add more test case (#13)

* add native iceberg full optimize unit test

* add native iceberg optimize test case for
1.Minor Optimize: compact small files
2.Minor Optimize: compact small file with eq-delete file
3.Minor Optimize: compact big file with eq-delete files (get pos-delete file)
4.Full Optimize: compact big file with pos-delete file and eq-delete file

* fix native iceberg full optimize not remove old datafile

* as head

* add test

* to get 0 size file's sequence

* add integration test case for native iceberg full optimize

* refactor optimize unit test

* refactor optimize properties

Co-authored-by: wangtao <wangtao3@corp.netease.com>

* fix fanout pos delete writer unit test error

* Reduce optimize plan task cost time (#14)

* add plan log

* fix check style

* fix repeat data files

* optimize plan cost time

Co-authored-by: luting <dylzlt93299@gmail.com>

* Remove useless method (#15)

* add plan log

* fix check style

* fix repeat data files

* optimize plan cost time

* fix review

* remove useless code

Co-authored-by: luting <dylzlt93299@gmail.com>

* 1.add test case for iceberg v1 table optimizing (#16)

2.rename ManifestEntry to IcebergFileEntry and remove useless Status
3.add more docs

* Fix load table cache (#17)

* add plan log

* fix check style

* fix repeat data files

* optimize plan cost time

* fix review

* remove useless code

* fix load table cache

Co-authored-by: luting <dylzlt93299@gmail.com>

* fix comments for combined iceberg reader

Co-authored-by: wangtao <wangtao3@corp.netease.com>
Co-authored-by: luting <luting@corp.netease.com>
Co-authored-by: luting <dylzlt93299@gmail.com>
Co-authored-by: shidayang <530847445@qq.com>
Co-authored-by: luting <1004611953@qq.com>
Co-authored-by: wangtaohz <103108928+wangtaohz@users.noreply.github.com>
zhoujinsong added a commit that referenced this issue May 31, 2023