sync to 1.2: supporting reader linear search on unsorted fake pk col #16461

gouhongshen · 2024-05-29T01:44:46Z

User description

What type of PR is this?

Which issue(s) this PR fixes:

issue ##13959

What this PR does / why we need it:

supporting reader linear search on unsorted fake pk col

PR Type

Enhancement, Tests

Description

Added multiple linear search functions for unordered vectors in search.go.
Updated filter handling in disttae/reader.go to use new blockio.ReadFilter structure.
Refactored filter handling in txn_table.go to accommodate new filter type.
Enhanced utility functions in util.go to support new filter structure and unsorted search functions.
Updated tests in util_test.go for new filter structure.
Refactored blockio/read.go to use ReadFilterSearchFuncType and updated ReadFilter structure.
Updated logtail/snapshot.go to replace nil filters with blockio.ReadFilter{}.
Added SQL test cases and results for linear search on fake primary key column.

Changes walkthrough 📝

Relevant files

Enhancement

6 files

search.go `Add linear search functions for unordered vectors.` pkg/container/vector/search.go Added multiple linear search functions for unordered vectors. Introduced factories for creating search functions based on value types. Implemented search functions for various data types including byte slices and prefixes.	+103/-0
reader.go `Update filter handling in disttae reader.` pkg/vm/engine/disttae/reader.go Replaced `nil` filters with `blockio.ReadFilter{}`. Updated filter evaluation logic to handle new filter structure. Modified block reading logic to use the new filter structure.	+15/-14
txn_table.go `Refactor filter handling in txn_table.` pkg/vm/engine/disttae/txn_table.go Replaced `ReadFilter` with `ReadFilterSearchFuncType`. Updated filter building logic to use new filter structure. Adjusted block read calls to accommodate new filter type.	+7/-8
util.go `Enhance util functions for new filter structure.` pkg/vm/engine/disttae/util.go Added support for unsorted search functions. Updated `getNonCompositePKSearchFuncByExpr` to return new filter structure. Modified linear search factory functions to handle new filter types.	+100/-49
read.go `Refactor blockio read functions for new filter structure.` pkg/vm/engine/tae/blockio/read.go Introduced `ReadFilterSearchFuncType` and updated `ReadFilter` structure. Modified `ReadByFilter` and `BlockRead` functions to use new filter structure.	+19/-5
snapshot.go `Update snapshot logtail for new filter structure.` pkg/vm/engine/tae/logtail/snapshot.go Replaced `nil` filters with `blockio.ReadFilter{}` in `BlockRead` calls.	+1/-1

Tests

3 files

util_test.go `Update util tests for new filter structure.` pkg/vm/engine/disttae/util_test.go Updated test to use `SortedSearchFunc` from new filter structure.	+1/-1
block_reader_filter.result `Add test results for fake PK linear search.` test/distributed/cases/disttae_filters/reader_filters/block_reader_filter.result Added test results for linear search on fake primary key column.	+24/-0
block_reader_filter.sql `Add SQL tests for fake PK linear search.` test/distributed/cases/disttae_filters/reader_filters/block_reader_filter.sql Added SQL test cases for linear search on fake primary key column.	+14/-0

💡 PR-Agent usage:
Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

codiumai-pr-agent-pro · 2024-05-29T01:45:26Z

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Review 🔍

⏱️ Estimated effort to review [1-5]	4, because the PR involves multiple complex changes across several files, including the addition of new search functions, updates to filter handling, and modifications to existing functions to accommodate new filter types. The changes impact core functionalities like search and filtering in vectors, which requires a thorough understanding of the existing architecture and careful review to ensure correctness and performance.
🧪 Relevant tests	Yes
⚡ Possible issues	Performance Concern: The addition of multiple linear search functions could potentially lead to performance degradation, especially for large datasets. Linear searches are generally less efficient than binary searches, particularly when dealing with unsorted data.
⚡ Possible issues	Possible Bug: The use of nil checks and conversions between different filter types and search functions must be carefully reviewed to ensure that there are no edge cases or conditions where the filters might not behave as expected, leading to incorrect query results or runtime errors.
🔒 Security concerns	No

codiumai-pr-agent-pro · 2024-05-29T01:46:09Z

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Code Suggestions ✨

Category	Suggestion	Score
Maintainability	Remove the duplicate import statement for `moerr` The import statement for `moerr` is duplicated. It should be removed to avoid redundancy and potential confusion. pkg/vm/engine/disttae/util.go [27] -"github.com/matrixorigin/matrixone/pkg/common/moerr" +# Remove the duplicate import statement Suggestion importance[1-10]: 10 Why: The import statement for `moerr` is indeed duplicated at lines 25 and 27. Removing the duplicate is crucial to avoid redundancy and potential confusion.	10
	Define the `buildFilter` and `buildUnsortedFilter` functions outside the main function to improve readability and maintainability In the `PKPersistedBetween` function, the `buildFilter` and `buildUnsortedFilter` functions can be defined outside the main function to improve readability and maintainability. pkg/vm/engine/disttae/txn_table.go [2247-2270] var filter blockio.ReadFilterSearchFuncType -buildFilter := func() blockio.ReadFilterSearchFuncType { +var unsortedFilter blockio.ReadFilterSearchFuncType + +func buildFilter(keys Keys, inExpr Expr, tbl txnTable) blockio.ReadFilterSearchFuncType { //keys must be sorted. keys.InplaceSort() bytes, _ := keys.MarshalBinary() filter := getNonCompositePKSearchFuncByExpr( inExpr, "pk", tbl.proc.Load()) return filter.SortedSearchFunc } -... -var unsortedFilter blockio.ReadFilterSearchFuncType -buildUnsortedFilter := func() blockio.ReadFilterSearchFuncType { + +func buildUnsortedFilter(keys Keys) blockio.ReadFilterSearchFuncType { return getNonSortedPKSearchFuncByPKVec(keys) } Suggestion importance[1-10]: 6 Why: Moving these functions outside the main function scope would indeed improve readability and maintainability. However, the impact on functionality is minimal, so it gets a moderate score.	6
	Add comments to explain the purpose of each `SELECT` statement To improve readability and maintainability, consider adding comments to explain the purpose of each SQL statement, especially the `SELECT` statements. test/distributed/cases/disttae_filters/reader_filters/block_reader_filter.sql [10-12] +-- Select rows where b equals 1 select a from t1 where b = 1; +-- Select rows where b is between 1 and 3 select a from t1 where b between 1 and 3; +-- Select rows where b is in the set (1, 2, 3) select a from t1 where b in (1,2,3); Suggestion importance[1-10]: 5 Why: Adding comments improves the readability and maintainability of the code. However, the impact on functionality is minimal, so it receives a moderate score.	5
Possible bug	Check for `nil` before assigning functions to `filter` to avoid potential runtime errors The `sortedSearchFunc` and `unSortedSearchFunc` should be checked for `nil` before being assigned to `filter.SortedSearchFunc` and `filter.UnSortedSearchFunc` to avoid potential runtime errors. pkg/vm/engine/disttae/util.go [696-700] -filter.SortedSearchFunc = func(vecs []vector.Vector) []int32 { - return sortedSearchFunc(vecs[0]) -} -filter.UnSortedSearchFunc = func(vecs []vector.Vector) []int32 { - return unSortedSearchFunc(vecs[0]) +if sortedSearchFunc != nil && unSortedSearchFunc != nil { + filter.SortedSearchFunc = func(vecs []vector.Vector) []int32 { + return sortedSearchFunc(vecs[0]) + } + filter.UnSortedSearchFunc = func(vecs []vector.Vector) []int32 { + return unSortedSearchFunc(vecs[0]) + } } Suggestion importance[1-10]: 9 Why: Checking `sortedSearchFunc` and `unSortedSearchFunc` for `nil` before using them in `filter` is essential to prevent runtime errors from nil pointer dereferences. This suggestion addresses a potential major bug.	9
Possible bug	Initialize function variables to `nil` to avoid potential nil pointer dereferences The `sortedSearchFunc` and `unSortedSearchFunc` variables should be initialized to `nil` to ensure they are explicitly set before use, avoiding potential nil pointer dereferences. pkg/vm/engine/disttae/util.go [540] -var sortedSearchFunc, unSortedSearchFunc func(vector.Vector) []int32 +var sortedSearchFunc, unSortedSearchFunc func(vector.Vector) []int32 = nil, nil Suggestion importance[1-10]: 7 Why: Initializing `sortedSearchFunc` and `unSortedSearchFunc` to `nil` is a good practice to ensure they are explicitly set before use, preventing nil pointer dereferences. However, the Go language initializes function variables to `nil` by default, so this is more about explicitness than a requirement.	7
Best practice	Initialize the `filter` variable at the beginning of the function to ensure it is always defined The `filter` variable should be initialized at the beginning of the function to ensure it is always defined, even if the `valExpr` is `nil`. pkg/vm/engine/disttae/util.go [531-534] +var filter blockio.ReadFilter valExpr := getPkExpr(expr, pkName, proc) if valExpr == nil { - return false, false, blockio.ReadFilter{} + return false, false, filter } Suggestion importance[1-10]: 8 Why: Initializing `filter` at the beginning of the function ensures that it is always defined, which is crucial for the function's robustness, especially when `valExpr` is `nil`.	8
Best practice	Use `IF NOT EXISTS` and `IF EXISTS` clauses to make the `CREATE` and `DROP` statements more robust To ensure that the database and table are created and dropped only if they do not already exist, consider using `IF NOT EXISTS` and `IF EXISTS` clauses for the `CREATE` and `DROP` statements respectively. test/distributed/cases/disttae_filters/reader_filters/block_reader_filter.sql [1-14] drop database if exists testdb; -create database testdb; +create database if not exists testdb; ... -drop table t1; -drop database testdb; +drop table if exists t1; +drop database if exists testdb; Suggestion importance[1-10]: 7 Why: The suggestion correctly identifies a best practice for handling database and table creation and deletion, ensuring operations are idempotent. However, the original code already uses `IF EXISTS` for the `DROP` statements, so the improvement is mainly for the `CREATE` statement.	7
Performance	Optimize the nested loops in the `UnOrderedLinearSearchOffsetByValFactory` function by using a map for faster lookup In the `UnOrderedLinearSearchOffsetByValFactory` function, the nested loops can be optimized by using a map to store the values for faster lookup, which will improve the search performance. pkg/container/vector/search.go [36-42] +valMap := make(map[T]struct{}, len(vals)) +for _, val := range vals { + valMap[val] = struct{}{} +} for x := range rows { - for y := range vals { - if (cmp != nil && cmp(rows[x], vals[y]) == 0) \|\| (cmp == nil && rows[x] == vals[y]) { - sels = append(sels, int32(x)) - break - } + if _, found := valMap[rows[x]]; found { + sels = append(sels, int32(x)) } } Suggestion importance[1-10]: 8 Why: This is a valid optimization suggestion that would significantly improve the performance of the search function by reducing the complexity from O(n*m) to O(n+m).	8
Performance	Consolidate the `filter.Valid` check to avoid redundant evaluations In the `getPKFilter` function, the `filter.Valid` check is performed multiple times. It would be more efficient to perform this check once and store the result in a variable to avoid redundant evaluations. pkg/vm/engine/disttae/reader.go [144-157] -if !ok \|\| !filter.Valid { +isValidFilter := ok && filter.Valid +if !isValidFilter { mixin.filterState.evaluated = true mixin.filterState.filter = blockio.ReadFilter{} mixin.filterState.hasNull = hasNull return blockio.ReadFilter{} } ... -if filter.Valid { +if isValidFilter { mixin.filterState.filter = filter } Suggestion importance[1-10]: 7 Why: The suggestion correctly identifies a potential performance improvement by reducing redundant checks of `filter.Valid`. This change would make the code more efficient and cleaner.	7
Enhancement	Add a verification step to check the row count after inserting test data To ensure the test data is correctly inserted, consider adding a verification step after the `INSERT` statement to check the row count in the table. test/distributed/cases/disttae_filters/reader_filters/block_reader_filter.sql [7] insert into t1 select , from generate_series(1, 8192000)g; +select count() from t1; Suggestion importance[1-10]: 6* Why: Adding a verification step after data insertion is a good practice for ensuring data integrity, especially in test environments. This suggestion is practical and enhances the reliability of the test setup.	6
Enhancement	Add a verification step to check the status of the `mo_ctl` flush operation To ensure the `flush` operation is successful, consider adding a verification step to check the status of the `mo_ctl` command. test/distributed/cases/disttae_filters/reader_filters/block_reader_filter.sql [9] select mo_ctl("dn", "flush", "testdb.t1"); +-- Verify the flush operation +select * from mo_ctl_status where table_name = 't1'; Suggestion importance[1-10]: 6 Why: This suggestion enhances the robustness of the script by verifying the success of a critical operation. It's a valuable addition for ensuring the expected behavior of database operations.	6
Readability	Simplify the `searchFunc` assignment logic in the `BlockRead` function using a single if-else block In the `BlockRead` function, the `searchFunc` assignment logic can be simplified by using a single if-else block to improve readability. pkg/vm/engine/tae/blockio/read.go [140-152] var searchFunc ReadFilterSearchFuncType if (filter.HasFakePK \|\| !info.Sorted) && filter.UnSortedSearchFunc != nil { searchFunc = filter.UnSortedSearchFunc } else if info.Sorted && filter.SortedSearchFunc != nil { searchFunc = filter.SortedSearchFunc } -if searchFunc != nil { - if sels, err = ReadByFilter( - ctx, info, inputDeletes, filterSeqnums, filterColTypes, - types.TimestampToTS(ts), searchFunc, fs, mp, - ); err != nil { - return nil, err - } +if searchFunc == nil { + return nil, errors.New("no valid search function found") } +if sels, err = ReadByFilter( + ctx, info, inputDeletes, filterSeqnums, filterColTypes, + types.TimestampToTS(ts), searchFunc, fs, mp, +); err != nil { + return nil, err +} + Suggestion importance[1-10]: 5 Why: While the suggestion aims to improve readability, the proposed change does not significantly simplify the existing logic and introduces a new error check that was not present before. This could potentially alter the behavior of the function.	5

gouhongshen added 6 commits May 29, 2024 09:39

linear search done

229be8e

remove comment

2be390f

fix ut

c207422

fix type err and sca

9c611ed

adding bvt case for fake_pk_col unsorted search

3d9e63e

fix bvt

36925b1

gouhongshen requested review from XuPeng-SH, LeftHandCold, triump2020, aunjgr, aressu1985 and heni02 as code owners May 29, 2024 01:44

codiumai-pr-agent-pro bot added Tests Enhancement labels May 29, 2024

mergify bot requested a review from sukki37 May 29, 2024 01:45

mergify bot added the kind/enhancement label May 29, 2024

codiumai-pr-agent-pro bot added the Review effort [1-5]: 4 label May 29, 2024

mergify bot mentioned this pull request May 29, 2024

sync to 1.2: supporting reader linear search on unsorted fake pk col #16460

Closed

7 tasks

matrix-meow added the size/M Denotes a PR that changes [100,499] lines label May 29, 2024

add more bvt cases

885ac8d

aressu1985 approved these changes May 29, 2024

View reviewed changes

aunjgr approved these changes May 29, 2024

View reviewed changes

XuPeng-SH approved these changes May 29, 2024

View reviewed changes

Merge branch '1.2-dev' into support_linear_search_on_fake_pk_col_to_1.2

85eeb27

sukki37 approved these changes May 29, 2024

View reviewed changes

gouhongshen and others added 3 commits May 30, 2024 09:41

Merge branch '1.2-dev' into support_linear_search_on_fake_pk_col_to_1.2

69f4b03

Merge branch '1.2-dev' into support_linear_search_on_fake_pk_col_to_1.2

47fc98b

Merge branch '1.2-dev' into support_linear_search_on_fake_pk_col_to_1.2

b134f84

gouhongshen and others added 2 commits May 30, 2024 20:43

Merge branch '1.2-dev' into support_linear_search_on_fake_pk_col_to_1.2

43fa1d8

Merge branch '1.2-dev' into support_linear_search_on_fake_pk_col_to_1.2

9f6d6b7

mergify bot merged commit c317b77 into matrixorigin:1.2-dev May 30, 2024
17 of 18 checks passed

aylei mentioned this pull request Jun 5, 2024

1.2 dev #16688

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sync to 1.2: supporting reader linear search on unsorted fake pk col #16461

sync to 1.2: supporting reader linear search on unsorted fake pk col #16461

gouhongshen commented May 29, 2024 •

edited by codiumai-pr-agent-pro bot

Loading

codiumai-pr-agent-pro bot commented May 29, 2024

codiumai-pr-agent-pro bot commented May 29, 2024

sync to 1.2: supporting reader linear search on unsorted fake pk col #16461

sync to 1.2: supporting reader linear search on unsorted fake pk col #16461

Conversation

gouhongshen commented May 29, 2024 • edited by codiumai-pr-agent-pro bot Loading

User description

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

PR Type

Description

Changes walkthrough 📝

codiumai-pr-agent-pro bot commented May 29, 2024

PR Review 🔍

codiumai-pr-agent-pro bot commented May 29, 2024

PR Code Suggestions ✨

gouhongshen commented May 29, 2024 •

edited by codiumai-pr-agent-pro bot

Loading