Add cost of GC Check #16470

Merged
merged 9 commits into from
Jun 3, 2024

Conversation

@LeftHandCold (Contributor) commented May 29, 2024

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #16101

What this PR does / why we need it:

To avoid costly List() operations on object storage (oss, tke, or s3), this PR adds a Cost method to the FileService interface.


PR Type

Enhancement


Description

  • Added Cost method to FileService interface and implemented it across various file service types.
  • Introduced CostItem and CostAttr types to represent cost attributes.
  • Defined constants CostLow and CostHugh to categorize costs.
  • Added a cost check in the Check method of the checker to avoid high-cost operations.
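
The shape of these additions can be sketched roughly as follows. This is a minimal illustration, not the merged code: the integer representation of CostItem, the cut-down Coster interface, the two backend types, and the helper listIsCheap are all assumptions made for the example. Only the names Cost, CostItem, CostAttr, CostLow, and the List field come from the PR text.

```go
package main

// CostItem ranks how expensive an operation is on a given backend.
// The integer representation is an assumption of this sketch.
type CostItem int

const (
	CostLow CostItem = iota
	// CostHigh follows the spelling proposed in review; the PR
	// description names this constant CostHugh.
	CostHigh
)

// CostAttr carries one cost rating per operation. Only List is
// mentioned in the PR.
type CostAttr struct {
	List CostItem
}

// Coster is a hypothetical, cut-down stand-in for the Cost part of
// the FileService interface.
type Coster interface {
	// Cost returns the cost attr of the file service.
	Cost() *CostAttr
}

// localBackend and objectBackend are illustrative types, not the real
// LocalFS or S3FS. Which rating each real service reports is assumed.
type localBackend struct{}

func (localBackend) Cost() *CostAttr { return &CostAttr{List: CostLow} }

type objectBackend struct{}

func (objectBackend) Cost() *CostAttr { return &CostAttr{List: CostHigh} }

// listIsCheap reports whether List is rated CostLow for this backend.
func listIsCheap(fs Coster) bool {
	return fs.Cost().List == CostLow
}
```

With these definitions, listIsCheap(localBackend{}) is true and listIsCheap(objectBackend{}) is false, so a caller can decide whether to list at all before touching the backend.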

Changes walkthrough 📝

Relevant files

Enhancement

file_service.go: Add cost attributes and method to FileService interface
pkg/fileservice/file_service.go (+15/-0)

  • Added Cost method to FileService interface.
  • Introduced CostItem and CostAttr types.
  • Defined constants CostLow and CostHugh.

file_services.go: Implement Cost method in FileServices
pkg/fileservice/file_services.go (+6/-0)

  • Implemented Cost method for FileServices.

local_etl_fs.go: Implement Cost method in LocalETLFS
pkg/fileservice/local_etl_fs.go (+6/-0)

  • Implemented Cost method for LocalETLFS.

local_fs.go: Implement Cost method in LocalFS
pkg/fileservice/local_fs.go (+6/-0)

  • Implemented Cost method for LocalFS.

memory_fs.go: Implement Cost method in MemoryFS
pkg/fileservice/memory_fs.go (+6/-0)

  • Implemented Cost method for MemoryFS.

s3_fs.go: Implement Cost method in S3FS
pkg/fileservice/s3_fs.go (+6/-0)

  • Implemented Cost method for S3FS.

sub_path.go: Implement Cost method in subPathFS
pkg/fileservice/sub_path.go (+6/-0)

  • Implemented Cost method for subPathFS.

check.go: Add cost check in checker
pkg/vm/engine/tae/db/gc/check.go (+4/-0)

  • Added cost check in Check method of checker.
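
The effect of the cost check in the checker can be illustrated with a small sketch. It assumes a hypothetical countingFS type that records List calls and a simplified check function; the real Check method in pkg/vm/engine/tae/db/gc/check.go does more than this. The guard condition itself (return early unless the List cost is CostLow) matches the line quoted in the review further down this page.

```go
package main

type CostItem int

const (
	CostLow CostItem = iota
	CostHigh // spelled CostHugh in the PR description
)

type CostAttr struct{ List CostItem }

// countingFS is a hypothetical file service that records how often
// List is called, so the effect of the guard can be observed.
type countingFS struct {
	listCost  CostItem
	listCalls int
}

func (f *countingFS) Cost() *CostAttr { return &CostAttr{List: f.listCost} }

func (f *countingFS) List() []string {
	f.listCalls++
	return []string{"object-a", "object-b"}
}

// check returns early, without listing, unless List is rated CostLow.
// The listing loop is a placeholder for the real consistency check.
func check(fs *countingFS) (checked int, err error) {
	if fs.Cost().List != CostLow {
		return 0, nil
	}
	for range fs.List() {
		checked++
	}
	return checked, nil
}
```

Running check against a countingFS rated CostHigh leaves listCalls at zero, which is the point of the change: the check is skipped silently rather than paid for on object storage.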


    PR Review 🔍

    ⏱️ Estimated effort to review [1-5]

    3, because the PR involves multiple changes across various files, introducing new types and methods which require understanding the existing architecture and ensuring the new logic integrates seamlessly without causing regressions.

    🧪 Relevant tests

    No

    ⚡ Possible issues

    Consistency in Naming: The constant CostHugh might be a typo and should be CostHigh for clarity and consistency.

    Hardcoded Cost Values: The cost values are hardcoded in the Cost() method implementations across different file services. This might not be flexible if the cost characteristics change or need to be configurable in the future.

    🔒 Security concerns

    No

    @matrix-meow added the size/S label (denotes a PR that changes [10,99] lines) on May 29, 2024

    PR Code Suggestions ✨

    Category | Suggestion | Score
    Maintainability
    Rename CostHugh to CostHigh to correct the spelling

    Consider renaming CostHugh to CostHigh to correct the spelling and improve code
    readability.

    pkg/fileservice/file_service.go [173]

    -CostHugh
    +CostHigh
     
    Suggestion importance[1-10]: 10

    Why: Corrects a clear spelling mistake, improving code readability and preventing potential confusion or errors in future development.

    10
    Possible bug
    Add a check to ensure the Cost method does not return nil before accessing its List attribute

    Consider adding a check to ensure that the Cost method does not return nil before
    accessing its List attribute to avoid potential runtime panics.

    pkg/fileservice/s3_fs.go [54-55]

    -if c.cleaner.fs.Service.Cost().List != fileservice.CostLow {
    +costAttr := c.cleaner.fs.Service.Cost()
    +if costAttr == nil || costAttr.List != fileservice.CostLow {
         return nil
     }
     
    Suggestion importance[1-10]: 9

    Why: This is a crucial suggestion to prevent runtime panics due to nil dereferencing, which is a common source of bugs in Go.

    9
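
One way to apply the nil check suggested above is to fold it into a small predicate. The helper name listAllowed is hypothetical, and the type definitions are repeated only to keep the sketch self-contained. A nil CostAttr is treated as "not known to be cheap", which is the same outcome as the suggested condition `costAttr == nil || costAttr.List != fileservice.CostLow`.

```go
package main

type CostItem int

const (
	CostLow CostItem = iota
	CostHigh // spelled CostHugh in the PR description
)

type CostAttr struct{ List CostItem }

// listAllowed reports whether a List-based check should run.
// A nil attr yields false instead of a nil-pointer panic.
func listAllowed(attr *CostAttr) bool {
	return attr != nil && attr.List == CostLow
}
```

Whether a nil return is actually possible depends on the implementations; the diff excerpts on this page show Cost returning a freshly allocated &CostAttr, so the guard is defensive rather than strictly required.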
    Enhancement
    Retrieve the actual cost attribute from the underlying file services instead of returning a hardcoded value

    Instead of returning a new CostAttr instance with a hardcoded value, consider retrieving
    the actual cost attribute from the underlying file services to provide more accurate
    information.

    pkg/fileservice/file_services.go [171-173]

    -return &CostAttr{
    -    List: CostLow,
    +costAttr := &CostAttr{List: CostLow}
    +for _, fs := range f.services {
    +    if fsCost := fs.Cost(); fsCost.List > costAttr.List {
    +        costAttr.List = fsCost.List
    +    }
     }
    +return costAttr
     
    Suggestion importance[1-10]: 7

    Why: This suggestion improves the accuracy and flexibility of the Cost method by dynamically determining the cost based on underlying services, though the original hardcoded value might still be intentional for simplification.

    7
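
The aggregation idea in this suggestion can be sketched as a standalone function. Everything here apart from CostAttr, CostLow, and the Cost method signature is an assumption: Coster, fixedCost, nilCost, and aggregateCost are illustrative names, and the sketch takes a plain slice rather than the f.services field used in the suggestion. Unlike the suggested diff, it also skips services that return nil.

```go
package main

type CostItem int

const (
	CostLow CostItem = iota
	CostHigh // spelled CostHugh in the PR description
)

type CostAttr struct{ List CostItem }

// Coster is a hypothetical stand-in for the Cost part of FileService.
type Coster interface {
	Cost() *CostAttr
}

// fixedCost is a test double that always reports the same List cost.
type fixedCost CostItem

func (c fixedCost) Cost() *CostAttr { return &CostAttr{List: CostItem(c)} }

// nilCost is a test double that reports no cost information.
type nilCost struct{}

func (nilCost) Cost() *CostAttr { return nil }

// aggregateCost returns the highest List cost among the services.
// Services that return nil are skipped; an empty slice yields CostLow.
func aggregateCost(services []Coster) *CostAttr {
	attr := &CostAttr{List: CostLow}
	for _, fs := range services {
		if c := fs.Cost(); c != nil && c.List > attr.List {
			attr.List = c.List
		}
	}
	return attr
}
```

Taking the maximum is the conservative choice: a single expensive backend makes the whole collection expensive to list. Skipping nil results treats unknown cost as low, which may or may not be desirable; treating it as high would be the stricter alternative.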
    Add a SetCost method to the FileService interface to allow dynamic updates to the cost attributes

    Consider adding a SetCost method to the FileService interface to allow dynamic updates to
    the cost attributes, which can be useful for adjusting cost metrics at runtime.

    pkg/fileservice/file_service.go [64-65]

     // Cost returns the cost attr of the file service
     Cost() *CostAttr
    +// SetCost sets the cost attr of the file service
    +SetCost(cost *CostAttr)
     
    Suggestion importance[1-10]: 6

    Why: Adding a SetCost method could enhance the flexibility of the interface, allowing runtime adjustments to cost metrics. However, the necessity and impact of this change depend on the specific use case and architecture.

    6
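
If a SetCost method were added, an implementation would need to handle concurrent readers. The following is only a sketch of that idea: tunableFS is a hypothetical type, and the choice of sync.RWMutex, copy-on-read, and ignoring a nil argument are assumptions of this example rather than anything proposed in the PR.

```go
package main

import "sync"

type CostItem int

const (
	CostLow CostItem = iota
	CostHigh // spelled CostHugh in the PR description
)

type CostAttr struct{ List CostItem }

// tunableFS is a hypothetical file service whose cost attributes can
// be changed at runtime. Its zero value reports CostLow.
type tunableFS struct {
	mu   sync.RWMutex
	cost CostAttr
}

// Cost returns a copy, so callers cannot mutate the stored value.
func (f *tunableFS) Cost() *CostAttr {
	f.mu.RLock()
	defer f.mu.RUnlock()
	c := f.cost
	return &c
}

// SetCost replaces the stored attributes; a nil argument is ignored.
func (f *tunableFS) SetCost(cost *CostAttr) {
	if cost == nil {
		return
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	f.cost = *cost
}
```

Adding a method to a widely implemented interface forces every implementation to change, which is one reason such a setter might instead live on concrete types or be supplied through configuration at construction time.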

    Resolved review threads: pkg/fileservice/file_service.go (outdated), pkg/fileservice/file_services.go, pkg/fileservice/sub_path.go
    @mergify mergify bot merged commit 6c81a45 into matrixorigin:main Jun 3, 2024
    17 of 18 checks passed
    LeftHandCold added a commit to LeftHandCold/matrixone that referenced this pull request Jun 3, 2024
    To avoid List() operations on oss, tke or s3, you need to add the Cost interface.
    
    Approved by: @reusee, @XuPeng-SH
    XuPeng-SH pushed a commit to XuPeng-SH/matrixone that referenced this pull request Jun 4, 2024
    * GC needs to consume all the mo_snapshot tables (matrixorigin#16539)
    
    Each tenant of the current mo has a mo_snapshot table to store snapshot information. GC needs to consume all mo_snapshot tables.
    
    Approved by: @XuPeng-SH
    
    * append log for upgrade and sqlExecutoer (matrixorigin#16575)
    
    append log for upgrader and sqlExecutor
    
    Approved by: @daviszhen, @badboynt1, @zhangxu19830126, @m-schen
    
    * [enhancement] proxy: filter CNs that are not in working state. (matrixorigin#16558)
    
    1. filter CNs that are not in working state.
    2. add some logs for migration
    
    Approved by: @zhangxu19830126
    
    * fix lock service ut (matrixorigin#16517)
    
    fix lock service ut
    
    Approved by: @zhangxu19830126
    
    * Add cost of GC Check (matrixorigin#16470)
    
    To avoid List() operations on oss, tke or s3, you need to add the Cost interface.
    
    Approved by: @reusee, @XuPeng-SH
    
    * optimize explain info for tp/ap query (matrixorigin#16578)
    
    optimize explain info for tp/ap query
    
    Approved by: @daviszhen, @ouyuanning, @aunjgr
    
    * Bvt disable trace (matrixorigin#16581)
    
    aim to exclude the `system,system_metrics` part case.
    changes:
    1. move `cases/table/system_table_cases` system,system_metrics part into an individual case file.
    
    Approved by: @heni02
    
    * remove log print from automaxprocs (matrixorigin#16546)
    
    remove log print from automaxprocs
    
    Approved by: @triump2020, @m-schen, @ouyuanning, @aunjgr, @zhangxu19830126
    
    * rmTag15901 (matrixorigin#16585)
    
    rm 15901
    
    Approved by: @heni02
    
    * remove some MustStrCol&MustBytesCol (matrixorigin#16361)
    
    Remove some unnecessary MustStrCol, MustBytesCol calls.
    
    Approved by: @daviszhen, @reusee, @m-schen, @aunjgr, @XuPeng-SH
    
    * add bvt tag (matrixorigin#16589)
    
    add bvt tag
    
    Approved by: @heni02, @aressu1985
    
    * fix a bug that cause load performance regression issue (matrixorigin#16600)
    
    fix a bug that cause load performance regression issue
    
    Approved by: @m-schen
    
    * add case for restore pub_sub (matrixorigin#16602)
    
    add case for restore pub_sub
    
    Approved by: @heni02
    
    * add shard service kernel (matrixorigin#16565)
    
    Add shardservice kernel.
    
    Approved by: @reusee, @m-schen, @daviszhen, @XuPeng-SH, @volgariver6, @badboynt1, @ouyuanning, @triump2020, @w-zr, @sukki37, @aunjgr, @fengttt
    
    * [BugFix]: Use L2DistanceSq instead of L2Distance during IndexScan (matrixorigin#16366)
    
    During `KNN Select` and `Mapping Entries to Centroids via CROSS_JOIN_L2`, we can make use of L2DistanceSq instead of L2Distance, as it avoids `Sqrt()`. We can see the improvement in QPS for SIFT128 from 90 to 100. However, for GIST960, the QPS did not change much.
    
    L2DistanceSq is suitable only when there is a comparison (ie ORDER BY), and when the absolute value (ie actual L2Distance) is not required.
    - In the case of `CROSS JOIN L2` we find the nearest centroid for the Entry using `L2DistanceSq`. `CROSS JOIN L2` is used in both INSERT and CREATE INDEX.
    - In the case of `KNN SELECT`, our query has ORDER BY L2_DISTANCE(...), which can make use of `L2DistanceSq` as the L2Distance value is not explicitly required.
    
    **NOTE:** L2DistanceSq is not suitable in Kmeans++ for Centroid Computation, as it will impact the centroids picked.
    
    Approved by: @heni02, @m-schen, @aunjgr, @badboynt1
    
    * add sharding metrics (matrixorigin#16606)
    
    add sharding metrics
    
    Approved by: @aptend
    
    * fix data race (matrixorigin#16608)
    
    fix data race
    
    Approved by: @reusee
    
    * Refactor reshape (matrixorigin#15879)
    
    Reshape objects block by block.
    
    Approved by: @XuPeng-SH
    
    * refactor system variables to support account isolation (matrixorigin#16551)
    
    - system variable now is account isolated
    - table `mo_mysql_compatibility_mode` only saves delta info between account's and cluster's default system variable values
    - always use session variable except `show global variables`
    
    Approved by: @daviszhen, @aunjgr, @aressu1985
    
    * fix merge
    
    * [cherry-pick-16594] : fix moc3399 (matrixorigin#16611)
    
    When truncate table, if the table does not have any auto-incr col, there is no need to call the Reset interface of increment_service
    
    Approved by: @ouyuanning
    
    * bump go to 1.22.3, fix make compose and optimize ut script (matrixorigin#16604)
    
    1. bump go version from 1.21.5 to 1.22.3
    2. fix `make compose` to make it work
    3. `make ut` will read `UT_WORKDIR` env variable to store report, it will be `$HOME` if `UT_WORKDIR` is empty
    
    Approved by: @zhangxu19830126, @sukki37
    
    * remove isMerge from build operator (matrixorigin#16622)
    
    remove isMerge from build operator
    
    Approved by: @m-schen
    
    ---------
    
    Co-authored-by: GreatRiver <2552853833@qq.com>
    Co-authored-by: qingxinhome <70939751+qingxinhome@users.noreply.github.com>
    Co-authored-by: LiuBo <g.user.lb@gmail.com>
    Co-authored-by: iamlinjunhong <49111204+iamlinjunhong@users.noreply.github.com>
    Co-authored-by: nitao <badboynt@126.com>
    Co-authored-by: Jackson <xzxiong@yeah.net>
    Co-authored-by: Ariznawlll <ariznawl@163.com>
    Co-authored-by: Wei Ziran <weiziran125@gmail.com>
    Co-authored-by: YANGGMM <www.yangzhao123@gmail.com>
    Co-authored-by: fagongzi <zhangxu19830126@gmail.com>
    Co-authored-by: Arjun Sunil Kumar <arjunsk@users.noreply.github.com>
    Co-authored-by: Kai Cao <ck89119@users.noreply.github.com>
    Co-authored-by: Jensen <jensenojs@qq.com>
    Co-authored-by: brown <endeavorjia@gmail.com>

    4 participants