fix data race #16608

Merged on Jun 4, 2024 (2 commits)

Conversation

@zhangxu19830126 (Contributor) commented Jun 4, 2024

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #16141

What this PR does / why we need it:

fix data race


PR Type

Bug fix


Description

  • Added a new disconnectedC channel to the clientSession struct to handle disconnections more gracefully.
  • Modified the fetch function within startWriteLoop to return a boolean indicating if the session should be closed, improving control flow.
  • Updated the startWriteLoop method to check the new disconnectedC channel and handle the boolean return value from fetch, ensuring proper session closure.
  • Introduced a disconnected method to the clientSession struct to signal when a session is disconnected, preventing potential data races.
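
The shape of these changes can be illustrated with a minimal sketch. It is not the code from pkg/common/morpc/server.go: the `session` type, `newDemoSession`, the string payload, and the buffer sizes are all assumptions made for the example. Only the names `disconnectedC`, `disconnected`, and `fetch`, and the idea of `fetch` returning a "should close" boolean, come from the description above.

```go
package main

import (
	"context"
	"fmt"
)

// session is a hypothetical stand-in for clientSession; only the fields
// needed for this illustration are modelled.
type session struct {
	ctx           context.Context
	c             chan string   // queued responses
	disconnectedC chan struct{} // signalled when the peer goes away
}

// newDemoSession builds a session plus the cancel func for its context.
func newDemoSession() (*session, context.CancelFunc) {
	ctx, cancel := context.WithCancel(context.Background())
	return &session{
		ctx:           ctx,
		c:             make(chan string, 16),
		disconnectedC: make(chan struct{}, 1),
	}, cancel
}

// disconnected signals the write loop without blocking. The buffer of one
// means repeated calls are harmless: extra signals are simply dropped.
func (s *session) disconnected() {
	select {
	case s.disconnectedC <- struct{}{}:
	default:
	}
}

// fetch appends every response that is already queued to buf. The boolean
// is true when the session should be closed.
func (s *session) fetch(buf []string) ([]string, bool) {
	for {
		select {
		case <-s.ctx.Done():
			return nil, true
		case <-s.disconnectedC:
			return nil, true
		case r := <-s.c:
			buf = append(buf, r)
		default:
			return buf, false // nothing more queued right now
		}
	}
}

func main() {
	s, cancel := newDemoSession()
	defer cancel()

	s.c <- "a"
	s.c <- "b"
	batch, closed := s.fetch(nil)
	fmt.Println(len(batch), closed)

	s.disconnected()
	_, closed = s.fetch(nil)
	fmt.Println(closed)
}
```

In this sketch the disconnect notification travels over a channel rather than through a shared field, so the goroutine that detects the disconnect and the write loop never touch the same variable directly.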

Changes walkthrough 📝

Relevant files
Bug fix
server.go
Fix data race by adding disconnection handling and improving session
management.

pkg/common/morpc/server.go

  • Added a disconnectedC channel to clientSession struct.
  • Modified fetch function to return a boolean indicating if the session
    should be closed.
  • Updated startWriteLoop to handle the new disconnectedC channel and the
    boolean return value from fetch.
  • Introduced a disconnected method to clientSession to signal
    disconnection.
  • +97/-85 


    @matrix-meow matrix-meow added the size/M Denotes a PR that changes [100,499] lines label Jun 4, 2024

    PR Review 🔍

    ⏱️ Estimated effort to review [1-5]

    3, because the PR involves multiple changes across the session management logic, including modifications to control flow and synchronization mechanisms. Understanding the context and ensuring that the changes resolve the data race without introducing new issues requires a moderate level of effort.

    🧪 Relevant tests

    No

    ⚡ Possible issues

    Possible Bug: The fetch function now returns a boolean to indicate whether the session should be closed, but the handling of this return value might not cover all edge cases, especially under high load or unusual network conditions.

    Data Race: While the introduction of disconnectedC aims to handle disconnections better, the use of channels without proper synchronization might still lead to race conditions if not handled carefully in all scenarios where it's accessed.

    🔒 Security concerns

    No

    @mergify mergify bot added the kind/bug Something isn't working label Jun 4, 2024

    PR Code Suggestions ✨

    Category | Suggestion | Score
    Performance
    Combine multiple select statements into a single select with multiple cases for better readability and efficiency

    The fetch function can be optimized by reducing the number of select statements. Instead
    of having multiple select statements, combine them into a single select with multiple
    cases. This will make the code more readable and potentially more efficient.

    pkg/common/morpc/server.go [315-327]

     select {
    -    case <-ctx.Done():
    -        responses = nil
    -        return true
    -    case <-cs.ctx.Done():
    -        responses = nil
    -        return true
    -    case <-cs.disconnectedC:
    +    case <-ctx.Done(), <-cs.ctx.Done(), <-cs.disconnectedC:
             responses = nil
             return true
         case resp, ok := <-cs.c:
             if ok {
                 responses = append(responses, resp)
             }
         default:
             return false
     }
     
    Suggestion importance[1-10]: 8

    Why: The suggestion correctly identifies an opportunity to simplify and optimize the code by combining multiple select cases into one, which improves readability and efficiency.
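
One caveat on the combined case: Go's `select` grammar accepts exactly one send or receive operation per `case`, so the comma-separated form in the diff would not compile as written. If a single exit path is still wanted, one option is a small helper that polls each stop signal without blocking. The sketch below is an assumption-laden illustration, not part of the PR; `stopped` is a hypothetical name, and unlike a single `select` it checks the signals in argument order rather than choosing among ready cases at random.

```go
package main

import "fmt"

// stopped is a hypothetical helper. It reports whether any of the given
// signal channels is ready (has a value or is closed) without blocking.
// Note that it consumes one value from a ready, non-closed channel.
func stopped(signals ...<-chan struct{}) bool {
	for _, ch := range signals {
		select {
		case <-ch:
			return true
		default:
		}
	}
	return false
}

func main() {
	quit := make(chan struct{})
	disconnected := make(chan struct{}, 1)

	fmt.Println(stopped(quit, disconnected)) // no signal pending yet

	disconnected <- struct{}{}
	fmt.Println(stopped(quit, disconnected)) // the buffered signal is seen
}
```

A caller could test `stopped(ctx.Done(), cs.ctx.Done(), cs.disconnectedC)` once before receiving from the response channel, at the cost of the ordering difference noted above.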

    Add a small sleep duration in the loop to prevent high CPU usage

    In the startWriteLoop function, the fetch function is called in a loop. Consider adding a
    small sleep duration in the loop to prevent it from running too frequently and consuming
    too much CPU.

    pkg/common/morpc/server.go [334-429]

     for {
         closed := fetch()
         ...
         if closed {
             return
         }
    +    time.Sleep(10 * time.Millisecond) // Adjust the duration as needed
     }
     
    Suggestion importance[1-10]: 5

    Why: Adding a sleep to manage CPU usage could be beneficial in some scenarios, but it might not be the best approach for all use cases and could introduce unnecessary delays.
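
For comparison with the sleep approach, a loop can also wait inside a blocking `select` (one without a `default` case), so that it wakes only when work or a stop signal arrives. The sketch below shows that idiom in isolation; it does not reflect how `startWriteLoop` is actually structured, and `drain`, `work`, and `stop` are hypothetical names.

```go
package main

import "fmt"

// drain blocks until a value or a stop signal arrives, so the loop neither
// spins nor sleeps. It returns everything received before it stopped.
func drain(work <-chan int, stop <-chan struct{}) []int {
	var got []int
	for {
		select {
		case v, ok := <-work:
			if !ok {
				return got // work channel closed: nothing more will arrive
			}
			got = append(got, v)
		case <-stop:
			return got
		}
	}
}

func main() {
	work := make(chan int) // unbuffered: each send completes only once received
	stop := make(chan struct{})
	out := make(chan []int)

	go func() { out <- drain(work, stop) }()

	work <- 1
	work <- 2
	close(stop)
	fmt.Println(<-out)
}
```

Because the goroutine is parked by the runtime while no case is ready, this avoids both busy-waiting and the fixed latency that a sleep would add to every iteration.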

    Possible bug
    Add a check to ensure the channel is not closed before sending to it in the disconnected method

    The disconnected method in clientSession can be improved by adding a check to ensure that
    the channel is not closed before sending to it, to avoid potential panics.

    pkg/common/morpc/server.go [569-573]

     func (cs *clientSession) disconnected() {
    +    select {
    +    case <-cs.ctx.Done():
    +        return
    +    default:
    +    }
         select {
         case cs.disconnectedC <- struct{}{}:
         default:
         }
     }
     
    Suggestion importance[1-10]: 7

    Why: This suggestion addresses a potential bug where sending on a closed channel could cause a panic. Adding a check before sending enhances robustness.
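
Two points are worth keeping in mind when reading this suggestion. The `select` with a `default` case makes the send non-blocking, but it does not by itself protect against a closed channel: a send on a closed channel panics even inside `select`. Whether `disconnectedC` is ever closed is not shown in this conversation. Where a one-shot signal does need to tolerate repeated calls, a common alternative is to close the channel exactly once with `sync.Once`. The sketch below illustrates that alternative; the `signal` type and its methods are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// signal is a hypothetical one-shot notification. Closing the channel,
// rather than sending on it, wakes every receiver, and sync.Once makes
// repeated fire calls safe.
type signal struct {
	once sync.Once
	ch   chan struct{}
}

func newSignal() *signal {
	return &signal{ch: make(chan struct{})}
}

// fire closes the channel the first time it is called and is a no-op on
// every later call, so it can never trigger a "close of closed channel" panic.
func (s *signal) fire() {
	s.once.Do(func() { close(s.ch) })
}

// done returns a receive-only view of the channel for use in select.
func (s *signal) done() <-chan struct{} {
	return s.ch
}

func main() {
	s := newSignal()
	s.fire()
	s.fire() // safe: the second call does nothing

	select {
	case <-s.done():
		fmt.Println("disconnected")
	default:
		fmt.Println("still connected")
	}
}
```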

    Maintainability
    Create a helper function to handle setting responses to nil to reduce redundancy

    In the fetch function, the responses slice is set to nil in multiple places. Instead of
    repeating this line, consider creating a helper function to handle this logic, which will
    make the code cleaner and reduce redundancy.

    pkg/common/morpc/server.go [283-331]

    -responses = nil
    -return true
    +handleResponsesNil := func() bool {
    +    responses = nil
    +    return true
    +}
    +...
    +select {
    +    case <-ctx.Done():
    +        return handleResponsesNil()
    +    case <-cs.ctx.Done():
    +        return handleResponsesNil()
    +    case <-cs.disconnectedC:
    +        return handleResponsesNil()
    +    case resp, ok := <-cs.c:
    +        if ok {
    +            responses = append(responses, resp)
    +        }
    +    default:
    +        return false
    +}
     
    Suggestion importance[1-10]: 6

    Why: The suggestion to create a helper function is valid for reducing redundancy and improving maintainability, though it's not critical for functionality.


    @mergify mergify bot merged commit b076d9c into matrixorigin:main Jun 4, 2024
    17 of 18 checks passed
    XuPeng-SH pushed a commit to XuPeng-SH/matrixone that referenced this pull request Jun 4, 2024
    * GC needs to consume all the mo_snapshot tables (matrixorigin#16539)
    
    Each tenant of the current mo has a mo_snapshot table to store snapshot information. GC needs to consume all mo_snapshot tables.
    
    Approved by: @XuPeng-SH
    
    * append log for upgrade and sqlExecutoer (matrixorigin#16575)
    
    append log for upgrader and sqlExecutor
    
    Approved by: @daviszhen, @badboynt1, @zhangxu19830126, @m-schen
    
    * [enhancement] proxy: filter CNs that are not in working state. (matrixorigin#16558)
    
    1. filter CNs that are not in working state.
    2. add some logs for migration
    
    Approved by: @zhangxu19830126
    
    * fix lock service ut (matrixorigin#16517)
    
    fix lock service ut
    
    Approved by: @zhangxu19830126
    
    * Add cost of GC Check (matrixorigin#16470)
    
    To avoid List() operations on oss, tke or s3, you need to add the Cost interface.
    
    Approved by: @reusee, @XuPeng-SH
    
    * optimize explain info for tp/ap query (matrixorigin#16578)
    
    optimize explain info for tp/ap query
    
    Approved by: @daviszhen, @ouyuanning, @aunjgr
    
    * Bvt disable trace (matrixorigin#16581)
    
    aim to exclude the `system,system_metrics` part case.
    changes:
    1. move the `system,system_metrics` part of `cases/table/system_table_cases` into an individual case file.
    
    Approved by: @heni02
    
    * remove log print from automaxprocs (matrixorigin#16546)
    
    remove log print from automaxprocs
    
    Approved by: @triump2020, @m-schen, @ouyuanning, @aunjgr, @zhangxu19830126
    
    * rmTag15901 (matrixorigin#16585)
    
    rm 15901
    
    Approved by: @heni02
    
    * remove some MustStrCol&MustBytesCol (matrixorigin#16361)
    
    Remove some unnecessary MustStrCol, MustBytesCol calls.
    
    Approved by: @daviszhen, @reusee, @m-schen, @aunjgr, @XuPeng-SH
    
    * add bvt tag (matrixorigin#16589)
    
    add bvt tag
    
    Approved by: @heni02, @aressu1985
    
    * fix a bug that cause load performance regression issue (matrixorigin#16600)
    
    fix a bug that cause load performance regression issue
    
    Approved by: @m-schen
    
    * add case for restore pub_sub (matrixorigin#16602)
    
    add case for restore pub_sub
    
    Approved by: @heni02
    
    * add shard service kernel (matrixorigin#16565)
    
    Add shardservice kernel.
    
    Approved by: @reusee, @m-schen, @daviszhen, @XuPeng-SH, @volgariver6, @badboynt1, @ouyuanning, @triump2020, @w-zr, @sukki37, @aunjgr, @fengttt
    
    * [BugFix]: Use L2DistanceSq instead of L2Distance during IndexScan (matrixorigin#16366)
    
    During `KNN Select` and `Mapping Entries to Centroids via CROSS_JOIN_L2`, we can make use of L2DistanceSq instead of L2Distance, as it avoids `Sqrt()`. We can see the improvement in QPS for SIFT128 from 90 to 100. However, for GIST960, the QPS did not change much.
    
    L2DistanceSq is suitable only when there is a comparison (ie ORDER BY), and when the absolute value (ie actual L2Distance) is not required.
    - In the case of `CROSS JOIN L2` we find the nearest centroid for the Entry using `L2DistanceSq`. `CROSS JOIN L2` is used in both INSERT and CREATE INDEX.
    - In the case of `KNN SELECT`, our query has ORDER BY L2_DISTANCE(...), which can make use of `L2DistanceSq` as the L2Distance value is not explicitly required.
    
    **NOTE:** L2DistanceSq is not suitable in K-means++ for centroid computation, as it would change which centroids are picked.
    
    Approved by: @heni02, @m-schen, @aunjgr, @badboynt1
    
    * add sharding metrics (matrixorigin#16606)
    
    add sharding metrics
    
    Approved by: @aptend
    
    * fix data race (matrixorigin#16608)
    
    fix data race
    
    Approved by: @reusee
    
    * Refactor reshape (matrixorigin#15879)
    
    Reshape objects block by block.
    
    Approved by: @XuPeng-SH
    
    * refactor system variables to support account isolation (matrixorigin#16551)
    
    - system variable now is account isolated
    - table `mo_mysql_compatibility_mode` only saves delta info between account's and cluster's default system variable values
    - always use session variable except `show global variables`
    
    Approved by: @daviszhen, @aunjgr, @aressu1985
    
    * fix merge
    
    * [cherry-pick-16594] : fix moc3399 (matrixorigin#16611)
    
    When truncate table, if the table does not have any auto-incr col, there is no need to call the Reset interface of increment_service
    
    Approved by: @ouyuanning
    
    * bump go to 1.22.3, fix make compose and optimize ut script (matrixorigin#16604)
    
    1. bump go version from 1.21.5 to 1.22.3
    2. fix `make compose` to make it work
    3. `make ut` will read `UT_WORKDIR` env variable to store report, it will be `$HOME` if `UT_WORKDIR` is empty
    
    Approved by: @zhangxu19830126, @sukki37
    
    * remove isMerge from build operator (matrixorigin#16622)
    
    remove isMerge from build operator
    
    Approved by: @m-schen
    
    ---------
    
    Co-authored-by: GreatRiver <2552853833@qq.com>
    Co-authored-by: qingxinhome <70939751+qingxinhome@users.noreply.github.com>
    Co-authored-by: LiuBo <g.user.lb@gmail.com>
    Co-authored-by: iamlinjunhong <49111204+iamlinjunhong@users.noreply.github.com>
    Co-authored-by: nitao <badboynt@126.com>
    Co-authored-by: Jackson <xzxiong@yeah.net>
    Co-authored-by: Ariznawlll <ariznawl@163.com>
    Co-authored-by: Wei Ziran <weiziran125@gmail.com>
    Co-authored-by: YANGGMM <www.yangzhao123@gmail.com>
    Co-authored-by: fagongzi <zhangxu19830126@gmail.com>
    Co-authored-by: Arjun Sunil Kumar <arjunsk@users.noreply.github.com>
    Co-authored-by: Kai Cao <ck89119@users.noreply.github.com>
    Co-authored-by: Jensen <jensenojs@qq.com>
    Co-authored-by: brown <endeavorjia@gmail.com>
    Labels
    Bug fix; kind/bug (Something isn't working); Review effort [1-5]: 3; size/M (Denotes a PR that changes [100,499] lines)
    3 participants