Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the estimation of tcp packet num send back to client #16364

Merged
merged 9 commits into from
May 27, 2024

Conversation

xzxiong
Copy link
Contributor

@xzxiong xzxiong commented May 23, 2024

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue # https://github.com/matrixorigin/MO-Cloud/issues/2774#issuecomment-2128654059

What this PR does / why we need it:

changes:

  1. count mo do logic Flush op, see CountPacket()
  2. calculate basic number of tcp packets written to client, base on the write buffer size 4096 byte.
  3. add bvt case, verify the basic result.

1. base part: flush op cnt
2. upload part: calculation = payload / 16KiB
3. response part: calculation = sendByte / 4KiB
@xzxiong xzxiong changed the title Optimize tcp packet numack count estimation Optimize the estimation of tcp packet num send back to client May 23, 2024
@matrix-meow matrix-meow added the size/XXL Denotes a PR that changes 2000+ lines label May 23, 2024
@mergify mergify bot added kind/bug Something isn't working kind/enhancement labels May 23, 2024
@matrix-meow
Copy link
Contributor

@xzxiong Thanks for your contributions!

Here are review comments for file pkg/frontend/mysql_protocol.go:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the purpose of the changes is to optimize the estimation of TCP packet numbers sent back to the client.

Body:

The body of the pull request provides information about the type of PR (Improvement), the specific issue it fixes (#2774), and a brief description of the changes made.

Changes:

  1. The changes in the mysql_protocol.go file focus on optimizing the calculation of outgoing traffic bytes and packet numbers sent to the client. The changes involve refactoring the CalculateOutTrafficBytes function to improve the accuracy of the calculations.

  2. The addition of a constant bit4TcpWriteCopy for better readability and understanding of the code.

  3. Modifications in the CalculateOutTrafficBytes function to include different parts of the packet count calculation for various scenarios, such as sending data as ResultSet or CSV.

  4. In the flushOutBuffer and writePackets functions, the addition of mp.ses.CountPacket(1) to count the packets being sent, which seems to be related to tracking the number of packets sent.

Suggestions for Improvement:

  1. Unused Code Removal: The removal of mp.ses.CountPacket calls in the AddSequenceId and SetSequenceID functions might be unintentional. If these calls are no longer needed, it's good to remove them completely to avoid confusion.

  2. Code Clarity: The calculation of packets in the CalculateOutTrafficBytes function seems complex and could benefit from more comments or explanatory variable names to improve readability and maintainability.

  3. Magic Numbers: The use of constants like 4096, 16 KiB, and 20 in calculations could be replaced with named constants for better code readability and maintainability.

  4. Consistency: Ensure consistency in naming conventions and formatting throughout the codebase to enhance code clarity and maintainability.

  5. Security Concern: It's important to ensure that the CountPacket function is used securely and does not introduce any vulnerabilities related to packet counting or tracking.

  6. Testing: Consider adding or updating unit tests to cover the changes made in the CalculateOutTrafficBytes, flushOutBuffer, and writePackets functions to ensure their correctness and prevent regressions.

  7. Documentation: Provide inline comments or documentation to explain the purpose of the functions and the rationale behind the calculations for future reference.

By addressing these suggestions, the codebase can be improved in terms of readability, maintainability, and security.

Here are review comments for file pkg/frontend/session.go:

Pull Request Review:

Title:

The title of the pull request is clear and indicates that the goal is to optimize the estimation of TCP packet numbers sent back to the client.

Body:

The body of the pull request provides relevant information about the type of PR (Improvement), the specific issue it addresses (#2774), and a brief summary of the changes made.

Changes:

  1. The changes made in the session.go file involve updating comments related to packet counting and payload counting in the Session struct.
    • The comment // packetCounter count the packet communicated with client. was changed to // packetCounter count the TCP packet sent to the client. This change clarifies the purpose of the packetCounter field.
    • The comment // payloadCounter count the payload sent by 'load data' was changed to // payloadCounter count the payload sent by 'load data LOCAL infile'. This change specifies the source of the payload being counted.

Feedback and Suggestions for Improvement:

  1. Clarify Variable Names:

    • While the comments have been updated for clarity, it might be beneficial to also consider renaming the variables themselves (packetCounter and payloadCounter) to better reflect their updated descriptions. For example, tcpPacketCounter and localPayloadCounter could be more descriptive.
  2. Consistency in Terminology:

    • Ensure consistency in terminology throughout the codebase. If referring to TCP packets, consistently use the term "TCP" to avoid confusion.
  3. Security Consideration:

    • When dealing with packet counting and payload tracking, it's essential to ensure that there are no vulnerabilities related to buffer overflows or unexpected data handling. Make sure that the counters are properly initialized, incremented, and checked to prevent any potential security risks.
  4. Optimization Opportunity:

    • While the changes made are related to comments, consider if there are any performance optimizations that can be applied to the packet and payload counting mechanisms. For example, optimizing the way these counters are updated or accessed could improve efficiency.
  5. Documentation Update:

    • Update any relevant documentation or inline comments to reflect the changes accurately. This will help developers understand the purpose and usage of these counters effectively.
  6. Testing:

    • Ensure that the changes do not introduce any regressions by running appropriate tests, especially related to packet handling and payload transmission.

By addressing the suggestions mentioned above, the codebase can be improved in terms of clarity, consistency, security, and potentially performance.

Here are review comments for file test/distributed/cases/log/query_tcp.result:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the goal is to optimize the estimation of TCP packet numbers sent back to the client.

Body:

The body of the pull request is brief and lacks detailed information. It mentions that the PR is an improvement and is related to fixing issue #2774. However, there is no explanation of the specific changes made or the rationale behind them.

Changes:

The changes made in the pull request are not provided in the diff section, which makes it difficult to review the actual code modifications. Without the changes displayed, it is challenging to assess the impact of the optimization on the codebase.

Feedback and Suggestions:

  1. Lack of Detailed Explanation:

    • The pull request description lacks details about the changes made. It is essential to provide a clear explanation of what optimizations were implemented and why they were necessary. This information helps reviewers understand the purpose of the changes and their potential impact.
  2. Missing Code Changes:

    • The diff section does not display any actual code changes, making it impossible to review the modifications. It is crucial to include the specific code alterations in the pull request to allow for a thorough review.
  3. Security Considerations:

    • When optimizing TCP packet handling, it is crucial to ensure that the changes do not introduce any security vulnerabilities. Review the code changes carefully to identify any potential security risks, such as buffer overflows, injection vulnerabilities, or insecure data handling.
  4. Testing:

    • It is important to include tests related to the optimized TCP packet estimation to verify that the changes work as intended and do not introduce regressions. Adding test cases can help ensure the reliability and stability of the codebase.
  5. Documentation:

    • Consider updating the relevant documentation to reflect the changes made in the optimization process. Clear and up-to-date documentation can assist developers in understanding the updated functionality and usage of the code.
  6. Code Quality:

    • Verify that the optimized code adheres to the project's coding standards and best practices. Consistent formatting, meaningful variable names, and proper error handling can enhance the readability and maintainability of the code.

Overall, to improve the quality of the pull request:

  • Provide a detailed explanation of the optimization changes.
  • Include the actual code modifications in the diff section.
  • Address security considerations and perform thorough testing.
  • Update documentation to reflect the changes.
  • Ensure code quality and adherence to coding standards.

By addressing these points, the pull request can be enhanced to effectively optimize the estimation of TCP packet numbers while maintaining code quality and security.

Here are review comments for file test/distributed/cases/log/query_tcp.sql:

Pull Request Review:

Title:

The title of the pull request is clear and indicates that the goal is to optimize the estimation of TCP packet numbers sent back to the client.

Body:

The body of the pull request provides information about the type of PR (BUG, Improvement), the specific issue it fixes, and the changes made in the code related to the issue. It references issue #2774 on GitHub, which is helpful for tracking the context of the changes.

Changes:

The changes in the pull request are related to a SQL file query_tcp.sql in the test/distributed/cases/log directory. The changes involve creating accounts, databases, and tables, as well as inserting data for testing purposes. The SQL script seems to be setting up test cases for querying and inserting rows.

Feedback and Suggestions for Improvement:

  1. Security Concerns:

    • The SQL script contains sensitive information such as account names and passwords (ADMIN_NAME 'admin' IDENTIFIED BY '123456'). Storing passwords in plain text in scripts is a security risk. It is recommended to avoid hardcoding passwords and consider using secure methods like environment variables or secure vaults for storing sensitive information.
  2. Optimization:

    • The SQL script seems to be setting up test data for specific test cases. While this is necessary for testing, it is important to ensure that the test data setup is efficient and does not unnecessarily bloat the script. Consider optimizing the data setup process to make it more concise and focused on the specific test scenarios.
  3. Code Readability:

    • The comments in the SQL script are helpful for understanding the purpose of each section. However, it would be beneficial to provide more descriptive comments explaining the rationale behind each test case or data insertion. This can improve the readability and maintainability of the script.
  4. Testing Strategy:

    • It is essential to ensure that the test cases cover a wide range of scenarios and edge cases to validate the functionality thoroughly. Consider adding more diverse test cases to cover different aspects of the feature being tested.
  5. Documentation:

    • While the PR body mentions the purpose of the changes, it would be beneficial to include more detailed information about the optimization performed and the expected impact on the system. Providing clear documentation can help other team members understand the changes better.
  6. Version Control Best Practices:

    • Ensure that the changes made in the SQL script align with the overall version control best practices. It is essential to maintain consistency in coding standards and adhere to the project's guidelines.

In conclusion, addressing the security concerns, optimizing the test data setup, improving code readability, enhancing testing coverage, documenting the changes effectively, and following version control best practices will contribute to the overall quality and maintainability of the codebase.

Here are review comments for file test/distributed/cases/zz_statement_query_type/query_tcp.result:

Pull Request Review:

Title:

The title of the pull request is clear and indicates that the aim is to optimize the estimation of TCP packet numbers sent back to the client.

Body:

The body of the pull request provides information about the type of PR, the specific issue it fixes, and the changes made in the code related to the issue. It references the issue #2774 on GitHub for context.

Changes:

  1. The changes in the query_tcp.result file involve setting different values for @tcp_cnt and then executing select statements to check certain conditions based on the values of @tcp_cnt. The queries retrieve data from system.statement_info based on specific criteria.

Feedback and Suggestions for Improvement:

  1. Variable Naming: The variable @tcp_cnt is used in the queries. It's important to ensure that variable names are descriptive and meaningful. Consider using a more descriptive name that reflects the purpose of the variable.

  2. Code Clarity: The queries in the query_tcp.result file could benefit from comments to explain the purpose of each query and the significance of the conditions being checked. This will improve code readability and maintainability.

  3. Security Concerns: Ensure that the queries are not vulnerable to SQL injection attacks. Consider using parameterized queries or input validation to prevent malicious input from causing harm.

  4. Optimization: Depending on the context and performance requirements, consider optimizing the queries for efficiency. This could involve indexing columns used in the WHERE clause or reviewing the query execution plan for potential improvements.

  5. Testing: Ensure that the changes are thoroughly tested to validate the functionality and performance impact of the optimizations. Consider adding test cases to cover different scenarios and edge cases.

  6. Documentation: It's important to update any relevant documentation to reflect the changes made in the codebase. This will help other developers understand the purpose and impact of the optimizations.

By addressing the above points, the pull request can not only optimize the estimation of TCP packet numbers but also enhance the overall quality and security of the codebase.

Here are review comments for file test/distributed/cases/zz_statement_query_type/query_tcp.sql:

Pull Request Review:

Title:

The title of the pull request is clear and indicates that the goal is to optimize the estimation of TCP packet numbers sent back to the client.

Body:

The body of the pull request provides information about the type of PR (Improvement), the issue it fixes, and the changes made in the code related to the optimization of TCP packet estimation.

Changes:

  1. The changes made in the query_tcp.sql file in the test/distributed/cases/zz_statement_query_type directory involve adding test cases and SQL queries to estimate TCP packet numbers sent back to the client under different scenarios.
  2. The SQL queries added include cases for checking column statements, calculating packet counts, and verifying specific statements like 'use test'.

Feedback and Suggestions for Improvement:

  1. Code Clarity and Readability:

    • The added SQL queries lack detailed comments explaining the purpose of each query and the expected outcome. Adding clear comments will improve code readability and help in understanding the intent behind each test case.
    • Consider breaking down the SQL queries into smaller, more manageable parts for better readability and maintainability.
  2. Consistency in Naming Conventions:

    • Ensure consistent naming conventions for variables and comments throughout the codebase. This helps in maintaining a clean and uniform code style.
  3. Security Concerns:

    • Be cautious when including sensitive information like account names in SQL queries. It's advisable to avoid hardcoding such information directly in the code. Consider using environment variables or configuration files for sensitive data.
  4. Optimization:

    • While the optimization of TCP packet estimation is the primary goal of the PR, consider optimizing the SQL queries for better performance. Ensure that the queries are efficient and do not introduce unnecessary overhead.
  5. Testing:

    • It's essential to thoroughly test the added SQL queries to ensure they provide accurate estimations of TCP packet numbers. Consider edge cases and different scenarios to validate the functionality.
  6. Documentation:

    • Include detailed documentation within the code or in a separate document explaining the purpose of the TCP packet estimation optimization and how the added SQL queries contribute to this optimization.
  7. Version Control:

    • Ensure that the changes made in the pull request align with the overall version control strategy of the project. Maintain consistency in commit messages and follow the project's branching and merging guidelines.

In conclusion, while the optimization of TCP packet estimation is a valuable improvement, addressing code clarity, security concerns, optimization, testing, and documentation will enhance the quality and maintainability of the codebase.

@mergify mergify bot merged commit b81c3e7 into matrixorigin:main May 27, 2024
17 of 19 checks passed
@xzxiong xzxiong deleted the rebase_tcp_pkg branch May 27, 2024 07:36
mergify bot pushed a commit that referenced this pull request May 27, 2024
#16412)

ref: #16364
changes:
1. count mo do logic Flush op, see CountPacket()
2. calculate basic number of tcp packets written to client, base on the write buffer size 4096 byte.
3. add bvt case, verify the basic result.
4. change config observability::tcpPacket default = true

Approved by: @qingxinhome, @heni02, @daviszhen, @sukki37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working kind/enhancement size/XXL Denotes a PR that changes 2000+ lines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants