TestHistoryBasedStatsTracking.testHistoryBasedStatsCalculator is flaky #22204

Open
elharo opened this issue Mar 14, 2024 · 3 comments

Comments

@elharo
Contributor

elharo commented Mar 14, 2024

This test and its superclass share a lot of state between test methods. It could possibly be deflaked by moving some or all of the initialization from @BeforeClass to @BeforeMethod.

2024-03-14T11:13:03.2037597Z [ERROR] Tests run: 61, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 203.158 s <<< FAILURE! - in TestSuite
2024-03-14T11:13:03.2039919Z [ERROR] com.facebook.presto.execution.TestHistoryBasedStatsTracking.testHistoryBasedStatsCalculator Time elapsed: 0.103 s <<< FAILURE!
2024-03-14T11:13:03.2041698Z java.lang.AssertionError:
2024-03-14T11:13:03.2042236Z Plan does not match, expected [
2024-03-14T11:13:03.2042619Z
2024-03-14T11:13:03.2042831Z - anyTree
2024-03-14T11:13:03.2043243Z - node(FilterNode)
2024-03-14T11:13:03.2043717Z expectedOutputRowCount(2.0)
2024-03-14T11:13:03.2044279Z expectedOutputSize(199.0)
2024-03-14T11:13:03.2044826Z - node
2024-03-14T11:13:03.2045059Z
2024-03-14T11:13:03.2045211Z ] but found [
2024-03-14T11:13:03.2045439Z
2024-03-14T11:13:03.2047776Z - Output[PlanNodeId 6][nationkey, name, regionkey, comment] => [nationkey:bigint, name:varchar(25), regionkey:bigint, comment:varchar(152)]
2024-03-14T11:13:03.2053014Z - ScanFilter[PlanNodeId 0,108][table = TableHandle {connectorId='tpch', connectorHandle='nation:sf0.01', layout='Optional[nation:sf0.01]'}, filterPredicate = (substr(name, BIGINT'1', BIGINT'1')) = (VARCHAR'A')] => [nationkey:bigint, name:varchar(25), regionkey:bigint, comment:varchar(152)]
2024-03-14T11:13:03.2055339Z regionkey := tpch:regionkey (1:15)
2024-03-14T11:13:03.2055952Z name := tpch:name (1:15)
2024-03-14T11:13:03.2056545Z comment := tpch:comment (1:15)
2024-03-14T11:13:03.2057173Z nationkey := tpch:nationkey (1:15)
2024-03-14T11:13:03.2057603Z
2024-03-14T11:13:03.2057742Z ]
2024-03-14T11:13:03.2058575Z at com.facebook.presto.sql.planner.assertions.PlanAssert.assertPlan(PlanAssert.java:56)
2024-03-14T11:13:03.2060103Z at com.facebook.presto.sql.planner.assertions.PlanAssert.assertPlan(PlanAssert.java:40)
2024-03-14T11:13:03.2062121Z at
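
To make the @BeforeClass versus @BeforeMethod suggestion concrete, here is a minimal sketch in plain Java with no TestNG dependency. `StatsStore`, `firstTest`, and `secondTestSeesCleanState` are hypothetical stand-ins, not classes or methods from the Presto test suite, and the sketch does not reproduce the failure above. It only shows how state created once per class can leak from one test method into the next, while state created before each method cannot.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedStateSketch {
    // Hypothetical stand-in for any object the test class shares between methods.
    static final class StatsStore {
        private final Map<String, Long> rowCounts = new HashMap<>();

        void record(String planNode, long rows) {
            rowCounts.put(planNode, rows);
        }

        boolean contains(String planNode) {
            return rowCounts.containsKey(planNode);
        }
    }

    // Mimics a test method that writes to the store as a side effect.
    static void firstTest(StatsStore store) {
        store.record("filter", 0L);
    }

    // Mimics a test method that assumes it starts from an empty store.
    static boolean secondTestSeesCleanState(StatsStore store) {
        return !store.contains("filter");
    }

    // @BeforeClass style: one store is created and reused by every test method.
    static boolean runWithClassLevelSetup() {
        StatsStore store = new StatsStore();
        firstTest(store);
        return secondTestSeesCleanState(store);
    }

    // @BeforeMethod style: a new store is created before each test method.
    static boolean runWithMethodLevelSetup() {
        firstTest(new StatsStore());
        return secondTestSeesCleanState(new StatsStore());
    }

    public static void main(String[] args) {
        System.out.println("class-level setup, clean state: " + runWithClassLevelSetup());
        System.out.println("method-level setup, clean state: " + runWithMethodLevelSetup());
    }
}
```

With class-level setup the second method sees the entry left behind by the first. With method-level setup it starts clean, at the cost of repeating the initialization for every test.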

@elharo
Contributor Author

elharo commented Mar 14, 2024

Might have been caused by #20947

@rschlussel rschlussel self-assigned this Mar 21, 2024
@rschlussel
Contributor

I added the stats information to the "actual" plan that gets printed for failures. So far I've gotten one failure out of several thousand runs. It looks like sometimes the HBO stats for the filter node are logged as 0 rows and 0 bytes (in the test we check that it's 2 rows and 199 bytes). The output node stats for that same run are noted as 2 rows and 199 bytes. Possibly a real bug or race condition of some kind for HBO.

- Output[PlanNodeId 6][nationkey, name, regionkey, comment] => [nationkey:bigint, name:varchar(25), regionkey:bigint, comment:varchar(152)]
        Estimates: {source: HistoryBasedSourceInfo, rows: 2 (199B), cpu: ?, memory: ?, network: ?}
    - ScanFilter[PlanNodeId 0,108][table = TableHandle {connectorId='tpch', connectorHandle='nation:sf0.01', layout='Optional[nation:sf0.01]'}, filterPredicate = (substr(name, BIGINT'1', BIGINT'1')) = (VARCHAR'A')] => [nationkey:bigint, name:varchar(25), regionkey:bigint, comment:varchar(152)]
            Estimates: {source: CostBasedSourceInfo, rows: 25 (2.67kB), cpu: ?, memory: ?, network: ?}/{source: HistoryBasedSourceInfo, rows: 0 (0B), cpu: ?, memory: ?, network: ?}
            regionkey := tpch:regionkey (1:15)
            name := tpch:name (1:15)
            nationkey := tpch:nationkey (1:15)
            comment := tpch:comment (1:15)

@hantangwangd
Member

I hit the same error in another test method, TestHistoryBasedStatsTracking.testRowNumber:
https://github.com/prestodb/presto/actions/runs/9208530744/job/25330959486?pr=22659#step:7:6454

[ERROR]   TestHistoryBasedStatsTracking.testRowNumber:242->AbstractTestQueryFramework.assertPlan:446->AbstractTestQueryFramework.assertPlan:451->AbstractTestQueryFramework.assertPlan:459->AbstractTestQueryFramework.lambda$assertPlan$7:461 Plan does not match, expected [

- anyTree
    - node(RowNumberNode)
        expectedOutputRowCount(2.0)
        - anyTree
            - node

] but found [

- Output[PlanNodeId 9][nationkey, _col1] => [nationkey:bigint, row_number_1:bigint]
        Estimates: {source: HistoryBasedSourceInfo, rows: 2 (36B), cpu: ?, memory: ?, network: ?}
        _col1 := row_number_1 (1:19)
    - Project[PlanNodeId 5][projectLocality = LOCAL] => [nationkey:bigint, row_number_1:bigint]
        - RowNumber[PlanNodeId 250][partition by (regionkey)][$hashvalue] => [nationkey:bigint, regionkey:bigint, $hashvalue:bigint, row_number_1:bigint]
                row_number_1 := row_number()
            - LocalExchange[PlanNodeId 378][SINGLE] () => [nationkey:bigint, regionkey:bigint, $hashvalue:bigint]
                    Estimates: {source: CostBasedSourceInfo, rows: 2 (36B), cpu: ?, memory: ?, network: ?}
                - ScanFilterProject[PlanNodeId 0,202,2][table = TableHandle {connectorId='tpch', connectorHandle='nation:sf0.01', layout='Optional[nation:sf0.01]'}, filterPredicate = (substr(name, BIGINT'1', BIGINT'1')) = (VARCHAR'A'), projectLocality = LOCAL] => [nationkey:bigint, regionkey:bigint, $hashvalue_9:bigint]
                        Estimates: {source: CostBasedSourceInfo, rows: 25 (450B), cpu: ?, memory: ?, network: ?}/{source: CostBasedSourceInfo, rows: ? (?), cpu: ?, memory: ?, network: ?}/{source: HistoryBasedSourceInfo, rows: 2 (38B), cpu: ?, memory: ?, network: ?}
                        $hashvalue_9 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(regionkey), BIGINT'0')) (1:68)
                        regionkey := tpch:regionkey (1:67)
                        nationkey := tpch:nationkey (1:67)
                        name := tpch:name (1:67)

]
