Generate binary datasets in CI instead of storing them in the repo #3540

Merged
1 commit merged into master from generate-binary-datasets on Jun 12, 2024

Collaborator

@benjaminwinger commented May 24, 2024

Fixes #3513.

This is a draft because these changes still need to be propagated to the multiplatform CI workflow, but I'd like to check that it works first.

Edit: this is also still missing generation of binary-demo for local testing.

@benjaminwinger force-pushed the generate-binary-datasets branch 7 times, most recently from 552e8eb to 08ced36, on May 24, 2024 22:09
@acquamarin
Collaborator

How are we going to detect the changes and upload to s3 if needed?

@benjaminwinger
Collaborator Author

> How are we going to detect the changes and upload to s3 if needed?

I don't really know what you mean. I see s3 upload and download tests in the httpfs extension directory, but those seem to be uploading and downloading data, not databases.

@benjaminwinger
Collaborator Author

benjaminwinger commented May 27, 2024

@acquamarin I must have been looking at an older branch; I do see the remote database test. I think we'd probably want commit-specific databases which could be uploaded before the tests run (so that different PRs can use different databases). The master branch could use a path like master-${kuzu-version} to avoid having to re-upload every time, and any PR whose version isn't available on the server could upload a new database under its commit id and use that (the path used could be passed to the test as an environment variable). That way the databases should only need to be re-uploaded when the version changes (and for PRs where the version changes).
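
The path selection described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `choose_remote_db_path`, `build_test_env`, the `REMOTE_DB_PATH` variable name, and the `existing_paths` set (standing in for a listing of what is already on the server) are all hypothetical and not taken from the workflow in this PR.

```python
import os

# Hypothetical name for the variable the test would read; not taken from the PR.
REMOTE_DB_ENV_VAR = "REMOTE_DB_PATH"


def choose_remote_db_path(branch, version, commit_id, existing_paths):
    """Return (path, needs_upload) for the remote test database.

    `existing_paths` is an assumed stand-in for the set of database paths
    already present on the server.
    """
    versioned = f"master-{version}"
    if branch == "master" or versioned in existing_paths:
        # Use the version-keyed database; upload only if it is missing.
        return versioned, versioned not in existing_paths
    # A PR whose version is not on the server uses a copy keyed by commit id.
    return commit_id, commit_id not in existing_paths


def build_test_env(path):
    """Copy the current environment and add the chosen path for the test."""
    env = dict(os.environ)
    env[REMOTE_DB_ENV_VAR] = path
    return env
```

With made-up values, `choose_remote_db_path("some-pr", "1.2.4", "abc1234", {"master-1.2.3"})` returns `("abc1234", True)`: the PR's version is not on the server, so it would upload under its commit id. The same PR at version `1.2.3` would reuse `master-1.2.3` without uploading.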

It may also be possible to test with a local S3-like server (I'm not sure that's possible with S3 itself). Apparently a few projects provide an S3-like interface: https://stackoverflow.com/questions/9210162/is-there-a-server-that-provides-an-amazon-s3-style-api-locally#39305640.

@benjaminwinger force-pushed the generate-binary-datasets branch 8 times, most recently from a241781 to 87a4740, on June 5, 2024 15:30

github-actions bot commented Jun 5, 2024

Benchmark Result

Master commit hash: 59385613453f79737bf3cd1588a9aa44650733ed
Branch commit hash: 010ef33783fce037968c61892986c4a524fa0684

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 641.04 643.37 -2.33 (-0.36%)
aggregation q28 11937.27 15529.31 -3592.04 (-23.13%)
copy node-Comment 52406.97 N/A N/A
copy node-Forum 4853.45 N/A N/A
copy node-Organisation 1341.10 N/A N/A
copy node-Person 2481.37 N/A N/A
copy node-Place 1324.76 N/A N/A
copy node-Post 19907.97 N/A N/A
copy node-Tag 1176.01 N/A N/A
copy node-Tagclass 755.85 N/A N/A
copy rel-comment-hasCreator 46781.61 N/A N/A
copy rel-comment-hasTag 59151.77 N/A N/A
copy rel-comment-isLocatedIn 47722.04 N/A N/A
copy rel-containerOf 11362.22 N/A N/A
copy rel-forum-hasTag 3248.65 N/A N/A
copy rel-hasInterest 1832.11 N/A N/A
copy rel-hasMember 38422.18 N/A N/A
copy rel-hasModerator 1082.97 N/A N/A
copy rel-hasType 277.36 N/A N/A
copy rel-isPartOf 156.25 N/A N/A
copy rel-isSubclassOf 148.51 N/A N/A
copy rel-knows 4204.26 N/A N/A
copy rel-likes-comment 74490.05 N/A N/A
copy rel-likes-post 25802.30 N/A N/A
copy rel-organisation-isLocatedIn 138.37 N/A N/A
copy rel-person-isLocatedIn 442.55 N/A N/A
copy rel-post-hasCreator 12514.93 N/A N/A
copy rel-post-hasTag 14662.14 N/A N/A
copy rel-post-isLocatedIn 11901.20 N/A N/A
copy rel-replyOf-comment 49497.06 N/A N/A
copy rel-replyOf-post 40023.40 N/A N/A
copy rel-studyAt 370.46 N/A N/A
copy rel-workAt 507.75 N/A N/A
filter q14 124.46 123.64 0.82 (0.66%)
filter q15 124.76 124.37 0.40 (0.32%)
filter q16 302.56 301.77 0.79 (0.26%)
filter q17 442.43 442.76 -0.34 (-0.08%)
filter q18 1947.64 1906.68 40.96 (2.15%)
fixed_size_expr_evaluator q07 563.30 564.72 -1.42 (-0.25%)
fixed_size_expr_evaluator q08 790.19 790.65 -0.45 (-0.06%)
fixed_size_expr_evaluator q09 787.72 787.78 -0.06 (-0.01%)
fixed_size_expr_evaluator q10 239.52 239.80 -0.28 (-0.12%)
fixed_size_expr_evaluator q11 233.29 233.27 0.02 (0.01%)
fixed_size_expr_evaluator q12 233.27 234.24 -0.97 (-0.42%)
fixed_size_expr_evaluator q13 1474.79 1470.78 4.01 (0.27%)
fixed_size_seq_scan q23 110.78 117.78 -7.00 (-5.94%)
join q29 625.13 705.93 -80.80 (-11.45%)
join q30 1384.08 1531.42 -147.34 (-9.62%)
join q31 45.17 48.95 -3.77 (-7.71%)
ldbc_snb_ic q35 3246.86 3365.63 -118.77 (-3.53%)
ldbc_snb_ic q36 123.50 122.96 0.54 (0.44%)
ldbc_snb_is q32 11.13 12.65 -1.52 (-11.98%)
ldbc_snb_is q33 91.44 90.53 0.91 (1.01%)
ldbc_snb_is q34 101.09 97.10 3.99 (4.11%)
order_by q25 124.88 137.83 -12.95 (-9.40%)
order_by q26 432.60 435.96 -3.36 (-0.77%)
order_by q27 1395.57 1389.03 6.54 (0.47%)
scan_after_filter q01 162.30 165.82 -3.52 (-2.12%)
scan_after_filter q02 149.61 149.77 -0.16 (-0.11%)
shortest_path_ldbc100 q39 55.94 58.33 -2.39 (-4.10%)
var_size_expr_evaluator q03 2038.75 2047.54 -8.79 (-0.43%)
var_size_expr_evaluator q04 2239.57 2262.86 -23.28 (-1.03%)
var_size_expr_evaluator q05 2549.24 2544.20 5.04 (0.20%)
var_size_expr_evaluator q06 1433.28 1437.83 -4.55 (-0.32%)
var_size_seq_scan q19 1453.51 1449.82 3.69 (0.25%)
var_size_seq_scan q20 3084.61 3033.40 51.21 (1.69%)
var_size_seq_scan q21 2381.65 2381.59 0.06 (0.00%)
var_size_seq_scan q22 128.25 127.63 0.62 (0.48%)
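
For reading the Diff column: the figures in the table are consistent with "commit minus master", with the percentage taken relative to the master time, so negative values mean the branch was faster. A minimal sketch of that arithmetic follows. The function name is hypothetical and this is not the bot's actual code. The bot presumably works from unrounded means, so recomputing from the displayed values can be off by 0.01 in places.

```python
def benchmark_diff(commit_ms, master_ms):
    """Return (absolute diff in ms, percent change relative to master).

    Both values are rounded to two decimals, matching the table's precision.
    """
    diff = commit_ms - master_ms
    pct = diff / master_ms * 100.0
    return round(diff, 2), round(pct, 2)
```

For the first row (q24), `benchmark_diff(641.04, 643.37)` gives `(-2.33, -0.36)`, matching the reported `-2.33 (-0.36%)`.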

@benjaminwinger force-pushed the generate-binary-datasets branch 2 times, most recently from 4728a06 to 5713c09, on June 11, 2024 19:25

Benchmark Result

Master commit hash: 9347e2dac54612fdd0e2dfa254ce93bf3f73ad32
Branch commit hash: 8bbefe2d88d3bbbd41264f596d7bcf4d92346bac

Query Group Query Name Mean Time - Commit (ms)
aggregation q24 643.28
aggregation q28 14942.79
filter q14 124.83
filter q15 122.47
filter q16 303.15
filter q17 443.48
filter q18 1930.83
fixed_size_expr_evaluator q07 569.21
fixed_size_expr_evaluator q08 792.50
fixed_size_expr_evaluator q09 793.79
fixed_size_expr_evaluator q10 242.97
fixed_size_expr_evaluator q11 236.45
fixed_size_expr_evaluator q12 238.45
fixed_size_expr_evaluator q13 1477.45
fixed_size_seq_scan q23 121.26
join q29 646.96
join q30 1396.08
join q31 47.90
ldbc_snb_ic q35 3309.14
ldbc_snb_ic q36 120.79
ldbc_snb_is q32 12.31
ldbc_snb_is q33 90.11
ldbc_snb_is q34 95.31
order_by q25 126.95
order_by q26 434.50
order_by q27 1391.31
scan_after_filter q01 169.43
scan_after_filter q02 151.97
shortest_path_ldbc100 q39 187.55
var_size_expr_evaluator q03 2033.67
var_size_expr_evaluator q04 2246.99
var_size_expr_evaluator q05 2578.87
var_size_expr_evaluator q06 1380.80
var_size_seq_scan q19 1449.43
var_size_seq_scan q20 3056.38
var_size_seq_scan q21 2361.56
var_size_seq_scan q22 130.80


codecov bot commented Jun 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.02%. Comparing base (2518d65) to head (3e9e41c).
Report is 64 commits behind head on master.

Current head 3e9e41c differs from pull request most recent head 345855f

Please upload reports for the commit 345855f to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3540      +/-   ##
==========================================
- Coverage   90.01%   89.02%   -1.00%     
==========================================
  Files        1190     1217      +27     
  Lines       42956    44921    +1965     
==========================================
+ Hits        38668    39991    +1323     
- Misses       4288     4930     +642     



Benchmark Result

Master commit hash: 91d86c3000e7d78c17eb2d379fe797f2026e7ed0
Branch commit hash: f2c2e214e90c000fbbcb5752e177b750017ab085

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 644.21 643.54 0.68 (0.11%)
aggregation q28 11789.14 14489.97 -2700.84 (-18.64%)
copy node-Comment 54038.02 N/A N/A
copy node-Forum 4550.35 N/A N/A
copy node-Organisation 1326.25 N/A N/A
copy node-Person 2661.02 N/A N/A
copy node-Place 1262.81 N/A N/A
copy node-Post 18930.10 N/A N/A
copy node-Tag 1154.01 N/A N/A
copy node-Tagclass 1134.05 N/A N/A
copy rel-comment-hasCreator 45871.15 N/A N/A
copy rel-comment-hasTag 57903.64 N/A N/A
copy rel-comment-isLocatedIn 47870.86 N/A N/A
copy rel-containerOf 11671.12 N/A N/A
copy rel-forum-hasTag 2506.07 N/A N/A
copy rel-hasInterest 1768.81 N/A N/A
copy rel-hasMember 38896.69 N/A N/A
copy rel-hasModerator 1060.22 N/A N/A
copy rel-hasType 152.63 N/A N/A
copy rel-isPartOf 138.69 N/A N/A
copy rel-isSubclassOf 162.89 N/A N/A
copy rel-knows 4190.12 N/A N/A
copy rel-likes-comment 74278.97 N/A N/A
copy rel-likes-post 26189.00 N/A N/A
copy rel-organisation-isLocatedIn 267.19 N/A N/A
copy rel-person-isLocatedIn 335.42 N/A N/A
copy rel-post-hasCreator 11020.60 N/A N/A
copy rel-post-hasTag 14433.30 N/A N/A
copy rel-post-isLocatedIn 12781.20 N/A N/A
copy rel-replyOf-comment 50523.93 N/A N/A
copy rel-replyOf-post 33727.74 N/A N/A
copy rel-studyAt 416.00 N/A N/A
copy rel-workAt 559.88 N/A N/A
filter q14 126.66 123.88 2.78 (2.24%)
filter q15 137.06 122.32 14.73 (12.04%)
filter q16 301.17 299.91 1.25 (0.42%)
filter q17 446.35 443.82 2.53 (0.57%)
filter q18 1931.14 1925.43 5.71 (0.30%)
fixed_size_expr_evaluator q07 568.06 564.85 3.21 (0.57%)
fixed_size_expr_evaluator q08 790.83 790.41 0.42 (0.05%)
fixed_size_expr_evaluator q09 790.77 788.77 2.00 (0.25%)
fixed_size_expr_evaluator q10 242.28 244.89 -2.61 (-1.06%)
fixed_size_expr_evaluator q11 237.17 238.69 -1.51 (-0.63%)
fixed_size_expr_evaluator q12 236.32 236.93 -0.61 (-0.26%)
fixed_size_expr_evaluator q13 1471.42 1475.59 -4.17 (-0.28%)
fixed_size_seq_scan q23 127.21 117.03 10.18 (8.70%)
join q29 667.14 706.68 -39.54 (-5.60%)
join q30 1460.16 1499.34 -39.18 (-2.61%)
join q31 45.66 43.09 2.56 (5.95%)
ldbc_snb_ic q35 3300.50 3324.77 -24.27 (-0.73%)
ldbc_snb_ic q36 132.95 130.38 2.57 (1.97%)
ldbc_snb_is q32 13.02 11.81 1.22 (10.29%)
ldbc_snb_is q33 89.21 95.76 -6.55 (-6.84%)
ldbc_snb_is q34 95.87 96.90 -1.04 (-1.07%)
order_by q25 126.92 126.68 0.24 (0.19%)
order_by q26 434.61 434.07 0.54 (0.12%)
order_by q27 1387.44 1394.13 -6.69 (-0.48%)
scan_after_filter q01 169.37 171.47 -2.11 (-1.23%)
scan_after_filter q02 150.91 153.93 -3.02 (-1.96%)
shortest_path_ldbc100 q39 58.35 187.20 -128.86 (-68.83%)
var_size_expr_evaluator q03 2043.53 2029.18 14.35 (0.71%)
var_size_expr_evaluator q04 2260.10 2242.06 18.04 (0.80%)
var_size_expr_evaluator q05 2583.05 2578.85 4.20 (0.16%)
var_size_expr_evaluator q06 1383.71 1388.15 -4.44 (-0.32%)
var_size_seq_scan q19 1452.98 1448.85 4.13 (0.28%)
var_size_seq_scan q20 3088.85 3202.26 -113.41 (-3.54%)
var_size_seq_scan q21 2376.16 2414.66 -38.50 (-1.59%)
var_size_seq_scan q22 129.14 132.79 -3.65 (-2.75%)

@benjaminwinger marked this pull request as ready for review June 12, 2024 15:05

Benchmark Result

Master commit hash: f715ce86b41198e9007cde5c1aaee732bc24d318
Branch commit hash: b95941cd3dac42710804a8ad1e0681540cd29bfc

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 650.89 653.07 -2.18 (-0.33%)
aggregation q28 11872.60 12131.27 -258.67 (-2.13%)
filter q14 133.97 133.22 0.76 (0.57%)
filter q15 133.93 133.09 0.84 (0.63%)
filter q16 310.26 312.68 -2.41 (-0.77%)
filter q17 455.54 451.22 4.32 (0.96%)
filter q18 1918.06 1906.52 11.55 (0.61%)
fixed_size_expr_evaluator q07 570.63 570.97 -0.34 (-0.06%)
fixed_size_expr_evaluator q08 795.76 794.19 1.57 (0.20%)
fixed_size_expr_evaluator q09 794.14 791.85 2.29 (0.29%)
fixed_size_expr_evaluator q10 249.86 247.97 1.89 (0.76%)
fixed_size_expr_evaluator q11 242.51 241.10 1.41 (0.59%)
fixed_size_expr_evaluator q12 242.42 241.68 0.73 (0.30%)
fixed_size_expr_evaluator q13 1469.95 1472.11 -2.16 (-0.15%)
fixed_size_seq_scan q23 120.41 132.03 -11.61 (-8.79%)
join q29 670.62 702.55 -31.93 (-4.55%)
join q30 1429.88 1477.83 -47.95 (-3.24%)
join q31 41.49 48.13 -6.63 (-13.79%)
ldbc_snb_ic q35 3503.57 3454.78 48.79 (1.41%)
ldbc_snb_ic q36 137.01 131.83 5.18 (3.93%)
ldbc_snb_is q32 12.79 12.09 0.70 (5.79%)
ldbc_snb_is q33 102.87 97.37 5.50 (5.64%)
ldbc_snb_is q34 97.18 95.19 2.00 (2.10%)
order_by q25 132.39 132.16 0.23 (0.17%)
order_by q26 445.53 446.74 -1.21 (-0.27%)
order_by q27 1407.95 1399.64 8.31 (0.59%)
scan_after_filter q01 171.28 175.29 -4.01 (-2.29%)
scan_after_filter q02 160.29 161.04 -0.75 (-0.47%)
shortest_path_ldbc100 q39 54.79 51.85 2.93 (5.66%)
var_size_expr_evaluator q03 2043.76 2059.17 -15.41 (-0.75%)
var_size_expr_evaluator q04 2231.40 2233.53 -2.14 (-0.10%)
var_size_expr_evaluator q05 2594.95 2596.07 -1.12 (-0.04%)
var_size_expr_evaluator q06 1369.27 1369.88 -0.61 (-0.04%)
var_size_seq_scan q19 1460.18 1463.54 -3.36 (-0.23%)
var_size_seq_scan q20 3126.47 3110.67 15.79 (0.51%)
var_size_seq_scan q21 2393.42 2401.93 -8.51 (-0.35%)
var_size_seq_scan q22 129.04 130.12 -1.08 (-0.83%)


Benchmark Result

Master commit hash: f715ce86b41198e9007cde5c1aaee732bc24d318
Branch commit hash: 8188edb217c22751ff03cb630e4e3f16a440f4cd

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 653.58 653.07 0.51 (0.08%)
aggregation q28 11768.92 12131.27 -362.35 (-2.99%)
filter q14 133.66 133.22 0.44 (0.33%)
filter q15 138.59 133.09 5.50 (4.13%)
filter q16 311.13 312.68 -1.54 (-0.49%)
filter q17 453.56 451.22 2.33 (0.52%)
filter q18 1914.54 1906.52 8.03 (0.42%)
fixed_size_expr_evaluator q07 574.91 570.97 3.95 (0.69%)
fixed_size_expr_evaluator q08 798.26 794.19 4.07 (0.51%)
fixed_size_expr_evaluator q09 797.50 791.85 5.65 (0.71%)
fixed_size_expr_evaluator q10 255.33 247.97 7.35 (2.97%)
fixed_size_expr_evaluator q11 244.55 241.10 3.45 (1.43%)
fixed_size_expr_evaluator q12 243.17 241.68 1.49 (0.61%)
fixed_size_expr_evaluator q13 1473.35 1472.11 1.24 (0.08%)
fixed_size_seq_scan q23 131.75 132.03 -0.27 (-0.20%)
join q29 682.31 702.55 -20.25 (-2.88%)
join q30 1476.13 1477.83 -1.70 (-0.11%)
join q31 46.43 48.13 -1.70 (-3.53%)
ldbc_snb_ic q35 3536.06 3454.78 81.28 (2.35%)
ldbc_snb_ic q36 127.84 131.83 -3.99 (-3.03%)
ldbc_snb_is q32 11.88 12.09 -0.20 (-1.69%)
ldbc_snb_is q33 94.89 97.37 -2.48 (-2.55%)
ldbc_snb_is q34 99.28 95.19 4.09 (4.30%)
order_by q25 135.82 132.16 3.66 (2.77%)
order_by q26 448.51 446.74 1.77 (0.40%)
order_by q27 1400.06 1399.64 0.42 (0.03%)
scan_after_filter q01 182.52 175.29 7.23 (4.13%)
scan_after_filter q02 160.75 161.04 -0.29 (-0.18%)
shortest_path_ldbc100 q39 53.70 51.85 1.85 (3.57%)
var_size_expr_evaluator q03 2057.37 2059.17 -1.80 (-0.09%)
var_size_expr_evaluator q04 2242.79 2233.53 9.25 (0.41%)
var_size_expr_evaluator q05 2600.80 2596.07 4.73 (0.18%)
var_size_expr_evaluator q06 1370.24 1369.88 0.36 (0.03%)
var_size_seq_scan q19 1461.40 1463.54 -2.14 (-0.15%)
var_size_seq_scan q20 3398.62 3110.67 287.95 (9.26%)
var_size_seq_scan q21 2563.37 2401.93 161.43 (6.72%)
var_size_seq_scan q22 130.76 130.12 0.64 (0.49%)

@andyfengHKU merged commit 309136f into master Jun 12, 2024
20 checks passed
@andyfengHKU deleted the generate-binary-datasets branch June 12, 2024 18:52
Successfully merging this pull request may close these issues.

Binary datasets