Coroutinize commit log #8954

Merged
merged 31 commits into from Jul 20, 2021

@elcallio elcallio commented Jun 30, 2021

No real refactoring, just move the various methods to coroutines. Because coroutines are neat.
Broken down into one method per change to make review easier. And hoping I get tipped per change.

Grand idea being that using coroutines will eventually make real refactoring easier.
Unit tests + relevant dtest.

As discussed below, simply coroutinizing the code will, at least in the fast path, cause the slightly naive
compiler to generate multiple unused coroutine frames, dropping raw performance a bit.
The last two patches in this series address this by breaking the fast path into non-coroutine
subroutines (no futures involved) and one coroutine main loop.

Results, as collected by perf_simple_query are:
Master (before changes):

{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.237303521776113,
                "instructions_per_op" : 47403.34422198555,
                "mad tps" : 670.12528706749436,
                "max tps" : 140817.0800358199,
                "median tps" : 139391.58369995767,
                "min tps" : 133663.0095463676,
                "tasks_per_op" : 13.189605506751203
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 10:26:46",
                        "version" : "4.6.dev"
                }
        }
}

This PR (coroutines + fast path optimization patches):

{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.208628061750559,
                "instructions_per_op" : 47300.501878330339,
                "mad tps" : 707.70233700674726,
                "max tps" : 139618.0661493362,
                "median tps" : 137891.11290420164,
                "min tps" : 127551.83433347062,
                "tasks_per_op" : 13.172121395660733
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1d4b6f50bd",
                        "date" : "20210719",
                        "run_date_time" : "2021-07-19 09:27:09",
                        "version" : "4.6.dev"
                }
        }
}

I.e. both allocations/op and instruction count seem to be on par.
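
As a rough cross-check of "on par", the two runs above can be compared numerically. The helper names `percent_change` and `within_regression_budget` are made up for this sketch; the 3% budget is the threshold mentioned later in the review, and the inputs are the `allocs_per_op` and `instructions_per_op` values from the two JSON reports above.

```cpp
#include <cassert>

// Relative change from `before` to `after`, in percent (positive means an increase).
double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}

// For a "lower is better" metric: true if it did not grow by more than the budget.
bool within_regression_budget(double before, double after, double budget_percent) {
    return percent_change(before, after) <= budget_percent;
}
```

With the numbers above, `percent_change(52.237303521776113, 52.208628061750559)` and `percent_change(47403.34422198555, 47300.501878330339)` are both slightly negative, i.e. this run of the PR is marginally below master on both metrics. These are single runs, so differences this small are best read as noise.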

@bhalevy
Member

bhalevy commented Jul 8, 2021

Please rebase on top of #8980

@avikivity
Member

Looks mostly good (the complexity of commitlog is still astounding).

I'm concerned about regressions. Please check if perf_simple_query --write uses commitlog (add an option if not). It will report insn per op and allocations per op as well as throughput, we can compare before/after. We can tolerate a small regression and improve it later, but if it's large (>3%) we'll need to think.

@elcallio
Contributor Author

Fair enough. Will do.

@elcallio
Contributor Author

perf_simple_query write --duration 30 (3 runs)

master:

{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,8,30",
                "cpus" : 8,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.278362050568127,
                "mad tps" : 885.66405237524305,
                "max tps" : 554571.85457205586,
                "median tps" : 500296.00436982565,
                "min tps" : 493656.13463197456,
                "tasks_per_op" : 14.119138463011311
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "222ef17305",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-12 12:28:31",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,8,30",
                "cpus" : 8,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.279360932795996,
                "mad tps" : 3121.7288884630543,
                "max tps" : 555703.57877868332,
                "median tps" : 501142.04811843345,
                "min tps" : 431350.84547949483,
                "tasks_per_op" : 14.119325331719205
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "222ef17305",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-12 12:29:09",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,8,30",
                "cpus" : 8,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.312228453308322,
                "mad tps" : 2145.5890656120027,
                "max tps" : 514131.71118491504,
                "median tps" : 466419.25443132687,
                "min tps" : 459853.47800593078,
                "tasks_per_op" : 14.155377342239746
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "222ef17305",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-12 12:29:45",
                        "version" : "4.6.dev"
                }
        }
}

coroutine

{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,8,30",
                "cpus" : 8,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 55.27292351786685,
                "mad tps" : 1816.5705907656229,
                "max tps" : 502054.27367427375,
                "median tps" : 493737.83666241454,
                "min tps" : 388336.14530819876,
                "tasks_per_op" : 14.1115164843161
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "222ef17305",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-12 13:06:41",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,8,30",
                "cpus" : 8,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 55.274283962865276,
                "mad tps" : 1334.2904216616298,
                "max tps" : 530219.04846329649,
                "median tps" : 495017.11182056292,
                "min tps" : 445257.50330738095,
                "tasks_per_op" : 14.111412232519498
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "222ef17305",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-12 13:07:16",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,8,30",
                "cpus" : 8,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 55.274868082420163,
                "mad tps" : 1875.6137657201034,
                "max tps" : 545869.60070414329,
                "median tps" : 492907.62397543306,
                "min tps" : 422702.24671609281,
                "tasks_per_op" : 14.115727327171784
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "222ef17305",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-12 13:07:54",
                        "version" : "4.6.dev"
                }
        }
}

Seems to be a small perf drop. Best tasks/op for master is 14.155, coroutine 14.115 -> ~0.3%

@elcallio
Contributor Author

rebased on master.

@elcallio
Contributor Author

ping?

@avikivity
Member

The numbers are extremely noisy. The only one I trust is allocs_per_op, and it shows the 3 extra allocations I pointed out.

To get less noisy numbers, please run on tmpfs, set the cpu frequency to some fixed thing, and run with --smp 1 --task-quota-ms 10. This gets more stable numbers. I'll also add instructions per op to the json report (it's already in the text report), it's more stable than tps.

I think we should un-coroutinize those three allocations (or maybe uncoroutinize just the fast paths), but can also decide based on more accurate perf stats.
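
To put a number on "noisy", one crude option is the run-to-run spread of a metric relative to its median. The sketch below is an assumption-laden illustration, not something perf_simple_query does itself: `median` and `relative_spread_percent` are hypothetical helpers, applied to the per-run `median tps` and `allocs_per_op` values of the three 8-CPU master runs posted earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty sample (takes a copy so the caller's order is untouched).
double median(std::vector<double> xs) {
    assert(!xs.empty());
    std::sort(xs.begin(), xs.end());
    const std::size_t mid = xs.size() / 2;
    return xs.size() % 2 == 1 ? xs[mid] : (xs[mid - 1] + xs[mid]) / 2.0;
}

// (max - min) / median, in percent: a unit-free measure of run-to-run noise.
double relative_spread_percent(const std::vector<double>& xs) {
    const auto [lo, hi] = std::minmax_element(xs.begin(), xs.end());
    return (*hi - *lo) / median(xs) * 100.0;
}
```

For the three master runs, the median tps values (500296.0, 501142.0, 466419.3) spread by roughly 7% of their median, while the allocs_per_op values (52.2784, 52.2794, 52.3122) spread by well under 0.1%. That is consistent with trusting allocs_per_op over tps when looking for a regression of a few percent.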

@avikivity
Member

#9017

@elcallio
Contributor Author

tmpfs + opts above. Unfortunately, I could not get any instruction counts (probably because I am running inside a container; weird though, since it is unconfined). I can maybe spin up an AWS instance or similar and test there instead.
Yes, the allocs are there. Tasks per op is at about -0.01%, but that is perhaps not wholly reliable.

old:

{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.199675579883092,
                "instructions_per_op" : 0.0,
                "mad tps" : 988.02545691965497,
                "max tps" : 141608.28628718955,
                "median tps" : 138890.31351817003,
                "min tps" : 128551.82512204401,
                "tasks_per_op" : 13.168662731703135
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 08:19:11",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.202875617667999,
                "instructions_per_op" : 0.0,
                "mad tps" : 826.50582187165855,
                "max tps" : 139804.24693595362,
                "median tps" : 138385.60661807586,
                "min tps" : 133619.91592358533,
                "tasks_per_op" : 13.171623594737865
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 08:21:51",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.204737211534329,
                "instructions_per_op" : 0.0,
                "mad tps" : 544.18338217405835,
                "max tps" : 139739.20437636707,
                "median tps" : 138641.67905720361,
                "min tps" : 134350.30433903809,
                "tasks_per_op" : 13.171976392728732
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 08:22:28",
                        "version" : "4.6.dev"
                }
        }
}

coroutine:

{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 55.201153410957403,
                "instructions_per_op" : 0.0,
                "mad tps" : 894.96604798236513,
                "max tps" : 138152.97180803175,
                "median tps" : 136224.99760518572,
                "min tps" : 131372.46671571763,
                "tasks_per_op" : 13.166142278351645
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 08:41:19",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 55.203744630860164,
                "instructions_per_op" : 0.0,
                "mad tps" : 378.85813946920098,
                "max tps" : 136261.96739622834,
                "median tps" : 135303.40764167934,
                "min tps" : 133652.37391582035,
                "tasks_per_op" : 13.168567128014978
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 08:41:56",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 55.194991849566279,
                "instructions_per_op" : 0.0,
                "mad tps" : 993.51819265127415,
                "max tps" : 137912.80490406425,
                "median tps" : 136529.4182341657,
                "min tps" : 133622.16750119071,
                "tasks_per_op" : 13.160010479129069
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 08:42:31",
                        "version" : "4.6.dev"
                }
        }
}

@elcallio
Contributor Author

And with instruction count: a ~1.4% increase in instructions. Not great, but perhaps not unexpected without actually trying to fold coroutines.

old:

{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.203516196373037,
                "instructions_per_op" : 47460.051358336765,
                "mad tps" : 732.49384117600857,
                "max tps" : 143070.46692769558,
                "median tps" : 139797.25086602769,
                "min tps" : 132768.99707354061,
                "tasks_per_op" : 13.172543561923968
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 10:26:10",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.237303521776113,
                "instructions_per_op" : 47403.34422198555,
                "mad tps" : 670.12528706749436,
                "max tps" : 140817.0800358199,
                "median tps" : 139391.58369995767,
                "min tps" : 133663.0095463676,
                "tasks_per_op" : 13.189605506751203
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 10:26:46",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.218496828767762,
                "instructions_per_op" : 47419.034787023331,
                "mad tps" : 845.45681446063099,
                "max tps" : 141071.35158473032,
                "median tps" : 138700.44582031129,
                "min tps" : 132904.27357105099,
                "tasks_per_op" : 13.185747688611452
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 10:27:22",
                        "version" : "4.6.dev"
                }
        }
}

coroutine:

{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 55.201528920500508,
                "instructions_per_op" : 48106.216832700535,
                "mad tps" : 763.12819885797217,
                "max tps" : 138046.13293605321,
                "median tps" : 136681.13684568691,
                "min tps" : 128604.26096129866,
                "tasks_per_op" : 13.165266249335014
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 10:24:19",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 55.214842211231236,
                "instructions_per_op" : 48057.996410907341,
                "mad tps" : 736.75503624524572,
                "max tps" : 138405.78873820545,
                "median tps" : 136160.7808177234,
                "min tps" : 132358.54535403103,
                "tasks_per_op" : 13.177105673976159
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 10:24:54",
                        "version" : "4.6.dev"
                }
        }
}{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 55.195990962284917,
                "instructions_per_op" : 48084.073554544928,
                "mad tps" : 698.15471085262834,
                "max tps" : 138620.21855985836,
                "median tps" : 136915.70052587308,
                "min tps" : 132039.80052215396,
                "tasks_per_op" : 13.161013266902264
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 10:25:29",
                        "version" : "4.6.dev"
                }
        }
}
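
The ~1.4% figure and the three extra allocations can be reproduced from the six reports above by averaging each group of three runs. This is a minimal sketch; `mean` and `mean_increase_percent` are hypothetical helper names, and the inputs are the `instructions_per_op` and `allocs_per_op` values copied from the JSON above.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty sample.
double mean(const std::vector<double>& xs) {
    assert(!xs.empty());
    return std::accumulate(xs.begin(), xs.end(), 0.0) / static_cast<double>(xs.size());
}

// Increase of mean(after) over mean(before), in percent of mean(before).
double mean_increase_percent(const std::vector<double>& before,
                             const std::vector<double>& after) {
    const double base = mean(before);
    return (mean(after) - base) / base * 100.0;
}
```

Averaged over the three runs each, instructions_per_op goes from about 47427 to about 48083, an increase of about 1.4%, and allocs_per_op goes from about 52.22 to about 55.20, i.e. roughly three more allocations per operation.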

@avikivity
Member

"opts above" is the decoroutinization of the fast paths? Maybe they aren't effective? Hard to believe without switching to batch mode and running on a slow disk.

@elcallio
Contributor Author

No, "opts above" as in the perf_simple_query options you requested. The runs are master and the coroutinized version.
I doubt batch mode will make much difference, since the "spill" here is perhaps too many coroutine frames in the fast path (alloc, no flush). Whereas if we actually do disk io, one alloc more or less is probably not gonna make much difference.

@avikivity
Member

The batch mode comment was in the context of me thinking you applied the de-coroutinization and attempting to explain why they didn't help. Given the optimizations weren't made, it's not surprising they didn't help.

Batch mode still amortizes, so it's not one alloc more, it's three allocs per {as many writes as fit in a disk write}, which can be quite a lot.

@avikivity
Member

So please try undoing those trouble spots and see if it helps. It should recover all of the performance, since the other cases are executed rarely, and should be faster with coroutines (once you wait, you get fewer allocs with coroutines compared to continuations).

@elcallio
Contributor Author

Yes, but batch mode is likely to run into suspend-states anyway (i.e. wait for IO), in which case raw futures will allocate as well, so not much difference.

@elcallio
Contributor Author

Dropping the patches for pure alloc yields

coroutine:
{
        "parameters" :
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" :
        {
                "allocs_per_op" : 53.201318006144092,
                "instructions_per_op" : 47716.653871200644,
                "mad tps" : 834.63362257179688,
                "max tps" : 142177.732341133,
                "median tps" : 140167.17773041374,
                "min tps" : 133047.56159665808,
                "tasks_per_op" : 13.164679983578012
        },
        "test_properties" :
        {
                "type" : "write"
        },
        "versions" :
        {
                "scylla-server" :
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 11:50:16",
                        "version" : "4.6.dev"
                }
        }
}


There is still one additional alloc + ~200 instructions extra. I assume this is due to the eventual flush/cycle. Sounds high, but maybe.

@avikivity
Member

It means we missed a fast-path coroutine. Perhaps it can be split into a coroutine part and a non-coroutine slow path.

@elcallio
Contributor Author

Most likely segment_manager::active_segment. Building and testing without it as well.
So one conclusion is that clang is not super great at folding coroutines (I read some of the inline code etc, and this is openly admitted).
The question is: do you want me to try to do the refactoring already here, to manually fold things? It might make things equally fast, but also horrendous to read, which is kind of orthogonal to the main intent. In the long run it would be nicer to save cycles by simplifying the code path(s).

@elcallio
Contributor Author

Yup. Verified. Dropping coroutine in active_segment puts instruction count/alloc count pretty exactly at master level.

{
        "parameters" : 
        {
                "concurrency" : 100,
                "concurrency,partitions,cpus,duration" : "100,10000,1,30",
                "cpus" : 1,
                "duration" : 30,
                "partitions" : 10000
        },
        "stats" : 
        {
                "allocs_per_op" : 52.203484793819904,
                "instructions_per_op" : 47489.643215404969,
                "mad tps" : 417.87147938692942,
                "max tps" : 141381.03268001706,
                "median tps" : 140238.32004292417,
                "min tps" : 129661.0700612075,
                "tasks_per_op" : 13.167186626767688
        },
        "test_properties" : 
        {
                "type" : "write"
        },
        "versions" : 
        {
                "scylla-server" : 
                {
                        "commit_id" : "1f51bc67fd",
                        "date" : "20210712",
                        "run_date_time" : "2021-07-13 12:42:24",
                        "version" : "4.6.dev"
                }
        }
}

Calle Wilund added 24 commits July 19, 2021 08:17
Calling frames keeps object alive in all paths. Use references in
allocate()/allocate_when_possible()
Change args to values so stays on coroutine frame.
Remove pointless subscription/stream usage, just iterate.
Removes 2 coroutine frames in fast path (as long as segment + space is
avail). Puts IPS back on track with master.
And call it by-value with the polymorphic writers. This
eliminates outer coroutine frame and ensures we use only one
for fast-case allocation.
@elcallio
Contributor Author

Rebased, removed gate_guard (using gate::holder now), updated PR blurb with perf info.

@elcallio elcallio requested a review from avikivity July 19, 2021 09:32
@elcallio
Contributor Author

Le ping?

@scylladb-promoter scylladb-promoter merged commit 05fcf11 into scylladb:master Jul 20, 2021