
remove autogenerated code from ChannelMixer #1966

Merged — 1 commit merged into mixxxdj:main on Nov 21, 2021

Conversation

Be-ing
Contributor

@Be-ing Be-ing commented Dec 27, 2018

Previously, the autogenerated code allowed for a vectorizing
optimization by combining gain application (channel faders +
crossfader) with channel mixing. When postfader effects were
implemented in Mixxx 2.1 (PR #1254), these two operations
had to be decoupled to process effects between gain
application and channel mixing. Therefore, the
autogenerated code lost its advantage and became an
overcomplication.

Unused autogenerated functions were removed from SampleUtil
as well. Some of the autogenerated SampleUtil functions are
still in use, so they were kept.

@daschuer
Member

Thank you.
Can you add some benchmarks comparing the performance effects of these changes?

@uklotzde uklotzde added this to the 2.3.0 milestone Jan 12, 2019
@Be-ing Be-ing modified the milestones: 2.3.0, 2.4.0 Feb 7, 2019
@Be-ing Be-ing closed this Apr 24, 2019
@Be-ing Be-ing deleted the audio_engine_cleanup branch April 24, 2019 23:48
@Be-ing Be-ing restored the audio_engine_cleanup branch April 24, 2019 23:48
@Be-ing
Contributor Author

Be-ing commented Apr 24, 2019

Whoops, I did not intend to close this.

@Be-ing Be-ing reopened this Apr 24, 2019
@Be-ing
Contributor Author

Be-ing commented Oct 23, 2019

I tested playing two decks for a few minutes (using the same tracks with each test):

master + 324b2fb cherry-picked:

Debug [Main]: Stat("EngineMaster::process after processChannels","count=8218,sum=4.20202e+08ns,average=51131.9ns,min=22165ns,max=1.15742e+06ns,variance=1.14174e+09ns^2,stddev=33789.6ns")

this branch:

Debug [Main]: Stat("EngineMaster::process after processChannels","count=7411,sum=4.26568e+08ns,average=57558.8ns,min=23141ns,max=1.55683e+06ns,variance=1.20842e+09ns^2,stddev=34762.3ns")

So this branch does seem to be slightly slower. The mean difference is 6426.9 nanoseconds. I think we can afford that to get rid of this awfully overcomplicated code.

@Be-ing Be-ing requested a review from rryan November 12, 2019 04:45
@Holzhaus Holzhaus added this to In progress in 2.4 release Apr 9, 2020
@Be-ing Be-ing moved this from In progress to Needs review in 2.4 release Jun 15, 2020
@Holzhaus
Member

Can you fix the merge conflicts? I'd like to reduce the number of open PRs. I think this is a candidate for merging.

@daschuer
Member

The benchmarks that prove this is a good idea are still missing. Without them, we should not touch this code.

@Be-ing
Contributor Author

Be-ing commented Sep 10, 2020

Huh? I posted that above.

@poelzi
Contributor

poelzi commented Sep 10, 2020

Our engine is still single-threaded, and with keylock enabled, one channel alone is near the buffer maximum even on my machine (T470p). I'd rather not slow down the engine just to remove some autogenerated code.

@Holzhaus
Member

@Be-ing Do you plan to pick it up and possibly fix the performance issues, or should we close this?

@Be-ing Be-ing changed the base branch from master to main October 23, 2020 23:28
@Be-ing
Contributor Author

Be-ing commented Oct 23, 2020

I do not know if the performance of this could be improved further. But I think this is the wrong place to be nitpicking about performance. The difference I measured before was 6426.9 nanoseconds. If we want to optimize the audio processing code, we should be talking about metering and timestretching. Other software uses assembler or Intel intrinsics to optimize metering.

@Holzhaus
Member

Holzhaus commented Oct 23, 2020

0.006426 milliseconds. I think we can live with that.

@Holzhaus
Member

Holzhaus commented Oct 23, 2020

Maybe we can use Intel SIMD intrinsics to work on multiple channels at once? But we'd need to rewrite that code and maintain multiple code paths for non-x86 architectures.

@Be-ing
Contributor Author

Be-ing commented Oct 23, 2020

A more practical approach is using SIMD instructions for multiple samples within a single buffer.

@Holzhaus
Member

If we abstract it like this, we can also extend it to other architectures that offer a similar SIMD instruction set: https://blog.molecular-matters.com/2011/10/18/simdifying-multi-platform-math/

@Swiftb0y
Member

I have dismissed my critical review long ago to free the space for another reviewer. No one has taken the chance since, so there is no need to blame me.

I'm sorry, this was not clear to me. So all that is left for this PR is for someone to take responsibility for approving it and then hit merge?

@daschuer
Member

I don't think it's super fair to expect anyone else to do this work if you are the only one insisting on the data resulting from that work.

I don't insist.

I guess I was not clear in my post. It is not too early to vote now.

Basing our votes on comments that see no regression from Uwe's test is too early. I hope my post has clarified the situation.

@daschuer
Member

.... oh I see it was already considered as bikeshedding.

That's not fair, especially because I have just complained about such a voice.

Comment on lines +69 to +70
for (int i = 0; i < activeChannels->size(); ++i) {
EngineMaster::ChannelInfo* pChannelInfo = activeChannels->at(i);
Member

Consider replacing this with a range-based for loop; it might even yield some performance benefit.

Contributor Author

How could that provide a performance benefit?

Member

Compilers may be able to optimize better when we convey the intent more clearly.

Member

For example, currently activeChannels->size() is probably called on every iteration, while the range-based implementation would only compare the begin and end iterators. I'm not saying this is a huge benefit; I just believe we should do what compiler writers expect from us so we get better code and they can possibly produce better binaries.

Contributor

Readability would benefit from a range-based for loop. That's the main advantage; everything else is just a bonus.

@Swiftb0y
Member

.... oh I see it was already considered as bikeshedding.

That's not fair, especially because I have just complained about such a voice.

please elaborate, I don't understand what you mean.

@daschuer
Member

The thumbs-down on my post and the comment: #1966 (comment)

@Swiftb0y
Member

I agree, continuously repeating statements as a means to pressure action is not nice either. I ask @Be-ing to refrain from this pattern of behavior; it's useless and happens too often.

@rryan
Member

rryan commented Nov 14, 2021

Just providing some context -- I can't remember when this was written, but it was a big win at the time performance wise on my Lenovo T400 and my eeePC. This was around the time I was rewriting the engine to support more than 2 decks, and we were adding samplers and other mixing sources.

I'm surprised there isn't a mixing benchmark testing this code. I was pretty sure I wrote one -- maybe I never committed it.

I don't unroll loops or produce annoying-to-maintain code for fun -- just so everyone is clear :).

@Be-ing
Contributor Author

Be-ing commented Nov 14, 2021

I don't unroll loops or produce annoying-to-maintain code for fun -- just so everyone is clear :).

Of course, I don't think anyone thought you did. :P But those optimizations became irrelevant with #1254 because mixing and applying the gain of the faders cannot be done in the same step with postfader effects.

@rryan
Member

rryan commented Nov 14, 2021

Just providing some context -- I can't remember when this was written

Ah.. 2013.
63b8411

Of course, I don't think anyone thought you did. :P But those optimizations became irrelevant with #1254 because mixing and applying the gain of the faders cannot be done in the same step with postfader effects.

Right.. the original code called out to an inlined autogenerated "mix and apply ramping gain to N channels" routine, which was the key performance win over serial application of ramping gain then accumulation into an output buffer. Writing the code that way let auto-vectorization take over and do good things. (at the time we were writing SIMD by hand in SampleUtil too, but that was a pain to maintain).

The one additional bit of information we're giving the compiler here is telling it that the number of elements in this channels array is small (<32) -- but I don't think the loop body in its current form can be optimized anyway.

@rryan
Member

rryan commented Nov 14, 2021

I'm scratching my head trying to explain the performance differences measured by @uklotzde or @Be-ing.

Even if EngineEffectsManager::processPostFaderAndMix can be inlined (I don't think it can), the code in there only deals with a single buffer and then adds it to the output buffer, so I believe it's no different from the serial "process, then add to output" flow we had before the generated code was added, or from the generated code in its current form.

I suspect the timer measurements are noisy potentially due to reasons @daschuer suggested (e.g. did you use the start/stop timing feature once the engine was already in a steady state, or was the timing from program startup? was mixxx the only program running? cpufreq "performance" governor enabled, etc.).

@rryan
Member

rryan commented Nov 14, 2021

I support merging this as is for the reasons in my above two comments.

Generally +1 to the request for microbenchmarks of channel mixing for a range of common channel counts, I'm sorry they don't already exist.

Can we merge this or is this another victim of bikeshedding?
No more bikeshedding. Vote now.

I think it's reasonable and not bikeshedding to request benchmarks for a change that is primarily concerning a maintenance vs. performance trade-off.

@Be-ing
Contributor Author

Be-ing commented Nov 19, 2021

After 5 days we have 3 👍 s and 0 👎 s.

@uklotzde
Contributor

The benchmarks should have been provided with the auto-generated code. Requesting them as a requisite for removing code that is difficult to maintain and prevents innovation is unfair. Especially in conjunction with a veto that puts all burden on the one who proposes the change. In this respect I support Jan's position as outlined in #1966 (comment)

@uklotzde
Contributor

Merge?

@uklotzde
Contributor

On a second thought: Even if we had those benchmarks we would need to run them on a variety of platforms including different CPU architectures for meaningful results. We don't have the resources to do that.

@JoergAtGithub
Member

On a second thought: Even if we had those benchmarks we would need to run them on a variety of platforms including different CPU architectures for meaningful results. We don't have the resources to do that.

For such encapsulated code units, you can execute a unit test under callgrind. Callgrind emulates an artificial CPU and reports the CPU cycles needed as a reproducible number. Due to the CPU emulation, it's independent of the real CPU and can be run in CI. Using an optimized release binary, this is a good indicator of whether compiler optimizations like SIMD vectorization work for a code unit.
This is a general remark; I do not recommend implementing this in this PR!

@Be-ing
Contributor Author

Be-ing commented Nov 20, 2021

Merge?

The vote is already 4 in favor with 0 opposing.

@Be-ing
Contributor Author

Be-ing commented Nov 20, 2021

@JoergAtGithub if you want to set up callgrind on CI in another PR, that seems like it could be quite helpful.

@Be-ing
Contributor Author

Be-ing commented Nov 20, 2021

Is anyone going to merge this or will I have to merge my own PR?

@uklotzde
Contributor

@Be-ing Please be patient. I will merge it.

PS: Being pressured or getting into crossfire is not fun. I don't need to deal with this and could turn away from Mixxx at any time for my own peace of mind.

@Be-ing
Contributor Author

Be-ing commented Nov 20, 2021

What is the reason for not merging it already? I don't see that anybody has articulated one.

@uklotzde
Contributor

I have left a comment and opened a PR for this branch as you noticed. But apparently you prefer to be impatient.

I will close my PR and not merge this PR. Done. Someone else may take over.

@Be-ing
Contributor Author

Be-ing commented Nov 20, 2021

I don't think it's fair to ask me to be patient on a 3 year old PR.

@uklotzde
Contributor

It is unfair to pressure me after I started to pick up this PR just a few hours ago. Please rethink your behavior.

@rryan
Member

rryan commented Nov 21, 2021

The benchmarks should have been provided with the auto-generated code.

I agree! Sorry, again :).

Requesting them as a requisite for removing code that is difficult to maintain and prevents innovation is unfair.

I only sort of agree -- the engine code is complex and difficult to maintain, in part because the problem domain of real-time friendly code is sometimes at odds with maintainability. In the general case, I think some changes that have an obvious performance risk should come with benchmarks to show it's safe (and it's ok to put the burden of proof on the code author who claims the change is safe).

@rryan
Member

rryan commented Nov 21, 2021

After 5 days we have 3 👍 s and 0 👎 s.

Speaking for myself, I voted yes to "merge as is".

Since this PR has generated so many bad feelings all around, I think we should merge and move on.

Thanks for the improvements PR, @uklotzde! If you retarget to main, will that work?

@rryan rryan merged commit 4cdc52b into mixxxdj:main Nov 21, 2021
2.4 release automation moved this from Needs review to Done Nov 21, 2021
Labels
build · code quality · stale (Stale issues that haven't been updated for a long time)

Projects
2.4 release: Done

8 participants