
Compute skylines and bottom lines in batches (+WebGL): 30 to 60% performance gain #1158

Merged


@paxbun (Contributor) commented Mar 23, 2022

[Image: the canvas on which multiple measures are drawn for batch skyline and bottom line calculation]

Introduction

The current rendering logic requires calculating skyline and bottom line information for each measure, which is the main bottleneck of OSMD. For example, rendering Clementi's Sonatina Op. 36 No. 3 with OSMD on Microsoft Edge on an M1 Mac mini takes 765 milliseconds on average, of which skyline and bottom line computation takes about 450 milliseconds. About half of that computation is spent drawing measures on a temporary canvas, and the other half is spent retrieving pixel data (ctx.getImageData()) from the canvas.

The current logic draws each measure on its own canvas, so the number of calls to ctx.getImageData() equals the number of measures, which makes the computation very slow. This change reduces the cost of retrieving pixel data by drawing multiple measures on a single canvas, so OSMD can render the sheet music with fewer calls to functions that access the pixel data. It also introduces WebGL-accelerated skyline and bottom line calculation, which computes the lines of all measures on a canvas simultaneously.
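To make the batching idea concrete, here is a minimal sketch of how measures could be packed into fixed-size cells of one canvas, and how many pixel readbacks that leaves. The cell size (480 x 300), the canvas size and the function names are illustrative assumptions; the PR does not say how OSMD sizes or places its cells.

```typescript
// Hypothetical packing of measures into fixed-size cells of a batch canvas.
// Cell and canvas sizes are illustrative assumptions, not OSMD's actual values.
interface BatchLayout {
  columns: number;           // cells per canvas row
  rows: number;              // cell rows per canvas
  measuresPerCanvas: number;
  canvasCount: number;       // = number of getImageData() calls needed
}

function planBatches(
  measureCount: number,
  canvasWidth: number,
  canvasHeight: number,
  cellWidth: number,
  cellHeight: number,
): BatchLayout {
  const columns = Math.floor(canvasWidth / cellWidth);
  const rows = Math.floor(canvasHeight / cellHeight);
  const measuresPerCanvas = columns * rows;
  if (measuresPerCanvas === 0) {
    throw new Error("a single cell does not fit into the canvas");
  }
  return {
    columns,
    rows,
    measuresPerCanvas,
    canvasCount: Math.ceil(measureCount / measuresPerCanvas),
  };
}

// Which batch canvas measure `index` lands on, and the top-left pixel of its cell.
function cellOrigin(index: number, layout: BatchLayout, cellWidth: number, cellHeight: number) {
  const slot = index % layout.measuresPerCanvas;
  return {
    canvas: Math.floor(index / layout.measuresPerCanvas),
    x: (slot % layout.columns) * cellWidth,
    y: Math.floor(slot / layout.columns) * cellHeight,
  };
}
```

With these made-up cell sizes, `planBatches(759, 2400, 1500, 480, 300)` fits 25 measures per canvas, so 759 graphical measures need 31 pixel readbacks instead of 759.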

Summary of change

  • Removed calculateLines() from SkyBottomLineCalculator.

  • Renamed SkyBottomLineCalculator to SkyBottomLine.ts, as it does not contain the calculation logic.

  • Added classes for batch calculation

    • SkyBottomLineBatchCalculator
    • SkyBottomLineBatchCalculatorBackend
      • PlainSkyBottomLineBatchCalculatorBackend
      • WebGLSkyBottomLineBatchCalculatorBackend
    • SkyBottomLineCalculationResult
  • Updated module rules in webpack.common.js to import GLSL files

    • Added global.d.ts for typing
  • Moved calls to SkyBottomLineCalculator from MusicSheetCalculator to VexFlowMusicSheetCalculator

    • SkyBottomLineCalculator references VexFlow* classes, which creates a circular dependency. For some reason, MusicSheetCalculator in
      class VexFlowMusicSheetCalculator extends MusicSheetCalculator
      was undefined at runtime. I had to remove this circular dependency to fix the issue.
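For the GLSL imports mentioned in the summary (a webpack module rule plus global.d.ts for typing), a common way to type such imports is an ambient wildcard module declaration like the one below. This is a sketch: the exact contents of the PR's global.d.ts, and which loader the webpack rule uses (for example raw-loader, or webpack 5's `type: "asset/source"`), are assumptions.

```typescript
// global.d.ts (sketch): lets TypeScript accept `import source from "./shader.glsl"`.
// Assumes the webpack rule turns each .glsl file into a string default export.
declare module "*.glsl" {
  const source: string;
  export default source;
}
```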

Benchmarks

(The width of the body element is set to 900px in all tests.)

  • Initial rendering time in milliseconds

      Browser           Before   Plain   WebGL
      Edge (Avg)        765.61  586.26  494.49
      Edge (Stdev)       34.18   25.80   16.83
      Safari (Avg)      656.28  489.55  577.97
      Safari (Stdev)     40.76   32.15   40.17

  • Initial rendering time improvements (speedup relative to Before)

      Browser   Plain   WebGL
      Edge      x1.30   x1.55
      Safari    x1.34   x1.14

  • Rendering time in repeated rendering
    (The improvement becomes x1.55 to x1.7 with WebGL on Edge in subsequent renderings.)
    Microsoft Edge 99: [chart]
    Safari 15.4: [chart]

What if the browser does not support WebGL?

PlainSkyBottomLineBatchCalculatorBackend operates as the fallback logic.
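A minimal sketch of that fallback, assuming backend creation hinges on whether a WebGL context can be obtained. The interface and function names are hypothetical stand-ins; the PR's real classes (PlainSkyBottomLineBatchCalculatorBackend, WebGLSkyBottomLineBatchCalculatorBackend) may be wired differently. The context source is passed in as a function so the logic can run outside a browser.

```typescript
// Hypothetical stand-in for the real backend classes.
interface SkyBottomBackendSketch {
  readonly kind: "plain" | "webgl";
}

// Anything that can hand out a WebGL context, or null if WebGL is unsupported.
type WebGLContextFactory = () => object | null;

function createSkyBottomBackend(getContext: WebGLContextFactory): SkyBottomBackendSketch {
  let gl: object | null = null;
  try {
    gl = getContext();
  } catch {
    gl = null; // defensive: treat an exception like missing WebGL support
  }
  return gl !== null ? { kind: "webgl" } : { kind: "plain" };
}

// In a browser, the factory could be:
//   () => document.createElement("canvas").getContext("webgl")
```

`getContext("webgl")` returns null when WebGL is unavailable, so the plain backend is chosen without any special casing by the caller.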

What makes the WebGL version faster?

From the canvas where the measures are drawn (see the image at the top), the WebGL logic generates an image that has the same width as the canvas and whose height equals the number of rows of measures in the canvas. That is, the output has 300 times fewer pixels than the input, which makes gl.readPixels faster. The WebGL version also computes the lines of all measures in the canvas at the same time.
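The shape of that reduction can be emulated on the CPU: the input is an RGBA buffer of width x (rows x rowHeight) pixels, and the output holds one skyline and one bottom line value per column per row of measures, i.e. width x rows entries each. In the PR this work runs on the GPU with GLSL shaders; the sketch below is only a CPU stand-in, and treating any pixel with non-zero alpha as ink is an assumption.

```typescript
// CPU stand-in for the per-column reduction done on the GPU.
// Assumption: a pixel belongs to the drawing iff its alpha channel is non-zero.
interface RowLines {
  skyline: Float32Array;    // topmost ink y per (row, column), relative to the row; NaN if empty
  bottomLine: Float32Array; // bottommost ink y per (row, column), relative to the row; NaN if empty
}

function reduceToLines(rgba: Uint8ClampedArray, width: number, rows: number, rowHeight: number): RowLines {
  if (rgba.length !== width * rows * rowHeight * 4) {
    throw new Error("buffer size does not match width * rows * rowHeight * 4");
  }
  const skyline = new Float32Array(width * rows).fill(NaN);
  const bottomLine = new Float32Array(width * rows).fill(NaN);
  for (let row = 0; row < rows; row++) {
    for (let x = 0; x < width; x++) {
      const out = row * width + x; // one output "pixel" per column per row of measures
      for (let y = 0; y < rowHeight; y++) {
        const alpha = rgba[((row * rowHeight + y) * width + x) * 4 + 3];
        if (alpha !== 0) {
          if (Number.isNaN(skyline[out])) skyline[out] = y; // first ink pixel from the top
          bottomLine[out] = y;                               // last ink pixel seen so far
        }
      }
    }
  }
  return { skyline, bottomLine };
}
```

With a row height of 300 pixels, the output is 300 times smaller than the input, which is where the cheaper readback comes from.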

Why is the WebGL version slower on Safari?

While gl.texImage2D takes about 12 milliseconds on Microsoft Edge (Chromium), it takes about 90 milliseconds on Safari. gl.texImage2D converts the canvas where the measures are drawn into a texture, so there might be an implicit copy of the pixel data. Other browsers might show similar behavior. We have to test on as many browsers as we can, so that we can select a specific SkyBottomLineBatchCalculatorBackend according to the value of navigator.userAgent.
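A sketch of such a browser-based default. The rules mirror what this thread later settles on (Plain for Firefox and Safari, WebGL elsewhere, with Safari detected through navigator.vendor); the function name is hypothetical.

```typescript
type SkyBottomBackendKind = "plain" | "webgl";

function preferredSkyBottomBackend(userAgent: string, vendor: string): SkyBottomBackendKind {
  const isFirefox = userAgent.includes("Firefox");
  // Safari reports "Apple Computer, Inc." as navigator.vendor; Chrome and Edge report "Google Inc.".
  const isSafari = vendor.includes("Apple");
  return isFirefox || isSafari ? "plain" : "webgl";
}

// In a browser: preferredSkyBottomBackend(navigator.userAgent, navigator.vendor)
```

Checking navigator.vendor instead of searching the user agent for "Safari" matters because Chromium-based user agent strings also contain the token "Safari".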

Potential issues

  • The canvas where the measures are drawn is big (about 2400 x 1500), so there might be memory issues.
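For a rough sense of scale, one 2400 x 1500 canvas stored as 8-bit RGBA needs about 13.7 MiB. The canvas backing store, the ImageData copy returned by getImageData() and, with WebGL, the uploaded texture may each hold a buffer of this size, so peak usage could be a multiple of it; how many copies a browser actually keeps is implementation-dependent.

```typescript
// Back-of-the-envelope size of one RGBA8 pixel buffer for the batch canvas.
function rgbaBytes(width: number, height: number): number {
  return width * height * 4; // 4 channels (R, G, B, A), 1 byte each
}

const batchCanvasBytes = rgbaBytes(2400, 1500);          // 14,400,000 bytes
const batchCanvasMiB = batchCanvasBytes / (1024 * 1024); // about 13.7 MiB
```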

Benchmark code

(I don't have detailed benchmarks with other IOSMDOptions, but the trend is similar.)

import { IOSMDOptions, OpenSheetMusicDisplay } from "opensheetmusicdisplay";

const url = "https://opensheetmusicdisplay.github.io/demo/MuzioClementi_SonatinaOpus36No3_Part1.xml";

function getOSMDOptions(): IOSMDOptions {
  return {
    autoResize: true,
    backend: "svg",
    drawLyricist: false,
    drawLyrics: false,
    drawFingerings: true,
    drawTitle: false,
    drawComposer: false,
    drawCredits: false,
    drawSubtitle: false,
    drawPartNames: false,
    drawPartAbbreviations: false,
    drawingParameters: "compact",
  };
}

async function main() {
  const container = document.getElementById("container")! as HTMLDivElement;
  const options = getOSMDOptions();
  const osmd = new OpenSheetMusicDisplay(container, options);
  await osmd.load(url);

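  // Render the same score 10 times: the first sample is the initial render,
  // the remaining samples measure repeated rendering.
  const samples: number[] = [];
  for (let i = 0; i < 10; ++i) {
    const start = window.performance.now();
    osmd.render(); // render() is synchronous, so end - start is the full render time
    const end = window.performance.now();
    samples.push(end - start);
  }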

  for (const sample of samples) {
    console.log("OSMD: ", sample);
  }

  const result = `[${navigator.userAgent}] ${samples.join(",")}`;
  await fetch("C# backend URL - code is below", {
    method: "POST",
    body: result
  });
}

window.addEventListener("load", main);

C# backend (ASP.NET Core minimal API) that collects the samples:

using System.Text.RegularExpressions;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddCors(options =>
{
    options.AddDefaultPolicy(policyBuilder =>
    {
        policyBuilder.AllowAnyOrigin();
        policyBuilder.AllowAnyHeader();
        policyBuilder.AllowAnyMethod();
    });
});

var app = builder.Build();
var statistics = new Dictionary<string, List<List<double>>>();

app.UseCors();
app.MapPost("/", async context =>
{
    StreamReader reader = new(context.Request.Body);
    Regex bodyRegex = new(@"\[(.*)\] (.*)");

    string body = await reader.ReadToEndAsync();
    Match match = bodyRegex.Match(body);
    if (!match.Success)
        return;

    string userAgent = match.Groups[1].Value;
    string timeElapsed = match.Groups[2].Value;

    List<double> results = new();
    foreach (var element in timeElapsed.Split(","))
    {
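        // The browser always serializes numbers with '.' as the decimal separator,
        // so parse culture-independently instead of using the server's locale.
        if (!double.TryParse(element, System.Globalization.NumberStyles.Float,
                System.Globalization.CultureInfo.InvariantCulture, out double result))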
            return;
        results.Add(result);
    }

    if (!statistics.TryGetValue(userAgent, out List<List<double>>? list))
    {
        list = new List<List<double>>();
        statistics.Add(userAgent, list);
    }

    list.Add(results);
});

app.MapGet("/", () =>
    statistics.Select(pair =>
    {
        (var userAgent, List<List<double>> samples) = pair;
        int maxLength = samples.Select(list => list.Count).Max();

        List<ResultEntry> entries = new();
        for (int i = 0; i < maxLength; ++i)
        {
            double[] ithSamples = samples.Where(list => i < list.Count).Select(list => list[i]).ToArray();
            // Only the runs that have an i-th sample contribute, so divide by ithSamples.Length.
            double average = ithSamples.Sum() / ithSamples.Length;
            double standardDeviation = 0.0;
            if (ithSamples.Length != 1)
            {
                // Sample variance (n - 1 in the denominator)
                double variance = ithSamples.Select(sample => Math.Pow(sample - average, 2)).Sum() / (ithSamples.Length - 1);
                standardDeviation = Math.Sqrt(variance);
            }
            
            entries.Add(new ResultEntry(average, standardDeviation, ithSamples.Length));
        }
        
        return new Result(userAgent, entries);
    })
);

app.Run();

record Result(string UserAgent, IEnumerable<ResultEntry> Entries);

record ResultEntry(double Average, double StandardDeviation, int NumSamples);

@paxbun (Contributor, Author) commented Mar 23, 2022

Isn't there a possibility that the WebGL logic is not tested correctly due to the lack of GPU support in the CI/CD test environment?

@sschmidTU (Contributor) commented Mar 30, 2022

I thought gl provided the WebGLRenderingContext or was otherwise necessary for WebGL to be used in the browser, but that doesn't seem to be the case: in a fresh clone of osmd, I deleted gl from package.json entirely, and npm install and npm start worked without issues. Only npm run generate:current didn't work, of course, since it uses the library directly.

@paxbun Is it correct that the gl npm package is only needed for the generateImages_browserless.mjs script? If so, it definitely needs to be moved to the devDependencies section, if not to the optionalDependencies section (in package.json). It's fine to move to optionalDependencies as well, the developer just needs to be aware that they can't run visual regression tests then (or generate PNGs/SVGs via this script).

sschmidTU pushed a commit that referenced this pull request Mar 30, 2022
@sschmidTU (Contributor) commented Mar 31, 2022

For reference, one very small (and unimportant) issue I just noticed: I ran the visual regression tests with skylines and bottomlines enabled for all samples, and it found one diff, in OSMD_Function_Test_Drums_one_line_snare_plus_piano.musicxml:
before: [image]
after: [image]
diff: [image]

This is probably not important enough to try fixing. It would only change a small part of the skyline that doesn't affect anything and is invisible unless you enable the skyline. Just wanted to mention it for reference.
If that's the only change for all skylines in all our samples and otherwise it's just faster, that's fantastic!

@paxbun (Contributor, Author) commented Mar 31, 2022

@sschmidTU I am very, very sorry about this. gl is definitely for unit testing, so it must be in devDependencies, not in dependencies. WebGL is supported natively in all modern web browsers (even in Internet Explorer!), so a third-party library for WebGL is not needed.

@sschmidTU (Contributor) commented Mar 31, 2022

No problem, thanks for confirming!
I'll consider whether it should be optionalDependencies or devDependencies. Moving to devDependencies for now.

(see PR #1160 for this, which works around gl failing to install)

sschmidTU pushed a commit that referenced this pull request Mar 31, 2022
…#1158)

gl is only needed for the generateImages_browserless.mjs script (and thus for visual regression testing),
not for the main build.

still considering whether it should be optionalDependencies because of build issues in some cases,
see #1158
@paxbun (Contributor, Author) commented Mar 31, 2022

gl is hard to install in general (as @infojunkie mentioned), and also on macOS (which I'm using now), because the Python interpreter that was preinstalled in previous versions has been removed in macOS Monterey 12.3. So I think gl should be moved to optionalDependencies, and generateImages_browserless.mjs should load gl dynamically, so that gl is not used with the plain algorithms, and an error is shown if --webgl was given but gl is not installed. I will make a commit for this.
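The plan above (load gl only when it is needed, and fail loudly only if --webgl was requested) can be sketched as follows. The loader is passed in as a function so the logic can be shown without gl installed; in the actual .mjs script it would wrap a dynamic `import("gl")`. Both function names are hypothetical.

```typescript
// Resolve an optional dependency, returning undefined instead of throwing when it is missing.
async function loadOptional<T>(loader: () => Promise<T>): Promise<T | undefined> {
  try {
    return await loader();
  } catch {
    return undefined;
  }
}

// Decide which sky/bottom line backend the browserless script can use.
async function chooseBrowserlessBackend(
  wantWebGL: boolean,             // true if --webgl was given
  loadGl: () => Promise<unknown>, // e.g. () => import("gl") in the real script
): Promise<"plain" | "webgl"> {
  if (!wantWebGL) {
    return "plain"; // gl is never touched when the plain algorithm is used
  }
  const gl = await loadOptional(loadGl);
  if (gl === undefined) {
    throw new Error("--webgl was given, but the optional 'gl' package could not be loaded");
  }
  return "webgl";
}
```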

@sschmidTU (Contributor) commented Mar 31, 2022

Ah, yes, that sounds like a very good solution! I agree, I didn't know it was difficult to install on macOS as well, so putting it under optionalDependencies and trying to load it dynamically in the browserless script (otherwise using the non-gl version) sounds like the best solution for everyone.
The problem only affects the browserless script anyways, not the browser builds.

paxbun added a commit to paxbun/opensheetmusicdisplay that referenced this pull request Mar 31, 2022
@paxbun paxbun deleted the improvement/webgl-skybottom-line branch March 31, 2022 02:13
sschmidTU pushed a commit that referenced this pull request Mar 31, 2022
…ild errors (#1160)

The build of gl can fail on linux if you have gcc-11 (instead of gcc-10) installed,
so it was moved to optionalDependencies,
since it's only used in the generateImages_browserless script and visual regression tests,
and the import of gl was made dynamic with try catch,
so that the script still works even if gl could not be installed on your system.

squashed commits from PR #1160:

* chore: dynamic import() for gl in generateImages_browserless.mjs

* chore: move gl to optionalDependencies (from devDependencies) in package.json (#1158)

* chore: restore removed generate:blessed NPM script
sschmidTU pushed a commit that referenced this pull request Apr 19, 2022
…#1158)

Plain was clearly faster than WebGL for me in Firefox on Windows and in a Linux VM, see #1158
More performance testing, also on different machines, could lead to different results.
@sschmidTU (Contributor) commented Apr 19, 2022

@paxbun I did some performance benchmarks, and WebGL was clearly faster for me in Chrome and Edge, but clearly slower in Firefox. So I've now set the preferred backend to Plain instead of WebGL for Firefox as well. Obviously, more testing on different machines and better benchmarks would be great.

In #1160 (comment) you mentioned that non-batch was faster than batch for you on a Windows machine. Which browser did you use? Non-batch was about 3 times as slow as batch for me in Chrome, and around 10% slower in Firefox.

My benchmarks:

// render Actor prelude once:
console.time("actor");
osmd.clear();
await osmd.load("ActorPreludeSample.xml");
osmd.render();
console.timeEnd("actor");
// render a piece 30 times in a row (Beethoven Geliebte) (unrealistic use case);
// assumes the Beethoven piece was loaded with osmd.load() beforehand:
console.time("beethoven-30x");
for (let i = 1; i <= 30; i++) {
    osmd.render();
}
console.timeEnd("beethoven-30x");

Results (Windows):

WebGL-Chrome (Actor 1 times):
2528.541015625 ms
2816.9599609375 ms
2306.72998046875 ms
2286.68896484375 ms
2324.35302734375 ms
2347.677001953125 ms

Plain-Chrome (Actor 1 times):
4152.33203125 ms
3906.240966796875 ms
3926.009765625 ms
3952.625 ms
3905.329833984375 ms

WebGL-Chrome-nonbatch (Actor 1 times):
6858.423095703125 ms
7329.387939453125 ms
6508.819091796875 ms
6274.072265625 ms
6364.10302734375 ms

WebGL-LinuxVM-Firefox (Actor 1 times):
7585ms
7859ms
7815ms
7960ms
7968ms

Plain-LinuxVM-Firefox (Actor 1 times):
7355ms
6150ms
6266ms
6415ms
6143ms

WebGL-Chrome (Beethoven 1 times):
459.828125 ms
300.888916015625 ms
471.35205078125 ms
378.969970703125 ms
327.675048828125 ms

Plain-Chrome (Beethoven 1 times):
358.01611328125 ms
377.591796875 ms
346.84619140625 ms
340.8759765625 ms
310.9580078125 ms

WebGL-Firefox (Actor 1 times):
7053ms
7326ms
6897ms
7197ms
6981ms

Plain-Firefox (Actor 1 times):
5411ms
5306ms
5284ms
5334ms
5282ms

Plain-Firefox-nonbatch (Actor 1 times):
5682ms
6352ms
5499ms
5392ms
5883ms

WebGL-Edge (Actor 1 times):
2465.232177734375 ms
2216.76416015625 ms
2071.905029296875 ms
2005.43212890625 ms
2031.97509765625 ms

Edge-plain (Actor 1 times):
3639.065185546875 ms
3543.73583984375 ms
3614.02197265625 ms
3571.411865234375 ms
3749.271728515625 ms
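To condense the Windows results above into one number per browser, the mean run times can be compared directly. This helper is only a convenience for reading the posted data; it uses the Actor prelude samples from the Chrome and Firefox lists exactly as listed.

```typescript
function mean(samples: readonly number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// > 1: WebGL was faster than Plain; < 1: WebGL was slower.
function webglSpeedup(plainMs: readonly number[], webglMs: readonly number[]): number {
  return mean(plainMs) / mean(webglMs);
}

// Actor prelude, rendered once (Windows), copied from the lists above.
const chromeWebGL = [2528.541015625, 2816.9599609375, 2306.72998046875, 2286.68896484375, 2324.35302734375, 2347.677001953125];
const chromePlain = [4152.33203125, 3906.240966796875, 3926.009765625, 3952.625, 3905.329833984375];
const firefoxWebGL = [7053, 7326, 6897, 7197, 6981];
const firefoxPlain = [5411, 5306, 5284, 5334, 5282];

const chromeSpeedup = webglSpeedup(chromePlain, chromeWebGL);    // about 1.63
const firefoxSpeedup = webglSpeedup(firefoxPlain, firefoxWebGL); // about 0.75
```

That is roughly a 1.6x gain in Chrome, while in Firefox WebGL takes about 1.33 times as long as Plain.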

Rendering the same piece 30 times in a row is slightly slower with WebGL than with Plain, but I think that's an unrealistic use case, and the difference might be due to the startup cost of creating a WebGL context (sometimes the browser also complains that too many WebGL contexts have been created):
WebGL-Chrome (30 renders, Beethoven Geliebte):
7173.1201171875 ms
7107.591064453125 ms
7045.744140625 ms
6982.450927734375 ms
7134.962158203125 ms

Plain-Chrome (30 renders, Beethoven Geliebte):
6159.014892578125 ms
6113.953125 ms
5474.030029296875 ms
5697.508056640625 ms
5676.974853515625 ms

This was on a Windows machine, with other background processes using around 18% CPU.

The most important performance use case in OSMD is loading one big piece once, which can take several seconds on slow machines. In Chrome and Edge, that's significantly faster with WebGL. (In Firefox and Safari, we use Plain by default.)

sschmidTU pushed a commit that referenced this pull request Apr 19, 2022
@paxbun (Contributor, Author) commented Apr 19, 2022

@sschmidTU It was tested on Windows Edge 99 as well, but with only 100 measures. I don't have detailed results now, but the trend was as follows:
[Image: hand-drawn graph comparing the rendering time trends of Plain and WebGL]

That's why I said WebGL was slower. The slope of WebGL's graph was obviously more gradual, but I thought the x-intercept was too high.
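The trade-off described here, a higher fixed cost for WebGL but a gentler slope, can be captured with a simple linear model, time(n) = intercept + slope * n, where n is the number of graphical measures; the crossover is where the two lines meet. The coefficients in the example are made-up illustrative values, not measurements from this thread.

```typescript
// Linear cost model: time(n) = interceptMs + slopeMsPerMeasure * n
interface LinearCost {
  interceptMs: number;
  slopeMsPerMeasure: number;
}

// Number of graphical measures at which WebGL stops being slower than Plain.
// Only handles the case described above (WebGL has the gentler slope);
// returns undefined otherwise.
function webglCrossover(plain: LinearCost, webgl: LinearCost): number | undefined {
  const slopeGain = plain.slopeMsPerMeasure - webgl.slopeMsPerMeasure;
  if (slopeGain <= 0) return undefined;
  return Math.max(0, (webgl.interceptMs - plain.interceptMs) / slopeGain);
}

// Illustrative numbers only: WebGL costs 100 ms more up front but 2 ms less per measure.
const crossoverMeasures = webglCrossover(
  { interceptMs: 20, slopeMsPerMeasure: 5 },
  { interceptMs: 120, slopeMsPerMeasure: 3 },
); // 50
```

Below the crossover the startup cost dominates, which is the reasoning behind using WebGL only above a minimum number of graphical measures.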

@sschmidTU (Contributor) commented Apr 19, 2022

Yes, it seems like WebGL has some startup/windup cost. I like your hand-drawn graph!
But as I said, the most important performance issue in OSMD is rendering large scores not too slowly. It's important whether a large score takes 4 seconds or 7 seconds, but it's not so important whether a small score takes 350ms or 400ms.
That's why I (would) set the default to WebGL and batch for Chrome and Edge.

@sschmidTU (Contributor) commented Apr 19, 2022

@paxbun We could only use WebGL if the sheet has a minimum number of measures, like with batch processing. Our default Beethoven piece has 15 measures and 3 staves = 45 graphical measures (3x15) and it's slightly slower with WebGL, so maybe we start using WebGL at 60 graphical measures minimum?
(edit: now set to 80, since scores with 60 graphical measures were still slightly faster or just as fast with plain)

So, we should count graphical measures, not just the "number of measures" (SourceMeasures) the piece has. For example, the Beethoven piece has 15 measures, multiplied by 3 instruments/staves (plural of staff) (voice, piano left+right hand). The Actor prelude, which is our default very large piece for testing, has "only" 33 measures, but it also has 23 staves, so 33*23 = 759 graphical measures.
To get the number of staves (vertical staffs/graphical measures per measure), you can check osmd.Sheet.Staves.length or osmd.GraphicSheet.MeasureList[0].length; multiplying it by osmd.Sheet.SourceMeasures.length gives the total number of graphical measures.
Or just go through the whole 2D-Array osmd.GraphicSheet.MeasureList[i][j] and count each (non-undefined) graphical measure in a loop, which could be more accurate in some cases (e.g. with multiple measure rests, where the graphical measures that are included in the multiple rest measure are undefined, because they're not rendered).
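A sketch of that loop plus the threshold decision. It assumes, as described above, that the outer index of the measure list is the source measure and the inner index the staff, and that measures hidden inside a multi-measure rest are undefined. The constant name is hypothetical; the value 80 is the one chosen in this thread, and treating it as an inclusive minimum is an assumption.

```typescript
// Count graphical measures that are actually rendered: entries of a 2D list
// (like osmd.GraphicSheet.MeasureList[i][j]) that are not undefined.
function countRenderedMeasures<T>(measureList: ReadonlyArray<ReadonlyArray<T | undefined>>): number {
  let count = 0;
  for (const measuresOfSourceMeasure of measureList) { // one entry per staff
    for (const measure of measuresOfSourceMeasure) {
      if (measure !== undefined) {
        count++; // undefined entries (inside a multi-measure rest) are skipped
      }
    }
  }
  return count;
}

// Hypothetical constant name; 80 is the threshold settled on in this thread.
const MIN_MEASURES_FOR_WEBGL = 80;

function shouldUseWebGL(renderedMeasures: number): boolean {
  return renderedMeasures >= MIN_MEASURES_FOR_WEBGL;
}
```

For the Beethoven piece (15 measures x 3 staves) this gives 45, below the threshold, while the Actor prelude (33 x 23) gives 759.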

Would you be interested in writing some tests or benchmarks for performance? We could also use something like BrowserStack or Selenium to automatically test different browsers and systems.

@paxbun (Contributor, Author) commented Apr 20, 2022

@sschmidTU

... maybe we start using WebGL at 60 graphical measures minimum?

I think 60 is a reasonable choice. I set this to 5, but considering the test results on my Windows machine and yours, 60 is much better than 5.

To get the number of staves ...

I thought I was counting the number of graphical measures (as in VexFlowMusicSheetCalculator.ts), but does this calculate the total number of graphical measures incorrectly? BTW, are measures filled with multi-measure rests also counted as graphical measures?

Would you be interested in writing some tests or benchmarks for performance?

I would really like to, but since I'm working on another project now, I can't guarantee that I can write benchmarks for this at the moment. I will let you know before this weekend, once I finish my current task.

@sschmidTU (Contributor) commented Apr 20, 2022

I thought I was counting the number of graphical measures (as in VexFlowMusicSheetCalculator.ts), but does this calculate the total number of graphical measures incorrectly?

This does look correct, that's just a different way to get all the graphical measures. (though it doesn't respect multi-measure rests, see below)

BTW, are measures filled with multi-measure rests also counted as graphical measures?

A measure with a multi-rest is one graphical measure. The graphical measures that would appear if we didn't use a multi-rest measure are undefined and not rendered.
For example here, measure 1 is a (multi-rest) graphical measure that is rendered, while graphical measures 2-5 are undefined and not rendered.
[Image: measure 1 rendered as a multi-measure rest covering measures 1-5]

So, a piece with multiple measure rests rendered has fewer graphical measures than Staffs x SourceMeasures.
You would still get the right number of graphical measures by not counting those that are undefined. So I guess that is an adjustment that would need to be made in the line of code you referenced above.

sschmidTU pushed a commit that referenced this pull request Apr 22, 2022
…nd AlwaysSetPreferredSkyBottomLineBackendAutomatically (#1158), fix measure counting

fix measure threshold to include only graphical measures that are rendered (applies to multi-measure rests)
refactors, comments, jsdoc

getting ready for release 1.5.0
sschmidTU pushed a commit that referenced this pull request Apr 22, 2022
…leWebGLInSafariAndIOS for options (#1158)

Currently WebGL is always slower in Firefox and Safari, but that may change with new versions of these browsers,
so it's good to have the option to enable WebGL in these cases
@sschmidTU (Contributor) commented Apr 22, 2022

I did some performance tests on a MacBook Pro (late 2012),
basically confirming that WebGL is slower in Safari and Firefox,
but WebGL is faster in Chrome on MacOS:

Webgl Safari Actor
19082.098ms
16388.795ms
15661.948ms
14636.030ms
14693.365ms

Plain Safari Actor
14736.309ms
13940.483ms
14715.820ms
14822.173ms
14957.695ms

Plain Firefox Actor
11600ms
10208ms
9771ms
10323ms
9860ms

WebGL Firefox Actor
13191ms
12929ms
13251ms
12298ms
13142ms

WebGL Chrome (MacOS) Actor
7257.363037109375 ms
6878.738037109375 ms
6146.2353515625 ms
6081.171142578125 ms
6288.977783203125 ms

Plain Chrome (MacOS) Actor
9005.740966796875 ms
9246.75390625 ms
9706.58203125 ms
9239.13720703125 ms
9062.65869140625 ms

(WebGL is enabled by default for Chrome on MacOS, because navigator.vendor doesn't include Apple)
