Fractions comparison efficiency #4699

velochy · 2019-02-18T16:18:18Z

Unrolled the recursion in gcd and inlined both gcd and lcm.
Handled special case where denominators match.
Any comments welcome.

jthistle

Looks good apart from the few style things I've mentioned. Also, since we're making lcm and gcd inline, why not make Fraction::ticks() inline as well to save on the overheads? It'd probably make a significant, if not immediately noticeable difference.

jthistle · 2019-02-18T16:22:24Z

libmscore/fraction.cpp

@@ -139,6 +141,7 @@ Fraction& Fraction::operator*=(const Fraction& val)
      {
      _numerator *= val._numerator;
      _denominator  *= val._denominator;
+      if (val._denominator!=1) reduce(); // We should be free to fully reduce here


space around operators again

jthistle · 2019-02-18T16:22:43Z

libmscore/fraction.cpp

@@ -156,6 +159,8 @@ Fraction& Fraction::operator/=(const Fraction& val)
      {
      _numerator   *= val._denominator;
      _denominator *= val._numerator;
+      if (_denominator<0) { _denominator *= -1; _numerator *= -1; }
+      if (val._numerator!=1) reduce();


space around operators again
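For readers following the hunk above, here is a minimal, self-contained sketch of the division step. It is not the MuseScore source: `Frac`, `num`, `den` and the helper `gcd` are hypothetical stand-ins, and the sketch assumes denominators are kept positive and the divisor is non-zero.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the Fraction class; names are illustrative only.
struct Frac {
      int num;
      int den;   // assumed positive on entry

      static int gcd(int a, int b)
            {
            while (b != 0) { const int r = a % b; a = b; b = r; }
            return std::abs(a);
            }

      void reduce()
            {
            const int g = gcd(num, den);
            if (g > 1) { num /= g; den /= g; }
            }

      Frac& operator/=(const Frac& val)
            {
            assert(val.num != 0);            // division by zero is not handled here
            num *= val.den;                  // multiply by the reciprocal
            den *= val.num;
            if (den < 0) {                   // keep the sign in the numerator
                  den = -den;
                  num = -num;
                  }
            if (val.num != 1)                // mirrors the condition in the diff
                  reduce();
            return *this;
            }
      };
```

As in the diff, the result is left unreduced when the divisor's numerator is 1, so callers of this sketch cannot assume the fraction is always in lowest terms.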

jthistle · 2019-02-18T16:23:36Z

libmscore/fraction.cpp

+      if (_denominator == val._denominator) _numerator += val._numerator;  // Common enough use case to be handled separately for efficiency
+      else {
+            int l = lcm(_denominator,val._denominator);
+            _numerator = _numerator * (l/_denominator) + val._numerator * (l/val._denominator);


space around operators again
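The fast path in this hunk can be sketched in isolation as follows. This is an illustration under assumptions, not the merged code: `Frac`, `num` and `den` are hypothetical names, denominators are assumed positive, and the assignment of the new denominator is added here because the hunk above is cut off before that line.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the Fraction class; names are illustrative only.
struct Frac {
      int num;
      int den;   // assumed positive

      static int gcd(int a, int b)
            {
            while (b != 0) { const int r = a % b; a = b; b = r; }
            return std::abs(a);
            }

      static int lcm(int a, int b) { return (a / gcd(a, b)) * b; }

      Frac& operator+=(const Frac& val)
            {
            assert(den > 0 && val.den > 0);
            if (den == val.den)
                  num += val.num;            // equal denominators: no lcm needed
            else {
                  const int l = lcm(den, val.den);
                  num = num * (l / den) + val.num * (l / val.den);
                  den = l;                   // assumed; not visible in the hunk
                  }
            return *this;
            }
      };
```

The equal-denominator branch skips the gcd loop and two divisions entirely, which is where the saving comes from when many values share a denominator.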

jthistle · 2019-02-18T16:24:11Z

libmscore/fraction.cpp

      {
-      const auto product = static_cast<int_least64_t>(a) * b;
-      return static_cast<int>(product / gcd(a, b));
+      return (a/gcd(a, b)) * b; // Divide first to minimize overflow risk


Needs space around operators, as per MuseScore style guidelines.

jthistle · 2019-02-18T16:24:16Z

libmscore/fraction.cpp

-            return a < 0 ? -a : a;
-      return gcd(b, a % b);
+      int bp;
+      while(b!=0) {


Needs space around operators, as per MuseScore style guidelines.
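The two hunks above are only shown in part, so here is a hedged sketch of what an unrolled gcd and a divide-first lcm can look like with the requested spacing. The temporary `r` and the zero check are assumptions of this sketch; the PR itself uses a variable named `bp` whose use is not visible in the excerpt.

```cpp
#include <cassert>
#include <cstdlib>

// Iterative Euclid: same result as the recursive form, without nested calls.
static int gcd(int a, int b)
      {
      while (b != 0) {
            const int r = a % b;   // remainder of the current pair
            a = b;
            b = r;
            }
      return std::abs(a);          // always report a non-negative divisor
      }

// Divide before multiplying so the intermediate value stays small.
static int lcm(int a, int b)
      {
      assert(a != 0 && b != 0);    // assumption: callers never pass zero
      return (a / gcd(a, b)) * b;
      }
```

Dividing first keeps the intermediate no larger than the final result, whereas the removed version widened to `int_least64_t` so that the product `a * b` could not overflow.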

dmitrio95 · 2019-02-18T16:49:12Z

libmscore/fraction.cpp

@@ -21,22 +21,24 @@ namespace Ms {
 //    greatest common divisor
 //---------------------------------------------------------

-static int gcd(int a, int b)
+static inline int gcd(int a, int b)


inline doesn't actually make sense here. This keyword does not influence whether a compiler inlines a function call; actual inlining depends on compiler settings. A compiler should in any case be able to inline these function calls, since the function is defined in the same translation unit as the functions that use it (and before them).

But would it make sense, and if so would you recommend it, for ticks()?

Giving the compiler the possibility to inline functions would certainly make sense for something used very often but, as @shoogle already noted, it should be done by moving the definition to a header file rather than by simply prepending the inline keyword.

shoogle · 2019-02-18T16:59:04Z

To get the real speed increase you should try moving all definitions into fraction.h. Having everything in the .h file allows the compiler to inline the code rather than emit calls that have to be resolved at link time, but you should let the compiler decide whether to do this rather than using the inline keyword. Of course this comes at the cost of having to recompile everything whenever fraction.h changes, but that won't happen very often. There doesn't really need to be a .cpp file for something as basic and fundamental as Fraction.
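A minimal sketch of the header-only layout suggested above, assuming a hypothetical class `FracSketch` in a hypothetical header; it is not the contents of fraction.h. Member functions defined inside the class body are implicitly inline in C++, so every translation unit that includes the header sees their definitions and the compiler is free to inline them or not.

```cpp
// fraction_sketch.h (hypothetical file name, for illustration only)
#ifndef FRACTION_SKETCH_H
#define FRACTION_SKETCH_H

#include <cassert>

class FracSketch {
      int _numerator;
      int _denominator;

   public:
      FracSketch(int n, int d) : _numerator(n), _denominator(d) { assert(d > 0); }

      // Defined in the class body: no inline keyword is needed.
      int numerator() const   { return _numerator; }
      int denominator() const { return _denominator; }

      // Cross-multiplication; valid because denominators are assumed positive.
      // Widening to long long avoids overflow of the int products.
      bool operator<(const FracSketch& o) const
            {
            return static_cast<long long>(_numerator) * o._denominator
                 < static_cast<long long>(o._numerator) * _denominator;
            }
      };

#endif
```

The trade-off is the one already mentioned: any change to such a header forces every file that includes it to be recompiled.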


shoogle · 2019-02-20T17:40:03Z

Benchmarks:

Hardware

2015 MacBook Pro with Retina display. (specs)

OS: Ubuntu 18.04.2 LTS (Bionic Beaver)
CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz (from output of lscpu)
- 4 cores + hyper-threading (8 logical cores)
RAM: 16GB
- 2x 8GiB SODIMM DDR3 Synchronous 1600 MHz (from output of sudo lshw -short -C memory)

Software

MuseScore AppImage nightly builds:

MuseScoreNightly-201902171259-master-8057bc5-x86_64.AppImage
- Build from 3 days ago. Uses integer ticks.
MuseScoreNightly-201902182337-master-95ce138-x86_64.AppImage
- Build from 2 days ago. Uses unoptimized fractions.
MuseScoreNightly-201902201112-master-cb404f9-x86_64.AppImage
- Build from today. Uses optimized fractions.

Input

I used the OpenScore Edition of Mozart's Jupiter Symphony.

Pages: 157
Duration: 39m19s
Measures: 925
Parts: 17

The score was created in MuseScore 2, so I converted it to MuseScore 3 format before running these tests.

Method

Each test is the time (from Bash time) taken to convert the MSCZ score to one other format on the command line. Each test was run three times and the median (middle) value is reported. The WAV tests were only run once.
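The arithmetic behind the reported figures can be sketched as below. The function names `median3` and `percentChange` are hypothetical helpers introduced for illustration; the benchmark itself used Bash time and a spreadsheet, not this code.

```cpp
#include <algorithm>
#include <cassert>

// Median (middle value) of three samples, without sorting.
inline double median3(double a, double b, double c)
      {
      return std::max(std::min(a, b), std::min(std::max(a, b), c));
      }

// Percentage change of `value` relative to `baseline`.
inline double percentChange(double baseline, double value)
      {
      assert(baseline != 0.0);   // a zero baseline has no meaningful percentage
      return (value - baseline) / baseline * 100.0;
      }
```

For example, `percentChange(7.835, 8.370)` evaluates to roughly 6.83, which is the figure reported for the MSCX real time with unoptimized fractions.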

Results

| Format | Time | Ticks | Fractions (unoptimized) | Fractions (optimised) |
|---|---|---|---|---|
| MSCX | real | 7.835s | 8.370s (+6.83%) | 8.222s (+4.94%) |
| MSCX | user | 6.056s | 6.407s (+5.80%) | 6.301s (+4.05%) |
| MSCX | sys | 0.630s | 0.697s (+10.6%) | 0.664s (+5.40%) |
| MSCX | user+sys | 6.686s | 7.104s (+6.25%) | 6.965s (+4.17%) |
| MIDI | real | 5.415s | 6.094s (+12.5%) | 5.657s (+4.47%) |
| MIDI | user | 4.109s | 4.839s (+17.8%) | 4.323s (+5.21%) |
| MIDI | sys | 0.139s | 0.128s (-7.91%) | 0.146s (+5.04%) |
| MIDI | user+sys | 4.248s | 4.967s (+16.9%) | 4.469s (+5.20%) |
| PDF | real | 5.081s | 5.338s (+5.06%) | 5.207s (+2.48%) |
| PDF | user | 3.574s | 3.810s (+6.60%) | 3.630s (+1.57%) |
| PDF | sys | 0.181s | 0.163s (-9.94%) | 0.175s (-3.31%) |
| PDF | user+sys | 3.755s | 3.973s (+5.81%) | 3.805s (+1.33%) |
| SVG | real | 11.382s | 11.730s (+3.06%) | 11.455s (+0.641%) |
| SVG | user | 9.795s | 10.164s (+3.77%) | 10.011s (+2.21%) |
| SVG | sys | 0.320s | 0.351s (+9.69%) | 0.330s (+3.13%) |
| SVG | user+sys | 10.115s | 10.515s (+3.95%) | 10.341s (+2.23%) |
| WAV | real | 1m50.981s | 1m51.892s (+0.821%) | 1m50.948s (-0.0297%) |
| WAV | user | 1m47.836s | 1m49.027s (+1.10%) | 1m48.552s (+0.664%) |
| WAV | sys | 0.983s | 0.969s (-1.42%) | 0.867s (-11.8%) |
| WAV | user+sys | 1m48.819s | 1m49.996s (+1.08%) | 1m49.419s (+0.551%) |

Notes:

See this StackOverflow answer for the meaning of real, user and sys.
Markdown table was generated with this spreadsheet.

Interpretation

Overall, unoptimised Fractions led to around a 5% slowdown compared to integer ticks, but @velochy's optimisations claw back about a third of this penalty. A pretty good result then!

The optimizations were definitely worth it in this case, but we should remember that this is only because Fractions are used literally everywhere in the code! For most other things it is far more important to write clean code that is easy to maintain than to spend time worrying about data types and optimizations.

In particular, the trick of putting everything in the .h file must not be used for classes that change frequently, as this causes all classes that #include the header file to be recompiled each time.

shoogle · 2019-02-20T18:32:30Z

I can also confirm that with the exception of the PDFs, all files created by each version of MuseScore were byte-for-byte identical. Even the WAV files were identical, so I guess we will need a file with some extreme tuplets to see a difference there.

I thought the difference between the PDF files was down to them just containing a timestamp of when the file was created, but the output of git diff --no-index --text file1.pdf file2.pdf shows there are more differences than that, including some different fonts (!), so maybe this was affected by other PRs in the last few days. The PDFs are created by Qt anyway, so they are something of a black box. The fact that the SVGs were identical is probably more informative.

shoogle · 2019-02-24T15:20:52Z

I was asked to update the benchmark to include MuseScore 2.3.2 for comparison. Unfortunately there isn't room for another column in the original table, so here is a new table with the ticks and optimised fractions columns copied from the previous one.

Code

for fmt in mscx mid pdf svg wav; do
  echo "#### Format: $fmt ####"
  for n in {1..3}; do
    sleep 2 # allow time to recover from previous run
    time MuseScore*.AppImage Jupiter.mscz -o ms2.$fmt 2>/dev/null
  done
  ls -l ms2.$fmt
  md5sum ms2.$fmt
done

Results

| Format | Time | Ticks | Fractions (optimised) | MuseScore 2.3.2 |
|---|---|---|---|---|
| MSCX | real | 7.835s | 8.222s (+4.94%) | 30.369s (+288%) |
| MSCX | user | 6.056s | 6.301s (+4.05%) | 28.338s (+368%) |
| MSCX | sys | 0.630s | 0.664s (+5.40%) | 0.588s (-6.67%) |
| MSCX | user+sys | 6.686s | 6.965s (+4.17%) | 28.926s (+333%) |
| MIDI | real | 5.415s | 5.657s (+4.47%) | 7.753s (+43.2%) |
| MIDI | user | 4.109s | 4.323s (+5.21%) | 6.321s (+53.8%) |
| MIDI | sys | 0.139s | 0.146s (+5.04%) | 0.174s (+25.2%) |
| MIDI | user+sys | 4.248s | 4.469s (+5.20%) | 6.495s (+52.9%) |
| PDF | real | 5.081s | 5.207s (+2.48%) | 7.565s (+48.9%) |
| PDF | user | 3.574s | 3.630s (+1.57%) | 5.947s (+66.4%) |
| PDF | sys | 0.181s | 0.175s (-3.31%) | 0.168s (-7.18%) |
| PDF | user+sys | 3.755s | 3.805s (+1.33%) | 6.115s (+62.8%) |
| SVG | real | 11.382s | 11.455s (+0.641%) | 13.249s (+16.4%) |
| SVG | user | 9.795s | 10.011s (+2.21%) | 11.637s (+18.8%) |
| SVG | sys | 0.320s | 0.330s (+3.13%) | 0.353s (+10.3%) |
| SVG | user+sys | 10.115s | 10.341s (+2.23%) | 11.990s (+18.5%) |
| WAV | real | 1m50.981s | 1m50.948s (-0.0297%) | 2m8.275s (+15.6%) |
| WAV | user | 1m47.836s | 1m48.552s (+0.664%) | 2m5.504s (+16.4%) |
| WAV | sys | 0.983s | 0.867s (-11.8%) | 0.845s (-14.0%) |
| WAV | user+sys | 1m48.819s | 1m49.419s (+0.551%) | 2m6.349s (+16.1%) |

Interpretation

MuseScore 3 is significantly faster than MuseScore 2, regardless of whether MuseScore 3 uses fractions or integer ticks. This will be due to MuseScore 3 using more efficient algorithms for expensive operations like layout. The benefits of small scale optimizations to data types pale in comparison to these changes. This demonstrates that it is more important to ensure low level classes are easy to use than to optimize them for speed, as the penalty of using them incorrectly will be much greater.

Edit: Naturally, the above interpretation is only valid for converting files on the command line. It does not necessarily apply to the GUI, though we know that the layout improvements in MuseScore 3 have made the GUI much faster for large scores. However, it is possible that regressions elsewhere have led to slower performance with smaller scores where layout is less of a burden.

anatoly-os · 2019-02-24T15:58:37Z

Thank you @shoogle. I think we need more benchmarks for common operations like inserting notes to a big score, deleting measures, etc. @dmitrio95's script tests could be used for such benchmarks. It is interesting to compare 2.3.2 and current master.

shoogle · 2019-03-03T00:44:12Z

Here is how the binary sizes compare.

`mscore` executable (release build)

| Version | Build | Size (bytes) | Size (MB) | Change |
|---|---|---|---|---|
| 2.3.2 | Integer ticks | 24674456 | 24.7 | -7.71% |
| 3.0 nightly | Integer ticks | 26737664 | 26.7 | (reference) |
| 3.0 nightly | Unoptimized fractions | 26844160 | 26.8 | +0.398% |
| 3.0 nightly | Optimized fractions | 26958848 | 27.0 | +0.827% |

Switching to fractions increased MuseScore 3's size by less than half a percent. Optimizing the fractions (which potentially involves the compiler inlining the functions in fraction.h) led to a similar increase in size, but improved speed by 2% or better compared to unoptimized fractions. The overall size increase is less than 1%.

MuseScore 3 is bigger than MuseScore 2 due to it having more features. Resources embedded via the Qt Resource System also contribute to the size of the executable, so any additional resources would also lead to an increase in size (and therefore memory footprint).

Size of AppImage

Note: Type 1 AppImages are compressed ISO 9660 filesystems.

| Version | Build | Size (bytes) | Size (MB) | Change |
|---|---|---|---|---|
| 2.3.2 | Integer ticks | 122748928 | 122.75 | -16.8% |
| 3.0 nightly | Integer ticks | 147587072 | 147.59 | (reference) |
| 3.0 nightly | Unoptimized fractions | 147652608 | 147.65 | +0.044% |
| 3.0 nightly | Optimized fractions | 147718144 | 147.72 | +0.089% |

In terms of switching to fractions, the AppImage compression has reduced the size differences by a factor of 10. This means that the size of the file that users actually download and that takes up disk space on their system has increased by less than 0.1%.

As well as the binary executable, the AppImage also contains all libraries, templates, plugins, soundfonts and icons. These accompanying resources are responsible for most of the increase in size of MuseScore 3's AppImage compared to MuseScore 2's. MuseScore 3 uses a newer version of Qt, for example.

jthistle reviewed Feb 18, 2019


dmitrio95 reviewed Feb 18, 2019


velochy force-pushed the FractionPerformance branch 5 times, most recently from f6cc513 to ffb9cb5 Compare February 19, 2019 10:29

Moved fractions computation to .h file, unrolled gcd, made computatio…

70868cd

…ns more efficient

velochy force-pushed the FractionPerformance branch from ffb9cb5 to 70868cd Compare February 19, 2019 14:10

anatoly-os merged commit cb404f9 into musescore:master Feb 20, 2019

dmitrio95 mentioned this pull request Mar 5, 2019

Skylines construction optimization #4768

Merged

velochy deleted the FractionPerformance branch April 10, 2019 10:28
