Fractions comparison efficiency #4699

Merged 1 commit on Feb 20, 2019

@velochy velochy commented Feb 18, 2019

Unrolled the recursion in gcd and inlined both gcd and lcm.
Handled special case where denominators match.
Any comments welcome.
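
For readers following along, the changes described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the names `gcdRecursive`, `gcdIterative` and `lcmDivideFirst` are hypothetical, and `constexpr` is used only so that the results can be checked at compile time.

```cpp
// Sketch only: illustrative names, not the functions in MuseScore's fraction.cpp.

// Recursive Euclid, the general shape of the code before the change.
constexpr int gcdRecursive(int a, int b)
{
    return b == 0 ? (a < 0 ? -a : a) : gcdRecursive(b, a % b);
}

// The same algorithm with the recursion unrolled into a loop.
constexpr int gcdIterative(int a, int b)
{
    while (b != 0) {
        const int previous = b;
        b = a % b;
        a = previous;
    }
    return a < 0 ? -a : a;
}

// Dividing before multiplying keeps the intermediate value small, which
// lowers (but does not remove) the risk of int overflow.
// Assumes a and b are not both zero, as is the case for denominators.
constexpr int lcmDivideFirst(int a, int b)
{
    return (a / gcdIterative(a, b)) * b;
}
```

Both gcd variants return the same values; the loop version simply avoids relying on the compiler to eliminate the recursive call.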

@jthistle jthistle left a comment

Looks good apart from the few style things I've mentioned. Also, since we're making lcm and gcd inline, why not make Fraction::ticks() inline as well to save on the overheads? It'd probably make a significant, if not immediately noticeable difference.

@@ -139,6 +141,7 @@ Fraction& Fraction::operator*=(const Fraction& val)
{
_numerator *= val._numerator;
_denominator *= val._denominator;
if (val._denominator!=1) reduce(); // We should be free to fully reduce here

space around operators again

@@ -156,6 +159,8 @@ Fraction& Fraction::operator/=(const Fraction& val)
{
_numerator *= val._denominator;
_denominator *= val._numerator;
if (_denominator<0) { _denominator *= -1; _numerator *= -1; }
if (val._numerator!=1) reduce();

space around operators again

if (_denominator == val._denominator) _numerator += val._numerator; // Common enough use case to be handled separately for efficiency
else {
int l = lcm(_denominator,val._denominator);
_numerator = _numerator * (l/_denominator) + val._numerator * (l/val._denominator);

space around operators again
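
The equal-denominator fast path in the hunk above can be sketched in isolation as follows. This is a simplified stand-in under stated assumptions: `Frac`, `sketchGcd`, `sketchLcm` and `add` are hypothetical names, the real class uses `operator+=` on private members, and the result is left unreduced here.

```cpp
// Sketch only: a stripped-down stand-in for MuseScore's Fraction class.
struct Frac {
    int num;
    int den;
};

constexpr int sketchGcd(int a, int b)
{
    while (b != 0) {
        const int previous = b;
        b = a % b;
        a = previous;
    }
    return a < 0 ? -a : a;
}

constexpr int sketchLcm(int a, int b)
{
    return (a / sketchGcd(a, b)) * b; // divide first to keep values small
}

constexpr Frac add(Frac lhs, Frac rhs)
{
    if (lhs.den == rhs.den) {
        // Fast path: matching denominators need no gcd or lcm at all.
        return Frac{lhs.num + rhs.num, lhs.den};
    }
    // General path: bring both fractions to the least common denominator.
    const int l = sketchLcm(lhs.den, rhs.den);
    return Frac{lhs.num * (l / lhs.den) + rhs.num * (l / rhs.den), l};
}
```

For example, `add(Frac{1, 4}, Frac{1, 4})` takes the fast path and yields 2/4, while `add(Frac{1, 4}, Frac{1, 6})` goes through the lcm and yields 5/12.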

{
- const auto product = static_cast<int_least64_t>(a) * b;
- return static_cast<int>(product / gcd(a, b));
+ return (a/gcd(a, b)) * b; // Divide first to minimize overflow risk

Needs space around operators, as per MuseScore style guidelines.

- return a < 0 ? -a : a;
- return gcd(b, a % b);
+ int bp;
+ while(b!=0) {

Needs space around operators, as per MuseScore style guidelines.

@@ -21,22 +21,24 @@ namespace Ms {
// greatest common divisor
//---------------------------------------------------------

- static int gcd(int a, int b)
+ static inline int gcd(int a, int b)

inline doesn't actually make sense here. This keyword does not influence function call inlining by a compiler, and actual inlining depends on compiler settings. A compiler should be able to inline these function calls anyway, since the function is defined in the same translation unit as the functions that use it (and before them).


But would it make sense, and if so would you recommend it, for ticks()?


Giving the compiler the possibility to inline functions would certainly make sense for something used very often, but, as @shoogle already noted, it should be done by moving the definition to a header file rather than by simply prepending the inline keyword.

shoogle commented Feb 18, 2019

To get the real speed increase you should try moving all definitions into fraction.h. Having everything in the .h file allows the compiler to inline calls rather than leaving them to be resolved at link time, but you should let the compiler decide whether to do this rather than using the inline keyword. Of course this comes at the cost of having to recompile everything whenever fraction.h changes, but that won't happen very often. There doesn't really need to be a .cpp file for something as basic and fundamental as Fraction.
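
As a rough illustration of this suggestion, a header-only class might look like the sketch below. The file name, the class name `FractionSketch` and the value of 1920 ticks per whole note (480 per quarter note) are assumptions made for the example; the real `Fraction::ticks()` may differ, for instance in how it rounds.

```cpp
// fraction_sketch.h (hypothetical file name, for illustration only)
#ifndef FRACTION_SKETCH_H
#define FRACTION_SKETCH_H

class FractionSketch {
    int _numerator;
    int _denominator;

public:
    constexpr FractionSketch(int n, int d) : _numerator(n), _denominator(d) {}

    // A member function defined inside the class body is implicitly inline,
    // so no keyword is needed. Every translation unit that includes this
    // header sees the definition, and the optimizer may expand calls in place.
    constexpr int ticks() const
    {
        // Assumption: 1920 ticks per whole note. Truncates any remainder.
        return _numerator * 1920 / _denominator;
    }
};

#endif // FRACTION_SKETCH_H
```

Whether a given call is actually inlined remains the compiler's decision; the header placement only makes it possible without link-time optimization.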

@velochy velochy force-pushed the FractionPerformance branch 5 times, most recently from f6cc513 to ffb9cb5 Compare February 19, 2019 10:29
@anatoly-os anatoly-os merged commit cb404f9 into musescore:master Feb 20, 2019

shoogle commented Feb 20, 2019

Benchmarks:

Hardware

2015 MacBook Pro with Retina display.

  • OS: Ubuntu 18.04.2 LTS (Bionic Beaver)
  • CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz (from output of lscpu)
    • 4 cores + hyper-threading (8 logical cores)
  • RAM: 16GB
    • 2x 8GiB SODIMM DDR3 Synchronous 1600 MHz (from output of sudo lshw -short -C memory)

Software

MuseScore AppImage nightly builds:

  • MuseScoreNightly-201902171259-master-8057bc5-x86_64.AppImage
    • Build from 3 days ago. Uses integer ticks.
  • MuseScoreNightly-201902182337-master-95ce138-x86_64.AppImage
    • Build from 2 days ago. Uses unoptimized fractions.
  • MuseScoreNightly-201902201112-master-cb404f9-x86_64.AppImage
    • Build from today. Uses optimized fractions.

Input

I used the OpenScore Edition of Mozart's Jupiter Symphony.

  • Pages: 157
  • Duration: 39m19s
  • Measures: 925
  • Parts: 17

The score was created in MuseScore 2, so I converted it to MuseScore 3 format before running these tests.

Method

Each test is the time (from Bash time) taken to convert the MSCZ score to one other format on the command line. Each test was run three times and the median (middle) value reported. The WAV tests were only run once.
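
For clarity, taking the median of three runs means discarding the fastest and the slowest run, which keeps a single unusually slow run from skewing the figure. A minimal sketch, with `medianOfThree` as a hypothetical helper name:

```cpp
// Sketch only: returns the middle value of three timings (in seconds).
constexpr double medianOfThree(double a, double b, double c)
{
    if ((a <= b && b <= c) || (c <= b && b <= a))
        return b; // b lies between a and c
    if ((b <= a && a <= c) || (c <= a && a <= b))
        return a; // a lies between b and c
    return c;     // otherwise c is the middle value
}
```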

Results

| Format | Time | Ticks | Fractions (unoptimized) | Fractions (optimised) |
|--------|------|-------|-------------------------|-----------------------|
| MSCX | real | 7.835s | 8.370s (+6.83%) | 8.222s (+4.94%) |
| | user | 6.056s | 6.407s (+5.80%) | 6.301s (+4.05%) |
| | sys | 0.630s | 0.697s (+10.6%) | 0.664s (+5.40%) |
| | user+sys | 6.686s | 7.104s (+6.25%) | 6.965s (+4.17%) |
| MIDI | real | 5.415s | 6.094s (+12.5%) | 5.657s (+4.47%) |
| | user | 4.109s | 4.839s (+17.8%) | 4.323s (+5.21%) |
| | sys | 0.139s | 0.128s (-7.91%) | 0.146s (+5.04%) |
| | user+sys | 4.248s | 4.967s (+16.9%) | 4.469s (+5.20%) |
| PDF | real | 5.081s | 5.338s (+5.06%) | 5.207s (+2.48%) |
| | user | 3.574s | 3.810s (+6.60%) | 3.630s (+1.57%) |
| | sys | 0.181s | 0.163s (-9.94%) | 0.175s (-3.31%) |
| | user+sys | 3.755s | 3.973s (+5.81%) | 3.805s (+1.33%) |
| SVG | real | 11.382s | 11.730s (+3.06%) | 11.455s (+0.641%) |
| | user | 9.795s | 10.164s (+3.77%) | 10.011s (+2.21%) |
| | sys | 0.320s | 0.351s (+9.69%) | 0.330s (+3.13%) |
| | user+sys | 10.115s | 10.515s (+3.95%) | 10.341s (+2.23%) |
| WAV | real | 1m50.981s | 1m51.892s (+0.821%) | 1m50.948s (-0.0297%) |
| | user | 1m47.836s | 1m49.027s (+1.10%) | 1m48.552s (+0.664%) |
| | sys | 0.983s | 0.969s (-1.42%) | 0.867s (-11.8%) |
| | user+sys | 1m48.819s | 1m49.996s (+1.08%) | 1m49.419s (+0.551%) |

Notes:

Interpretation

Overall, unoptimised Fractions led to around a 5% slowdown compared to integer ticks, but @velochy's optimisations claw back about a third of this penalty. A pretty good result then!
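
As a worked example of how these figures relate, the sketch below recomputes the percentages for the MSCX user+sys row (6.686s, 7.104s and 6.965s) and the share of the slowdown that the optimizations recover. The helper names `percentChange` and `penaltyRecovered` are hypothetical and only serve the illustration.

```cpp
// Sketch only: percentage change of a measurement relative to a reference.
constexpr double percentChange(double reference, double measured)
{
    return (measured - reference) / reference * 100.0;
}

// Share of the slowdown (unoptimized vs. ticks) that the optimized build wins back.
constexpr double penaltyRecovered(double ticks, double unoptimized, double optimized)
{
    return (unoptimized - optimized) / (unoptimized - ticks);
}

// MSCX user+sys times in seconds, taken from the results table.
constexpr double kTicks = 6.686;
constexpr double kUnoptimized = 7.104;
constexpr double kOptimized = 6.965;

constexpr double kSlowdownUnoptimized = percentChange(kTicks, kUnoptimized); // about +6.25
constexpr double kSlowdownOptimized = percentChange(kTicks, kOptimized);     // about +4.17
constexpr double kRecovered = penaltyRecovered(kTicks, kUnoptimized, kOptimized); // about 0.33
```

For this row the recovered share is 0.139s out of 0.418s, which is where the "about a third" figure comes from. Other rows recover a different share.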

The optimizations were definitely worth it in this case, but we should remember that this is only because Fractions are used literally everywhere in the code! For most other things it is far more important to write clean code that is easy to maintain than to spend time worrying about data types and optimizations.

In particular, the trick of putting everything in the .h file must not be used for classes that change frequently, as this causes all classes that #include the header file to be recompiled each time.

shoogle commented Feb 20, 2019

I can also confirm that with the exception of the PDFs, all files created by each version of MuseScore were byte-for-byte identical. Even the WAV files were identical, so I guess we will need a file with some extreme tuplets to see a difference there.
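
To illustrate why tuplets are the likely place for a difference to show up, here is a small sketch of how integer ticks lose precision on a septuplet. It assumes 480 ticks per quarter note (1920 per whole note) and simple truncation; `toTicks` is a hypothetical helper, and rounding instead of truncating would not make the value exact either.

```cpp
// Assumption: 480 ticks per quarter note, so 1920 per whole note.
constexpr int kTicksPerWhole = 1920;

// Integer ticks for the fraction n/d of a whole note, truncating any remainder.
constexpr int toTicks(int n, int d)
{
    return n * kTicksPerWhole / d;
}

// One note of a septuplet spanning a quarter note lasts 1/28 of a whole note.
constexpr int kSeptupletNote = toTicks(1, 28);    // 1920 / 28 = 68.57..., stored as 68
constexpr int kSeptupletSum = 7 * kSeptupletNote; // 476, four ticks short of a quarter note

// Summing as fractions first (7/28) and converting once gives the exact value.
constexpr int kExactQuarter = toTicks(7, 28);     // 480
```

Durations whose denominators divide 1920 convert exactly, which would be consistent with the outputs being identical for a score without unusual tuplets.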

I thought the difference between the PDF files was down to them just containing a timestamp of when the file was created, but looking at the output of git diff --no-index --text file1.pdf file2.pdf shows there are more differences than that, including some different fonts (!), so maybe this was affected by other PRs in the last few days. The PDFs are created by Qt anyway, so they are kind of black boxes. The fact that the SVGs were identical is probably more informative.

shoogle commented Feb 24, 2019

I was asked to update the benchmark to include MuseScore 2.3.2 for comparison. Unfortunately there isn't room for another column in the original table, so here is a new table with the ticks and optimised fractions columns copied from the previous one.

Code

for fmt in mscx mid pdf svg wav; do
  echo "#### Format: $fmt ####"
  for n in {1..3}; do
    sleep 2 # allow time to recover from previous run
    time MuseScore*.AppImage Jupiter.mscz -o ms2.$fmt 2>/dev/null
  done
  ls -l ms2.$fmt
  md5sum ms2.$fmt
done

Results

| Format | Time | Ticks | Fractions (optimised) | MuseScore 2.3.2 |
|--------|------|-------|-----------------------|-----------------|
| MSCX | real | 7.835s | 8.222s (+4.94%) | 30.369s (+288%) |
| | user | 6.056s | 6.301s (+4.05%) | 28.338s (+368%) |
| | sys | 0.630s | 0.664s (+5.40%) | 0.588s (-6.67%) |
| | user+sys | 6.686s | 6.965s (+4.17%) | 28.926s (+333%) |
| MIDI | real | 5.415s | 5.657s (+4.47%) | 7.753s (+43.2%) |
| | user | 4.109s | 4.323s (+5.21%) | 6.321s (+53.8%) |
| | sys | 0.139s | 0.146s (+5.04%) | 0.174s (+25.2%) |
| | user+sys | 4.248s | 4.469s (+5.20%) | 6.495s (+52.9%) |
| PDF | real | 5.081s | 5.207s (+2.48%) | 7.565s (+48.9%) |
| | user | 3.574s | 3.630s (+1.57%) | 5.947s (+66.4%) |
| | sys | 0.181s | 0.175s (-3.31%) | 0.168s (-7.18%) |
| | user+sys | 3.755s | 3.805s (+1.33%) | 6.115s (+62.8%) |
| SVG | real | 11.382s | 11.455s (+0.641%) | 13.249s (+16.4%) |
| | user | 9.795s | 10.011s (+2.21%) | 11.637s (+18.8%) |
| | sys | 0.320s | 0.330s (+3.13%) | 0.353s (+10.3%) |
| | user+sys | 10.115s | 10.341s (+2.23%) | 11.990s (+18.5%) |
| WAV | real | 1m50.981s | 1m50.948s (-0.0297%) | 2m8.275s (+15.6%) |
| | user | 1m47.836s | 1m48.552s (+0.664%) | 2m5.504s (+16.4%) |
| | sys | 0.983s | 0.867s (-11.8%) | 0.845s (-14.0%) |
| | user+sys | 1m48.819s | 1m49.419s (+0.551%) | 2m6.349s (+16.1%) |

Interpretation

MuseScore 3 is significantly faster than MuseScore 2, regardless of whether MuseScore 3 uses fractions or integer ticks. This will be due to MuseScore 3 using more efficient algorithms for expensive operations like layout. The benefits of small scale optimizations to data types pale in comparison to these changes. This demonstrates that it is more important to ensure low level classes are easy to use than to optimize them for speed, as the penalty of using them incorrectly will be much greater.

Edit: Naturally, the above interpretation is only valid for converting files on the command line. It does not necessarily apply to the GUI, though we know that the layout improvements in MuseScore 3 have made the GUI much faster for large scores. However, it is possible that regressions elsewhere have led to slower performance with smaller scores where layout is less of a burden.

@anatoly-os

Thank you @shoogle. I think we need more benchmarks for common operations like inserting notes into a big score, deleting measures, etc. @dmitrio95's script tests could be used for such benchmarks. It would be interesting to compare 2.3.2 and current master.

shoogle commented Mar 3, 2019

Here is how the binary sizes compare.

mscore executable (release build)

| Version | Build | Size (bytes) | Size (MB) | Change |
|---------|-------|--------------|-----------|--------|
| 2.3.2 | Integer ticks | 24674456 | 24.7 | -7.71% |
| 3.0 nightly | Integer ticks | 26737664 | 26.7 | (reference) |
| 3.0 nightly | Unoptimized fractions | 26844160 | 26.8 | +0.398% |
| 3.0 nightly | Optimized fractions | 26958848 | 27.0 | +0.827% |

Switching to fractions increased MuseScore 3's size by less than half a percent. Optimizing the fractions (which potentially involves the compiler inlining the functions in fraction.h) led to a similar increase in size, but improved speed by 2% or better compared to unoptimized fractions. The overall size increase is less than 1%.

MuseScore 3 is bigger than MuseScore 2 due to it having more features. Resources embedded via the Qt Resource System also contribute to the size of the executable, so any additional resources would also lead to an increase in size (and therefore memory footprint).

Size of AppImage

Note: Type 1 AppImages are compressed ISO 9660 filesystems.

| Version | Build | Size (bytes) | Size (MB) | Change |
|---------|-------|--------------|-----------|--------|
| 2.3.2 | Integer ticks | 122748928 | 122.75 | -16.8% |
| 3.0 nightly | Integer ticks | 147587072 | 147.59 | (reference) |
| 3.0 nightly | Unoptimized fractions | 147652608 | 147.65 | +0.044% |
| 3.0 nightly | Optimized fractions | 147718144 | 147.72 | +0.089% |

In terms of switching to fractions, the AppImage compression has reduced the size differences by a factor of 10. This means that the size of the file that users actually download and that takes up disk space on their system has increased by less than 0.1%.

As well as the binary executable, the AppImage also contains all libraries, templates, plugins, soundfonts and icons. These accompanying resources are responsible for most of the increase in size of MuseScore 3's AppImage compared to MuseScore 2's. MuseScore 3 uses a newer version of Qt, for example.

@velochy velochy deleted the FractionPerformance branch April 10, 2019 10:28