
Don’t calculate whole sets of unicode codepoints #1984

Merged: 1 commit into fonttools:master on Jun 5, 2020

Conversation

@liZe (Contributor) commented Jun 3, 2020

_getUnicodeRangeSets used to calculate sets containing lots of numbers, only to get intersections between a set and ranges. Creating and manipulating lots of big sets is both slow and memory-consuming.

The function has been replaced by _getUnicodeRanges, which returns ranges instead of sets. Ranges are both very lightweight and very fast.

Tests on intersectUnicodeRanges are now 3 times faster and save about 130 MB (!) of RAM.
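
To make the difference concrete, here is a minimal sketch of the two approaches, using a toy two-range table in place of the real OS/2 range data (all names and data below are illustrative, not the actual fontTools code):

# Toy stand-in for the OS/2 range table: (start, stop, bit) triples.
RANGES = [(0x0000, 0x007F, 0), (0x0080, 0x00FF, 1)]

# Set-based approach: materialize every codepoint of every range up front.
RANGE_SETS = [(set(range(start, stop + 1)), bit) for start, stop, bit in RANGES]

def intersect_with_sets(unicodes):
    unicodes = set(unicodes)
    return {bit for codepoints, bit in RANGE_SETS if unicodes & codepoints}

# Range-based approach: keep only the boundaries and compare them directly.
def intersect_with_ranges(unicodes):
    return {bit for start, stop, bit in RANGES
            for cp in unicodes if start <= cp <= stop}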

Before: 0.28 seconds, 148 MB RAM

time python fonttools/Lib/fontTools/ttLib/tables/O_S_2f_2.py
0.23user 0.05system 0:00.28elapsed 100%CPU (0avgtext+0avgdata 147760maxresident)k

After: 0.08 seconds (-70%), 14 MB RAM (-90%)

time python fonttools/Lib/fontTools/ttLib/tables/O_S_2f_2.py
0.08user 0.00system 0:00.08elapsed 100%CPU (0avgtext+0avgdata 13940maxresident)k

@anthrotype (Member) commented:

Thanks. You only timed the module loading time. It would also be interesting to benchmark the time it takes to run the intersectUnicodeRanges function on random sets of unicode codepoints, comparing the set-based and range-based implementations.
If it turns out that building the sets at module import makes intersection checks faster, despite a slower import time and increased memory usage, it may be a reasonable trade-off.

@anthrotype (Member) commented:

As I was expecting, running intersectUnicodeRanges with your approach is much slower.
This is the benchmark code I used for testing:

import timeit

# Average per-call time over 1000 iterations; each iteration draws 1000
# random codepoints from the full Unicode range and intersects them (the
# random sampling runs inside the timed statement, so it is measured too).
num_executions = 1000
total_time = timeit.timeit(
    "bits = intersectUnicodeRanges(random.choices(range(0x110000), k=1000))",
    setup="import random; from fontTools.ttLib.tables.O_S_2f_2 import intersectUnicodeRanges",
    number=num_executions,
)

print(total_time / num_executions)

With the current set implementation, on my MBP, I get:

0.0004779480810000223

With your range- and for-loop-based implementation, I get:

0.015389551754999956

That is, roughly 30 times slower.
So I guess it all depends on how many times one runs the intersectUnicodeRanges function per session, and on the size of the font in terms of the number of unicode characters to test for intersection.

I am still not 100% convinced that your solution would be preferable.

@liZe (Contributor, Author) commented Jun 4, 2020

> You only timed the module loading time.

That’s not exactly true, because the module launches 3 unit tests…

> It would also be interesting to benchmark the time it takes to run the intersectUnicodeRanges function on random sets of unicode codepoints, comparing the set-based and range-based implementations.

… but you’re right: measuring the set generation plus 3 calls to intersectUnicodeRanges is not the same as testing only the speed of intersectUnicodeRanges.

> So I guess it all depends on how many times one runs the intersectUnicodeRanges function per session, and on the size of the font in terms of the number of unicode characters to test for intersection.

👍

> I am still not 100% convinced that your solution would be preferable.

Storing 130 MB in RAM during the whole execution time of the program is really bad for my use cases, but I get your point. I’ll try to improve my solution.

@liZe (Contributor, Author) commented Jun 4, 2020

I’ve changed the implementation; here are the results using the script you provided.

Before: 0.82 ms / loop, 147 MB RAM

time python speed.py
0.0008236442100023851
0.90user 0.07system 0:00.98elapsed 99%CPU (0avgtext+0avgdata 146996maxresident)k

After: 0.85 ms / loop, 13 MB RAM

time python speed.py
0.0008471214559976943
0.90user 0.01system 0:00.92elapsed 100%CPU (0avgtext+0avgdata 13296maxresident)k

@anthrotype (Member) commented:

Nice! bisect did the trick :)
+1 on simplifying the building of the steps list like Just suggested.
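
For context, here is a minimal sketch of what a bisect-based lookup can look like, using the flattened starts/stops-plus-bits structure that the commit message below describes (toy data and names, not the actual implementation):

from bisect import bisect_right

# Toy flattened structure: sorted range starts, parallel (stop, bit) entries.
RANGE_STARTS = [0x0000, 0x0080, 0x0100]
RANGE_STOPS_AND_BITS = [(0x007F, 0), (0x00FF, 1), (0x017F, 3)]

def intersect_unicode_ranges(unicodes):
    bits = set()
    for cp in unicodes:
        # Rightmost range whose start is <= cp, found in O(log n).
        i = bisect_right(RANGE_STARTS, cp) - 1
        stop, bit = RANGE_STOPS_AND_BITS[i]
        if cp <= stop:
            bits.add(bit)
    return bits

Each lookup is logarithmic in the number of ranges, which is why the per-call time stays close to the set-based version while the import-time sets, and their memory cost, disappear.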

@justvanrossum (Collaborator) commented:

Another approach would be to special-case the non-BMP codepoints, as they make up more than 81% of the current set-based data.
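
A purely illustrative sketch of that idea, with hypothetical toy tables (the 81% figure is from the comment above; the real OS/2 bit assignments differ):

# Hypothetical split: route supplementary-plane codepoints to a small
# dedicated table, so the main table only ever covers the BMP.
BMP_RANGES = [(0x0000, 0x007F, 0)]                 # toy data
SUPPLEMENTARY_RANGES = [(0x10000, 0x100FF, 101)]   # toy data

def find_unicode_range_bit(cp):
    table = SUPPLEMENTARY_RANGES if cp > 0xFFFF else BMP_RANGES
    for start, stop, bit in table:
        if start <= cp <= stop:
            return bit
    return None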

@liZe (Contributor, Author) commented Jun 4, 2020

New results are equivalent, with less code.

time python speed.py
0.000836255980990245
0.90user 0.00system 0:00.91elapsed 100%CPU (0avgtext+0avgdata 13360maxresident)k

I’m sure it could be even better, but I think it’s good enough for now 😉. Thanks a lot for the quick review.

Commit message (babca16):

_getUnicodeRangeSets used to calculate sets containing lots of numbers, only to get intersections between a set and ranges. Creating and manipulating lots of big sets requires a lot of memory.

The function has been replaced by _getUnicodeRanges, returning a list of range start boundaries and a list of range stops plus the corresponding bits.

Tests on intersectUnicodeRanges save about 130 MB (!) of RAM, with no significant speed penalty.

@anthrotype (Member) left a review:

LGTM, thanks!

@anthrotype merged commit babca16 into fonttools:master on Jun 5, 2020