Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master    #1200      +/-   ##
==========================================
+ Coverage   91.69%   91.71%   +0.02%
==========================================
  Files          27       27
  Lines        4696     4685      -11
==========================================
- Hits         4306     4297       -9
+ Misses        390      388       -2
```
🎊 1200 🎊
@AA-Turner Thanks for the suggestions! I measured things and it turns out Explicit Is Better Than Implicit – I just made them regular old `for` loops...
```python
yield from iterable
return
seen = set()
for item in iter(iterable):
```
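For context, a minimal self-contained sketch of what such a lazy `distinct` generator looks like (illustrative only – the exact body in babel may differ):

```python
def distinct(iterable):
    """Lazily yield each item the first time it is seen.

    Illustrative sketch, not necessarily babel's exact implementation.
    """
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item
```

For example, `list(distinct("abcabc"))` gives `['a', 'b', 'c']`.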
The itertools recipe suggests:

```diff
-for item in iter(iterable):
+for item in filterfalse(seen.__contains__, iterable):
```
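Expanded into a self-contained form, that recipe shape (itertools' `unique_everseen`, simplified to drop the `key` argument) looks like:

```python
from itertools import filterfalse


def unique_everseen(iterable):
    """De-duplicate lazily, delegating the membership test to filterfalse.

    seen.__contains__ is re-evaluated against the current state of the
    set on every item, so adding to `seen` inside the loop is what
    filters out the later duplicates.
    """
    seen = set()
    for item in filterfalse(seen.__contains__, iterable):
        seen.add(item)
        yield item
```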
Another alternative would be:

```python
yield from dict.fromkeys(iterable)
```
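As a quick illustration (plain dicts preserve insertion order on Python 3.7+, so `fromkeys` keeps the first occurrence of each key):

```python
items = [3, 1, 3, 2, 1]
# fromkeys builds the whole dict up front (eagerly), keeping only
# the first occurrence of each key, in insertion order.
deduped = list(dict.fromkeys(items))
print(deduped)  # [3, 1, 2]
```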
The language now guarantees that dictionaries preserve order, but this approach is non-lazy.

`dict.fromkeys(iterable)` is clever! It's 5% faster than this in a microbenchmark.

What do you mean by "but this approach is non-lazy"?
As in, `next(distinct(lst))` will only process one item currently, but the `fromkeys` approach eagerly de-duplicates the entire list. I don't know enough about babel's use to know if the lazy iteration is important here, or just a side effect of the original design.
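A small demonstration of the difference (the `noisy` wrapper is just instrumentation for this example, and `lazy_distinct` is an illustrative stand-in for the generator under discussion):

```python
def lazy_distinct(iterable):
    # Illustrative generator with the same lazy shape as discussed above.
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


consumed = []


def noisy(items):
    # Record each item as it is pulled from the source.
    for x in items:
        consumed.append(x)
        yield x


next(lazy_distinct(noisy([1, 2, 3])))
print(consumed)  # [1] -- only one item was pulled from the source

consumed.clear()
list(dict.fromkeys(noisy([1, 2, 3])))
print(consumed)  # [1, 2, 3] -- fromkeys consumed everything up front
```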
Yeah, we don't expect there to be very many items, and all of the internal uses are `list(distinct(...))` or `list(distinct(... + ...))` – TBH, I think we could micro-optimize that a bit too with an internal-use-only `_distinct_of(*iterables) -> list` function...
Inspired by #1199.
This PR improves parsing performance for a large PO file (https://projects.blender.org/blender/blender-manual-translations/raw/branch/main/fi/LC_MESSAGES/blender_manual.po) by approximately 1.4x on my machine.
On a corpus of 11,600 .po files (namely all that I had hanging around on my machine, heh) the speed-up is 1.30x.
The only "breaking" change here is the one implemented by f988d79 – you can no longer pass a mixed iterable of `bytes`es and `str`s into `read_po`. Typing-wise, you never could (`AnyStr` is either `bytes` or `str`, but not a union of them), but now it'll actually break. Internally, `_NormalizedString` lost some of its lustre, but it's fine, since it is an internal helper no one should ever see.