Match lexers and formatters by extension separately #2328
Merged
d0487e3 "Remove filename pattern caches" (#2153) introduced a huge performance regression. While it is true that fnmatch already uses functools.lru_cache, that cache is limited to 256 entries and we have over 1000 matching patterns, which means every entry is evicted before it can be reused on the next iteration.
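To illustrate why a bounded LRU cache stops helping here, the following is a minimal sketch, not the actual fnmatch code. `compile_pattern` is a hypothetical stand-in for the cached pattern translation, and the cache size and pattern count are scaled down for readability. Scanning more patterns than the cache can hold, in the same order each time, evicts each entry before it is needed again:

```python
import functools


@functools.lru_cache(maxsize=4)  # tiny stand-in for a bounded pattern cache
def compile_pattern(pat):
    # Hypothetical placeholder for the expensive glob-to-regex translation.
    return pat.upper()


# More distinct patterns than cache slots (assumed numbers, for illustration).
patterns = [f"*.ext{i}" for i in range(10)]

# Three filename lookups, each scanning every pattern in the same order.
for _ in range(3):
    for pat in patterns:
        compile_pattern(pat)

info = compile_pattern.cache_info()
print(info.hits, info.misses)
```

Because the access pattern is a repeated sequential scan longer than the cache, every call is a miss and the cache adds overhead without saving any work.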
We can be more clever without reverting that patch by avoiding fnmatch calls altogether, which provides an even better speedup. The bulk (> 99%) of all filename patterns match on the filename extension only, so in gen_mapfiles.py we split these into a separate list of tuples that we match without calling fnmatch().
This restores the previous performance without the overhead of an extra regexp cache.
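The split described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in gen_mapfiles.py: the names `PATTERNS`, `split_patterns`, and `find_names` and the sample data are hypothetical, and the fallback uses `fnmatch.fnmatchcase` so the result does not depend on the platform's case rules.

```python
import fnmatch
import os

# Hypothetical sample of (name, glob pattern) pairs.
PATTERNS = [
    ("Python", "*.py"),
    ("C", "*.c"),
    ("Makefile", "Makefile"),
    ("Makefile", "*.mak"),
    ("Docker", "Dockerfile.*"),
]


def split_patterns(patterns):
    """Separate pure '*.ext' patterns from those that need real globbing."""
    by_ext = []  # (name, ".ext") pairs, matched with str.endswith
    other = []   # (name, pattern) pairs, matched with fnmatch
    for name, pat in patterns:
        suffix = pat[1:]
        # A pattern is a plain extension match if it is '*' followed by a
        # literal suffix that starts with '.' and has no glob metacharacters.
        if pat.startswith("*.") and not any(c in suffix for c in "*?["):
            by_ext.append((name, suffix))
        else:
            other.append((name, pat))
    return by_ext, other


BY_EXT, OTHER = split_patterns(PATTERNS)


def find_names(filename):
    """Return every name whose pattern matches the file's base name."""
    base = os.path.basename(filename)
    # Fast path: a plain suffix comparison, no regex involved.
    names = [name for name, ext in BY_EXT if base.endswith(ext)]
    # Slow path: only the few remaining patterns go through fnmatch.
    names += [name for name, pat in OTHER if fnmatch.fnmatchcase(base, pat)]
    return names


print(find_names("src/main.py"))
print(find_names("Dockerfile.dev"))
```

Since `*` can match any run of characters, including none, a pattern of the form `*.ext` with a literal suffix matches exactly the names that end with `.ext`, which is why `str.endswith` can replace the fnmatch call for those entries.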