Speed up RssIgnores::matches #345
Labels

- **good first issue**: Working on this issue is an easy way to start with Newsboat development
- **refactoring**: This issue describes a way in which some particular part of the code could be improved
I have 7 `ignore-article` commands and set `ignore-mode` to "display", and I noticed that if I comment them out, startup time goes down by 10% (when cache.db is already in the disk cache). GNU gprof shows that quite a bit of time is spent in `RssIgnores::matches`.
That method takes an `RssItem`, loops through all `ignore-article` rules looking for the ones that match the item's feed, and checks if their associated regexes match. There are two inefficiencies here:

1. `ignore-article` rules are stored in a `vector<pair<string, regex>>`, which is basically an `std::map<string, regex>` in disguise. If we switch to an actual map, the lookup time will become near-zero and won't grow with the number of `ignore-article` rules. `std::unordered_multimap` seems the most fitting.
2. `RssIgnores::matches` is called from `RssFeed::update_items` on all items of each feed. In that scenario, we can get the feed's URL once, look up the associated `ignore-article` rules, and use that "shortlist" when checking individual items.

`RssIgnores::matches` lacks any tests. They need to be written before doing any of the aforementioned optimizations.
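To make the proposal concrete, here is a minimal sketch of the map-based storage. The names (`IgnoreRules`, `add`, `matches`) and the use of `std::regex` are illustrative assumptions, not Newsboat's actual API:

```cpp
#include <regex>
#include <string>
#include <unordered_map>

// Illustrative stand-in for RssIgnores: rules are keyed by feed URL in an
// unordered_multimap, so finding the rules for one feed is a hash lookup
// instead of a scan over every ignore-article rule.
class IgnoreRules {
public:
	void add(const std::string& feed_url, const std::string& pattern)
	{
		rules.emplace(feed_url, std::regex(pattern));
	}

	// Check a single item's title against the rules for its feed.
	bool matches(const std::string& feed_url, const std::string& title) const
	{
		const auto range = rules.equal_range(feed_url);
		for (auto it = range.first; it != range.second; ++it) {
			if (std::regex_search(title, it->second)) {
				return true;
			}
		}
		return false;
	}

private:
	std::unordered_multimap<std::string, std::regex> rules;
};
```

`equal_range` also yields exactly the per-feed "shortlist" described above: `RssFeed::update_items` could perform the lookup once per feed and reuse the range for every item. A real implementation would additionally need to handle any rules meant to apply to all feeds, which a URL-keyed map alone doesn't cover.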
## Evaluation

Since we don't have benchmarks, I have to describe how I'll evaluate the results of these optimizations.
I have a large cache file: over 400 feeds, almost 1 gigabyte of data. I'll put it on `tmpfs` to make sure I/O doesn't screw the results.

I'll run the following command five times in a row and take the smallest result:
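A generic harness for that run-five-times, keep-the-minimum protocol could look like the sketch below. The command under test is passed in as arguments; nothing here assumes a specific Newsboat invocation:

```shell
# best_of_five: run the given command five times, echo the smallest wall time.
# Relies on GNU date's %N (nanoseconds) and awk for the float arithmetic.
best_of_five() {
	best=""
	for i in 1 2 3 4 5; do
		start=$(date +%s.%N)
		"$@" > /dev/null 2>&1
		end=$(date +%s.%N)
		elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
		if [ -z "$best" ] || [ "$(awk -v a="$elapsed" -v b="$best" 'BEGIN { print (a < b) }')" = "1" ]; then
			best="$elapsed"
		fi
	done
	echo "$best"
}
```

Taking the minimum rather than the mean filters out runs perturbed by unrelated system activity, which matches the intent of the protocol.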
Newsboat will be compiled in release mode (i.e. just `make newsboat`).

My config file will contain one `ignore-mode "display"` entry and 0 to 20 `ignore-article` entries. I will be looking at two things:

1. startup time with zero `ignore-article` entries; and
2. how startup time grows with the number of `ignore-article` entries.

I will be comparing the results to results from the then-current `master`. The goal is to improve on `master`.
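Returning to the note that `RssIgnores::matches` lacks tests: the behavior worth pinning down first is small enough to sketch. The stand-in below uses the current-style `vector<pair<string, regex>>` storage with plain `assert`s; it is illustrative only, and the real tests would target `RssIgnores` through Newsboat's own test suite. Whatever map-based replacement lands must keep these same cases passing.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the current storage: a linear scan over (feed URL, regex)
// pairs, mirroring what the issue describes RssIgnores::matches doing.
using Rules = std::vector<std::pair<std::string, std::regex>>;

bool matches(const Rules& rules, const std::string& feed_url,
	const std::string& title)
{
	for (const auto& rule : rules) {
		if (rule.first == feed_url && std::regex_search(title, rule.second)) {
			return true;
		}
	}
	return false;
}

// The cases any reimplementation must keep passing.
void run_matches_tests()
{
	Rules rules;
	rules.emplace_back("https://example.org/a.xml", std::regex("spam"));
	rules.emplace_back("https://example.org/b.xml", std::regex("^Ad:"));

	// A rule only applies to items from its own feed.
	assert(matches(rules, "https://example.org/a.xml", "spam spam spam"));
	assert(!matches(rules, "https://example.org/b.xml", "spam spam spam"));

	// The regex is searched, not anchored, unless the pattern anchors it.
	assert(matches(rules, "https://example.org/b.xml", "Ad: new offer"));
	assert(!matches(rules, "https://example.org/b.xml", "Not an Ad: really"));

	// No rules for a feed means nothing is ignored.
	assert(!matches(rules, "https://example.org/c.xml", "anything"));
}
```

Writing these first, as the issue suggests, turns the optimization into a pure refactoring: the tests define the contract, and the storage change just has to keep them green.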