Optimize the feed reader for websites with a large number of users #10996

@rhymes

Current situation

Forem's feed import currently takes too long, especially on DEV, which has 3420 feeds to go through.

The import is sequential: each feed is downloaded, then parsed, then its articles are built in memory and saved to the DB.

Unfortunately, at least in the case of DEV, we don't have solid metrics on what is actually slow in production, as the current reader only tracks errors and nothing else. We should consider adding instrumentation before replacing it one way or another.
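As a rough illustration of what such instrumentation could look like, here is a minimal sketch that accumulates wall-clock time per phase (fetch, parse, save). `PhaseTimer` is a hypothetical helper, not something that exists in Forem, and the three steps are stand-ins (the `sleep` simulates network latency). In production the totals would presumably be sent to whatever metrics backend is in use rather than printed.

```ruby
require "benchmark"

# Hypothetical helper: accumulates wall-clock seconds per named phase.
class PhaseTimer
  attr_reader :totals

  def initialize
    @totals = Hash.new(0.0)
  end

  # Runs the block, adds its elapsed time to the phase total and
  # returns the block's own result so the caller can keep using it.
  def measure(phase)
    result = nil
    @totals[phase] += Benchmark.realtime { result = yield }
    result
  end
end

timer = PhaseTimer.new

# Stand-ins for the real steps of one feed import.
body = timer.measure(:fetch) do
  sleep 0.01 # simulated network latency
  "<rss><item/><item/></rss>"
end
items = timer.measure(:parse) { body.scan("<item/>") }
saved = timer.measure(:save) { items.size }

timer.totals.each do |phase, seconds|
  puts format("%-5s %.4fs", phase, seconds)
end
```

Wrapping each phase separately is what would tell us whether the time goes to the network, to parsing or to the DB.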

Variables to consider and things learned in benchmarking:

  • network latency is not a constant (fetching thousands of feeds can have different performance results depending on network conditions of the upstream servers)
  • different feeds can have different lengths, so users have a variable number of articles to process at each run
  • currently we skip a random number of feeds at each run because feed fetching can be slow
  • Nokogiri parsing uses a lot of memory (the nokogiri gem allocates literally millions of objects)
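The allocation figure in the last point can be checked with Ruby's own counters. The sketch below uses `GC.stat(:total_allocated_objects)`, which is cumulative, to count the Ruby objects created while a block runs. `count_allocations` is a hypothetical helper and the two blocks are stand-ins for parsing a small and a large feed. It only sees Ruby-side objects, not memory allocated by native libraries.

```ruby
# Hypothetical helper: number of Ruby objects allocated while the block runs.
# total_allocated_objects only ever grows, so a GC run in between is harmless.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Stand-ins for parsing a small and a large feed.
small = count_allocations { Array.new(10) { |i| "item #{i}" } }
large = count_allocations { Array.new(10_000) { |i| "item #{i}" } }

puts "small: #{small} objects, large: #{large} objects"
```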

Optimization ideas

There are two main things we can optimize (my opinion is that we should find a combination of both that suits us):

  1. make the actual fetching of feeds and parsing of them faster (it's all I/O, there's no reason for it to be sequential)
  2. process multiple users in parallel (basically by doing things more or less sequentially but splitting the workload into separate workers, one per user)
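To make the first idea concrete, here is a minimal sketch of fetching in batches with a fixed number of threads, using only the standard library. `concurrent_map` is a hypothetical helper, the URLs are placeholders and the `sleep` stands in for the HTTP request. The thread count and batch size are arbitrary knobs, not tuned values.

```ruby
# Hypothetical helper: applies the block to every item using a fixed number
# of threads and returns the results in the original order. An exception
# raised for one item is stored as that item's result instead of aborting.
def concurrent_map(items, threads:, &block)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # once closed and empty, pop returns nil and the workers stop

  results = Array.new(items.size)
  workers = Array.new(threads) do
    Thread.new do
      while (pair = queue.pop)
        item, index = pair
        results[index] = begin
          block.call(item)
        rescue StandardError => e
          e
        end
      end
    end
  end
  workers.each(&:join)
  results
end

feed_urls = (1..20).map { |n| "https://example.com/feed#{n}.xml" } # placeholders

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
bodies = feed_urls.each_slice(10).flat_map do |batch|
  concurrent_map(batch, threads: 5) do |url|
    sleep 0.05 # stands in for the HTTP request
    "<rss><!-- #{url} --></rss>"
  end
end
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format("fetched %d feeds in %.2fs", bodies.size, elapsed)
```

Done sequentially, the same 20 simulated requests would take about one second. Because the time is spent waiting on I/O, plain threads are enough to overlap it even under MRI's global lock.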

This is why I think we should employ a combination of both:

  • we process feeds sequentially, but downloading bytes from the web is inherently parallelizable, so we can download a bunch (in batches obviously) from the web and then start processing those
  • we parse feeds sequentially, but parsing can also be parallelized
  • both of the steps above have a ceiling set not just by how many cores the machines have but also by memory consumption (parsing is the more memory-hungry of the two operations, for obvious reasons)
  • writing articles to the DB can be parallelized but that doesn't really need to be per user (we'd have 3420 jobs in the queue that could still be individually fast or slow)
  • we can parallelize article creation in batches by changing the logic a little bit: right now each single "future article" is responsible for knowing whether it already exists, through its own conditional check. We could do this check in one swoop for the entire batch and then remove from the batch of workers the articles that don't need to be processed at all
  • we need to be careful about how many workers writing to the articles table we run concurrently, as we could end up using too many ActiveRecord connections and exhausting the pools
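The "one swoop" existence check could look roughly like the sketch below. It is plain Ruby with an in-memory `Set` standing in for the articles table, so `Candidate`, `existing_urls` and `reject_existing` are all hypothetical names, and matching on the source URL is an assumption. In the app the lookup would be a single query over the whole batch instead of one query per article.

```ruby
require "set"

# Hypothetical shape of a not-yet-saved article parsed from a feed.
Candidate = Struct.new(:user_id, :url, :title)

# Stand-in for one query against the articles table for the whole batch:
# returns the subset of the given URLs that were already imported.
def existing_urls(urls, imported_urls)
  imported_urls & urls
end

# Drops every candidate that already exists, using one lookup per batch
# instead of one conditional check per article.
def reject_existing(candidates, imported_urls)
  known = existing_urls(candidates.map(&:url), imported_urls)
  candidates.reject { |candidate| known.include?(candidate.url) }
end

imported_urls = Set["https://example.com/a", "https://example.com/c"]
candidates = [
  Candidate.new(1, "https://example.com/a", "A"),
  Candidate.new(1, "https://example.com/b", "B"),
  Candidate.new(2, "https://example.com/c", "C"),
  Candidate.new(2, "https://example.com/d", "D"),
  Candidate.new(3, "https://example.com/e", "E"),
]

to_create = reject_existing(candidates, imported_urls)

# Only the survivors are split into write batches (the size is an arbitrary knob).
to_create.each_slice(2) { |batch| puts batch.map(&:title).inspect }
```

Filtering before enqueueing also means fewer write jobs, which helps with the connection pool concern in the last point.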

Plan of action

The first step is to write a POC which parallelizes network fetching and parsing. This is one step in a multi-step plan (not necessarily in this order) which comprises:

  • add monitoring to the existing RssReader to understand its profile in production
  • add a Feeds::Import class which takes advantage of concurrency to fetch and parse feeds into articles - Add Feeds::Import service class #10998
  • add a related Sidekiq worker
  • split the writing of articles to the DB into batch workers
  • measure the performance (speed and memory occupation) in production
  • hide the new service behind a feature flag so it can be activated and deactivated at will, transparently
  • refactor, tune, improve and optimize what can be refactored, tuned, improved and optimized (both in the logic and in knobs)

Benchmarks, more or less

Disclaimer: these benchmarks don't really count, as all benchmarks don't really count. These especially, because they were conducted unscientifically, while using the computer for other things. They are only meant to give a really rough idea of what is going on with the RssReader and the future service (called Feeds::Import as of today).

With 100 feeds, on October 21st 2020, tested on a MacBook Pro (2.4 GHz 8-Core Intel Core i9, 16 cores, 64GB RAM):

158.67s user 8.14s system 404.78s real 817832kB mem -- rails fetch_all_rss
151.71s user 7.14s system 258.57s real 781528kB mem -- rails fetch_feeds_import

Feeds::Import was run with 8 fetching threads and 4 parsing threads, in batches of 50 users/feeds.
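For a quick reading of the two runs above, the arithmetic below compares the wall-clock ("real") and memory figures. The numbers are copied from the output lines and the variable names are only illustrative. Given the disclaimer, the results are indicative at best.

```ruby
# Figures copied from the two runs above (wall-clock seconds, memory in kB).
rss_reader   = { real: 404.78, mem_kb: 817_832 }
feeds_import = { real: 258.57, mem_kb: 781_528 }

speedup    = rss_reader[:real] / feeds_import[:real]
time_saved = (1 - feeds_import[:real] / rss_reader[:real]) * 100
mem_saved  = (1 - feeds_import[:mem_kb].to_f / rss_reader[:mem_kb]) * 100

puts format("speedup:    %.2fx", speedup)    # prints "speedup:    1.57x"
puts format("time saved: %.1f%%", time_saved) # prints "time saved: 36.1%"
puts format("mem saved:  %.1f%%", mem_saved)  # prints "mem saved:  4.4%"
```

User CPU time is nearly the same in both runs (158.67s vs 151.71s), which fits the idea that the gain comes from overlapping I/O waits rather than from doing less work.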

Labels

  • internal team only: internal tasks only for Forem team members
  • type: optimization: code and performance optimizations
