Distributed rate limiter with tuneable scheduler.
Imagine that you need to scan 1M pages/images/videos. To do it effectively, you may wish to:
- limit each domain to 4 scans/sec to avoid bans
- increase the limit to 100 scans/sec for well-known services like Google
- restrict the number of parallel connections to each domain
- restrict total outgoing connections from the process to control resources
- distribute scans across a cluster to use multiple IPs and increase speed
All these things can be done with relimit.
Built-in redis adapter available for distributed use.

node.js 8+ and redis 3.0+ required.
npm install relimit --save
- `scheduler` (String) - optional, redis connection url if shared use is needed. If not defined, a local scheduler will be used.
- `rate(item)` (Function|String) - `"4/s"`, `"100/2m"`, or a `function(item)` returning such a string, where `item` is a single element of relimit input.
- `normalize(item)` (Function) - returns a grouping key for an incoming item. For example, if input items are URLs, you may return the domain name.
- `consume(item)` (Function) - consumer strategy; return `true` to allocate an execution slot immediately, or `false` to postpone the attempt. Default returns `true`.
- `process(item)` (AsyncFunction) - function to process an incoming item when its time comes.
- `ns` (String) - data namespace, currently used as a redis keys prefix; `"relimit:"` by default.
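Putting the options above together, a configuration might look like the sketch below. Only the options object is shown; the exact constructor call is omitted, and the domain check is an illustrative assumption:

```javascript
// Sketch of a relimit options object for the settings described above.
// The rate/normalize logic here is illustrative, not part of the package.
const options = {
  // scheduler: 'redis://localhost:6379', // uncomment for distributed use
  ns: 'relimit:',
  rate(item) {
    // Well-known fast services get 100/s, everything else 4/s.
    const host = new URL(item).hostname;
    const fast = host === 'google.com' || host.endsWith('.google.com');
    return fast ? '100/s' : '4/s';
  },
  normalize(item) {
    // Group incoming URLs by domain.
    return new URL(item).hostname;
  },
  consume() {
    return true; // default strategy: always allocate a slot immediately
  },
  async process(item) {
    // scan the page/image/video here; this function should never throw
  }
};
```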
Note on `consume(item)`. Imagine that you run a distributed limiter and some process can crash. If you allocate execution slots for all incoming items immediately, then after a crash all such slots will be lost (will not be used). A better idea is to allocate slots only while they are available, and retry with the rest of the items later. You may also use internal stats about domain or total active connections from the local relimit instance to restrict those.
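One way to follow this advice is to postpone items once a per-domain connection cap is reached. The sketch below maintains its own counter (the counter and the `release` helper are hypothetical; relimit's internal stats API is not documented in this section):

```javascript
// Hypothetical consume() strategy: allocate a slot only while fewer than
// MAX_PER_DOMAIN connections to that domain are active locally.
const MAX_PER_DOMAIN = 4;
const active = new Map(); // domain -> current active connection count

function consume(item) {
  const domain = new URL(item).hostname;
  const count = active.get(domain) || 0;
  if (count >= MAX_PER_DOMAIN) return false; // postpone; relimit retries later
  active.set(domain, count + 1);
  return true;
}

// Hypothetical helper: call when a scan finishes, to free the slot.
function release(item) {
  const domain = new URL(item).hostname;
  active.set(domain, (active.get(domain) || 0) - 1);
}
```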
Note on `rate(item)`. You may wish to return different rates for different domains, but the value MUST be the same for the same domain (normalized item); that's important.
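A deterministic per-domain rate function satisfies this rule: any two URLs on the same domain always map to the same rate string. A minimal sketch (the domain list is an assumption):

```javascript
// rate(item) must be deterministic per normalized item (domain):
// two URLs on the same domain always get the same rate string.
const KNOWN_FAST = new Set(['google.com', 'www.google.com']);

function rate(item) {
  const host = new URL(item).hostname;
  return KNOWN_FAST.has(host) ? '100/s' : '4/s';
}
```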
Note on `process(item)`. This function is scheduled to do the necessary actions on incoming items. IT SHOULD NOT FAIL. Handling failure state is NOT relimit's duty. You may store the result to retry later, re-add a new item to the queue, and so on. If you return an error, it will be forwarded to the logger, but the item will be marked as processed.
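A common way to keep `process(item)` from failing is to catch everything and stash failures for a later retry pass. A sketch, assuming a hypothetical `scan()` function (named `processItem` here to avoid shadowing Node's global `process`):

```javascript
// Sketch: a process(item) implementation that never throws.
// Failures are recorded for a later retry pass instead of surfacing.
const failed = [];

// Hypothetical scanning function; replace with real fetching logic.
async function scan(item) {
  if (!item.startsWith('http')) throw new Error('bad url: ' + item);
}

async function processItem(item) {
  try {
    await scan(item);
  } catch (err) {
    failed.push({ item, err }); // store for retry; do not rethrow
  }
}
```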
Place an Array of items or a single item into the incoming queue.
Resolves when relimit is idle (all jobs done).
Returns statistics about a specific items group, or global statistics if `key` is not set. `key` is the result of `normalize(item)` (for example, link -> domain).
We needed customizable rates and a more convenient API. This package is focused on massive URL request use cases.
Note. Don't try to use this package for CPU management (reinventing the system scheduler for a job queue). Use idoit instead.
Other rate limiters: