A simple Ruby library to discover popular items in a stream of events

ncalca/forgettable

Forgettable

Forgettable helps you estimate the probabilities of non-stationary categorical distributions. Put simply, it finds the most "popular" items in a stream of events when their popularity changes unpredictably.

Why?

Imagine you have a web application in which your users can create a post and comment on it. A simple way to find the "hottest" posts is to pick the most commented posts or the most recently commented posts.

While these solutions are simple to implement and work in many cases, they have some drawbacks. For example, a post with a lot of old comments might still be reported as popular although nobody is commenting on it or reading it anymore. Conversely, ranking by last comment time might produce a very unstable, fast-changing list of "hottest" posts which does not really capture the trends among posts.

The main problem with these approaches is that they consider old data as important as new data: they don't forget.

Forgettable gives you a simple way to keep track of the most recent "trends" and smoothly forget about the past facts.
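
To make the idea of "forgetting" concrete, here is a minimal in-memory sketch. It is not Forgettable's implementation: the DecayingCounter class, its methods and the purely exponential decay are assumptions made for illustration, and timestamps are passed in explicitly as plain numbers.

```ruby
# Hypothetical illustration only; this is NOT how Forgettable is implemented.
# Each bin's score shrinks exponentially with the time since its last update.
class DecayingCounter
  def initialize(decay_rate:)
    @decay_rate = decay_rate
    @scores = {}       # bin name => score at the time of its last update
    @last_update = {}  # bin name => time of its last update
  end

  # Decay the stored score up to `now`, then add `amount` to it.
  def increment(bin, now:, amount: 1)
    @scores[bin] = score(bin, now: now) + amount
    @last_update[bin] = now
  end

  # Score of `bin` as seen at time `now`; unknown bins score 0.0.
  def score(bin, now:)
    return 0.0 unless @last_update.key?(bin)

    elapsed = now - @last_update[bin]
    @scores[bin] * Math.exp(-@decay_rate * elapsed)
  end

  # Bins sorted from most to least popular at time `now`.
  def ranking(now:)
    @last_update.keys.sort_by { |bin| -score(bin, now: now) }
  end
end

counter = DecayingCounter.new(decay_rate: 0.01)
100.times { counter.increment("old_post", now: 0) }      # many old comments
10.times  { counter.increment("new_post", now: 1_000) }  # a few recent ones
p counter.ranking(now: 1_000)
```

A plain comment count would rank old_post first (100 comments against 10). With decay, the old activity has faded almost completely by time 1000, so new_post comes out on top.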

Forgettable is heavily inspired by Forget-Table, developed at bitly.

How to use Forgettable?

Creating a new distribution

The main concept used in Forgettable is a distribution which is initialised with a name and a Redis client:

popular_guitars = ForgetTable::Distribution.new(name: "guitars", redis: redis)

Incrementing a bin

A distribution is a container of "bins", i.e., the items we want to track. To insert a new item, call the increment method with the name of the bin and the amount to add:

popular_guitars.increment(bin: "fender", amount: 100)

If not specified, the amount defaults to 1:

popular_guitars.increment(bin: "gibson")

Getting the probability distribution

Once bins have been inserted in the distribution, we can fetch the list of bins sorted by popularity, most popular first:

popular_guitars.distribution
=> ["fender", "gibson"]

Weights for the bins can be retrieved by setting the optional argument with_scores to true:

popular_guitars.distribution(with_scores: true)
=> [["fender", 63.0], ["gibson", 1.0]]
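
The values returned with with_scores: true are weights rather than probabilities. If you need values that sum to 1, one way is to divide each weight by the total. The normalize helper below is a hypothetical addition, not part of Forgettable, and it only assumes the [bin, score] pair format shown above:

```ruby
# Hypothetical helper, not part of Forgettable.
# Turns [[bin, score], ...] pairs into a hash of bin => probability.
def normalize(scored_bins)
  total = scored_bins.sum { |_bin, score| score }.to_f
  return {} if total.zero?  # no bins, or every score is zero

  scored_bins.to_h { |bin, score| [bin, score / total] }
end

p normalize([["fender", 63.0], ["gibson", 1.0]])
```

For the scores above this gives 63/64 for "fender" and 1/64 for "gibson".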

Getting the probability for a given bin

You can also retrieve the score for a single bin:

popular_guitars.score_for_bin("fender")
=> [30]

Configuring the decay rate

The decay rate is a float representing how fast the score for an item goes down. The lower the decay rate, the more slowly the score decreases.

The decay rate can be configured using the following option:

ForgetTable::Configuration.decay_rate = 0.01

If not specified, the decay rate falls back to its default value.
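
To get a feel for what a given rate means, the sketch below assumes a purely exponential model, score(t) = score(0) * exp(-decay_rate * t). This is an assumption for illustration: Forgettable's actual decay mechanism and its time unit are not documented here, and decayed_score and half_life are hypothetical helpers, not library methods.

```ruby
# Hypothetical helpers assuming pure exponential decay; not Forgettable's API.

# Score left after `elapsed` time units, starting from `initial`.
def decayed_score(initial, decay_rate:, elapsed:)
  initial * Math.exp(-decay_rate * elapsed)
end

# Time after which a score has dropped to half of its starting value.
def half_life(decay_rate)
  Math.log(2) / decay_rate
end

[0.001, 0.01, 0.1].each do |rate|
  puts format("decay_rate=%.3f  half-life=%.1f time units", rate, half_life(rate))
end
```

Under this model a rate of 0.01 halves a score roughly every 69 time units, and a rate ten times higher forgets ten times faster.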

This software is released under the MIT license.
