Skip to content
A microlib for an auto polling data container
Scala
Branch: master
Clone or download
Latest commit 79986b0 Sep 10, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
core/src added guard against zero refresh period Sep 5, 2019
project Update sbt-catalysts to 0.25 Sep 10, 2019
.gitignore inital commit Aug 29, 2019
.jvmopts reducing heap size Sep 4, 2019
.travis.yml inital commit Aug 29, 2019
README.MD Update README.MD Sep 4, 2019
build.sbt minor improvement Aug 30, 2019
version.sbt Setting version to 0.0.2-SNAPSHOT Aug 30, 2019

README.MD

Build Status

Mau - A tiny library for an auto polling data container

Installation

mau is available on scala 2.11, 2.12, 2.13 and scalajs.

Add the following to your build.sbt

libraryDependencies += 
  "com.kailuowang" %% "mau" % "0.0.1"

Motivation

It is common that applications cache data from upstream in local memory to avoid latency while allowing some data staleness. This is usually achieved through some caching library which lazy loads the data from upstream, together with a TTL (Time to Live) to control the maximum staleness allowed.

This lazy loading approach has one inconvenience - when cached data expires TTL, data in the cache needs to be invalidated and re-retrieved from upstream, during which time incoming requests from downstream need to wait for the data from upstream and hence experience the higher latency.

Another missed opportunity for lazy loading cache is resilience against upstream disruptions. When the upstream becomes temporarily unavailable, for some cases, we can tolerate a slightly more stale data to maintain availability to downstream. That is, we might want to maintain service to downstream using the data in memory for a little longer while waiting for the upstream to get back online.

A different approach is polling upstream periodically to retrieve the latest version and update memory. This periodical polling keeps the data fresh. Also, when upstreams becomes unavailable, this approach can, optionally, allow a grace period of failed polls to maintain availability to downstream. A good example of this approach in Http cache is Fastly's Stale-While-Revalidate and Stale-If-Error.

Mau is a pure functional implementation of this pooling approach in Scala. It provides a data container called RefreshRef, which can be used as a single entry cache with auto polling from upstream.

This single-entry cache keeps the data in memory regardless of usage; hence it's only applicable when the number of such containers is bounded in the application. I.E., it's suitable to keep in memory information whose size is relatively fixed.

A possible example use case might be a top 10 most popular songs on a music app. The query can be expensive, but the data size is fixed and can allow some staleness.

Another example is distributed configuration. The applications can work with slightly stale configuration, but ideally, the reads should be directly from memory, and in case of configuration service going down, the applications need to be able to remain functioning for a while.

Best to demonstrate through examples:

Usages

import cats.implicits._
import cats.effect.IO
import scala.concurrent.duration._

mau.RefreshRef.resource[IO, MyData] //create a resource of a RefreshRef that cancels itself after use,  
  .use { ref =>  // In real uses cases, `ref` should reused to serve multiple requests concurrently

  ref.getOrFetch(10.second) {  //refresh every 10 second
    getDataFromUpstream    //higher latency effect to get data from upstream
  }
}

ref.getOrFetch either gets the data from the memory if available, or use the getDataFromUpstream to retrieve the data, and setup a polling to periodically update the data in memory using getDataFromUpstream. Hence the first call to ref.getOrFetch will take longer to actually load the data from upstream to memory. Subsequent call will always return the data from memory.

In the above usage, since no error handler given, when any exception occurs during getDataFromUpstream, the refresh stops, and the data is removed from the memory. All subsequent requests will hit upstream through getDataFromUpstream, whose failure will surface to downstream, until upstream restores.

As pointed out by the comment, a RefreshRef is provided as a cats.effect.Resource, which gaurantees that the polling gets canceled after usage.

Here is a more advanced example that enables resilience against upstream disruption.

ref.getOrFetch(10.second, 60.seconds) {  //refresh every 10 second, but when refresh fails, allow 60 seconds of staleness
  getDataFromUpstream     
} {
  case e: SomeBackendException => IO.unit   //tolerate a certain type of errors from upstream
}

In this example, SomeBackendException from getDataFromUpstream will be tolerated for 60 seconds, during which time data in memory will be returned. After 60 seconds of continuous polling failures, the polling will stop and data removed.
A success getDataFromUpstream resets the timer. BTW, you can also choose to log the error and either rethrow or tolerate it.

Mau is built on top of Ref from cats-effect. It has only roughly 100 lines of code, but with extensive tests.

Contribution

Any contribution is more than welcome. The main purpose of open sourcing this is to seek collaboration. If you have any questions feel free to submit an issue.

Please follow the Scala Code of Conduct.

License

Copyright (c) 2017-2019 Kailuo Wang

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
You can’t perform that action at this time.