
scala-utils

The codebase of keatext.ai is divided into multiple microservices. Since each microservice lives in a separate codebase, the only way to share code is to put it in a separate library. We have two such libraries: an internal one for the pieces which are specific to keatext.ai, and this public one for the pieces which are generic enough that they could benefit any Scala project.

Downsides

At the moment, all the pieces are piled up in a single heterogeneous library. As we accumulate more, it might make more sense to split them into several smaller, specialized libraries. Another good reason for doing so is that since some of the pieces use akka-http, Slick, and spray-json, any microservice using any piece from scala-utils now has to depend on all three, even if that microservice chose a completely different way to define its routes, its database tables, or its JSON conversions. Worse, even if that microservice does use all three, depending on scala-utils means that it now has to use the same versions of those three dependencies as scala-utils does.

This makes it more difficult to migrate to a more recent version of any of those dependencies. Ideally, we would like to perform such upgrades gradually, one microservice at a time. But if we do that, there is a period of time during which different microservices use different versions of scala-utils, which means some of them are using an older version. Whenever we write reusable code in one of those microservices, we won't be able to move it to scala-utils, and this might in turn lead to code duplication.

I must also admit that adding code to scala-utils is more annoying than adding it directly to a module of the current microservice, and as a result, we currently have a lot of reusable code which lives in individual microservices. I think it's okay for it to stay there for now, until we happen to need it in a different module: performing the refactoring is annoying, but not nearly as annoying as maintaining two separate copies of the same code.

If you do want to add your reusable code directly to scala-utils, run sbt '~ publish-local' so that sbt continuously publishes your latest scala-utils modifications to your local Ivy repository, and in your microservice, set your scala-utils version to the X.Y.Z-SNAPSHOT version listed in version.sbt. Then, once you are ready to commit, push your changes to scala-utils first so that Bamboo publishes a non-snapshot version X.Y.Z, and change your microservice to use that version of scala-utils. You must not commit a build.sbt which uses a -SNAPSHOT version, because on other machines, sbt compile will complain that it cannot find this version.
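For instance, during development the microservice's build.sbt might contain something like the following (the organization name and version numbers here are hypothetical):

libraryDependencies += "com.keatext" %% "scala-utils" % "0.3.0-SNAPSHOT"

// and once Bamboo has published the non-snapshot release, before committing:
libraryDependencies += "com.keatext" %% "scala-utils" % "0.3.0"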

Reusable pieces

DatabaseManager makes it easy to conditionally initialize a postgres database using Slick. Simply define a concrete MyDatabaseManager singleton for your project, overriding tables with the concrete list of all the Slick classes which represent your tables, and call createTablesIfNeeded at the start of your program: the tables will be created unless they already exist.
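A minimal sketch of the setup, assuming hypothetical Users and Orgs tables (the exact shape of DatabaseManager's API may differ):

import slick.jdbc.PostgresProfile.api._

object MyDatabaseManager extends DatabaseManager {
  // the complete list of Slick tables this microservice owns
  override val tables = List(TableQuery[Users], TableQuery[Orgs])
}

// at the start of the program:
// MyDatabaseManager.createTablesIfNeeded(db)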

This doesn't just check that tables with the correct name exist; it also makes sure that those existing tables have the expected schema. This way, if you update the schema of your tables and later run a version of your microservice which expects the older schema, you'll get an exception on startup instead of later on, when the table is needed. This is even more useful when you did upgrade the schema of your local and staging databases and you do run the latest version of the code which expects this new schema, but you forgot to upgrade the schema in prod.

One bit of related code which should probably be made more reusable and moved here is the migration logic in mintosci-subscriptions. Instead of expecting the person who performs the deployment to manually migrate the table at deployment time, the new version of the code creates a separate table and migrates all the existing data to it; this way, the older version of the code can continue to use the older version of the table, so we can perform the migration in advance. I'm not sure what to do about the events which come in after that migration is performed but before the event source is switched from the older version of the code to the newer one: the older version of the table will be modified accordingly, but those changes will be lost after the switch. This kind of problem is one of the reasons I have been pushing for a Datomic-style database in which we can work on a snapshot from a particular time and then replay all the events starting from that time.


FutureTraverse was originally created for the sole purpose of adding the method Future.traverse, which didn't exist at the time. Instead, there was Future.sequence, which executes a bunch of Futures in parallel.

It is easy to map a function A => Future[B] onto a List[A] to obtain a List[Future[B]], and then to sequence that list into a Future[List[B]]. But then all of those futures execute in parallel, and in our case that exhausted all of the database handles we had and led to a failure. We were looking for a version of Future.sequence which would execute its futures, well, sequentially.
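Concretely, the problematic pattern looked like this (lookup is a stand-in for any database call):

import scala.concurrent.{ExecutionContext, Future}

def lookup(id: Int)(implicit ec: ExecutionContext): Future[String] = ???

def allAtOnce(ids: List[Int])(implicit ec: ExecutionContext): Future[List[String]] =
  Future.sequence(ids.map(id => lookup(id)))  // every lookup starts immediately, in parallel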

Unfortunately such a version cannot exist, because once it receives its List[Future[B]], all of those futures are already running and it is too late to stop them. So instead, we want to delay the creation of the futures, using something like a List[() => Future[B]]. Taking inspiration from Haskell's traverse, I chose the following nicer API instead:

def traverse[A, B](inputs: List[A])(f: A => Future[B]): Future[List[B]]

The actual type is a bit scarier than that in order to also support sequences other than List, but it's much simpler to think of it as having the above type.
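For List, the sequential behavior can be sketched with a fold which only creates each future after the previous one has completed (a simplification, not the actual implementation):

import scala.concurrent.{ExecutionContext, Future}

def traverse[A, B](inputs: List[A])(f: A => Future[B])
                  (implicit ec: ExecutionContext): Future[List[B]] =
  inputs.foldLeft(Future.successful(List.empty[B])) { (acc, a) =>
    for {
      bs <- acc   // wait for the previous futures to complete...
      b  <- f(a)  // ...before creating the next one
    } yield b :: bs
  }.map(_.reverse)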

Today, Scala's standard library does have a Future.traverse method, but it behaves like the map-and-sequence combination above, running all of those futures in parallel. Since that's still not what we want, FutureTraverse.traverse remains useful, but it would probably be a good idea to give it a better name.


FutureTraverse.filter is a variant of FutureTraverse.traverse in which the function is used to filter the input list instead of transforming its elements. We should keep adding such helper methods here whenever we encounter a situation in which we want to execute a function sequentially but the standard Future either executes it in parallel or doesn't support it.
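By analogy with traverse, its shape is presumably the following (the exact signature is an assumption):

def filter[A](inputs: List[A])(p: A => Future[Boolean]): Future[List[A]]

For example, FutureTraverse.filter(users)(stillExistsUpstream) would run the stillExistsUpstream check on each user one call at a time, keeping only the users for which it returns true.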


FutureTraverse.fromBlocking is a simple wrapper for the idiom Future { blocking {...} }. The blocking annotation is important (or so I have read on the internet): it tells the ExecutionContext that this thread is going to be blocked on some synchronous call. Given this, it is unfortunate that the Future {...} constructor exists at all: a computation is either a slow operation which needs both Future and blocking, or a fast one like x + y which isn't worth executing in a different thread, in which case Future.successful(x + y) should be used instead of Future {x + y}.
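The definition is essentially the idiom itself (modulo the exact signature):

import scala.concurrent.{ExecutionContext, Future, blocking}

def fromBlocking[A](body: => A)(implicit ec: ExecutionContext): Future[A] =
  Future { blocking { body } }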

We cannot ban Future {...} from the language, but we can train ourselves to see it as a code smell, and to insist on always using either Future.successful or FutureTraverse.fromBlocking instead.

There is no link between FutureTraverse.fromBlocking and FutureTraverse.traverse; they are only defined in the same singleton object because they are both related to Futures. The singleton object should probably be renamed to FutureUtils or something.


FutureTry.sequence is a version of Future.sequence which runs N long-running and possibly-failing computations in parallel and returns all N outcomes, failures and successes alike, as values of type Try[A]. Since normal Future[A] computations can fail, the normal Future.sequence also runs N long-running and possibly-failing computations in parallel, but it returns a Future[List[A]], so if any of the computations fail, the entire sequence is deemed to have failed.

Note that despite the name, achieving the result of FutureTry.sequence is not as easy as running each of those computations inside a Try block. If you put the Try block outside of the Future block, the creation of the Future will succeed even if the computation it describes fails at runtime. And if you put the Try block inside of the Future block, you won't be able to run any Future computation inside of it.
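The trick is to convert each outcome into a Try before sequencing, so that an individual failure becomes a value instead of failing the whole sequence. A minimal sketch, assuming the result type is Future[List[Try[A]]]:

import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success, Try}

def sequence[A](futures: List[Future[A]])
               (implicit ec: ExecutionContext): Future[List[Try[A]]] =
  Future.sequence(futures.map { future =>
    future
      .map(a => Success(a): Try[A])        // a success becomes Success(a)
      .recover { case e => Failure(e) }    // a failure becomes Failure(e)
  })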

Also note that there is no benefit to adding map and flatMap to FutureTry in order to benefit from a for..yield syntax for it. If we did, we would have to convert many of the Future sub-computations into FutureTry sub-computations, like this:

val computeX: Future[Int] = ???  // some asynchronous computations
val computeY: Future[Int] = ???
val futureTry: FutureTry[Int] =
  for {
    x <- FutureTry(computeX)
    y <- FutureTry(computeY)
  } yield x + y

Whereas without a for..yield notation for FutureTry, the user is naturally pushed towards a shorter syntax in which a single FutureTry wrapper is sufficient:

val futureTry: FutureTry[Int] =
  FutureTry {
    for {
      x <- computeX
      y <- computeY
    } yield x + y
  }

This works because Future already keeps track of the exceptions raised within its computation; we simply expose them with a more convenient interface based on Try.


HttpRequests is a wrapper around akka-http's Http().singleRequest method for sending HTTP requests. I wrote this wrapper because two of our microservices constantly need to make authenticated HTTP requests in order to talk to external services like Stripe or Zendesk. So I wrote StripeRequests and ZendeskRequests to automatically add the authentication boilerplate to every call, and I refactored the common part into HttpRequests. It also automatically waits and retries if we use an API too much and receive a 429 error. This was modelled after Zendesk's 429 responses, and might be useful for other external services as well, assuming the error code and headers they use are standard.
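The retry logic boils down to something like the following sketch (not the actual HttpRequests code; in particular, the Retry-After parsing is simplified):

import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse, StatusCodes}
import akka.pattern.after
import akka.stream.Materializer
import scala.concurrent.Future
import scala.concurrent.duration._

def sendWithRetry(request: HttpRequest)
                 (implicit system: ActorSystem, mat: Materializer): Future[HttpResponse] = {
  import system.dispatcher
  Http().singleRequest(request).flatMap {
    case resp if resp.status == StatusCodes.TooManyRequests =>
      // wait for the number of seconds advertised by Retry-After, then retry
      val delay = resp.headers.find(_.is("retry-after"))
        .flatMap(h => scala.util.Try(h.value.toLong).toOption)
        .fold(1.second)(_.seconds)
      resp.discardEntityBytes()
      after(delay, system.scheduler)(sendWithRetry(request))
    case resp =>
      Future.successful(resp)
  }
}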


JsonColumn is a helper for writing Slick table descriptions. It's much easier to write JSON converters using spray-json than to write table descriptions using Slick, so for those sets of columns which are never used to perform lookups, it's simpler to convert the data to JSON and store it in an opaque string column.

I know that postgres supports a JSON column type, but the goal here is not to store an arbitrary JSON value, and I don't want postgres to waste its time trying to parse that value. Rather, I want to store a value of a particular type A, and the fact that the serialization format I use to store it in the database happens to be JSON is a mere implementation detail.
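In Slick terms, the helper presumably boils down to a mapped column type along these lines (the Prefs case class is hypothetical):

import slick.jdbc.PostgresProfile.api._
import spray.json._
import DefaultJsonProtocol._

case class Prefs(theme: String, emailDigest: Boolean)
implicit val prefsFormat: RootJsonFormat[Prefs] = jsonFormat2(Prefs)

// serialize on the way in, parse on the way out; postgres only sees a string
implicit val prefsColumn: BaseColumnType[Prefs] =
  MappedColumnType.base[Prefs, String](
    _.toJson.compactPrint,
    _.parseJson.convertTo[Prefs]
  )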

This isn't used by any of our microservices anymore, but it seems likely that we might want to use it in the future. Since having this short class lying around isn't much of a maintenance burden, I'd keep it here just in case.


QueryOption extends Slick's API with three new methods: take1, update1 and delete1. The idea is that we often use a WHERE clause to identify a single row to be fetched, updated or deleted, but Slick's API will consider the operation a success if zero, two, or more rows are found.

For UPDATEs and DELETEs, we're usually not doing anything with the returned row count, so a wrong number of affected rows would be a silent failure. With update1 and delete1, we check that exactly one row was affected and throw an exception otherwise. This will help us identify situations in which our code is incorrect or in which our database is inconsistent.

For SELECTs, it is possible to express the fact that we only expect a single row using LIMIT 1, but Slick's API still returns a list and it feels a bit unsafe to do a get(0) on that list to fetch the only result. And indeed, it is unsafe: LIMIT 1 guarantees that there is at most one result, but it doesn't guarantee that there is at least one. So the proper way to represent the result is not a list, but an Option. The take1 method does a LIMIT 1 and wraps the result in an Option instead of a list, encouraging the caller to add some error-checking code to throw a more readable exception if zero rows are returned.
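Presumed usage, assuming a users table (the call sites are guesses based on the description above):

// SELECT: at most one row, as an Option rather than a Seq
users.filter(_.id === userId).take1

// UPDATE and DELETE: throw unless exactly one row was affected
users.filter(_.id === userId).map(_.name).update1("new name")
users.filter(_.id === userId).delete1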


StringBasedEnumeration is used like this:

object Environment extends StringBasedEnumeration {
  val Dev  = Value("dev")
  val QA   = Value("qa")
  val Prod = Value("prod")
}

val env: Environment.Value = Environment.Dev

The result is very similar to an enumeration based on case objects:

sealed trait Env
case object Dev  extends Env
case object QA   extends Env
case object Prod extends Env

The difference is that StringBasedEnumeration defines methods to convert between a value of type Environment.Value and a string, and that those methods are automatically used by Slick and spray-json when serializing such a value to JSON or to the database.
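Presumably the string conversions work along these lines (assuming StringBasedEnumeration builds on scala.Enumeration, whose API provides withName):

Environment.withName("qa") == Environment.QA
Environment.QA.toString    == "qa"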

Note that the type really is Environment.Value, not Environment. It would make more sense if Environment were the companion object of a type named Environment, but I don't know how to re-expose an existing type under a different top-level name.


TransactionalFuture is a version of Future which does not run in parallel with other TransactionalFuture computations; on the contrary, a mutex is used to make sure that at most one such computation runs at a time. The idea is to provide critical sections using a very familiar interface: it's a regular Future, so like all Futures it can take some time to complete, in this case because it could be waiting to obtain the lock. This version is very simple: there is only one global lock, so there are no race conditions. I don't think it would be too hard to extend this implementation to support more locks if needed.
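A minimal sketch of the single-lock idea (the real implementation may differ): each new computation is chained after the previous one, so at most one runs at a time.

import scala.concurrent.{ExecutionContext, Future}

object TransactionalFuture {
  // the tail of the queue: the most recently submitted computation
  private var last: Future[Any] = Future.successful(())

  def apply[A](body: => Future[A])(implicit ec: ExecutionContext): Future[A] =
    synchronized {
      // start body only once every earlier computation has finished,
      // whether it succeeded or failed
      val next = last.recover { case _ => () }.flatMap(_ => body)
      last = next
      next
    }
}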

The only place in which this is used is in mintosci-zendesk, to obtain transactional semantics despite the fact that Slick's transactional guarantees are very weak.


TypedValue is used a lot. It distinguishes types like OrgId and UserId which are semantically distinct despite having the same underlying representation. Like StringBasedEnumeration, the main advantage is that it makes it easier to define spray-json and Slick conversions by delegating the bulk of the work to the underlying representation.
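Hypothetical usage, assuming TypedValue is parameterized by the underlying representation (the actual trait's shape may differ):

case class OrgId(value: String) extends TypedValue[String]
case class UserId(value: String) extends TypedValue[String]

A UserId can then no longer be passed where an OrgId is expected, even though both are strings underneath.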


Unzip4 is... what is this? I think it's for transposing a list of four-tuples into a four-tuple of lists.
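If so, its behavior would be analogous to the standard unzip3, one arity higher:

List((1, 'a', "x", true), (2, 'b', "y", false)).unzip4
  == (List(1, 2), List('a', 'b'), List("x", "y"), List(true, false))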
