Noah D Brenowitz edited this page Sep 15, 2017 · 25 revisions

Typical analysis pipeline

Problems with I/O based pipeline

The surface area of the Earth is 510,007,200 km^2. In the future, model grid cells might be as small as 1 km^2. At 8 bytes per value, storing one snapshot of the model state then requires

8 bytes * 510 million grid columns * (num vertical levels) * (num vars).

If 50 vertical levels are used and 6 variables are output, then each snapshot of the state is about 1.2 TB, which costs roughly $20/month to store on AWS, Google Cloud, etc. It will not be feasible to store all of this data for a long run, and even a single snapshot is expensive to keep.
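The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `snapshot_bytes` and its parameters are made up for illustration, and it assumes 8-byte values and one grid column per km^2.

```python
# Back-of-the-envelope snapshot size, assuming 8-byte (float64) values
# and one grid column per km^2 of the Earth's surface.
EARTH_SURFACE_KM2 = 510_007_200
BYTES_PER_VALUE = 8


def snapshot_bytes(n_levels, n_vars, cell_area_km2=1.0):
    """Return the size in bytes of one snapshot of the model state."""
    n_columns = EARTH_SURFACE_KM2 / cell_area_km2
    return BYTES_PER_VALUE * n_columns * n_levels * n_vars


size = snapshot_bytes(n_levels=50, n_vars=6)
print(f"{size / 1e12:.2f} TB per snapshot")  # prints "1.22 TB per snapshot"
```

Changing `cell_area_km2` shows how quickly the size falls with coarser grids: cells of 100 km^2 reduce the snapshot by a factor of 100.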

This implies that some sort of in-memory analysis or data reduction is needed before storing anything to disk. Unfortunately, it can be difficult to write this sort of complicated analysis in Fortran, and the number of scientists who can effectively write Fortran code is decreasing with time. Solution: stream data to an in-memory data store, and use a Python client to access this data store.
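As a rough illustration of the idea, the sketch below packs a 2D field into a byte string that an in-memory store could hold as a value, and unpacks it again on the client side. Everything here is an assumption made for the example: the header layout (two little-endian 32-bit integers for the shape), the function names `encode_field` and `decode_field`, and the plain `dict` standing in for the data store. It is not the message format used by this project.

```python
import struct

# Hypothetical header: (n_rows, n_cols) as little-endian unsigned 32-bit ints.
HEADER = struct.Struct("<II")


def encode_field(rows):
    """Pack a 2D list of floats into bytes: header, then float64 data."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    flat = [value for row in rows for value in row]
    return HEADER.pack(n_rows, n_cols) + struct.pack(f"<{len(flat)}d", *flat)


def decode_field(message):
    """Inverse of encode_field: recover the 2D list of floats."""
    n_rows, n_cols = HEADER.unpack_from(message)
    flat = struct.unpack_from(f"<{n_rows * n_cols}d", message, HEADER.size)
    return [list(flat[i * n_cols:(i + 1) * n_cols]) for i in range(n_rows)]


# A dict stands in for the in-memory data store in this sketch.
store = {}
store["temperature"] = encode_field([[1.0, 2.0], [3.0, 4.0]])  # model side
field = decode_field(store["temperature"])                     # client side
```

Putting the shape in a fixed-size header keeps the message self-describing, so the client does not need to know the grid dimensions in advance.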

Spec

Message format specification

Demonstration

Conway's Game of Life is a very simple dynamical system. The rules of the Game of Life are:

The universe of the Game of Life is an infinite two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, alive or dead, or "populated" or "unpopulated". Every cell interacts with its eight neighbours, which are the cells that are horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

  1. Any live cell with fewer than two live neighbours dies, as if caused by underpopulation.
  2. Any live cell with two or three live neighbours lives on to the next generation.
  3. Any live cell with more than three live neighbours dies, as if by overpopulation.
  4. Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.

The initial pattern constitutes the seed of the system. The first generation is created by applying the above rules simultaneously to every cell in the seed; births and deaths occur simultaneously, and the discrete moment at which this happens is sometimes called a tick (in other words, each generation is a pure function of the preceding one). The rules continue to be applied repeatedly to create further generations.

(from Wikipedia)
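The four rules can be written as a single update function. The following is a minimal sketch, not the implementation used in this demonstration: it represents the infinite grid as a set of live `(x, y)` coordinates, and the name `step` is chosen for illustration.

```python
from collections import Counter


def step(live):
    """Advance one tick. `live` is a set of (x, y) coordinates of live cells."""
    # Count live neighbours for every cell adjacent to at least one live cell.
    counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Rule 4: birth with exactly three neighbours.
    # Rule 2: survival with two or three neighbours.
    # Rules 1 and 3: every other cell is dead in the next generation.
    return {
        cell
        for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


# A "blinker" oscillates between a horizontal and a vertical bar.
blinker = {(0, 1), (1, 1), (2, 1)}
after_one = step(blinker)  # {(1, 0), (1, 1), (1, 2)}
```

Because `step` builds the new set entirely from the old one, births and deaths happen simultaneously, and each generation is a pure function of the preceding one.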

Bokeh Server

Lessons learned

  1. Creating a consistent environment is difficult. Docker is nice, but it is difficult to get a containerized Game of Life process to communicate with the Redis server on the host operating system.

Future work

  1. Formalize the communications specs.
  2. Use Docker.
  3. Implement an xarray Redis backend.
  4. Distributed computing.