Home

Gregory Szorc edited this page Oct 11, 2011 · 25 revisions

zippylog is a fast and efficient message bus, store, and stream processing platform.

Some uses of zippylog include:

  • event logging - Use zippylog to record, stream, aggregate, and analyze your data. zippylog takes care of the encoding, transport, aggregation, and decoding, so you can focus on what's important: using your data.
  • monitoring and alerting - Hook monitors up to zippylog streams and react to events milliseconds after they are produced.
  • data warehousing - zippylog messages are small and fast. Use zippylog to efficiently store huge amounts of data.
  • distributed data analysis - Connect to a zippylog agent on a remote system and ask it to search, copy, stream, etc. the zippylog data on that system. You could use this as the base of a distributed data processing platform (like MapReduce).
  • real-time distributed tracing - zippylog clients can simultaneously connect to multiple remote agents and install code that executes on the remote instances and sends events of interest back to the client. Dynamically probe your network and you have real-time distributed event tracing.
  • basis for other systems - zippylog's code follows the principle that everything is a library. As such, the core classes and services can be used to build more advanced and domain-specific services. These include, but are not limited to, distributed databases, real-time analytics, complex stream processing systems, and data archiving.

zippylog defines a common envelope, a message format for persisting multiple messages, a format for encoding envelopes in a stream (see stream encoding), a persistence store for streams (stream store), and a protocol for client-server interaction. zippylog also ships with a daemon (zippylogd), which implements the protocol, and a handful of utilities for producing and interacting with zippylog data and servers.

zippylog currently uses Google's Protocol Buffers for data encoding and 0MQ for transport. The stream encoding and protocol are versioned, allowing for future departures from current design decisions (e.g. supporting encodings other than Protocol Buffers).

Lua is a first-class citizen in zippylog. Lua code can be inserted into zippylog processes, and user-provided callbacks will be executed to influence program execution. This transforms zippylog from a static data transport tool into a distributed real-time data and stream processing platform. Other languages (like JavaScript) could also be added at a later date if there is demand.

zippylog can be used to encode and transport existing streams, or you can use it throughout your entire event/logging stack. See Data Loading for more.

The Comparisons page compares and contrasts zippylog with other programs/tools in this space.

Status

Currently, zippylog is considered pre-alpha and is not ready for production or near-production deployment. Changes can and will be backwards incompatible. For more, see Status.

How it Works

You define protocol buffer message types for each piece of data (application logs, system statistics, etc.) you wish to record. You use the protocol buffer compiler (via some extensions in zippylog) to generate code in your programming language of choice. Next, you instrument your applications to generate these messages and hand them off to zippylog.

Once your messages are in the hands of zippylog, your life becomes happier. For nearly free, you get the ability to encode, decode, stream, download, search, etc. all your data.

Did we mention it is fast? Both protocol buffers and 0MQ sockets are insanely quick. zippylog aims not to disrupt this desirable trait.

Supported Programming Languages

zippylog is a light layer on top of protocol buffers. Therefore, it should be possible to write zippylog messages from any programming language that supports Protocol Buffers.

The zippylog server protocol utilizes 0MQ, so you'll additionally need a 0MQ client library if you wish to communicate with a server. While 0MQ bindings are available in nearly every programming language, 0MQ is not required to simply write zippylogs. For that, a simple protocol buffer library writing to a file descriptor is sufficient.
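
To make the idea of "a protocol buffer library writing to a file descriptor" concrete, here is a minimal Python sketch. It is not the zippylog stream encoding (that is specified on the stream encoding page). It assumes a generic framing in which each record is prefixed by its length as a base-128 varint, the integer encoding Protocol Buffers uses on the wire. The function names (`encode_varint`, `write_record`, `read_records`) are hypothetical, and the byte strings stand in for already-serialized protocol buffer messages.

```python
import io


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def write_record(stream, payload: bytes) -> None:
    """Write one length-prefixed record (ASSUMED framing, not zippylog's)."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield each payload from a stream written by write_record()."""
    while True:
        length = 0
        shift = 0
        while True:
            b = stream.read(1)
            if not b:
                if shift == 0:
                    return  # clean end of stream between records
                raise EOFError("truncated length prefix")
            length |= (b[0] & 0x7F) << shift
            shift += 7
            if not b[0] & 0x80:
                break
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("truncated record")
        yield payload


# Round trip through an in-memory file object. A real writer would pass a
# file opened in binary mode instead of io.BytesIO().
buf = io.BytesIO()
for message in (b"first", b"", b"x" * 300):
    write_record(buf, message)
buf.seek(0)
records = list(read_records(buf))
```

No 0MQ is involved here: the writer only needs a way to serialize a message to bytes and a writable binary stream.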

For highest performance and to reduce the surface area for bugs to crop up, it is recommended to utilize the core zippylog C++ API whenever possible.

Performance

zippylog is extremely fast. The core transport routines are capable of shuffling around over 1,000,000 envelopes/second/thread on modern hardware. When you introduce Lua for dynamic processing, we still manage rates well above 100,000 envelopes/second/thread.

For some actual performance numbers, see Benchmarks.

In real world scenarios, zippylog will likely be limited by I/O, not CPU.

zippylog should scale well horizontally with the number of cores available, assuming I/O can keep pace. If you throw enough cores and hard drives at the problem, you should be able to achieve lofty-sounding processing rates, like a billion events per second.
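
As a rough back-of-the-envelope check on the figures above, the following Python sketch estimates how many threads a billion events per second would take. It assumes perfectly linear scaling and uses the quoted per-thread rates as lower bounds. The 100-byte envelope size is a hypothetical value chosen only to illustrate the I/O requirement; it is not a measured zippylog number.

```python
CORE_RATE = 1_000_000     # envelopes/second/thread, core transport (quoted above)
LUA_RATE = 100_000        # envelopes/second/thread with Lua processing (quoted above)
TARGET_RATE = 1_000_000_000  # envelopes/second

ASSUMED_ENVELOPE_BYTES = 100  # HYPOTHETICAL average envelope size


def threads_needed(target_rate: int, per_thread_rate: int) -> int:
    """Threads required to reach target_rate, assuming linear scaling."""
    return -(-target_rate // per_thread_rate)  # integer ceiling division


def io_bytes_per_second(target_rate: int, envelope_bytes: int) -> int:
    """Aggregate I/O bandwidth needed to sustain target_rate."""
    return target_rate * envelope_bytes


core_threads = threads_needed(TARGET_RATE, CORE_RATE)
lua_threads = threads_needed(TARGET_RATE, LUA_RATE)
io_gb_per_s = io_bytes_per_second(TARGET_RATE, ASSUMED_ENVELOPE_BYTES) / 1e9

print(core_threads)  # 1000
print(lua_threads)   # 10000
print(io_gb_per_s)   # 100.0
```

Under these assumptions the CPU side needs on the order of a thousand threads, while the storage side needs roughly 100 GB/s of aggregate bandwidth, which is why I/O is the likelier bottleneck.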