FastLZ format #2750

idelpivnitskiy · 2014-08-07T21:59:54Z

I'm trying to implement FastLZ compression codec for Netty.
Official web site: http://fastlz.org/
Official repo: https://code.google.com/p/fastlz/
Java port: https://code.google.com/p/jfastlz/

These resources doesn't have a format description so I read a source code.
FastLZ compressor (fastlz.c / JFastLZ.java) only compresses the data. So at the end of compression you will have a compressed array without any additional information (chunk length / original length / etc.). All additional information for decompression will be added by 6PACK (6pach.c / JFastLZPack.java). 6PACK is only a file compressor. And if you compress a file with 6PACK it will have this format:

Length (bytes)	What
8	Stream magic

2	Chunk ID (1 - chunk of metadata)
2	Options (0 - not compressed)
4	Chunk length (length of file name + 11)
4	Checksum
4	0 (zeros)
4	File length
4	0 (zeros)
2	Length of file name
Length of file name + 1	File name

2	Chunk ID (17 - chunk of data)
2	Options (1 - compressed / 0 - not compressed, if chunkLength < 32 bytes)
4	Chunk length
4	Checksum
4	Original length
Chunk length	Data (compressed or not)

	...

Short description:
Stream magic + Chunk of metadata + N * Chunk of data
Each chunk contains header (16 bytes, see italic rows) + data (0+ bytes) / metadata (length of file name + 11 bytes)

This is an unsuitable format for Netty. Also FastLZ is not a very popular compression algorithm and it doesn't have a good Java library to create Java Input/OutputStreams (jfastlz can compress only files from your file system and uses command line for this). So I think that we may create our custom header format and use only compressor and decompressor from jfastlz. It may be something like LZF format:

0-2: Magic 'FLZ' (like extension of 6PACK's compressed files)
3: Type
   0x00: Non-compressed chunk
   0x01: Compressed chunk
4-7: 'Checksum' (uint32)
8-9: 'ChunkLength' (uint16)

In addition, compressed chunks (type 0x01) will contain additional 2 header bytes:

10-11: 'OriginalLength' (uint16)

And it will be more convenient to use Big Endian notation (6PACK uses Little Endian notation for chunk headers).

@trustin @normanmaurer WDYT?

The text was updated successfully, but these errors were encountered:

trustin · 2014-08-07T22:13:23Z

Do LZF and FastLZ employ different compression algorithms? Also, could you explain why 6PACK is not suitable for Netty? For example, we could use an arbitrary file name like data?

trustin · 2014-08-07T22:48:31Z

I guess I know why:

too much overhead
6pack is only provided as an example to show how to use FastLZ.

Here's my feed back about your proposed format:

We don't need magic number for all chunks. Shouldn't we have it only at the beginning of the stream?
We should make checksum optional. For example, we can use one bit of the 'type' field to signify if it has checksum or not.

WDYT?

trustin · 2014-08-07T23:02:03Z

Another thing to consider is the community around the compression algorithm and format. Looking from the FastLZ's repository, there's no commit since 2007. If you don't mind, it would be nice if we could focus on more actively maintained formats like LZ4.

Also, according to the benchmark from ning, LZ4, Snappy, and LZF are the winners across multiple test cases, although they did not include FastLZ in the list.

What do you think?

idelpivnitskiy · 2014-08-08T10:35:22Z

@trustin nign considered FastLZ algorithm but said that FastLZ doesn't have a Java version (see at the bottom of Codecs included). But I found jfastlz library.

FastLZ (according to sources) has good results:

and may be used for real-time compression/decompression.

FastLZ is used by Apache Traffic Server and PHP-memcached. But they have their own header format, for example in Apache Traffic Server:

struct RamCacheCLFUSEntry {
  INK_MD5 key;
  uint32_t auxkey1;
  uint32_t auxkey2;
  uint64_t hits;
  uint32_t size; // memory used including paddding in buffer
  uint32_t len;  // actual data length
  uint32_t compressed_len;
  union {
    struct {
      uint32_t compressed:3; // compression type
      uint32_t incompressible:1;
      uint32_t lru:1;
      uint32_t copy:1; // copy-in-copy-out
    } flag_bits;
    uint32_t flags;
  };
  LINK(RamCacheCLFUSEntry, lru_link);
  LINK(RamCacheCLFUSEntry, hash_link);
  Ptr<IOBufferData> data;
};

They use only original fastlz.c for compression/decompression. And I didn't found any other compression libraries which provide FastLZ compression/decompression.
So I think that we also may use only JFastLZ.fastlzCompress() and JFastLZ.fastlzDecompress() and create our custom header, because (you are right) 6pack's format has unnecessary overhead and it looks like an archiver format.

What about your feed back:

I think that framed format is more convenient (when you don't have a stream and all chunks are independent). And 3 magic bytes is a very little overhead. It will look like UDP or LZF frame.
I'm agree with you about checksum.

New proposal:

Length (bytes)	What
3	Magic 'FLZ' (like extension of 6PACK's compressed files)
1	Options (checksum and type)
4	Checksum (optional)
2	ChunkLength
2	OriginalLength (optional)

Options:
0x00: Non-compressed chunk without checksum
0x01: Compressed chunk without checksum
0x10: Non-compressed chunk with checksum
0x11: Compressed chunk with checksum

trustin · 2014-08-08T18:05:26Z

OK. The proposed frame format looks reasonable. Please go ahead and make modification. Please ping me once it's ready for a review.

trustin · 2014-08-14T01:05:16Z

Superceded by #2755

trustin mentioned this issue Aug 7, 2014

Implemented LZ4 compression codec (block mode) #2739

Closed

trustin added this to the 4.1.0.Final milestone Aug 7, 2014

trustin added the feature label Aug 7, 2014

idelpivnitskiy mentioned this issue Aug 11, 2014

Implemented FastLZ compression codec #2755

Closed

trustin modified the milestones: 4.1.0.Final, 4.1.0.Beta2 Aug 14, 2014

trustin closed this as completed Aug 14, 2014

trustin removed this from the 4.1.0.Beta2 milestone Aug 14, 2014

trustin added the won't fix label Aug 14, 2014

trustin self-assigned this Aug 14, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FastLZ format #2750

FastLZ format #2750

idelpivnitskiy commented Aug 7, 2014

trustin commented Aug 7, 2014

trustin commented Aug 7, 2014

trustin commented Aug 7, 2014

idelpivnitskiy commented Aug 8, 2014

trustin commented Aug 8, 2014

trustin commented Aug 14, 2014

FastLZ format #2750

FastLZ format #2750

Comments

idelpivnitskiy commented Aug 7, 2014

trustin commented Aug 7, 2014

trustin commented Aug 7, 2014

trustin commented Aug 7, 2014

idelpivnitskiy commented Aug 8, 2014

trustin commented Aug 8, 2014

trustin commented Aug 14, 2014