Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FastLZ format #2750

Closed
idelpivnitskiy opened this issue Aug 7, 2014 · 6 comments
Closed

FastLZ format #2750

idelpivnitskiy opened this issue Aug 7, 2014 · 6 comments
Assignees

Comments

@idelpivnitskiy
Copy link
Member

I'm trying to implement FastLZ compression codec for Netty.
Official web site: http://fastlz.org/
Official repo: https://code.google.com/p/fastlz/
Java port: https://code.google.com/p/jfastlz/

These resources doesn't have a format description so I read a source code.
FastLZ compressor (fastlz.c / JFastLZ.java) only compresses the data. So at the end of compression you will have a compressed array without any additional information (chunk length / original length / etc.). All additional information for decompression will be added by 6PACK (6pach.c / JFastLZPack.java). 6PACK is only a file compressor. And if you compress a file with 6PACK it will have this format:

Length (bytes) What
8 Stream magic
2 Chunk ID (1 - chunk of metadata)
2 Options (0 - not compressed)
4 Chunk length (length of file name + 11)
4 Checksum
4 0 (zeros)
4 File length
4 0 (zeros)
2 Length of file name
Length of file name + 1 File name
2 Chunk ID (17 - chunk of data)
2 Options (1 - compressed / 0 - not compressed, if chunkLength < 32 bytes)
4 Chunk length
4 Checksum
4 Original length
Chunk length Data (compressed or not)
...

Short description:
Stream magic + Chunk of metadata + N * Chunk of data
Each chunk contains header (16 bytes, see italic rows) + data (0+ bytes) / metadata (length of file name + 11 bytes)

This is an unsuitable format for Netty. Also FastLZ is not a very popular compression algorithm and it doesn't have a good Java library to create Java Input/OutputStreams (jfastlz can compress only files from your file system and uses command line for this). So I think that we may create our custom header format and use only compressor and decompressor from jfastlz. It may be something like LZF format:

0-2: Magic 'FLZ' (like extension of 6PACK's compressed files)
3: Type
   0x00: Non-compressed chunk
   0x01: Compressed chunk
4-7: 'Checksum' (uint32)
8-9: 'ChunkLength' (uint16)

In addition, compressed chunks (type 0x01) will contain additional 2 header bytes:

10-11: 'OriginalLength' (uint16)

And it will be more convenient to use Big Endian notation (6PACK uses Little Endian notation for chunk headers).

@trustin @normanmaurer WDYT?

@trustin
Copy link
Member

trustin commented Aug 7, 2014

Do LZF and FastLZ employ different compression algorithms? Also, could you explain why 6PACK is not suitable for Netty? For example, we could use an arbitrary file name like data?

@trustin
Copy link
Member

trustin commented Aug 7, 2014

I guess I know why:

  • too much overhead
  • 6pack is only provided as an example to show how to use FastLZ.

Here's my feed back about your proposed format:

  • We don't need magic number for all chunks. Shouldn't we have it only at the beginning of the stream?
  • We should make checksum optional. For example, we can use one bit of the 'type' field to signify if it has checksum or not.

WDYT?

@trustin
Copy link
Member

trustin commented Aug 7, 2014

Another thing to consider is the community around the compression algorithm and format. Looking from the FastLZ's repository, there's no commit since 2007. If you don't mind, it would be nice if we could focus on more actively maintained formats like LZ4.

Also, according to the benchmark from ning, LZ4, Snappy, and LZF are the winners across multiple test cases, although they did not include FastLZ in the list.

What do you think?

@idelpivnitskiy
Copy link
Member Author

@trustin nign considered FastLZ algorithm but said that FastLZ doesn't have a Java version (see at the bottom of Codecs included). But I found jfastlz library.

FastLZ (according to sources) has good results:

and may be used for real-time compression/decompression.

FastLZ is used by Apache Traffic Server and PHP-memcached. But they have their own header format, for example in Apache Traffic Server:

struct RamCacheCLFUSEntry {
  INK_MD5 key;
  uint32_t auxkey1;
  uint32_t auxkey2;
  uint64_t hits;
  uint32_t size; // memory used including paddding in buffer
  uint32_t len;  // actual data length
  uint32_t compressed_len;
  union {
    struct {
      uint32_t compressed:3; // compression type
      uint32_t incompressible:1;
      uint32_t lru:1;
      uint32_t copy:1; // copy-in-copy-out
    } flag_bits;
    uint32_t flags;
  };
  LINK(RamCacheCLFUSEntry, lru_link);
  LINK(RamCacheCLFUSEntry, hash_link);
  Ptr<IOBufferData> data;
};

They use only original fastlz.c for compression/decompression. And I didn't found any other compression libraries which provide FastLZ compression/decompression.
So I think that we also may use only JFastLZ.fastlzCompress() and JFastLZ.fastlzDecompress() and create our custom header, because (you are right) 6pack's format has unnecessary overhead and it looks like an archiver format.

What about your feed back:

  • I think that framed format is more convenient (when you don't have a stream and all chunks are independent). And 3 magic bytes is a very little overhead. It will look like UDP or LZF frame.
  • I'm agree with you about checksum.

New proposal:

Length (bytes) What
3 Magic 'FLZ' (like extension of 6PACK's compressed files)
1 Options (checksum and type)
4 Checksum (optional)
2 ChunkLength
2 OriginalLength (optional)

Options:
0x00: Non-compressed chunk without checksum
0x01: Compressed chunk without checksum
0x10: Non-compressed chunk with checksum
0x11: Compressed chunk with checksum

@trustin
Copy link
Member

trustin commented Aug 8, 2014

OK. The proposed frame format looks reasonable. Please go ahead and make modification. Please ping me once it's ready for a review.

@trustin trustin modified the milestones: 4.1.0.Final, 4.1.0.Beta2 Aug 14, 2014
@trustin trustin closed this as completed Aug 14, 2014
@trustin trustin removed this from the 4.1.0.Beta2 milestone Aug 14, 2014
@trustin trustin self-assigned this Aug 14, 2014
@trustin
Copy link
Member

trustin commented Aug 14, 2014

Superceded by #2755

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants