Deflate

Pure Elixir implementation of DEFLATE (RFC 1951) and zlib (RFC 1950) decompression with exact byte consumption tracking.

Why This Library?

Erlang's :zlib module is fast (it's a C NIF), but it has a critical limitation: you can't determine exactly how many bytes were consumed when decompressing a stream.

This matters when parsing binary formats with concatenated compressed streams, like:

Git pack files - multiple zlib-compressed objects back-to-back
PNG chunks - IDAT chunks contain zlib streams
PDF streams - FlateDecode compressed content
ZIP files - DEFLATE compressed entries

# With :zlib - no way to know where stream 1 ends and stream 2 begins
:zlib.uncompress(concatenated_streams)  # Returns data, but how many bytes consumed?

# With Deflate - exact byte tracking
{:ok, data1, bytes_consumed} = Deflate.inflate(concatenated_streams)
remaining = binary_part(concatenated_streams, bytes_consumed, byte_size(concatenated_streams) - bytes_consumed)
{:ok, data2, _} = Deflate.inflate(remaining)

Why Not Use Existing Libraries?

There are several compression libraries on Hex. Here's why we built this one:

Feature	deflate	ezlib	compresso	zip_stream
Byte consumption tracking	✅	❌	❌	❌
Chunked network input	✅	❌	❌	❌
Pure Elixir (no NIFs)	✅	❌ (C NIF)	❌ (wraps :zlib)	❌ (wraps :zlib)
Streaming output	✅	❌	❌	✅

The core problem: All existing libraries wrap :zlib or use C NIFs. None of them tell you how many compressed bytes were consumed. This makes it impossible to parse formats with concatenated streams.

ezlib - C NIF bindings to zlib. Fast, but no byte tracking, and adds native dependency.
compresso - Gleam package wrapping :zlib. No byte tracking.
zip_stream - Great for ZIP files specifically, but uses :zlib internally. No byte tracking for raw streams.

We needed byte tracking for SourceKeep (a git hosting platform) to parse git pack files. We chose pure Elixir over a C/Rust NIF to avoid native dependencies and keep deployment simple - the performance difference is negligible for real-world workloads.

Installation

Add deflate to your list of dependencies in mix.exs:

def deps do
  [
    {:deflate, "~> 1.0"}
  ]
end

Usage

Decompress zlib-wrapped data

# Standard zlib format (2-byte header + deflate + 4-byte adler32)
compressed = :zlib.compress("hello world")

{:ok, "hello world", bytes_consumed} = Deflate.inflate(compressed)

Decompress raw DEFLATE data

# Raw DEFLATE without zlib wrapper
{:ok, data, bytes_consumed} = Deflate.inflate_raw(raw_deflate_data)

Parse concatenated streams (e.g., git pack files)

defmodule GitPack do
  def parse_objects(data, count, objects \\ [])

  def parse_objects(_data, 0, objects), do: Enum.reverse(objects)

  def parse_objects(data, remaining, objects) do
    {type, size, header_len} = parse_object_header(data)
    rest = binary_part(data, header_len, byte_size(data) - header_len)

    # Deflate tells us exactly how many compressed bytes were consumed
    {:ok, content, consumed} = Deflate.inflate(rest)

    after_object = binary_part(rest, consumed, byte_size(rest) - consumed)
    parse_objects(after_object, remaining - 1, [{type, content} | objects])
  end
end

Streaming decompression (hash/store without full memory)

For large files or when you want to compute hashes without holding the entire decompressed content in memory, use the streaming API:

# Stream directly to SHA-256 hasher
hasher = :crypto.hash_init(:sha256)

{:ok, final_hasher, bytes_consumed} = Deflate.inflate_stream(compressed, hasher, fn chunk, h ->
  :crypto.hash_update(h, chunk)
end)

hash = :crypto.hash_final(final_hasher)

# Stream to file
{:ok, file} = File.open("output.bin", [:write, :binary])

{:ok, _, _} = Deflate.inflate_stream(compressed, nil, fn chunk, _ ->
  IO.binwrite(file, chunk)
  nil
end)

File.close(file)

Note: :zlib cannot do streaming with byte tracking at all. It either gives you:

Streaming decompression (:zlib.inflateInit/1 + :zlib.inflate/2) but no way to know where the stream ends
Full decompression (:zlib.uncompress/1) but no byte tracking

This library provides both streaming output AND exact byte consumption tracking.

Chunked input (for network streams)

When data arrives in chunks (e.g., from SSH or HTTP streaming), use the stateful decoder:

alias Deflate.Decoder

# Initialize decoder
{:ok, decoder} = Decoder.new()

# Feed chunks as they arrive from network
{:ok, output1, decoder} = Decoder.decode(decoder, packet1)
{:ok, output2, decoder} = Decoder.decode(decoder, packet2)
{:ok, output3, decoder} = Decoder.decode(decoder, packet3)

# When stream ends, get final results
{:done, <<>>, bytes_consumed} = Decoder.finish(decoder)

The decoder maintains state between chunks, handling all the complexity of:

Bit-level boundaries (Huffman codes can span byte boundaries)
Back-references that cross chunk boundaries
Dynamic Huffman table parsing across chunks

This is essential for protocols like git-over-SSH where pack data arrives in network packets.

Performance

The Honest Numbers

For decompression speed, :zlib (C NIF) is faster:

Input Type	Deflate (Elixir)	:zlib (C NIF)	Difference
Random 100KB	13x faster	-	Stored blocks, minimal decoding
Text 100KB	6x slower	-	~177 μs difference
Text 10KB	30x slower	-	~149 μs difference

Why It Doesn't Matter

Those ratios sound scary, but look at the absolute times:

Input	Deflate	:zlib	Actual Difference
Text 100KB	213 μs	36 μs	0.18 ms
Text 10KB	154 μs	5 μs	0.15 ms
Text 100B	102 μs	2 μs	0.10 ms

In real-world applications, decompression is rarely the bottleneck:

Typical file processing pipeline:
  Decompress:     ~200 μs  (this library)
  Parse content:  ~5,000 μs  (JSON/AST/etc)
  Database I/O:   ~2,000 μs
  Network I/O:    ~10,000+ μs
  ─────────────────────────────
  Decompression: ~1% of total time

Use this library when you need byte tracking. Use :zlib when you need raw speed and don't care about consumption tracking.

Features

Pure Elixir - No NIFs, no ports, works everywhere BEAM runs
Exact byte tracking - Know precisely how many bytes were consumed
Streaming output - Decompress to callbacks for hashing/storage without full memory
Chunked input - Feed data as it arrives from network (SSH, HTTP streams)
Full DEFLATE support - Stored, fixed Huffman, and dynamic Huffman blocks
zlib wrapper support - Handles standard zlib header/trailer
Zero dependencies - Only uses Erlang/OTP standard library

Limitations

Decompression only - This library does not compress data (use :zlib.compress/1 for that)
No preset dictionaries - zlib preset dictionary feature not supported

How It Works

The library implements:

zlib header parsing (RFC 1950) - CMF, FLG bytes, optional dict, Adler-32 checksum
DEFLATE decompression (RFC 1951):
- Stored blocks (type 0) - uncompressed data
- Fixed Huffman (type 1) - predefined code tables
- Dynamic Huffman (type 2) - custom code tables in stream
Bit-level reading - LSB-first bit extraction with precise tracking
LZ77 back-references - Copy from sliding window with overlap handling

Compile-time generated lookup tables provide O(1) Huffman symbol decoding.

License

MIT License - see LICENSE file.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
bench		bench
lib		lib
test		test
.credo.exs		.credo.exs
.formatter.exs		.formatter.exs
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
mix.exs		mix.exs
mix.lock		mix.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Deflate

Why This Library?

Why Not Use Existing Libraries?

Installation

Usage

Decompress zlib-wrapped data

Decompress raw DEFLATE data

Parse concatenated streams (e.g., git pack files)

Streaming decompression (hash/store without full memory)

Chunked input (for network streams)

Performance

The Honest Numbers

Why It Doesn't Matter

Features

Limitations

How It Works

License

About

Uh oh!

Releases 1

Packages

Contributors 2

Uh oh!

Languages

License

jtwebman/deflate

Folders and files

Latest commit

History

Repository files navigation

Deflate

Why This Library?

Why Not Use Existing Libraries?

Installation

Usage

Decompress zlib-wrapped data

Decompress raw DEFLATE data

Parse concatenated streams (e.g., git pack files)

Streaming decompression (hash/store without full memory)

Chunked input (for network streams)

Performance

The Honest Numbers

Why It Doesn't Matter

Features

Limitations

How It Works

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Uh oh!

Languages

Packages