# [GZIP](http://www.gzip.org)

* `gzip` (GNU zip) is an open-source, general-purpose lossless compressor, written by Mark Adler and Jean-loup Gailly.

* Its compressed data format (DEFLATE) is specified in [RFC 1951](https://www.ietf.org/rfc/rfc1951.txt), and the `gzip` file format that wraps it is specified in [RFC 1952](https://www.ietf.org/rfc/rfc1952.txt).
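The split between the two RFCs can be seen by taking a gzip member apart. The following is a minimal sketch using Python's standard `gzip`, `struct` and `zlib` modules; the sample `data` is made up for illustration, and the slicing assumes that no optional header fields are present (FLG equal to 0), which is what `gzip.compress` is expected to produce here.

```python
import gzip
import struct
import zlib

data = b"hello hello hello hello\n" * 100  # illustrative input, not from the corpus

# mtime=0 keeps the header deterministic (the mtime keyword needs Python 3.8+).
blob = gzip.compress(data, compresslevel=9, mtime=0)

# RFC 1952 fixed header: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS.
id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", blob[:10])

# RFC 1952 trailer: CRC32 of the input and ISIZE (input size modulo 2**32).
crc32, isize = struct.unpack("<II", blob[-8:])

# Assumption: FLG == 0, so everything in between is a raw RFC 1951 stream.
raw_deflate = blob[10:-8]
restored = zlib.decompress(raw_deflate, wbits=-15)  # negative wbits = raw DEFLATE

print(hex(id1), hex(id2), cm, flg)
print(crc32 == zlib.crc32(data), isize == len(data), restored == data)
```

The magic bytes `1f 8b` and the compression method `8` (DEFLATE) come from RFC 1952; the payload itself carries no gzip-specific information.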

### DEFLATE algorithm

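1. Divide the input into blocks.
2. LZ77 encode each block, producing literals and (length, distance) pairs.
3. Compute two Huffman trees per block: one for the literal/length symbols and one for the distances.
4. Huffman encode the code lengths that describe those trees (dynamic blocks), or use the fixed trees defined in RFC 1951.
5. Huffman encode each block using its trees.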
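Step 2 of the list above can be illustrated with a toy LZ77 coder. This is only a sketch: the function names `lz77_tokens` and `lz77_decode` are hypothetical, the search is brute force (real `gzip` uses a much faster matching strategy), and the Huffman steps are left out. The window size (32768) and the match-length limits (3 to 258) are the DEFLATE limits from RFC 1951.

```python
def lz77_tokens(data: bytes, window: int = 32768, min_len: int = 3, max_len: int = 258):
    """Greedy LZ77: emit literal bytes or (length, distance) pairs."""
    i, out = 0, []
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the window for the longest match.
        for j in range(max(0, i - window), i):
            k = 0
            while k < max_len and i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            out.append((best_len, best_dist))
            i += best_len
        else:
            out.append(data[i])
            i += 1
    return out


def lz77_decode(tokens) -> bytes:
    """Invert lz77_tokens."""
    buf = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            length, dist = tok
            for _ in range(length):
                buf.append(buf[-dist])  # byte by byte, so overlapping copies work
        else:
            buf.append(tok)
    return bytes(buf)


tokens = lz77_tokens(b"abcabcabcabc")
print(tokens)  # [97, 98, 99, (9, 3)]
assert lz77_decode(tokens) == b"abcabcabcabc"
```

The single pair `(9, 3)` copies nine bytes from three positions back, overlapping the bytes it is producing; this is how LZ77 represents long runs compactly.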


### Test image corpus

#### Structure

```
struct 16bpp_PGM_pixel {
  uint16 Y;
};

struct 8bpp_PGM_pixel {
  uint8 Y;
};

struct 8bpp_PPM_pixel {
  uint8 Red;
  uint8 Green;
  uint8 Blue;
};

struct 16bpp_PPM_pixel {
  uint16 Red;
  uint16 Green;
  uint16 Blue;
};
 
struct PPM_image {
  uint8[3] magic_number = "P6\n"; /* PPM = Portable Pix Map */
  uint8* width;                   /* Pixels in X (in ASCII) */
  uint8 space = " ";
  uint8* height;                  /* Pixels in Y (in ASCII) */
  uint8 new_line_1 = "\n";
  uint8* maximum_component_value; /* 255 or 65535 */
  uint8 new_line_2 = "\n";
  if (maximum_component_value == 255) {
    struct 8bpp_PPM_pixel[Width][Height] matrix;
  } else {
    struct 16bpp_PPM_pixel[Width][Height] matrix;
  }
};

struct PGM_image {
  uint8[3] magic_number = "P5\n"; /* PGM = Portable Gray Map */
  uint8* width;                   /* Pixels in X (in ASCII) */
  uint8 space = " ";
  uint8* height;                  /* Pixels in Y (in ASCII) */
  uint8 new_line_1 = "\n";
  uint8* maximum_component_value; /* 255 or 65535 */
  uint8 new_line_2 = "\n";
  if (maximum_component_value == 255) {
    struct 8bpp_PGM_pixel[Width][Height] matrix;
  } else {
    struct 16bpp_PGM_pixel[Width][Height] matrix;
  }
};
```
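The layout above translates into a short reader. The sketch below assumes exactly that layout (magic number, `width height`, and the maximum value each on its own line, with no comment lines), which is stricter than what the Netpbm formats allow in general; `read_pnm` is a hypothetical helper name.

```python
import io


def read_pnm(stream):
    """Parse a binary PGM (P5) or PPM (P6) laid out as in the structs above."""
    magic = stream.readline().strip()
    if magic not in (b"P5", b"P6"):
        raise ValueError("not a binary PGM/PPM file")
    width, height = (int(tok) for tok in stream.readline().split())
    maxval = int(stream.readline())
    channels = 1 if magic == b"P5" else 3
    bytes_per_sample = 1 if maxval < 256 else 2  # 8 or 16 bits per component
    raster = stream.read(width * height * channels * bytes_per_sample)
    return magic, width, height, maxval, raster


# A 2x2 8-bit grayscale image built in memory for illustration.
tiny = io.BytesIO(b"P5\n2 2\n255\n" + bytes([0, 64, 128, 255]))
magic, width, height, maxval, raster = read_pnm(tiny)
print(magic, width, height, maxval, len(raster))
```

The same function can be applied to a real file opened in binary mode, for example `open("01-gzip/boats.pgm", "rb")`.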

#### Images

| lena (RGB 512x512x(8+8+8) bits)                     | peppers (RGB 512x512x(16+16+16) bits)                      |
|-----------------------------------------------------|------------------------------------------------------------|
| <img src="01-gzip/lena.png" style="width: 400px;"/> | <img src="01-gzip/peppers.png" style="width: 400px;"/>     |
| wget http://www.hpca.ual.es/~vruiz/images/lena.png  | wget http://www.hpca.ual.es/~vruiz/images/peppers.png      |
| convert lena.png lena.ppm                           | convert peppers.png peppers.ppm                            |

| boats (Y 512x512x8 bits)                                 | zelda (Y 512x512x8 bits)                             |
|----------------------------------------------------------|------------------------------------------------------|
| <img src="01-gzip/boats.png" style="width: 400px;"/>     | <img src="01-gzip/zelda.png" style="width: 400px;"/> |
| wget http://www.hpca.ual.es/~vruiz/images/boats.png      | wget http://www.hpca.ual.es/~vruiz/images/zelda.png  |
| convert boats.png boats.pgm                              | convert zelda.png zelda.pgm                          |

In [92]:
lena = !wc -c < 01-gzip/lena.ppm
lena = lena[0]
peppers = !wc -c < 01-gzip/peppers.ppm
peppers = peppers[0]
boats = !wc -c < 01-gzip/boats.pgm
boats = boats[0]
zelda = !wc -c < 01-gzip/zelda.pgm
zelda = zelda[0]
average = int((int(lena) + int(peppers) + int(boats) + int(zelda))/4)

In [93]:
import io
with io.open('../table.txt', 'w') as file:
    file.write('    codec |    lena peppers   boats   zelda average\n')
    file.write('----------+----------------------------------------\n')

In [94]:
import io
with io.open('../table.txt', 'a') as file:
    file.write(' original |{:8}{:8}{:8}{:8}{:8}\n'.format(lena, peppers, boats, zelda, average))

In [95]:
with io.open('../table.txt', 'r') as file:
    print(file.read())

    codec |    lena peppers   boats   zelda average
----------+----------------------------------------
 original |  786447 1572883  262159  262159  720912
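As a sanity check, the sizes in the table can be split into pixel data and header. The sketch below takes the image dimensions and bit depths from the image tables above and the file sizes from the table just printed; `raster_bytes` is a hypothetical helper.

```python
def raster_bytes(width, height, channels, bits_per_sample):
    """Size of the uncompressed pixel matrix in bytes."""
    return width * height * channels * (bits_per_sample // 8)


file_sizes = {"lena": 786447, "peppers": 1572883, "boats": 262159, "zelda": 262159}
rasters = {
    "lena": raster_bytes(512, 512, 3, 8),
    "peppers": raster_bytes(512, 512, 3, 16),
    "boats": raster_bytes(512, 512, 1, 8),
    "zelda": raster_bytes(512, 512, 1, 8),
}

# Whatever is not pixel data must be the header.
header_bytes = {name: file_sizes[name] - rasters[name] for name in file_sizes}
print(header_bytes)  # {'lena': 15, 'peppers': 19, 'boats': 15, 'zelda': 15}
```

For the 8-bit images, 15 bytes is the length of a header such as `P6\n512 512\n255\n`.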



In [96]:
!cp 01-gzip/lena.ppm /tmp
!gzip -9 -f -v /tmp/lena.ppm
lena = !wc -c < /tmp/lena.ppm.gz
lena = lena[0]

/tmp/lena.ppm:	    6.6% -- replaced with /tmp/lena.ppm.gz


In [97]:
!cp 01-gzip/peppers.ppm /tmp
!gzip -9 -f -v /tmp/peppers.ppm
peppers = !wc -c < /tmp/peppers.ppm.gz
peppers = peppers[0]

/tmp/peppers.ppm:	   86.0% -- replaced with /tmp/peppers.ppm.gz


In [98]:
!cp 01-gzip/boats.pgm /tmp
!gzip -9 -f -v /tmp/boats.pgm
boats = !wc -c < /tmp/boats.pgm.gz
boats = boats[0]

/tmp/boats.pgm:	   23.4% -- replaced with /tmp/boats.pgm.gz


In [99]:
!cp 01-gzip/zelda.pgm /tmp
!gzip -9 -f -v /tmp/zelda.pgm
zelda = !wc -c < /tmp/zelda.pgm.gz
zelda = zelda[0]

/tmp/zelda.pgm:	   16.7% -- replaced with /tmp/zelda.pgm.gz
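The four cells above shell out to the `gzip` command. A rough pure-Python equivalent, using the standard `gzip` module, could look as follows. This is a sketch: `gzip_size` is a hypothetical helper, and its result is not guaranteed to match the command-line tool byte for byte, since the header contents (for example, a stored file name) and the underlying DEFLATE implementation may differ.

```python
import gzip
from pathlib import Path


def gzip_size(path, level=9):
    """Return the size in bytes of the file at `path` after gzip compression."""
    data = Path(path).read_bytes()
    return len(gzip.compress(data, compresslevel=level))


# Possible usage with the corpus files (paths as used in the cells above):
# sizes = {name: gzip_size(f"01-gzip/{name}")
#          for name in ("lena.ppm", "peppers.ppm", "boats.pgm", "zelda.pgm")}
```

Unlike `gzip -f`, this leaves the input file in place and writes nothing to disk.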


In [100]:
average = int((int(lena) + int(peppers) + int(boats) + int(zelda))/4)

In [101]:
import io
with io.open('../table.txt', 'a') as file:
    file.write('     gzip |{:8}{:8}{:8}{:8}{:8}\n'.format(lena, peppers, boats, zelda, average))

In [102]:
with io.open('../table.txt', 'r') as file:
    print(file.read())

    codec |    lena peppers   boats   zelda average
----------+----------------------------------------
 original |  786447 1572883  262159  262159  720912
     gzip |  733836  219854  200670  218198  343139
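The raw byte counts are easier to compare as ratios. The sketch below derives the compression ratio, the percentage saved and the bits per pixel from the two table rows above; the figures are computed on whole-file sizes, and the pixel count (512x512) is taken from the image tables.

```python
original = {"lena": 786447, "peppers": 1572883, "boats": 262159, "zelda": 262159}
gzipped = {"lena": 733836, "peppers": 219854, "boats": 200670, "zelda": 218198}
pixels = 512 * 512  # every image in the corpus is 512x512

stats = {}
for name in original:
    ratio = original[name] / gzipped[name]             # > 1 means the file shrank
    saving = 100 * (1 - gzipped[name] / original[name])  # percentage of bytes removed
    bpp = 8 * gzipped[name] / pixels                   # compressed bits per pixel
    stats[name] = (ratio, saving, bpp)
    print(f"{name:>8} | ratio {ratio:5.2f} | saving {saving:5.1f}% | {bpp:5.2f} bits/pixel")
```

Only `peppers` compresses well; the other three images shrink by less than a quarter.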

