Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huffman Encoding #8765

Closed
nathanncohen mannequin opened this issue Apr 25, 2010 · 24 comments
Closed

Huffman Encoding #8765

nathanncohen mannequin opened this issue Apr 25, 2010 · 24 comments

Comments

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Apr 25, 2010

This is a basic implementation of Huffman's encoding. May it be useful to teach ! :-)

Apply patches in this order:

  1. trac_8765-huffman.patch
  2. trac_8765-clean-ups.patch

CC: @wdjoyner @sagetrac-mvngu @nexttime

Component: coding theory

Author: Nathann Cohen

Reviewer: Minh Van Nguyen

Merged: sage-4.4.4.alpha0

Issue created by migration from https://trac.sagemath.org/ticket/8765

@nathanncohen nathanncohen mannequin added this to the sage-4.4.4 milestone Apr 25, 2010
@nathanncohen nathanncohen mannequin assigned wdjoyner Apr 25, 2010
@nexttime
Copy link
Mannequin

nexttime mannequin commented Apr 25, 2010

comment:2

Much room for extensions, e.g.: ;-)

  • accept (also) list of symbols of arbitrary alphabet
    (type should be checked anyway)
  • binary file (stream) I/O..., with or without encoding table
  • generate encoding/decoding functions

There's actually a possible application within Sage itself: compression of prime_pi and nth_prime tables. (The table-based functions are still work in progress.)

OT:

  • I could perhaps contribute FAX G3 en/decoding, too, if I'm able to find my dead old implementation... :)
  • Another nice thing is encoding and decoding of machine instructions (assembling, disassembling).

-Leif

@nathanncohen nathanncohen mannequin added the s: needs review label Apr 25, 2010
@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Apr 25, 2010

comment:4

To think I just needed a function for Huffmann encoding yesterday to test something.. Sounds like this file will soon be out of my reach :-D

Nathann

@nexttime
Copy link
Mannequin

nexttime mannequin commented Apr 25, 2010

comment:5

Replying to @nathanncohen:
Haven't tested yet, but there are at least some typos.
(I could fix them, but most probably not today.)

I'd also add [more] type checks on the inputs.

@nexttime nexttime mannequin added s: needs work and removed s: needs review labels Apr 25, 2010
@sagetrac-ylchapuy
Copy link
Mannequin

sagetrac-ylchapuy mannequin commented Apr 27, 2010

comment:6

Hi, I only had a very quick look and I'm a bit concerned about where to put this code in the library.

sage.coding is about channel coding (error correcting code) but this ticket is about source coding (data compression) and I think it's a bad idea to mix those in the library and in the doc.

I would feel a lot more confortable with sage.coding.source_coding.huffman.

 Yann

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Apr 27, 2010

comment:7

Sounds sensible enough ! I just moved the file in a source_coding subdirectory, but I can not import sage.coding.source_coding.huffman, as sage does not find the source_coding module... Where should I tell it it exists ? :-)

Nathann

@nathanncohen nathanncohen mannequin added s: needs info and removed s: needs work labels Apr 27, 2010
@wdjoyner
Copy link

comment:8

Replying to @nathanncohen:

Sounds sensible enough ! I just moved the file in a source_coding subdirectory, but I can not import sage.coding.source_coding.huffman, as sage does not find the source_coding module... Where should I tell it it exists ? :-)

Nathann

Maybe you could try to add an (empty) init.py file to it before adding huffman.py?

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Apr 28, 2010

comment:9

Hmmm... I added this empty init.py file in source_coding/, but it made no difference... Is that what you meant ? :-/

Nathann

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented Apr 28, 2010

comment:10

Replying to @nathanncohen:

Hmmm... I added this empty init.py file in source_coding/, but it made no difference... Is that what you meant ? :-/

Try putting a comment in that __init__.py file. For example, the content of that init file might be:

# Just a comment so that __init__.py is not an empty file.

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Apr 28, 2010

comment:11

I just tried it (you were right, the file I created was empty), but noticed no difference... In the end, is it really worth creating a directory anyway ? I could just rename the file to source_coding.py and let this Huffman class stay inside ?..

But I still would like to understand how to get this directory detected :-/

anything to do in modules_list.py even though it is not a Cython file ?

Nathann

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Apr 28, 2010

comment:12

What about this version (no directory, but a file source_coding.py ) ? :-)

@wdjoyner
Copy link

comment:13

This applies to 4.4.rc0 fine and passes sage -testall.
I have not checked if the documentation builds okay.
Are the other reviewers satisfied with the changes in this last patch?

@nexttime
Copy link
Mannequin

nexttime mannequin commented Apr 29, 2010

comment:14

Replying to @wdjoyner:

This applies to 4.4.rc0 fine and passes sage -testall.
I have not checked if the documentation builds okay.

We should do that before giving a positive review, I guess... :)

Are the other reviewers satisfied with the changes in this last patch?

There are still typos in the description:

  • s/occurence/occurrence/
  • s/frquency/frequency/
  • s/its its/its/ (two times)
  • s/occurencies/occurrences/ (two times)
  • s/eah/each/

encoding_table():
s/each letter/each trained letter/

__init__:
It's not tested if both string and frequencies are given (=> error).

And as I said before I'd prefer type checks on the parameters (rather than relying on automatically raised exceptions later in the code).

I also prefer having the return type documented at the top rather than (more or less) implicitly in the examples (e.g. in frequency_table(); tree() actually returns a DiGraph which happens to also be a tree.)

Doctests?

(Still haven't tested the code, just read the patches, sorry.)

-Leif

@nexttime nexttime mannequin added s: needs work and removed s: needs review labels Apr 29, 2010
@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Apr 30, 2010

comment:15

Updated ! :-)

Nathann

@nexttime
Copy link
Mannequin

nexttime mannequin commented Apr 30, 2010

comment:16

Attachment: trac_8765.patch.gz

from operator import xor... ;-)

I'll test it this evening.

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented May 2, 2010

Attachment: trac_8765-huffman.patch.gz

new module sage.coding.source_coding.huffman

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented May 2, 2010

comment:17

The patch trac_8765-huffman.patch replaces the earlier one trac_8765.patch. Now the Huffman module is in the directory source_coding/. Nothing has changed in the replacement patch, except for changes to get Sage to recognize this new module.

@sagetrac-mvngu

This comment has been minimized.

@nexttime
Copy link
Mannequin

nexttime mannequin commented May 2, 2010

comment:18

Replying to @sagetrac-mvngu:
I guess frequency_table should be imported in all.py as well.

Have to clone again and import the new patch... ;-)

-Leif

@sagetrac-mvngu

This comment has been minimized.

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented May 2, 2010

comment:19

Attachment: trac_8765-clean-ups.patch.gz

Changes in the reviewer patch include:

  1. Add substantially more documentation to the module.
  2. Clean-ups in accordance with PEP 008.
  3. Don't use the module string nor its join function. These have been deprecated. You should now use str and the function "".join.
  4. Get the whole module to 100% coverage.

This means I have reviewed trac_8765-huffman.patch, so only my patch needs review by anyone but me.

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented May 2, 2010

Reviewer: Minh Van Nguyen

@nexttime
Copy link
Mannequin

nexttime mannequin commented May 2, 2010

comment:20

Replying to @sagetrac-mvngu:

This means I have reviewed trac_8765-huffman.patch, so only my patch needs review by anyone but me.

That patch looks very good.
(I wished all module documentations had that quality.)

I'll only add some more doctests and perhaps edit some comments.

I think improvements to the algorithm can be done on another ticket.

-Leif

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented May 9, 2010

comment:21

The only reason why I still read your patches, apply them, check they are correct, check the docstrings, build the documentation and read it, is that I fear some God may be watching over my shoulder.

Another perfect patch from Minh :-D

Thank you very much !

Nathann

@mwhansen
Copy link
Contributor

mwhansen commented Jun 6, 2010

Merged: sage-4.4.4.alpha0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants