Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in Huffman algorithm #10864

Closed
johanrosenkilde opened this issue Mar 1, 2011 · 19 comments
Closed

Bug in Huffman algorithm #10864

johanrosenkilde opened this issue Mar 1, 2011 · 19 comments

Comments

@johanrosenkilde
Copy link
Contributor

There seems to be a bug in the Huffman build algorithm when given a frequency dictionary.

The following example results in a wrong encoding table: Let there be 10 symbols numbered 1,..,10 where number i occurs with probability i/55.
The Huffman table I end up with, manually running the algorithm is something like the following:

{ 1: '00000', 2: '00001', 3: '0001', 4: '1000', 5: '1001',
  6: '001', 7: '110', 8: '111', 9: '101', 10: '01' }

which has expected length 173/55.

The current Huffman-algorithm returns

{1: '1100', 2: '1101', 3: '1110', 4: '1111', 5: '000',
 6: '001', 7: '010', 8: '011', 9: '100', 10: '101'}

which has expected length 175/55.

Apply :

  • trac_10864.patch
  • trac_10864-2.patch

CC: @nathanncohen @sagetrac-mvngu

Component: coding theory

Keywords: huffman

Author: Nathann Cohen

Reviewer: Johan Sebastian Rosenkilde Nielsen

Merged: sage-4.7.alpha3

Issue created by migration from https://trac.sagemath.org/ticket/10864

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Mar 1, 2011

comment:1

This happens when Sage is fed with a dictionnary where it expects a string.

sage: freq = dict([(i,i) for i in range(1,11)])
sage: freq
{1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10}

sage: Huffman(freq).encoding_table()
{1: '1100', 2: '1101', 3: '1110', 4: '1111', 5: '000', 6: '001', 7: '010', 8: '011', 9: '100', 10: '101'}

sage: Huffman(table = freq).encoding_table()
{1: '01110', 2: '01111', 3: '0110', 4: '1110', 5: '1111', 6: '010', 7: '100', 8: '101', 9: '110', 10: '00'}
sage: enc = Huffman(table = freq).encoding_table()
sage: sum(map(lambda (x,y): x*len(y),enc.items()))
173

This would take half a second to notice if only Sage was returning an error rather than work on a dictionary instead of a string. This patch should avoid this problem in the future :-)

Nathann

@nathanncohen nathanncohen mannequin added this to the sage-4.6.2 milestone Mar 1, 2011
@nathanncohen nathanncohen mannequin added the s: needs review label Mar 1, 2011
@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Mar 1, 2011

Author: Nathann Cohen

@sagetrac-mvngu sagetrac-mvngu mannequin modified the milestones: sage-4.6.2, sage-4.7 Mar 1, 2011
@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Mar 1, 2011

comment:5

Oh.. Thanks ! :-)

@sagetrac-dsm
Copy link
Mannequin

sagetrac-dsm mannequin commented Mar 2, 2011

comment:6

Minor: the doctest says " Feeding a dictionary instead of a string:: ", but then feeds it a list of strings, not a dictionary.
And I don't know if this was intended, but the use of isinstance(string,str) rules out the use of unicode strings.

@johanrosenkilde
Copy link
Contributor Author

comment:7

Ok, I see that I used the class in the wrong way :-)

Wouldn't it be more user-friendly to change the Huffman constructor to accept only one unnamed argument, and then treat it according to type, e.g:

def __init__(self, source):
  if isinstance(source, string):
     # init by string
  elif isinstance(source, dict):
     # init by dict
  else:
     # give the user an error

(Because of deprecation policy, I guess we would need to support the current interface in some way, which would make the code a bit messy, but that will only be for a year)

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Mar 2, 2011

comment:8

Hello !

Replying to @sagetrac-dsm:

Minor: the doctest says " Feeding a dictionary instead of a string:: ", but then feeds it a list of strings, not a dictionary.

Right. I just thought : "Let's feed it anything but a string"

And I don't know if this was intended, but the use of isinstance(string,str) rules out the use of unicode strings.

I don't know what an unicode string is, and I will try to find it out immediately. How do you think we should filter it then ?

Nathann

@sagetrac-dsm
Copy link
Mannequin

sagetrac-dsm mannequin commented Mar 2, 2011

comment:9

Replying to @nathanncohen:

I don't know what an unicode string is, and I will try to find it out immediately. How do you think we should filter it then ?

I think the usual idiom in python 2 is "isinstance(s, basestring)" to allow both.

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Mar 2, 2011

comment:10

Hello !!

This is an updated patch in which string has been replaced by basename, and an unique argument is expected by the constructor.

This class being pretty new and not really advertised, perhaps we can do without backward compatibility for once ?.. :-p

I mean, this class is perhaps only useful to illustrate what the Huffman algorithm is (slow encoding in Python..), and updating a possibly uncompatible (if it exists somewhere in the world) should not take more than a few seconds :-)

Nathann

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Mar 2, 2011

Attachment: trac_10864.patch.gz

@johanrosenkilde
Copy link
Contributor Author

comment:11

Nice patch. I agree with sidestepping the deprecation policy :-)
I'll try and review once my 4.6.2 finish compiling :-P

Johan

@johanrosenkilde
Copy link
Contributor Author

comment:12

Yep, works fine for me on 4.6.2, and code looks fine as well. Green light.

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented Mar 3, 2011

comment:13

Replying to @johanrosenkilde:

Yep, works fine for me on 4.6.2, and code looks fine as well. Green light.

I object to the positive review. Note that the documentation for the class Huffman is out of sync with the definition of that class. For instance, the documentation still refers to the deleted argument table. Sorry for being nasty, but this ticket goes to "needs_work".

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Mar 3, 2011

comment:14

Argggggg !!! Sorry about that !!!

I had been looking for occurrences of table = in the code to replace them by source or remove them, and I forgot some occurrences of table in the documentation.... I hope there is none left :-)

Nathann

@nathanncohen

This comment has been minimized.

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Mar 3, 2011

Attachment: trac_10864-2.patch.gz

@johanrosenkilde
Copy link
Contributor Author

comment:16

Woops, sorry about that. Good thing you're awake Minh :-)

I reread the whole thing with the new patch, and retested, rebuilt and redoc'ed. It seems alright now.

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Mar 3, 2011

comment:17

Replying to @johanrosenkilde:

Woops, sorry about that. Good thing you're awake Minh :-)

Minh is always awake. Minh does not rest, and sees everything. Minh is like a better version of Chuck Nurris invented by Chuck Nurris O_O

Nathann

@jdemeyer
Copy link

Reviewer: Johan Sebastian Rosenkilde Nielsen

@jdemeyer
Copy link

Merged: sage-4.7.alpha3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants