Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all::encodings() returns an errornous list (and should be sorted alphabetically). #104

Closed
getreu opened this issue Oct 12, 2016 · 4 comments

Comments

@getreu
Copy link

getreu commented Oct 12, 2016

The bug is in src/all.rs:

   const ENCODINGS: &'static [EncodingRef] = &[ ....

This is the way I collect the list (please confirm that it should be done this way):

   let list = all::encodings().iter().map(|&e|format!(" {}\n",e.name())).collect::<String>();

the following names do not work:

 error mac-roman mac-roman mac-cyrillic hz big5-2003 pua-mapped-binary encoder-only-utf-8

These two do work but are not listed:

  x-user-defined macintosh

PLEASE return them alphabetically sorted!

@lifthrasiir
Copy link
Owner

lifthrasiir commented Oct 12, 2016

@getreu I'm confused as to what do you mean by names "not working".

My best guess is that you are trying to get encodings by encoding::label::encoding_from_whatwg_label function, which is (as the name suggests) a lookup by the names from WHATWG Encoding standard, which may have a different name from Encoding::name (which is a descriptive, arbitrary name for the inspection only). I believe the function names are clear enough that this does not need any fix.

There is no guarantee that encoding::all::encodings() will return encodings in any particular order, and I cannot see any benefit in doing so (enumerating encodings is extremely rare). If you think that this should be communicated clearly, please file a pull request with documentation added.

@getreu
Copy link
Author

getreu commented Oct 12, 2016

Maybe I am using your API not in the right way:

a) My tools prints a list with all encodings supported:

let list = all::encodings().iter().map(|&e|format!(" {}\n",e.name())).collect::<String>();

b) Then the user choses one entry from the list and enters a string enc_name with the encoding name:

encoding_from_whatwg_label(enc_name.as_str())

These encodings are listed in a) but are not understood in b)

error mac-roman mac-roman mac-cyrillic hz big5-2003 pua-mapped-binary encoder-only-utf-8
  • As I understood the list a) is generated from a constant string in all.rs. It would be nice if this list
    is ordered alphabetically. Then I do not need to do sort it before showing it to the user.

@lifthrasiir
Copy link
Owner

lifthrasiir commented Oct 13, 2016

First, the primary WHATWG name (which encoding_from_whatwg_label accepts) is available from Encoding::whatwg_name (which can be absent, by the way, if it's not in the standard).

I think you are doing this in the wrong way, however. You should have a separate mapping from the name (whatever it is) to the encoding object, like HashMap<&'static str, EncodingRef> (or BTreeMap if you want sorted ones). The Encoding standard often changes the entire set of aliases, primarily to reflect what the current web browsers should do. Also I haven't explicitly mentioned this but it's my intention for all Encoding-supported encodings to have unique, descriptive and semi-stable [1] Encoding::names, so if you need some human-readable names you can at least use it as a unique identifier.

[1] That is, while I will keep a unique name for each encoding (so that you can, say, map "big5-2003" to the actual encoding mostly described by Big5-2003 and also to more human-readable name like "Traditional Chinese (Big5-2003)"), if the encoding implementation has changed substantially I may change the name without any explicit support for the prior encoding (which is generally hard). Or you can commit to whatever web browsers do by using a stable WHATWG name instead, but I guess this is not your use case.

@getreu
Copy link
Author

getreu commented Oct 13, 2016

Thanks a lot for your help! I modified my supported encodings list generation code as follows:

let list = all::encodings().iter().filter_map(|&e|e.whatwg_name()).sorted();

As you can see I sorted the list myself. I agree with you that this feature is not of interest for many people.

@getreu getreu closed this as completed Oct 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants