return encoding from detect methods #107

Closed
kostya opened this issue Jun 7, 2017 · 5 comments

Comments

@kostya
Contributor

kostya commented Jun 7, 2017

It would be nice if myhtml_encoding_prescan_stream_to_determine_encoding and myhtml_encoding_extracting_character_encoding_from_charset also returned the encoding they found as a string. Example:

<html>
  <meta charset=cp-1251>
  <body>
    ����
  </body>
</html>

This would return MyENCODING_DEFAULT together with "cp-1251", so that I can run my own detector on that string.

This is needed because myhtml does not detect every encoding label (for example, labels with misprints), while browsers do.

@lexborisov
Owner

Hi @kostya
Do you mean that it should cut out a fragment from the example:

<html>
  <meta charset=cp-1251>
  <body>
    ����
  </body>
</html>

Something like myhtml_encoding_prescan_stream_to_determine_encoding returning cp-1251 from charset=cp-1251 without any transformation?

Alternatively, you can give me a complete list of encoding aliases in this format:

windows-1251 => cp-1251, cp1251

I will support everything in that list.
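For illustration, the alias list format above could be modeled as a small lookup table. This is only a sketch under assumptions: the type encoding_alias_t and the functions label_equals and resolve_extra_alias are hypothetical names, not part of the myhtml API, and the table holds only the two aliases mentioned in this thread.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical alias entry: a nonstandard label and the canonical
   name it should map to. Not part of the myhtml API. */
typedef struct {
    const char *alias;
    const char *canonical;
} encoding_alias_t;

/* Only the aliases mentioned in this thread. */
static const encoding_alias_t extra_aliases[] = {
    { "cp-1251", "windows-1251" },
    { "cp1251",  "windows-1251" },
};

/* Case-insensitive comparison of a length-delimited label
   with a NUL-terminated name. Returns 1 on match, 0 otherwise. */
static int label_equals(const char *label, size_t len, const char *name)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (name[i] == '\0' ||
            tolower((unsigned char)label[i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    return name[len] == '\0';
}

/* Returns the canonical name for a label, or NULL if the label is unknown. */
const char *resolve_extra_alias(const char *label, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof(extra_aliases) / sizeof(extra_aliases[0]); i++) {
        if (label_equals(label, len, extra_aliases[i].alias))
            return extra_aliases[i].canonical;
    }
    return NULL;
}
```

A caller would try the library's own detection first and consult such a table only when the result is unknown.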

@kostya
Contributor Author

kostya commented Jun 7, 2017

Yes, I mean cut out the substring "cp-1251" (when the encoding is not detected) and return it to me, and I will detect the encoding myself.
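A rough sketch of what "cut out the substring and return it" could look like on the caller's side. The function find_raw_charset is a hypothetical helper, not a myhtml function, and it is a simplified scan for the first charset=value pair rather than the full prescan algorithm browsers use.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of myhtml: finds the first
   charset=value pair in data and reports the raw value without
   mapping it to a known encoding. Returns 1 on success, 0 otherwise.
   On success, *value points into data and *value_len is its length. */
int find_raw_charset(const char *data, size_t size,
                     const char **value, size_t *value_len)
{
    static const char key[] = "charset";
    const size_t key_len = sizeof(key) - 1;
    size_t i, j, start;

    for (i = 0; i + key_len <= size; i++) {
        /* Case-insensitive match of "charset" at position i. */
        for (j = 0; j < key_len; j++) {
            if (tolower((unsigned char)data[i + j]) != key[j])
                break;
        }
        if (j != key_len)
            continue;

        /* Expect optional whitespace, '=', optional whitespace. */
        j = i + key_len;
        while (j < size && isspace((unsigned char)data[j])) j++;
        if (j >= size || data[j] != '=')
            continue;
        j++;
        while (j < size && isspace((unsigned char)data[j])) j++;
        if (j >= size)
            return 0;

        if (data[j] == '"' || data[j] == '\'') {
            /* Quoted value: read up to the matching quote. */
            char quote = data[j++];
            start = j;
            while (j < size && data[j] != quote) j++;
            if (j >= size)
                return 0; /* unterminated quote */
        } else {
            /* Unquoted value: read up to a delimiter. */
            start = j;
            while (j < size && !isspace((unsigned char)data[j]) &&
                   data[j] != '>' && data[j] != ';' &&
                   data[j] != '"' && data[j] != '\'')
                j++;
        }
        if (j == start)
            continue; /* empty value, keep scanning */

        *value = data + start;
        *value_len = j - start;
        return 1;
    }
    return 0;
}
```

For the example document in this issue, the reported value would be the seven bytes cp-1251, which the caller can then pass to its own detector.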

@lexborisov
Owner

It is not so easy to return the raw encoding data; it would break the whole API. I am thinking about how to do it.

@lexborisov
Owner

lexborisov commented Jun 16, 2017

@kostya
Contributor Author

kostya commented Jun 16, 2017

Thanks.
