Unzip cyrillic text in win1251 encoding #39

Closed
beshkenadze opened this issue Mar 4, 2013 · 14 comments

Comments

@beshkenadze

Алексей(cp1251).zip -> Àëåêñåé(cp1252).unzip (must be utf8 :) )

@Jacob-Christian-Munch-Andersen

Exactly what are you trying to do? The library has no native support for charsets other than UTF-8 and a basic ASCII encoding. If you want to use a different encoding, you must do the conversion yourself.

@beshkenadze
Author

I get it. It seems the problem can't be solved in JS :(

@Jacob-Christian-Munch-Andersen

The best solution is usually to use UTF-8 all the way through. If that is not an option, for instance because an external source provides data in a different format, you'll have to write a converter. It's not that difficult: you just need a table of the Unicode values of the 256 characters in CP1251: http://en.wikipedia.org/wiki/Cp1251

For instance when you find the character with code 192 you should convert it to 0x410 (1040 in decimal), and so forth.
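
A minimal sketch of such a converter follows. It is deliberately partial: it relies on the fact that CP1251 bytes 0xC0-0xFF map contiguously onto U+0410-U+044F, handles Ё/ё explicitly, and replaces every other byte in 0x80-0xBF with U+FFFD, so a real converter would still need the full table. The names `decodeCp1251` and `CP1251_EXTRA` are made up for this illustration.

```javascript
// Partial CP1251 -> Unicode decoder (illustrative sketch, not a full table).
// Extra entries outside the contiguous block: Ё (0xA8) and ё (0xB8).
var CP1251_EXTRA = { 0xA8: 0x0401, 0xB8: 0x0451 };

function decodeCp1251(bytes) {
  var out = "";
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    var code;
    if (b < 0x80) {
      code = b;                        // ASCII is unchanged
    } else if (b >= 0xC0) {
      code = 0x0410 + (b - 0xC0);      // 192 -> 0x410 (А), ..., 255 -> 0x44F (я)
    } else if (CP1251_EXTRA.hasOwnProperty(b)) {
      code = CP1251_EXTRA[b];
    } else {
      code = 0xFFFD;                   // not covered by this sketch
    }
    out += String.fromCharCode(code);
  }
  return out;
}

// The bytes of "Алексей" in CP1251.
var bytes = new Uint8Array([0xC0, 0xEB, 0xE5, 0xEA, 0xF1, 0xE5, 0xE9]);

// Decoded with the table: "Алексей".
var decoded = decodeCp1251(bytes);

// Each byte taken directly as a code point (what a Latin-1/CP1252 reading
// does for this byte range): "Àëåêñåé", the broken text from the report.
var misread = Array.from(bytes, function (b) {
  return String.fromCharCode(b);
}).join("");
```

The second result reproduces the garbled name from the original report, which suggests the bytes themselves are intact and only the decoding step is wrong.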

@beshkenadze
Author

The problem is that the file won't always be in cp1251; it can be in other encodings. But thanks for the answer!

@gildas-lormeau
Owner

Yes, Jacob is right: zip.js can't support every encoding because that is beyond its scope. The right option here is to write your own Reader and Writer constructors instead of using TextReader/TextWriter.

@gildas-lormeau
Owner

Actually, I was wrong: there may be a fix (for file data) using the second parameter of FileReader#readAsText...

@beshkenadze
Author

I will try

@gildas-lormeau
Owner

Thanks!

@beshkenadze
Author

Thanks for the help! It works, but not with zip.js :)
zip.js gives me text that is already broken.
I used https://code.google.com/p/bitjs instead: it returns the data as a Uint8Array, which I read with readAsText(bb, 'cp1251'), and this works!
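
For readers following along, the approach described above can be sketched roughly as below. This is browser-only code (it needs `Blob` and `FileReader`), and the helper name `decodeBytes` and its callback signature are assumptions made for this illustration, not part of bitjs or zip.js.

```javascript
// Decode raw bytes as text in a given encoding by wrapping them in a Blob
// and letting FileReader#readAsText do the conversion (browser-only sketch).
function decodeBytes(bytes, encoding, onload, onerror) {
  var blob = new Blob([bytes]);
  var reader = new FileReader();
  reader.onload = function () {
    onload(reader.result);             // the decoded string
  };
  reader.onerror = function () {
    if (onerror) {
      onerror(reader.error);
    }
  };
  // The second argument is an encoding label such as "windows-1251".
  reader.readAsText(blob, encoding);
}

// Usage, guarded so the snippet does nothing outside a browser.
if (typeof FileReader !== "undefined" && typeof Blob !== "undefined") {
  var sample = new Uint8Array([0xC0, 0xEB, 0xE5, 0xEA, 0xF1, 0xE5, 0xE9]);
  decodeBytes(sample, "windows-1251", function (text) {
    console.log(text);
  });
}
```

The key point is that the bytes must reach `readAsText` untouched; once they have been decoded with the wrong charset, passing an encoding later no longer helps.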

@gildas-lormeau
Owner

I think I just have to change the TextWriter constructor signature to accept a new encoding parameter and pass that optional parameter here, and then it should be OK in zip.js :).

@beshkenadze
Author

Thanks! :)

@beshkenadze
Author

Hi!
The problem is solved for BlobReader, but it persists when using HttpReader.
Why not use this in HttpReader:
var dataView = new DataView(request.response);
that.data = new Blob([dataView], {type: request.getResponseHeader("Content-Type")});
and then read it the same way as BlobReader?
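
As a rough, standalone illustration of this suggestion (not the actual HttpReader code), the response can be requested as an ArrayBuffer and wrapped in a Blob. The function name `fetchAsBlob`, its callbacks, and the fallback MIME type are assumptions made for this sketch; it is browser-only because it relies on `XMLHttpRequest`.

```javascript
// Fetch a URL as binary data and hand it over as a Blob (browser-only sketch).
function fetchAsBlob(url, onload, onerror) {
  var request = new XMLHttpRequest();
  request.open("GET", url);
  request.responseType = "arraybuffer"; // request.response will be an ArrayBuffer
  request.addEventListener("load", function () {
    if (request.status < 200 || request.status >= 300) {
      onerror(new Error("HTTP status " + request.status));
      return;
    }
    // Fall back to a generic binary type when the header is missing.
    var type = request.getResponseHeader("Content-Type") || "application/octet-stream";
    onload(new Blob([request.response], { type: type }));
  });
  request.addEventListener("error", function () {
    onerror(new Error("Network error while fetching " + url));
  });
  request.send();
}
```

The Blob constructor accepts the ArrayBuffer directly, so the intermediate DataView is optional.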

It may also be necessary to add an additional "charset" parameter to TextWriter to specify the encoding, and use it like this:
if (typeof charset != "undefined")
    reader.readAsText(blob, charset);
else
    reader.readAsText(blob);
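
To make the idea concrete, here is a hypothetical writer that takes an optional charset. The constructor name `EncodedTextWriter` and the `init` / `writeUint8Array` / `getData` method names are assumptions for this sketch and should not be read as zip.js's actual implementation. Since an `undefined` second argument to `readAsText` is treated like an omitted one, the explicit `if`/`else` is not strictly needed.

```javascript
// Hypothetical text writer with an optional charset (illustrative only).
function EncodedTextWriter(charset) {
  var chunks = [];

  this.init = function (callback) {
    chunks = [];
    callback();
  };

  this.writeUint8Array = function (array, callback) {
    // Copy the chunk so later reuse of the source buffer cannot corrupt it.
    chunks.push(new Uint8Array(array));
    callback();
  };

  // Browser-only: needs Blob and FileReader.
  this.getData = function (callback, onerror) {
    var reader = new FileReader();
    reader.onload = function () {
      callback(reader.result);
    };
    reader.onerror = onerror;
    // With charset undefined, readAsText falls back to its default decoding.
    reader.readAsText(new Blob(chunks, { type: "text/plain" }), charset);
  };
}
```

A caller would create it as `new EncodedTextWriter("windows-1251")` for CP1251 data, or with no argument to keep the default behaviour.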

@gildas-lormeau
Owner

Hi :)

I may be missing something here, but I don't really see what's wrong with the current implementation of the HttpReader constructor. There are 2 main use cases when using it:

  • reading a zip file to uncompress: it's the zip binary data, so the charset is not an issue
  • reading a text file (or binary file) to compress: it may be a text file, but it's also read as binary data because there is no need to read it as text when compressing it

Could you explain when using the "Content-Type" header is really useful?

@beshkenadze
Author

> reading a zip file to uncompress: it's the zip binary data, so the charset is not an issue

There is no easy way to convert the encoding other than specifying it when calling readAsText.

> Could you explain when using the "Content-Type" header is really useful?

Content-Type is not strictly required (in zip.js), but it improves the semantics.
And the "DataView & Blob" suggestion is not about Content-Type; it is about how the file is received from the network.


3 participants