Unicode support #89

Open
GoogleCodeExporter opened this issue Mar 24, 2015 · 2 comments

Comments

@GoogleCodeExporter

Would be great to have unicode support in xdelta. 

There are different levels of support. 

For example, the bare minimum is to be able to pass in a Unicode file path
to the executable and have it read the file (currently can't do this on
Windows).

I guess fuller support would allow embedding Unicode strings in the
headers, but I suspect this isn't a high priority.

I had a go at trying to get this working on Windows. I came up with a hack
that will allow you to pass in Unicode file paths to the command line tool. 

The way I did it was:

 1. Renamed the existing "main" to "___main"
 2. Created the new Unicode main "wmain(int, wchar_t**)"
 3. Converted all the args from "wmain" into UTF-8 strings using the Win32
API WideCharToMultiByte
 4. Passed those strings into the old "___main"
 5. Then, in main_file_open, I use the Unicode version of CreateFile
(CreateFileW), first converting the UTF-8 strings into wide strings.
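
The steps above could look roughly like the following. This is a minimal Windows-only sketch, not the attached patch: `old_main`, `wide_to_utf8`, `utf8_to_wide` and `open_utf8_path` are hypothetical names, and `old_main` is a placeholder standing in for the renamed "___main" (it only tries to open its first argument).

```c
/* Windows-only sketch; the whole block compiles to nothing elsewhere. */
#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>

/* UTF-16 -> UTF-8. Returns a malloc'd string, or NULL on failure. */
static char *wide_to_utf8(const wchar_t *w)
{
    int n = WideCharToMultiByte(CP_UTF8, 0, w, -1, NULL, 0, NULL, NULL);
    if (n <= 0) return NULL;
    char *s = malloc((size_t)n);   /* n includes the terminating NUL */
    if (s && WideCharToMultiByte(CP_UTF8, 0, w, -1, s, n, NULL, NULL) <= 0) {
        free(s);
        s = NULL;
    }
    return s;
}

/* UTF-8 -> UTF-16. Returns a malloc'd string, or NULL on failure. */
static wchar_t *utf8_to_wide(const char *s)
{
    int n = MultiByteToWideChar(CP_UTF8, 0, s, -1, NULL, 0);
    if (n <= 0) return NULL;
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w && MultiByteToWideChar(CP_UTF8, 0, s, -1, w, n) <= 0) {
        free(w);
        w = NULL;
    }
    return w;
}

/* Step 5: open an existing file for reading, given a UTF-8 path. */
static HANDLE open_utf8_path(const char *utf8_path)
{
    wchar_t *w = utf8_to_wide(utf8_path);
    if (!w) return INVALID_HANDLE_VALUE;
    HANDLE h = CreateFileW(w, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    free(w);
    return h;
}

/* Placeholder for the renamed original main (step 1). */
static int old_main(int argc, char **argv)
{
    if (argc < 2) return 0;
    HANDLE h = open_utf8_path(argv[1]);
    if (h == INVALID_HANDLE_VALUE) return 1;
    CloseHandle(h);
    return 0;
}

/* Steps 2-4: wide entry point that hands UTF-8 args to the old main. */
int wmain(int argc, wchar_t **wargv)
{
    int rc = 1;
    int i;
    char **argv = calloc((size_t)argc + 1, sizeof *argv);
    if (!argv) return rc;
    for (i = 0; i < argc; i++) {
        argv[i] = wide_to_utf8(wargv[i]);
        if (!argv[i]) goto done;
    }
    rc = old_main(argc, argv);
done:
    for (i = 0; i < argc; i++) free(argv[i]);
    free(argv);
    return rc;
}
#endif /* _WIN32 */
```

Note that MSVC picks up `wmain` automatically, while MinGW-w64 needs the `-municode` flag for it to be used as the entry point.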

What this means is that for ASCII filenames, the behaviour remains the
same: these get turned into UTF-8, but this is a no-op (since UTF-8 ==
ASCII if you're only using ASCII chars).

Note that other arguments are also unaffected (because these will be ASCII
strings and so will look exactly the same after turning into UTF-8).
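
To illustrate why the ASCII case is a no-op: UTF-8 encodes every code point below U+0080 as the same single byte, so a string made only of such bytes is unchanged by the conversion. A small sketch of that check (`is_pure_ascii` is a hypothetical helper, not part of xdelta):

```c
/* Returns 1 if every byte of s is 7-bit ASCII (below 0x80), else 0.
 * For such strings the UTF-8 encoding is byte-for-byte identical,
 * so converting them changes nothing. */
static int is_pure_ascii(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    for (; *p != '\0'; p++) {
        if (*p >= 0x80) return 0;
    }
    return 1;
}
```

An option string or a plain filename passes this check; a filename containing, say, an accented letter encoded in UTF-8 does not, and that is the only case where the wide-path handling matters.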

For Unicode filenames, the open file function will pass those back to
Windows in the wide format and so you can open the file even with unicode
filenames.

Caveats:

 * Suggested patch only works with Windows (though one can follow the
example for other platforms I'm sure)
 * Not tested...

Original issue reported on code.google.com by jdmw...@gmail.com on 16 Jun 2009 at 3:49

@GoogleCodeExporter

FYI: the patch I added won't let you do anything else in Unicode - it just
supports the filenames being in Unicode.

E.g., if you try and set the VCDIFF "header" to some Unicode string, it just
won't work. It will probably enter a load of garbage.

Original comment by jdmw...@gmail.com on 16 Jun 2009 at 4:25

@GoogleCodeExporter

I guess I need to study the use of wchar and any portability implications here.

Original comment by josh.mac...@gmail.com on 9 Jan 2010 at 1:30

  • Changed state: Accepted

dreamer2908 added a commit to dreamer2908/YAXBPC that referenced this issue Oct 9, 2016
…dows-1252

It turns out that xdelta3 doesn't have proper unicode support.
On Linux, Unix, Mac OS, and the like, command-line options (filenames included) are passed to xdelta3 as byte arrays, typically UTF-8 encoded. xdelta3 receives them as is, uses them as is to access files, stores them as is in the VCDIFF file header, loads them from the header as is, and displays them as is. The filenames are not transformed, encoded, or decoded at all. UTF-8 is a Unicode encoding that supports all characters, so everything works, Unicode or ASCII. Nothing special is required; you could say it happens to work.
On Windows, command-line options are passed to xdelta3's main(int argc, char **argv) in the system's ANSI code page, which is Windows-1252 on Western-locale systems (also a byte array). xdelta3 likewise receives, uses, and stores them as is. The filenames are not transformed at all. However, cp1252 only supports a small set of characters (see here http://www.ascii.ca/cp1252.htm ). Any character in a path or filename that cp1252 does not support becomes either a question mark (?) or an equivalent cp1252 character if one is available (for example, the full-width forms of <>:"/\|?* become ASCII <>:"/\|?*). Unicode filenames are doomed.
For Unicode support on Windows, xdelta3 needs a new wmain(int, wchar_t**) to receive command-line options as wide chars (UTF-16), and must then use the Unicode versions of the I/O APIs, such as CreateFileW. This was requested years ago in jmacd/xdelta#89
The workaround (previously called trick) still works.
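
The lossy narrowing described in this commit message can be modelled with a few lines of portable C. This is a deliberately simplified sketch and not the actual Windows conversion tables: `narrow_to_cp1252_model` is a hypothetical function, the full-width folding follows the description in the message, and the cp1252 block 0x80-0x9F (which holds characters such as the euro sign) is left out.

```c
#include <stdint.h>

/* Simplified model of squeezing one Unicode code point into cp1252.
 * Assumption: bytes 0x80-0x9F of cp1252 are not modelled here. */
static unsigned char narrow_to_cp1252_model(uint32_t cp)
{
    if (cp < 0x80)
        return (unsigned char)cp;            /* ASCII passes through */
    if (cp >= 0xA0 && cp <= 0xFF)
        return (unsigned char)cp;            /* cp1252 matches Latin-1 here */
    if (cp >= 0xFF01 && cp <= 0xFF5E)
        return (unsigned char)(cp - 0xFEE0); /* full-width form -> ASCII */
    return '?';                              /* everything else is lost */
}
```

With this model, U+00E9 (e with acute accent) survives as byte 0xE9, full-width colon U+FF1A collapses to an ordinary `:`, and a CJK character such as U+65E5 turns into `?`, after which the original filename can no longer be recovered from the argument.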