Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String encoding in source code #91

Closed
Cirn09 opened this issue Aug 27, 2021 · 2 comments
Closed

String encoding in source code #91

Cirn09 opened this issue Aug 27, 2021 · 2 comments
Milestone

Comments

@Cirn09
Copy link
Contributor

Cirn09 commented Aug 27, 2021

There are too many potential coding errors in source code
for example:
https://github.com/end2endzone/ShellAnything/blob/master/src/Win32Clipboard.cpp#L42

  static const UINT gFormatDescriptorBinary     = RegisterClipboardFormat("Binary");
  static const UINT gFormatDescriptorDropEffect = RegisterClipboardFormat("Preferred DropEffect");

it only work right when not define UNICODE. And RegisterClipboardFormatA will convert "Binary" to L"Binary", then call RegisterClipboardFormatW.
To improve compatibility, should write like RegisterClipboardFormatA("Binary"), or RegisterClipboardFormatW(L"Binary"),
or more appropriate: RegisterClipboardFormat(TEXT"Binary"))

I tried to modify a part of the code, but I found that this caused coding confusion in the code.
So I recommend ​using TCHAR instead of char, _stprintf_s instead of sprintf_s, TEXT("bala") instead of "bala"

#ifdef UNICODE
 ​#define tstring std::wstring
#else
 ​#define tstring std::string
#endif

or more cpp style

typedef basic_string<TCHAR, char_traits<TCHAR>, allocator<TCHAR> >  tstring;

instead of std::string
(ref: https://stackoverflow.com/questions/36197514/what-is-the-macro-for-stdstring-stdwstring-in-vc)

And define UNICODE and _UNICODE

@end2endzone
Copy link
Owner

You are right about the confusion with the coding standard from Win32Clipboard.cpp (and Win32Registry.h, Win32Registry.cpp and Win32Clipboard.h) with the rest of the software. These files comes from another of my projects and were written in Windows 2000 or Windows XP days. The code standard and style does not exactly match the main code. In the example above, the word "Binary" only contains English letters and does not require UTF-8 or UNICODE encoding. The code should be modified to implicitly call RegisterClipboardFormatA().

  static const UINT gFormatDescriptorBinary     = RegisterClipboardFormatA("Binary");
  static const UINT gFormatDescriptorDropEffect = RegisterClipboardFormatA("Preferred DropEffect");

I don't want to sound rude but I do not understand your point about supporting the UNICODE preprocessor. The UNICODE preprocessor was created in order to be able to compile an application for ANSI code page and for Microsoft UNICODE code page. The ANSI version of ShellAnything does not exists and I do not have any plans on supporting an ANSI version of the ShellExtension.

ShellAnything is designed to be compiled without UNICODE (or _UNICODE). The main code is written to implicitly use the Wide variety of Win32 functions and not rely on macros based on UNICODE preprocessor. In order words, the code should call CreateFileW() (or in rare occasions CreateFileA()) directly instead of using CreateFile() macro. The code may still be using Win32 macros when calling a system function but each location in the code should be identified and change accordingly.

Since data in ShellAnything is stored as UTF-8 strings , using char instead of TCHAR is preferable to prevent confusion. In my experience, supporting UNICODE preprocessor is a real pain. It leads to multiple encoding bugs. Everytime you have to save a tstring to a file, you need to convert the encoding from TCHAR to char anyway. This is why I decided to use UTF-8 for encoding strings. It is not ANSI nor UNICODE (a.k.a Microsoft utf-16 ish). I do not see any benefit in using TCHAR.

Even if the Shell Extension is compiled without the UNICODE preprocessor, it is still registering itself as a wide-characters compatible application to the system. It is also using the wide variety of Win32 functions (functions names ending with W).

If you need to define a string with Chinese characters, just use the wide function and use the L"" prefix. Maybe I could better understand your need if you would give an example.

@Cirn09
Copy link
Contributor Author

Cirn09 commented Aug 28, 2021

ShellAnything works very well in non-English environment, and there is no compatibility problem.
This issue was raised only because I was confused about the parameter types caused by functions such as GetModuleHandleEx not being used with the TEXT macro. Since there is no plan to compile the ASCII version, the suggestion of using TCHAR and tstring I mentioned before is unnecessary.

But I still recommend using a function like GetModuleHandleExA to avoid confusion when reading source code.

end2endzone added a commit that referenced this issue Oct 22, 2021
* Removed as much as possible the usage of WIN32 macros. Replaced by the actual ansi or wide variety.
* Updated Win32Clipboard code. Replaced unicode-ready functions by utf-8-ready version.
* Renamed InputBox::GetModuleHandle() function since it was conflicting with existing macros. The method is now renamed to GetInputBoxModuleHandle().
* Created new unit tests for supporting new Win32Clipboard utf-8 functions.
For #91
end2endzone added a commit that referenced this issue Oct 22, 2021
* feature-issue91:
  Fixed ambiguities for using ansi or unicode versions of win32 functions. * Removed as much as possible the usage of WIN32 macros. Replaced by the actual ansi or wide variety. * Updated Win32Clipboard code. Replaced unicode-ready functions by utf-8-ready version. * Renamed InputBox::GetModuleHandle() function since it was conflicting with existing macros. The method is now renamed to GetInputBoxModuleHandle(). * Created new unit tests for supporting new Win32Clipboard utf-8 functions. For #91
  Silenced the following warnings:     // D:\Projects\ShellAnything\src\shellext.cpp(632) : warning C4311 : 'reinterpret_cast' : pointer truncation from 'LPCSTR' to 'int'     // D:\Projects\ShellAnything\src\shellext.cpp(632) : warning C4302 : 'reinterpret_cast' : truncation from 'LPCSTR' to 'int'
  Created unicode version for `SetDragDropFilesUnicode()` and `GetAsDragDropFilesUnicode()`.
@end2endzone end2endzone added this to the 0.7.0 milestone Dec 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants