[imgcodecs] Support unicode filenames on Windows. #19862

egorpugin · 2021-04-05T16:55:47Z

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
The PR is proposed to proper branch
There is reference to original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

Hi,

This PR enables unicode file name support for imgcodecs on Windows. With it opencv can read and write to files with unicode file names given utf8 file name is provided.
It also adds long path support (see PR condition if (s.size() >= 255)).

Custom cv::fopen() is added to utils.hpp, so all codecs will see it.
On windows it performs utf8 to wstring conversion and _wfopen_s() call.
On other systems libc fopen is called.

There is fast path if we open ascii filename. If it fails, conversion code path is used.

alalek · 2021-04-05T19:41:12Z

Please check conversations from the similar PRs / issues first. There is also FAQ item about this topic.

so all codecs will see it

This is not true. Almost all codecs are external and they are located in the "3rdparty" subdirectory (or they are used as external libraries, e.g. in vcpkg build scripts).

egorpugin · 2021-04-05T19:58:57Z

Internal codecs?

egorpugin · 2021-04-05T20:01:49Z

As I saw there is a lot of unicode issues in python bindings, not in C++.
loadsave.cpp uses fopen, then gfmt_... uses fopen. This fix is visible to these users.

asenyaev · 2021-04-08T05:23:42Z

jenkins cn please retry a build

egorpugin · 2021-05-24T12:46:42Z

jenkins cn please retry a build

egorpugin · 2021-05-25T13:35:50Z

modules/imgcodecs/src/utils.cpp

+    if (f)
+        return f;
+    // try unicode path assuming utf8 was passed
+    static std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;


It seems, static should removed to prevent data races.

vpisarev · 2021-05-25T13:57:00Z

@egorpugin, thank you for the contribution!

there are several issues with the patch:

it's submitted from the master branch. git/github and our CI system cannot always track correctly the changes that you did and merge the patch correctly. It's recommended to create a dedicated branch for the fix.
the patch attempts to replace the standard function fopen, and it succeeds to do that only in a few places. In many other places the problems persists.
as @alalek correctly pointed out, on Windows OpenCV builds libpng, libjpeg, libtiff, ... from sources and uses them. Those are separate libraries located in opencv/3rdparty (and, by the way, they are written in C), therefore they do not "see" this new C++ cv::fopen().

One of the possible methods to fix the problem with unicode paths on Windows would be to implement cv::imread()/cv::imwrite in 2 steps:

read the whole file into memory.
use cv::imdecode() to decode it
or
use cv::encode() to encode the image in memory
write the memory buffer to file.
(maybe, such 2-step procedure should only be used when the mainstream branches of cv::imread()/cv::imwrite() failed)

There is a problem with such in-memory encoding/decoding approach is that for some codecs cv::imdecode()/cv::imencode are implemented via cv::imread()/cv::imwrite with temporary files, so there is a risk of infinite recursion. But the majority of codecs support in-memory encoding/decoding. In any case, there should be protection against infinite recursion.

Are you willing to try to implement this 2-step approach with a proper protection?

egorpugin · 2021-05-25T14:50:11Z

Hi,

Thank you for the explanation.
I'll think about possible implementation in a separate PR.

[imgcodecs] Support unicode filenames on Windows.

38b58e4

egorpugin closed this May 24, 2021

egorpugin reopened this May 24, 2021

vpisarev self-assigned this May 25, 2021

vpisarev self-requested a review May 25, 2021 13:11

egorpugin commented May 25, 2021

View reviewed changes

Prevent data races.

26f1cfe

egorpugin closed this May 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[imgcodecs] Support unicode filenames on Windows. #19862

[imgcodecs] Support unicode filenames on Windows. #19862

egorpugin commented Apr 5, 2021 •

edited

alalek commented Apr 5, 2021

egorpugin commented Apr 5, 2021

egorpugin commented Apr 5, 2021

asenyaev commented Apr 8, 2021

egorpugin commented May 24, 2021

egorpugin May 25, 2021

vpisarev commented May 25, 2021

egorpugin commented May 25, 2021

[imgcodecs] Support unicode filenames on Windows. #19862

[imgcodecs] Support unicode filenames on Windows. #19862

Conversation

egorpugin commented Apr 5, 2021 • edited

Pull Request Readiness Checklist

alalek commented Apr 5, 2021

egorpugin commented Apr 5, 2021

egorpugin commented Apr 5, 2021

asenyaev commented Apr 8, 2021

egorpugin commented May 24, 2021

egorpugin May 25, 2021

Choose a reason for hiding this comment

vpisarev commented May 25, 2021

egorpugin commented May 25, 2021

egorpugin commented Apr 5, 2021 •

edited