Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[imgcodecs] Support unicode filenames on Windows. #19862

Closed
wants to merge 2 commits into from

Conversation

egorpugin
Copy link
Contributor

@egorpugin egorpugin commented Apr 5, 2021

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

Hi,

This PR enables unicode file name support for imgcodecs on Windows. With it opencv can read and write to files with unicode file names given utf8 file name is provided.
It also adds long path support (see PR condition if (s.size() >= 255)).

Custom cv::fopen() is added to utils.hpp, so all codecs will see it.
On windows it performs utf8 to wstring conversion and _wfopen_s() call.
On other systems libc fopen is called.

There is fast path if we open ascii filename. If it fails, conversion code path is used.

@alalek
Copy link
Member

alalek commented Apr 5, 2021

Please check conversations from the similar PRs / issues first. There is also FAQ item about this topic.

so all codecs will see it

This is not true. Almost all codecs are external and they are located in the "3rdparty" subdirectory (or they are used as external libraries, e.g. in vcpkg build scripts).

@egorpugin
Copy link
Contributor Author

Internal codecs?

@egorpugin
Copy link
Contributor Author

  1. As I saw there is a lot of unicode issues in python bindings, not in C++.
  2. loadsave.cpp uses fopen, then gfmt_... uses fopen. This fix is visible to these users.

@asenyaev
Copy link
Contributor

asenyaev commented Apr 8, 2021

jenkins cn please retry a build

1 similar comment
@egorpugin
Copy link
Contributor Author

jenkins cn please retry a build

@egorpugin egorpugin closed this May 24, 2021
@egorpugin egorpugin reopened this May 24, 2021
@vpisarev vpisarev self-assigned this May 25, 2021
@vpisarev vpisarev self-requested a review May 25, 2021 13:11
if (f)
return f;
// try unicode path assuming utf8 was passed
static std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems, static should removed to prevent data races.

@vpisarev
Copy link
Contributor

@egorpugin, thank you for the contribution!

there are several issues with the patch:

  1. it's submitted from the master branch. git/github and our CI system cannot always track correctly the changes that you did and merge the patch correctly. It's recommended to create a dedicated branch for the fix.
  2. the patch attempts to replace the standard function fopen, and it succeeds to do that only in a few places. In many other places the problems persists.
  3. as @alalek correctly pointed out, on Windows OpenCV builds libpng, libjpeg, libtiff, ... from sources and uses them. Those are separate libraries located in opencv/3rdparty (and, by the way, they are written in C), therefore they do not "see" this new C++ cv::fopen().

One of the possible methods to fix the problem with unicode paths on Windows would be to implement cv::imread()/cv::imwrite in 2 steps:

  1. read the whole file into memory.
  2. use cv::imdecode() to decode it
    or
  3. use cv::encode() to encode the image in memory
  4. write the memory buffer to file.
    (maybe, such 2-step procedure should only be used when the mainstream branches of cv::imread()/cv::imwrite() failed)

There is a problem with such in-memory encoding/decoding approach is that for some codecs cv::imdecode()/cv::imencode are implemented via cv::imread()/cv::imwrite with temporary files, so there is a risk of infinite recursion. But the majority of codecs support in-memory encoding/decoding. In any case, there should be protection against infinite recursion.

Are you willing to try to implement this 2-step approach with a proper protection?

@egorpugin
Copy link
Contributor Author

Hi,

Thank you for the explanation.
I'll think about possible implementation in a separate PR.

@egorpugin egorpugin closed this May 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants