-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
paths as std::string
is not safe or cross-platform -- e.g. fails on Windows with all non-ascii paths
#352
Comments
Would the following work well, while we modify the conversion depending on the platform?
And we can keep the paths saved in char arrays / std::string, and specify that those have to be UTF-8, but convert to wide upon calls to various FS related calls (eg. for Wins)? Or we go with std::filesystem - if Windows STL support is there for it? I think both options would work well also from perspective of Python bindings, as the paths are UTF-8 there platform independent - so calls to the bound functions taking std::string would work as expected. |
MSVC's STL fully supports Given a c++14 approach, I think using
// main app code
// apps before c++17 will have to use native OS apis or codecvt to convert from wchar_t -> utf-8
// app with c++17 have `filesystem` so can use filesystem::path and its utf8 handling
neuralnet->setBlobPath(modelPath.generic_u8string()); // convert #1: wchar_t -> utf-8 stored in an std::string
// setBlobPath(const std::string &utf8Path) calls assetManager.set("__blob", path)
std::shared_ptr<dai::Asset> AssetManager::set(const std::string& key, const std::string& path, int alignment) {
#ifdef WINDOWS
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> utf16convert;
std::wstring widePath = utf16convert.from_bytes(path); // convert #2: utf-8 -> wchar_t
std::ifstream stream(widePath, std::ios::in | std::ios::binary);
#else
// Load binary file at path
std::ifstream stream(path, std::ios::in | std::ios::binary);
#endif
if(!stream.is_open()) {
... |
...also be careful/test text output of this data. For example, on Windows a printf("%s", myutf8string.cstr()) will fail to display correctly. A windows hack is to use the |
I've got some bandwidth again. If we can align on approach, I can take on this work. I lean towards my bullet 3: "I think most of the depthai-core code that is affected is not performance-oriented. If it is, then could have overloaded functions to accept both wstring and string. Or SFINAE with templates. Or use boost::filesystem"
|
Did some reasearch. I think the approach could be easy and flexible change one line of the binding in depthai-python .def("setBlobPath", [](NeuralNetwork& nn, py::object obj){
// Allows to call this function with paths as well as strings
nn.setBlobPath<std::string>(py::str(obj)); // only need to add the explicit template param
}, py::arg("path"), DOC(dai, node, NeuralNetwork, setBlobPath)) then in depthai-core, change the signature to be a template. template<typename T>
void NeuralNetwork::setBlobPath(const T& path) {
auto asset = assetManager.set("__blob", path);
...
}
template<typename T>
std::shared_ptr<dai::Asset> AssetManager::set(const std::string& key, const T& path, int alignment) {
// Load binary file at path
std::ifstream<T> stream(path, std::ios::in | std::ios::binary); Something very similar to that should work with std::string and filesystem::path. I may need to specialize T=wstring w/ codecvt for those SDK users that are only C++14, on windows, and need the widechar support. Easy code like I wrote above. |
Thanks @diablodale and sorry for the delay Your latest suggestion seems good to me - so we'd assume UTF-8 for std::string and let users use wstring for Windows (or filesystem::path) Also bindings using std::string template is 👌 as string there are UTF-8. https://pybind11.readthedocs.io/en/stable/advanced/cast/strings.html#passing-python-strings-to-c
What would need to happen in this case? Wouldn't eg. ifstream taking wstring do the right thing underneath? Also we can specialize (or declare?) for this 3 cases (std::string, std::wstring & std::filesystem::path) if we don't want to move function implementation into the header |
Only Microsoft's STL implementation supports Other compilers for Windows probably don't use Microsoft's STL and runtime. Therefore, probably don't support wchar for path. I suspect this is a very small percentage of your users. I suggest not supporting this scenario. When a Windows user calls, e.g., I'll start work on PR and explore how to do this. I hear that you want to limit code in the headers. |
@themarpe found a subissue while applying this PR broadly in depthai-core... I believe two PRs might be a good approach. PR #384 is the first that enables Unicode paths in API signatures, if/ofstream, and spdlog. A second later PR will enable Unicode from environment variables...which naturally needs the first PR's fixes. The os manages the storage of the different char sets (ascii, utf-8, utf-16, etc.). When an app needs the env var value, it calls an os API. On Windows...it needs to call the UTF-16 wide character APIs for environment variables to get Unicode text. A Windows customer in a country/language who's glyphs do not fit in ASCII, e.g. Japan, China, UAE, Germany, etc. stores a filename or path, using their native language, into env vars that depthai reads. // runtime fail: underlying utility calls ascii OS api, Windows provides best-fit value, the path is corrupt/unusable
auto fwBinaryPath = utility::getEnv("DEPTHAI_DEVICE_BINARY");
// compile fail: underlying utility calls the ascii OS api which returns a std::string, it should return a Unicode std::wstring
std::wstring fwBinaryPath = utility::getEnv("DEPTHAI_DEVICE_BINARY"); I believe the utility::getEnv() call chain should always return wstring on Windows. Making the caller responsible for managing any unicode/ascii/number convert. After the getEnv() is fixed, then this is possible... // works: the getEnv() code has been updated to return wstring
auto fwBinaryPath = utility::getEnv("DEPTHAI_DEVICE_BINARY");
// works: the getEnv() code has been updated to return wstring and dai::Path can be directly constructed with it
dai::Path fwBinaryPath = utility::getEnv("DEPTHAI_DEVICE_BINARY"); There ?may? be some cascading work to ensure things that parse the env var wstring are not assuming Quick look at spdlog's code, they are partially aware of this issue. I see macro defines like |
Thanks @diablodale |
All non-ascii paths on OS which are not natively UTF-8 are doomed to fail.
For example,
c:/MeinDeutschPäth
is likely to fail on Windows due to the umlaut.Non-latin characters, multibyte glyphs, etc. will all fail on operating systems that do not natively default to utf-8.
One example at
depthai-core/src/pipeline/node/NeuralNetwork.cpp
Lines 26 to 27 in 0caf71c
Likely the cause of issue #294 because Chinese glyphs are definitely not the limited ascii.
Setup
Repro
Found while writing my app and then code review. A
std::string
is not a utf-8 string. And it will fail on Windows for any path that is non-ascii. The best fix is to use<filesystem>
which is available in C++17, or<experimental::filesystem>
in most C++14 toolchains. The STL library will manage going to/from utf-8 with clearly defined params/funcs. For example, it will manage the differences between native utf-8 on Linux vs. wchar_t on Windows.If one doesn't use the STL, then one needs to write supporting functions for Windows (perhaps others) to convert between Windows wchar_t UTF-16 and what appears to be the desire of
AssetManager
to have a utf-8 path. There are clear apis in the STL to manually convert between UTF-16 and UTF-8.I see many locations throughout
depthai-core
which have this bug. For exampleAssetManager::set(const std::string& key, const std::string& path, ...
Also locations using pointers to char like
const char* pathToBootloader
which are insufficient for native UTF-16 usage on Windows. They will have the same path bugsFix
It is not hard to use the STL cvt approaches rather than "filesystem". Pseudocode...
Workarounds
The text was updated successfully, but these errors were encountered: