Optionally ignore UTF-8 decoding errors when converting std::string to Python str (pytorch#97282)

Summary: When language models use a C++ tokenizer, the outputs are C++ strings that are not necessarily valid UTF-8. The default pybind11 cast uses strict UTF-8 decoding. This change optionally relaxes the decoding by passing the 'ignore' argument.

Pull Request resolved: pytorch#97282
X-link: pytorch/text#2126
Test Plan: https://www.internalfb.com/intern/testinfra/testrun/4503599786612705
Reviewed By: Nayef211
Differential Revision: D43970697
fbshipit-source-id: 262b3e9165e50d893a72f162705956102f1143bc
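To make the difference between strict and 'ignore' decoding concrete, here is a minimal standalone sketch of what ignoring decoding errors means for a byte string: well-formed UTF-8 sequences are kept and every other byte is dropped. It is not the code from this commit, which per the summary relies on the 'ignore' argument of the decoder. The helper names `validSequenceLength` and `dropInvalidUtf8` are hypothetical, and the sketch is not guaranteed to match the Python decoder in every edge case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the commit): returns the byte length of the
// well-formed UTF-8 sequence starting at s[i], or 0 if there is none.
static std::size_t validSequenceLength(const std::string& s, std::size_t i) {
  const auto byte = [&](std::size_t k) {
    return static_cast<unsigned char>(s[k]);
  };
  // True if position k exists and holds a continuation byte (10xxxxxx).
  const auto isCont = [&](std::size_t k) {
    return k < s.size() && (byte(k) & 0xC0) == 0x80;
  };

  const unsigned char b0 = byte(i);
  if (b0 < 0x80) {
    return 1; // ASCII
  }
  if (b0 >= 0xC2 && b0 <= 0xDF) { // two-byte sequence
    return isCont(i + 1) ? 2 : 0;
  }
  if (b0 >= 0xE0 && b0 <= 0xEF) { // three-byte sequence
    if (!isCont(i + 1) || !isCont(i + 2)) {
      return 0;
    }
    const unsigned char b1 = byte(i + 1);
    if (b0 == 0xE0 && b1 < 0xA0) return 0; // overlong encoding
    if (b0 == 0xED && b1 > 0x9F) return 0; // UTF-16 surrogate range
    return 3;
  }
  if (b0 >= 0xF0 && b0 <= 0xF4) { // four-byte sequence
    if (!isCont(i + 1) || !isCont(i + 2) || !isCont(i + 3)) {
      return 0;
    }
    const unsigned char b1 = byte(i + 1);
    if (b0 == 0xF0 && b1 < 0x90) return 0; // overlong encoding
    if (b0 == 0xF4 && b1 > 0x8F) return 0; // above U+10FFFF
    return 4;
  }
  return 0; // stray continuation byte, or a byte that never starts a sequence
}

// Hypothetical helper: copies the well-formed sequences of `in` and silently
// skips every byte that does not start one.
std::string dropInvalidUtf8(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  std::size_t i = 0;
  while (i < in.size()) {
    const std::size_t n = validSequenceLength(in, i);
    if (n == 0) {
      ++i; // skip one offending byte and rescan from the next position
      continue;
    }
    out.append(in, i, n);
    i += n;
  }
  return out;
}
```

A strict decoder would reject the whole string on the first offending byte. The lenient variant loses only the malformed bytes, which is usually preferable for tokenizer output that may cut a multi-byte character in half.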
1 parent aa3a57b, commit 96ce7aa
Showing 5 changed files with 59 additions and 23 deletions.
@@ -0,0 +1,16 @@
+#include <torch/csrc/jit/python/utf8_decoding_ignore.h>
+
+namespace torch::jit {
+
+namespace {
+thread_local bool kIgnore = false;
+}
+
+void setUTF8DecodingIgnore(bool o) {
+  kIgnore = o;
+}
+bool getUTF8DecodingIgnore() {
+  return kIgnore;
+}
+
+} // namespace torch::jit
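Because the flag above is declared `thread_local`, setting it affects only the calling thread, and every other thread still sees the default `false`. The following standalone sketch reproduces the same pattern in a hypothetical `sketch` namespace (so it compiles without PyTorch) and reads the flag from a freshly started thread. The helper `flagSeenByNewThread` is an assumption added for illustration.

```cpp
#include <cassert>
#include <thread>

namespace sketch { // hypothetical stand-in for torch::jit

namespace {
thread_local bool kIgnore = false; // one independent copy per thread
}

void setUTF8DecodingIgnore(bool o) {
  kIgnore = o;
}
bool getUTF8DecodingIgnore() {
  return kIgnore;
}

} // namespace sketch

// Hypothetical helper: starts a new thread and reports the flag value that
// thread observes. A new thread gets its own copy initialized to false.
bool flagSeenByNewThread() {
  bool seen = true;
  std::thread worker([&seen] { seen = sketch::getUTF8DecodingIgnore(); });
  worker.join();
  return seen;
}
```

One consequence of this design is that code which hands work to a thread pool would need to set the flag inside each worker if those workers are also expected to decode leniently.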
@@ -0,0 +1,8 @@
+#pragma once
+#include <torch/csrc/Export.h>
+namespace torch {
+namespace jit {
+TORCH_API void setUTF8DecodingIgnore(bool o);
+TORCH_API bool getUTF8DecodingIgnore();
+} // namespace jit
+} // namespace torch
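A setter and getter pair like the one declared above is often wrapped in a scope guard so that the previous value is restored even on early return or exception. The sketch below shows one way this could look. `UTF8DecodingIgnoreGuard` is hypothetical and does not appear in this commit, and the `demo` namespace contains stand-in definitions so the example compiles without PyTorch.

```cpp
#include <cassert>

namespace demo { // hypothetical stand-in for torch::jit

// Stand-ins for the two functions declared in the header above.
thread_local bool ignoreFlag = false;
void setUTF8DecodingIgnore(bool o) {
  ignoreFlag = o;
}
bool getUTF8DecodingIgnore() {
  return ignoreFlag;
}

// Hypothetical RAII guard: sets the flag for the current scope and restores
// the previous value when the scope ends.
class UTF8DecodingIgnoreGuard {
 public:
  explicit UTF8DecodingIgnoreGuard(bool ignore)
      : previous_(getUTF8DecodingIgnore()) {
    setUTF8DecodingIgnore(ignore);
  }
  ~UTF8DecodingIgnoreGuard() {
    setUTF8DecodingIgnore(previous_);
  }
  UTF8DecodingIgnoreGuard(const UTF8DecodingIgnoreGuard&) = delete;
  UTF8DecodingIgnoreGuard& operator=(const UTF8DecodingIgnoreGuard&) = delete;

 private:
  bool previous_; // value to restore on destruction
};

// Reads the flag while a guard is active in the enclosing scope.
bool flagInsideGuard() {
  UTF8DecodingIgnoreGuard guard(true);
  return getUTF8DecodingIgnore();
}

} // namespace demo
```

Since the flag is thread-local in the implementation file, such a guard would affect only the thread that constructs it.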