Commit

🌪The null byte is no longer being sucked up!
- Add a pass_handler type, since that was actually supposed to be there!
- Compiles better on GCC now!
- Assumes the right Wide UTF encoding!
ThePhD committed Feb 22, 2021
1 parent b85fb76 commit 472a880
Showing 21 changed files with 136 additions and 91 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -17,4 +17,4 @@ Please considering sponsoring the work via [any of the available means listed ne

# Documentation

The documentation can be found in full on [https://rtfd.io/text](https://rtfd.io/text)!
The documentation can be found in full on [https://ztdtext.rtfd.io/](https://ztdtext.rtfd.io/)!
37 changes: 37 additions & 0 deletions documentation/source/api/error handlers/pass_handler.rst
@@ -0,0 +1,37 @@
.. =============================================================================
..
.. ztd.text
.. Copyright © 2021 JeanHeyd "ThePhD" Meneide and Shepherd's Oasis, LLC
.. Contact: opensource@soasis.org
..
.. Commercial License Usage
.. Licensees holding valid commercial ztd.text licenses may use this file in
.. accordance with the commercial license agreement provided with the
.. Software or, alternatively, in accordance with the terms contained in
.. a written agreement between you and Shepherd's Oasis, LLC.
.. For licensing terms and conditions see your agreement. For
.. further information contact opensource@soasis.org.
..
.. Apache License Version 2 Usage
.. Alternatively, this file may be used under the terms of Apache License
.. Version 2.0 (the "License") for non-commercial use; you may not use this
.. file except in compliance with the License. You may obtain a copy of the
.. License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.
..
.. =============================================================================>
pass_handler
============

The ``pass_handler`` does exactly what its name implies: it passes the error generated by the encoding object through without touching it. Unlike :doc:`ztd::text::assume_valid_handler </api/error handlers/assume_valid_handler>`, it does not invoke undefined behavior when an error occurs, because it does not satisfy the :doc:`ztd::text::is_ignorable_error_handler </api/is_ignorable_error_handler>` trait.

.. doxygenclass:: ztd::text::pass_handler
	:members:
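
To illustrate the pass-through idea described above, here is a minimal sketch that does not use ztd.text itself. Every name in it (`sketch_error`, `sketch_result`, `sketch_pass_handler`, `sketch_decode_ascii`) is hypothetical, and the real handler's call signature is not shown in this diff, so treat the shape of `operator()` as an assumption.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins; none of these names come from ztd.text.
enum class sketch_error { ok, invalid_sequence };

struct sketch_result {
	std::string_view unread; // input that was not consumed
	sketch_error error;      // what the toy encoding reported
};

// Returns the result untouched: the error code stays set, so the caller
// sees the failure and can stop instead of continuing on bad data.
struct sketch_pass_handler {
	sketch_result operator()(sketch_result result) const noexcept {
		return result;
	}
};

// A toy decode step: accepts 7-bit input and reports anything else
// to the supplied handler.
template <typename Handler>
sketch_result sketch_decode_ascii(std::string_view input, Handler handler) {
	for (std::size_t i = 0; i < input.size(); ++i) {
		if (static_cast<unsigned char>(input[i]) > 0x7F) {
			return handler(sketch_result { input.substr(i), sketch_error::invalid_sequence });
		}
	}
	return sketch_result { input.substr(input.size()), sketch_error::ok };
}
```

Because the handler changes nothing, the caller receives both the error code and the position where decoding stopped, and is responsible for deciding what happens next.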
2 changes: 1 addition & 1 deletion documentation/source/conf.py
@@ -60,7 +60,7 @@
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
#
html_extra_path = ["resources"]
html_extra_path = []

# Text that is pre-pended to every built file. Useful for global substitution patterns.
rst_prolog = """
1 change: 1 addition & 0 deletions documentation/source/design/error handling.rst
@@ -48,6 +48,7 @@ They can change the conversion and other operations happen works. Consider, for
Clearly, the Korean characters present in the UTF-8 string just cannot fit in a strict, 7-bit ASCII encoding. What, then, becomes the printed output from ``std::cout`` at ``// (2)``? The answer is two ASCII question marks, ``??``. The :doc:`ztd::text::replacement_handler </api/error handlers/replacement_handler>` object passed in at ``// (1)`` substitutes replacement characters (zero or more) into the output for any failed operation. There are multiple kinds of error handlers with varying behaviors:

- :doc:`replacement_handler </api/error handlers/replacement_handler>`, which inserts a substitution character specified by the encoding object or, failing that, some form of the default replacement character ``U+FFFD``;
- :doc:`pass_handler </api/error handlers/pass_handler>`, which simply returns the error result as is and, if there is an error, halts higher-level operations from proceeding;
- :doc:`default_handler </api/error handlers/default_handler>`, which is just a name for the ``replacement_handler`` or ``throw_handler`` or some other type based on compile-time configuration of the library;
- :doc:`throw_handler </api/error handlers/throw_handler>`, for throwing an exception on any failed operation;
- :doc:`incomplete_handler </api/error handlers/incomplete_handler>`, for throwing an exception on any failed encode/decode operation; and,
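
The replacement behavior described in this hunk (Korean text transcoded to strict ASCII comes out as `??`) can be sketched without the library. This is a hand-rolled illustration, not ztd.text's implementation: `sketch_utf8_length` and `sketch_utf8_to_ascii` are hypothetical names, and the UTF-8 handling only inspects lead bytes rather than validating the input.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Length of a UTF-8 sequence judged from its lead byte. Anything that is
// not a recognizable lead byte counts as 1 so the loop always advances.
// No validation of continuation bytes is attempted.
inline std::size_t sketch_utf8_length(unsigned char lead) noexcept {
	if (lead < 0x80) return 1;
	if ((lead & 0xE0) == 0xC0) return 2;
	if ((lead & 0xF0) == 0xE0) return 3;
	if ((lead & 0xF8) == 0xF0) return 4;
	return 1;
}

// Transcodes UTF-8 to 7-bit ASCII, writing one '?' for each code point
// that does not fit.
inline std::string sketch_utf8_to_ascii(std::string_view input) {
	std::string output;
	for (std::size_t i = 0; i < input.size();) {
		const unsigned char lead = static_cast<unsigned char>(input[i]);
		output.push_back(lead < 0x80 ? static_cast<char>(lead) : '?');
		i += sketch_utf8_length(lead);
	}
	return output;
}
```

With a two-syllable input such as the bytes `EC 95 88 EB 85 95` (U+C548 U+B155), each three-byte sequence collapses to a single `?`, which matches the two question marks the documentation describes.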
@@ -110,7 +110,7 @@ We can see that creating a state with a default constructor no longer works, bec

.. code-block:: cpp
	:linenos:
	:emphasize-lines: 36-39,46-49
	:emphasize-lines: 7-11,18-22

	class type_erased_encoding {
		// from above, etc. …
2 changes: 1 addition & 1 deletion documentation/source/known unicode encodings.rst
@@ -31,7 +31,7 @@
Known Unicode Encodings
=======================

Out of all the encodings listed on the :doc:`encodings page <>`, only a handful are known to be Unicode Encodings. These are as follows:
Out of all the encodings listed on the :doc:`encodings page </encodings>`, only a handful are known to be Unicode Encodings. These are as follows:

- UTF-7
- UTF-7-IMAP
1 change: 0 additions & 1 deletion examples/basic/source/encoding_type_erasure.cpp
@@ -34,7 +34,6 @@
#include <ztd/text/encode.hpp>
#include <ztd/text/decode.hpp>
#include <ztd/text/transcode.hpp>
#include <ztd/text/c_string_view.hpp>

#include <iostream>
#include <fstream>
4 changes: 2 additions & 2 deletions include/ztd/text/any_encoding.hpp
@@ -491,12 +491,12 @@ namespace ztd { namespace text {
		__raw_result.error_code, __raw_result.handled_error);
}

virtual std::unique_ptr<__erased_state> __create_encode_state() const {
virtual std::unique_ptr<__erased_state> __create_encode_state() const override {
	auto& __encoding = this->__base_t::get_value();
	return std::make_unique<__typed_state<__real_encode_state>>(make_encode_state(__encoding));
}

virtual std::unique_ptr<__erased_state> __create_decode_state() const {
virtual std::unique_ptr<__erased_state> __create_decode_state() const override {
	auto& __encoding = this->__base_t::get_value();
	return std::make_unique<__typed_state<__real_decode_state>>(make_decode_state(__encoding));
}
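
The only change in this hunk is the added `override` specifier. A small sketch of what that buys, using hypothetical types (`sketch_erased_base`, `sketch_typed`) rather than the library's internal ones:

```cpp
#include <memory>
#include <string>

// Hypothetical type-erased interface, loosely shaped like the hunk above.
struct sketch_erased_base {
	virtual ~sketch_erased_base() = default;
	virtual std::unique_ptr<std::string> create_state() const = 0;
};

struct sketch_typed final : sketch_erased_base {
	// 'override' makes the compiler check this against the base signature.
	// If the trailing 'const' were dropped, this would be a compile error
	// instead of a new, unrelated virtual function.
	std::unique_ptr<std::string> create_state() const override {
		return std::make_unique<std::string>("typed");
	}
};
```

Calling `create_state()` through a `sketch_erased_base` pointer dispatches to `sketch_typed`, and any future drift between the two signatures is caught at compile time.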
6 changes: 6 additions & 0 deletions include/ztd/text/error_handler.hpp
@@ -177,6 +177,12 @@ namespace ztd { namespace text {
//////
class assume_valid_handler : public __detail::__pass_through_handler_with<true> { };

//////
/// @brief An error handler that tells an encoding that it will pass through any errors, without doing any
/// adjustment, correction or checking. Does not imply it is ignorable, unlike ztd::text::assume_valid_handler which can invoke UB if an error occurs.
//////
class pass_handler : public __detail::__pass_through_handler_with<false> { };

//////
/// @brief An error handler that replaces bad code points and code units with a chosen code point / code unit
/// sequence.
2 changes: 2 additions & 0 deletions include/ztd/text/execution.hpp
@@ -622,6 +622,8 @@ namespace ztd { namespace text {
}
case static_cast<::std::size_t>(0):
	// 0 means null character; ok
	__detail::__dereference(__outit) = __intermediary_output[0];
	__outit = __detail::__next(__outit);
	return _Result(__detail::__reconstruct(::std::in_place_type<_UInputRange>, __init, __inlast),
		__detail::__reconstruct(::std::in_place_type<_UOutputRange>, __outit, __outlast), __s,
		encoding_error::ok);
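
The two added lines write the converted null character to the output before returning, which appears to be the fix named in the commit title. The same pitfall exists with the standard `std::mbrtowc`, which returns 0 when it converts a null character. The sketch below assumes that function (the diff does not show which conversion call this hunk wraps), uses a hypothetical name `sketch_widen`, and assumes an encoding in which the null character occupies exactly one input byte.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <string_view>

// Widens a byte buffer that may contain embedded null characters.
// Returns false on an invalid or truncated multibyte sequence.
inline bool sketch_widen(std::string_view input, std::wstring& output) {
	std::mbstate_t state {};
	std::size_t offset = 0;
	while (offset < input.size()) {
		wchar_t wide {};
		const std::size_t result
		     = std::mbrtowc(&wide, input.data() + offset, input.size() - offset, &state);
		if (result == static_cast<std::size_t>(-1) || result == static_cast<std::size_t>(-2)) {
			return false;
		}
		// Written for every successful call, including result == 0:
		// skipping this for 0 is how a null character goes missing.
		output.push_back(wide);
		// A return of 0 still consumed one input byte (the null itself).
		offset += (result == 0 ? 1 : result);
	}
	return true;
}
```

In the default "C" locale, the three bytes `a`, `\0`, `b` widen to three wide characters with the null preserved in the middle.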
3 changes: 2 additions & 1 deletion include/ztd/text/utf32.hpp
@@ -173,7 +173,8 @@ namespace ztd { namespace text {
__init = __detail::__next(__init);

if constexpr (__validate_code_units && __call_error_handler) {
	if (__unit > __detail::__last_code_point || __detail::__is_surrogate(__unit)) {
	if (static_cast<char32_t>(__unit) > __detail::__last_code_point
		|| __detail::__is_surrogate(static_cast<char32_t>(__unit))) {
		__self_t __self {};
		return __error_handler(__self,
			_Result(
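
The new condition casts the code unit to `char32_t` before both checks. A standalone sketch of that validation follows; the names `sketch_last_code_point`, `sketch_is_surrogate`, and `sketch_is_scalar_value` are hypothetical, and the stated reason for the cast (keeping the comparison in one unsigned 32-bit type when the code unit type is signed) is an assumption, since the commit message only says the code compiles better on GCC.

```cpp
// Largest Unicode code point.
inline constexpr char32_t sketch_last_code_point = 0x10FFFF;

// UTF-16 surrogates occupy U+D800 through U+DFFF and are not scalar values.
constexpr bool sketch_is_surrogate(char32_t value) noexcept {
	return value >= 0xD800 && value <= 0xDFFF;
}

// Accepts any integral code unit type. Casting first means a signed unit
// is compared as an unsigned 32-bit value, so negative inputs land above
// the last code point and are rejected.
template <typename CodeUnit>
constexpr bool sketch_is_scalar_value(CodeUnit unit) noexcept {
	const char32_t value = static_cast<char32_t>(unit);
	return value <= sketch_last_code_point && !sketch_is_surrogate(value);
}

static_assert(sketch_is_scalar_value(U'A'), "ASCII is a valid scalar value");
static_assert(!sketch_is_scalar_value(char32_t { 0xD800 }), "surrogates are rejected");
```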
2 changes: 1 addition & 1 deletion include/ztd/text/version.hpp
@@ -467,7 +467,7 @@
	#define ZTD_TEXT_COMPILE_TIME_WIDE_ENCODING_NAME_I_ ZTD_TEXT_DEFAULT_ON
#elif (WCHAR_MAX > 0x001FFFFF) && ZTD_TEXT_IS_ON(ZTD_TEXT_WCHAR_T_UTF32_COMPATIBLE_I_)
	#define ZTD_TEXT_COMPILE_TIME_WIDE_ENCODING_NAME_GET_I_() "UTF-32"
	#define ZTD_TEXT_COMPILE_TIME_WIDE_ENCODING_NAME_I_ ZTD_TEXT_DEFAULT_OFF
	#define ZTD_TEXT_COMPILE_TIME_WIDE_ENCODING_NAME_I_ ZTD_TEXT_DEFAULT_ON
#else
	#define ZTD_TEXT_COMPILE_TIME_WIDE_ENCODING_NAME_GET_I_() "UTF-32"
	#define ZTD_TEXT_COMPILE_TIME_WIDE_ENCODING_NAME_I_ ZTD_TEXT_DEFAULT_OFF
42 changes: 22 additions & 20 deletions include/ztd/text/wide_execution.hpp
@@ -118,7 +118,7 @@ namespace ztd { namespace text {
// different states, optionally...
__wide_encode_state() noexcept : __wide_state(), __narrow_state() {
	wchar_t __ghost_space[2];
	::std::size_t __init_result = ::std::mbrtowc(__ghost_space, "", 1, &__narrow_state);
	::std::size_t __init_result = ::std::mbrtowc(__ghost_space, "", 1, &__wide_state);
	// make sure it is initialized
	ZTD_TEXT_ASSERT_I_(__init_result == 0 && __ghost_space[0] == L'\0');
	ZTD_TEXT_ASSERT_I_(::std::mbsinit(&__wide_state) != 0);
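
The constructor converts an empty string to put a `std::mbstate_t` into a known state, then asserts that `std::mbsinit` agrees. The same check in isolation, using only standard library calls (`sketch_state_starts_initial` is a hypothetical name):

```cpp
#include <cstddef>
#include <cwchar>

// Returns true when a zero-initialized std::mbstate_t is reported as the
// initial conversion state after converting an empty string.
inline bool sketch_state_starts_initial() {
	std::mbstate_t state {};
	wchar_t ghost[2] {};
	// Converting "" consumes the terminating null, so the call returns 0
	// and stores L'\0' in ghost[0].
	const std::size_t result = std::mbrtowc(ghost, "", 1, &state);
	return result == 0 && ghost[0] == L'\0' && std::mbsinit(&state) != 0;
}
```

The state passed to the conversion call and the state passed to `std::mbsinit` must be the same object for the check to mean anything, which is the mismatch the one-line change in this hunk addresses.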
@@ -400,7 +400,7 @@ namespace ztd { namespace text {
const code_unit& __unit = __units[__units_count];
++__units_count;
__init = __detail::__next(__init);
#ifdef _MSC_VER
#if ZTD_TEXT_IS_ON(ZTD_TEXT_LIBVCXX_I_)
::std::size_t __res;
errno_t __err = wcrtomb_s(::std::addressof(__res), __pray_for_state, __state_max, __unit,
	::std::addressof(__s.__wide_state));
@@ -436,26 +436,28 @@ namespace ztd { namespace text {
			::ztd::text::span<code_unit>(::std::addressof(__units[0]), __units_count));
	}
}
else if (__res == 0 && ::std::mbsinit(::std::addressof(__s.__wide_state)) == 0) {
	// mixed conversion potential?!
	// technically, not standard behavior, but I don't really care?
	// Mr. Steve Downey points out I'm slightly right
	// about my assumption here: C has an open DR for this
	// (DR 488, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2059.htm#dr_488)
	// Another DR, DR 498 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2059.htm#dr_498)
	// addresses thread safety issues, both should be
	// solved if this is to be anywhere near usable
	if constexpr (__call_error_handler) {
		if (__init == __inlast) {
			wide_execution __self {};
			return __error_handler(__self,
				_Result(::std::forward<_InputRange>(__input),
					::std::forward<_OutputRange>(__output), __s,
					encoding_error::incomplete_sequence),
				::ztd::text::span<code_unit>(::std::addressof(__units[0]), __units_count));
else if (__res == 0) {
	if (::std::mbsinit(::std::addressof(__s.__wide_state)) == 0) {
		// mixed conversion potential?!
		// technically, not standard behavior, but I don't really care?
		// Mr. Steve Downey points out I'm slightly right
		// about my assumption here: C has an open DR for this
		// (DR 488, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2059.htm#dr_488)
		// Another DR, DR 498 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2059.htm#dr_498)
		// addresses thread safety issues, both should be
		// solved if this is to be anywhere near usable
		if constexpr (__call_error_handler) {
			if (__init == __inlast) {
				wide_execution __self {};
				return __error_handler(__self,
					_Result(::std::forward<_InputRange>(__input),
						::std::forward<_OutputRange>(__output), __s,
						encoding_error::incomplete_sequence),
					::ztd::text::span<code_unit>(::std::addressof(__units[0]), __units_count));
			}
		}
		continue;
	}
	continue;
}

__state_count += __res;
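
This hunk sits inside a loop that feeds one wide code unit at a time to a `wcrtomb`-style call and accumulates the bytes it produces. A stripped-down sketch of that loop with the standard `std::wcrtomb` is below. It is a simplification under stated assumptions: `sketch_narrow` is a hypothetical name, the `__res == 0` branch and the error-handler plumbing from the diff are left out, and the only failure it reports is the `(size_t)-1` return.

```cpp
#include <climits>
#include <cstddef>
#include <cwchar>
#include <optional>
#include <string>
#include <string_view>

// Narrows a wide string one code unit at a time with std::wcrtomb.
// Returns std::nullopt if a unit cannot be represented in the current locale.
inline std::optional<std::string> sketch_narrow(std::wstring_view input) {
	std::mbstate_t state {};
	std::string output;
	char buffer[MB_LEN_MAX]; // large enough for any single conversion
	for (const wchar_t unit : input) {
		const std::size_t written = std::wcrtomb(buffer, unit, &state);
		if (written == static_cast<std::size_t>(-1)) {
			return std::nullopt; // encoding error; errno is set to EILSEQ
		}
		output.append(buffer, written);
	}
	return output;
}
```

In the default "C" locale, ASCII input such as `L"hi"` round-trips to `"hi"`; what happens to non-ASCII input depends on the locale in effect.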