Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

truncation of constant value in to_cbor() #1286

Closed
oktonion opened this issue Oct 9, 2018 · 14 comments
Closed

truncation of constant value in to_cbor() #1286

oktonion opened this issue Oct 9, 2018 · 14 comments
Assignees
Labels
aspect: binary formats BSON, CBOR, MessagePack, UBJSON release item: 🐛 bug fix solution: proposed fix a fix for the issue has been proposed and waits for confirmation
Milestone

Comments

@oktonion
Copy link

oktonion commented Oct 9, 2018

Although documentation for static void to_cbor(const basic_json& j, detail::output_adapter<char> o) says nothing I assumed that it has some purpose but f.e. following sample:

	  ::nlohmann::json jj;
	  std::vector<char> test_cbor_data;
	  ::nlohmann::json::to_cbor(jj, test_cbor_data);

generates lots of warnings like

Warning C4309 'static_cast': truncation of constant value testapp json.hpp 7731

in class binary_writer
at lines like:

oa->write_character(static_cast<CharType>(0xF6));

or

oa->write_character(j.m_value.boolean
? static_cast<CharType>(0xF5)
: static_cast<CharType>(0xF4));
break;

Basically the cause for that is static_cast<CharType>(/*some integer value > 127*/) with CharType = char leads to loss of integer value (sizeof(char) is 1 byte and it is signed, so value range is [-128;127] and it can't fit integer number > 127).

I don't know if it is an issue but that code smells.

@nlohmann
Copy link
Owner

nlohmann commented Oct 9, 2018

(The function's documentation should be similar to this one.)

@oktonion
Copy link
Author

oktonion commented Oct 9, 2018

@nlohmann Ok, so it is not dead code. Then there is a problem. If user calls binary_writer<char>::to_something directly or indirectly like in json::to_cbor then integer values over 127 are being lost.

@nlohmann nlohmann added the aspect: binary formats BSON, CBOR, MessagePack, UBJSON label Oct 9, 2018
@nlohmann
Copy link
Owner

What do you think about using the approach of https://stackoverflow.com/a/15172304/266378 and replace all the static_cast<CharType> calls by calls to convert:

CharType convert(std::uint8_t x)
{
    return *reinterpret_cast<CharType *>(&x);
}

?

@nlohmann nlohmann added state: please discuss please discuss the issue or vote for your favorite option state: help needed the issue needs help to proceed labels Oct 17, 2018
@oktonion
Copy link
Author

oktonion commented Oct 17, 2018

@nlohmann what about CharType != signed/regular/unsigned char? reinterpret_cast to char* and following dereference of char* for any pointer is safe by standard. But reinterpret_cast of pointer to some CharType = UserDefiendType? The cast itself is UB.

{
	int i;
	int *ip = &i;
	char *cp = reinterpret_cast<char*>(ip); // safe
	*cp; // safe

	struct MyUserType{};

	MyUserType mut;
	MyUserType *mutp = &mut;
	cp = reinterpret_cast<char*>(mutp); // safe
	*cp; // safe

	// DANGER (undefined behaviour):
	mutp = reinterpret_cast<MyUserType*>(cp); // UB
	*mutp; // UB
	mutp = reinterpret_cast<MyUserType*>(ip); // UB
	*mutp; // UB
}

So the function should be like:

std::enable_if<
	std::is_same<CharType, char>::value ||
	std::is_same<CharType, unsigned char>::value ||
	std::is_same<CharType, signed char>::value, 
	CharType
>::type convert(std::uint8_t x)
{
    return *reinterpret_cast<CharType *>(&x);
}

or restrict CharType to be only one of

char, unsigned char, signed char

in the class itself.

@nlohmann
Copy link
Owner

For the input adapters, we require that CharType is integral and has a sizeof 1. Does this answer your question?

@oktonion
Copy link
Author

For the input adapters, we require that CharType is integral and has a sizeof 1. Does this answer your question?

C++03 5.2.10.7:
A pointer to an object can be explicitly converted to a pointer to an object of different type.
Except that converting an rvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.

C++03 3.10.15:
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined
— the dynamic type of the object,
— a cv-qualified version of the dynamic type of the object,
— a type similar (as defined in 4.4) to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
— a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
— a char or unsigned char type.

So in general cast to any pointer except of char*/unsigned char* with dereferencing will cause UB.

@nlohmann
Copy link
Owner

Hm. What do you propose?

@oktonion
Copy link
Author

oktonion commented Oct 17, 2018

This seems like the solution tho:

std::enable_if<
    std::is_signed<CharType>::value && std::is_signed<char*>::value, 
    CharType 
>::type convert(std::uint8_t x) nothrow
{
    return *reinterpret_cast<char*>(&x);
}

std::enable_if<
    std::is_signed<CharType>::value && std::is_unsigned<char*>::value, 
    CharType 
>::type convert(std::uint8_t x) nothrow
{
    using namespace std;
    static_assert(sizeof(std::uint8_t) == sizeof(CharType), "size of CharType must be equal to std::uint8_t");
    static_assert(is_pod<CharType>::value, "CharType must be POD");
    CharType result;
    memcpy(&result, &x, sizeof(x));
    return result;
}

std::enable_if<
    std::is_unsigned<CharType>::value, 
    CharType 
>::type convert(std::uint8_t x) nothrow
{
    return x;
}

may be add some constexpr.

@nlohmann
Copy link
Owner

I had to fiddle a bit to make it compile:

template < typename C = CharType,
enable_if_t < std::is_signed<C>::value and std::is_signed<char>::value > * = nullptr >
static constexpr CharType convert(std::uint8_t x) noexcept
{
    return *reinterpret_cast<char*>(&x);
}

template < typename C = CharType,
enable_if_t < std::is_signed<C>::value and std::is_unsigned<char>::value > * = nullptr >
static CharType convert(std::uint8_t x) noexcept
{
    static_assert(sizeof(std::uint8_t) == sizeof(CharType), "size of CharType must be equal to std::uint8_t");
    static_assert(std::is_pod<CharType>::value, "CharType must be POD");
    CharType result;
    std::memcpy(&result, &x, sizeof(x));
    return result;
}

template<typename C = CharType,
enable_if_t<std::is_unsigned<C>::value>* = nullptr>
static constexpr CharType convert(std::uint8_t x) noexcept
{
    return x;
}

Is this what you mean?

@oktonion
Copy link
Author

oktonion commented Oct 19, 2018

Yeah, sorry for my code snippet, was writing on the fly.

  1. reinterpret_cast for signed char and signed CharType
  2. memcpy of bits for unsigned char and signed CharType
  3. regular convert for unsigned CharType

so

  • for signed-signed conversion will be just elementary copy with cast to signed char pointer type and dereferencing (which is ok by standard),

  • for signed-unsigned conversion will be exact copy of bits contained with memcpy (no choice there, sorry)

and

  • for unsigned-unsigned conversion will be just elementary copy

There is some issue that I do not know if you are taking in consideration in that project: bit count in byte.
std::uint8_t contains exactly 8 bits and will not be implemented on platforms with count of bits in byte > 8. In reality there are platforms with bit count in byte > 8 (16, 9, Windows CE with 16bits, etc.) so std::uint8_t is absent there.

@nlohmann
Copy link
Owner

@oktonion Could you have a look at branch https://github.com/nlohmann/json/compare/feature/convert_char?

For the non-8-bit-char platforms, I do not have an opinion right now.

@nlohmann nlohmann added solution: proposed fix a fix for the issue has been proposed and waits for confirmation and removed state: help needed the issue needs help to proceed state: please discuss please discuss the issue or vote for your favorite option labels Oct 23, 2018
@jaredgrubb
Copy link
Contributor

IMO, we should not worry about esoteric platforms that are not tested in our CI. We cannot verify that the unit tests even pass there, or even if they happen to, that we don't break it accidentally down the road. Platforms where "byte!=octet" (such that uint8_t is not present) are extremely rare and we should not pessimize our implementation in fear we might be breaking them when we don't even test them to start with.

@oktonion
Copy link
Author

@jaredgrubb Good point but as it states in the README of the projects "What if JSON was part of modern C++?", so I assume that if this library aims to be part of standard lib then it should work on every platform possible (even with > 8 bits in byte?).
Anyway I'm not starting some holly wars about this point. If library supports every platform possible by standard then using uint8_t with no other option is bad design. If it is not the case then some compiler error message like

#include <climits>
#if CHAR_BIT != 8
#error "JSON library is not implemented for this platform"
#endif

would be nice in my opinion.

@oktonion
Copy link
Author

@nlohmann also I want to thank you and the contributors for this excellent library. And I could say that it could be implemented in pure C++ 98 so I've already ported like 90% of codebase to old standard and it works like charm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aspect: binary formats BSON, CBOR, MessagePack, UBJSON release item: 🐛 bug fix solution: proposed fix a fix for the issue has been proposed and waits for confirmation
Projects
None yet
Development

No branches or pull requests

3 participants