Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should support uint8_t for std::vector and std::array #243

Closed
jpetso opened this issue Mar 17, 2015 · 7 comments
Closed

Should support uint8_t for std::vector and std::array #243

jpetso opened this issue Mar 17, 2015 · 7 comments

Comments

@jpetso
Copy link
Contributor

jpetso commented Mar 17, 2015

Release 1.0.0 supports adapters for std::vector<char> and std::array<char>, however binary data is often represented as uint8_t (or std::uint8_t in C++11). char is defined as implementation-specific signed or unsigned integer type - see http://en.wikipedia.org/wiki/C_data_types - so while it's a single byte pretty much everywhere, the specification doesn't actually guarantee this. uint8_t and int8_t are specified more clearly, and thus a better choice for binary data.

So while I think it's not much of a problem to keep the char-templated adapters in msgpack-c, I think the project should endorse uint8_t as a better default and ship adapters for that type as well. Since msgpack-c doesn't have to choose between types, I suggest also including one for int8_t so that all three major types are covered.

@redboltz
Copy link
Contributor

Thank you for sending the issue. I considered the issue.

First, I'd like to let you know my design choice. There are two reasons that I mapped std::vector and std::array to BIN. The first reason is that the containers located on continuous memory. The second reason is that char is used as representing byte stream in the C++(11) standard library. For example, http://www.cplusplus.com/reference/ostream/basic_ostream/write/ uses char* even if C++11.

The next topic is uint8_t. uint8_t is considered as an appropriate type to represent binary data.
http://stackoverflow.com/questions/21296997/is-it-better-to-use-uint8-t-or-char-to-represent-binary-data-arranged-in-bytes

I think that mapping vector and array of uint8_t to BIN is reasonable. There is minor problem. std::uint8_t is an alias type of uint8_t. It's ok. uint8_t might be an alias type of unsigned char. In this case, std::vector is mapped to BIN implicitly.
See the following example:
http://melpon.org/wandbox/permlink/lvmNdG67nCgoQMr3

When I provide two overloads like http://melpon.org/wandbox/permlink/MskKRI45qbfiMrmI , it is ill-formed.

To solve this problem, using SFINAE as follows?
http://melpon.org/wandbox/permlink/xl8mvcVwTpspvSEF

msgpack conversion from msgpack::object to a user type using operator>>.

inline msgpack::object const& operator>> (msgpack::object const& o, std::vector<char>& v)

So I need to write as follows:
http://melpon.org/wandbox/permlink/nMf89e6cVE0VmsFi

The final topic is user's expectation. As I wrote before, char is ok becase it is the same mannar as C++ standard library. uint8_t and unsigned char is also ok. It is implementation defined whether they are the same type or not. But if one is mapped to BIN, the other should be too.

C++ type(vector and array) msgpack type
char BIN
unsigned char BIN
uint8_t BIN
int8_t BIN or ARRAY
signed char BIN or ARRAY

How about int8_t and singed char? msgpack user might expect that they are mapped to ARRAY that have -128..127 values. To choose the mapping targets, we can use msgpack::type::raw_ref.

#include <msgpack.hpp>
#include <vector>
#include <sstream>

int main() {
    {
        std::vector<int8_t> v { -1, 1, 2, 3 };
        std::stringstream ss;
        msgpack::pack(ss, v);
        auto obj = msgpack::unpack(ss.str().data(), ss.str().size()).get();
        assert(obj.type == msgpack::type::ARRAY);
    }
    {
        std::vector<int8_t> v { -1, 1, 2, 3 };
        std::stringstream ss;
        msgpack::pack(ss, msgpack::type::raw_ref(reinterpret_cast<char*>(v.data()), v.size()));
        auto obj = msgpack::unpack(ss.str().data(), ss.str().size()).get();
        assert(obj.type == msgpack::type::BIN);
    }
}

[NOTE]Discussion about reinterpret_cast:
http://stackoverflow.com/questions/16260033/reinterpret-cast-between-char-and-stduint8-t-safe

It is difficult to judge which is appropriate behavior because users could have different expectation. However, if they are mapped to ARRAY, the users can choose ARRAY or BIN using raw_ref wrapper. I can say the same about uint8_t and unsigned char. I think that they are more strongly expected to map to BIN than int8_t and signed char.
Any ideas?

@jpetso
Copy link
Contributor Author

jpetso commented Mar 26, 2015

Kudos to your thoroughness in analyzing the different aspects of this issue. I agree with all of your assessments, and the SFINAE solution seems like a good one.

I wasn't aware of the raw_ref approach before, knowing about it definitely makes it easier to choose. I feel that BIN would be a good default for unsigned char and uint8_t, after reading your analysis I think that maybe the signed variants should remain as ARRAY for the reasons that you listed. Maybe an array_ref (analogous to raw_ref) could be provided for users who specifically want to store char/uchar/uint8_t as lists?

@redboltz
Copy link
Contributor

redboltz commented Apr 3, 2015

Sorry for the late reply. I had worked for #249.
I think that the array_ref is a good idea. I think that mapping uint8_t to BIN is reasonable too. On the C++11 the casting code should be as follows:

static_assert(std::is_same<std::uint8_t, unsigned char>::value || \
"This library requires std::uint8_t to be implemented as unsigned char.");

http://stackoverflow.com/questions/16260033/reinterpret-cast-between-char-and-stduint8-t-safe

We need to do the similar check on the C++03. What is the good way to implement that?

In the raw_ref, those kinds of casting problem is users responsibility ;)
raw_ref accepts 'const char *' as an argument. Users need to convert to 'const char *' in a safe way.

    raw_ref(const char* p, uint32_t s) : size(s), ptr(p) {}

@redboltz
Copy link
Contributor

redboltz commented Apr 3, 2015

We need to do the similar check on the C++03. What is the good way to implement that?

BOOST_STATIC_ASSERT is a good one.
http://www.boost.org/doc/libs/1_57_0/doc/html/boost_staticassert.html

But some of msgpack-c users don't want to external dependency. The implementation is pretty tricky https://github.com/boostorg/static_assert/blob/master/include/boost/static_assert.hpp

@jpetso
Copy link
Contributor Author

jpetso commented Aug 28, 2015

This approach appears compelling for a compact pre-C++11 static assert: http://stackoverflow.com/a/14621998

@redboltz
Copy link
Contributor

Thank you for the comment. It looks good for me. I will restart implementing the PR to solve the issue.

@redboltz
Copy link
Contributor

I just restarted implementing. And then I realized that all I need to do is implement std::vector<unsigned char> and std::array<unsigned char> specialization. I mean I don't need to implement std::vector<uint8_t> and std::array<uint8_t> specialization. Because uint8_t specialization should only be allowed if is_same<uint8_t, unsigned char>::value is true. In this case, uint8_t is a typedef of unsigned char. So std::vector<uint8_t> is dispatched to std::vector<unsigned char>. If uint8_t is the different type of unsigned char, then uint8_t is dispatched to std::vector<T>. That is expected behavior. So I don't need static_assert.

I will implement std::vector<unsigned char> and std::array<unsigned char> specialization, and array_ref adaptor.

redboltz added a commit to redboltz/msgpack-c that referenced this issue Aug 31, 2015
std::vector<unsigned char> and std::array<unsigned char> are mapped to BIN.
std::vector<uint8_t> and std::array<uint8_t> are mapped to BIN if uint8_t is the same type of unsigned char, otherwise mapped to ARRAY.

Added array_ref. When client wraps BIN mapped types above with array_ref as msgpack::type::array_ref<std::vector<char> >, the type is mapped to ARRAY.
redboltz added a commit to redboltz/msgpack-c that referenced this issue Aug 31, 2015
std::vector<unsigned char> and std::array<unsigned char> are mapped to BIN.
std::vector<uint8_t> and std::array<uint8_t> are mapped to BIN if uint8_t is the same type of unsigned char, otherwise mapped to ARRAY.

Added array_ref. When client wraps BIN mapped types above with array_ref as msgpack::type::array_ref<std::vector<char> >, the type is mapped to ARRAY.
nobu-k added a commit that referenced this issue Sep 4, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants