Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserialzed string has incorrect length #5401

Closed
ejoebstl opened this issue Nov 27, 2018 · 7 comments
Closed

Deserialzed string has incorrect length #5401

ejoebstl opened this issue Nov 27, 2018 · 7 comments
Assignees

Comments

@ejoebstl
Copy link

ejoebstl commented Nov 27, 2018

Environment
Version: libprotoc 3.6.1
Language: C++ 17
OS: Linux archlinux 4.18.9-arch1-1-ARCH
Compiler: g++ (GCC) 8.2.1 20180831

What did you do?
Using the following proto layout:

syntax = "proto2";
package ast_loader;

message StringHolder {
   optional string Text = 1;
   optional int32 A = 2 [default = 0];
   optional int32 B = 3 [default = 0];
}

Executing the following minimal example:

#include "proto/data_format.pb.h"
#include <fstream>
#include <iostream>

int main() {

    std::ifstream stream("data.out", std::ios::binary);
    ast_loader::StringHolder data;
    data.ParseFromIstream(&stream);
    stream.close();

    std::cout << data.text() << std::endl;
}

On the following example data:

 0A-04-54-65-73-74-10-78-18-78-0a

Which is Text = "Test", A = 120, B = 120.

What did you expect to see

Test

What did you see instead?

Testoadere�c7�7V1P�7V|�u�7V

The string length was set to 48.

PrintDebugString() seems to give an appropriate output:

Text: "Test"
A: 120
B: 120
@acozzette
Copy link
Member

@ejoebstl Have you checked that ParseFromIstream() is returning true? It could be returning false to indicate an error.

@ejoebstl
Copy link
Author

@acozzette Thanks for the input. Yes, it is returning true.

This is such a minimal example, I'm puzzled why it goes wrong in the first place.

@acozzette
Copy link
Member

That is strange, I just tried out your example and couldn't reproduce the problem. When I run it I just get Test as expected. Could you perhaps try running your example in a memory checker like ASAN or Valgrind and see if that reports any problems?

@ejoebstl
Copy link
Author

When running with Valgrind, I can see a memory error when accessing string::size():

==6292== Invalid read of size 8
==6292==    at 0x1220A4: std::string::size() const (basic_string.h:3827)
==6292==    by 0x11DEBC: main (main.cpp:51)
==6292==  Address 0x1071f9c8 is 8 bytes before a block of size 32 alloc'd
==6292==    at 0x4837DEF: operator new(unsigned long) (vg_replace_malloc.c:334)
==6292==    by 0x127F4A: google::protobuf::internal::ArenaStringPtr::CreateInstanceNoArena(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*) (arenastring.h:377)
==6292==    by 0x127B49: google::protobuf::internal::ArenaStringPtr::MutableNoArena(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*) (arenastring.h:286)
==6292==    by 0x1287BE: ast_loader::StringHolder::mutable_text[abi:cxx11]() (data_format.pb.h:259)
==6292==    by 0x126661: ast_loader::StringHolder::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) (data_format.pb.cc:210)
==6292==    by 0x586F17C: google::protobuf::MessageLite::ParseFromCodedStream(google::protobuf::io::CodedInputStream*) (in /usr/lib/libprotobuf.so.17.0.0)
==6292==    by 0x586F2F4: google::protobuf::MessageLite::ParseFromZeroCopyStream(google::protobuf::io::ZeroCopyInputStream*) (in /usr/lib/libprotobuf.so.17.0.0)
==6292==    by 0x591A8BB: google::protobuf::Message::ParseFromIstream(std::istream*) (in /usr/lib/libprotobuf.so.17.0.0)
==6292==    by 0x11DE61: main (main.cpp:47)

@acozzette
Copy link
Member

@ejoebstl Oh, I suspect it's an ABI issue. I believe the std::string ABI has changed between C++11 and C++17, and I'm guessing the problem is coming from mixing C++11-compiled code (i.e. libprotobuf) and C++17-compiled code (your main function and maybe data_format.pb.cc). To solve that problem you just have to pick one C++ version and use it consistently. The protobuf build uses C++11 by default, but you could easily tweak the flags to build it for C++17 instead.

@ejoebstl
Copy link
Author

Thank you a lot for your help so far.

I still face the issue when using C++11. Since you can not reproduce the problem, I'm assuming there is something wrong with my setup or the build environment I am using. I will investigate and come back as soon as I find the issue.

@ejoebstl
Copy link
Author

ejoebstl commented Nov 29, 2018

I had a reference to libtorch in my CMakeLists, which is linked to a different protobuf version. Removing the reference to libtorch and re-building solved the issue.

It might be important to note that libprotobuf's include and library paths were correctly set to my local protobuf all the time.

What puzzels me is that there was no compiler or linker error, which I would have expected on a library/symbol mismatch.

I'm still not sure why this happened, but now that we know it's not a problem of libprotobuf, I'm going to close this and discuss a solution with the developers of pytorch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants