CTF parser and CTF chunkdataset #3
Conversation
Please have a look at the comments. Let's go through the parser once you are back.
```
    const std::vector<CTFInputStreamInformation>& input_streams_info)
    : data_type(data_type), input_streams_info(input_streams_info) {}

bool operator==(const CTFDataset<DataType>& rhs) const {
```
Consider moving this function to the test file, as it is only used in the unit tests.
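A minimal sketch of that suggestion (the helper name is hypothetical, and the members `data_type` and `input_streams_info` are assumed accessible, per the diff above): keep `CTFDataset` itself free of test-only code and do the field-wise comparison in the test translation unit instead.

```
// In the unit-test file only: field-wise comparison of two CTFDataset
// instances, replacing the operator== defined on the class itself.
template <typename DataType>
bool ctf_datasets_equal(const CTFDataset<DataType>& lhs,
                        const CTFDataset<DataType>& rhs) {
  return lhs.data_type == rhs.data_type &&
         lhs.input_streams_info == rhs.input_streams_info;
}
```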
Summary: Currently there is a mismatch in naming between Python BatchNorm `running_var` and C++ BatchNorm `running_variance`, which causes JIT model parameter loading to fail (pytorch/vision#728 (comment)):

```
terminate called after throwing an instance of 'c10::Error'
  what(): No such serialized tensor 'running_variance' (read at /home/shahriar/Build/pytorch/torch/csrc/api/src/serialize/input-archive.cpp:27)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x85 (0x7f2d92d32f95 in /usr/local/lib/libc10.so)
frame #1: torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool) + 0xdeb (0x7f2d938551ab in /usr/local/lib/libtorch.so.1)
frame #2: torch::nn::Module::load(torch::serialize::InputArchive&) + 0x98 (0x7f2d9381cd08 in /usr/local/lib/libtorch.so.1)
frame #3: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #4: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #5: torch::nn::operator>>(torch::serialize::InputArchive&, std::shared_ptr<torch::nn::Module> const&) + 0x32 (0x7f2d9381c7b2 in /usr/local/lib/libtorch.so.1)
frame #6: <unknown function> + 0x2b16c (0x5645f4d1916c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #7: <unknown function> + 0x27a3c (0x5645f4d15a3c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #8: <unknown function> + 0x2165c (0x5645f4d0f65c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #9: <unknown function> + 0x1540b (0x5645f4d0340b in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #10: __libc_start_main + 0xf3 (0x7f2d051dd223 in /usr/lib/libc.so.6)
frame #11: <unknown function> + 0x1381e (0x5645f4d0181e in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
```

Renaming C++ BatchNorm `running_variance` to `running_var` should fix this problem. This is a BC-breaking change, but it should be easy for end users to rename `running_variance` to `running_var` in their call sites.

Pull Request resolved: pytorch#17371
Reviewed By: goldsborough
Differential Revision: D14172775
Pulled By: yf225
fbshipit-source-id: b9d3729ec79272a8084269756f28a8f7c4dd16b6
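On the user side, the migration is just the buffer rename. A minimal sketch, assuming the stateful `torch::nn::BatchNorm` module from the C++ frontend of that era (the construction here is illustrative, not taken from the PR):

```
#include <torch/torch.h>

int main() {
  // Illustrative: a stateful BatchNorm tracks running statistics.
  torch::nn::BatchNorm bn(torch::nn::BatchNormOptions(64).stateful(true));

  // Before pytorch#17371: bn->running_variance
  // After pytorch#17371:  bn->running_var, matching the Python buffer name,
  // so models serialized from Python load correctly via InputArchive.
  torch::Tensor var = bn->running_var;
  (void)var;
}
```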
…fc3ff6 (pytorch#18028)

Summary: Pull Request resolved: pytorch#18028

Previous import was 520e8e135f1ad75959bf9b5bd15c361b8caeb8d6

Included changes:
- **[d1f45b1](houseroad/foxi@d1f45b1)**: update the gitignore (#6) <Lu Fang>
- **[398135c](houseroad/foxi@398135c)**: Remove static variable in header (#3) <Lu Fang>
- **[f817be1](houseroad/foxi@f817be1)**: sync to ONNX cb544d07cc022e3fe83622fda9b2b1fa00b75b89 (#2) <Lu Fang>

Reviewed By: zrphercule
Differential Revision: D14464213
fbshipit-source-id: b5d166f05f7fd503dec11d676e219cc6c6a373f9
Summary: Tracing models which attempt to return this in-place value doesn't turn out well. I haven't run any tests to confirm the results, to be honest, but regardless of the outcome, the operation happens in-place, so it should work as before. Sample output from a traced model attempting to set `max_norm` on `Embedding`:

```
a leaf Variable that requires grad has been used in an in-place operation. (check_inplace at /pytorch/torch/csrc/autograd/VariableTypeUtils.h:49)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f0ecc5cc021 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f0ecc5cb8ea in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x38ab2f (0x7f0ecb55ab2f in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #3: torch::autograd::VariableType::embedding_renorm_(at::Tensor&, at::Tensor const&, double, double) const + 0x76 (0x7f0ecb5b5966 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #4: <unknown function> + 0x56c958 (0x7f0ecb73c958 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x672286 (0x7f0ecb842286 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #6: torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x22 (0x7f0ecb83d842 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #7: <unknown function> + 0x65c6ac (0x7f0ecb82c6ac in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #8: <unknown function> + 0x3c8ab4 (0x7f0f06bc0ab4 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x3ad2c3 (0x7f0f06ba52c3 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x11663e (0x7f0f0690e63e in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #39: python_call + 0x11 (0x5563c3c521c1 in uwsgi)
frame #40: uwsgi_request_wsgi + 0x100 (0x5563c3c54410 in uwsgi)
frame #41: wsgi_req_recv + 0xac (0x5563c3becabc in uwsgi)
frame #42: simple_loop_run + 0xc4 (0x5563c3c35be4 in uwsgi)
frame #43: simple_loop + 0x10 (0x5563c3c35a00 in uwsgi)
frame #44: uwsgi_ignition + 0x241 (0x5563c3c3a3a1 in uwsgi)
frame #45: uwsgi_worker_run + 0x275 (0x5563c3c3ec35 in uwsgi)
frame #46: <unknown function> + 0x8f22c (0x5563c3c3f22c in uwsgi)
frame #47: <unknown function> + 0x3c13e (0x5563c3bec13e in uwsgi)
frame #48: __libc_start_main + 0xf1 (0x7f0f138922e1 in /lib/x86_64-linux-gnu/libc.so.6)
frame #49: _start + 0x2a (0x5563c3bec16a in uwsgi)
:
operation failed in interpreter:
op_version_set = 0
def forward(self, input_1: Tensor) -> Tensor:
  _0 = torch.norm(self.item_embedding.weight, 2, 1, True)
  _1 = torch.div(self.item_embedding.weight, _0)
  m_weight = torch.t(_1)
  input_2 = torch.contiguous(input_1)
  weight_1 = torch.embedding_renorm_(self.item_embedding.weight, input_2, 1., 2.)
             ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  x = torch.embedding(weight_1, input_2, -1, False, False)
  input_3 = torch.div(x, torch.norm(x, 2, 2, True))
  max_batch_size = ops.prim.NumToTensor(torch.size(input_3, 0))
  hx = torch.zeros([2, int(max_batch_size), 70], dtype=6, layout=0, device=torch.device("cpu"))
  _2 = [self.lstm_layer.weight_ih_l0, self.lstm_layer.weight_hh_l0, self.lstm_layer.weight_ih_l1, self.lstm_layer.weight_hh_l1]
  input_4, _3, _4 = torch.lstm(input_3, [hx, hx], _2, False, 2, 0.10000000000000001, False, False, True)
  input = torch.matmul(input_4, torch.t(self.rnn2item.weight))
  tastevec = torch.div(input, torch.norm(input, 2, 2, True))
  outputs = torch.matmul(tastevec, m_weight)
```

Pull Request resolved: pytorch#18684
Differential Revision: D14782041
Pulled By: ezyang
fbshipit-source-id: 7b2fc19b7d5b6600263644498bb728319a19f39d
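The autograd restriction behind that error can be reproduced without tracing at all: any in-place op on a leaf tensor that requires grad throws, which is what `check_inplace` triggers for `embedding_renorm_` on the embedding weight. A minimal sketch, illustrative rather than taken from the PR:

```
#include <torch/torch.h>
#include <iostream>

int main() {
  // A leaf tensor with requires_grad=true, like Embedding's weight.
  torch::Tensor weight = torch::randn({10, 4}, torch::requires_grad());

  try {
    weight.mul_(0.5);  // in-place op on a grad-requiring leaf: throws
  } catch (const c10::Error& e) {
    // Message along the lines of "a leaf Variable that requires grad
    // has been used in an in-place operation." (wording varies by version)
    std::cout << e.what() << std::endl;
  }
}
```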
Summary: We have encountered a `std::bad_cast` error when running a PyTorch binary built with the cxx11 ABI on CentOS 7. Stack trace:

```
#0 0x00007fec10160207 in raise () from /lib64/libc.so.6
#1 0x00007fec101618f8 in abort () from /lib64/libc.so.6
#2 0x00007fec015767d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00007fec01574746 in ?? () from /lib64/libstdc++.so.6
#4 0x00007fec01574773 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00007fec01574993 in __cxa_throw () from /lib64/libstdc++.so.6
#6 0x00007fec015c94d2 in std::__throw_bad_cast() () from /lib64/libstdc++.so.6
#7 0x00007feb2ab3c2d7 in std::__cxx11::numpunct<char> const& std::use_facet<std::__cxx11::numpunct<char> >(std::locale const&) () from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
#8 0x00007feb28643d62 in torch::jit::script::strtod_c(char const*, char**) () from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
```

We suspect this line gets compiled to a gcc-ABI-dependent symbol:

```
char decimal_point = std::use_facet<std::numpunct<char>>(std::locale()).decimal_point();
```

Pull Request resolved: pytorch#21293
Differential Revision: D15609910
Pulled By: bddppq
fbshipit-source-id: e247059729863868e4b36d6fec4fcbc36fbc4bb1
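One way to sidestep the facet lookup is to read the decimal point through the C locale API, which avoids instantiating the `std::__cxx11`-mangled facet symbol entirely. A sketch of that idea (not necessarily the exact fix that landed in pytorch#21293; the helper name is hypothetical):

```
#include <clocale>

// Returns the current locale's decimal point without calling
// std::use_facet<std::numpunct<char>>, whose mangled symbol differs
// between the old and the cxx11 libstdc++ ABI.
inline char decimal_point_char() {
  std::lconv* lc = std::localeconv();
  return (lc != nullptr && lc->decimal_point != nullptr &&
          lc->decimal_point[0] != '\0')
             ? lc->decimal_point[0]
             : '.';
}
```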
Due to differences between the latest dataloader.cpp and the one at jaliyae/master, I had to hand-pick the patches.
I have kept only the code relevant to CTF parsing and chunking, removing improvements/fixes from the chunk feature implementation.