New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate Arrow-related code to incorporate IPC code improvements since 0.4.x, simplifications and perf improvements #65
Conversation
wesm
commented
Sep 3, 2017
•
edited
edited
- Use up-to-date Arrow IPC serialization / read APIs
- Use vector appends with Arrow builders (cf ARROW-1468)
- Avoid extra full result set memory copy in multithreaded fetches
- Use library calls for pretty printing (cf ARROW-1396)
Thanks, the API simplifications look very nice. Regarding exceptions, it all depends on their nature. If it's running out of memory for example, you can throw |
OK, I'll add a function to create different exception types based on the type of error status |
3f8b5cd
to
a850a24
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About all for tonight -- I need to do some do some debugging to get the test suite passing again. The next step beyond this is to make a bridge between MapD's CudaMgr and Arrow's GPU memory management so that the IPC code can write into device memory owned by MapD in a clean way: https://issues.apache.org/jira/browse/ARROW-1423.
Once I finish all this some additional testing from folks would be helpful to ensure I haven't broken anything, or instructions how I can test
As a last matter (probably in a follow up patch), I'm going to add some simple APIs for dealing with SysV and POSIX shared memory segments: https://issues.apache.org/jira/browse/ARROW-1385
@@ -68,36 +69,36 @@ std::vector<TargetValue> ArrowResultSet::getNextRow(const bool translate_strings | |||
const auto& column_typeinfo = getColType(i); | |||
switch (column_typeinfo.get_type()) { | |||
case kSMALLINT: { | |||
CHECK(dynamic_cast<const arrow::NumericArray<arrow::Int16Type>*>(&column)); | |||
const auto& i16_column = static_cast<const arrow::NumericArray<arrow::Int16Type>&>(column); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not recommend using arrow::NumericArray
-- use the typedefs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I didn't notice them when I wrote this.
QueryEngine/ResultSetConversion.cpp
Outdated
CHECK(is_valid.get()); | ||
ARROW_THROW_NOT_OK(typed_builder->Append(vals.data(), length, *is_valid)); | ||
} else { | ||
ARROW_THROW_NOT_OK(typed_builder->Append(vals.data(), length)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vector appends
@wesm Thanks, I'll have a look tomorrow. For testing, have a look at https://github.com/mapd/mapd-core/blob/master/run_sanity_tests. You can run it as is, but you only need to run |
Yep, I found that. I'll step around gdb and fix the bugs I introduced, probably tomorrow later in the day |
QueryEngine/ArrowUtil.h
Outdated
#define ARROW_THROW_NOT_OK(s) \ | ||
do { \ | ||
::arrow::Status _s = (s); \ | ||
if (ARROW_PREDICT_FALSE(!_s.ok())) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you can use UNLIKELY
from Shared/likely.h
. I guess everyone has a version of this macro somewhere in the codebase.
Do the tests pass now? |
Not yet, I am going to look at them hopefully before I go to bed tonight |
I think it would be best to defer improvements re: direct writes to GPU device memory (i.e. https://github.com/apache/arrow/blob/master/cpp/src/arrow/gpu/cuda_arrow_ipc.h#L50) to a subsequent patch. I also will need to update the toolchain files with an updated Arrow version and build recipe. 0.7.0 will probably be released next week roughly so you shouldn't have to be on an unreleased version for very long |
I'm out of gas for tonight. I have a few free hours tomorrow morning (Eastern Time) so I'll look at this first thing and and get the issues worked out |
@asuhan I'm finding when I build with gcc 4.9 that I get errors like
When I build with clang-3.8 instead, the tests work, but gdb doesn't work very well -- I'm able to get break points to work but inspecting variables seems not to work. I don't have any issues with gdb elsewhere, so it seems this is something MapD-specific. I can debug with print statements but gdb would really be better |
I created a |
Looking at this again later this evening |
OK, I'll have a look myself tomorrow if you don't get to the bottom of it. FYI, I've seen a similar type of crash during my initial development of the test and it was due to buffer ownership problems. Holding the |
Hm, interesting. Perhaps a pointer is being leaked somewhere. I will let you know if I can't sort it out As an aside, it would also be better to use |
5881226
to
3f615c0
Compare
OK, tests are passing again. I summarized the work in the commit message. Some follow up JIRAs When you use |
…to 0.7.x * Avoid extra full result set memory copy in multithreaded fetches * Use vector appends with Arrow builders (cf ARROW-1468) * Use up-to-date Arrow IPC serialization / read APIs * Fix memory ownership issues with RecordBatch::column (returns a new object now) * Use library calls for pretty printing (cf ARROW-1396)
3f615c0
to
b6c7fce
Compare
I have not made use of any of the direct-to-CUDA serialization code in Arrow yet. I think it would be better to scrutinize that patch more carefully in isolation |
I agree we should leave the CUDA parts for later. We should also remove the dependency on the Arrow GPU, The patch looks good otherwise, I'll pick it with the dependency on Arrow GPU removed for now. Thanks! |
Oops, sorry about that (didn't test the toolchain from scratch). Thanks! |
…-dbe-install remove editable dbe install by default