forked from apache/arrow
…#37525)

### Rationale for this change

In order to add an `arrow.tabular.Table` class to the MATLAB Interface, we first need to add a MATLAB class representing `arrow::ChunkedArray`s. This is required because an `arrow::Table` is backed by a vector of `arrow::ChunkedArray`s, and its `column(int index)` method returns an `arrow::ChunkedArray`.

### What changes are included in this PR?

1. Introduced a new class called `arrow.array.ChunkedArray`.
2. `arrow.array.ChunkedArray` has the following properties:
   1. `Type` - Data type of the `arrow.array.Array`s
   2. `Length` - Sum of the `arrow.array.Array` lengths
   3. `NumChunks` - Number of `arrow.array.Array`s
3. `arrow.array.ChunkedArray` has the following methods:
   1. `chunk(index)` - Returns the `arrow.array.Array` stored at the specified index
   2. `fromArrays(array1, array2, ..., arrayN, Type=type)` - Creates a `ChunkedArray` from the arrays provided. If `Type` is provided, all arrays are expected to have the specified `Type`.

**Example Usage**

```matlab
>> a1 = arrow.array(1:100);
>> a2 = arrow.array(101:250);
>> a3 = arrow.array(251:300);

% Create a ChunkedArray from 3 Float64Arrays
>> c = arrow.array.ChunkedArray.fromArrays(a1, a2, a3)

c = 

  ChunkedArray with properties:

         Type: [1×1 arrow.type.Float64Type]
    NumChunks: 3
       Length: 300

% Extract the first chunk and compare it to a1
>> c1 = c.chunk(1);
>> tf = isequal(c1, a1)

tf =

  logical

   1

% Create an empty ChunkedArray by providing the Type nv-pair
>> c = arrow.array.ChunkedArray.fromArrays(Type=arrow.timestamp())

c = 

  ChunkedArray with properties:

         Type: [1×1 arrow.type.TimestampType]
    NumChunks: 0
       Length: 0
```

### Are these changes tested?

Yes. I added a new test class called `tChunkedArray.m` that contains unit tests for the new class.

### Are there any user-facing changes?

Yes. Users can now create a `ChunkedArray` in the MATLAB Interface.

### Future Directions

1. In this PR, we deliberately didn't include a convenience constructor function because we're not sure if we want users to create `ChunkedArray`s themselves. We think users will mostly use `ChunkedArray` when extracting columns from `Table`s.
2. We will implement more methods on `ChunkedArray`, such as `flatten()` and `combineChunks()`.

* Closes: apache#37448

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
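To make the `Type`, `Length`, and `NumChunks` semantics concrete, here is a small, dependency-free C++ sketch. It is a hypothetical toy model (`ToyArray` and `ToyChunkedArray` are made-up names, not Arrow or MATLAB Interface APIs), and it assumes that when `Type` is omitted the type is taken from the first array, which is why an empty `fromArrays` call needs an explicit `Type`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an arrow.array.Array: a type name plus values.
struct ToyArray {
    std::string type;            // stands in for the Arrow DataType
    std::vector<double> values;
};

// Hypothetical toy model of the ChunkedArray described above.
// It is NOT the Arrow C++ or MATLAB Interface implementation.
class ToyChunkedArray {
public:
    // Mirrors fromArrays(array1, ..., arrayN, Type=type). Assumption: if the
    // type is omitted, it is inferred from the first array.
    static ToyChunkedArray fromArrays(std::vector<ToyArray> arrays,
                                      std::optional<std::string> type = std::nullopt) {
        if (!type) {
            if (arrays.empty()) {
                throw std::invalid_argument("Type must be provided when there are no arrays");
            }
            type = arrays.front().type;
        }
        // Every chunk must have the same type.
        for (const auto& array : arrays) {
            if (array.type != *type) {
                throw std::invalid_argument("All arrays must have the same type");
            }
        }
        return ToyChunkedArray(std::move(arrays), *type);
    }

    const std::string& type() const { return type_; }

    // NumChunks: the number of arrays.
    std::int32_t numChunks() const { return static_cast<std::int32_t>(chunks_.size()); }

    // Length: the sum of the chunk lengths.
    std::int64_t length() const {
        std::int64_t total = 0;
        for (const auto& chunk : chunks_) {
            total += static_cast<std::int64_t>(chunk.values.size());
        }
        return total;
    }

    // 0-based accessor; the assert documents the precondition (no check in release builds).
    const ToyArray& chunk(std::int32_t i) const {
        assert(i >= 0 && i < numChunks());
        return chunks_[static_cast<std::size_t>(i)];
    }

private:
    ToyChunkedArray(std::vector<ToyArray> chunks, std::string type)
        : chunks_(std::move(chunks)), type_(std::move(type)) {}

    std::vector<ToyArray> chunks_;
    std::string type_;
};
```

With chunks of 100, 150, and 50 values (as in the MATLAB example), `length()` is 300 and `numChunks()` is 3, while `fromArrays({}, std::string("timestamp"))` yields an empty chunked array with a type.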
Commit 59b0403 (1 parent: a38daf5)
Showing 25 changed files with 694 additions and 52 deletions.
matlab/src/cpp/arrow/matlab/array/proxy/chunked_array.cc (187 additions, 0 deletions)
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#include <sstream>

#include "arrow/util/utf8.h"

#include "arrow/matlab/array/proxy/chunked_array.h"
#include "arrow/matlab/array/proxy/array.h"
#include "arrow/matlab/error/error.h"
#include "arrow/matlab/type/proxy/wrap.h"
#include "arrow/matlab/array/proxy/wrap.h"

#include "libmexclass/proxy/ProxyManager.h"

namespace arrow::matlab::array::proxy {

    namespace {
        // Error returned when chunk() is called on a ChunkedArray with zero chunks.
        libmexclass::error::Error makeEmptyChunkedArrayError() {
            const std::string error_msg = "Numeric indexing using the chunk method is not supported for chunked arrays with zero chunks.";
            return libmexclass::error::Error{error::CHUNKED_ARRAY_NUMERIC_INDEX_WITH_EMPTY_CHUNKED_ARRAY, error_msg};
        }

        // Error returned when a 1-based MATLAB chunk index is outside [1, num_chunks].
        libmexclass::error::Error makeInvalidNumericIndexError(const int32_t matlab_index, const int32_t num_chunks) {
            std::stringstream error_message_stream;
            error_message_stream << "Invalid chunk index: ";
            error_message_stream << matlab_index;
            error_message_stream << ". Chunk index must be between 1 and the number of chunks (";
            error_message_stream << num_chunks;
            error_message_stream << ").";
            return libmexclass::error::Error{error::CHUNKED_ARRAY_INVALID_NUMERIC_CHUNK_INDEX, error_message_stream.str()};
        }
    } // namespace

    ChunkedArray::ChunkedArray(std::shared_ptr<arrow::ChunkedArray> chunked_array) : chunked_array{std::move(chunked_array)} {

        // Register Proxy methods.
        REGISTER_METHOD(ChunkedArray, getLength);
        REGISTER_METHOD(ChunkedArray, getNumChunks);
        REGISTER_METHOD(ChunkedArray, getChunk);
        REGISTER_METHOD(ChunkedArray, getType);
        REGISTER_METHOD(ChunkedArray, isEqual);
    }

    libmexclass::proxy::MakeResult ChunkedArray::make(const libmexclass::proxy::FunctionArguments& constructor_arguments) {
        namespace mda = ::matlab::data;

        mda::StructArray opts = constructor_arguments[0];
        const mda::TypedArray<uint64_t> array_proxy_ids = opts[0]["ArrayProxyIDs"];
        const mda::TypedArray<uint64_t> type_proxy_id = opts[0]["TypeProxyID"];

        std::vector<std::shared_ptr<arrow::Array>> arrays;
        // Retrieve all of the Array Proxy instances from the libmexclass ProxyManager.
        for (const auto& array_proxy_id : array_proxy_ids) {
            auto proxy = libmexclass::proxy::ProxyManager::getProxy(array_proxy_id);
            auto array_proxy = std::static_pointer_cast<proxy::Array>(proxy);
            auto array = array_proxy->unwrap();
            arrays.push_back(array);
        }

        // Retrieve the Type Proxy and unwrap the underlying arrow::DataType.
        auto proxy = libmexclass::proxy::ProxyManager::getProxy(type_proxy_id[0]);
        auto type_proxy = std::static_pointer_cast<type::proxy::Type>(proxy);
        auto type = type_proxy->unwrap();

        // Fails if any of the arrays does not have the specified type.
        MATLAB_ASSIGN_OR_ERROR(auto chunked_array,
                               arrow::ChunkedArray::Make(arrays, type),
                               error::CHUNKED_ARRAY_MAKE_FAILED);

        return std::make_unique<proxy::ChunkedArray>(std::move(chunked_array));
    }

    std::shared_ptr<arrow::ChunkedArray> ChunkedArray::unwrap() {
        return chunked_array;
    }

    void ChunkedArray::getLength(libmexclass::proxy::method::Context& context) {
        namespace mda = ::matlab::data;
        mda::ArrayFactory factory;
        auto length_mda = factory.createScalar(chunked_array->length());
        context.outputs[0] = length_mda;
    }

    void ChunkedArray::getNumChunks(libmexclass::proxy::method::Context& context) {
        namespace mda = ::matlab::data;
        mda::ArrayFactory factory;
        auto num_chunks_mda = factory.createScalar(chunked_array->num_chunks());
        context.outputs[0] = num_chunks_mda;
    }

    void ChunkedArray::getChunk(libmexclass::proxy::method::Context& context) {
        namespace mda = ::matlab::data;
        mda::ArrayFactory factory;

        mda::StructArray args = context.inputs[0];
        const mda::TypedArray<int32_t> index_mda = args[0]["Index"];
        const auto matlab_index = int32_t(index_mda[0]);

        // Note: MATLAB uses 1-based indexing, so subtract 1.
        // arrow::ChunkedArray::chunk does not do any bounds checking,
        // so the index must be validated before calling it.
        const int32_t index = matlab_index - 1;
        const auto num_chunks = chunked_array->num_chunks();

        if (num_chunks == 0) {
            context.error = makeEmptyChunkedArrayError();
            return;
        }

        if (matlab_index < 1 || matlab_index > num_chunks) {
            context.error = makeInvalidNumericIndexError(matlab_index, num_chunks);
            return;
        }

        const auto array = chunked_array->chunk(index);
        MATLAB_ASSIGN_OR_ERROR_WITH_CONTEXT(auto array_proxy,
                                            arrow::matlab::array::proxy::wrap(array),
                                            context,
                                            error::UNKNOWN_PROXY_FOR_ARRAY_TYPE);

        // Register the new Array proxy and return its ID and type ID to MATLAB.
        const auto array_proxy_id = libmexclass::proxy::ProxyManager::manageProxy(array_proxy);
        const auto type_id = static_cast<int64_t>(array->type_id());

        mda::StructArray output = factory.createStructArray({1, 1}, {"ProxyID", "TypeID"});
        output[0]["ProxyID"] = factory.createScalar(array_proxy_id);
        output[0]["TypeID"] = factory.createScalar(type_id);
        context.outputs[0] = output;
    }

    void ChunkedArray::getType(libmexclass::proxy::method::Context& context) {
        namespace mda = ::matlab::data;

        mda::ArrayFactory factory;

        MATLAB_ASSIGN_OR_ERROR_WITH_CONTEXT(auto type_proxy,
                                            type::proxy::wrap(chunked_array->type()),
                                            context,
                                            error::ARRAY_FAILED_TO_CREATE_TYPE_PROXY);

        // Register the Type proxy and return its ID and type ID to MATLAB.
        const auto proxy_id = libmexclass::proxy::ProxyManager::manageProxy(type_proxy);
        const auto type_id = static_cast<int32_t>(type_proxy->unwrap()->id());

        mda::StructArray output = factory.createStructArray({1, 1}, {"ProxyID", "TypeID"});
        output[0]["ProxyID"] = factory.createScalar(proxy_id);
        output[0]["TypeID"] = factory.createScalar(type_id);
        context.outputs[0] = output;
    }

    // Returns true only if this ChunkedArray equals every ChunkedArray whose proxy ID is passed in.
    void ChunkedArray::isEqual(libmexclass::proxy::method::Context& context) {
        namespace mda = ::matlab::data;

        const mda::TypedArray<uint64_t> chunked_array_proxy_ids = context.inputs[0];

        bool is_equal = true;
        for (const auto& chunked_array_proxy_id : chunked_array_proxy_ids) {
            // Retrieve the ChunkedArray proxy from the ProxyManager.
            auto proxy = libmexclass::proxy::ProxyManager::getProxy(chunked_array_proxy_id);
            auto chunked_array_proxy = std::static_pointer_cast<proxy::ChunkedArray>(proxy);
            auto chunked_array_to_compare = chunked_array_proxy->unwrap();

            // Use the ChunkedArray::Equals(const ChunkedArray& other) overload instead
            // of ChunkedArray::Equals(const std::shared_ptr<ChunkedArray>& other) to
            // ensure we don't assume chunked arrays with the same memory address are
            // equal. This ensures we treat NaNs as not equal by default.
            if (!chunked_array->Equals(*chunked_array_to_compare)) {
                is_equal = false;
                break;
            }
        }
        mda::ArrayFactory factory;
        context.outputs[0] = factory.createScalar(is_equal);
    }
}
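The validation order in `getChunk` (empty check first, then the 1-based range check, then conversion to Arrow's 0-based index) can be isolated as a dependency-free sketch. `toZeroBasedChunkIndex` is a hypothetical helper, not part of the MATLAB Interface; it returns either the 0-based index or the same error text that `getChunk` reports.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <variant>

// Hypothetical helper mirroring the checks in ChunkedArray::getChunk.
// Returns the 0-based Arrow index on success, or the error message otherwise.
std::variant<std::int32_t, std::string> toZeroBasedChunkIndex(std::int32_t matlab_index,
                                                             std::int32_t num_chunks) {
    assert(num_chunks >= 0);  // a chunk count is never negative

    // Checked first: with zero chunks, no index can be valid.
    if (num_chunks == 0) {
        return std::string("Numeric indexing using the chunk method is not supported "
                           "for chunked arrays with zero chunks.");
    }

    // MATLAB indices are 1-based, so the valid range is 1..num_chunks.
    if (matlab_index < 1 || matlab_index > num_chunks) {
        std::stringstream message;
        message << "Invalid chunk index: " << matlab_index
                << ". Chunk index must be between 1 and the number of chunks ("
                << num_chunks << ").";
        return message.str();
    }

    // Convert to Arrow's 0-based index.
    return std::int32_t{matlab_index - 1};
}
```

Checking the empty case separately gives a more specific message than "must be between 1 and 0", which is why it runs before the range check.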
Header file declaring the `ChunkedArray` proxy class (51 additions, 0 deletions):
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#pragma once

#include "arrow/chunked_array.h"

#include "libmexclass/proxy/Proxy.h"

namespace arrow::matlab::array::proxy {

class ChunkedArray : public libmexclass::proxy::Proxy {
    public:
        ChunkedArray(std::shared_ptr<arrow::ChunkedArray> chunked_array);

        ~ChunkedArray() {}

        std::shared_ptr<arrow::ChunkedArray> unwrap();

        static libmexclass::proxy::MakeResult make(const libmexclass::proxy::FunctionArguments& constructor_arguments);

    protected:

        void getLength(libmexclass::proxy::method::Context& context);

        void getNumChunks(libmexclass::proxy::method::Context& context);

        void getChunk(libmexclass::proxy::method::Context& context);

        void getType(libmexclass::proxy::method::Context& context);

        void isEqual(libmexclass::proxy::method::Context& context);

        std::shared_ptr<arrow::ChunkedArray> chunked_array;
};

}
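The comment in `isEqual` explains why the reference overload of `Equals` is used. The sketch below illustrates that reasoning with a hypothetical toy type (`ToyChunks`, `equalsByValue`, and `equalsByPointer` are made-up names, not Arrow APIs): a pointer-based comparison that short-circuits on identical addresses reports an array containing NaN as equal to itself, whereas a value comparison treats NaN as unequal.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy "chunked array" of doubles; not Arrow's ChunkedArray.
struct ToyChunks {
    std::vector<std::vector<double>> chunks;
};

// Concatenate all chunks so the comparison ignores chunk boundaries.
std::vector<double> flatten(const ToyChunks& c) {
    std::vector<double> out;
    for (const auto& chunk : c.chunks) {
        out.insert(out.end(), chunk.begin(), chunk.end());
    }
    return out;
}

// Value comparison: NaN is never equal to anything, including another NaN.
bool equalsByValue(const ToyChunks& a, const ToyChunks& b) {
    const auto x = flatten(a);
    const auto y = flatten(b);
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i]) || std::isnan(y[i]) || x[i] != y[i]) return false;
    }
    return true;
}

// Pointer comparison with an identity shortcut: the same object is reported
// equal without inspecting its values, so NaNs are never compared.
bool equalsByPointer(const std::shared_ptr<ToyChunks>& a,
                     const std::shared_ptr<ToyChunks>& b) {
    if (!a || !b) return false;
    if (a.get() == b.get()) return true;
    return equalsByValue(*a, *b);
}
```

Passing `*chunked_array_to_compare` in the proxy corresponds to `equalsByValue` here, so comparing a `ChunkedArray` containing NaN with itself returns false, consistent with the comment's stated intent.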