Fast scan of a vector of properties of consecutive node IDs in columns #583

Closed
semihsalihoglu-uw opened this issue May 5, 2022 · 1 comment


@semihsalihoglu-uw
Contributor

At the time of writing, our main column scan operator uses the following function from column.cpp:

void Column::readValues(
    const shared_ptr<ValueVector>& nodeIDVector, const shared_ptr<ValueVector>& valueVector) {
    assert(nodeIDVector->dataType.typeID == NODE);
    if (nodeIDVector->state->isFlat()) {
        auto pos = nodeIDVector->state->getPositionOfCurrIdx();
        readForSingleNodeIDPosition(pos, nodeIDVector, valueVector);
    } else {
        for (auto i = 0ul; i < nodeIDVector->state->selectedSize; i++) {
            auto pos = nodeIDVector->state->selectedPositions[i];
            readForSingleNodeIDPosition(pos, nodeIDVector, valueVector);
        }
    }
}

void Column::readForSingleNodeIDPosition(uint32_t pos, const shared_ptr<ValueVector>& nodeIDVector,
    const shared_ptr<ValueVector>& resultVector) {
    if (nodeIDVector->isNull(pos)) {
        resultVector->setNull(pos, true);
        return;
    }
    auto pageCursor = PageUtils::getPageElementCursorForOffset(
        nodeIDVector->readNodeOffset(pos), numElementsPerPage);
    auto frame = bufferManager.pin(fileHandle, pageCursor.idx);
    memcpy(resultVector->values + pos * elementSize,
        frame + mapElementPosToByteOffset(pageCursor.pos), elementSize);
    setNULLBitsForAPos(resultVector, frame, pageCursor.pos, pos);
    bufferManager.unpin(fileHandle, pageCursor.idx);
}

That is, for each node in the vector, we read its value separately. Each read pins a page through the buffer manager (BM), which acquires a lock, and so on. This is quite inefficient when the given nodeIDs are consecutive and many of them could be scanned in chunks. We need a fast path for these scans.
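
To make the cost difference concrete, here is a minimal, self-contained sketch of the idea. It is not the actual implementation: `MockBufferManager`, `MockColumn`, `scanPerElement` and `scanConsecutive` are hypothetical names made up for illustration. The sketch assumes fixed-width `int64_t` values, ignores NULL bits and locking, and takes the consecutive range as a start offset plus a count. The fast path pins each page once and copies every value that falls on that page with a single `memcpy`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the buffer manager. It only counts pins so that
// the two scan strategies can be compared; it is not the project's real API.
struct MockBufferManager {
    std::vector<std::vector<uint8_t>> pages;
    uint64_t numPins = 0;

    const uint8_t* pin(uint64_t pageIdx) {
        numPins++;
        return pages[pageIdx].data();
    }
    void unpin(uint64_t /*pageIdx*/) {}
};

// Hypothetical fixed-width column; the value stored at offset i is i * 10.
struct MockColumn {
    static constexpr uint64_t elementSize = sizeof(int64_t);
    uint64_t numElementsPerPage;
    MockBufferManager bm;

    MockColumn(uint64_t numElements, uint64_t elementsPerPage)
        : numElementsPerPage{elementsPerPage} {
        assert(elementsPerPage > 0);
        uint64_t numPages = (numElements + elementsPerPage - 1) / elementsPerPage;
        bm.pages.assign(numPages, std::vector<uint8_t>(elementsPerPage * elementSize));
        for (uint64_t offset = 0; offset < numElements; offset++) {
            int64_t value = static_cast<int64_t>(offset) * 10;
            uint8_t* page = bm.pages[offset / elementsPerPage].data();
            std::memcpy(page + (offset % elementsPerPage) * elementSize, &value, elementSize);
        }
    }

    // Slow path: one pin/unpin per value, as in the per-position read above.
    void scanPerElement(uint64_t startOffset, uint64_t count, int64_t* out) {
        for (uint64_t i = 0; i < count; i++) {
            uint64_t offset = startOffset + i;
            uint64_t pageIdx = offset / numElementsPerPage;
            const uint8_t* frame = bm.pin(pageIdx);
            std::memcpy(out + i, frame + (offset % numElementsPerPage) * elementSize, elementSize);
            bm.unpin(pageIdx);
        }
    }

    // Fast path: one pin/unpin per page, copying all values on that page at once.
    void scanConsecutive(uint64_t startOffset, uint64_t count, int64_t* out) {
        uint64_t numRead = 0;
        while (numRead < count) {
            uint64_t offset = startOffset + numRead;
            uint64_t pageIdx = offset / numElementsPerPage;
            uint64_t posInPage = offset % numElementsPerPage;
            // Copy up to the end of this page or the end of the range.
            uint64_t numInPage = std::min(count - numRead, numElementsPerPage - posInPage);
            const uint8_t* frame = bm.pin(pageIdx);
            std::memcpy(out + numRead, frame + posInPage * elementSize, numInPage * elementSize);
            bm.unpin(pageIdx);
            numRead += numInPage;
        }
    }
};
```

In this sketch, scanning 250 consecutive offsets starting at offset 150 with 100 elements per page costs 250 pins on the slow path and 3 pins on the fast path, because the range only touches pages 1, 2 and 3. A real fast path would also need to copy the NULL bits per chunk and fall back to the per-position read when the nodeIDs in the vector are not consecutive.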

@semihsalihoglu-uw semihsalihoglu-uw changed the title Scanning a vector of nodeIDs in columns too Fast scan of a vector of consecutive nodeIDs in columns May 5, 2022
@semihsalihoglu-uw semihsalihoglu-uw changed the title Fast scan of a vector of consecutive nodeIDs in columns Fast scan of a vector of properties of consecutive node IDs in columns May 5, 2022
@andyfengHKU
Contributor

Fixed in PR #627
