Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Page shouldn't close a block twice #100370

Merged
merged 6 commits into from
Oct 6, 2023
Merged

Conversation

costin
Copy link
Member

@costin costin commented Oct 5, 2023

Page now takes into account that a block can be used in multiple
positions (such as the same column aliased under multiple names).

Relates #100001
Fix #100365
Fix #100356

Page now takes into account that a block can be used in multiple
 positions (such as the same column aliased under multiple names).

Relates elastic#100001
Fix elastic#100365
Fix elastic#100356
@costin costin self-assigned this Oct 5, 2023
@elasticsearchmachine elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Oct 5, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Copy link
Collaborator

Hi @costin, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-spies should probably look at this too. It does get the job done. And we don't need to keep it forever. If we have reference counting we wouldn't need it. But we don't have it now.

@@ -222,6 +223,13 @@ public void writeTo(StreamOutput out) throws IOException {
*/
public void releaseBlocks() {
blocksReleased = true;
Releasables.closeExpectNoException(blocks);
// blocks can be used as multiple columns
var set = new IdentityHashMap<Block, Object>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collections.newSetFromMap is the idiomatic way to do this. And all the cool kids want to be idiots, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)
I actually tried that first but it blew up because the blocks equals implementation checks the content meaning different blocks end up looking equal meaning they are not closed as they are considered a duplicate.
Which causes a bunch of leaks - hence why I'm using the Identity match alone.

@nik9000
Copy link
Member

nik9000 commented Oct 5, 2023

@ChrisHegarty also might have opinions on this.

* That is, allows clean-up of the current page _after_ external manipulation of the blocks.
* The current page should no longer be used and be considered closed.
*/
public Page newPageAndRelease(Block... keep) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've introduced this method for handling the case of reusing some blocks from an old page into a new one.
This happens in ProjectExec and OutputExec (inside the planner) - I've moved this method from the Project into the page and clean-it up a bit (the usual reduction complexity from using two lists - O(N*M) to O(N) with the extra memory for the map).

Comment on lines 91 to 92
for (int i = 0; i < toAdd.length; i++) {
this.blocks[prev.blocks.length + i] = toAdd[i];
}
System.arraycopy(toAdd, 0, this.blocks, prev.blocks.length + 0, toAdd.length);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small improvement .

Comment on lines -91 to +92
for (int i = 0; i < toAdd.length; i++) {
this.blocks[prev.blocks.length + i] = toAdd[i];
}
System.arraycopy(toAdd, 0, this.blocks, prev.blocks.length, toAdd.length);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small improvement.

@costin
Copy link
Member Author

costin commented Oct 6, 2023

@elasticsearchmachine run elasticsearch-ci/part-1

@costin
Copy link
Member Author

costin commented Oct 6, 2023

FTR, it looks like all the builders for part-1 end up being disabled...

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as an intermediate fix.

IMO we're beginning to create a lot of workarounds to make up for a lack of ref counted shallow copies of blocks and that makes the code harder to reason about.

Comment on lines 225 to 231
var map = new IdentityHashMap<Block, Object>(mapSize(blocks.length));
var DUMMY = new Object();
for (Block b : blocks) {
if (map.putIfAbsent(b, DUMMY) == null) {
Releasables.closeExpectNoException(b);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this improves the issue, but there still can be problems because non-identical blocks can be backed by the same vector or array.

Additionally, we create and populate a new hash map on each page release; that's probably quite expensive.

We can merge this to fix the problem right now, but IMO should mark this with a todo comment to remove this logic once possible. We want ref counting, this will fix the problem more idiomatically.

Copy link
Member Author

@costin costin Oct 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-identical blocks can be backed by the same vector or array.

Why would they?

we create and populate a new hash map on each page release; that's probably quite expensive.

I've added a check to skip the release once it's one - we can change that if we want a page to only be released once.

Releasables.closeExpectNoException(blocks);
// blocks can be used as multiple columns
var map = new IdentityHashMap<Block, Object>(mapSize(blocks.length));
var DUMMY = new Object();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be moved into a static final var.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or...

var map = new IdentityHashMap<Block, Boolean>(mapSize(blocks.length));
for (Block b : blocks) {
    if (map.putIfAbsent(b, Boolean.TRUE) == null) { ..

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left a few small comments / suggestions.

Additionally, it would be good to have a small unit test in BasicPageTests for the Page additions.

Releasables.closeExpectNoException(blocks);
// blocks can be used as multiple columns
var map = new IdentityHashMap<Block, Object>(mapSize(blocks.length));
var DUMMY = new Object();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or...

var map = new IdentityHashMap<Block, Boolean>(mapSize(blocks.length));
for (Block b : blocks) {
    if (map.putIfAbsent(b, Boolean.TRUE) == null) { ..


// create identity set
for (Block b : keep) {
map.putIfAbsent(b, DUMMY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subjective:

  // create identity set
  var set = Collections.newSetFromMap(new IdentityHashMap<Block, Boolean>(mapSize(keep.length)));
  set.addAll(Arrays.asList(keep));
  ..
      if (set.contains(b) == false) { ..

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}

static int mapSize(int expectedSize) {
return expectedSize < 2 ? expectedSize + 1 : (int) (expectedSize / 0.75 + 1.0);
Copy link
Contributor

@bpintea bpintea Oct 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CollectionUtils.mapSize()?
Edit: not avail in compute package, disregard.

@costin costin added buildkite-opt-in Opts your PR into Buildkite instead of Jenkins v8.11.1 auto-backport-and-merge Automatically create backport pull requests and merge when ready labels Oct 6, 2023
@costin costin merged commit 44068cb into elastic:main Oct 6, 2023
11 of 12 checks passed
@costin costin deleted the fix/100001 branch October 6, 2023 18:50
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.11 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 100370

costin added a commit to costin/elasticsearch that referenced this pull request Oct 6, 2023
Page now takes into account that a block can be used in multiple
 positions (such as the same column aliased under multiple names).
Introduce newPageAndRelease method that handles clean-up of blocks that
 are not-used when creating a new page

Relates elastic#100001
Fix elastic#100365
Fix elastic#100356

(cherry picked from commit 44068cb)
elasticsearchmachine pushed a commit that referenced this pull request Oct 6, 2023
* ESQL: Page shouldn't close a block twice (#100370)

Page now takes into account that a block can be used in multiple
 positions (such as the same column aliased under multiple names).
Introduce newPageAndRelease method that handles clean-up of blocks that
 are not-used when creating a new page

Relates #100001
Fix #100365
Fix #100356

(cherry picked from commit 44068cb)

* Fix order inside test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport-and-merge Automatically create backport pull requests and merge when ready >bug buildkite-opt-in Opts your PR into Buildkite instead of Jenkins Team:QL (Deprecated) Meta label for query languages team v8.11.1 v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] EsqlActionIT testProjectRename failing Test failures in rename
6 participants