Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Fix synthetic attribute pruning #111413

Merged
merged 51 commits into from
Sep 4, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
51 commits
Select commit Hold shift + click to select a range
205dfad
Use AggregatorMode instead of AggregateExec.Mode
alex-spies Jul 29, 2024
1cabef2
Make AggregateExec track intermediateAttributes
alex-spies Jul 29, 2024
54e0436
Implement QueryPlan.requiredInputSet
alex-spies Jul 29, 2024
890701e
Simplify and correct ProjectAwayColumns
alex-spies Jul 29, 2024
bb0c926
Small refactor
alex-spies Jul 29, 2024
f22f3a1
Squash some minor mistakes
alex-spies Jul 29, 2024
dbc9199
Update tests
alex-spies Jul 29, 2024
741dbd7
Simplify DependencyConsistency
alex-spies Jul 30, 2024
9e4dab5
Small refactor
alex-spies Jul 30, 2024
d78f073
Tiny refactor
alex-spies Jul 30, 2024
9695190
Start using synthetic=true again
alex-spies Jul 30, 2024
c343812
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Jul 30, 2024
c5e74be
Update docs/changelog/111413.yaml
alex-spies Jul 30, 2024
ac2d6e4
Do not filter out <no-fields> in the Analyzer
alex-spies Jul 30, 2024
54e1847
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Jul 30, 2024
7db76d8
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 14, 2024
01d4c5f
Fix AggregateExec ser/de tests, hash, equals
alex-spies Aug 14, 2024
5443fef
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 19, 2024
ddfdd2c
Add and use new constructor for FieldAttribute
alex-spies Aug 19, 2024
e7c212e
Mark another one as synthetic
alex-spies Aug 19, 2024
3c648b3
Rename requiredInputSet->childrenReferences
alex-spies Aug 19, 2024
4f01e7c
Synthetic as :s in debug output, e.g. field{f:s}#1
alex-spies Aug 19, 2024
ff8b72d
Spotless
alex-spies Aug 19, 2024
e4b258a
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 20, 2024
9ad9e1c
Revert changes to AggregateExec
alex-spies Aug 20, 2024
7ff1991
Revert "Revert changes to AggregateExec"
alex-spies Aug 20, 2024
2eabfc2
Avoid using QueryPlan.references
alex-spies Aug 20, 2024
f1b9b86
Replace references with childrenReferences
alex-spies Aug 20, 2024
63c39a7
Fix a whoopsie
alex-spies Aug 20, 2024
6028664
Improve comments
alex-spies Aug 20, 2024
e193fb1
Add new transport version for AggregateExec
alex-spies Aug 20, 2024
ac44b1b
More tests for ProjectAwayColumns
alex-spies Aug 21, 2024
41e9159
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 21, 2024
febb40f
Improve comment
alex-spies Aug 21, 2024
93cf2cf
Don't make computeReferences public
alex-spies Aug 21, 2024
9dbd5ea
Checkstyle
alex-spies Aug 21, 2024
d4dc892
Reduce noise via static imports
alex-spies Aug 22, 2024
0505dc4
Synthetic label: {f$} instead of {f:s}
alex-spies Aug 22, 2024
6f36f70
Make synth attr prefix inaccessible
alex-spies Aug 22, 2024
ec67cf2
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 22, 2024
259d93f
Make FieldAttribute ctor with type package private
alex-spies Aug 22, 2024
7849978
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 22, 2024
3582136
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 28, 2024
d1f55e3
Refactor fieldNames for METRICS + add test
alex-spies Aug 28, 2024
776ed78
Update comment
alex-spies Aug 28, 2024
77431f6
Undo refactor to fieldNames; rely on index mode
alex-spies Aug 29, 2024
7c3aaf5
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 29, 2024
e557eb4
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 30, 2024
cb69899
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Sep 3, 2024
f65747d
Use default implementation for computeReferences
alex-spies Sep 3, 2024
a7b8715
Rename static helpers in Aggregate and Eval
alex-spies Sep 3, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/changelog/111413.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
pr: 111413
summary: "ESQL: Fix synthetic attribute pruning"
area: ES|QL
type: bug
issues:
- 105821
Original file line number Diff line number Diff line change
Expand Up @@ -205,6 +205,7 @@ static TransportVersion def(int id) {
public static final TransportVersion GET_DATA_STREAMS_VERBOSE = def(8_735_00_0);
public static final TransportVersion ESQL_ADD_INDEX_MODE_CONCRETE_INDICES = def(8_736_00_0);
public static final TransportVersion UNASSIGNED_PRIMARY_COUNT_ON_CLUSTER_HEALTH = def(8_737_00_0);
public static final TransportVersion ESQL_AGGREGATE_EXEC_TRACKS_INTERMEDIATE_ATTRS = def(8_738_00_0);

/*
* STOP! READ THIS FIRST! No, really,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,11 @@
* The rest are not as they are not part of the projection and thus are not part of the derived table.
*/
public abstract class Attribute extends NamedExpression {
/**
* Changing this will break bwc with 8.15, see {@link FieldAttribute#fieldName()}.
*/
protected static final String SYNTHETIC_ATTRIBUTE_NAME_PREFIX = "$$";

public static List<NamedWriteableRegistry.Entry> getNamedWriteables() {
// TODO add UnsupportedAttribute when these are moved to the same project
return List.of(FieldAttribute.ENTRY, MetadataAttribute.ENTRY, ReferenceAttribute.ENTRY);
Expand All @@ -49,6 +54,10 @@ public Attribute(Source source, String name, Nullability nullability, NameId id,
this.nullability = nullability;
}

public static String rawTemporaryName(String inner, String outer, String suffix) {
return SYNTHETIC_ATTRIBUTE_NAME_PREFIX + inner + "$" + outer + "$" + suffix;
}

@Override
public final Expression replaceChildren(List<Expression> newChildren) {
throw new UnsupportedOperationException("this type of node doesn't have any children to replace");
Expand Down Expand Up @@ -123,7 +132,7 @@ public boolean equals(Object obj) {

@Override
public String toString() {
return name() + "{" + label() + "}" + "#" + id();
return name() + "{" + label() + (synthetic() ? "$" : "") + "}" + "#" + id();
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,9 +30,6 @@
* - nestedParent - if nested, what's the parent (which might not be the immediate one)
*/
public class FieldAttribute extends TypedAttribute {
// TODO: This constant should not be used if possible; use .synthetic()
// https://github.com/elastic/elasticsearch/issues/105821
public static final String SYNTHETIC_ATTRIBUTE_NAME_PREFIX = "$$";

static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(
Attribute.class,
Expand All @@ -52,6 +49,10 @@ public FieldAttribute(Source source, FieldAttribute parent, String name, EsField
this(source, parent, name, field, Nullability.TRUE, null, false);
}

public FieldAttribute(Source source, FieldAttribute parent, String name, EsField field, boolean synthetic) {
this(source, parent, name, field, Nullability.TRUE, null, synthetic);
}

public FieldAttribute(
Source source,
FieldAttribute parent,
Expand All @@ -64,7 +65,11 @@ public FieldAttribute(
this(source, parent, name, field.getDataType(), field, nullability, id, synthetic);
}

public FieldAttribute(
/**
* Used only for testing. Do not use this otherwise, as an explicitly set type will be ignored the next time this FieldAttribute is
* {@link FieldAttribute#clone}d.
*/
FieldAttribute(
Source source,
FieldAttribute parent,
String name,
Expand Down Expand Up @@ -147,28 +152,7 @@ public String getWriteableName() {

@Override
protected NodeInfo<FieldAttribute> info() {
return NodeInfo.create(
this,
(source, parent1, name, type, field1, qualifier, nullability, id, synthetic) -> new FieldAttribute(
source,
parent1,
name,
type,
field1,
qualifier,
nullability,
id,
synthetic
),
parent,
name(),
dataType(),
field,
(String) null,
nullable(),
id(),
synthetic()
);
return NodeInfo.create(this, FieldAttribute::new, parent, name(), dataType(), field, (String) null, nullable(), id(), synthetic());
}

public FieldAttribute parent() {
Expand All @@ -185,9 +169,9 @@ public String path() {
public String fieldName() {
// Before 8.15, the field name was the same as the attribute's name.
// On later versions, the attribute can be renamed when creating synthetic attributes.
// TODO: We should use synthetic() to check for that case.
// https://github.com/elastic/elasticsearch/issues/105821
if (name().startsWith(SYNTHETIC_ATTRIBUTE_NAME_PREFIX) == false) {
// Because until 8.15, we couldn't set `synthetic` to true due to a bug, in that version such FieldAttributes are marked by their
// name starting with `$$`.
if ((synthetic() || name().startsWith(SYNTHETIC_ATTRIBUTE_NAME_PREFIX)) == false) {
return name();
}
return Strings.hasText(path) ? path + "." + field.getName() : field.getName();
Expand All @@ -211,9 +195,15 @@ private FieldAttribute innerField(EsField type) {

@Override
protected Attribute clone(Source source, String name, DataType type, Nullability nullability, NameId id, boolean synthetic) {
// Ignore `type`, this must be the same as the field's type.
return new FieldAttribute(source, parent, name, field, nullability, id, synthetic);
}

@Override
public Attribute withDataType(DataType type) {
throw new UnsupportedOperationException("FieldAttribute obtains its type from the contained EsField.");
}

@Override
public int hashCode() {
return Objects.hash(super.hashCode(), path, field);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,10 @@ public NameId id() {
return id;
}

/**
* Synthetic named expressions are not user defined and usually created during optimizations and substitutions, e.g. when turning
* {@code ... | STATS x = avg(2*field)} into {@code ... | EVAL $$synth$attribute = 2*field | STATS x = avg($$synth$attribute)}.
*/
public boolean synthetic() {
return synthetic;
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

package org.elasticsearch.xpack.esql.core.expression;

import org.elasticsearch.xpack.esql.core.tree.Source;
import org.elasticsearch.xpack.esql.core.type.DataType;
import org.elasticsearch.xpack.esql.core.type.EsField;

public class FieldAttributeTestUtils {
public static final FieldAttribute newFieldAttributeWithType(
Source source,
FieldAttribute parent,
String name,
DataType type,
EsField field,
Nullability nullability,
NameId id,
boolean synthetic
) {
return new FieldAttribute(source, parent, name, type, field, nullability, id, synthetic);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,24 @@

public enum AggregatorMode {

/**
* Maps raw inputs to intermediate outputs.
*/
INITIAL(false, true),

/**
* Maps intermediate inputs to intermediate outputs.
*/
INTERMEDIATE(true, true),

/**
* Maps intermediate inputs to final outputs.
*/
FINAL(true, false),

// most useful for testing
/**
* Maps raw inputs to final outputs. Most useful for testing.
*/
SINGLE(false, false);

private final boolean inputPartial;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,6 @@
import org.elasticsearch.xpack.esql.core.expression.Expressions;
import org.elasticsearch.xpack.esql.core.expression.FieldAttribute;
import org.elasticsearch.xpack.esql.core.expression.Literal;
import org.elasticsearch.xpack.esql.core.expression.NameId;
import org.elasticsearch.xpack.esql.core.expression.NamedExpression;
import org.elasticsearch.xpack.esql.core.expression.Nullability;
import org.elasticsearch.xpack.esql.core.expression.ReferenceAttribute;
Expand Down Expand Up @@ -65,7 +64,6 @@
import org.elasticsearch.xpack.esql.expression.predicate.operator.arithmetic.EsqlArithmeticOperation;
import org.elasticsearch.xpack.esql.expression.predicate.operator.comparison.In;
import org.elasticsearch.xpack.esql.index.EsIndex;
import org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizer;
import org.elasticsearch.xpack.esql.plan.TableIdentifier;
import org.elasticsearch.xpack.esql.plan.logical.Drop;
import org.elasticsearch.xpack.esql.plan.logical.Enrich;
Expand Down Expand Up @@ -1249,12 +1247,10 @@ private Expression createIfDoesNotAlreadyExist(
List<FieldAttribute> unionFieldAttributes
) {
// Generate new ID for the field and suffix it with the data type to maintain unique attribute names.
String unionTypedFieldName = LogicalPlanOptimizer.rawTemporaryName(
fa.name(),
"converted_to",
resolvedField.getDataType().typeName()
);
FieldAttribute unionFieldAttribute = new FieldAttribute(fa.source(), fa.parent(), unionTypedFieldName, resolvedField);
alex-spies marked this conversation as resolved.
Show resolved Hide resolved
// NOTE: The name has to start with $$ to not break bwc with 8.15 - in that version, this is how we had to mark this as
// synthetic to work around a bug.
String unionTypedFieldName = Attribute.rawTemporaryName(fa.name(), "converted_to", resolvedField.getDataType().typeName());
FieldAttribute unionFieldAttribute = new FieldAttribute(fa.source(), fa.parent(), unionTypedFieldName, resolvedField, true);
int existingIndex = unionFieldAttributes.indexOf(unionFieldAttribute);
if (existingIndex >= 0) {
// Do not generate multiple name/type combinations with different IDs
Expand All @@ -1280,8 +1276,16 @@ private MultiTypeEsField resolvedMultiTypeEsField(FieldAttribute fa, HashMap<Typ

private Expression typeSpecificConvert(AbstractConvertFunction convert, Source source, DataType type, InvalidMappedField mtf) {
EsField field = new EsField(mtf.getName(), type, mtf.getProperties(), mtf.isAggregatable());
NameId id = ((FieldAttribute) convert.field()).id();
FieldAttribute resolvedAttr = new FieldAttribute(source, null, field.getName(), field, Nullability.TRUE, id, false);
FieldAttribute originalFieldAttr = (FieldAttribute) convert.field();
FieldAttribute resolvedAttr = new FieldAttribute(
source,
originalFieldAttr.parent(),
originalFieldAttr.name(),
field,
originalFieldAttr.nullable(),
originalFieldAttr.id(),
Comment on lines +1280 to +1286
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carrying over more info from the original field attribute is technically unrelated; this seemed more correct, though, as the field attribute created here can lose the parent, the full name in case of subfields, and nullability info - which could serve as excellent red herrings in case someone runs across this while debugging in the future.

true
);
return convert.replaceChildren(Collections.singletonList(resolvedAttr));
}
}
Expand Down Expand Up @@ -1330,11 +1334,11 @@ private static LogicalPlan planWithoutSyntheticAttributes(LogicalPlan plan) {
List<Attribute> newOutput = new ArrayList<>(output.size());

for (Attribute attr : output) {
// TODO: this should really use .synthetic()
// https://github.com/elastic/elasticsearch/issues/105821
if (attr.name().startsWith(FieldAttribute.SYNTHETIC_ATTRIBUTE_NAME_PREFIX) == false) {
newOutput.add(attr);
// Do not let the synthetic union type field attributes end up in the final output.
if (attr.synthetic() && attr instanceof FieldAttribute) {
continue;
}
newOutput.add(attr);
}

return newOutput.size() == output.size() ? plan : new Project(Source.EMPTY, plan, newOutput);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -552,6 +552,10 @@ private Tuple<List<Attribute>, List<Stat>> pushableStats(AggregateExec aggregate
singletonList(agg),
emptyList()
);
// TODO: the attributes have been recreated here; they will have wrong name ids, and the dependency check will
// probably fail when we fix https://github.com/elastic/elasticsearch/issues/105436.
// We may need to refactor AbstractPhysicalOperationProviders.intermediateAttributes so it doesn't return just
// a list of attributes, but a mapping from the logical to the physical attributes.
tuple.v1().addAll(intermediateAttributes);
tuple.v2().add(stat);
}
Expand Down Expand Up @@ -604,6 +608,7 @@ && allowedForDocValues(fieldAttribute, agg, foundAttributes)) {
agg.groupings(),
orderedAggregates,
agg.getMode(),
agg.intermediateAttributes(),
agg.estimatedRowSize()
);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,6 @@
import org.elasticsearch.xpack.esql.core.expression.AttributeMap;
import org.elasticsearch.xpack.esql.core.expression.Expression;
import org.elasticsearch.xpack.esql.core.expression.Expressions;
import org.elasticsearch.xpack.esql.core.expression.FieldAttribute;
import org.elasticsearch.xpack.esql.core.expression.NameId;
import org.elasticsearch.xpack.esql.core.expression.NamedExpression;
import org.elasticsearch.xpack.esql.core.expression.ReferenceAttribute;
Expand Down Expand Up @@ -123,15 +122,11 @@ public LogicalPlanOptimizer(LogicalOptimizerContext optimizerContext) {
public static String temporaryName(Expression inner, Expression outer, int suffix) {
String in = toString(inner);
String out = toString(outer);
return rawTemporaryName(in, out, String.valueOf(suffix));
return Attribute.rawTemporaryName(in, out, String.valueOf(suffix));
}

public static String locallyUniqueTemporaryName(String inner, String outer) {
return FieldAttribute.SYNTHETIC_ATTRIBUTE_NAME_PREFIX + inner + "$" + outer + "$" + new NameId();
}

public static String rawTemporaryName(String inner, String outer, String suffix) {
return FieldAttribute.SYNTHETIC_ATTRIBUTE_NAME_PREFIX + inner + "$" + outer + "$" + suffix;
return Attribute.rawTemporaryName(inner, outer, (new NameId()).toString());
}

static String toString(Expression ex) {
Expand Down Expand Up @@ -373,9 +368,7 @@ private static AttributeReplacement renameAttributesInExpressions(
if (attributeNamesToRename.contains(attr.name())) {
Alias renamedAttribute = aliasesForReplacedAttributes.computeIfAbsent(attr, a -> {
String tempName = locallyUniqueTemporaryName(a.name(), "temp_name");
// TODO: this should be synthetic
// blocked on https://github.com/elastic/elasticsearch/issues/98703
return new Alias(a.source(), tempName, a, null, false);
return new Alias(a.source(), tempName, a, null, true);
});
return renamedAttribute.toAttribute();
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,12 @@

import org.elasticsearch.xpack.esql.capabilities.Validatable;
import org.elasticsearch.xpack.esql.common.Failures;
import org.elasticsearch.xpack.esql.optimizer.OptimizerRules.LogicalPlanDependencyCheck;
import org.elasticsearch.xpack.esql.optimizer.OptimizerRules.DependencyConsistency;
import org.elasticsearch.xpack.esql.plan.logical.LogicalPlan;

public final class LogicalVerifier {

private static final LogicalPlanDependencyCheck DEPENDENCY_CHECK = new LogicalPlanDependencyCheck();
private static final DependencyConsistency<LogicalPlan> DEPENDENCY_CHECK = new DependencyConsistency<>();
public static final LogicalVerifier INSTANCE = new LogicalVerifier();

private LogicalVerifier() {}
Expand Down
Loading