Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Fix synthetic attribute pruning #111413

Merged
merged 51 commits into from
Sep 4, 2024
Merged
Show file tree
Hide file tree
Changes from 36 commits
Commits
Show all changes
51 commits
Select commit Hold shift + click to select a range
205dfad
Use AggregatorMode instead of AggregateExec.Mode
alex-spies Jul 29, 2024
1cabef2
Make AggregateExec track intermediateAttributes
alex-spies Jul 29, 2024
54e0436
Implement QueryPlan.requiredInputSet
alex-spies Jul 29, 2024
890701e
Simplify and correct ProjectAwayColumns
alex-spies Jul 29, 2024
bb0c926
Small refactor
alex-spies Jul 29, 2024
f22f3a1
Squash some minor mistakes
alex-spies Jul 29, 2024
dbc9199
Update tests
alex-spies Jul 29, 2024
741dbd7
Simplify DependencyConsistency
alex-spies Jul 30, 2024
9e4dab5
Small refactor
alex-spies Jul 30, 2024
d78f073
Tiny refactor
alex-spies Jul 30, 2024
9695190
Start using synthetic=true again
alex-spies Jul 30, 2024
c343812
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Jul 30, 2024
c5e74be
Update docs/changelog/111413.yaml
alex-spies Jul 30, 2024
ac2d6e4
Do not filter out <no-fields> in the Analyzer
alex-spies Jul 30, 2024
54e1847
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Jul 30, 2024
7db76d8
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 14, 2024
01d4c5f
Fix AggregateExec ser/de tests, hash, equals
alex-spies Aug 14, 2024
5443fef
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 19, 2024
ddfdd2c
Add and use new constructor for FieldAttribute
alex-spies Aug 19, 2024
e7c212e
Mark another one as synthetic
alex-spies Aug 19, 2024
3c648b3
Rename requiredInputSet->childrenReferences
alex-spies Aug 19, 2024
4f01e7c
Synthetic as :s in debug output, e.g. field{f:s}#1
alex-spies Aug 19, 2024
ff8b72d
Spotless
alex-spies Aug 19, 2024
e4b258a
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 20, 2024
9ad9e1c
Revert changes to AggregateExec
alex-spies Aug 20, 2024
7ff1991
Revert "Revert changes to AggregateExec"
alex-spies Aug 20, 2024
2eabfc2
Avoid using QueryPlan.references
alex-spies Aug 20, 2024
f1b9b86
Replace references with childrenReferences
alex-spies Aug 20, 2024
63c39a7
Fix a whoopsie
alex-spies Aug 20, 2024
6028664
Improve comments
alex-spies Aug 20, 2024
e193fb1
Add new transport version for AggregateExec
alex-spies Aug 20, 2024
ac44b1b
More tests for ProjectAwayColumns
alex-spies Aug 21, 2024
41e9159
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 21, 2024
febb40f
Improve comment
alex-spies Aug 21, 2024
93cf2cf
Don't make computeReferences public
alex-spies Aug 21, 2024
9dbd5ea
Checkstyle
alex-spies Aug 21, 2024
d4dc892
Reduce noise via static imports
alex-spies Aug 22, 2024
0505dc4
Synthetic label: {f$} instead of {f:s}
alex-spies Aug 22, 2024
6f36f70
Make synth attr prefix inaccessible
alex-spies Aug 22, 2024
ec67cf2
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 22, 2024
259d93f
Make FieldAttribute ctor with type package private
alex-spies Aug 22, 2024
7849978
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 22, 2024
3582136
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 28, 2024
d1f55e3
Refactor fieldNames for METRICS + add test
alex-spies Aug 28, 2024
776ed78
Update comment
alex-spies Aug 28, 2024
77431f6
Undo refactor to fieldNames; rely on index mode
alex-spies Aug 29, 2024
7c3aaf5
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 29, 2024
e557eb4
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Aug 30, 2024
cb69899
Merge remote-tracking branch 'upstream/main' into fix-synthetic-attri…
alex-spies Sep 3, 2024
f65747d
Use default implementation for computeReferences
alex-spies Sep 3, 2024
a7b8715
Rename static helpers in Aggregate and Eval
alex-spies Sep 3, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/changelog/111413.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
pr: 111413
summary: "ESQL: Fix synthetic attribute pruning"
area: ES|QL
type: bug
issues:
- 105821
Original file line number Diff line number Diff line change
Expand Up @@ -194,6 +194,7 @@ static TransportVersion def(int id) {
public static final TransportVersion RANDOM_RERANKER_RETRIEVER = def(8_724_00_0);
public static final TransportVersion ESQL_PROFILE_SLEEPS = def(8_725_00_0);
public static final TransportVersion ZDT_NANOS_SUPPORT = def(8_726_00_0);
public static final TransportVersion ESQL_AGGREGATE_EXEC_TRACKS_INTERMEDIATE_ATTRS = def(8_727_00_0);

/*
* STOP! READ THIS FIRST! No, really,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,7 @@ public boolean equals(Object obj) {

@Override
public String toString() {
return name() + "{" + label() + "}" + "#" + id();
return name() + "{" + label() + (synthetic() ? ":s" : "") + "}" + "#" + id();
alex-spies marked this conversation as resolved.
Show resolved Hide resolved
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,7 @@
* - nestedParent - if nested, what's the parent (which might not be the immediate one)
*/
public class FieldAttribute extends TypedAttribute {
// TODO: This constant should not be used if possible; use .synthetic()
// https://github.com/elastic/elasticsearch/issues/105821
@Deprecated()
public static final String SYNTHETIC_ATTRIBUTE_NAME_PREFIX = "$$";
alex-spies marked this conversation as resolved.
Show resolved Hide resolved

static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(
Expand All @@ -52,6 +51,10 @@ public FieldAttribute(Source source, FieldAttribute parent, String name, EsField
this(source, parent, name, field, Nullability.TRUE, null, false);
}

public FieldAttribute(Source source, FieldAttribute parent, String name, EsField field, boolean synthetic) {
this(source, parent, name, field, Nullability.TRUE, null, synthetic);
}

public FieldAttribute(
Source source,
FieldAttribute parent,
Expand All @@ -64,6 +67,8 @@ public FieldAttribute(
this(source, parent, name, field.getDataType(), field, nullability, id, synthetic);
}

// TODO: Should this become private? An explicitly set type will be discarded the moment this is `clone`d, so using this constructor is
alex-spies marked this conversation as resolved.
Show resolved Hide resolved
// deceptive.
public FieldAttribute(
Source source,
FieldAttribute parent,
Expand Down Expand Up @@ -147,28 +152,7 @@ public String getWriteableName() {

@Override
protected NodeInfo<FieldAttribute> info() {
return NodeInfo.create(
this,
(source, parent1, name, type, field1, qualifier, nullability, id, synthetic) -> new FieldAttribute(
source,
parent1,
name,
type,
field1,
qualifier,
nullability,
id,
synthetic
),
parent,
name(),
dataType(),
field,
(String) null,
nullable(),
id(),
synthetic()
);
return NodeInfo.create(this, FieldAttribute::new, parent, name(), dataType(), field, (String) null, nullable(), id(), synthetic());
}

public FieldAttribute parent() {
Expand All @@ -185,9 +169,9 @@ public String path() {
public String fieldName() {
// Before 8.15, the field name was the same as the attribute's name.
// On later versions, the attribute can be renamed when creating synthetic attributes.
// TODO: We should use synthetic() to check for that case.
// https://github.com/elastic/elasticsearch/issues/105821
if (name().startsWith(SYNTHETIC_ATTRIBUTE_NAME_PREFIX) == false) {
// Because until 8.15, we couldn't set `synthetic` to true due to a bug, in that version such FieldAttributes are marked by their
// name starting with `$$`.
if ((synthetic() || name().startsWith(SYNTHETIC_ATTRIBUTE_NAME_PREFIX)) == false) {
return name();
}
return Strings.hasText(path) ? path + "." + field.getName() : field.getName();
Expand All @@ -211,6 +195,7 @@ private FieldAttribute innerField(EsField type) {

@Override
protected Attribute clone(Source source, String name, DataType type, Nullability nullability, NameId id, boolean synthetic) {
// Ignore `type`, this must be the same as the field's type.
return new FieldAttribute(source, parent, name, field, nullability, id, synthetic);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,10 @@ public NameId id() {
return id;
}

/**
* Synthetic named expressions are not user defined and usually created during optimizations and substitutions, e.g. when turning
* {@code ... | STATS x = avg(2*field)} into {@code ... | EVAL $$synth$attribute = 2*field | STATS x = avg($$synth$attribute)}.
*/
public boolean synthetic() {
return synthetic;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,24 @@

public enum AggregatorMode {

/**
* Maps raw inputs to intermediate outputs.
*/
INITIAL(false, true),

/**
* Maps intermediate inputs to intermediate outputs.
*/
INTERMEDIATE(true, true),

/**
* Maps intermediate inputs to final outputs.
*/
FINAL(true, false),

// most useful for testing
/**
* Maps raw inputs to final outputs. Most useful for testing.
*/
SINGLE(false, false);

private final boolean inputPartial;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,6 @@
import org.elasticsearch.xpack.esql.core.expression.Expressions;
import org.elasticsearch.xpack.esql.core.expression.FieldAttribute;
import org.elasticsearch.xpack.esql.core.expression.Literal;
import org.elasticsearch.xpack.esql.core.expression.NameId;
import org.elasticsearch.xpack.esql.core.expression.NamedExpression;
import org.elasticsearch.xpack.esql.core.expression.Nullability;
import org.elasticsearch.xpack.esql.core.expression.ReferenceAttribute;
Expand Down Expand Up @@ -1169,12 +1168,14 @@ private Expression createIfDoesNotAlreadyExist(
List<FieldAttribute> unionFieldAttributes
) {
// Generate new ID for the field and suffix it with the data type to maintain unique attribute names.
// NOTE: The name has to start with $$ to not break bwc with 8.15 - in that version, this is how we had to mark this as
// synthetic to work around a bug.
String unionTypedFieldName = LogicalPlanOptimizer.rawTemporaryName(
fa.name(),
"converted_to",
resolvedField.getDataType().typeName()
);
FieldAttribute unionFieldAttribute = new FieldAttribute(fa.source(), fa.parent(), unionTypedFieldName, resolvedField);
alex-spies marked this conversation as resolved.
Show resolved Hide resolved
FieldAttribute unionFieldAttribute = new FieldAttribute(fa.source(), fa.parent(), unionTypedFieldName, resolvedField, true);
int existingIndex = unionFieldAttributes.indexOf(unionFieldAttribute);
if (existingIndex >= 0) {
// Do not generate multiple name/type combinations with different IDs
Expand All @@ -1200,8 +1201,16 @@ private MultiTypeEsField resolvedMultiTypeEsField(FieldAttribute fa, HashMap<Typ

private Expression typeSpecificConvert(AbstractConvertFunction convert, Source source, DataType type, InvalidMappedField mtf) {
EsField field = new EsField(mtf.getName(), type, mtf.getProperties(), mtf.isAggregatable());
NameId id = ((FieldAttribute) convert.field()).id();
FieldAttribute resolvedAttr = new FieldAttribute(source, null, field.getName(), field, Nullability.TRUE, id, false);
FieldAttribute originalFieldAttr = (FieldAttribute) convert.field();
FieldAttribute resolvedAttr = new FieldAttribute(
source,
originalFieldAttr.parent(),
originalFieldAttr.name(),
field,
originalFieldAttr.nullable(),
originalFieldAttr.id(),
Comment on lines +1280 to +1286
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carrying over more info from the original field attribute is technically unrelated; this seemed more correct, though, as the field attribute created here can lose the parent, the full name in case of subfields, and nullability info - which could serve as excellent red herrings in case someone runs across this while debugging in the future.

true
);
return convert.replaceChildren(Collections.singletonList(resolvedAttr));
}
}
Expand Down Expand Up @@ -1250,11 +1259,11 @@ private static LogicalPlan planWithoutSyntheticAttributes(LogicalPlan plan) {
List<Attribute> newOutput = new ArrayList<>(output.size());

for (Attribute attr : output) {
// TODO: this should really use .synthetic()
// https://github.com/elastic/elasticsearch/issues/105821
if (attr.name().startsWith(FieldAttribute.SYNTHETIC_ATTRIBUTE_NAME_PREFIX) == false) {
newOutput.add(attr);
// Do not let the synthetic union type field attributes end up in the final output.
if (attr.synthetic() && attr instanceof FieldAttribute) {
continue;
}
newOutput.add(attr);
}

return newOutput.size() == output.size() ? plan : new Project(Source.EMPTY, plan, newOutput);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -552,6 +552,10 @@ private Tuple<List<Attribute>, List<Stat>> pushableStats(AggregateExec aggregate
singletonList(agg),
emptyList()
);
// TODO: the attributes have been recreated here; they will have wrong name ids, and the dependency check will
// probably fail when we fix https://github.com/elastic/elasticsearch/issues/105436.
// We may need to refactor AbstractPhysicalOperationProviders.intermediateAttributes so it doesn't return just
// a list of attributes, but a mapping from the logical to the physical attributes.
tuple.v1().addAll(intermediateAttributes);
tuple.v2().add(stat);
}
Expand Down Expand Up @@ -604,6 +608,7 @@ && allowedForDocValues(fieldAttribute, agg, foundAttributes)) {
agg.groupings(),
orderedAggregates,
agg.getMode(),
agg.intermediateAttributes(),
agg.estimatedRowSize()
);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,7 @@ public static String temporaryName(Expression inner, Expression outer, int suffi
}

public static String locallyUniqueTemporaryName(String inner, String outer) {
return FieldAttribute.SYNTHETIC_ATTRIBUTE_NAME_PREFIX + inner + "$" + outer + "$" + new NameId();
return rawTemporaryName(inner, outer, (new NameId()).toString());
}

public static String rawTemporaryName(String inner, String outer, String suffix) {
Expand Down Expand Up @@ -372,9 +372,7 @@ private static AttributeReplacement renameAttributesInExpressions(
if (attributeNamesToRename.contains(attr.name())) {
Alias renamedAttribute = aliasesForReplacedAttributes.computeIfAbsent(attr, a -> {
String tempName = locallyUniqueTemporaryName(a.name(), "temp_name");
// TODO: this should be synthetic
// blocked on https://github.com/elastic/elasticsearch/issues/98703
return new Alias(a.source(), tempName, a, null, false);
return new Alias(a.source(), tempName, a, null, true);
});
return renamedAttribute.toAttribute();
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,12 @@

import org.elasticsearch.xpack.esql.capabilities.Validatable;
import org.elasticsearch.xpack.esql.common.Failures;
import org.elasticsearch.xpack.esql.optimizer.OptimizerRules.LogicalPlanDependencyCheck;
import org.elasticsearch.xpack.esql.optimizer.OptimizerRules.DependencyConsistency;
import org.elasticsearch.xpack.esql.plan.logical.LogicalPlan;

public final class LogicalVerifier {

private static final LogicalPlanDependencyCheck DEPENDENCY_CHECK = new LogicalPlanDependencyCheck();
private static final DependencyConsistency<LogicalPlan> DEPENDENCY_CHECK = new DependencyConsistency<>();
public static final LogicalVerifier INSTANCE = new LogicalVerifier();

private LogicalVerifier() {}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,32 +10,8 @@
import org.elasticsearch.xpack.esql.common.Failures;
import org.elasticsearch.xpack.esql.core.expression.Attribute;
import org.elasticsearch.xpack.esql.core.expression.AttributeSet;
import org.elasticsearch.xpack.esql.core.expression.Expressions;
import org.elasticsearch.xpack.esql.core.expression.NameId;
import org.elasticsearch.xpack.esql.plan.GeneratingPlan;
import org.elasticsearch.xpack.esql.plan.QueryPlan;
import org.elasticsearch.xpack.esql.plan.logical.Aggregate;
import org.elasticsearch.xpack.esql.plan.logical.Enrich;
import org.elasticsearch.xpack.esql.plan.logical.EsRelation;
import org.elasticsearch.xpack.esql.plan.logical.LogicalPlan;
import org.elasticsearch.xpack.esql.plan.logical.MvExpand;
import org.elasticsearch.xpack.esql.plan.logical.Row;
import org.elasticsearch.xpack.esql.plan.logical.local.LocalRelation;
import org.elasticsearch.xpack.esql.plan.physical.AggregateExec;
import org.elasticsearch.xpack.esql.plan.physical.EnrichExec;
import org.elasticsearch.xpack.esql.plan.physical.EsQueryExec;
import org.elasticsearch.xpack.esql.plan.physical.EsSourceExec;
import org.elasticsearch.xpack.esql.plan.physical.EsStatsQueryExec;
import org.elasticsearch.xpack.esql.plan.physical.EvalExec;
import org.elasticsearch.xpack.esql.plan.physical.ExchangeExec;
import org.elasticsearch.xpack.esql.plan.physical.ExchangeSourceExec;
import org.elasticsearch.xpack.esql.plan.physical.FieldExtractExec;
import org.elasticsearch.xpack.esql.plan.physical.LocalSourceExec;
import org.elasticsearch.xpack.esql.plan.physical.MvExpandExec;
import org.elasticsearch.xpack.esql.plan.physical.PhysicalPlan;
import org.elasticsearch.xpack.esql.plan.physical.RegexExtractExec;
import org.elasticsearch.xpack.esql.plan.physical.RowExec;
import org.elasticsearch.xpack.esql.plan.physical.ShowExec;

import java.util.HashSet;
import java.util.Set;
Expand All @@ -49,10 +25,11 @@ private OptimizerRules() {}
static class DependencyConsistency<P extends QueryPlan<P>> {

void checkPlan(P p, Failures failures) {
AttributeSet refs = references(p);
AttributeSet refs = p.references();
AttributeSet input = p.inputSet();
AttributeSet generated = generates(p);
AttributeSet missing = refs.subtract(input).subtract(generated);
AttributeSet missing = refs.subtract(input);
// TODO: for Joins, we should probably check if the required fields from the left child are actually in the left child, not
// just any child (and analogously for the right child).
if (missing.isEmpty() == false) {
failures.add(fail(p, "Plan [{}] optimized incorrectly due to missing references {}", p.nodeString(), missing));
}
Expand All @@ -72,95 +49,5 @@ void checkPlan(P p, Failures failures) {
}
}
}

protected AttributeSet references(P p) {
return p.references();
}

protected AttributeSet generates(P p) {
return AttributeSet.EMPTY;
}
}

static class LogicalPlanDependencyCheck extends DependencyConsistency<LogicalPlan> {
@Override
protected AttributeSet references(LogicalPlan plan) {
if (plan instanceof Enrich enrich) {
// The enrichFields are NamedExpressions, so we compute their references as well when just calling enrich.references().
// But they are not actually referring to attributes from the input plan - only the match field does.
return enrich.matchField().references();
}
return super.references(plan);
}

@Override
protected AttributeSet generates(LogicalPlan logicalPlan) {
// source-like operators
if (logicalPlan instanceof EsRelation
|| logicalPlan instanceof LocalRelation
|| logicalPlan instanceof Row
|| logicalPlan instanceof Aggregate) {
return logicalPlan.outputSet();
}
if (logicalPlan instanceof GeneratingPlan<?> generating) {
return new AttributeSet(generating.generatedAttributes());
}
if (logicalPlan instanceof MvExpand mvExpand) {
return new AttributeSet(mvExpand.expanded());
}

return AttributeSet.EMPTY;
}
}

static class PhysicalPlanDependencyCheck extends DependencyConsistency<PhysicalPlan> {
@Override
protected AttributeSet generates(PhysicalPlan physicalPlan) {
// source-like operators
if (physicalPlan instanceof EsSourceExec
|| physicalPlan instanceof EsStatsQueryExec
|| physicalPlan instanceof EsQueryExec
|| physicalPlan instanceof LocalSourceExec
|| physicalPlan instanceof RowExec
|| physicalPlan instanceof ExchangeExec
|| physicalPlan instanceof ExchangeSourceExec
|| physicalPlan instanceof AggregateExec
|| physicalPlan instanceof ShowExec) {
return physicalPlan.outputSet();
}

if (physicalPlan instanceof FieldExtractExec fieldExtractExec) {
return new AttributeSet(fieldExtractExec.attributesToExtract());
}
if (physicalPlan instanceof EvalExec eval) {
return new AttributeSet(Expressions.asAttributes(eval.fields()));
}
if (physicalPlan instanceof RegexExtractExec extract) {
return new AttributeSet(extract.extractedFields());
}
if (physicalPlan instanceof MvExpandExec mvExpand) {
return new AttributeSet(mvExpand.expanded());
}
if (physicalPlan instanceof EnrichExec enrich) {
return new AttributeSet(Expressions.asAttributes(enrich.enrichFields()));
}

return AttributeSet.EMPTY;
}

@Override
protected AttributeSet references(PhysicalPlan plan) {
if (plan instanceof AggregateExec aggregate) {
if (aggregate.getMode() == AggregateExec.Mode.FINAL) {
// lousy hack - need to generate the intermediate aggs yet the intermediateAggs method keep creating new IDs on each
// call
// in practice, the final aggregate should clearly declare the expected properties not hold on the original ones
// as they no longer apply
return aggregate.inputSet();
}
}
return plan.references();
}
}

}
Loading