Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add children aggregation #6936

Merged
merged 1 commit into from Aug 19, 2014

Conversation

Projects
None yet
4 participants
@martijnvg
Copy link
Member

martijnvg commented Jul 21, 2014

Add children bucket aggregation that is able to map buckets between parent types and child types based on top of the parent/child support. It is the equivalent of the has_child filter/query in the query dsl.

Example request:

GET /stack/question/_search?search_type=count
{
  "aggs": {
    "top-tags": {
      "terms": {
        "field": "tags",
        "size": 10
      },
      "aggs": {
        "to-answers": {
          "children": {
            "child_type" : "answer"
          },
          "aggs": {
            "top-names": {
              "terms": {
                "field": "owner_display_name",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}

Example response:

{
   "took": 90,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 175275,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "top-tags": {
         "buckets": [
            {
               "key": "windows-7",
               "doc_count": 25365,
               "to-answers": {
                  "doc_count": 36004,
                  "top-names": {
                     "buckets": [
                        {
                           "key": "molly7244",
                           "doc_count": 274
                        },
                        {
                           "key": "chris",
                           "doc_count": 19
                        },
                        {
                           "key": "david",
                           "doc_count": 14
                        },
                        {
                           "key": "dan",
                           "doc_count": 12
                        },
                        {
                           "key": "james",
                           "doc_count": 7
                        },
                        {
                           "key": "b",
                           "doc_count": 5
                        },
                        {
                           "key": "alex",
                           "doc_count": 4
                        },
                        {
                           "key": "peter",
                           "doc_count": 4
                        },
                        {
                           "key": "s",
                           "doc_count": 4
                        },
                        {
                           "key": "cody",
                           "doc_count": 3
                        }
                     ]
                  }
               }
            },
            {
               "key": "linux",
               "doc_count": 18342,
               "to-answers": {
                  "doc_count": 6655,
                  "top-names": {
                     "buckets": [
                        {
                           "key": "abrams",
                           "doc_count": 25
                        },
                        {
                           "key": "ignacio",
                           "doc_count": 25
                        },
                        {
                           "key": "vazquez",
                           "doc_count": 25
                        },
                        {
                           "key": "chris",
                           "doc_count": 9
                        },
                        {
                           "key": "michael",
                           "doc_count": 7
                        },
                        {
                           "key": "basile",
                           "doc_count": 6
                        },
                        {
                           "key": "david",
                           "doc_count": 6
                        },
                        {
                           "key": "alex",
                           "doc_count": 4
                        },
                        {
                           "key": "botykai",
                           "doc_count": 3
                        },
                        {
                           "key": "paul",
                           "doc_count": 3
                        }
                     ]
                  }
               }
            },
            {
               "key": "windows",
               "doc_count": 18119,
               "to-answers": {
                  "doc_count": 24051,
                  "top-names": {
                     "buckets": [
                        {
                           "key": "molly7244",
                           "doc_count": 265
                        },
                        {
                           "key": "david",
                           "doc_count": 27
                        },
                        {
                           "key": "chris",
                           "doc_count": 26
                        },
                        {
                           "key": "diago",
                           "doc_count": 9
                        },
                        {
                           "key": "john",
                           "doc_count": 7
                        },
                        {
                           "key": "paxdiablo",
                           "doc_count": 7
                        },
                        {
                           "key": "ben",
                           "doc_count": 6
                        },
                        {
                           "key": "mark",
                           "doc_count": 6
                        },
                        {
                           "key": "adam",
                           "doc_count": 5
                        },
                        {
                           "key": "c",
                           "doc_count": 5
                        }
                     ]
                  }
               }
            },
            {
               "key": "osx",
               "doc_count": 10971,
               "to-answers": {
                  "doc_count": 5902,
                  "top-names": {
                     "buckets": [
                        {
                           "key": "diago",
                           "doc_count": 4
                        },
                        {
                           "key": "albert",
                           "doc_count": 3
                        },
                        {
                           "key": "asmus",
                           "doc_count": 3
                        },
                        {
                           "key": "molly7244",
                           "doc_count": 3
                        },
                        {
                           "key": "aaron",
                           "doc_count": 2
                        },
                        {
                           "key": "abizern",
                           "doc_count": 2
                        },
                        {
                           "key": "adam",
                           "doc_count": 2
                        },
                        {
                           "key": "duskwuff",
                           "doc_count": 2
                        },
                        {
                           "key": "johnsyweb",
                           "doc_count": 2
                        },
                        {
                           "key": "mark",
                           "doc_count": 2
                        }
                     ]
                  }
               }
            },
            {
               "key": "ubuntu",
               "doc_count": 8743,
               "to-answers": {
                  "doc_count": 8784,
                  "top-names": {
                     "buckets": [
                        {
                           "key": "ignacio",
                           "doc_count": 9
                        },
                        {
                           "key": "abrams",
                           "doc_count": 8
                        },
                        {
                           "key": "molly7244",
                           "doc_count": 8
                        },
                        {
                           "key": "david",
                           "doc_count": 7
                        },
                        {
                           "key": "pate",
                           "doc_count": 6
                        },
                        {
                           "key": "roger",
                           "doc_count": 6
                        },
                        {
                           "key": "chris",
                           "doc_count": 5
                        },
                        {
                           "key": "vazquez",
                           "doc_count": 5
                        },
                        {
                           "key": "paul",
                           "doc_count": 3
                        },
                        {
                           "key": "rob",
                           "doc_count": 3
                        }
                     ]
                  }
               }
            },
            {
               "key": "windows-xp",
               "doc_count": 7517,
               "to-answers": {
                  "doc_count": 13610,
                  "top-names": {
                     "buckets": [
                        {
                           "key": "molly7244",
                           "doc_count": 232
                        },
                        {
                           "key": "chris",
                           "doc_count": 9
                        },
                        {
                           "key": "john",
                           "doc_count": 9
                        },
                        {
                           "key": "david",
                           "doc_count": 8
                        },
                        {
                           "key": "dave",
                           "doc_count": 5
                        },
                        {
                           "key": "b",
                           "doc_count": 4
                        },
                        {
                           "key": "bart",
                           "doc_count": 4
                        },
                        {
                           "key": "joeqwerty",
                           "doc_count": 4
                        },
                        {
                           "key": "mike",
                           "doc_count": 4
                        },
                        {
                           "key": "s",
                           "doc_count": 4
                        }
                     ]
                  }
               }
            },
            {
               "key": "networking",
               "doc_count": 6739,
               "to-answers": {
                  "doc_count": 2076,
                  "top-names": {
                     "buckets": [
                        {
                           "key": "molly7244",
                           "doc_count": 6
                        },
                        {
                           "key": "alnitak",
                           "doc_count": 5
                        },
                        {
                           "key": "chris",
                           "doc_count": 3
                        },
                        {
                           "key": "albin",
                           "doc_count": 2
                        },
                        {
                           "key": "brian",
                           "doc_count": 2
                        },
                        {
                           "key": "everett",
                           "doc_count": 2
                        },
                        {
                           "key": "fishdump",
                           "doc_count": 2
                        },
                        {
                           "key": "m",
                           "doc_count": 2
                        },
                        {
                           "key": "mike",
                           "doc_count": 2
                        },
                        {
                           "key": "p",
                           "doc_count": 2
                        }
                     ]
                  }
               }
            },
            {
               "key": "mac",
               "doc_count": 5590,
               "to-answers": {
                  "doc_count": 999,
                  "top-names": {
                     "buckets": [
                        {
                           "key": "abrams",
                           "doc_count": 2
                        },
                        {
                           "key": "ignacio",
                           "doc_count": 2
                        },
                        {
                           "key": "vazquez",
                           "doc_count": 2
                        },
                        {
                           "key": "adam",
                           "doc_count": 1
                        },
                        {
                           "key": "anon",
                           "doc_count": 1
                        },
                        {
                           "key": "aravindhanarvi",
                           "doc_count": 1
                        },
                        {
                           "key": "arkaaito",
                           "doc_count": 1
                        },
                        {
                           "key": "ballard",
                           "doc_count": 1
                        },
                        {
                           "key": "bart",
                           "doc_count": 1
                        },
                        {
                           "key": "ben",
                           "doc_count": 1
                        }
                     ]
                  }
               }
            },
            {
               "key": "wireless-networking",
               "doc_count": 4409,
               "to-answers": {
                  "doc_count": 6497,
                  "top-names": {
                     "buckets": [
                        {
                           "key": "molly7244",
                           "doc_count": 61
                        },
                        {
                           "key": "chris",
                           "doc_count": 5
                        },
                        {
                           "key": "mike",
                           "doc_count": 5
                        },
                        {
                           "key": "tom",
                           "doc_count": 5
                        },
                        {
                           "key": "bart",
                           "doc_count": 4
                        },
                        {
                           "key": "user48838",
                           "doc_count": 4
                        },
                        {
                           "key": "alex",
                           "doc_count": 2
                        },
                        {
                           "key": "anon31097",
                           "doc_count": 2
                        },
                        {
                           "key": "joeqwerty",
                           "doc_count": 2
                        },
                        {
                           "key": "kevin",
                           "doc_count": 2
                        }
                     ]
                  }
               }
            },
            {
               "key": "windows-8",
               "doc_count": 3601,
               "to-answers": {
                  "doc_count": 4263,
                  "top-names": {
                     "buckets": [
                        {
                           "key": "molly7244",
                           "doc_count": 3
                        },
                        {
                           "key": "msft",
                           "doc_count": 2
                        },
                        {
                           "key": "user172132",
                           "doc_count": 2
                        },
                        {
                           "key": "aj",
                           "doc_count": 1
                        },
                        {
                           "key": "algorithms",
                           "doc_count": 1
                        },
                        {
                           "key": "andersson",
                           "doc_count": 1
                        },
                        {
                           "key": "ashafiee",
                           "doc_count": 1
                        },
                        {
                           "key": "balakrishnan",
                           "doc_count": 1
                        },
                        {
                           "key": "bobby",
                           "doc_count": 1
                        },
                        {
                           "key": "brockschmidt",
                           "doc_count": 1
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}
This aggregation relies on the <<mapping-parent-field,_parent field>> in the mapping. This aggregation has a single option:
* `child_type` - The what child type the buckets in the parent space should be mapped to.

For example, lets say we have a index of questions and answers. The answer type has the following `_parent` field in the mapping:

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 21, 2014

Contributor

s/lets/let's/

two different kinds of documents.

[source,js]
--------------------------------------------------

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 21, 2014

Contributor

Maybe add a GET /stackoverflow/question to make clear that the query needs to apply to the question type?

super(name, factories, aggregationContext, parent);
this.childFilter = childFilter;
this.parentFilter = parentFilter;
this.parentOrdToBucket = aggregationContext.bigArrays().newLongArray(maxOrd);

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 21, 2014

Contributor

you might want to pass false to not clear it since you fill it on the next line anyway

long globalOrdinal = globalOrdinals.getOrd(docId);
if (globalOrdinal != BytesValues.WithOrdinals.MISSING_ORDINAL) {
parentOrdToBucket.set(globalOrdinal, bucketOrdinal);
}

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 21, 2014

Contributor

What if a parent can match several bucket ordinals?

This comment has been minimized.

Copy link
@martijnvg

martijnvg Jul 21, 2014

Author Member

Good point, that will result in incorrect results. (the last seen bucket, will be the one remembered)
I'll rewrite it such a way that multiple buckets per parent work as expected.

@@ -159,6 +159,7 @@ public void execute(SearchContext searchContext) throws QueryPhaseExecutionExcep
}
searchContext.queryResult().topDocs(topDocs);
} catch (Throwable e) {
e.printStackTrace();

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 21, 2014

Contributor

haha, I know this one :) I forgot to remove it before opening several PRs :)

This comment has been minimized.

Copy link
@martijnvg

martijnvg Jul 21, 2014

Author Member

oops :)

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Jul 21, 2014

Looks good overall, but I am suspicious that the aggregator does not always do the right thing (see comment in ChildrenAggregator).

@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented Jul 28, 2014

Any reason you use child_type when the has_child filter supports just type? I'd prefer type, as the child_ part is redundant.

@martijnvg

This comment has been minimized.

Copy link
Member Author

martijnvg commented Jul 28, 2014

@clintongormley Make sense, the context makes clear that it must be a child type. I will change child_type into type

@martijnvg

This comment has been minimized.

Copy link
Member Author

martijnvg commented Aug 18, 2014

Finally got back to this PR. I applied the feedback. Main changes are:

  • Renamed the child_type option to type
  • Support multiple buckets per parent id. The only situation I can think of when multiple buckets belong to the same parent id, is when the parent agg is a terms agg and that is ran on a multivalued field.

@martijnvg martijnvg added the review label Aug 18, 2014

@jpountz

View changes

docs/reference/search/aggregations/bucket/children-aggregation.asciidoc Outdated
"aggs": {
"top-names": {
"terms": {
"field": "owner_display_name",

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 18, 2014

Contributor

wouldn't it be owner.display_name?

This comment has been minimized.

Copy link
@martijnvg

martijnvg Aug 18, 2014

Author Member

yes it would!

@jpountz

View changes

docs/reference/search/aggregations/bucket/children-aggregation.asciidoc Outdated
"key": "windows-server-2003",
"doc_count": 25365,
"to-answers": {
"doc_count": 36004,

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 18, 2014

Contributor

Maybe put an anchor here to explain that this would be the number of anwers for this tag. I'm not sure how obvious it is.

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 18, 2014

Contributor

(and that the upper count is the number of questions)

@jpountz

View changes

src/main/java/org/elasticsearch/search/aggregations/bucket/children/ChildrenParser.java Outdated
ParentFieldMapper parentFieldMapper = childDocMapper.parentFieldMapper();
if (!parentFieldMapper.active()) {
throw new SearchParseException(context, "[children] _parent field not configured");
}

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 18, 2014

Contributor

The two above checks are the same?

This comment has been minimized.

Copy link
@martijnvg

martijnvg Aug 18, 2014

Author Member

This check is redundant now. Before child type could point to a parent type that did exist yet, this no longer the case.

This comment has been minimized.

Copy link
@martijnvg

martijnvg Aug 18, 2014

Author Member

wait, this is false. You can actually create a mapping with a child type that points to a parent type that doesn't exist, so this validation would prevent from throwing a NPE down the road when we want to go the type filter for the parent type.

@jpountz

View changes

src/main/java/org/elasticsearch/search/aggregations/support/ValuesSource.java Outdated
@@ -288,6 +290,78 @@ public long globalMaxOrd(IndexSearcher indexSearcher) {
}
}

public static class ParentChild extends WithOrdinals implements ReaderContextAware {

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 18, 2014

Contributor

I'm not sure this should extend WithOrdinals? (eg. AtomicParentChildFieldData doesn't extend AtomicOrdinalsFieldData)

@jpountz

View changes

...n/java/org/elasticsearch/search/aggregations/bucket/children/ParentToChildrenAggregator.java Outdated

SortedDocValues globalOrdinals = valuesSource.globalOrdinalsValues(parentType);
DocIdSet childDocIdSet = childFilter.getDocIdSet(atomicReaderContext, atomicReaderContext.reader().getLiveDocs());
DocIdSetIterator childDocsIter = childDocIdSet.iterator();

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 18, 2014

Contributor

Can the iterator sometimes be null?

@jpountz

View changes

...n/java/org/elasticsearch/search/aggregations/bucket/children/ParentToChildrenAggregator.java Outdated
globalOrdinals = valuesSource.globalOrdinalsValues(parentType);
assert globalOrdinals != null;
try {
parentDocs = (Bits) parentFilter.getDocIdSet(reader, null);

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 18, 2014

Contributor

Hmm I think it would be cleaner to call bits() rather than casting?

@jpountz

View changes

...n/java/org/elasticsearch/search/aggregations/bucket/children/ParentToChildrenAggregator.java Outdated
// Only pay the extra storage price if the a parentOrd has multiple buckets
// Most of the times a parent doesn't have multiple buckets, since there is only one document per parent ord,
// only in the case of terms agg if a parent doc has multiple terms per field this is needed:
private final LongBitSet parentOrdWithMultipleBuckets;

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 18, 2014

Contributor

Maybe we should get rid of that bit set and just rely on the fact that the long[] in parentOrdToOtherBuckets is not null?

This comment has been minimized.

Copy link
@martijnvg

martijnvg Aug 18, 2014

Author Member

I think has a significant impact when we replay the child docs later on?
Instead of checking is a bit is set we need to check is a key is set in a
map.

On 18 August 2014 15:00, Adrien Grand notifications@github.com wrote:

In
src/main/java/org/elasticsearch/search/aggregations/bucket/children/ParentToChildrenAggregator.java:

+// aggregation, for this reason that collector can't be used
+public class ParentToChildrenAggregator extends SingleBucketAggregator implements ReaderContextAware {
+

  • private final String parentType;
  • private final Filter childFilter;
  • private final Filter parentFilter;
  • private final ValuesSource.Bytes.WithOrdinals.ParentChild valuesSource;
  • // Maybe use PagedGrowableWriter? This will be less wasteful than LongArray, but then we don't have the reuse feature of BigArrays.
  • // Also if we know the highest possible value that a parent agg will create then we store multiple values into one slot
  • private final LongArray parentOrdToBuckets;
  • // Only pay the extra storage price if the a parentOrd has multiple buckets
  • // Most of the times a parent doesn't have multiple buckets, since there is only one document per parent ord,
  • // only in the case of terms agg if a parent doc has multiple terms per field this is needed:
  • private final LongBitSet parentOrdWithMultipleBuckets;

Maybe we should get rid of that bit set and just rely on the fact that the
long[] in parentOrdToOtherBuckets is not null?


Reply to this email directly or view it on GitHub
https://github.com/elasticsearch/elasticsearch/pull/6936/files#r16351155
.

Met vriendelijke groet,

Martijn van Groningen

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 18, 2014

Contributor

Good question. I'm wondering if a benchmark would really see a difference between both cases (since in the common case the hash table is empty).

Something else I was wondering is whether this aggregation should optimize for the sub-aggregation case. Currently it always allocates an array of length maxOrd so if you put it under a terms aggregation (like in the docs example), this could be pretty memory-intensive. Maybe its default behavior should rather be to use a hash like GlobalOrdinalsStringTermsAggregator.WithHash?

This comment has been minimized.

Copy link
@martijnvg

martijnvg Aug 18, 2014

Author Member

I think since this aggregator extends SingleBucketAggregator we don't need to, since there we'll only be one instance per children definition in the dsl.

@jpountz

View changes

...n/java/org/elasticsearch/search/aggregations/bucket/children/ParentToChildrenAggregator.java Outdated
if (parentOrdWithMultipleBuckets.get(globalOrdinal)) {
long[] bucketOrds = parentOrdToOtherBuckets.get(globalOrdinal);
long[] newBucketOrds = new long[bucketOrds.length + 1]; // Don't oversize
System.arraycopy(bucketOrds, 0, newBucketOrds, 0, bucketOrds.length);

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 18, 2014

Contributor

You can just do: bucketOrds = Arrays.copyOf(bucketsOrds, bucketOrds.length + 1);?

@jpountz jpountz removed the review label Aug 18, 2014

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Aug 18, 2014

@martijnvg Just did another review.

@martijnvg

This comment has been minimized.

Copy link
Member Author

martijnvg commented Aug 18, 2014

@jpountz Thanks for the review! I implemented your feedback and updated the PR.

@martijnvg martijnvg added the review label Aug 18, 2014

@jpountz

View changes

src/main/java/org/elasticsearch/search/aggregations/support/ValuesSource.java Outdated
@@ -288,6 +290,63 @@ public long globalMaxOrd(IndexSearcher indexSearcher) {
}
}

public static class ParentChild extends Bytes implements ReaderContextAware, TopReaderContextAware {

This comment has been minimized.

Copy link
@jpountz

jpountz Aug 19, 2014

Contributor

Can you put in one level higher so that its qualified name will be ValuesSource.Bytes.ParentChild instead of ValuesSource.Bytes.WithOrdinals.ParentChild?

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Aug 19, 2014

LGTM

@jpountz jpountz removed the review label Aug 19, 2014

Aggregations: Add `children` bucket aggregator that is able to map bu…
…ckets between parent types and child types using the already builtin parent/child support.

Closes #6936

martijnvg added a commit that referenced this pull request Aug 19, 2014

Aggregations: Add `children` bucket aggregator that is able to map bu…
…ckets between parent types and child types using the already builtin parent/child support.

Closes #6936

@martijnvg martijnvg merged commit 383e64b into elastic:master Aug 19, 2014

martijnvg added a commit that referenced this pull request Sep 8, 2014

Aggregations: Add `children` bucket aggregator that is able to map bu…
…ckets between parent types and child types using the already builtin parent/child support.

Closes #6936
@jsnod

This comment has been minimized.

Copy link
Contributor

jsnod commented Sep 16, 2014

Can we add "Coming in 1.4.0." to the docs page at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-children-aggregation.html ? Just wasted a few hours trying to figure out why this wouldn't work with my 1.3.2 install

@jsnod

This comment has been minimized.

Copy link
Contributor

jsnod commented Sep 16, 2014

Made PR for doc fix: #7755

@martijnvg

This comment has been minimized.

Copy link
Member Author

martijnvg commented Sep 17, 2014

Sorry for the lost time @afx114, your doc change is in now.

@martijnvg martijnvg deleted the martijnvg:feature/children-agg branch May 18, 2015

@clintongormley clintongormley changed the title Aggregations: Add children aggregation Add children aggregation Jun 6, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.