More debugging info for significant_text (#72727)

Adds some extra debugging information to make it clear that you are running `significant_text`. Also adds some timing information around the `_source` fetch and the `terms` accumulation. This lets you calculate a third useful timing number: the analysis time. It is `collect_ns - fetch_ns - accumulation_ns`. This also adds a half dozen extra REST tests to get a *fairly* comprehensive set of the operations this supports. It doesn't cover all of the significance heuristic parsing, but it's certainly much better than what we had.
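
As a rough illustration of the subtraction described above, here is a minimal Python sketch. The function name and the sample numbers are hypothetical; only the three `*_ns` names come from the commit message, and where exactly they appear in the profile output is not shown here.

```python
def analysis_ns(collect_ns: int, fetch_ns: int, accumulation_ns: int) -> int:
    """Derive the analysis time by subtracting the two measured phases
    (the _source fetch and the terms accumulation) from the total collect time."""
    return collect_ns - fetch_ns - accumulation_ns


# Made-up numbers purely for illustration, not taken from a real profile.
print(analysis_ns(collect_ns=1_000_000, fetch_ns=250_000, accumulation_ns=150_000))
```

With these sample values, 600,000 ns of the collect phase would be attributed to analysis.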
elastic · May 10, 2021 · a43b166
1 parent 8069e9b
commit a43b166

Showing 10 changed files with 790 additions and 244 deletions.
diff --git a/docs/reference/aggregations/bucket/significantterms-aggregation.asciidoc b/docs/reference/aggregations/bucket/significantterms-aggregation.asciidoc
@@ -374,7 +374,7 @@ Chi square behaves like mutual information and can be configured with the same p
 
 
 ===== Google normalized distance
-Google normalized distance as described in "The Google Similarity Distance", Cilibrasi and Vitanyi, 2007 (https://arxiv.org/pdf/cs/0412098v3.pdf) can be used as significance score by adding the parameter
+Google normalized distance as described in https://arxiv.org/pdf/cs/0412098v3.pdf["The Google Similarity Distance", Cilibrasi and Vitanyi, 2007] can be used as significance score by adding the parameter
 
 [source,js]
 --------------------------------------------------
@@ -408,7 +408,7 @@ Multiple observations are typically required to reinforce a view so it is recomm
 
 Roughly, `mutual_information` prefers high frequent terms even if they occur also frequently in the background. For example, in an analysis of natural language text this might lead to selection of stop words. `mutual_information` is unlikely to select very rare terms like misspellings. `gnd` prefers terms with a high co-occurrence and avoids selection of stopwords. It might be better suited for synonym detection. However, `gnd` has a tendency to select very rare terms that are, for example, a result of misspelling. `chi_square` and `jlh` are somewhat in-between.
 
-It is hard to say which one of the different heuristics will be the best choice as it depends on what the significant terms are used for (see for example [Yang and Pedersen, "A Comparative Study on Feature Selection in Text Categorization", 1997](http://courses.ischool.berkeley.edu/i256/f06/papers/yang97comparative.pdf) for a study on using significant terms for feature selection for text classification).
+It is hard to say which one of the different heuristics will be the best choice as it depends on what the significant terms are used for (see for example http://courses.ischool.berkeley.edu/i256/f06/papers/yang97comparative.pdf[Yang and Pedersen, "A Comparative Study on Feature Selection in Text Categorization", 1997] for a study on using significant terms for feature selection for text classification).
 
 If none of the above measures suits your usecase than another option is to implement a custom significance measure:
 

diff --git a/...api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.aggregation/90_sig_text.yml b/...api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.aggregation/90_sig_text.yml