[DOCS] Add concepts section to analysis topic #50801

Merged (2 commits) on Jan 16, 2020

2 changes: 1 addition & 1 deletion docs/reference/analysis.asciidoc
@@ -144,7 +144,7 @@ looking for:

include::analysis/overview.asciidoc[]

-include::analysis/anatomy.asciidoc[]
+include::analysis/concepts.asciidoc[]

include::analysis/testing.asciidoc[]

18 changes: 5 additions & 13 deletions docs/reference/analysis/anatomy.asciidoc
@@ -1,5 +1,5 @@
[[analyzer-anatomy]]
-== Anatomy of an analyzer
+=== Anatomy of an analyzer

@jrodewig (Contributor, Author) commented on Jan 9, 2020:
No substantive changes were made to this page. Simply adjusted headings and attributes so they would display normally. I plan to do an overhaul of this page in an upcoming PR.


An _analyzer_ -- whether built-in or custom -- is just a package which
contains three lower-level building blocks: _character filters_,
@@ -10,8 +10,7 @@ blocks into analyzers suitable for different languages and types of text.
Elasticsearch also exposes the individual building blocks so that they can be
combined to define new <<analysis-custom-analyzer,`custom`>> analyzers.

-[float]
-=== Character filters
+==== Character filters

A _character filter_ receives the original text as a stream of characters and
can transform the stream by adding, removing, or changing characters. For
@@ -22,8 +21,7 @@ elements like `<b>` from the stream.
An analyzer may have *zero or more* <<analysis-charfilters,character filters>>,
which are applied in order.

-[float]
-=== Tokenizer
+==== Tokenizer

A _tokenizer_ receives a stream of characters, breaks it up into individual
_tokens_ (usually individual words), and outputs a stream of _tokens_. For
@@ -37,9 +35,7 @@ the term represents.

An analyzer must have *exactly one* <<analysis-tokenizers,tokenizer>>.


-[float]
-=== Token filters
+==== Token filters

A _token filter_ receives the token stream and may add, remove, or change
tokens. For example, a <<analysis-lowercase-tokenfilter,`lowercase`>> token
@@ -53,8 +49,4 @@ Token filters are not allowed to change the position or character offsets of
each token.

An analyzer may have *zero or more* <<analysis-tokenfilters,token filters>>,
-which are applied in order.
-
-
-
-
+which are applied in order.
11 changes: 11 additions & 0 deletions docs/reference/analysis/concepts.asciidoc
@@ -0,0 +1,11 @@
[[analysis-concepts]]
== Text analysis concepts
++++
<titleabbrev>Concepts</titleabbrev>
++++

This section explains the fundamental concepts of text analysis in {es}.

* <<analyzer-anatomy>>

Review comment from the PR author (Contributor, Author):
This is a bit sparse at the moment, but I plan to move some of the index/search time analysis content here from the top-level Analysis page in an upcoming PR.


include::anatomy.asciidoc[]
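
For readers of this diff, the pipeline that `anatomy.asciidoc` describes (zero or more character filters, exactly one tokenizer, then zero or more token filters, each group applied in order) can be sketched roughly as follows. This is an illustrative Python sketch, not Elasticsearch code: the names `strip_html`, `whitespace_tokenizer`, `lowercase`, and `analyze` are hypothetical, the regular expression is a crude stand-in for real HTML stripping, and the sketch ignores the positions and character offsets that real tokenizers record.

```python
import re
from typing import Callable, List

# Hypothetical type aliases for the three building blocks.
CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def strip_html(text: str) -> str:
    """Character filter: remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text: str) -> List[str]:
    """Tokenizer: split the character stream on runs of whitespace."""
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def analyze(
    text: str,
    char_filters: List[CharFilter],
    tokenizer: Tokenizer,
    token_filters: List[TokenFilter],
) -> List[str]:
    """Run the three stages in the order the page describes."""
    for char_filter in char_filters:      # zero or more, applied in order
        text = char_filter(text)
    tokens = tokenizer(text)              # exactly one
    for token_filter in token_filters:    # zero or more, applied in order
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "The <b>QUICK</b> brown fox",
    char_filters=[strip_html],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase],
)
print(tokens)  # ['the', 'quick', 'brown', 'fox']
```

Passing empty lists for `char_filters` and `token_filters` leaves only the tokenizer stage, which mirrors the "zero or more" wording on the page.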