Add crawler metrics into the stats metricset for Enterprise Search #28790

Merged
merged 6 commits into from
Nov 5, 2021
Changes from 4 commits
363 changes: 363 additions & 0 deletions metricbeat/docs/fields.asciidoc
@@ -32795,6 +32795,369 @@ type: long

--

[float]
=== crawler

Aggregate stats on the functioning of the crawler subsystem within Enterprise Search.


[float]
=== global

Global deployment-wide metrics for the crawler.


[float]
=== crawl_requests

Crawl request summary for the deployment.


*`enterprisesearch.stats.crawler.global.crawl_requests.pending`*::
+
--
Total number of crawl requests waiting to be processed.

type: long

--

*`enterprisesearch.stats.crawler.global.crawl_requests.active`*::
+
--
Total number of crawl requests currently being processed (running crawls).

type: long

--

*`enterprisesearch.stats.crawler.global.crawl_requests.successful`*::
+
--
Total number of crawl requests that have succeeded.

type: long

--

*`enterprisesearch.stats.crawler.global.crawl_requests.failed`*::
+
--
Total number of failed crawl requests.

type: long

--
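As a rough illustration of how these four counters might be consumed, the sketch below reads them from an event and derives a failure ratio. It assumes the event arrives as nested objects mirroring the dotted field names; the helper names (`get_path`, `crawl_request_summary`) and the sample numbers are hypothetical and not part of this PR.

```python
from typing import Any, Mapping


def get_path(doc: Mapping[str, Any], dotted: str, default: int = 0) -> Any:
    """Walk a nested mapping using a dotted field name; return default if absent."""
    node: Any = doc
    for key in dotted.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return default
        node = node[key]
    return node


def crawl_request_summary(event: Mapping[str, Any]) -> dict:
    """Collect the crawl_requests counters and add a derived failure ratio."""
    base = "enterprisesearch.stats.crawler.global.crawl_requests"
    counts = {
        name: get_path(event, f"{base}.{name}")
        for name in ("pending", "active", "successful", "failed")
    }
    finished = counts["successful"] + counts["failed"]
    # No finished crawls yet means the ratio is undefined, not zero.
    counts["failure_ratio"] = counts["failed"] / finished if finished else None
    return counts


# Hypothetical sample event; the numbers are made up for illustration.
sample_event = {"enterprisesearch": {"stats": {"crawler": {"global": {
    "crawl_requests": {"pending": 2, "active": 1, "successful": 30, "failed": 10}
}}}}}

summary = crawl_request_summary(sample_event)
print(summary)
```

Returning `None` when nothing has finished keeps "no data" distinct from "no failures".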

[float]
=== node

Node-level statistics for the crawler.


*`enterprisesearch.stats.crawler.node.pages_visited`*::
+
--
Total number of pages visited by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.urls_allowed`*::
+
--
Total number of URLs allowed by the crawler during discovery since the instance start.

type: long

--
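Because these node-level counters are described as totals "since the instance start", a consumer that wants per-interval activity has to difference consecutive samples. The sketch below shows one possible way to do that, under the assumption that a drop in the value means the instance restarted and the counter began again from zero. The function name and the sample readings are hypothetical.

```python
def counter_delta(previous: int, current: int) -> int:
    """Increase between two samples of a since-start counter.

    A current value below the previous one is treated as a restart
    (an assumption), so the whole current value counts as new activity.
    """
    if current < previous:
        return current
    return current - previous


# Hypothetical pages_visited readings; the drop to 40 simulates a restart.
samples = [100, 160, 250, 40, 90]
deltas = [counter_delta(a, b) for a, b in zip(samples, samples[1:])]
print(deltas)
```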

[float]
=== urls_denied

Total number of URLs denied by the crawler during discovery since the instance start, broken down by deny reason.


*`enterprisesearch.stats.crawler.node.urls_denied.already_seen`*::
+
--
Total number of URLs not followed because of URL de-duplication (each URL is visited only once).

type: long

--

*`enterprisesearch.stats.crawler.node.urls_denied.domain_filter_denied`*::
+
--
Total number of URLs denied because of an unknown domain.

type: long

--

*`enterprisesearch.stats.crawler.node.urls_denied.incorrect_protocol`*::
+
--
Total number of URLs with incorrect/invalid/unsupported protocols.

type: long

--

*`enterprisesearch.stats.crawler.node.urls_denied.link_too_deep`*::
+
--
Total number of URLs not followed due to crawl depth limits.

type: long

--

*`enterprisesearch.stats.crawler.node.urls_denied.nofollow`*::
+
--
Total number of URLs denied due to a nofollow meta tag or URL attribute.

type: long

--

*`enterprisesearch.stats.crawler.node.urls_denied.unsupported_content_type`*::
+
--
Total number of URLs denied due to an unsupported content type.

type: long

--
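One way to read the `urls_denied` breakdown is to total the per-reason counters and compare them with `urls_allowed`. The sketch below does that under the assumption (not stated in the field descriptions) that every discovered URL is counted exactly once, as either allowed or denied. The function name and the sample values are hypothetical.

```python
def denied_breakdown(urls_denied: dict, urls_allowed: int) -> dict:
    """Summarise deny reasons relative to all URLs evaluated during discovery."""
    total_denied = sum(urls_denied.values())
    # Assumption: evaluated URLs = allowed + denied, with no overlap.
    evaluated = total_denied + urls_allowed
    top_reason = max(urls_denied, key=urls_denied.get) if total_denied else None
    return {
        "total_denied": total_denied,
        "denied_share": total_denied / evaluated if evaluated else None,
        "top_reason": top_reason,
    }


# Hypothetical counter values keyed by the deny reasons listed above.
sample_denied = {
    "already_seen": 50,
    "domain_filter_denied": 30,
    "incorrect_protocol": 0,
    "link_too_deep": 10,
    "nofollow": 5,
    "unsupported_content_type": 5,
}

breakdown = denied_breakdown(sample_denied, urls_allowed=300)
print(breakdown)
```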

[float]
=== status_codes

HTTP request result counts, broken down by status code.


*`enterprisesearch.stats.crawler.node.status_codes.200`*::
+
--
Total number of HTTP 200 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.301`*::
+
--
Total number of HTTP 301 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.302`*::
+
--
Total number of HTTP 302 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.400`*::
+
--
Total number of HTTP 400 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.401`*::
+
--
Total number of HTTP 401 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.402`*::
+
--
Total number of HTTP 402 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.403`*::
+
--
Total number of HTTP 403 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.404`*::
+
--
Total number of HTTP 404 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.405`*::
+
--
Total number of HTTP 405 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.410`*::
+
--
Total number of HTTP 410 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.422`*::
+
--
Total number of HTTP 422 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.429`*::
+
--
Total number of HTTP 429 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.500`*::
+
--
Total number of HTTP 500 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.501`*::
+
--
Total number of HTTP 501 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.502`*::
+
--
Total number of HTTP 502 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.503`*::
+
--
Total number of HTTP 503 responses seen by the crawler since the instance start.

type: long

--

*`enterprisesearch.stats.crawler.node.status_codes.504`*::
+
--
Total number of HTTP 504 responses seen by the crawler since the instance start.

type: long

--
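With one counter per status code, it can be convenient to roll the values up into classes (2xx, 3xx, 4xx, 5xx) before charting or alerting. The following is a minimal sketch of such a roll-up. It assumes the `status_codes` object is a mapping from code to count; the function name and the sample counts are hypothetical.

```python
from collections import Counter


def status_classes(status_codes: dict) -> dict:
    """Group per-code counters into classes such as '2xx' or '5xx'."""
    classes: Counter = Counter()
    for code, count in status_codes.items():
        # The first digit of the code identifies its class.
        classes[f"{str(code)[0]}xx"] += count
    return dict(classes)


# Hypothetical counts for a subset of the status codes documented above.
sample_codes = {"200": 900, "301": 40, "302": 10, "404": 30, "429": 5, "500": 10, "503": 5}

by_class = status_classes(sample_codes)
total = sum(by_class.values())
# Share of responses that were client or server errors.
error_ratio = (by_class.get("4xx", 0) + by_class.get("5xx", 0)) / total
print(by_class, error_ratio)
```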

[float]
=== queue_size

Total current URL queue size for the instance.


*`enterprisesearch.stats.crawler.node.queue_size.primary`*::
+
--
Total number of URLs waiting to be crawled by the instance.

type: long

--

*`enterprisesearch.stats.crawler.node.queue_size.purge`*::
+
--
Total number of URLs waiting to be checked by the purge crawl phase.

type: long

--

*`enterprisesearch.stats.crawler.node.active_threads`*::
+
--
Total number of crawler worker threads currently active on the instance.

type: long

--

[float]
=== workers

Crawler workers information for the instance.


*`enterprisesearch.stats.crawler.node.workers.pool_size`*::
+
--
Total size of the crawl workers pool (number of concurrent crawls possible) for the instance.

type: long

--

*`enterprisesearch.stats.crawler.node.workers.active`*::
+
--
Total number of currently active crawl workers (running crawls) for the instance.

type: long

--

*`enterprisesearch.stats.crawler.node.workers.available`*::
+
--
Total number of currently available (free) crawl workers for the instance.

type: long

--
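The three `workers` fields lend themselves to a simple utilisation figure. The sketch below derives one; the function name, the `saturated` flag, and the sample values are hypothetical. It does not assume that `active` and `available` always add up to `pool_size`, since the field descriptions do not state that.

```python
def worker_utilization(workers: dict) -> dict:
    """Derive pool utilisation from the workers counters."""
    pool_size = workers.get("pool_size", 0)
    active = workers.get("active", 0)
    available = workers.get("available", 0)
    return {
        # Undefined when the pool size is zero or missing.
        "utilization": active / pool_size if pool_size else None,
        # No free workers: a new crawl would presumably have to wait.
        "saturated": pool_size > 0 and available == 0,
    }


# Hypothetical reading for a single instance.
sample_workers = {"pool_size": 10, "active": 4, "available": 6}

usage = worker_utilization(sample_workers)
print(usage)
```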

[float]
=== product_usage

2 changes: 1 addition & 1 deletion x-pack/metricbeat/module/enterprisesearch/_meta/Dockerfile
@@ -9,4 +9,4 @@ COPY docker-entrypoint-dependencies.sh /usr/local/bin/
ENTRYPOINT ["tini", "--", "/usr/local/bin/docker-entrypoint-dependencies.sh"]

HEALTHCHECK --interval=1s --retries=300 --start-period=60s \
- CMD curl --user elastic:changeme --fail --silent http://localhost:3002/api/as/v1/internal/health
+ CMD curl --user elastic:changeme --fail --silent http://localhost:3002/api/ent/v1/internal/health
Contributor Author

The old App Search-scoped API has been deprecated and removed in 8.0.

6 changes: 3 additions & 3 deletions x-pack/metricbeat/module/enterprisesearch/docker-compose.yml
@@ -2,11 +2,11 @@ version: '2.3'

services:
enterprise_search:
- image: docker.elastic.co/integrations-ci/beats-enterprisesearch:${ENT_VERSION:-7.16.0-SNAPSHOT}
+ image: docker.elastic.co/integrations-ci/beats-enterprisesearch:${ENT_VERSION:-8.0.0-SNAPSHOT}
build:
context: ./_meta
args:
- ENT_VERSION: ${ENT_VERSION:-7.16.0-SNAPSHOT}
+ ENT_VERSION: ${ENT_VERSION:-8.0.0-SNAPSHOT}
depends_on:
- "elasticsearch"
environment:
@@ -15,8 +15,8 @@ services:
- "elasticsearch.password=changeme"
- "elasticsearch.host=http://elasticsearch:9200"
- "allow_es_settings_modification=true"
- "ent_search.auth.native1.source=elasticsearch-native"
- "secret_management.encryption_keys=[4a2cd3f81d39bf28738c10db0ca782095ffac07279561809eecc722e0c20eb09]"
- "kibana.host=http://localhost:5601"
Contributor Author

This is needed to make Enterprise Search start; we don't actually need Kibana there.

- "JAVA_OPTS=-Xms2g -Xmx2g"
# Make it possible to run against slightly older ES versions
- "elasticsearch.ignore_version_mismatch=true"