From 7e93a7e258dea6449342fa3e2386f5c8c6c49064 Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 5 Feb 2025 13:15:11 +0100 Subject: [PATCH 1/8] [Search] Update/consolidate ingest section, move search pipelines content --- .../es-ingestion-overview.md | 41 ------------ raw-migrated-files/toc.yml | 2 - solutions/search/ingest-for-search.md | 64 +++++++++++++++++-- .../search/search-pipelines.md | 38 ++++++----- solutions/toc.yml | 2 + 5 files changed, 77 insertions(+), 70 deletions(-) delete mode 100644 raw-migrated-files/elasticsearch/elasticsearch-reference/es-ingestion-overview.md rename raw-migrated-files/elasticsearch/elasticsearch-reference/ingest-pipeline-search.md => solutions/search/search-pipelines.md (82%) diff --git a/raw-migrated-files/elasticsearch/elasticsearch-reference/es-ingestion-overview.md b/raw-migrated-files/elasticsearch/elasticsearch-reference/es-ingestion-overview.md deleted file mode 100644 index 5f6cac5c92..0000000000 --- a/raw-migrated-files/elasticsearch/elasticsearch-reference/es-ingestion-overview.md +++ /dev/null @@ -1,41 +0,0 @@ -# Add data to {{es}} [es-ingestion-overview] - -There are multiple ways to ingest data into {{es}}. The option that you choose depends on whether you’re working with timestamped data or non-timestamped data, where the data is coming from, its complexity, and more. - -::::{tip} -You can load [sample data](../../../manage-data/ingest.md#_add_sample_data) into your {{es}} cluster using {{kib}}, to get started quickly. - -:::: - - - -## General content [es-ingestion-overview-general-content] - -General content is data that does not have a timestamp. This could be data like vector embeddings, website content, product catalogs, and more. 
For general content, you have the following options for adding data to {{es}} indices: - -* [API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html): Use the {{es}} [Document APIs](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html) to index documents directly, using the Dev Tools [Console](../../../explore-analyze/query-filter/tools/console.md), or cURL. - - If you’re building a website or app, then you can call Elasticsearch APIs using an [{{es}} client](https://www.elastic.co/guide/en/elasticsearch/client/index.html) in the programming language of your choice. If you use the Python client, then check out the `elasticsearch-labs` repo for various [example notebooks](https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/search/python-examples). - -* [File upload](../../../manage-data/ingest.md#upload-data-kibana): Use the {{kib}} file uploader to index single files for one-off testing and exploration. The GUI guides you through setting up your index and field mappings. -* [Web crawler](https://github.com/elastic/crawler): Extract and index web page content into {{es}} documents. -* [Connectors](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html): Sync data from various third-party data sources to create searchable, read-only replicas in {{es}}. - - -## Timestamped data [es-ingestion-overview-timestamped] - -Timestamped data in {{es}} refers to datasets that include a timestamp field. If you use the [Elastic Common Schema (ECS)](https://www.elastic.co/guide/en/ecs/{{ecs_version}}/ecs-reference.html), this field is named `@timestamp`. This could be data like logs, metrics, and traces. - -For timestamped data, you have the following options for adding data to {{es}} data streams: - -* [Elastic Agent and Fleet](https://www.elastic.co/guide/en/fleet/current/fleet-overview.html): The preferred way to index timestamped data. 
Each Elastic Agent based integration includes default ingestion rules, dashboards, and visualizations to start analyzing your data right away. You can use the Fleet UI in {{kib}} to centrally manage Elastic Agents and their policies. -* [Beats](https://www.elastic.co/guide/en/beats/libbeat/current/beats-reference.html): If your data source isn’t supported by Elastic Agent, use Beats to collect and ship data to Elasticsearch. You install a separate Beat for each type of data to collect. -* [Logstash](https://www.elastic.co/guide/en/logstash/current/introduction.html): Logstash is an open source data collection engine with real-time pipelining capabilities that supports a wide variety of data sources. You might use this option because neither Elastic Agent nor Beats supports your data source. You can also use Logstash to persist incoming data, or if you need to send the data to multiple destinations. -* [Language clients](../../../manage-data/ingest/ingesting-data-from-applications.md): The linked tutorials demonstrate how to use {{es}} programming language clients to ingest data from an application. In these examples, {{es}} is running on Elastic Cloud, but the same principles apply to any {{es}} deployment. - -::::{tip} -If you’re interested in data ingestion pipelines for timestamped data, use the decision tree in the [Elastic Cloud docs](../../../manage-data/ingest.md#ec-data-ingest-pipeline) to understand your options. 
- -:::: - - diff --git a/raw-migrated-files/toc.yml b/raw-migrated-files/toc.yml index 7e36e3a085..1e2a87c775 100644 --- a/raw-migrated-files/toc.yml +++ b/raw-migrated-files/toc.yml @@ -603,7 +603,6 @@ toc: - file: elasticsearch/elasticsearch-reference/document-level-security.md - file: elasticsearch/elasticsearch-reference/documents-indices.md - file: elasticsearch/elasticsearch-reference/elasticsearch-intro-deploy.md - - file: elasticsearch/elasticsearch-reference/es-ingestion-overview.md - file: elasticsearch/elasticsearch-reference/es-security-principles.md - file: elasticsearch/elasticsearch-reference/esql-examples.md - file: elasticsearch/elasticsearch-reference/esql-getting-started.md @@ -621,7 +620,6 @@ toc: - file: elasticsearch/elasticsearch-reference/index-modules-analysis.md - file: elasticsearch/elasticsearch-reference/index-modules-mapper.md - file: elasticsearch/elasticsearch-reference/ingest-enriching-data.md - - file: elasticsearch/elasticsearch-reference/ingest-pipeline-search.md - file: elasticsearch/elasticsearch-reference/ingest.md - file: elasticsearch/elasticsearch-reference/install-elasticsearch.md - file: elasticsearch/elasticsearch-reference/ip-filtering.md diff --git a/solutions/search/ingest-for-search.md b/solutions/search/ingest-for-search.md index a1190d32aa..05623262b6 100644 --- a/solutions/search/ingest-for-search.md +++ b/solutions/search/ingest-for-search.md @@ -6,22 +6,72 @@ mapped_urls: - https://www.elastic.co/guide/en/serverless/current/elasticsearch-ingest-your-data.html --- -# Ingest for search +# Ingest for search use cases -% What needs to be done: Lift-and-shift +% ---- +% navigation_title: "Ingest for search use cases" +% ---- -% Scope notes: guidance on what ingest options you might want to use for search - connectors, crawler ... +$$$elasticsearch-ingest-time-series-data$$$ +::::{note} +This page covers ingest methods specifically for search use cases. 
If you're working with a different use case, refer to the [ingestion overview](/manage-data/ingest.md) for more options.
::::

Search use cases usually focus on general **content**, typically text-heavy data that does not have a timestamp. This could be data like knowledge bases, website content, product catalogs, and more.

Once you've decided how to [deploy Elastic](/deploy-manage/index.md), the next step is getting your content into {{es}}. Your choice of ingestion method depends on where your content lives and how you need to access it.

There are several methods to ingest data into {{es}} for search use cases. Choose one or more based on your requirements.

::::{tip}
If you just want to do a quick test, you can load [sample data](/manage-data/ingest/sample-data.md) into your {{es}} cluster using the UI.
::::

## Ingest data using APIs [es-ingestion-overview-apis]

You can use the [`_bulk` API](https://www.elastic.co/docs/api/doc/elasticsearch/v8/group/endpoint-document) to add data to your {{es}} indices, using any HTTP client, including the [{{es}} client libraries](/solutions/search/site-or-app/clients.md).

While the {{es}} APIs can be used for any data type, Elastic provides specialized tools that optimize ingestion for specific use cases.

## Specialized tools [es-ingestion-overview-general-content]

You can use these specialized tools to add general content to {{es}} indices.
| Method | Description | Notes |
|--------|-------------|-------|
| [**Web crawler**](https://github.com/elastic/crawler) | Programmatically discover and index content from websites and knowledge bases | Crawl public-facing web content or internal sites accessible via HTTP proxy |
| [**Search connectors**](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html) | Third-party integrations with popular content sources like databases, cloud storage, and business applications | Choose from a range of Elastic-built connectors or build your own in Python using the Elastic connector framework |
| [**File upload**](/manage-data/ingest/tools/upload-data-files.md) | One-off manual uploads through the UI | Useful for testing or very small-scale use cases, but not recommended for production workflows |

### (Optional) Content processing

You can also transform and enrich your content at ingest time with ingest pipelines, or at query time with runtime fields. Choose the right approach based on your requirements:

| Processing type | Description | Use cases |
|----------------|-------------|------------|
| **Ingest pipelines** | Choose from a range of built-in processors or create custom processors | Data enrichment, content extraction from PDFs, ML inference, custom business logic |
| **Runtime fields** | Fields computed during query execution | Price calculations with current exchange rates, distance calculations, user-specific scoring |

You can manage ingest pipelines through Elasticsearch APIs or Kibana UIs.

The **Content** UI under **Search** has a set of tools for creating and managing indices optimized for search use cases (non-time series data). You can also manage your ingest pipelines in this UI.
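To make the `_bulk` API option above concrete, here is a minimal sketch of assembling a bulk request body by hand. The index name and document fields are hypothetical, and in practice you would usually send the payload with one of the {{es}} client libraries rather than building it manually:

```python
import json

def build_bulk_body(index, documents):
    """Assemble an NDJSON body for the _bulk API: each document becomes
    an action line ({"index": ...}) followed by the document source."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The _bulk endpoint requires a trailing newline after the final line.
    return "\n".join(lines) + "\n"

docs = [
    {"title": "Getting started", "body": "How to set up your first index."},
    {"title": "FAQ", "body": "Answers to common questions."},
]
body = build_bulk_body("my-content-index", docs)
# POST `body` to the /_bulk endpoint with Content-Type: application/x-ndjson.
```

Each pair of lines indexes one document; the same NDJSON shape is used for the API's `create`, `update`, and `delete` actions.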
Refer to [Ingest pipelines for search use cases](/solutions/search/search-pipelines.md) for details.

% Use migrated content from existing pages that map to this page:
% - [x] ./raw-migrated-files/elasticsearch/elasticsearch-reference/es-ingestion-overview.md
% - [x] ./raw-migrated-files/docs-content/serverless/elasticsearch-ingest-data-through-api.md
% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/ingest-pipeline-search.md
% - [x] ./raw-migrated-files/docs-content/serverless/elasticsearch-ingest-your-data.md

% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):

$$$ingest-pipeline-search-details-specific-ml-reference$$$

diff --git a/raw-migrated-files/elasticsearch/elasticsearch-reference/ingest-pipeline-search.md b/solutions/search/search-pipelines.md
similarity index 82%
rename from raw-migrated-files/elasticsearch/elasticsearch-reference/ingest-pipeline-search.md
rename to solutions/search/search-pipelines.md
index 5e27a802f0..7522f69165 100644
--- a/raw-migrated-files/elasticsearch/elasticsearch-reference/ingest-pipeline-search.md
+++ b/solutions/search/search-pipelines.md
@@ -1,9 +1,8 @@
-# Ingest pipelines in Search [ingest-pipeline-search]
+# Ingest pipelines for search use cases [ingest-pipeline-search]

You can manage ingest pipelines through Elasticsearch APIs or Kibana UIs.

-The **Content** UI under **Search** has a set of tools for creating and managing indices optimized for search use cases (non time series data). You can also manage your ingest pipelines in this UI.
-
+The **Content** UI under **Search** has a set of tools for creating and managing indices optimized for search use cases (non-time series data).
You can also manage your ingest pipelines in this UI. ## Find pipelines in Content UI [ingest-pipeline-search-where] @@ -18,12 +17,11 @@ To find this tab in the Kibana UI: The tab is highlighted in this screenshot: -:::{image} ../../../images/elasticsearch-reference-ingest-pipeline-ent-search-ui.png +:::{image} /images/elasticsearch-reference-ingest-pipeline-ent-search-ui.png :alt: ingest pipeline ent search ui :class: screenshot ::: - ## Overview [ingest-pipeline-search-in-enterprise-search] These tools can be particularly helpful by providing a layer of customization and post-processing of documents. For example: @@ -36,9 +34,9 @@ It can be a lot of work to set up and manage production-ready pipelines from scr To this end, when you create indices for search use cases, (including [Elastic web crawler](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html), [connectors](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html). , and API indices), each index already has a pipeline set up with several processors that optimize your content for search. -This pipeline is called `search-default-ingestion`. While it is a "managed" pipeline (meaning it should not be tampered with), you can view its details via the Kibana UI or the Elasticsearch API. You can also [read more about its contents below](../../../solutions/search/ingest-for-search.md#ingest-pipeline-search-details-generic-reference). +This pipeline is called `search-default-ingestion`. While it is a "managed" pipeline (meaning it should not be tampered with), you can view its details via the Kibana UI or the Elasticsearch API. You can also [read more about its contents below](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-generic-reference). -You can control whether you run some of these processors. While all features are enabled by default, they are eligible for opt-out. 
For [Elastic crawler](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html) and [connectors](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html). , you can opt out (or back in) per index, and your choices are saved. For API indices, you can opt out (or back in) by including specific fields in your documents. [See below for details](../../../solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings-using-the-api).
+You can control whether you run some of these processors. While all features are enabled by default, they are eligible for opt-out. For [Elastic crawler](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html) and [connectors](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html), you can opt out (or back in) per index, and your choices are saved. For API indices, you can opt out (or back in) by including specific fields in your documents. [See below for details](/solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings-using-the-api).

At the deployment level, you can change the default settings for all new indices. This will not affect existing indices.

@@ -48,7 +46,7 @@ Each index also provides the capability to easily create index-specific ingest p
2. `@custom`
3. `@ml-inference`

-Like `search-default-ingestion`, the first of these is "managed", but the other two can and should be modified to fit your needs. You can view these pipelines using the platform tools (Kibana UI, Elasticsearch API), and can also [read more about their content below](../../../solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific).
+Like `search-default-ingestion`, the first of these is "managed", but the other two can and should be modified to fit your needs.
You can view these pipelines using the platform tools (Kibana UI, Elasticsearch API), and can also [read more about their content below](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific).

## Pipeline Settings [ingest-pipeline-search-pipeline-settings]

@@ -97,10 +95,10 @@ If the pipeline is not specified, the underscore-prefixed fields will actually b

### `search-default-ingestion` Reference [ingest-pipeline-search-details-generic-reference]

You can access this pipeline with the [Elasticsearch Ingest Pipelines API](https://www.elastic.co/guide/en/elasticsearch/reference/current/get-pipeline-api.html) or via Kibana’s [Stack Management > Ingest Pipelines](/manage-data/ingest/transform-enrich/ingest-pipelines.md#create-manage-ingest-pipelines) UI.

::::{warning}
This pipeline is a "managed" pipeline, which means it is not intended to be edited. Editing/updating this pipeline manually could result in unintended behaviors, or difficulty in upgrading in the future.
If you want to make customizations, we recommend you utilize index-specific pipelines (see below), specifically [the `@custom` pipeline](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific-custom-reference).
::::

#### Control flow parameters [ingest-pipeline-search-details-generic-reference-params]

The `search-default-ingestion` pipeline does not always run all processors. It utilizes a feature of ingest pipelines to [conditionally run processors](/manage-data/ingest/transform-enrich/ingest-pipelines.md#conditionally-run-processor) based on the contents of each individual document.

* `_extract_binary_content` - if this field is present and has a value of `true` on a source document, the pipeline will attempt to run the `attachment`, `set_body`, and `remove_replacement_chars` processors. Note that the document will also need an `_attachment` field populated with base64-encoded binary data in order for the `attachment` processor to have any output. If the `_extract_binary_content` field is missing or `false` on a source document, these processors will be skipped.
* `_reduce_whitespace` - if this field is present and has a value of `true` on a source document, the pipeline will attempt to run the `remove_extra_whitespace` and `trim` processors. These processors only apply to the `body` field. If the `_reduce_whitespace` field is missing or `false` on a source document, these processors will be skipped.
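As an illustration of these control flow parameters, the following sketch builds a hypothetical source document that opts in to both behaviors, and mirrors the pipeline's flag check in plain Python. The underscore-prefixed field names are the documented flags; the document content and helper function are illustrative only:

```python
import base64

# Hypothetical source document opting in to binary content extraction
# and whitespace reduction. `_attachment` carries base64-encoded binary
# data for the `attachment` processor to decode.
doc = {
    "title": "Quarterly report",
    "_attachment": base64.b64encode(b"%PDF-1.4 example bytes").decode("ascii"),
    "_extract_binary_content": True,  # run attachment/set_body/remove_replacement_chars
    "_reduce_whitespace": True,       # run remove_extra_whitespace/trim on `body`
}

def flag_enabled(document, flag):
    """Mirror the pipeline's conditional: the guarded processors run only
    when the flag is present and has the value true."""
    return document.get(flag) is True
```

A document that omits a flag (or sets it to `false`) simply skips the corresponding processors, which matches the opt-out behavior described above.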
Crawler, Native Connectors, and Connector Clients will automatically add these control flow parameters based on the settings in the index’s Pipeline tab. To control the settings that new indices receive upon creation, see the deployment-wide content settings in [Pipeline Settings](/solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings).

### Index-specific ingest pipelines [ingest-pipeline-search-details-specific]

@@ -139,7 +137,7 @@ The "copy and customize" button is not available at all Elastic subscription lev

#### `<index-name>` Reference [ingest-pipeline-search-details-specific-reference]

This pipeline looks and behaves a lot like the [`search-default-ingestion` pipeline](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-generic-reference), but with [two additional processors](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific-reference-processors).

::::{warning}
You should not rename this pipeline.
::::

::::{warning}
This pipeline is a "managed" pipeline, which means it is not intended to be edited. Editing/updating this pipeline manually could result in unintended behaviors, or difficulty in upgrading in the future.
If you want to make customizations, we recommend you utilize [the `@custom` pipeline](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific-custom-reference).
::::

##### Processors [ingest-pipeline-search-details-specific-reference-processors]

In addition to the processors inherited from the [`search-default-ingestion` pipeline](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-generic-reference), the index-specific pipeline also defines:

* `index_ml_inference_pipeline` - this uses the [Pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/pipeline-processor.html) processor to run the `@ml-inference` pipeline. This processor will only be run if the source document includes a `_run_ml_inference` field with the value `true`.
* `index_custom_pipeline` - this uses the [Pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/pipeline-processor.html) processor to run the `@custom` pipeline.

@@ -168,7 +166,7 @@ Like the `search-default-ingestion` pipeline, the `<index-name>` pipeline does n

* `_run_ml_inference` - if this field is present and has a value of `true` on a source document, the pipeline will attempt to run the `index_ml_inference_pipeline` processor.
If the `_run_ml_inference` field is missing or `false` on a source document, this processor will be skipped.

Crawler, Native Connectors, and Connector Clients will automatically add these control flow parameters based on the settings in the index’s Pipeline tab. To control the settings that new indices receive upon creation, see the deployment-wide content settings in [Pipeline Settings](/solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings).

#### `@ml-inference` Reference [ingest-pipeline-search-details-specific-ml-reference]

@@ -194,7 +192,7 @@ The `monitor_ml` Elasticsearch cluster permission is required in order to manage

This pipeline is empty to start (no processors), but can be added to via the Kibana UI either through the Pipelines tab of your index, or from the **Stack Management > Ingest Pipelines** page. Unlike the `search-default-ingestion` pipeline and the `<index-name>` pipeline, this pipeline is NOT "managed".

You are encouraged to make additions and edits to this pipeline, provided its name remains the same. This provides a convenient hook from which to add custom processing and transformations for your data.
Be sure to read the [docs for ingest pipelines](/manage-data/ingest/transform-enrich/ingest-pipelines.md) to see what options are available.

::::{warning}
You should not rename this pipeline.
::::

## Upgrading notes [ingest-pipeline-search-upgrading-notes]

::::{dropdown} Expand to see upgrading notes
-* `app_search_crawler` - Since 8.3, {{app-search-crawler}} has utilized this pipeline to power its binary content extraction. You can read more about this pipeline and its usage in the [App Search Guide](https://www.elastic.co/guide/en/app-search/current/web-crawler-reference.html#web-crawler-reference-binary-content-extraction). When upgrading from 8.3 to 8.5+, be sure to note any changes that you made to the `app_search_crawler` pipeline. These changes should be re-applied to each index’s `@custom` pipeline in order to ensure a consistent data processing experience. In 8.5+, the [index setting to enable binary content](../../../solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings) is required **in addition** to the configurations mentioned in the [App Search Guide](https://www.elastic.co/guide/en/app-search/current/web-crawler-reference.html#web-crawler-reference-binary-content-extraction).
-* `ent_search_crawler` - Since 8.4, the Elastic web crawler has utilized this pipeline to power its binary content extraction. You can read more about this pipeline and its usage in the [Elastic web crawler Guide](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-binary-content). When upgrading from 8.4 to 8.5+, be sure to note any changes that you made to the `ent_search_crawler` pipeline. These changes should be re-applied to each index’s `@custom` pipeline in order to ensure a consistent data processing experience.
In 8.5+, the [index setting to enable binary content](../../../solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings) is required **in addition** to the configurations mentioned in the [Elastic web crawler Guide](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-binary-content). +* `app_search_crawler` - Since 8.3, {{app-search-crawler}} has utilized this pipeline to power its binary content extraction. You can read more about this pipeline and its usage in the [App Search Guide](https://www.elastic.co/guide/en/app-search/current/web-crawler-reference.html#web-crawler-reference-binary-content-extraction). When upgrading from 8.3 to 8.5+, be sure to note any changes that you made to the `app_search_crawler` pipeline. These changes should be re-applied to each index’s `@custom` pipeline in order to ensure a consistent data processing experience. In 8.5+, the [index setting to enable binary content](/solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings) is required **in addition** to the configurations mentioned in the [App Search Guide](https://www.elastic.co/guide/en/app-search/current/web-crawler-reference.html#web-crawler-reference-binary-content-extraction). +* `ent_search_crawler` - Since 8.4, the Elastic web crawler has utilized this pipeline to power its binary content extraction. You can read more about this pipeline and its usage in the [Elastic web crawler Guide](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-binary-content). When upgrading from 8.4 to 8.5+, be sure to note any changes that you made to the `ent_search_crawler` pipeline. These changes should be re-applied to each index’s `@custom` pipeline in order to ensure a consistent data processing experience. 
In 8.5+, the [index setting to enable binary content](/solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings) is required **in addition** to the configurations mentioned in the [Elastic web crawler Guide](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-binary-content). * `ent-search-generic-ingestion` - Since 8.5, Native Connectors, Connector Clients, and new (>8.4) Elastic web crawler indices all made use of this pipeline by default. This pipeline evolved into the `search-default-ingestion` pipeline. -* `search-default-ingestion` - Since 9.0, Connectors have made use of this pipeline by default. You can [read more about this pipeline](../../../solutions/search/ingest-for-search.md#ingest-pipeline-search-details-generic-reference) above. As this pipeline is "managed", any modifications that were made to `app_search_crawler` and/or `ent_search_crawler` should NOT be made to `search-default-ingestion`. Instead, if such customizations are desired, you should utilize [Index-specific ingest pipelines](../../../solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific), placing all modifications in the `@custom` pipeline(s). +* `search-default-ingestion` - Since 9.0, Connectors have made use of this pipeline by default. You can [read more about this pipeline](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-generic-reference) above. As this pipeline is "managed", any modifications that were made to `app_search_crawler` and/or `ent_search_crawler` should NOT be made to `search-default-ingestion`. Instead, if such customizations are desired, you should utilize [Index-specific ingest pipelines](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific), placing all modifications in the `@custom` pipeline(s). 
:::: diff --git a/solutions/toc.yml b/solutions/toc.yml index dd37ca2d89..237052b942 100644 --- a/solutions/toc.yml +++ b/solutions/toc.yml @@ -619,6 +619,8 @@ toc: - file: search/building-search-in-your-app-or-site.md - file: search/search-templates.md - file: search/ingest-for-search.md + children: + - file: search/search-pipelines.md - file: search/full-text.md children: - file: search/full-text/search-with-synonyms.md From f63c3720bb1f783fcfd3df3e66c3350cd5a758f0 Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 5 Feb 2025 13:16:55 +0100 Subject: [PATCH 2/8] Update checklist, delete temp anchors --- solutions/search/ingest-for-search.md | 2 +- solutions/search/search-pipelines.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/solutions/search/ingest-for-search.md b/solutions/search/ingest-for-search.md index 05623262b6..bf41dfd45c 100644 --- a/solutions/search/ingest-for-search.md +++ b/solutions/search/ingest-for-search.md @@ -61,7 +61,7 @@ Refer to % - [x] ./raw-migrated-files/elasticsearch/elasticsearch-reference/es-ingestion-overview.md % - [x] ./raw-migrated-files/docs-content/serverless/elasticsearch-ingest-data-through-api.md -% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/ingest-pipeline-search.md +% - [x] ./raw-migrated-files/elasticsearch/elasticsearch-reference/ingest-pipeline-search.md % - [x] ./raw-migrated-files/docs-content/serverless/elasticsearch-ingest-your-data.md % Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc): diff --git a/solutions/search/search-pipelines.md b/solutions/search/search-pipelines.md index 7522f69165..3ed8f7d91b 100644 --- a/solutions/search/search-pipelines.md +++ b/solutions/search/search-pipelines.md @@ -209,4 +209,4 @@ You should not rename this pipeline. * `ent-search-generic-ingestion` - Since 8.5, Native Connectors, Connector Clients, and new (>8.4) Elastic web crawler indices all made use of this pipeline by default. 
This pipeline evolved into the `search-default-ingestion` pipeline. * `search-default-ingestion` - Since 9.0, Connectors have made use of this pipeline by default. You can [read more about this pipeline](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-generic-reference) above. As this pipeline is "managed", any modifications that were made to `app_search_crawler` and/or `ent_search_crawler` should NOT be made to `search-default-ingestion`. Instead, if such customizations are desired, you should utilize [Index-specific ingest pipelines](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific), placing all modifications in the `@custom` pipeline(s). -:::: +:::: \ No newline at end of file From fe73f1c39907782c74faf18623cfd7c8e159ee22 Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 5 Feb 2025 13:17:23 +0100 Subject: [PATCH 3/8] idem --- solutions/search/ingest-for-search.md | 25 ------------------------- 1 file changed, 25 deletions(-) diff --git a/solutions/search/ingest-for-search.md b/solutions/search/ingest-for-search.md index bf41dfd45c..e67e927fa2 100644 --- a/solutions/search/ingest-for-search.md +++ b/solutions/search/ingest-for-search.md @@ -63,28 +63,3 @@ Refer to % - [x] ./raw-migrated-files/docs-content/serverless/elasticsearch-ingest-data-through-api.md % - [x] ./raw-migrated-files/elasticsearch/elasticsearch-reference/ingest-pipeline-search.md % - [x] ./raw-migrated-files/docs-content/serverless/elasticsearch-ingest-your-data.md - -% Internal links rely on the following IDs being on this page (e.g. 
as a heading ID, paragraph ID, etc): - - - - - - - - -$$$ingest-pipeline-search-details-specific-ml-reference$$$ - -$$$ingest-pipeline-search-in-enterprise-search$$$ - -$$$ingest-pipeline-search-details-generic-reference$$$ - -$$$ingest-pipeline-search-details-specific-custom-reference$$$ - -$$$ingest-pipeline-search-details-specific-reference-processors$$$ - -$$$ingest-pipeline-search-details-specific$$$ - -$$$ingest-pipeline-search-pipeline-settings-using-the-api$$$ - -$$$ingest-pipeline-search-pipeline-settings$$$ \ No newline at end of file From 5c272a08408a10e5cc6bd4fbf142336ed709287a Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 5 Feb 2025 13:22:41 +0100 Subject: [PATCH 4/8] Fix anchor links --- solutions/search/search-pipelines.md | 26 +++++++++++++-------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/solutions/search/search-pipelines.md b/solutions/search/search-pipelines.md index 3ed8f7d91b..f1afd04856 100644 --- a/solutions/search/search-pipelines.md +++ b/solutions/search/search-pipelines.md @@ -32,11 +32,11 @@ These tools can be particularly helpful by providing a layer of customization an It can be a lot of work to set up and manage production-ready pipelines from scratch. Considerations such as error handling, conditional execution, sequencing, versioning, and modularization must all be taken into account. -To this end, when you create indices for search use cases, (including [Elastic web crawler](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html), [connectors](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html). , and API indices), each index already has a pipeline set up with several processors that optimize your content for search. +To this end, when you create indices for search use cases (including web crawler, search connector, and API indices), each index already has a pipeline set up with several processors that optimize your content for search.
-This pipeline is called `search-default-ingestion`. While it is a "managed" pipeline (meaning it should not be tampered with), you can view its details via the Kibana UI or the Elasticsearch API. You can also [read more about its contents below](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-generic-reference). +This pipeline is called `search-default-ingestion`. While it is a "managed" pipeline (meaning it should not be tampered with), you can view its details via the Kibana UI or the Elasticsearch API. You can also [read more about its contents below](#ingest-pipeline-search-details-generic-reference). -You can control whether you run some of these processors. While all features are enabled by default, they are eligible for opt-out. For [Elastic crawler](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html) and [connectors](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html). , you can opt out (or back in) per index, and your choices are saved. For API indices, you can opt out (or back in) by including specific fields in your documents. [See below for details](/solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings-using-the-api). +You can control whether you run some of these processors. While all features are enabled by default, they are eligible for opt-out. For [Elastic crawler](https://www.elastic.co/guide/en/enterprise-search/current/crawler.html) and [connectors](https://www.elastic.co/guide/en/elasticsearch/reference/current/es-connectors.html), you can opt out (or back in) per index, and your choices are saved. For API indices, you can opt out (or back in) by including specific fields in your documents. [See below for details](#ingest-pipeline-search-pipeline-settings-using-the-api). At the deployment level, you can change the default settings for all new indices. This will not affect existing indices.
@@ -46,7 +46,7 @@ Each index also provides the capability to easily create index-specific ingest p 2. `@custom` 3. `@ml-inference` -Like `search-default-ingestion`, the first of these is "managed", but the other two can and should be modified to fit your needs. You can view these pipelines using the platform tools (Kibana UI, Elasticsearch API), and can also [read more about their content below](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific). +Like `search-default-ingestion`, the first of these is "managed", but the other two can and should be modified to fit your needs. You can view these pipelines using the platform tools (Kibana UI, Elasticsearch API), and can also [read more about their content below](#ingest-pipeline-search-details-specific). ## Pipeline Settings [ingest-pipeline-search-pipeline-settings] @@ -98,7 +98,7 @@ If the pipeline is not specified, the underscore-prefixed fields will actually b You can access this pipeline with the [Elasticsearch Ingest Pipelines API](https://www.elastic.co/guide/en/elasticsearch/reference/current/get-pipeline-api.html) or via Kibana’s [Stack Management > Ingest Pipelines](/manage-data//ingest/transform-enrich/ingest-pipelines.md#create-manage-ingest-pipelines) UI. ::::{warning} -This pipeline is a "managed" pipeline. That means that it is not intended to be edited. Editing/updating this pipeline manually could result in unintended behaviors, or difficulty in upgrading in the future. If you want to make customizations, we recommend you utilize index-specific pipelines (see below), specifically [the `@custom` pipeline](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific-custom-reference). +This pipeline is a "managed" pipeline. That means that it is not intended to be edited. Editing/updating this pipeline manually could result in unintended behaviors, or difficulty in upgrading in the future. 
If you want to make customizations, we recommend you utilize index-specific pipelines (see below), specifically [the `@custom` pipeline](#ingest-pipeline-search-details-specific-custom-reference). :::: @@ -121,7 +121,7 @@ The `search-default-ingestion` pipeline does not always run all processors. It u * `_extract_binary_content` - if this field is present and has a value of `true` on a source document, the pipeline will attempt to run the `attachment`, `set_body`, and `remove_replacement_chars` processors. Note that the document will also need an `_attachment` field populated with base64-encoded binary data in order for the `attachment` processor to have any output. If the `_extract_binary_content` field is missing or `false` on a source document, these processors will be skipped. * `_reduce_whitespace` - if this field is present and has a value of `true` on a source document, the pipeline will attempt to run the `remove_extra_whitespace` and `trim` processors. These processors only apply to the `body` field. If the `_reduce_whitespace` field is missing or `false` on a source document, these processors will be skipped. -Crawler, Native Connectors, and Connector Clients will automatically add these control flow parameters based on the settings in the index’s Pipeline tab. To control what settings any new indices will have upon creation, see the deployment wide content settings. See [Pipeline Settings](/solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings). +Crawler, Native Connectors, and Connector Clients will automatically add these control flow parameters based on the settings in the index’s Pipeline tab. To control what settings any new indices will have upon creation, see the deployment wide content settings. See [Pipeline Settings](#ingest-pipeline-search-pipeline-settings). 
### Index-specific ingest pipelines [ingest-pipeline-search-details-specific] @@ -137,7 +137,7 @@ The "copy and customize" button is not available at all Elastic subscription lev #### `<index-name>` Reference [ingest-pipeline-search-details-specific-reference] -This pipeline looks and behaves a lot like the [`search-default-ingestion` pipeline](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-generic-reference), but with [two additional processors](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific-reference-processors). +This pipeline looks and behaves a lot like the [`search-default-ingestion` pipeline](#ingest-pipeline-search-details-generic-reference), but with [two additional processors](#ingest-pipeline-search-details-specific-reference-processors). ::::{warning} You should not rename this pipeline. :::: ::::{warning} -This pipeline is a "managed" pipeline. That means that it is not intended to be edited. Editing/updating this pipeline manually could result in unintended behaviors, or difficulty in upgrading in the future. If you want to make customizations, we recommend you utilize [the `@custom` pipeline](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific-custom-reference). +This pipeline is a "managed" pipeline. That means that it is not intended to be edited. Editing/updating this pipeline manually could result in unintended behaviors, or difficulty in upgrading in the future. If you want to make customizations, we recommend you utilize [the `@custom` pipeline](#ingest-pipeline-search-details-specific-custom-reference). :::: @@ -154,7 +154,7 @@ This pipeline is a "managed" pipeline.
That means that it is not intended to be ##### Processors [ingest-pipeline-search-details-specific-reference-processors] -In addition to the processors inherited from the [`search-default-ingestion` pipeline](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-generic-reference), the index-specific pipeline also defines: +In addition to the processors inherited from the [`search-default-ingestion` pipeline](#ingest-pipeline-search-details-generic-reference), the index-specific pipeline also defines: * `index_ml_inference_pipeline` - this uses the [Pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/pipeline-processor.html) processor to run the `@ml-inference` pipeline. This processor will only be run if the source document includes a `_run_ml_inference` field with the value `true`. * `index_custom_pipeline` - this uses the [Pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/pipeline-processor.html) processor to run the `@custom` pipeline. @@ -166,7 +166,7 @@ Like the `search-default-ingestion` pipeline, the `` pipeline does n * `_run_ml_inference` - if this field is present and has a value of `true` on a source document, the pipeline will attempt to run the `index_ml_inference_pipeline` processor. If the `_run_ml_inference` field is missing or `false` on a source document, this processor will be skipped. -Crawler, Native Connectors, and Connector Clients will automatically add these control flow parameters based on the settings in the index’s Pipeline tab. To control what settings any new indices will have upon creation, see the deployment wide content settings. See [Pipeline Settings](/solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings). +Crawler, Native Connectors, and Connector Clients will automatically add these control flow parameters based on the settings in the index’s Pipeline tab. 
To control what settings any new indices will have upon creation, see the deployment wide content settings. See [Pipeline Settings](#ingest-pipeline-search-pipeline-settings). #### `@ml-inference` Reference [ingest-pipeline-search-details-specific-ml-reference] @@ -204,9 +204,9 @@ You should not rename this pipeline. ## Upgrading notes [ingest-pipeline-search-upgrading-notes] ::::{dropdown} Expand to see upgrading notes -* `app_search_crawler` - Since 8.3, {{app-search-crawler}} has utilized this pipeline to power its binary content extraction. You can read more about this pipeline and its usage in the [App Search Guide](https://www.elastic.co/guide/en/app-search/current/web-crawler-reference.html#web-crawler-reference-binary-content-extraction). When upgrading from 8.3 to 8.5+, be sure to note any changes that you made to the `app_search_crawler` pipeline. These changes should be re-applied to each index’s `@custom` pipeline in order to ensure a consistent data processing experience. In 8.5+, the [index setting to enable binary content](/solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings) is required **in addition** to the configurations mentioned in the [App Search Guide](https://www.elastic.co/guide/en/app-search/current/web-crawler-reference.html#web-crawler-reference-binary-content-extraction). -* `ent_search_crawler` - Since 8.4, the Elastic web crawler has utilized this pipeline to power its binary content extraction. You can read more about this pipeline and its usage in the [Elastic web crawler Guide](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-binary-content). When upgrading from 8.4 to 8.5+, be sure to note any changes that you made to the `ent_search_crawler` pipeline. These changes should be re-applied to each index’s `@custom` pipeline in order to ensure a consistent data processing experience. 
In 8.5+, the [index setting to enable binary content](/solutions/search/ingest-for-search.md#ingest-pipeline-search-pipeline-settings) is required **in addition** to the configurations mentioned in the [Elastic web crawler Guide](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-binary-content). +* `app_search_crawler` - Since 8.3, {{app-search-crawler}} has utilized this pipeline to power its binary content extraction. You can read more about this pipeline and its usage in the [App Search Guide](https://www.elastic.co/guide/en/app-search/current/web-crawler-reference.html#web-crawler-reference-binary-content-extraction). When upgrading from 8.3 to 8.5+, be sure to note any changes that you made to the `app_search_crawler` pipeline. These changes should be re-applied to each index’s `@custom` pipeline in order to ensure a consistent data processing experience. In 8.5+, the [index setting to enable binary content](#ingest-pipeline-search-pipeline-settings) is required **in addition** to the configurations mentioned in the [App Search Guide](https://www.elastic.co/guide/en/app-search/current/web-crawler-reference.html#web-crawler-reference-binary-content-extraction). +* `ent_search_crawler` - Since 8.4, the Elastic web crawler has utilized this pipeline to power its binary content extraction. You can read more about this pipeline and its usage in the [Elastic web crawler Guide](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-binary-content). When upgrading from 8.4 to 8.5+, be sure to note any changes that you made to the `ent_search_crawler` pipeline. These changes should be re-applied to each index’s `@custom` pipeline in order to ensure a consistent data processing experience. 
In 8.5+, the [index setting to enable binary content](#ingest-pipeline-search-pipeline-settings) is required **in addition** to the configurations mentioned in the [Elastic web crawler Guide](https://www.elastic.co/guide/en/enterprise-search/current/crawler-managing.html#crawler-managing-binary-content). * `ent-search-generic-ingestion` - Since 8.5, Native Connectors, Connector Clients, and new (>8.4) Elastic web crawler indices all made use of this pipeline by default. This pipeline evolved into the `search-default-ingestion` pipeline. -* `search-default-ingestion` - Since 9.0, Connectors have made use of this pipeline by default. You can [read more about this pipeline](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-generic-reference) above. As this pipeline is "managed", any modifications that were made to `app_search_crawler` and/or `ent_search_crawler` should NOT be made to `search-default-ingestion`. Instead, if such customizations are desired, you should utilize [Index-specific ingest pipelines](/solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific), placing all modifications in the `@custom` pipeline(s). +* `search-default-ingestion` - Since 9.0, Connectors have made use of this pipeline by default. You can [read more about this pipeline](#ingest-pipeline-search-details-generic-reference) above. As this pipeline is "managed", any modifications that were made to `app_search_crawler` and/or `ent_search_crawler` should NOT be made to `search-default-ingestion`. Instead, if such customizations are desired, you should utilize [Index-specific ingest pipelines](#ingest-pipeline-search-details-specific), placing all modifications in the `@custom` pipeline(s). 
:::: \ No newline at end of file From e4ac195dbc5197eb46e86bc56b5c86626d127cea Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 5 Feb 2025 13:29:01 +0100 Subject: [PATCH 5/8] Fix links in E&A doc --- explore-analyze/machine-learning/nlp/inference-processing.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/explore-analyze/machine-learning/nlp/inference-processing.md b/explore-analyze/machine-learning/nlp/inference-processing.md index e9f668b529..6452b7cee7 100644 --- a/explore-analyze/machine-learning/nlp/inference-processing.md +++ b/explore-analyze/machine-learning/nlp/inference-processing.md @@ -5,7 +5,7 @@ mapped_pages: # Inference processing [ingest-pipeline-search-inference] -When you create an index through the **Content** UI, a set of default ingest pipelines are also created, including a ML inference pipeline. The [ML inference pipeline](../../../solutions/search/ingest-for-search.md#ingest-pipeline-search-details-specific-ml-reference) uses inference processors to analyze fields and enrich documents with the output. Inference processors use ML trained models, so you need to use a built-in model or [deploy a trained model in your cluster^](ml-nlp-deploy-models.md) to use this feature. +When you create an index through the **Content** UI, a set of default ingest pipelines are also created, including an ML inference pipeline. The [ML inference pipeline](/solutions/search/search-pipelines.md#ingest-pipeline-search-details-specific-ml-reference) uses inference processors to analyze fields and enrich documents with the output. Inference processors use ML trained models, so you need to use a built-in model or [deploy a trained model in your cluster^](ml-nlp-deploy-models.md) to use this feature. This guide focuses on the ML inference pipeline, its use, and how to manage it.
@@ -129,7 +129,7 @@ To ensure the ML inference pipeline will be run when ingesting documents, you mu ## Learn More [ingest-pipeline-search-inference-learn-more] -* See [Overview](../../../solutions/search/ingest-for-search.md#ingest-pipeline-search-in-enterprise-search) for information on the various pipelines that are created. +* See [Overview](/solutions/search/search-pipelines.md#ingest-pipeline-search-in-enterprise-search) for information on the various pipelines that are created. * Learn about [ELSER](ml-nlp-elser.md), Elastic’s proprietary retrieval model for semantic search with sparse vectors. * [NER HuggingFace Models](https://huggingface.co/models?library=pytorch&pipeline_tag=token-classification&sort=downloads) * [Text Classification HuggingFace Models](https://huggingface.co/models?library=pytorch&pipeline_tag=text-classification&sort=downloads) From e5ef3edb3351d77bd43ca12a6233121367a84c8a Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 5 Feb 2025 13:32:03 +0100 Subject: [PATCH 6/8] Fix typo --- solutions/search/search-pipelines.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/solutions/search/search-pipelines.md b/solutions/search/search-pipelines.md index f1afd04856..78ede152ea 100644 --- a/solutions/search/search-pipelines.md +++ b/solutions/search/search-pipelines.md @@ -95,7 +95,7 @@ If the pipeline is not specified, the underscore-prefixed fields will actually b ### `search-default-ingestion` Reference [ingest-pipeline-search-details-generic-reference] -You can access this pipeline with the [Elasticsearch Ingest Pipelines API](https://www.elastic.co/guide/en/elasticsearch/reference/current/get-pipeline-api.html) or via Kibana’s [Stack Management > Ingest Pipelines](/manage-data//ingest/transform-enrich/ingest-pipelines.md#create-manage-ingest-pipelines) UI. 
+You can access this pipeline with the [Elasticsearch Ingest Pipelines API](https://www.elastic.co/guide/en/elasticsearch/reference/current/get-pipeline-api.html) or via Kibana’s [Stack Management > Ingest Pipelines](/manage-data/ingest/transform-enrich/ingest-pipelines.md#create-manage-ingest-pipelines) UI. ::::{warning} This pipeline is a "managed" pipeline. That means that it is not intended to be edited. Editing/updating this pipeline manually could result in unintended behaviors, or difficulty in upgrading in the future. If you want to make customizations, we recommend you utilize index-specific pipelines (see below), specifically [the `@custom` pipeline](#ingest-pipeline-search-details-specific-custom-reference). @@ -116,7 +116,7 @@ This pipeline is a "managed" pipeline. That means that it is not intended to be #### Control flow parameters [ingest-pipeline-search-details-generic-reference-params] -The `search-default-ingestion` pipeline does not always run all processors. It utilizes a feature of ingest pipelines to [conditionally run processors](/manage-data//ingest/transform-enrich/ingest-pipelines.md#conditionally-run-processor) based on the contents of each individual document. +The `search-default-ingestion` pipeline does not always run all processors. It utilizes a feature of ingest pipelines to [conditionally run processors](/manage-data/ingest/transform-enrich/ingest-pipelines.md#conditionally-run-processor) based on the contents of each individual document. * `_extract_binary_content` - if this field is present and has a value of `true` on a source document, the pipeline will attempt to run the `attachment`, `set_body`, and `remove_replacement_chars` processors. Note that the document will also need an `_attachment` field populated with base64-encoded binary data in order for the `attachment` processor to have any output. If the `_extract_binary_content` field is missing or `false` on a source document, these processors will be skipped. 
* `_reduce_whitespace` - if this field is present and has a value of `true` on a source document, the pipeline will attempt to run the `remove_extra_whitespace` and `trim` processors. These processors only apply to the `body` field. If the `_reduce_whitespace` field is missing or `false` on a source document, these processors will be skipped. @@ -192,7 +192,7 @@ The `monitor_ml` Elasticsearch cluster permission is required in order to manage This pipeline is empty to start (no processors), but can be added to via the Kibana UI either through the Pipelines tab of your index, or from the **Stack Management > Ingest Pipelines** page. Unlike the `search-default-ingestion` pipeline and the `<index-name>` pipeline, this pipeline is NOT "managed". -You are encouraged to make additions and edits to this pipeline, provided its name remains the same. This provides a convenient hook from which to add custom processing and transformations for your data. Be sure to read the [docs for ingest pipelines](/manage-data//ingest/transform-enrich/ingest-pipelines.md) to see what options are available. +You are encouraged to make additions and edits to this pipeline, provided its name remains the same. This provides a convenient hook from which to add custom processing and transformations for your data. Be sure to read the [docs for ingest pipelines](/manage-data/ingest/transform-enrich/ingest-pipelines.md) to see what options are available. ::::{warning} You should not rename this pipeline.
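An aside for reviewers of the hunks above: the control-flow fields they keep referring to (`_extract_binary_content`, `_reduce_whitespace`, `_run_ml_inference`) all follow the same rule, where a guarded processor runs only when the field is present and `true` on the source document, and a missing or `false` field means the processor is skipped. This is a minimal Python sketch of that skip logic, not Elastic's implementation; the sample document and its field values are invented.

```python
def should_run(doc: dict, flag: str) -> bool:
    """Mirror the rule from the docs: a guarded processor runs only when
    the control-flow field is present and true; missing or false means skip."""
    return doc.get(flag) is True

# Invented sample document carrying the control-flow fields described above.
doc = {
    "body": "  some   extracted   text  ",
    "_extract_binary_content": True,   # attachment / set_body / remove_replacement_chars would run
    "_reduce_whitespace": False,       # remove_extra_whitespace / trim would be skipped
    # `_run_ml_inference` is absent, so `index_ml_inference_pipeline` would be skipped
}

print(should_run(doc, "_extract_binary_content"))  # True
print(should_run(doc, "_reduce_whitespace"))       # False
print(should_run(doc, "_run_ml_inference"))        # False
```

Connectors and the crawler set these fields for you based on the index's Pipeline tab; the sketch only shows why a missing field behaves exactly like a `false` one.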
From 2bd76a701ac795a442fe206c0adc83f6b4bd6dd8 Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 5 Feb 2025 13:43:24 +0100 Subject: [PATCH 7/8] Cleanup --- solutions/search/ingest-for-search.md | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/solutions/search/ingest-for-search.md b/solutions/search/ingest-for-search.md index e67e927fa2..9437ce8d57 100644 --- a/solutions/search/ingest-for-search.md +++ b/solutions/search/ingest-for-search.md @@ -54,12 +54,6 @@ You can also transform and enrich your content at ingest time with ingest pipeli You can manage ingest pipelines through Elasticsearch APIs or Kibana UIs. -The Content UI under Search has a set of tools for creating and managing indices optimized for search use cases (non time series data). You can also manage your ingest pipelines in this UI. -Refer to - -% Use migrated content from existing pages that map to this page: - -% - [x] ./raw-migrated-files/elasticsearch/elasticsearch-reference/es-ingestion-overview.md -% - [x] ./raw-migrated-files/docs-content/serverless/elasticsearch-ingest-data-through-api.md -% - [x] ./raw-migrated-files/elasticsearch/elasticsearch-reference/ingest-pipeline-search.md -% - [x] ./raw-migrated-files/docs-content/serverless/elasticsearch-ingest-your-data.md +::::{tip} +The UI also has a set of tools for creating and managing indices optimized for search use cases. You can also manage your ingest pipelines in this UI. 
Learn more in [](search-pipelines.md) +:::: \ No newline at end of file From 0a03795041528bb19bd7d28d3c6ed02905505579 Mon Sep 17 00:00:00 2001 From: Liam Thompson Date: Wed, 5 Feb 2025 15:22:20 +0100 Subject: [PATCH 8/8] Cleanup headings, pipelines section --- solutions/search/ingest-for-search.md | 21 ++++++--------------- 1 file changed, 6 insertions(+), 15 deletions(-) diff --git a/solutions/search/ingest-for-search.md b/solutions/search/ingest-for-search.md index 9437ce8d57..0fc7cd3129 100644 --- a/solutions/search/ingest-for-search.md +++ b/solutions/search/ingest-for-search.md @@ -21,19 +21,19 @@ Search use cases usually focus on general **content**, typically text-heavy data Once you've decided how to [deploy Elastic](/deploy-manage/index.md), the next step is getting your content into {{es}}. Your choice of ingestion method depends on where your content lives and how you need to access it. -There are several methods to ingest data into {es} for search use cases. Choose one or more based on your requirements. +There are several methods to ingest data into {{es}} for search use cases. Choose one or more based on your requirements. ::::{tip} If you just want to do a quick test, you can load [sample data](/manage-data/ingest/sample-data.md) into your {{es}} cluster using the UI. :::: -## Ingest data using APIs [es-ingestion-overview-apis] +## Use APIs [es-ingestion-overview-apis] You can use the [`_bulk` API](https://www.elastic.co/docs/api/doc/elasticsearch/v8/group/endpoint-document) to add data to your {{es}} indices, using any HTTP client, including the [{{es}} client libraries](/solutions/search/site-or-app/clients.md). While the {{es}} APIs can be used for any data type, Elastic provides specialized tools that optimize ingestion for specific use cases. -## Specialized tools [es-ingestion-overview-general-content] +## Use specialized tools [es-ingestion-overview-general-content] You can use these specialized tools to add general content to {{es}} indices. 
@@ -43,17 +43,8 @@ You can use these specialized tools to add general content to {{es}} indices. | [**Search connectors**]() | Third-party integrations to popular content sources like databases, cloud storage, and business applications | Choose from a range of Elastic-built connectors or build your own in Python using the Elastic connector framework| | [**File upload**](/manage-data/ingest/tools/upload-data-files.md)| One-off manual uploads through the UI | Useful for testing or very small-scale use cases, but not recommended for production workflows | -### (Optional) Content processing +### Process data at ingest time -You can also transform and enrich your content at ingest time with ingest pipelines, or at query time with runtime fields. Choose the right approach based on your requirements: +You can also transform and enrich your content at ingest time using [ingest pipelines](/manage-data/ingest/transform-enrich/ingest-pipelines.md). -| Processing Type | Description | Use cases | -|----------------|-------------|------------| -| **Ingest pipelines** | Choose from a range of built-in processors or create custom processors | Data enrichment, content extraction from PDFs, ML inference, custom business logic | -| **Runtime fields** | Fields computed during query execution | Price calculation with current exchange rates, distance calculations, user-specific scoring | - -You can manage ingest pipelines through Elasticsearch APIs or Kibana UIs. - -::::{tip} -The UI also has a set of tools for creating and managing indices optimized for search use cases. You can also manage your ingest pipelines in this UI. Learn more in [](search-pipelines.md) -:::: \ No newline at end of file +The Elastic UI has a set of tools for creating and managing indices optimized for search use cases. You can also manage your ingest pipelines in this UI. Learn more in [](search-pipelines.md). \ No newline at end of file
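Since the final patch lands on the `_bulk` API as the primary ingestion path, a small illustration of the request shape may help: `_bulk` takes newline-delimited JSON, alternating one action line with one document line, ending in a trailing newline. This is a hedged sketch only; the index name `my-search-index` and the documents are invented, and actually sending the body to a cluster (for example via an {{es}} client) is left out.

```python
import json

# Invented example documents; `_id` is pulled out into the action line.
docs = [
    {"_id": "1", "title": "Intro to ingest pipelines"},
    {"_id": "2", "title": "Search connectors overview"},
]

def to_bulk_body(index: str, docs: list) -> str:
    """Serialize documents into the NDJSON body the _bulk API expects:
    one `index` action line followed by one source line per document."""
    lines = []
    for doc in docs:
        source = {k: v for k, v in doc.items() if k != "_id"}
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["_id"]}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = to_bulk_body("my-search-index", docs)
print(body)
```

With a client, this body would be POSTed to `/_bulk` with the `application/x-ndjson` content type; most {{es}} client libraries wrap this serialization for you via their bulk helpers.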