diff --git a/bin/find-notebooks-to-test.sh b/bin/find-notebooks-to-test.sh
index b603fae4..01a410ed 100755
--- a/bin/find-notebooks-to-test.sh
+++ b/bin/find-notebooks-to-test.sh
@@ -30,6 +30,7 @@ EXEMPT_NOTEBOOKS=(
"notebooks/integrations/llama3/rag-elastic-llama3.ipynb"
"notebooks/integrations/azure-openai/vector-search-azure-openai-elastic.ipynb"
"notebooks/enterprise-search/app-search-engine-exporter.ipynb",
+ "notebooks/enterprise-search/elastic-crawler-to-open-crawler-migration.ipynb",
"notebooks/playground-examples/bedrock-anthropic-elasticsearch-client.ipynb",
"notebooks/playground-examples/openai-elasticsearch-client.ipynb",
"notebooks/integrations/hugging-face/huggingface-integration-millions-of-documents-with-cohere-reranking.ipynb",
diff --git a/notebooks/enterprise-search/elastic-crawler-to-open-crawler-migration.ipynb b/notebooks/enterprise-search/elastic-crawler-to-open-crawler-migration.ipynb
new file mode 100644
index 00000000..0c1d48ed
--- /dev/null
+++ b/notebooks/enterprise-search/elastic-crawler-to-open-crawler-migration.ipynb
@@ -0,0 +1,563 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "89b4646f-6a71-44e0-97b9-846319bf0162",
+ "metadata": {},
+ "source": [
+ "## Hello, future Open Crawler user!\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/enterprise-search/elastic-crawler-to-open-crawler-migration.ipynb)\n",
+ "\n",
+ "This notebook is designed to help you migrate your Elastic Crawler configurations to Open Crawler-friendly YAML!\n",
+ "\n",
+    "We recommend running the cells in order, as each cell depends on previous cells having been run. We also recommend running each cell only once, since re-running cells may result in errors or incorrect YAML files.\n",
+ "\n",
+ "### Setup\n",
+ "First, let's start by making sure `elasticsearch` and other required dependencies are installed and imported by running the following cell:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "da411d2f-9aff-46af-845a-5fe9be19ea3c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip install elasticsearch\n",
+ "\n",
+ "from getpass import getpass\n",
+ "from elasticsearch import Elasticsearch\n",
+ "\n",
+ "import os\n",
+ "import json\n",
+ "import yaml\n",
+ "import pprint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f4131f88-9895-4c0e-8b0a-6ec7b3b45653",
+ "metadata": {},
+ "source": [
+ "We are going to need a few things from your Elasticsearch deployment before we can migrate your configurations:\n",
+ "- Your **Elasticsearch Endpoint URL**\n",
+ "- Your **Elasticsearch Endpoint Port number**\n",
+ "- An **API key**\n",
+ "\n",
+ "You can find your Endpoint URL and port number by visiting your Elasticsearch Overview page in Kibana.\n",
+ "\n",
+ "You can create a new API key from the Stack Management -> API keys menu in Kibana. Be sure to copy or write down your key in a safe place, as it will be displayed only once upon creation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "08e6e3d2-62d3-4890-a6be-41fe0a931ef6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ELASTIC_ENDPOINT = getpass(\"Elastic Endpoint: \")\n",
+    "ELASTIC_PORT = getpass(\"Port: \")\n",
+    "API_KEY = getpass(\"Elastic API Key: \")\n",
+ "\n",
+ "es_client = Elasticsearch(\n",
+ " \":\".join([ELASTIC_ENDPOINT, ELASTIC_PORT]),\n",
+ " api_key=API_KEY,\n",
+ ")\n",
+ "\n",
+    "# ping ES to make sure we have a working connection\n",
+ "es_client.info()[\"tagline\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "85f99942-58ae-437d-a72b-70b8d1f4432c",
+ "metadata": {},
+ "source": [
+ "Hopefully you received our tagline 'You Know, for Search'. If so, we are connected and ready to go!\n",
+ "\n",
+    "If not, please double-check the Elasticsearch endpoint, port, and API key that you provided above."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a55236e7-19dc-4f4c-92b9-d10848dd6af9",
+ "metadata": {},
+ "source": [
+ "### Step 1: Acquire Basic Configurations\n",
+ "\n",
+    "First, we need to establish which Crawlers you have and their basic configuration details.\n",
+ "This migration notebook will attempt to pull configurations for every distinct Crawler you have in your Elasticsearch instance."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0a698b05-e939-42a5-aa31-51b1b1883e6f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "# in-memory data structure that maintains the current state of the configs we've pulled\n",
+ "inflight_configuration_data = {}\n",
+ "\n",
+ "crawler_configurations = es_client.search(\n",
+ " index=\".ent-search-actastic-crawler2_configurations_v2\",\n",
+ ")\n",
+ "\n",
+ "crawler_counter = 1\n",
+ "for configuration in crawler_configurations[\"hits\"][\"hits\"]:\n",
+ " source = configuration[\"_source\"]\n",
+ "\n",
+ " # extract values\n",
+ " crawler_oid = source[\"id\"]\n",
+ " output_index = source[\"index_name\"]\n",
+ "\n",
+ " print(f\"{crawler_counter}. {output_index}\")\n",
+ " crawler_counter += 1\n",
+ "\n",
+ " crawl_schedule = (\n",
+ " []\n",
+ " ) # either no schedule or a specific schedule - determined in Step 4\n",
+ " if (\n",
+ " source[\"use_connector_schedule\"] == False and source[\"crawl_schedule\"]\n",
+ " ): # an interval schedule is being used\n",
+ " print(\n",
+ " f\" {output_index} uses an interval schedule, which is not supported in Open Crawler!\"\n",
+ " )\n",
+ "\n",
+ " # populate a temporary hashmap\n",
+ " temp_conf_map = {\"output_index\": output_index, \"schedule\": crawl_schedule}\n",
+ " # pre-populate some necessary fields in preparation for upcoming steps\n",
+ " temp_conf_map[\"domains_temp\"] = {}\n",
+ " temp_conf_map[\"output_sink\"] = \"elasticsearch\"\n",
+ " temp_conf_map[\"full_html_extraction_enabled\"] = False\n",
+ " temp_conf_map[\"elasticsearch\"] = {\n",
+ " \"host\": \"\",\n",
+ " \"port\": \"\",\n",
+ " \"api_key\": \"\",\n",
+ " }\n",
+ " # populate the in-memory data structure\n",
+ " inflight_configuration_data[crawler_oid] = temp_conf_map"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2804d02b-870d-4173-9c5f-6d5eb434d49b",
+ "metadata": {},
+ "source": [
+ "**Before continuing, please verify in the output above that the correct number of Crawlers was found.**\n",
+ "\n",
+ "Now that we have some basic data about your Crawlers, let's use this information to get more configuration values!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2b9e2da7-853c-40bd-9ee1-02c4d92b3b43",
+ "metadata": {},
+ "source": [
+ "### Step 2: URLs, Sitemaps, and Crawl Rules\n",
+ "\n",
+    "In the next cell, we will query Elasticsearch for information about each Crawler's domain URLs, seed URLs, sitemaps, and crawl rules."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e1c64c3d-c8d7-4236-9ed9-c9b1cb5e7972",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "crawler_ids_to_query = inflight_configuration_data.keys()\n",
+ "\n",
+ "crawler_counter = 1\n",
+ "for crawler_oid in crawler_ids_to_query:\n",
+ " # query ES to get the crawler's domain configurations\n",
+ " crawler_domains = es_client.search(\n",
+ " index=\".ent-search-actastic-crawler2_domains\",\n",
+ " query={\"match\": {\"configuration_oid\": crawler_oid}},\n",
+ " _source=[\n",
+ " \"name\",\n",
+ " \"configuration_oid\",\n",
+ " \"id\",\n",
+ " \"sitemaps\",\n",
+ " \"crawl_rules\",\n",
+ " \"seed_urls\",\n",
+ " \"auth\",\n",
+ " ],\n",
+ " )\n",
+ " print(f\"{crawler_counter}.) Crawler ID {crawler_oid}\")\n",
+ " crawler_counter += 1\n",
+ "\n",
+ " # for each domain the Crawler has, grab its config values\n",
+ " # and update the in-memory data structure\n",
+ " for domain_info in crawler_domains[\"hits\"][\"hits\"]:\n",
+ " source = domain_info[\"_source\"]\n",
+ "\n",
+ " # extract values\n",
+ " domain_oid = str(source[\"id\"])\n",
+ " domain_url = source[\"name\"]\n",
+ " seed_urls = source[\"seed_urls\"]\n",
+ " sitemap_urls = source[\"sitemaps\"]\n",
+ " crawl_rules = source[\"crawl_rules\"]\n",
+ "\n",
+ " print(f\" Domain {domain_url} found!\")\n",
+ "\n",
+ " # transform seed, sitemap, and crawl rules into arrays\n",
+ " seed_urls_list = []\n",
+ " for seed_obj in seed_urls:\n",
+ " seed_urls_list.append(seed_obj[\"url\"])\n",
+ "\n",
+ " sitemap_urls_list = []\n",
+ " for sitemap_obj in sitemap_urls:\n",
+ " sitemap_urls_list.append(sitemap_obj[\"url\"])\n",
+ "\n",
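+    "        # Elastic Crawler's \"rule\" field becomes the \"type\" field in the Open Crawler config\n",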
+ " crawl_rules_list = []\n",
+ " for crawl_rules_obj in crawl_rules:\n",
+ " crawl_rules_list.append(\n",
+ " {\n",
+ " \"policy\": crawl_rules_obj[\"policy\"],\n",
+ " \"type\": crawl_rules_obj[\"rule\"],\n",
+ " \"pattern\": crawl_rules_obj[\"pattern\"],\n",
+ " }\n",
+ " )\n",
+ "\n",
+ " # populate a temporary hashmap\n",
+ " temp_domain_conf = {\"url\": domain_url}\n",
+ " if seed_urls_list:\n",
+ " temp_domain_conf[\"seed_urls\"] = seed_urls_list\n",
+    "            print(f\"    Seed URLs found: {seed_urls_list}\")\n",
+ " if sitemap_urls_list:\n",
+ " temp_domain_conf[\"sitemap_urls\"] = sitemap_urls_list\n",
+ " print(f\" Sitemap URLs found: {sitemap_urls_list}\")\n",
+ " if crawl_rules_list:\n",
+ " temp_domain_conf[\"crawl_rules\"] = crawl_rules_list\n",
+ " print(f\" Crawl rules found: {crawl_rules_list}\")\n",
+ "\n",
+ " # populate the in-memory data structure\n",
+ " inflight_configuration_data[crawler_oid][\"domains_temp\"][\n",
+ " domain_oid\n",
+ " ] = temp_domain_conf\n",
+ " print()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "575c00ac-7c84-465e-83d7-aa51f8e5310d",
+ "metadata": {},
+ "source": [
+ "### Step 3: Extracting the Extraction Rules\n",
+ "\n",
+ "In the next cell, we will find any extraction rules you set for your Elastic Crawlers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "61a7df7a-72ad-4330-a30c-da319befd55c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "extraction_rules = es_client.search(\n",
+ " index=\".ent-search-actastic-crawler2_extraction_rules\",\n",
+ " _source=[\"configuration_oid\", \"domain_oid\", \"rules\", \"url_filters\"],\n",
+ ")\n",
+ "\n",
+ "extr_count = 1\n",
+ "for exr_rule in extraction_rules[\"hits\"][\"hits\"]:\n",
+ " source = exr_rule[\"_source\"]\n",
+ "\n",
+ " config_oid = source[\"configuration_oid\"]\n",
+ " domain_oid = source[\"domain_oid\"]\n",
+ "\n",
+ " all_rules = source[\"rules\"]\n",
+ " all_url_filters = source[\"url_filters\"]\n",
+ "\n",
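+    "    # NOTE: only the first URL filter and the first rule from each extraction ruleset are migrated below\n",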
+ " # extract url filters\n",
+ " url_filters = []\n",
+ " if all_url_filters:\n",
+ " url_filters = [\n",
+ " {\n",
+ " \"type\": all_url_filters[0][\"filter\"],\n",
+ " \"pattern\": all_url_filters[0][\"pattern\"],\n",
+ " }\n",
+ " ]\n",
+ "\n",
+ " # extract rulesets\n",
+ " action_translation_map = {\n",
+ " \"fixed\": \"set\",\n",
+ " \"extracted\": \"extract\",\n",
+ " }\n",
+ "\n",
+ " ruleset = {}\n",
+ " if all_rules:\n",
+ " ruleset = [\n",
+ " {\n",
+ " \"action\": action_translation_map[\n",
+ " all_rules[0][\"content_from\"][\"value_type\"]\n",
+ " ],\n",
+ " \"field_name\": all_rules[0][\"field_name\"],\n",
+ " \"selector\": all_rules[0][\"selector\"],\n",
+ " \"join_as\": all_rules[0][\"multiple_objects_handling\"],\n",
+ " \"value\": all_rules[0][\"content_from\"][\"value\"],\n",
+ " \"source\": all_rules[0][\"source_type\"],\n",
+ " }\n",
+ " ]\n",
+ "\n",
+ " # populate the in-memory data structure\n",
+ " temp_extraction_rulesets = [\n",
+ " {\n",
+ " \"url_filters\": url_filters,\n",
+ " \"rules\": ruleset,\n",
+ " }\n",
+ " ]\n",
+ "\n",
+ " print(\n",
+ " f\"{extr_count}.) Crawler {config_oid} has extraction rules {temp_extraction_rulesets}\\n\"\n",
+ " )\n",
+ " extr_count += 1\n",
+ " inflight_configuration_data[config_oid][\"domains_temp\"][domain_oid][\n",
+ " \"extraction_rulesets\"\n",
+ " ] = temp_extraction_rulesets"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "538fb054-1399-4b88-bd1e-fef116491421",
+ "metadata": {},
+ "source": [
+ "### Step 4: Schedules\n",
+ "\n",
+    "Next, we will gather any specific time schedules your Crawlers have set and convert them from Quartz cron syntax to standard cron syntax (see the example below). Please note that _interval time schedules_ are not supported by Open Crawler and will be ignored."
+ ]
+ },
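+  {
+   "cell_type": "markdown",
+   "id": "9a0c4b7e-2f31-4d55-8a0e-6b7c1d2e3f40",
+   "metadata": {},
+   "source": [
+    "For example, a Quartz schedule of `0 30 8 ? * 2` (8:30 AM every Monday) is converted to the standard cron expression `30 8 * * 1`: the leading seconds field is dropped, `?` is replaced with `*`, and the day-of-week numbering shifts from Sunday-based (Quartz) to Monday-based (standard cron). The exact expressions will, of course, depend on your own schedules."
+   ]
+  },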
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d880e081-f960-41c7-921e-26896f248eab",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def convert_quartz_to_cron(quartz_schedule: str) -> str:\n",
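+    "    # Quartz cron fields: seconds minutes hours day-of-month month day-of-week [year]\n",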
+ " _, minutes, hours, day_of_month, month, day_of_week, year = (\n",
+ " quartz_schedule.split(\" \") + [None]\n",
+ " )[:7]\n",
+ "\n",
+ " # Day of week is 1-7 starting from Sunday in Quartz\n",
+ " # and from Monday in regular Cron, adjust:\n",
+ " # Days before: 1 - SUN, 2 - MON ... 7 - SAT\n",
+ " # Days after: 1 - MON, 2 - TUE ... 7 - SUN\n",
+ " if day_of_week.isnumeric():\n",
+ " day_of_week = (int(day_of_week) - 2) % 7 + 1\n",
+ "\n",
+ " # ignore year\n",
+ " repackaged_definition = f\"{minutes} {hours} {day_of_month} {month} {day_of_week} \"\n",
+ "\n",
+ " # ? comes from Quartz Cron, regular cron doesn't handle it well\n",
+ " repackaged_definition = repackaged_definition.replace(\"?\", \"*\")\n",
+ " return repackaged_definition\n",
+ "\n",
+ "\n",
+ "# ---------------------------------------------------------------\n",
+ "\n",
+ "crawler_counter = 1\n",
+ "for crawler_oid, crawler_config in inflight_configuration_data.items():\n",
+ " output_index = crawler_config[\"output_index\"]\n",
+ "\n",
+ " existing_schedule_value = crawler_config[\"schedule\"]\n",
+ "\n",
+ " if not existing_schedule_value:\n",
+ " # query ES to get this Crawler's specific time schedule\n",
+ " schedules_result = es_client.search(\n",
+ " index=\".elastic-connectors-v1\",\n",
+ " query={\"match\": {\"index_name\": output_index}},\n",
+ " _source=[\"index_name\", \"scheduling\"],\n",
+ " )\n",
+ " # update schedule field with cron expression if specific time scheduling is enabled\n",
+ " if schedules_result[\"hits\"][\"hits\"][0][\"_source\"][\"scheduling\"][\"full\"][\n",
+ " \"enabled\"\n",
+ " ]:\n",
+ " quartz_schedule = schedules_result[\"hits\"][\"hits\"][0][\"_source\"][\n",
+ " \"scheduling\"\n",
+ " ][\"full\"][\"interval\"]\n",
+ " crawler_config[\"schedule\"] = convert_quartz_to_cron(quartz_schedule)\n",
+ " print(\n",
+ " f\"{crawler_counter}.) Crawler {output_index} has the schedule {crawler_config['schedule']}\"\n",
+ " )\n",
+ " crawler_counter += 1"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b1586df2-283d-435f-9b08-ba9fad3a7e0a",
+ "metadata": {},
+ "source": [
+ "### Step 5: Creating the Open Crawler YAML configuration files\n",
+ "\n",
+ "In this final step, we will create the actual YAML files you need to get up and running with Open Crawler!\n",
+ "\n",
+    "The next cell performs some final transformations on the in-memory data structure that has been tracking your configurations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "dd70f102-33ee-4106-8861-0aa0f9a223a1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Final transform of the in-memory data structure to a form we can dump to YAML\n",
+ "# for each crawler, collect all of its domain configurations into a list\n",
+ "for crawler_oid, crawler_config in inflight_configuration_data.items():\n",
+ " all_crawler_domains = []\n",
+ "\n",
+ " for domain_config in crawler_config[\"domains_temp\"].values():\n",
+ " all_crawler_domains.append(domain_config)\n",
+ " # create a new key called \"domains\" that points to a list of domain configs only - no domain_oid values as keys\n",
+ " crawler_config[\"domains\"] = all_crawler_domains\n",
+ " # delete the temporary domain key\n",
+ " del crawler_config[\"domains_temp\"]\n",
+ " print(f\"Transform for {crawler_oid} complete!\")"
+ ]
+ },
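+  {
+   "cell_type": "markdown",
+   "id": "5e8d2c1a-7b46-49f3-9d20-3a1b4c5d6e7f",
+   "metadata": {},
+   "source": [
+    "For reference, each generated configuration will roughly follow the structure below. The values shown here are illustrative placeholders only; your files will contain the values gathered in the previous steps, plus the Elasticsearch connection details you provide next:\n",
+    "\n",
+    "```yaml\n",
+    "output_index: my-search-index\n",
+    "schedule: \"30 8 * * 1\"\n",
+    "output_sink: elasticsearch\n",
+    "full_html_extraction_enabled: false\n",
+    "elasticsearch:\n",
+    "  host: https://my-deployment.example.elastic.cloud\n",
+    "  port: 443\n",
+    "  api_key: <your API key>\n",
+    "domains:\n",
+    "  - url: https://www.example.com\n",
+    "    seed_urls:\n",
+    "      - https://www.example.com/blog\n",
+    "    sitemap_urls:\n",
+    "      - https://www.example.com/sitemap.xml\n",
+    "    crawl_rules:\n",
+    "      - policy: allow\n",
+    "        type: begins\n",
+    "        pattern: /blog\n",
+    "```"
+   ]
+  },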
+ {
+ "cell_type": "markdown",
+ "id": "e611a486-e12f-4951-ab95-ca54241a7a06",
+ "metadata": {},
+ "source": [
+ "#### **Wait! Before we continue onto creating our YAML files, we're going to need your input on a few things.**\n",
+ "\n",
+ "In the next cell, please enter the following details about the _Elasticsearch instance you will be using with Open Crawler_. This instance can be Elastic Cloud Hosted, Serverless, or a local instance.\n",
+ "\n",
+ "- The Elasticsearch endpoint URL\n",
+ "- The port number of your Elasticsearch endpoint _(Optional, will default to 443 if left blank)_\n",
+ "- An API key"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "213880cc-cbf3-40d9-8c7d-6fcf6428c16b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ENDPOINT = getpass(\"Elasticsearch endpoint URL: \")\n",
+ "PORT = getpass(\"[OPTIONAL] Elasticsearch endpoint port number: \")\n",
+ "OUTPUT_API_KEY = getpass(\"Elasticsearch API key: \")\n",
+ "\n",
+ "# set the above values in each Crawler's configuration\n",
+ "for crawler_config in inflight_configuration_data.values():\n",
+ " crawler_config[\"elasticsearch\"][\"host\"] = ENDPOINT\n",
+ " crawler_config[\"elasticsearch\"][\"port\"] = int(PORT) if PORT else 443\n",
+ " crawler_config[\"elasticsearch\"][\"api_key\"] = OUTPUT_API_KEY\n",
+ "\n",
+    "# ping ES to make sure we have a working connection\n",
+ "es_client = Elasticsearch(\n",
+    "    \":\".join([ENDPOINT, PORT]) if PORT else ENDPOINT,\n",
+ " api_key=OUTPUT_API_KEY,\n",
+ ")\n",
+ "\n",
+ "es_client.info()[\"tagline\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "67dfc7c6-429e-42f0-ab08-2c84d72945cb",
+ "metadata": {},
+ "source": [
+ "#### **This is the final step! You have two options here:**\n",
+ "\n",
+    "- The \"Write to YAML\" cell will create one YAML file for each Crawler you have.\n",
+ "- The \"Print to output\" cell will print each Crawler's configuration YAML in the Notebook, so you can copy-paste them into your Open Crawler YAML files manually.\n",
+ "\n",
+ "Feel free to run both! You can run Option 2 first to see the output before running Option 1 to save the configs into YAML files."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7ca5ad33-364c-4d13-88fc-db19052363d5",
+ "metadata": {},
+ "source": [
+ "#### Option 1: Write to YAML file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6adc53db-d781-4b72-a5f3-441364f354b8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Dump each Crawler's configuration into its own YAML file\n",
+ "for crawler_config in inflight_configuration_data.values():\n",
+ " base_dir = os.getcwd()\n",
+ " file_name = (\n",
+ " f\"{crawler_config['output_index']}-config.yml\" # autogen a custom filename\n",
+ " )\n",
+ " output_path = os.path.join(base_dir, file_name)\n",
+ "\n",
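+    "    # the YAML files are written to the current working directory\n",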
+ " if os.path.exists(base_dir):\n",
+ " with open(output_path, \"w\") as file:\n",
+ " yaml.safe_dump(crawler_config, file, sort_keys=False)\n",
+ " print(f\" Wrote {file_name} to {output_path}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "35c56a2b-4acd-47f5-90e3-9dd39fa4383f",
+ "metadata": {},
+ "source": [
+ "#### Option 2: Print to output"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "525aabb8-0537-4ba6-8109-109490dddafe",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "for crawler_config in inflight_configuration_data.values():\n",
+ " yaml_out = yaml.safe_dump(crawler_config, sort_keys=False)\n",
+ "\n",
+ " print(f\"YAML config => {crawler_config['output_index']}-config.yml\\n--------\")\n",
+ " print(yaml_out)\n",
+ " print(\n",
+ " \"--------------------------------------------------------------------------------\"\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dd4d18de-7b3b-4ebe-831b-c96bc55d6eb9",
+ "metadata": {},
+ "source": [
+ "### Next Steps\n",
+ "\n",
+ "Now that the YAML files have been generated, you can visit the Open Crawler GitHub repository to learn more about how to deploy Open Crawler: https://github.com/elastic/crawler#quickstart\n",
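+    "\n",
+    "As a rough sketch of what a first crawl can look like (assuming you have cloned the elastic/crawler repository and are using the CLI described in its quickstart; the exact command and paths may differ for your setup):\n",
+    "\n",
+    "```bash\n",
+    "# run from inside your local clone of the elastic/crawler repository\n",
+    "bin/crawler crawl path/to/my-index-name-config.yml\n",
+    "```\n",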
+ "\n",
+ "If you find any problems with this Notebook, please feel free to create an issue in the elasticsearch-labs repository: https://github.com/elastic/elasticsearch-labs/issues"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/supporting-blog-content/unifying-elastic-vector-database-and-llms-for-intelligent-query/Unifying_Elastic_Vector_Database_and_LLMs_for_Intelligent_Query.ipynb b/supporting-blog-content/unifying-elastic-vector-database-and-llms-for-intelligent-query/Unifying_Elastic_Vector_Database_and_LLMs_for_Intelligent_Query.ipynb
index c939e542..5a3be672 100644
--- a/supporting-blog-content/unifying-elastic-vector-database-and-llms-for-intelligent-query/Unifying_Elastic_Vector_Database_and_LLMs_for_Intelligent_Query.ipynb
+++ b/supporting-blog-content/unifying-elastic-vector-database-and-llms-for-intelligent-query/Unifying_Elastic_Vector_Database_and_LLMs_for_Intelligent_Query.ipynb
@@ -10,7 +10,7 @@
"# Objective\n",
"This notebook demonstrates how blending the capabilities of Elasticsearch as a vector database (VectorDB), search templates, and LLM functions can provide an intelligent query layer.\n",
"\n",
- "
\n",
+ "
\n",
"\n",
"\n",
"- **Elasticsearch as the VectorDB**: Acts as the core search engine, storing and retrieving dense vector embeddings efficiently.\n",
@@ -96,97 +96,97 @@
},
{
"cell_type": "markdown",
- "source": [
- "###Completions Endpoint & API Key"
- ],
"metadata": {
"id": "MReWDS6VmHih"
- }
+ },
+ "source": [
+ "###Completions Endpoint & API Key"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- ""
- ],
"metadata": {
"id": "_i3PT0v2lVHI"
- }
+ },
+ "source": [
+ ""
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "D4giT3X4lgbm"
+ },
+ "outputs": [],
"source": [
"ENDPOINT = getpass(\"Azure OpenAI Completions Endpoint: \")\n",
"\n",
"AZURE_API_KEY = getpass(\"Azure OpenAI API Key: \")"
- ],
- "metadata": {
- "id": "D4giT3X4lgbm"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "###Deployment Name"
- ],
"metadata": {
"id": "s6WU8bZ9mFfB"
- }
+ },
+ "source": [
+ "###Deployment Name"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- ""
- ],
"metadata": {
"id": "OjfotdiRlZDU"
- }
+ },
+ "source": [
+ ""
+ ]
},
{
"cell_type": "code",
- "source": [
- "DEPLOYMENT_NAME = getpass(\"Azure OpenAI Deployment Name: \")\n",
- "deployment_name = DEPLOYMENT_NAME"
- ],
+ "execution_count": null,
"metadata": {
"id": "nOLvH3M_ll64"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "DEPLOYMENT_NAME = getpass(\"Azure OpenAI Deployment Name: \")\n",
+ "deployment_name = DEPLOYMENT_NAME"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "###API Version"
- ],
"metadata": {
"id": "_xVZyR5rmDvS"
- }
+ },
+ "source": [
+ "###API Version"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- ""
- ],
"metadata": {
"id": "XkJzWUM3lvwO"
- }
+ },
+ "source": [
+ ""
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "vVzZyB8ml0wa"
+ },
+ "outputs": [],
"source": [
"API_VERSION = getpass(\"Completions Endpoint API Version: \")\n",
"\n",
"client = AzureOpenAI(\n",
" azure_endpoint=ENDPOINT, api_key=AZURE_API_KEY, api_version=API_VERSION\n",
")"
- ],
- "metadata": {
- "id": "vVzZyB8ml0wa"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
@@ -199,12 +199,12 @@
},
{
"cell_type": "markdown",
- "source": [
- ""
- ],
"metadata": {
"id": "bckurffTnkI4"
- }
+ },
+ "source": [
+ ""
+ ]
},
{
"cell_type": "code",
@@ -1032,8 +1032,8 @@
},
"outputs": [
{
- "output_type": "stream",
"name": "stdout",
+ "output_type": "stream",
"text": [
"\n",
"Formatted Messages:\n",
@@ -1138,4 +1138,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
-}
\ No newline at end of file
+}