## Hello, future Elastic Open Crawler user!
This notebook is designed to help you painlessly migrate your Elastic Crawler configurations to Open Crawler-friendly YAML!

We recommend running the cells one at a time, in order, because each cell depends on the cells before it.

_If you are running this notebook in Google Colab, or have not yet installed the Python `elasticsearch` client in your local environment, the first cell in the Setup section installs it for you._

### Setup
First, let's make sure `elasticsearch` and the other required dependencies are installed and imported by running the following cell:

In [1]:
!pip install elasticsearch

from getpass import getpass
from elasticsearch import Elasticsearch
import json



We are going to need a few things from your Elasticsearch deployment before we can migrate your configurations:
- Your **Elasticsearch Cloud ID**
- An **API key**

To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.
You can create a new API key from the Stack Management -> API keys menu in Kibana. Be sure to copy your key to a safe place as soon as it is created, because it is displayed only once, at creation time.

In [6]:
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")
API_KEY = getpass("Elastic Api Key: ")

Elastic Cloud ID:  ········
Elastic Api Key:  ········
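
If you rerun this notebook often, typing both values each time can get tedious. The sketch below is one possible alternative, not part of the original notebook: a hypothetical helper, `read_secret`, that reads a value from an environment variable and falls back to the `getpass` prompt when the variable is not set. The environment variable names in the comments are assumptions; use whatever names your environment defines.

```python
import os
from getpass import getpass


def read_secret(env_var, prompt):
    """Return the value of env_var if it is set, otherwise prompt for it."""
    value = os.environ.get(env_var)
    if value:
        return value
    # Nothing in the environment: fall back to the interactive prompt
    return getpass(prompt)


# Example usage (variable names are hypothetical):
# ELASTIC_CLOUD_ID = read_secret("ELASTIC_CLOUD_ID", "Elastic Cloud ID: ")
# API_KEY = read_secret("ELASTIC_API_KEY", "Elastic Api Key: ")
```

Either way, the values end up in `ELASTIC_CLOUD_ID` and `API_KEY`, so the cells that follow work unchanged.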


Great! Now let's try connecting to your Elasticsearch instance.

In [7]:
es_client = Elasticsearch(
    cloud_id=ELASTIC_CLOUD_ID,
    api_key=API_KEY,
)

# request the cluster info to confirm that the connection works
es_client.info()['tagline']

'You Know, for Search'

Hopefully you received our tagline 'You Know, for Search'. If so, we are connected and ready to go!

If not, please double-check the Cloud ID and API key you provided above.
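
When the connection fails, the cell above raises an exception with a long traceback. As a minimal sketch, assuming you would rather see a short message, the hypothetical helper `check_connection` below wraps the same `info()` call. It catches every exception on purpose, so it does not distinguish between a bad API key, a wrong Cloud ID, and a network problem; it only reports the exception type and message.

```python
def check_connection(client):
    """Return the cluster tagline, or None if the request fails."""
    try:
        return client.info()["tagline"]
    except Exception as error:
        # Broad on purpose: authentication, network, and configuration
        # errors all end up here in this sketch.
        print(f"Could not reach Elasticsearch: {type(error).__name__}: {error}")
        return None
```

Calling `check_connection(es_client)` should return the tagline on success and `None` otherwise.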

#### Step 1: Grabbing basic configurations

The first order of business is to establish which Crawlers you have and their basic configuration details.
This migration notebook will attempt to pull configurations for every distinct Crawler you have in your Elasticsearch instance.
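
Before running the cell, it may help to see the response shape that its loop relies on. The sketch below uses a made-up stand-in for a search response (the index name and the `id` values are placeholders, not real data) and a hypothetical helper, `iter_sources`, that walks `response["hits"]["hits"]` and yields each hit's `_source` document.

```python
def iter_sources(response):
    """Yield the `_source` document of every hit in a search response."""
    for hit in response["hits"]["hits"]:
        yield hit["_source"]


# Stand-in for a search response; all values here are invented
sample_response = {
    "hits": {
        "hits": [
            {"_index": "some-system-index", "_id": "1", "_source": {"id": "a"}},
            {"_index": "some-system-index", "_id": "2", "_source": {"id": "b"}},
        ]
    }
}

ids = [source["id"] for source in iter_sources(sample_response)]
print(ids)
```

Note that `search` returns at most 10 hits when no `size` is given, so a deployment with more than 10 Crawlers would likely need an explicit `size` argument to cover all of them.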

In [4]:
# define an intermediate data structure
inflight_configuration_data = {}

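# basic Crawler configurations live in the configurations index,
# not in the extraction rules index
crawler_configurations = es_client.search(
    index=".ent-search-actastic-crawler2_configurations_v2",
)

for configuration in crawler_configurations["hits"]["hits"]:
    source = configuration["_source"]
    conf_map = {}  # this will be the entire config hashmap for a single Crawler
    # the Crawler's output index is stored on the document itself;
    # the hit's "_index" would only name the system index queried above
    output_index = source["index_name"]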