# Load NER data into Elasticsearch

The Gretel Console (https://console.gretel.cloud) provides several views and tools for exploring your data. If you cannot or prefer not to use the Console, you can instead load the data into Elasticsearch for exploration and reporting. This blueprint uses Gretel sample data to bootstrap a new project and walks through a simple workflow for loading that data into an Elasticsearch cluster. It then runs a sample query to verify that the records were loaded and shows how to search for an NER label in your records.

In [None]:
!docker-compose up -d

In [None]:
%%capture

import sys

# The Elasticsearch client's major version should match the cluster's major version.
# See https://elasticsearch-py.readthedocs.io/en/master/#compatibility
!{sys.executable} -m pip install -U gretel-client elasticsearch

In [None]:
# Be sure to use your Gretel API key here, which is available from the Profile menu in the Console

import getpass
import os

gretel_api_key = os.getenv("GRETEL_API_KEY") or getpass.getpass("Your Gretel API Key: ")

In [None]:
# Install Gretel SDKs and bootstrap the project

from gretel_client import get_cloud_client

client = get_cloud_client("api", gretel_api_key)
client.install_packages()
project = client.get_project(create=True)

project.send_bulk(client.get_sample('bike-customer-orders'))

In [None]:
from copy import deepcopy

index_name = "gretel_ner_blueprint"

def trim_record(record):
    trim = {}
    # Keep the original record
    trim['record'] = deepcopy(record['record'])
    # Keep ingest_time for time series
    trim['ingest_time'] = record['ingest_time']
    # Keep just score_* lists for simplicity
    trim['score_high'] = record['metadata']['entities']['score_high']
    trim['score_med'] = record['metadata']['entities']['score_med']
    trim['score_low'] = record['metadata']['entities']['score_low']
    # Specify the Elasticsearch index for the record.
    trim['_index'] = index_name
    return trim
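
To see the shape of the document that `trim_record` hands to Elasticsearch, the sketch below runs the same logic on a hand-written record. The field values are invented for illustration, and the input layout is assumed only from the keys the function reads; real Gretel records carry more metadata.

```python
from copy import deepcopy

index_name = "gretel_ner_blueprint"


def trim_record(record):
    # Same logic as the cell above, repeated so this sketch runs on its own.
    entities = record["metadata"]["entities"]
    return {
        "record": deepcopy(record["record"]),
        "ingest_time": record["ingest_time"],
        "score_high": entities["score_high"],
        "score_med": entities["score_med"],
        "score_low": entities["score_low"],
        "_index": index_name,
    }


# Hypothetical record: values are invented, layout assumed from the keys above.
sample = {
    "record": {"City": "London", "NumberCarsOwned": 3, "TotalChildren": 2},
    "ingest_time": "2020-01-01T00:00:00Z",
    "metadata": {
        "entities": {
            "score_high": ["location"],
            "score_med": [],
            "score_low": [],
        },
        "other": "dropped by trim_record",
    },
}

trimmed = trim_record(sample)
print(sorted(trimmed))
```

The `_index` key is metadata for the `bulk` helper, which uses it to route the document; the remaining keys become the document body.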


In [None]:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

elasticsearch_host = 'localhost'
elasticsearch_port = 9200
es = Elasticsearch(
    hosts=[{'host': elasticsearch_host, 'port': elasticsearch_port}]
)

bulk(
    es, 
    project.iter_records(
        # NOTE: the default direction is "forward", which makes the client block and wait for new records.
        direction="backward",
        post_process=trim_record, 
        params={"flatten": "no"}))


In [None]:
# Find records tagged with the NER 'location' label that are also in London with 3 or more cars owned.
# Include a terms aggregation that counts the matching records by number of children.
aggregation_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"record.City": "London"}}, 
                {"range": {"record.NumberCarsOwned": {"gte": 3}}},
                {"match": {"score_high": "location"}}
            ]
        }
    },
    "size": 3,
    "aggs": {
        "children": {
            "terms": {"field": "record.TotalChildren"}
        }
    }
}

es.search(index=index_name, body=aggregation_query)
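
`es.search` returns a nested dictionary. A minimal sketch of pulling out the matching documents and the aggregation buckets follows; `summarize_response` is a hypothetical helper, and the response literal is a hand-written stand-in that follows the standard Elasticsearch search response layout, not real output from this project.

```python
def summarize_response(response, agg_name="children"):
    """Return (source documents, {bucket key: doc count}) from a search response."""
    docs = [hit["_source"] for hit in response["hits"]["hits"]]
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    counts = {bucket["key"]: bucket["doc_count"] for bucket in buckets}
    return docs, counts


# Hand-written stand-in for a search response; the values are invented.
fake_response = {
    "hits": {
        "hits": [
            {
                "_index": "gretel_ner_blueprint",
                "_source": {
                    "record": {"City": "London", "NumberCarsOwned": 3, "TotalChildren": 2},
                    "score_high": ["location"],
                },
            },
        ]
    },
    "aggregations": {
        "children": {"buckets": [{"key": 2, "doc_count": 1}]}
    },
}

docs, counts = summarize_response(fake_response)
print(len(docs), counts)
```

With a live cluster, and assuming the client returns a dict-like response, the same helper could be called as `summarize_response(es.search(index=index_name, body=aggregation_query))`.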

In [None]:
# Clean up
!docker-compose down
project.delete()