# Enriching and updating data using lookup tables
<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

Lookups are key/value-pair tables that Druid distributes to the query processes in a cluster. You can reference them at ingestion time and at query time, and you can update them regularly, either manually or automatically.

## Prerequisites

This tutorial works with Druid 27.0.0 or later.

#### Run with Docker

Launch this tutorial and all prerequisites using the `druid-jupyter` profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see the Learn Druid repository [readme](https://github.com/implydata/learn-druid).
   
#### Run without Docker

If you do not use the Docker Compose environment, you need the following:

* A running Apache Druid instance, with a `DRUID_HOST` local environment variable containing the server name of your Druid router
* [druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md), a Python client for Apache Druid. Follow the instructions in the Install section of the README file.

## Initialization

Run the next cell to set up the Druid Python client's connection to Apache Druid.

If successful, the Druid version number will be shown in the output.

In [None]:
import druidapi
import os

if 'DRUID_HOST' not in os.environ.keys():
    druid_host=f"http://localhost:8888"
else:
    druid_host=f"http://{os.environ['DRUID_HOST']}:8888"
    
print(f"Opening a connection to {druid_host}.")
druid = druidapi.jupyter_client(druid_host)

display = druid.display
sql_client = druid.sql
status_client = druid.status

status_client.version

### Load example data

Once your Druid environment is up and running, ingest the sample data for this tutorial.

Run the following cell to create a table called `example-flights-lookup`. The query only ingests specific columns from the source data.

When completed, you'll see a description of the final table.

In [None]:
sql='''
REPLACE INTO "example-flights-lookup" OVERWRITE ALL
WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://static.imply.io/example-data/flight_on_time/flights/On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2005_11.csv.zip"]}',
    '{"type":"csv","findColumnsFromHeader":true}'
  )
) EXTEND ("depaturetime" VARCHAR, "arrivalime" VARCHAR, "Year" BIGINT, "Quarter" BIGINT, "Month" BIGINT, "DayofMonth" BIGINT, "DayOfWeek" BIGINT, "FlightDate" VARCHAR, "Reporting_Airline" VARCHAR, "DOT_ID_Reporting_Airline" BIGINT, "IATA_CODE_Reporting_Airline" VARCHAR, "Tail_Number" VARCHAR, "Flight_Number_Reporting_Airline" BIGINT, "OriginAirportID" BIGINT, "OriginAirportSeqID" BIGINT, "OriginCityMarketID" BIGINT, "Origin" VARCHAR, "OriginCityName" VARCHAR, "OriginState" VARCHAR, "OriginStateFips" BIGINT, "OriginStateName" VARCHAR, "OriginWac" BIGINT, "DestAirportID" BIGINT, "DestAirportSeqID" BIGINT, "DestCityMarketID" BIGINT, "Dest" VARCHAR, "DestCityName" VARCHAR, "DestState" VARCHAR, "DestStateFips" BIGINT, "DestStateName" VARCHAR, "DestWac" BIGINT, "CRSDepTime" BIGINT, "DepTime" BIGINT, "DepDelay" BIGINT, "DepDelayMinutes" BIGINT, "DepDel15" BIGINT, "DepartureDelayGroups" BIGINT, "DepTimeBlk" VARCHAR, "TaxiOut" BIGINT, "WheelsOff" BIGINT, "WheelsOn" BIGINT, "TaxiIn" BIGINT, "CRSArrTime" BIGINT, "ArrTime" BIGINT, "ArrDelay" BIGINT, "ArrDelayMinutes" BIGINT, "ArrDel15" BIGINT, "ArrivalDelayGroups" BIGINT, "ArrTimeBlk" VARCHAR, "Cancelled" BIGINT, "CancellationCode" VARCHAR, "Diverted" BIGINT, "CRSElapsedTime" BIGINT, "ActualElapsedTime" BIGINT, "AirTime" BIGINT, "Flights" BIGINT, "Distance" BIGINT, "DistanceGroup" BIGINT, "CarrierDelay" BIGINT, "WeatherDelay" BIGINT, "NASDelay" BIGINT, "SecurityDelay" BIGINT, "LateAircraftDelay" BIGINT, "FirstDepTime" VARCHAR, "TotalAddGTime" VARCHAR, "LongestAddGTime" VARCHAR, "DivAirportLandings" VARCHAR, "DivReachedDest" VARCHAR, "DivActualElapsedTime" VARCHAR, "DivArrDelay" VARCHAR, "DivDistance" VARCHAR, "Div1Airport" VARCHAR, "Div1AirportID" VARCHAR, "Div1AirportSeqID" VARCHAR, "Div1WheelsOn" VARCHAR, "Div1TotalGTime" VARCHAR, "Div1LongestGTime" VARCHAR, "Div1WheelsOff" VARCHAR, "Div1TailNum" VARCHAR, "Div2Airport" VARCHAR, "Div2AirportID" VARCHAR, "Div2AirportSeqID" VARCHAR, "Div2WheelsOn" VARCHAR, "Div2TotalGTime" 
VARCHAR, "Div2LongestGTime" VARCHAR, "Div2WheelsOff" VARCHAR, "Div2TailNum" VARCHAR, "Div3Airport" VARCHAR, "Div3AirportID" VARCHAR, "Div3AirportSeqID" VARCHAR, "Div3WheelsOn" VARCHAR, "Div3TotalGTime" VARCHAR, "Div3LongestGTime" VARCHAR, "Div3WheelsOff" VARCHAR, "Div3TailNum" VARCHAR, "Div4Airport" VARCHAR, "Div4AirportID" VARCHAR, "Div4AirportSeqID" VARCHAR, "Div4WheelsOn" VARCHAR, "Div4TotalGTime" VARCHAR, "Div4LongestGTime" VARCHAR, "Div4WheelsOff" VARCHAR, "Div4TailNum" VARCHAR, "Div5Airport" VARCHAR, "Div5AirportID" VARCHAR, "Div5AirportSeqID" VARCHAR, "Div5WheelsOn" VARCHAR, "Div5TotalGTime" VARCHAR, "Div5LongestGTime" VARCHAR, "Div5WheelsOff" VARCHAR, "Div5TailNum" VARCHAR, "Unnamed: 109" VARCHAR))
SELECT
  TIME_PARSE("depaturetime") AS "__time",
  "Reporting_Airline",
  "Tail_Number",
  "Distance",
  "Origin",
  "Dest"
FROM "ext"
PARTITIONED BY DAY
'''

display.run_task(sql)
sql_client.wait_until_ready('example-flights-lookup')
display.table('example-flights-lookup')

Run the following cell to import the datetime module, used later to create a version number for each [lookup datasource](https://druid.apache.org/docs/latest/querying/datasource#lookup) that you will create.

The cell also defines two functions, `postLookup` and `waitForLookup`, which require the `requests` and `time` Python modules.

* You will use the `postLookup` function to call the [lookup configuration API](https://druid.apache.org/docs/latest/api-reference/lookups-api) to create and update lookup tables.
* The `waitForLookup` function will be used to give you feedback on Druid's progress in [distributing](https://druid.apache.org/docs/latest/querying/lookups#configuration-propagation-behavior) the lookup table around the query-serving processes in Druid.

In [None]:
import datetime

import requests
import time

def postLookup(definition):
    x = requests.post(druid_host + '/druid/coordinator/v1/lookups/config', json=definition)

    if "error" in x.text:
        raise Exception('Not able to complete the request. \n\n'+x.text)
    else:
        print('Successfully submitted the lookup request.')

def waitForLookup(tier, name, ticsMax):

    # The default time period between checks of lookup definition changes (druid.manager.lookups.period)
    # is two minutes. The notebook environment reduces this for learning purposes.
    # 
    # https://druid.apache.org/docs/latest/configuration/#lookups-dynamic-configuration

    tics = 0
    ticsWait = 1    
    ticsMax = min(ticsMax,360)
    ticsSpinner = "/-\\|"  # escape the backslash to avoid an invalid escape sequence warning
    
    apicall = druid_host + '/druid/coordinator/v1/lookups/status/'+tier+'/'+name+'?detailed=true'

    x = requests.get(apicall)

    while (x.text != '{"loaded":true,"pendingNodes":[]}' and tics < ticsMax):
        print(x.text + ' ' + ticsSpinner[tics%len(ticsSpinner)] + ' [' + str(ticsMax-tics) + ']   ', end='\r')
        time.sleep(ticsWait)
        tics += 1
        x = requests.get(apicall) 

    if (tics == ticsMax):
        raise Exception('\nTimeout waiting for Druid to load the ' + name + ' lookup to ' + tier + ' tier. Run the cell again.')
    else:
        print('\nSuccess. ' + name + ' lookup in ' + tier + ' tier is fully available.')
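
The `waitForLookup` loop above compares the raw response text with one exact string, so a change in whitespace or field order would keep it waiting until the timeout. As a hedged alternative, the following sketch parses the response as JSON instead. The helper name `lookup_is_loaded` is hypothetical, the field names `loaded` and `pendingNodes` are taken from the string that the loop compares against, and the host name in the second call is made up for illustration.

```python
import json

def lookup_is_loaded(status_text):
    """Return True when a lookup status response reports a fully loaded lookup."""
    try:
        status = json.loads(status_text)
    except json.JSONDecodeError:
        # Not JSON at all (for example an error page), so treat it as not loaded.
        return False
    if not isinstance(status, dict):
        return False
    # Loaded means the flag is true and no nodes are still pending.
    return status.get("loaded") is True and not status.get("pendingNodes")

print(lookup_is_loaded('{"loaded":true,"pendingNodes":[]}'))
print(lookup_is_loaded('{"loaded": false, "pendingNodes": ["example-host:8083"]}'))
```

A check like this could replace the string comparison in the `while` condition, assuming the status endpoint keeps returning these two fields.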

Before you can create a table in the lookup schema, you must initialize the lookup configuration.

Run the following cell, which posts an empty JSON object to the configuration API.

In [None]:
empty_post = {}
postLookup(empty_post)

## Create a lookup table

Run the following cell to set some variables necessary to create a table in the lookup schema:

* lookup [tier](https://druid.apache.org/docs/latest/querying/lookups#dynamic-configuration)
* lookup name

In [None]:
lookup_tier = "__default"
lookup_name = "example-flights-airportsizes"

Run the next cell to create a dictionary object that defines the table.

Notice that:

* The variables above are used in defining the table's name and tier.
* The definition has a version calculated from the current date and time.
* The `type` of `map` is used so that the data for the table can be put directly in the `POST` request to the API.
* The data is contained in the `map` - a series of key / value pairs for the airport code and its size.

> Given the size of the `map` data, you may want to collapse this cell once you have run it.

In [None]:
lookup_definition_version = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")

lookup_definition = {
    lookup_tier: {
        lookup_name: {
            "version": lookup_definition_version,  
            "lookupExtractorFactory": {
                "type": "map",
                    "map": {
"ABE": "medium_airport",
"ABI": "medium_airport",
"ABQ": "large_airport",
"ABY": "medium_airport",
"ACT": "medium_airport",
"ACV": "medium_airport",
"ACY": "medium_airport",
"ADK": "medium_airport",
"ADQ": "medium_airport",
"AEX": "medium_airport",
"AGS": "large_airport",
"AKN": "medium_airport",
"ALB": "medium_airport",
"AMA": "large_airport",
"ANC": "large_airport",
"APF": "medium_airport",
"ATL": "large_airport",
"ATW": "small_airport",
"AUS": "large_airport",
"AVL": "large_airport",
"AVP": "medium_airport",
"AZO": "medium_airport",
"BDL": "large_airport",
"BET": "medium_airport",
"BFL": "medium_airport",
"BGM": "medium_airport",
"BGR": "large_airport",
"BHM": "large_airport",
"BIL": "large_airport",
"BIS": "medium_airport",
"BMI": "large_airport",
"BNA": "large_airport",
"BOI": "large_airport",
"BOS": "large_airport",
"BQK": "medium_airport",
"BQN": "medium_airport",
"BRO": "medium_airport",
"BRW": "medium_airport",
"BTM": "medium_airport",
"BTR": "medium_airport",
"BTV": "medium_airport",
"BUF": "large_airport",
"BUR": "medium_airport",
"BWI": "large_airport",
"BZN": "medium_airport",
"CAE": "large_airport",
"CAK": "medium_airport",
"CDC": "medium_airport",
"CDV": "medium_airport",
"CEC": "medium_airport",
"CHA": "large_airport",
"CHO": "medium_airport",
"CHS": "large_airport",
"CIC": "small_airport",
"CID": "large_airport",
"CRQ": "medium_airport",
"CLE": "large_airport",
"CLL": "medium_airport",
"CLT": "large_airport",
"CMH": "large_airport",
"CMI": "medium_airport",
"COD": "medium_airport",
"COS": "large_airport",
"CPR": "medium_airport",
"CRP": "large_airport",
"CRW": "large_airport",
"CSG": "medium_airport",
"CVG": "large_airport",
"CWA": "medium_airport",
"DAB": "large_airport",
"DAL": "large_airport",
"DAY": "large_airport",
"DBQ": "large_airport",
"DCA": "large_airport",
"DEN": "large_airport",
"DFW": "large_airport",
"DHN": "medium_airport",
"DLG": "medium_airport",
"DLH": "large_airport",
"DSM": "large_airport",
"DTW": "large_airport",
"EGE": "medium_airport",
"EKO": "medium_airport",
"ELP": "medium_airport",
"ERI": "large_airport",
"EUG": "medium_airport",
"EVV": "medium_airport",
"EWR": "large_airport",
"EYW": "medium_airport",
"FAI": "large_airport",
"FAR": "medium_airport",
"FAT": "medium_airport",
"FAY": "medium_airport",
"GPI": "medium_airport",
"FLL": "large_airport",
"FLO": "medium_airport",
"FNT": "medium_airport",
"FSD": "medium_airport",
"FSM": "large_airport",
"FWA": "large_airport",
"GEG": "large_airport",
"GFK": "medium_airport",
"GGG": "medium_airport",
"GJT": "medium_airport",
"GNV": "medium_airport",
"GPT": "large_airport",
"GRB": "large_airport",
"GRK": "medium_airport",
"GRR": "medium_airport",
"GSO": "large_airport",
"GSP": "large_airport",
"GTF": "medium_airport",
"GTR": "medium_airport",
"HDN": "small_airport",
"HKY": "medium_airport",
"HLN": "medium_airport",
"HNL": "large_airport",
"HOU": "large_airport",
"HPN": "medium_airport",
"HRL": "medium_airport",
"HSV": "large_airport",
"HTS": "large_airport",
"HVN": "medium_airport",
"IAD": "large_airport",
"IAH": "large_airport",
"ICT": "large_airport",
"IDA": "medium_airport",
"ILM": "medium_airport",
"IND": "large_airport",
"IPL": "medium_airport",
"ISO": "medium_airport",
"ISP": "medium_airport",
"ITO": "medium_airport",
"IYK": "small_airport",
"JAC": "medium_airport",
"JAN": "large_airport",
"JAX": "large_airport",
"JFK": "large_airport",
"JNU": "medium_airport",
"KOA": "medium_airport",
"KTN": "medium_airport",
"LAN": "medium_airport",
"LAS": "large_airport",
"LAW": "medium_airport",
"LAX": "large_airport",
"LBB": "large_airport",
"LCH": "medium_airport",
"LEX": "large_airport",
"LFT": "large_airport",
"LGA": "large_airport",
"LGB": "medium_airport",
"LIH": "medium_airport",
"LIT": "large_airport",
"LNK": "medium_airport",
"LRD": "medium_airport",
"LSE": "medium_airport",
"LWS": "medium_airport",
"LYH": "medium_airport",
"MAF": "medium_airport",
"MBS": "large_airport",
"MCI": "large_airport",
"MCN": "medium_airport",
"MCO": "large_airport",
"MDT": "medium_airport",
"MDW": "large_airport",
"MEI": "medium_airport",
"MEM": "large_airport",
"MFE": "medium_airport",
"MFR": "medium_airport",
"MGM": "large_airport",
"MHT": "large_airport",
"MIA": "large_airport",
"MKE": "large_airport",
"MLB": "medium_airport",
"MLI": "large_airport",
"MLU": "large_airport",
"MOB": "large_airport",
"MOD": "medium_airport",
"MOT": "medium_airport",
"SAW": "medium_airport",
"MRY": "medium_airport",
"MSN": "large_airport",
"MSO": "medium_airport",
"MSP": "large_airport",
"MSY": "large_airport",
"MTJ": "medium_airport",
"MYR": "medium_airport",
"OAK": "large_airport",
"OGG": "medium_airport",
"OKC": "large_airport",
"OMA": "large_airport",
"OME": "medium_airport",
"ONT": "large_airport",
"ORD": "large_airport",
"ORF": "large_airport",
"OTZ": "medium_airport",
"OXR": "medium_airport",
"PBI": "large_airport",
"PDX": "large_airport",
"PHF": "large_airport",
"PHL": "large_airport",
"PHX": "large_airport",
"PIA": "large_airport",
"PIH": "medium_airport",
"PIT": "large_airport",
"PNS": "medium_airport",
"PSC": "medium_airport",
"PSE": "medium_airport",
"PSG": "medium_airport",
"PSP": "medium_airport",
"PVD": "large_airport",
"PWM": "large_airport",
"RAP": "medium_airport",
"RDD": "medium_airport",
"RDM": "medium_airport",
"RDU": "large_airport",
"RIC": "large_airport",
"RNO": "large_airport",
"ROA": "large_airport",
"ROC": "large_airport",
"RST": "large_airport",
"RSW": "large_airport",
"SAN": "large_airport",
"SAT": "large_airport",
"SAV": "large_airport",
"SBA": "medium_airport",
"SBN": "large_airport",
"SBP": "medium_airport",
"SCC": "medium_airport",
"UNV": "medium_airport",
"SDF": "large_airport",
"SEA": "large_airport",
"SFO": "large_airport",
"SGF": "large_airport",
"SGU": "medium_airport",
"SHV": "medium_airport",
"SIT": "medium_airport",
"SJC": "large_airport",
"SJT": "medium_airport",
"SJU": "large_airport",
"SLC": "large_airport",
"SMF": "large_airport",
"SMX": "medium_airport",
"SNA": "large_airport",
"SPI": "large_airport",
"SPS": "large_airport",
"SRQ": "large_airport",
"STL": "large_airport",
"STT": "medium_airport",
"STX": "medium_airport",
"SUN": "medium_airport",
"SWF": "medium_airport",
"SYR": "large_airport",
"TLH": "large_airport",
"TOL": "large_airport",
"TPA": "large_airport",
"TRI": "large_airport",
"TUL": "large_airport",
"TUP": "medium_airport",
"TUS": "large_airport",
"TVC": "medium_airport",
"TWF": "medium_airport",
"TXK": "medium_airport",
"TYR": "medium_airport",
"TYS": "large_airport",
"VLD": "medium_airport",
"VPS": "large_airport",
"WRG": "medium_airport",
"XNA": "medium_airport",
"YAK": "medium_airport",
"NYL": "medium_airport"
                    }
            }
        }
    }
}
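
The nested tier / name / `lookupExtractorFactory` structure above can also be assembled by a small helper, which keeps the data separate from the request shape. This is a minimal sketch: the function name `build_map_lookup` is hypothetical, and the two-entry map is a shortened sample of the data above.

```python
import datetime

def build_map_lookup(tier, name, mapping, version=None):
    """Build a `map` lookup definition in the nested shape used in this notebook."""
    if version is None:
        # Same timestamp format as the cell above.
        version = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        tier: {
            name: {
                "version": version,
                "lookupExtractorFactory": {
                    "type": "map",
                    "map": dict(mapping),  # copy so later edits do not leak in
                },
            }
        }
    }

sample_definition = build_map_lookup(
    "__default",
    "example-flights-airportsizes",
    {"ATL": "large_airport", "ATW": "small_airport"},
)
print(sample_definition["__default"]["example-flights-airportsizes"]["lookupExtractorFactory"]["map"])
```

The returned dictionary has the same shape as `lookup_definition`, so it could be passed to `postLookup` in the same way.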

Now run this cell to send the object to the endpoint.

In [None]:
postLookup(lookup_definition)

Now run the following cell to see the status of the lookup request.

You will see a list of the processes waiting to load the lookup data together with a running countdown until the timeout is reached.

In [None]:
waitForLookup(lookup_tier, lookup_name, 30)

## Query a lookup table

Run the following cell, which contains a SQL statement where the `FROM` clause references the new table in the `lookup` schema.

In [None]:
sql='''
SELECT
    v AS "value",
    COUNT(*) AS "key_count"
FROM lookup."''' + lookup_name + '''"
GROUP BY 1
'''

display.sql(sql)

## JOIN between lookup and Druid tables

Run the cell below.

Along with a basic `GROUP BY` and filter on `__time`, the SQL statement contains:

* A reference to a `TABLE` in the `druid` schema called `example-flights-lookup`
* An explicit reference to a `TABLE` in the `lookup` schema - here referencing the `lookup_name` variable in this notebook
* A JOIN definition between the two tables.

In [None]:
sql='''
SELECT
    b.v AS "airportSize",
    COUNT(DISTINCT a.Origin) AS "airports",
    COUNT(*) AS "flights",
    SUM(a.Distance) AS "totalDistance"
FROM "example-flights-lookup" a
LEFT JOIN lookup."'''+lookup_name+'''" b ON a.Origin = b.k
WHERE TIME_IN_INTERVAL(__time,'2005-11-30T11:00:00/2015-11-30T08:00:00')
GROUP BY 1
'''

display.sql(sql)

Since the default schema for every query is `druid`, the `FROM` and `JOIN` clauses in the statement above are equivalent to:

```sql
    FROM druid."example-flights-lookup" a
    LEFT JOIN lookup."'''+lookup_name+'''" b ON a.Origin = b.k
```
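
Druid SQL also provides a `LOOKUP` function, which returns the value for a key from a named lookup without an explicit `JOIN`. The following sketch only builds the query string, in the same style as the cells above, and does not run it. The helper name `lookup_function_sql` is hypothetical.

```python
def lookup_function_sql(lookup_name):
    """Build a query that uses the Druid SQL LOOKUP function instead of a JOIN."""
    # The lookup name is passed to LOOKUP as a SQL string literal (single quotes).
    return f'''
SELECT
    LOOKUP("Origin", '{lookup_name}') AS "airportSize",
    COUNT(*) AS "flights"
FROM "example-flights-lookup"
GROUP BY 1
'''

lookup_sql = lookup_function_sql("example-flights-airportsizes")
print(lookup_sql)
```

Keys without a matching entry return `NULL`, which is comparable to the `LEFT JOIN` behavior. You could pass the resulting string to `display.sql` to compare the output with the JOIN-based query.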

## Update a lookup table definition

A [globally cached lookup definition](https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global/) can use `pollPeriod` to automatically ingest and replace the lookup table data. Because this notebook uses a `map`, you instead need to submit a new lookup definition with a higher version number to update the table data.
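
For comparison, a globally cached lookup that polls its source might be defined along the lines of the following sketch. It assumes the `druid-lookups-cached-global` extension is loaded, which this notebook does not verify. The lookup name, file URI, column names, and poll period are all placeholders chosen for illustration, and the definition is not posted anywhere.

```python
# Hypothetical definition of a lookup that Druid refreshes on a schedule.
polled_lookup_definition = {
    "__default": {
        "example-polled-airportsizes": {  # placeholder lookup name
            "version": "v1",
            "lookupExtractorFactory": {
                "type": "cachedNamespace",
                "extractionNamespace": {
                    "type": "uri",
                    "uri": "file:/tmp/airport_sizes.csv",  # placeholder location
                    "namespaceParseSpec": {
                        "format": "csv",
                        "columns": ["key", "value"],
                    },
                    "pollPeriod": "PT5M",  # ISO 8601 period: re-read every 5 minutes
                },
                "firstCacheTimeout": 0,
            },
        }
    }
}

factory = polled_lookup_definition["__default"]["example-polled-airportsizes"]["lookupExtractorFactory"]
print(factory["extractionNamespace"]["pollPeriod"])
```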

Run the cell below to construct a new definition to send to the API.

Notice:

* The version is updated with the current date and time by using `now()`.
* The tier and name reuse the values from before.
* All airports that were "large" are now "massive".

> Given the size of the `map` data in the cell, consider collapsing the next cell once you have run it.

In [None]:
lookup_definition_version = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")

lookup_definition = {
    lookup_tier: {
        lookup_name: {
            "version": lookup_definition_version,  
            "lookupExtractorFactory": {
                "type": "map",
                    "map": {
"ABE": "medium_airport",
"ABI": "medium_airport",
"ABQ": "massive_airport",
"ABY": "medium_airport",
"ACT": "medium_airport",
"ACV": "medium_airport",
"ACY": "medium_airport",
"ADK": "medium_airport",
"ADQ": "medium_airport",
"AEX": "medium_airport",
"AGS": "massive_airport",
"AKN": "medium_airport",
"ALB": "medium_airport",
"AMA": "massive_airport",
"ANC": "massive_airport",
"APF": "medium_airport",
"ATL": "massive_airport",
"ATW": "small_airport",
"AUS": "massive_airport",
"AVL": "massive_airport",
"AVP": "medium_airport",
"AZO": "medium_airport",
"BDL": "massive_airport",
"BET": "medium_airport",
"BFL": "medium_airport",
"BGM": "medium_airport",
"BGR": "massive_airport",
"BHM": "massive_airport",
"BIL": "massive_airport",
"BIS": "medium_airport",
"BMI": "massive_airport",
"BNA": "massive_airport",
"BOI": "massive_airport",
"BOS": "massive_airport",
"BQK": "medium_airport",
"BQN": "medium_airport",
"BRO": "medium_airport",
"BRW": "medium_airport",
"BTM": "medium_airport",
"BTR": "medium_airport",
"BTV": "medium_airport",
"BUF": "massive_airport",
"BUR": "medium_airport",
"BWI": "massive_airport",
"BZN": "medium_airport",
"CAE": "massive_airport",
"CAK": "medium_airport",
"CDC": "medium_airport",
"CDV": "medium_airport",
"CEC": "medium_airport",
"CHA": "massive_airport",
"CHO": "medium_airport",
"CHS": "massive_airport",
"CIC": "small_airport",
"CID": "massive_airport",
"CRQ": "medium_airport",
"CLE": "massive_airport",
"CLL": "medium_airport",
"CLT": "massive_airport",
"CMH": "massive_airport",
"CMI": "medium_airport",
"COD": "medium_airport",
"COS": "massive_airport",
"CPR": "medium_airport",
"CRP": "massive_airport",
"CRW": "massive_airport",
"CSG": "medium_airport",
"CVG": "massive_airport",
"CWA": "medium_airport",
"DAB": "massive_airport",
"DAL": "massive_airport",
"DAY": "massive_airport",
"DBQ": "massive_airport",
"DCA": "massive_airport",
"DEN": "massive_airport",
"DFW": "massive_airport",
"DHN": "medium_airport",
"DLG": "medium_airport",
"DLH": "massive_airport",
"DSM": "massive_airport",
"DTW": "massive_airport",
"EGE": "medium_airport",
"EKO": "medium_airport",
"ELP": "medium_airport",
"ERI": "massive_airport",
"EUG": "medium_airport",
"EVV": "medium_airport",
"EWR": "massive_airport",
"EYW": "medium_airport",
"FAI": "massive_airport",
"FAR": "medium_airport",
"FAT": "medium_airport",
"FAY": "medium_airport",
"GPI": "medium_airport",
"FLL": "massive_airport",
"FLO": "medium_airport",
"FNT": "medium_airport",
"FSD": "medium_airport",
"FSM": "massive_airport",
"FWA": "massive_airport",
"GEG": "massive_airport",
"GFK": "medium_airport",
"GGG": "medium_airport",
"GJT": "medium_airport",
"GNV": "medium_airport",
"GPT": "massive_airport",
"GRB": "massive_airport",
"GRK": "medium_airport",
"GRR": "medium_airport",
"GSO": "massive_airport",
"GSP": "massive_airport",
"GTF": "medium_airport",
"GTR": "medium_airport",
"HDN": "small_airport",
"HKY": "medium_airport",
"HLN": "medium_airport",
"HNL": "massive_airport",
"HOU": "massive_airport",
"HPN": "medium_airport",
"HRL": "medium_airport",
"HSV": "massive_airport",
"HTS": "massive_airport",
"HVN": "medium_airport",
"IAD": "massive_airport",
"IAH": "massive_airport",
"ICT": "massive_airport",
"IDA": "medium_airport",
"ILM": "medium_airport",
"IND": "massive_airport",
"IPL": "medium_airport",
"ISO": "medium_airport",
"ISP": "medium_airport",
"ITO": "medium_airport",
"IYK": "small_airport",
"JAC": "medium_airport",
"JAN": "massive_airport",
"JAX": "massive_airport",
"JFK": "massive_airport",
"JNU": "medium_airport",
"KOA": "medium_airport",
"KTN": "medium_airport",
"LAN": "medium_airport",
"LAS": "massive_airport",
"LAW": "medium_airport",
"LAX": "massive_airport",
"LBB": "massive_airport",
"LCH": "medium_airport",
"LEX": "massive_airport",
"LFT": "massive_airport",
"LGA": "massive_airport",
"LGB": "medium_airport",
"LIH": "medium_airport",
"LIT": "massive_airport",
"LNK": "medium_airport",
"LRD": "medium_airport",
"LSE": "medium_airport",
"LWS": "medium_airport",
"LYH": "medium_airport",
"MAF": "medium_airport",
"MBS": "massive_airport",
"MCI": "massive_airport",
"MCN": "medium_airport",
"MCO": "massive_airport",
"MDT": "medium_airport",
"MDW": "massive_airport",
"MEI": "medium_airport",
"MEM": "massive_airport",
"MFE": "medium_airport",
"MFR": "medium_airport",
"MGM": "massive_airport",
"MHT": "massive_airport",
"MIA": "massive_airport",
"MKE": "massive_airport",
"MLB": "medium_airport",
"MLI": "massive_airport",
"MLU": "massive_airport",
"MOB": "massive_airport",
"MOD": "medium_airport",
"MOT": "medium_airport",
"SAW": "medium_airport",
"MRY": "medium_airport",
"MSN": "massive_airport",
"MSO": "medium_airport",
"MSP": "massive_airport",
"MSY": "massive_airport",
"MTJ": "medium_airport",
"MYR": "medium_airport",
"OAK": "massive_airport",
"OGG": "medium_airport",
"OKC": "massive_airport",
"OMA": "massive_airport",
"OME": "medium_airport",
"ONT": "massive_airport",
"ORD": "massive_airport",
"ORF": "massive_airport",
"OTZ": "medium_airport",
"OXR": "medium_airport",
"PBI": "massive_airport",
"PDX": "massive_airport",
"PHF": "massive_airport",
"PHL": "massive_airport",
"PHX": "massive_airport",
"PIA": "massive_airport",
"PIH": "medium_airport",
"PIT": "massive_airport",
"PNS": "medium_airport",
"PSC": "medium_airport",
"PSE": "medium_airport",
"PSG": "medium_airport",
"PSP": "medium_airport",
"PVD": "massive_airport",
"PWM": "massive_airport",
"RAP": "medium_airport",
"RDD": "medium_airport",
"RDM": "medium_airport",
"RDU": "massive_airport",
"RIC": "massive_airport",
"RNO": "massive_airport",
"ROA": "massive_airport",
"ROC": "massive_airport",
"RST": "massive_airport",
"RSW": "massive_airport",
"SAN": "massive_airport",
"SAT": "massive_airport",
"SAV": "massive_airport",
"SBA": "medium_airport",
"SBN": "massive_airport",
"SBP": "medium_airport",
"SCC": "medium_airport",
"UNV": "medium_airport",
"SDF": "massive_airport",
"SEA": "massive_airport",
"SFO": "massive_airport",
"SGF": "massive_airport",
"SGU": "medium_airport",
"SHV": "medium_airport",
"SIT": "medium_airport",
"SJC": "massive_airport",
"SJT": "medium_airport",
"SJU": "massive_airport",
"SLC": "massive_airport",
"SMF": "massive_airport",
"SMX": "medium_airport",
"SNA": "massive_airport",
"SPI": "massive_airport",
"SPS": "massive_airport",
"SRQ": "massive_airport",
"STL": "massive_airport",
"STT": "medium_airport",
"STX": "medium_airport",
"SUN": "medium_airport",
"SWF": "medium_airport",
"SYR": "massive_airport",
"TLH": "massive_airport",
"TOL": "massive_airport",
"TPA": "massive_airport",
"TRI": "massive_airport",
"TUL": "massive_airport",
"TUP": "medium_airport",
"TUS": "massive_airport",
"TVC": "medium_airport",
"TWF": "medium_airport",
"TXK": "medium_airport",
"TYR": "medium_airport",
"TYS": "massive_airport",
"VLD": "medium_airport",
"VPS": "massive_airport",
"WRG": "medium_airport",
"XNA": "medium_airport",
"YAK": "medium_airport",
"NYL": "medium_airport"
                    }
            }
        }
    }
}
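
The cell above spells out the whole map again by hand. The same change could instead be derived from the earlier data, as in this sketch. The helper name `relabel_values` is hypothetical, and the three-entry map is a shortened sample of the airport sizes.

```python
def relabel_values(mapping, old_value, new_value):
    """Return a copy of the mapping with every occurrence of old_value replaced."""
    return {
        key: (new_value if value == old_value else value)
        for key, value in mapping.items()
    }

sizes = {"ATL": "large_airport", "ATW": "small_airport", "ABE": "medium_airport"}
updated_sizes = relabel_values(sizes, "large_airport", "massive_airport")
print(updated_sizes)
```

Because the helper returns a new dictionary, the original map stays unchanged and can still be compared with the updated one.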

Use the next cell to post this new definition to the API.

In [None]:
postLookup(lookup_definition)

Now run the next cell to keep an eye on the distribution process.

In [None]:
waitForLookup(lookup_tier, lookup_name, 30)

With the table data updated, run the next cell to see the updated data.

In [None]:
sql='''
SELECT
    b.v AS "airportSize",
    COUNT(DISTINCT a.Origin) AS "airports",
    COUNT(*) AS "flights",
    SUM(a.Distance) AS "totalDistance"
FROM "example-flights-lookup" a
LEFT JOIN lookup."'''+lookup_name+'''" b ON a.Origin = b.k
WHERE TIME_IN_INTERVAL(__time,'2005-11-30T11:00:00/2015-11-30T08:00:00')
GROUP BY 1
'''

display.sql(sql)

## Join to multiple lookups

Run the following cell to create a new table definition, again using a `map` to put the data directly in the request to the API.

In [None]:
lookup2_name = "example-flights-airportnames"
lookup2_version = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")

lookup2_definition = {
    lookup_tier: {
        lookup2_name: {
            "version": lookup2_version,  
            "lookupExtractorFactory": {
                "type": "map",
                    "map": {
"ABE": "Lehigh Valley International Airport",
"ABI": "Abilene Regional Airport",
"ABQ": "Albuquerque International Sunport",
"ABY": "Southwest Georgia Regional Airport",
"ACT": "Waco Regional Airport",
"ACV": "California Redwood Coast-Humboldt County Airport",
"ACY": "Atlantic City International Airport",
"ADK": "Adak Airport",
"ADQ": "Kodiak Airport",
"AEX": "Alexandria International Airport",
"AGS": "Augusta Regional At Bush Field",
"AKN": "King Salmon Airport",
"ALB": "Albany International Airport",
"AMA": "Rick Husband Amarillo International Airport",
"ANC": "Ted Stevens Anchorage International Airport",
"APF": "Naples Municipal Airport",
"ATL": "Hartsfield Jackson Atlanta International Airport",
"ATW": "Appleton International Airport",
"AUS": "Austin Bergstrom International Airport",
"AVL": "Asheville Regional Airport",
"AVP": "Wilkes Barre Scranton International Airport",
"AZO": "Kalamazoo Battle Creek International Airport",
"BDL": "Bradley International Airport",
"BET": "Bethel Airport",
"BFL": "Meadows Field",
"BGM": "Greater Binghamton/Edwin A Link field",
"BGR": "Bangor International Airport",
"BHM": "Birmingham-Shuttlesworth International Airport",
"BIL": "Billings Logan International Airport",
"BIS": "Bismarck Municipal Airport",
"BMI": "Central Illinois Regional Airport at Bloomington-Normal",
"BNA": "Nashville International Airport",
"BOI": "Boise Air Terminal/Gowen Field",
"BOS": "General Edward Lawrence Logan International Airport",
"BQK": "Brunswick Golden Isles Airport",
"BQN": "Rafael Hernandez Airport",
"BRO": "Brownsville South Padre Island International Airport",
"BRW": "Wiley Post Will Rogers Memorial Airport",
"BTM": "Bert Mooney Airport",
"BTR": "Baton Rouge Metropolitan Airport",
"BTV": "Burlington International Airport",
"BUF": "Buffalo Niagara International Airport",
"BUR": "Bob Hope Airport",
"BWI": "Baltimore/Washington International Thurgood Marshall Airport",
"BZN": "Gallatin Field",
"CAE": "Columbia Metropolitan Airport",
"CAK": "Akron Canton Regional Airport",
"CDC": "Cedar City Regional Airport",
"CDV": "Merle K (Mudhole) Smith Airport",
"CEC": "Jack Mc Namara Field Airport",
"CHA": "Lovell Field",
"CHO": "Charlottesville Albemarle Airport",
"CHS": "Charleston Air Force Base-International Airport",
"CIC": "Chico Municipal Airport",
"CID": "The Eastern Iowa Airport",
"CLD": "Mc Clellan-Palomar Airport",
"CLE": "Cleveland Hopkins International Airport",
"CLL": "Easterwood Field",
"CLT": "Charlotte Douglas International Airport",
"CMH": "John Glenn Columbus International Airport",
"CMI": "University of Illinois Willard Airport",
"COD": "Yellowstone Regional Airport",
"COS": "City of Colorado Springs Municipal Airport",
"CPR": "Casper-Natrona County International Airport",
"CRP": "Corpus Christi International Airport",
"CRW": "Yeager Airport",
"CSG": "Columbus Metropolitan Airport",
"CVG": "Cincinnati Northern Kentucky International Airport",
"CWA": "Central Wisconsin Airport",
"DAB": "Daytona Beach International Airport",
"DAL": "Dallas Love Field",
"DAY": "James M Cox Dayton International Airport",
"DBQ": "Dubuque Regional Airport",
"DCA": "Ronald Reagan Washington National Airport",
"DEN": "Denver International Airport",
"DFW": "Dallas Fort Worth International Airport",
"DHN": "Dothan Regional Airport",
"DLG": "Dillingham Airport",
"DLH": "Duluth International Airport",
"DSM": "Des Moines International Airport",
"DTW": "Detroit Metropolitan Wayne County Airport",
"EGE": "Eagle County Regional Airport",
"EKO": "Elko Regional Airport",
"ELP": "El Paso International Airport",
"ERI": "Erie International Tom Ridge Field",
"EUG": "Mahlon Sweet Field",
"EVV": "Evansville Regional Airport",
"EWR": "Newark Liberty International Airport",
"EYW": "Key West International Airport",
"FAI": "Fairbanks International Airport",
"FAR": "Hector International Airport",
"FAT": "Fresno Yosemite International Airport",
"FAY": "Fayetteville Regional Grannis Field",
"FCA": "Glacier Park International Airport",
"FLL": "Fort Lauderdale Hollywood International Airport",
"FLO": "Florence Regional Airport",
"FNT": "Bishop International Airport",
"FSD": "Joe Foss Field Airport",
"FSM": "Fort Smith Regional Airport",
"FWA": "Fort Wayne International Airport",
"GEG": "Spokane International Airport",
"GFK": "Grand Forks International Airport",
"GGG": "East Texas Regional Airport",
"GJT": "Grand Junction Regional Airport",
"GNV": "Gainesville Regional Airport",
"GPT": "Gulfport Biloxi International Airport",
"GRB": "Austin Straubel International Airport",
"GRK": "Robert Gray Army Air Field Airport",
"GRR": "Gerald R. Ford International Airport",
"GSO": "Piedmont Triad International Airport",
"GSP": "Greenville Spartanburg International Airport",
"GTF": "Great Falls International Airport",
"GTR": "Golden Triangle Regional Airport",
"HDN": "Yampa Valley Airport",
"HKY": "Hickory Regional Airport",
"HLN": "Helena Regional Airport",
"HNL": "Daniel K Inouye International Airport",
"HOU": "William P Hobby Airport",
"HPN": "Westchester County Airport",
"HRL": "Valley International Airport",
"HSV": "Huntsville International Carl T Jones Field",
"HTS": "Tri-State/Milton J. Ferguson Field",
"HVN": "Tweed New Haven Airport",
"IAD": "Washington Dulles International Airport",
"IAH": "George Bush Intercontinental Houston Airport",
"ICT": "Wichita Eisenhower National Airport",
"IDA": "Idaho Falls Regional Airport",
"ILM": "Wilmington International Airport",
"IND": "Indianapolis International Airport",
"IPL": "Imperial County Airport",
"ISO": "Kinston Regional Jetport At Stallings Field",
"ISP": "Long Island Mac Arthur Airport",
"ITO": "Hilo International Airport",
"IYK": "Inyokern Airport",
"JAC": "Jackson Hole Airport",
"JAN": "Jackson-Medgar Wiley Evers International Airport",
"JAX": "Jacksonville International Airport",
"JFK": "John F Kennedy International Airport",
"JNU": "Juneau International Airport",
"KOA": "Ellison Onizuka Kona International At Keahole Airport",
"KTN": "Ketchikan International Airport",
"LAN": "Capital City Airport",
"LAS": "McCarran International Airport",
"LAW": "Lawton Fort Sill Regional Airport",
"LAX": "Los Angeles International Airport",
"LBB": "Lubbock Preston Smith International Airport",
"LCH": "Lake Charles Regional Airport",
"LEX": "Blue Grass Airport",
"LFT": "Lafayette Regional Airport",
"LGA": "La Guardia Airport",
"LGB": "Long Beach /Daugherty Field/ Airport",
"LIH": "Lihue Airport",
"LIT": "Bill & Hillary Clinton National Airport/Adams Field",
"LNK": "Lincoln Airport",
"LRD": "Laredo International Airport",
"LSE": "La Crosse Municipal Airport",
"LWS": "Lewiston Nez Perce County Airport",
"LYH": "Lynchburg Regional Preston Glenn Field",
"MAF": "Midland International Airport",
"MBS": "MBS International Airport",
"MCI": "Kansas City International Airport",
"MCN": "Middle Georgia Regional Airport",
"MCO": "Orlando International Airport",
"MDT": "Harrisburg International Airport",
"MDW": "Chicago Midway International Airport",
"MEI": "Key Field",
"MEM": "Memphis International Airport",
"MFE": "Mc Allen Miller International Airport",
"MFR": "Rogue Valley International Medford Airport",
"MGM": "Montgomery Regional (Dannelly Field) Airport",
"MHT": "Manchester-Boston Regional Airport",
"MIA": "Miami International Airport",
"MKE": "General Mitchell International Airport",
"MLB": "Melbourne International Airport",
"MLI": "Quad City International Airport",
"MLU": "Monroe Regional Airport",
"MOB": "Mobile Regional Airport",
"MOD": "Modesto City Co-Harry Sham Field",
"MOT": "Minot International Airport",
"MQT": "Sawyer International Airport",
"MRY": "Monterey Peninsula Airport",
"MSN": "Dane County Regional Truax Field",
"MSO": "Missoula International Airport",
"MSP": "Minneapolis-St Paul International/Wold-Chamberlain Airport",
"MSY": "Louis Armstrong New Orleans International Airport",
"MTJ": "Montrose Regional Airport",
"MYR": "Myrtle Beach International Airport",
"OAK": "Metropolitan Oakland International Airport",
"OGG": "Kahului Airport",
"OKC": "Will Rogers World Airport",
"OMA": "Eppley Airfield",
"OME": "Nome Airport",
"ONT": "Ontario International Airport",
"ORD": "Chicago O'Hare International Airport",
"ORF": "Norfolk International Airport",
"OTZ": "Ralph Wien Memorial Airport",
"OXR": "Oxnard Airport",
"PBI": "Palm Beach International Airport",
"PDX": "Portland International Airport",
"PHF": "Newport News Williamsburg International Airport",
"PHL": "Philadelphia International Airport",
"PHX": "Phoenix Sky Harbor International Airport",
"PIA": "General Wayne A. Downing Peoria International Airport",
"PIH": "Pocatello Regional Airport",
"PIT": "Pittsburgh International Airport",
"PNS": "Pensacola International Airport",
"PSC": "Tri Cities Airport",
"PSE": "Mercedita Airport",
"PSG": "Petersburg James A Johnson Airport",
"PSP": "Palm Springs International Airport",
"PVD": "Theodore Francis Green State Airport",
"PWM": "Portland International Jetport",
"RAP": "Rapid City Regional Airport",
"RDD": "Redding Municipal Airport",
"RDM": "Roberts Field",
"RDU": "Raleigh Durham International Airport",
"RIC": "Richmond International Airport",
"RNO": "Reno Tahoe International Airport",
"ROA": "Roanoke–Blacksburg Regional Airport",
"ROC": "Greater Rochester International Airport",
"RST": "Rochester International Airport",
"RSW": "Southwest Florida International Airport",
"SAN": "San Diego International Airport",
"SAT": "San Antonio International Airport",
"SAV": "Savannah Hilton Head International Airport",
"SBA": "Santa Barbara Municipal Airport",
"SBN": "South Bend Regional Airport",
"SBP": "San Luis County Regional Airport",
"SCC": "Deadhorse Airport",
"SCE": "University Park Airport",
"SDF": "Louisville Muhammad Ali International Airport",
"SEA": "Seattle Tacoma International Airport",
"SFO": "San Francisco International Airport",
"SGF": "Springfield Branson National Airport",
"SGU": "St George Municipal Airport",
"SHV": "Shreveport Regional Airport",
"SIT": "Sitka Rocky Gutierrez Airport",
"SJC": "Norman Y. Mineta San Jose International Airport",
"SJT": "San Angelo Regional Mathis Field",
"SJU": "Luis Munoz Marin International Airport",
"SLC": "Salt Lake City International Airport",
"SMF": "Sacramento International Airport",
"SMX": "Santa Maria Pub/Capt G Allan Hancock Field",
"SNA": "John Wayne Airport-Orange County Airport",
"SPI": "Abraham Lincoln Capital Airport",
"SPS": "Sheppard Air Force Base-Wichita Falls Municipal Airport",
"SRQ": "Sarasota Bradenton International Airport",
"STL": "St Louis Lambert International Airport",
"STT": "Cyril E. King Airport",
"STX": "Henry E Rohlsen Airport",
"SUN": "Friedman Memorial Airport",
"SWF": "New York Stewart International Airport",
"SYR": "Syracuse Hancock International Airport",
"TLH": "Tallahassee Regional Airport",
"TOL": "Toledo Express Airport",
"TPA": "Tampa International Airport",
"TRI": "Tri-Cities Regional TN/VA Airport",
"TUL": "Tulsa International Airport",
"TUP": "Tupelo Regional Airport",
"TUS": "Tucson International Airport / Morris Air National Guard Base",
"TVC": "Cherry Capital Airport",
"TWF": "Joslin Field Magic Valley Regional Airport",
"TXK": "Texarkana Regional Webb Field",
"TYR": "Tyler Pounds Regional Airport",
"TYS": "McGhee Tyson Airport",
"VLD": "Valdosta Regional Airport",
"VPS": "Destin-Ft Walton Beach Airport",
"WRG": "Wrangell Airport",
"XNA": "Northwest Arkansas Regional Airport",
"YAK": "Yakutat Airport",
"YUM": "Yuma MCAS/Yuma International Airport"
                    }
            }
        }
    }
}

postLookup(lookup2_definition)

As before, run the next cell to wait for the table to be broadcast around the cluster.

In [None]:
waitForLookup(lookup_tier, lookup2_name, 30)

Run the following cell to see the results of a SQL query using both lookups.

In [None]:
sql='''
SELECT
    b.v AS "airportSize",
    c.v AS "airportName",
    COUNT(*) AS "flights",
    SUM(a.Distance) AS "totalDistance"
FROM "example-flights-lookup" a
JOIN lookup."'''+lookup_name+'''" b ON a.Origin = b.k
JOIN lookup."'''+lookup2_name+'''" c ON a.Origin = c.k
WHERE TIME_IN_INTERVAL(__time,'2005-11-30T11:00:00/2015-11-30T08:00:00')
AND b.v = 'small_airport'
GROUP BY 1,2
'''

display.sql(sql)

Run the following cell to query `INFORMATION_SCHEMA` and list the two tables in the `lookup` schema.

In [None]:
sql='''
SELECT
  "TABLE_NAME"
FROM "INFORMATION_SCHEMA"."TABLES"
WHERE "TABLE_SCHEMA" = 'lookup'
'''

display.sql(sql)

Run the following cell to see how the `LOOKUP` function can be used as an alternative to `JOIN`.

This function is available as both a [SQL](https://druid.apache.org/docs/latest/querying/sql-functions#lookup) and [native](https://druid.apache.org/docs/latest/querying/math-expr#string-functions) expression.

In [None]:
sql='''
SELECT
    LOOKUP("Origin", \'''' + lookup_name + '''\') AS "originAirportSize",
    LOOKUP("Dest", \'''' + lookup_name + '''\') AS "destinationAirportSize",
    COUNT(*) AS "flights",
    SUM(Distance) AS "totalDistance"
FROM "example-flights-lookup"
WHERE TIME_IN_INTERVAL(__time,'2005-11-30T11:00:00/2015-11-30T08:00:00')
GROUP BY 1, 2
'''

display.sql(sql)
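
The two approaches differ when a key has no entry in the lookup. As a rough illustration in plain Python (not Druid code), an inner `JOIN` behaves like filtering to keys that exist in the lookup, while `LOOKUP` behaves like `dict.get` and returns null for a missing key. The rows and lookup contents below are made up for the sketch.

```python
from collections import Counter

# Hypothetical stand-ins for the lookup and the flights table.
airport_sizes = {"JFK": "large_airport", "OXR": "small_airport"}
flights = [{"Origin": "JFK"}, {"Origin": "OXR"}, {"Origin": "ZZZ"}]

# Inner JOIN: rows whose key is missing from the lookup are dropped.
joined = Counter(
    airport_sizes[f["Origin"]] for f in flights if f["Origin"] in airport_sizes
)

# LOOKUP: every row is kept; a missing key yields None (SQL NULL).
looked_up = Counter(airport_sizes.get(f["Origin"]) for f in flights)

print(dict(joined))
print(dict(looked_up))
```

In this sketch the unknown origin `ZZZ` disappears from the joined counts but appears under a null group in the looked-up counts.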

## Ingestion with lookups

Use the lookup functions during ingestion to enrich your data ahead of time. This technique improves query efficiency, and is appropriate when you are sure that a lookup at query time would return the same results as a lookup at ingestion time.

For this notebook, the [`druid.lookup.enableLookupSyncOnStartup`](https://druid.apache.org/docs/latest/querying/lookups/#saving-configuration-across-restarts) setting is `true`, so ingestion processes pull the lookup definitions when they start up, making the lookups immediately available for ingestion work.
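
To make the trade-off concrete, here is a small plain-Python sketch (not Druid code, with made-up values) of what happens when a lookup changes after ingestion. A value resolved at ingestion time is stored with the row, whereas a value resolved at query time reflects the current lookup contents.

```python
# Hypothetical lookup contents and source rows.
lookup = {"OXR": "small_airport"}
source_rows = [{"Origin": "OXR"}]

# Ingestion-time enrichment: the looked-up value is stored with each row.
enriched_rows = [
    {**row, "originAirportSize": lookup.get(row["Origin"])} for row in source_rows
]

# The lookup is updated some time after ingestion.
lookup["OXR"] = "medium_airport"

# Query-time enrichment resolves against the current lookup contents.
query_time_size = lookup.get(source_rows[0]["Origin"])
ingestion_time_size = enriched_rows[0]["originAirportSize"]

print(ingestion_time_size, query_time_size)
```

If the two values must always agree, enrich at query time, or re-ingest after the lookup changes.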

### Use the `LOOKUP` function to reference a lookup table in SQL-based ingestion

Review the cell below to see how the `LOOKUP` function resolves both the origin and destination airport codes to their sizes, adding each as a new column in the table.

In [None]:
sql='''
REPLACE INTO "example-flights-enhanced" OVERWRITE ALL
WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://static.imply.io/example-data/flight_on_time/flights/On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2005_11.csv.zip"]}',
    '{"type":"csv","findColumnsFromHeader":true}'
  )
) EXTEND ("depaturetime" VARCHAR, "arrivalime" VARCHAR, "Year" BIGINT, "Quarter" BIGINT, "Month" BIGINT, "DayofMonth" BIGINT, "DayOfWeek" BIGINT, "FlightDate" VARCHAR, "Reporting_Airline" VARCHAR, "DOT_ID_Reporting_Airline" BIGINT, "IATA_CODE_Reporting_Airline" VARCHAR, "Tail_Number" VARCHAR, "Flight_Number_Reporting_Airline" BIGINT, "OriginAirportID" BIGINT, "OriginAirportSeqID" BIGINT, "OriginCityMarketID" BIGINT, "Origin" VARCHAR, "OriginCityName" VARCHAR, "OriginState" VARCHAR, "OriginStateFips" BIGINT, "OriginStateName" VARCHAR, "OriginWac" BIGINT, "DestAirportID" BIGINT, "DestAirportSeqID" BIGINT, "DestCityMarketID" BIGINT, "Dest" VARCHAR, "DestCityName" VARCHAR, "DestState" VARCHAR, "DestStateFips" BIGINT, "DestStateName" VARCHAR, "DestWac" BIGINT, "CRSDepTime" BIGINT, "DepTime" BIGINT, "DepDelay" BIGINT, "DepDelayMinutes" BIGINT, "DepDel15" BIGINT, "DepartureDelayGroups" BIGINT, "DepTimeBlk" VARCHAR, "TaxiOut" BIGINT, "WheelsOff" BIGINT, "WheelsOn" BIGINT, "TaxiIn" BIGINT, "CRSArrTime" BIGINT, "ArrTime" BIGINT, "ArrDelay" BIGINT, "ArrDelayMinutes" BIGINT, "ArrDel15" BIGINT, "ArrivalDelayGroups" BIGINT, "ArrTimeBlk" VARCHAR, "Cancelled" BIGINT, "CancellationCode" VARCHAR, "Diverted" BIGINT, "CRSElapsedTime" BIGINT, "ActualElapsedTime" BIGINT, "AirTime" BIGINT, "Flights" BIGINT, "Distance" BIGINT, "DistanceGroup" BIGINT, "CarrierDelay" BIGINT, "WeatherDelay" BIGINT, "NASDelay" BIGINT, "SecurityDelay" BIGINT, "LateAircraftDelay" BIGINT, "FirstDepTime" VARCHAR, "TotalAddGTime" VARCHAR, "LongestAddGTime" VARCHAR, "DivAirportLandings" VARCHAR, "DivReachedDest" VARCHAR, "DivActualElapsedTime" VARCHAR, "DivArrDelay" VARCHAR, "DivDistance" VARCHAR, "Div1Airport" VARCHAR, "Div1AirportID" VARCHAR, "Div1AirportSeqID" VARCHAR, "Div1WheelsOn" VARCHAR, "Div1TotalGTime" VARCHAR, "Div1LongestGTime" VARCHAR, "Div1WheelsOff" VARCHAR, "Div1TailNum" VARCHAR, "Div2Airport" VARCHAR, "Div2AirportID" VARCHAR, "Div2AirportSeqID" VARCHAR, "Div2WheelsOn" VARCHAR, "Div2TotalGTime" 
VARCHAR, "Div2LongestGTime" VARCHAR, "Div2WheelsOff" VARCHAR, "Div2TailNum" VARCHAR, "Div3Airport" VARCHAR, "Div3AirportID" VARCHAR, "Div3AirportSeqID" VARCHAR, "Div3WheelsOn" VARCHAR, "Div3TotalGTime" VARCHAR, "Div3LongestGTime" VARCHAR, "Div3WheelsOff" VARCHAR, "Div3TailNum" VARCHAR, "Div4Airport" VARCHAR, "Div4AirportID" VARCHAR, "Div4AirportSeqID" VARCHAR, "Div4WheelsOn" VARCHAR, "Div4TotalGTime" VARCHAR, "Div4LongestGTime" VARCHAR, "Div4WheelsOff" VARCHAR, "Div4TailNum" VARCHAR, "Div5Airport" VARCHAR, "Div5AirportID" VARCHAR, "Div5AirportSeqID" VARCHAR, "Div5WheelsOn" VARCHAR, "Div5TotalGTime" VARCHAR, "Div5LongestGTime" VARCHAR, "Div5WheelsOff" VARCHAR, "Div5TailNum" VARCHAR, "Unnamed: 109" VARCHAR))
SELECT
    FLOOR(TIME_PARSE("depaturetime") TO HOUR) as __time,
    LOOKUP("Origin", 'example-flights-airportsizes') AS "originAirportSize",
    LOOKUP("Dest", 'example-flights-airportsizes') AS "destinationAirportSize",
    COUNT(*) AS "flights",
    SUM(Distance) AS "totalDistance"
FROM "ext"
GROUP BY 1, 2, 3
PARTITIONED BY DAY
'''

display.run_task(sql)
sql_client.wait_until_ready('example-flights-enhanced')
display.table('example-flights-enhanced')

Note that this function is available in both SQL and JSON-based ingestion.
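
For JSON-based ingestion, one way to apply the same enrichment is an expression transform in the `transformSpec` that calls the native `lookup` function. The fragment below is a sketch only: the helper and variable names are hypothetical, the output column names mirror the SQL above, the lookup name is the one hard-coded in that statement, and the rest of the ingestion spec is omitted.

```python
import json

# Assumed lookup name, matching the one used in the SQL statement above.
sizes_lookup = "example-flights-airportsizes"

def lookup_transform(output_name, input_column, lookup):
    """Build an expression transform that calls the native lookup function."""
    return {
        "type": "expression",
        "name": output_name,
        "expression": f"lookup(\"{input_column}\", '{lookup}')",
    }

# Only the transformSpec portion of an ingestion spec is shown here.
transform_spec = {
    "transforms": [
        lookup_transform("originAirportSize", "Origin", sizes_lookup),
        lookup_transform("destinationAirportSize", "Dest", sizes_lookup),
    ]
}

print(json.dumps(transform_spec, indent=2))
```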

Run the next cell to see how many flights were recorded in the data by airport size.

In [None]:
sql = '''
SELECT originAirportSize,
   count(*) AS "flights"
FROM "example-flights-enhanced" 
WHERE TIME_IN_INTERVAL(__time,'2005-11-30T11:00:00/PT4H')
GROUP BY 1
'''
display.sql(sql)

### Use a `JOIN` to a lookup table in SQL-based ingestion

The following cell performs the same enrichment using `LEFT JOIN` operations against the lookup table instead of the `LOOKUP` function.

Running this cell will perform the ingestion and then display the same results as above.

In [None]:
sql='''
REPLACE INTO "example-flights-enhanced" OVERWRITE ALL
WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://static.imply.io/example-data/flight_on_time/flights/On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2005_11.csv.zip"]}',
    '{"type":"csv","findColumnsFromHeader":true}'
  )
) EXTEND ("depaturetime" VARCHAR, "arrivalime" VARCHAR, "Year" BIGINT, "Quarter" BIGINT, "Month" BIGINT, "DayofMonth" BIGINT, "DayOfWeek" BIGINT, "FlightDate" VARCHAR, "Reporting_Airline" VARCHAR, "DOT_ID_Reporting_Airline" BIGINT, "IATA_CODE_Reporting_Airline" VARCHAR, "Tail_Number" VARCHAR, "Flight_Number_Reporting_Airline" BIGINT, "OriginAirportID" BIGINT, "OriginAirportSeqID" BIGINT, "OriginCityMarketID" BIGINT, "Origin" VARCHAR, "OriginCityName" VARCHAR, "OriginState" VARCHAR, "OriginStateFips" BIGINT, "OriginStateName" VARCHAR, "OriginWac" BIGINT, "DestAirportID" BIGINT, "DestAirportSeqID" BIGINT, "DestCityMarketID" BIGINT, "Dest" VARCHAR, "DestCityName" VARCHAR, "DestState" VARCHAR, "DestStateFips" BIGINT, "DestStateName" VARCHAR, "DestWac" BIGINT, "CRSDepTime" BIGINT, "DepTime" BIGINT, "DepDelay" BIGINT, "DepDelayMinutes" BIGINT, "DepDel15" BIGINT, "DepartureDelayGroups" BIGINT, "DepTimeBlk" VARCHAR, "TaxiOut" BIGINT, "WheelsOff" BIGINT, "WheelsOn" BIGINT, "TaxiIn" BIGINT, "CRSArrTime" BIGINT, "ArrTime" BIGINT, "ArrDelay" BIGINT, "ArrDelayMinutes" BIGINT, "ArrDel15" BIGINT, "ArrivalDelayGroups" BIGINT, "ArrTimeBlk" VARCHAR, "Cancelled" BIGINT, "CancellationCode" VARCHAR, "Diverted" BIGINT, "CRSElapsedTime" BIGINT, "ActualElapsedTime" BIGINT, "AirTime" BIGINT, "Flights" BIGINT, "Distance" BIGINT, "DistanceGroup" BIGINT, "CarrierDelay" BIGINT, "WeatherDelay" BIGINT, "NASDelay" BIGINT, "SecurityDelay" BIGINT, "LateAircraftDelay" BIGINT, "FirstDepTime" VARCHAR, "TotalAddGTime" VARCHAR, "LongestAddGTime" VARCHAR, "DivAirportLandings" VARCHAR, "DivReachedDest" VARCHAR, "DivActualElapsedTime" VARCHAR, "DivArrDelay" VARCHAR, "DivDistance" VARCHAR, "Div1Airport" VARCHAR, "Div1AirportID" VARCHAR, "Div1AirportSeqID" VARCHAR, "Div1WheelsOn" VARCHAR, "Div1TotalGTime" VARCHAR, "Div1LongestGTime" VARCHAR, "Div1WheelsOff" VARCHAR, "Div1TailNum" VARCHAR, "Div2Airport" VARCHAR, "Div2AirportID" VARCHAR, "Div2AirportSeqID" VARCHAR, "Div2WheelsOn" VARCHAR, "Div2TotalGTime" 
VARCHAR, "Div2LongestGTime" VARCHAR, "Div2WheelsOff" VARCHAR, "Div2TailNum" VARCHAR, "Div3Airport" VARCHAR, "Div3AirportID" VARCHAR, "Div3AirportSeqID" VARCHAR, "Div3WheelsOn" VARCHAR, "Div3TotalGTime" VARCHAR, "Div3LongestGTime" VARCHAR, "Div3WheelsOff" VARCHAR, "Div3TailNum" VARCHAR, "Div4Airport" VARCHAR, "Div4AirportID" VARCHAR, "Div4AirportSeqID" VARCHAR, "Div4WheelsOn" VARCHAR, "Div4TotalGTime" VARCHAR, "Div4LongestGTime" VARCHAR, "Div4WheelsOff" VARCHAR, "Div4TailNum" VARCHAR, "Div5Airport" VARCHAR, "Div5AirportID" VARCHAR, "Div5AirportSeqID" VARCHAR, "Div5WheelsOn" VARCHAR, "Div5TotalGTime" VARCHAR, "Div5LongestGTime" VARCHAR, "Div5WheelsOff" VARCHAR, "Div5TailNum" VARCHAR, "Unnamed: 109" VARCHAR))
SELECT
    FLOOR(TIME_PARSE("depaturetime") TO HOUR) as __time,
    b.v AS "originAirportSize",
    c.v AS "destinationAirportSize",
    COUNT(*) AS "flights",
    SUM(Distance) AS "totalDistance"
FROM "ext" a
LEFT JOIN lookup."'''+lookup_name+'''" b ON a.Origin = b.k
LEFT JOIN lookup."'''+lookup_name+'''" c ON a.Dest = c.k
GROUP BY 1, 2, 3
PARTITIONED BY DAY
'''

display.run_task(sql)
sql_client.wait_until_ready('example-flights-enhanced')
display.table('example-flights-enhanced')
sql = '''
SELECT originAirportSize,
   count(*) AS "flights"
FROM "example-flights-enhanced" 
WHERE TIME_IN_INTERVAL(__time,'2005-11-30T11:00:00/PT4H')
GROUP BY 1
'''
display.sql(sql)

## Clean up

Run the following cell to drop the two example tables and to call the lookup API to delete the two lookups that you created.

In [None]:
druid.datasources.drop("example-flights-lookup")
druid.datasources.drop("example-flights-enhanced")
x = requests.delete(druid_host + '/druid/coordinator/v1/lookups/config/'+lookup_tier+'/'+lookup_name)
x = requests.delete(druid_host + '/druid/coordinator/v1/lookups/config/'+lookup_tier+'/'+lookup2_name)
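
The cell above discards the responses from the two `DELETE` calls. If you want to confirm that each request was accepted, a sketch along these lines could help. The helper names are hypothetical, the host and tier values are placeholders, and treating any 2xx status as success is an assumption for this sketch rather than documented behavior.

```python
from urllib.parse import quote

def lookup_config_url(host, tier, name):
    """Build the coordinator URL for one lookup, escaping tier and name."""
    return (
        host.rstrip("/")
        + "/druid/coordinator/v1/lookups/config/"
        + quote(tier, safe="")
        + "/"
        + quote(name, safe="")
    )

def is_success(status_code):
    """Treat any 2xx HTTP status as success (an assumption for this sketch)."""
    return 200 <= status_code < 300

# Placeholder values; in the notebook these come from druid_host,
# lookup_tier, and lookup_name.
url = lookup_config_url(
    "http://localhost:8888", "__default", "example-flights-airportsizes"
)
print(url)
```

With these helpers, a call such as `response = requests.delete(url)` can be followed by a check of `is_success(response.status_code)`.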

## Summary

* [Lookups](https://druid.apache.org/docs/latest/querying/lookups) are key/value tables that exist in the lookup schema
* Druid surfaces [APIs](https://druid.apache.org/docs/latest/api-reference/lookups-api) to allow for the management of lookups, including their definition and deletion
* Lookup definitions are checked and propagated on a [configurable cycle](https://druid.apache.org/docs/latest/querying/lookups/#configuration)
* Data in lookups can be updated, providing a mechanism for [updating Druid table data](https://druid.apache.org/docs/latest/data-management/update#lookups)
* Lookup tables can be used in `JOIN` operations and via [SQL](https://druid.apache.org/docs/latest/querying/sql-scalar#string-functions) and [native](https://druid.apache.org/docs/latest/querying/math-expr#string-functions) lookup functions

## Learn more

* Try using the native lookup function to enrich data from Apache Kafka
* Compare query performance with pre-enrichment at ingestion time versus query time on your own data