
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# 2.6 Adaptive Query Execution

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this notebook you:
* Examine the Physical Plan that is generated for your queries
* Enable Adaptive Query Execution (AQE) to reduce the runtime of your queries

In [0]:
%run ../Includes/Classroom-Setup

Let's make sure our data is accessible.

In [0]:
%sql
USE databricks;

DESCRIBE fireCalls

col_name,data_type,comment
Call Number,int,
Unit ID,string,
Incident Number,int,
Call Type,string,
Call Date,string,
Watch Date,string,
Received DtTm,string,
Entry DtTm,string,
Dispatch DtTm,string,
Response DtTm,string,


## Examining Physical Plans

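Let's take a look at the shuffle partitions. First, disable Adaptive Query Execution (AQE) so we have a baseline: without AQE, the shuffle for the aggregation below uses the fixed number of partitions set by `spark.sql.shuffle.partitions` (200 by default), no matter how little data each partition holds.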

In [0]:
%sql
SET spark.sql.adaptive.enabled = FALSE

key,value
spark.sql.adaptive.enabled,False


In [0]:
%sql
SELECT `call type`, count(*) AS count
FROM firecalls
GROUP BY `call type`
ORDER BY count DESC

call type,count
Medical Incident,156374
Structure Fire,31329
Alarms,26090
Traffic Collision,9749
Other,3799
Citizen Assist / Service Call,3600
Outside Fire,2940
Vehicle Fire,1101
Water Rescue,1096
Gas Leak (Natural and LP Gases),888


In [0]:
%sql
SET spark.sql.adaptive.enabled = TRUE

key,value
spark.sql.adaptive.enabled,True


With the Adaptive Query Execution (AQE) framework enabled, Spark dynamically coalesces shuffle partitions at runtime, which can significantly reduce query execution time.
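
To make the idea of coalescing concrete, here is a minimal pure-Python sketch that greedily merges small adjacent shuffle partitions up to a target size. It is a simplified toy model, not Spark's actual implementation: the function name `coalesce_partitions`, the partition sizes, and the 64 MiB target are all illustrative assumptions. (Spark exposes a comparable target through `spark.sql.adaptive.advisoryPartitionSizeInBytes`.)

```python
MIB = 1024 * 1024


def coalesce_partitions(sizes_bytes, target_bytes):
    """Toy model: group adjacent partitions, closing a group
    before it would grow past target_bytes.

    Returns a list of groups, each a list of original partition indices.
    """
    groups = []
    current = []
    current_size = 0
    for idx, size in enumerate(sizes_bytes):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current = []
            current_size = 0
        current.append(idx)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical post-shuffle partition sizes (made up for illustration).
sizes = [2 * MIB, 1 * MIB, 40 * MIB, 30 * MIB, 3 * MIB, 70 * MIB]
groups = coalesce_partitions(sizes, target_bytes=64 * MIB)
print(groups)       # [[0, 1, 2], [3, 4], [5]]
print(len(groups))  # 3 tasks instead of 6
```

In this sketch six shuffle partitions collapse into three, so fewer tasks are scheduled for the next stage. A partition that is already larger than the target (index 5 here) simply stays on its own.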

In [0]:
%sql
SELECT `call type`, count(*) AS count
FROM firecalls
GROUP BY `call type`
ORDER BY count DESC

call type,count
Medical Incident,156374
Structure Fire,31329
Alarms,26090
Traffic Collision,9749
Other,3799
Citizen Assist / Service Call,3600
Outside Fire,2940
Vehicle Fire,1101
Water Rescue,1096
Gas Leak (Natural and LP Gases),888


Now create the temporary view `fireCallsParquet`.

In [0]:
%sql
CREATE OR REPLACE TEMPORARY VIEW fireCallsParquet
USING Parquet 
OPTIONS (
    path "/mnt/davis/fire-calls/fire-calls-8p.parquet")

We can join these two datasets and examine the physical plan.

Note that `fireCalls` is a much smaller dataset, with 240,316 records, than `fireCallsParquet`, which contains 4,799,622 records.

The following query took almost 1 minute to run.

In [0]:
%sql
SELECT * 
FROM fireCalls 
JOIN fireCallsParquet on fireCalls.`Call Number` = fireCallsParquet.`Call_Number`

Call Number,Unit ID,Incident Number,Call Type,Call Date,Watch Date,Received DtTm,Entry DtTm,Dispatch DtTm,Response DtTm,On Scene DtTm,Transport DtTm,Hospital DtTm,Call Final Disposition,Available DtTm,Address,City,Zipcode of Incident,Battalion,Station Area,Box,Original Priority,Priority,Final Priority,ALS Unit,Call Type Group,Number of Alarms,Unit Type,Unit sequence in call dispatch,Fire Prevention District,Supervisor District,Neighborhooods - Analysis Boundaries,Location,RowID,Call_Number,Unit_ID,Incident_Number,Call_Type,Call_Date,Watch_Date,Received_DtTm,Entry_DtTm,Dispatch_DtTm,Response_DtTm,On_Scene_DtTm,Transport_DtTm,Hospital_DtTm,Call_Final_Disposition,Available_DtTm,Address.1,City.1,Zipcode_of_Incident,Battalion.1,Station_Area,Box.1,Original_Priority,Priority.1,Final_Priority,ALS_Unit,Call_Type_Group,Number_of_Alarms,Unit_Type,Unit_sequence_in_call_dispatch,Fire_Prevention_District,Supervisor_District,Neighborhooods_-_Analysis_Boundaries,Location.1,RowID.1
1030118,E08,30625,Medical Incident,04/12/2000,04/12/2000,04/12/2000 09:27:45 PM,04/12/2000 09:28:58 PM,04/12/2000 09:29:21 PM,04/12/2000 09:31:26 PM,04/12/2000 09:32:34 PM,,,Other,04/12/2000 09:45:28 PM,4TH ST/CHANNEL ST,SF,,B03,8,2226,3,3,3,False,,1,ENGINE,1,3,6,,"(37.7750268633971, -122.392346204303)",001030118-E08,1030118,M08,30625,Medical Incident,04/12/2000,04/12/2000,04/12/2000 09:27:45 PM,04/12/2000 09:28:58 PM,04/12/2000 09:29:21 PM,,04/12/2000 09:35:29 PM,04/12/2000 10:00:17 PM,04/12/2000 10:13:28 PM,Other,04/12/2000 10:30:34 PM,4TH ST/CHANNEL ST,SF,,B03,8,2226,3,3,3,True,,1,MEDIC,2,3,6,,"(37.7750268633971, -122.392346204303)",001030118-M08
1030118,E08,30625,Medical Incident,04/12/2000,04/12/2000,04/12/2000 09:27:45 PM,04/12/2000 09:28:58 PM,04/12/2000 09:29:21 PM,04/12/2000 09:31:26 PM,04/12/2000 09:32:34 PM,,,Other,04/12/2000 09:45:28 PM,4TH ST/CHANNEL ST,SF,,B03,8,2226,3,3,3,False,,1,ENGINE,1,3,6,,"(37.7750268633971, -122.392346204303)",001030118-E08,1030118,E08,30625,Medical Incident,04/12/2000,04/12/2000,04/12/2000 09:27:45 PM,04/12/2000 09:28:58 PM,04/12/2000 09:29:21 PM,04/12/2000 09:31:26 PM,04/12/2000 09:32:34 PM,,,Other,04/12/2000 09:45:28 PM,4TH ST/CHANNEL ST,SF,,B03,8,2226,3,3,3,False,,1,ENGINE,1,3,6,,"(37.7750268633971, -122.392346204303)",001030118-E08
1040061,M43,30749,Medical Incident,04/13/2000,04/12/2000,04/13/2000 07:51:29 AM,04/13/2000 07:55:35 AM,04/13/2000 07:55:54 AM,04/13/2000 07:59:58 AM,,04/13/2000 08:16:30 AM,04/13/2000 08:28:37 AM,Other,04/13/2000 09:26:54 AM,200 Block of MADRID ST,SF,94112.0,B09,43,613,3,3,3,True,,1,MEDIC,3,9,11,Excelsior,"(37.7255316247491, -122.429925994016)",001040061-M43,1040061,E43,30749,Medical Incident,04/13/2000,04/12/2000,04/13/2000 07:51:29 AM,04/13/2000 07:55:35 AM,04/13/2000 07:55:54 AM,04/13/2000 07:57:24 AM,04/13/2000 08:00:38 AM,,,Other,04/13/2000 08:12:09 AM,200 Block of MADRID ST,SF,94112.0,B09,43,613,3,3,3,False,,1,ENGINE,1,9,11,Excelsior,"(37.7255316247491, -122.429925994016)",001040061-E43
1040061,M43,30749,Medical Incident,04/13/2000,04/12/2000,04/13/2000 07:51:29 AM,04/13/2000 07:55:35 AM,04/13/2000 07:55:54 AM,04/13/2000 07:59:58 AM,,04/13/2000 08:16:30 AM,04/13/2000 08:28:37 AM,Other,04/13/2000 09:26:54 AM,200 Block of MADRID ST,SF,94112.0,B09,43,613,3,3,3,True,,1,MEDIC,3,9,11,Excelsior,"(37.7255316247491, -122.429925994016)",001040061-M43,1040061,M43,30749,Medical Incident,04/13/2000,04/12/2000,04/13/2000 07:51:29 AM,04/13/2000 07:55:35 AM,04/13/2000 07:55:54 AM,04/13/2000 07:59:58 AM,,04/13/2000 08:16:30 AM,04/13/2000 08:28:37 AM,Other,04/13/2000 09:26:54 AM,200 Block of MADRID ST,SF,94112.0,B09,43,613,3,3,3,True,,1,MEDIC,3,9,11,Excelsior,"(37.7255316247491, -122.429925994016)",001040061-M43
1040143,M43,30832,Medical Incident,04/13/2000,04/13/2000,04/13/2000 01:01:56 PM,04/13/2000 01:04:12 PM,04/13/2000 01:13:08 PM,,04/13/2000 01:29:19 PM,04/13/2000 01:36:34 PM,,Other,04/13/2000 01:16:20 PM,2500 Block of OCEAN AVE,SF,94132.0,B08,19,8452,1,1,2,True,,1,MEDIC,1,8,7,West of Twin Peaks,"(37.7314853147957, -122.472647880057)",001040143-M43,1040143,M43,30832,Medical Incident,04/13/2000,04/13/2000,04/13/2000 01:01:56 PM,04/13/2000 01:04:12 PM,04/13/2000 01:13:08 PM,,04/13/2000 01:29:19 PM,04/13/2000 01:36:34 PM,,Other,04/13/2000 01:16:20 PM,2500 Block of OCEAN AVE,SF,94132.0,B08,19,8452,1,1,2,True,,1,MEDIC,1,8,7,West of Twin Peaks,"(37.7314853147957, -122.472647880057)",001040143-M43
1040312,RS2,30989,Medical Incident,04/13/2000,04/13/2000,04/13/2000 11:13:22 PM,04/13/2000 11:13:58 PM,04/13/2000 11:15:07 PM,04/13/2000 11:18:32 PM,,,,Other,04/13/2000 11:19:03 PM,18TH ST/SANCHEZ ST,SF,94114.0,B05,6,5252,3,3,3,False,,1,RESCUE SQUAD,2,5,8,Castro/Upper Market,"(37.7611554774347, -122.430573567918)",001040312-RS2,1040312,M12,30989,Medical Incident,04/13/2000,04/13/2000,04/13/2000 11:13:22 PM,04/13/2000 11:13:58 PM,04/13/2000 11:15:07 PM,,,,,Other,04/13/2000 11:19:08 PM,18TH ST/SANCHEZ ST,SF,94114.0,B05,6,5252,3,3,3,True,,1,MEDIC,1,5,8,Castro/Upper Market,"(37.7611554774347, -122.430573567918)",001040312-M12
1040312,RS2,30989,Medical Incident,04/13/2000,04/13/2000,04/13/2000 11:13:22 PM,04/13/2000 11:13:58 PM,04/13/2000 11:15:07 PM,04/13/2000 11:18:32 PM,,,,Other,04/13/2000 11:19:03 PM,18TH ST/SANCHEZ ST,SF,94114.0,B05,6,5252,3,3,3,False,,1,RESCUE SQUAD,2,5,8,Castro/Upper Market,"(37.7611554774347, -122.430573567918)",001040312-RS2,1040312,E07,30989,Medical Incident,04/13/2000,04/13/2000,04/13/2000 11:13:22 PM,04/13/2000 11:13:58 PM,04/13/2000 11:15:07 PM,04/13/2000 11:16:20 PM,,,,Other,04/13/2000 11:18:58 PM,18TH ST/SANCHEZ ST,SF,94114.0,B05,6,5252,3,3,3,False,,1,ENGINE,3,5,8,Castro/Upper Market,"(37.7611554774347, -122.430573567918)",001040312-E07
1040312,RS2,30989,Medical Incident,04/13/2000,04/13/2000,04/13/2000 11:13:22 PM,04/13/2000 11:13:58 PM,04/13/2000 11:15:07 PM,04/13/2000 11:18:32 PM,,,,Other,04/13/2000 11:19:03 PM,18TH ST/SANCHEZ ST,SF,94114.0,B05,6,5252,3,3,3,False,,1,RESCUE SQUAD,2,5,8,Castro/Upper Market,"(37.7611554774347, -122.430573567918)",001040312-RS2,1040312,RS2,30989,Medical Incident,04/13/2000,04/13/2000,04/13/2000 11:13:22 PM,04/13/2000 11:13:58 PM,04/13/2000 11:15:07 PM,04/13/2000 11:18:32 PM,,,,Other,04/13/2000 11:19:03 PM,18TH ST/SANCHEZ ST,SF,94114.0,B05,6,5252,3,3,3,False,,1,RESCUE SQUAD,2,5,8,Castro/Upper Market,"(37.7611554774347, -122.430573567918)",001040312-RS2
1050175,M01,31171,Medical Incident,04/14/2000,04/14/2000,04/14/2000 02:14:20 PM,04/14/2000 02:14:49 PM,04/14/2000 02:15:09 PM,04/14/2000 02:17:46 PM,04/14/2000 02:20:44 PM,,,Other,04/14/2000 02:24:49 PM,200 Block of EDDY ST,SF,94102.0,B03,1,1453,3,3,3,True,,1,MEDIC,1,3,6,Tenderloin,"(37.7839825587491, -122.411762530716)",001050175-M01,1050175,M01,31171,Medical Incident,04/14/2000,04/14/2000,04/14/2000 02:14:20 PM,04/14/2000 02:14:49 PM,04/14/2000 02:15:09 PM,04/14/2000 02:17:46 PM,04/14/2000 02:20:44 PM,,,Other,04/14/2000 02:24:49 PM,200 Block of EDDY ST,SF,94102.0,B03,1,1453,3,3,3,True,,1,MEDIC,1,3,6,Tenderloin,"(37.7839825587491, -122.411762530716)",001050175-M01
1050175,M01,31171,Medical Incident,04/14/2000,04/14/2000,04/14/2000 02:14:20 PM,04/14/2000 02:14:49 PM,04/14/2000 02:15:09 PM,04/14/2000 02:17:46 PM,04/14/2000 02:20:44 PM,,,Other,04/14/2000 02:24:49 PM,200 Block of EDDY ST,SF,94102.0,B03,1,1453,3,3,3,True,,1,MEDIC,1,3,6,Tenderloin,"(37.7839825587491, -122.411762530716)",001050175-M01,1050175,RS1,31171,Medical Incident,04/14/2000,04/14/2000,04/14/2000 02:14:20 PM,04/14/2000 02:14:49 PM,04/14/2000 02:15:09 PM,04/14/2000 02:18:01 PM,04/14/2000 02:20:52 PM,,,Other,04/14/2000 02:25:14 PM,200 Block of EDDY ST,SF,94102.0,B03,1,1453,3,3,3,False,,1,RESCUE SQUAD,2,3,6,Tenderloin,"(37.7839825587491, -122.411762530716)",001050175-RS1


#### Automatic and Manual broadcasting

- Depending on the size of the data being loaded into Spark, Spark uses internal heuristics to decide how to join that data to other data.
- Automatic broadcast depends on `spark.sql.autoBroadcastJoinThreshold`
    - The setting configures the **maximum size in bytes** for a table that will be broadcast to all worker nodes when performing a join
    - Default is 10 MB
- A `broadcast` function can be used in Spark to instruct Catalyst that it should probably broadcast one of the tables being joined.

If the `broadcast` hint isn't used, but one side of the join is small enough (i.e., its size is below the threshold), that data source will be read into the Driver and broadcast to all Executors.
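
The decision described above can be sketched as a small pure-Python toy model. This is only an illustration of the size-versus-threshold rule and of how an explicit hint overrides it. It is not Spark's planner, which weighs more factors and works from size estimates. The function `choose_join_strategy`, its `broadcast_hint` parameter, and the 5 MiB and 500 MiB table sizes are hypothetical.

```python
MIB = 1024 * 1024

# 10 MB, the default value of spark.sql.autoBroadcastJoinThreshold.
DEFAULT_THRESHOLD_BYTES = 10 * MIB


def choose_join_strategy(left_bytes, right_bytes,
                         threshold_bytes=DEFAULT_THRESHOLD_BYTES,
                         broadcast_hint=None):
    """Toy model of the automatic/manual broadcast decision.

    broadcast_hint is a hypothetical stand-in for the `broadcast`
    hint: "left", "right", or None.
    """
    # An explicit hint wins regardless of table size.
    if broadcast_hint in ("left", "right"):
        return f"broadcast {broadcast_hint}"
    # Otherwise, broadcast the smaller side only if it fits under the threshold.
    side, size = min((("left", left_bytes), ("right", right_bytes)),
                     key=lambda pair: pair[1])
    if 0 <= size <= threshold_bytes:
        return f"broadcast {side}"
    return "shuffle-based join"


print(choose_join_strategy(5 * MIB, 500 * MIB))    # broadcast left
print(choose_join_strategy(59 * MIB, 500 * MIB))   # shuffle-based join
print(choose_join_strategy(59 * MIB, 500 * MIB, broadcast_hint="left"))  # broadcast left
```

A negative threshold never passes the size check in this model, mirroring the convention that setting `spark.sql.autoBroadcastJoinThreshold` to `-1` disables automatic broadcasting.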

## AQE: Dynamically Switching Join Strategies

Now take a look at the physical plan when we broadcast one of the datasets. The broadcast join hint is written inside a SQL comment (`/*+ ... */`), but Spark still parses it and treats it as a hint rather than ignoring it.

Here we tell Spark to broadcast the smaller table (**fireCalls**) to all of the executors.

With AQE enabled and the broadcast hint applied, the execution time drops significantly, from about **1 minute** to **12 seconds**.

In [0]:
%sql
SELECT /*+ BROADCAST(fireCalls) */ * 
FROM fireCalls 
JOIN fireCallsParquet on fireCalls.`Call Number` = fireCallsParquet.`Call_Number`

Call Number,Unit ID,Incident Number,Call Type,Call Date,Watch Date,Received DtTm,Entry DtTm,Dispatch DtTm,Response DtTm,On Scene DtTm,Transport DtTm,Hospital DtTm,Call Final Disposition,Available DtTm,Address,City,Zipcode of Incident,Battalion,Station Area,Box,Original Priority,Priority,Final Priority,ALS Unit,Call Type Group,Number of Alarms,Unit Type,Unit sequence in call dispatch,Fire Prevention District,Supervisor District,Neighborhooods - Analysis Boundaries,Location,RowID,Call_Number,Unit_ID,Incident_Number,Call_Type,Call_Date,Watch_Date,Received_DtTm,Entry_DtTm,Dispatch_DtTm,Response_DtTm,On_Scene_DtTm,Transport_DtTm,Hospital_DtTm,Call_Final_Disposition,Available_DtTm,Address.1,City.1,Zipcode_of_Incident,Battalion.1,Station_Area,Box.1,Original_Priority,Priority.1,Final_Priority,ALS_Unit,Call_Type_Group,Number_of_Alarms,Unit_Type,Unit_sequence_in_call_dispatch,Fire_Prevention_District,Supervisor_District,Neighborhooods_-_Analysis_Boundaries,Location.1,RowID.1
131020115,93,13034219,Traffic Collision,04/12/2013,04/12/2013,04/12/2013 10:33:46 AM,04/12/2013 10:35:23 AM,04/12/2013 10:36:05 AM,04/12/2013 10:36:38 AM,04/12/2013 10:39:31 AM,04/12/2013 11:06:09 AM,04/12/2013 11:09:59 AM,Other,04/12/2013 11:57:21 AM,MARKET ST/SANCHEZ ST,SF,94114.0,B02,6,5213,2,2,2,True,Non Life-threatening,1,MEDIC,1,5.0,8.0,Castro/Upper Market,"(37.7658679882367, -122.431025473299)",131020115-93,131020115,T06,13034219,Traffic Collision,04/12/2013,04/12/2013,04/12/2013 10:33:46 AM,04/12/2013 10:35:23 AM,04/12/2013 10:39:31 AM,,,,,Other,04/12/2013 10:59:47 AM,MARKET ST/SANCHEZ ST,SF,94114.0,B02,6,5213,2,2,2,False,Non Life-threatening,1,TRUCK,3,5.0,8.0,Castro/Upper Market,"(37.7658679882367, -122.431025473299)",131020115-T06
93060021,B04,9092346,Alarms,11/02/2009,11/01/2009,11/02/2009 02:18:57 AM,11/02/2009 02:21:01 AM,11/02/2009 02:21:08 AM,11/02/2009 02:22:46 AM,,,,Other,11/02/2009 02:26:10 AM,2600 Block of GREENWICH ST,SF,94123.0,B04,16,4166,3,3,3,False,,1,CHIEF,3,4.0,2.0,Marina,"(37.7979514351842, -122.443292158435)",093060021-B04,93060021,B04,9092346,Alarms,11/02/2009,11/01/2009,11/02/2009 02:18:57 AM,11/02/2009 02:21:01 AM,11/02/2009 02:21:08 AM,11/02/2009 02:22:46 AM,,,,Other,11/02/2009 02:26:10 AM,2600 Block of GREENWICH ST,SF,94123.0,B04,16,4166,3,3,3,False,,1,CHIEF,3,4.0,2.0,Marina,"(37.7979514351842, -122.443292158435)",093060021-B04
31470136,M36,3042085,Medical Incident,05/27/2003,05/27/2003,05/27/2003 10:20:47 AM,05/27/2003 10:23:27 AM,05/27/2003 10:24:33 AM,05/27/2003 10:25:53 AM,05/27/2003 10:27:43 AM,05/27/2003 10:37:56 AM,05/27/2003 10:42:17 AM,Other,05/27/2003 10:52:22 AM,8TH ST/HOWARD ST,SF,94103.0,B02,36,2335,3,3,3,True,,1,MEDIC,1,2.0,6.0,South of Market,"(37.7762213544451, -122.411606113878)",031470136-M36,31470136,M36,3042085,Medical Incident,05/27/2003,05/27/2003,05/27/2003 10:20:47 AM,05/27/2003 10:23:27 AM,05/27/2003 10:24:33 AM,05/27/2003 10:25:53 AM,05/27/2003 10:27:43 AM,05/27/2003 10:37:56 AM,05/27/2003 10:42:17 AM,Other,05/27/2003 10:52:22 AM,8TH ST/HOWARD ST,SF,94103.0,B02,36,2335,3,3,3,True,,1,MEDIC,1,2.0,6.0,South of Market,"(37.7762213544451, -122.411606113878)",031470136-M36
31470136,E36,3042085,Medical Incident,05/27/2003,05/27/2003,05/27/2003 10:20:47 AM,05/27/2003 10:23:27 AM,05/27/2003 10:24:33 AM,05/27/2003 10:25:37 AM,,,,Other,05/27/2003 10:27:44 AM,8TH ST/HOWARD ST,SF,94103.0,B02,36,2335,3,3,3,True,,1,ENGINE,2,2.0,6.0,South of Market,"(37.7762213544451, -122.411606113878)",031470136-E36,31470136,M36,3042085,Medical Incident,05/27/2003,05/27/2003,05/27/2003 10:20:47 AM,05/27/2003 10:23:27 AM,05/27/2003 10:24:33 AM,05/27/2003 10:25:53 AM,05/27/2003 10:27:43 AM,05/27/2003 10:37:56 AM,05/27/2003 10:42:17 AM,Other,05/27/2003 10:52:22 AM,8TH ST/HOWARD ST,SF,94103.0,B02,36,2335,3,3,3,True,,1,MEDIC,1,2.0,6.0,South of Market,"(37.7762213544451, -122.411606113878)",031470136-M36
112740172,B02,11090595,Medical Incident,10/01/2011,10/01/2011,10/01/2011 11:28:51 AM,10/01/2011 11:29:30 AM,10/01/2011 11:32:55 AM,,,,,Other,10/01/2011 11:36:45 AM,1000 Block of HOWARD ST,SF,94103.0,B03,1,2314,3,3,3,False,,1,CHIEF,1,3.0,6.0,South of Market,"(37.7781574042353, -122.409060995738)",112740172-B02,112740172,B02,11090595,Medical Incident,10/01/2011,10/01/2011,10/01/2011 11:28:51 AM,10/01/2011 11:29:30 AM,10/01/2011 11:32:55 AM,,,,,Other,10/01/2011 11:36:45 AM,1000 Block of HOWARD ST,SF,94103.0,B03,1,2314,3,3,3,False,,1,CHIEF,1,3.0,6.0,South of Market,"(37.7781574042353, -122.409060995738)",112740172-B02
90500238,E01,9014805,Medical Incident,02/19/2009,02/19/2009,02/19/2009 03:22:31 PM,02/19/2009 03:26:22 PM,02/19/2009 03:26:45 PM,02/19/2009 03:27:50 PM,,,,Other,02/19/2009 03:30:07 PM,100 Block of MCALLISTER ST,SF,94102.0,B02,3,1553,3,3,3,False,,1,ENGINE,3,2.0,6.0,Tenderloin,"(37.7808772585499, -122.414506898711)",090500238-E01,90500238,E05,9014805,Medical Incident,02/19/2009,02/19/2009,02/19/2009 03:22:31 PM,02/19/2009 03:26:22 PM,02/19/2009 03:26:45 PM,02/19/2009 03:27:02 PM,02/19/2009 03:29:39 PM,,,Other,02/19/2009 03:34:06 PM,100 Block of MCALLISTER ST,SF,94102.0,B02,3,1553,3,3,3,True,,1,ENGINE,1,2.0,6.0,Tenderloin,"(37.7808772585499, -122.414506898711)",090500238-E05
92090090,E01,9061910,Medical Incident,07/28/2009,07/28/2009,07/28/2009 08:36:11 AM,07/28/2009 08:37:40 AM,07/28/2009 08:38:00 AM,07/28/2009 08:39:33 AM,,,,Other,,700 Block of EDDY ST,SF,94109.0,B02,3,3163,3,3,3,True,,1,ENGINE,4,2.0,6.0,Tenderloin,"(37.7830520025585, -122.420010270255)",092090090-E01,92090090,E01,9061910,Medical Incident,07/28/2009,07/28/2009,07/28/2009 08:36:11 AM,07/28/2009 08:37:40 AM,07/28/2009 08:38:00 AM,07/28/2009 08:39:33 AM,,,,Other,,700 Block of EDDY ST,SF,94109.0,B02,3,3163,3,3,3,True,,1,ENGINE,4,2.0,6.0,Tenderloin,"(37.7830520025585, -122.420010270255)",092090090-E01
120520189,84,12017182,Medical Incident,02/21/2012,02/21/2012,02/21/2012 01:21:43 PM,02/21/2012 01:24:06 PM,02/21/2012 01:25:22 PM,02/21/2012 01:25:37 PM,02/21/2012 01:32:29 PM,02/21/2012 01:54:43 PM,02/21/2012 02:09:10 PM,Code 2 Transport,02/21/2012 02:30:53 PM,500 Block of BAKER ST,SF,94117.0,B05,21,4243,3,2,2,True,Non Life-threatening,1,MEDIC,2,5.0,5.0,Hayes Valley,"(37.775978571838, -122.441325869621)",120520189-84,120520189,84,12017182,Medical Incident,02/21/2012,02/21/2012,02/21/2012 01:21:43 PM,02/21/2012 01:24:06 PM,02/21/2012 01:25:22 PM,02/21/2012 01:25:37 PM,02/21/2012 01:32:29 PM,02/21/2012 01:54:43 PM,02/21/2012 02:09:10 PM,Code 2 Transport,02/21/2012 02:30:53 PM,500 Block of BAKER ST,SF,94117.0,B05,21,4243,3,2,2,True,Non Life-threatening,1,MEDIC,2,5.0,5.0,Hayes Valley,"(37.775978571838, -122.441325869621)",120520189-84
140260145,E12,14008876,Medical Incident,01/26/2014,01/26/2014,01/26/2014 11:41:29 AM,01/26/2014 11:41:46 AM,01/26/2014 11:42:44 AM,01/26/2014 11:42:55 AM,,,,Other,01/26/2014 11:53:10 AM,1800 Block of WALLER ST,SF,94122.0,B05,12,4552,3,3,3,True,Potentially Life-Threatening,1,ENGINE,2,5.0,5.0,Golden Gate Park,"(37.768154162945, -122.454311620171)",140260145-E12,140260145,66,14008876,Medical Incident,01/26/2014,01/26/2014,01/26/2014 11:41:29 AM,01/26/2014 11:41:46 AM,01/26/2014 11:42:44 AM,,01/26/2014 11:43:42 AM,01/26/2014 12:22:46 PM,01/26/2014 12:45:54 PM,Code 2 Transport,01/26/2014 01:12:40 PM,1800 Block of WALLER ST,SF,94122.0,B05,12,4552,3,3,3,True,Potentially Life-Threatening,1,MEDIC,1,5.0,5.0,Golden Gate Park,"(37.768154162945, -122.454311620171)",140260145-66
20320202,M12,2009404,Medical Incident,02/01/2002,02/01/2002,02/01/2002 12:11:16 PM,02/01/2002 12:15:14 PM,02/01/2002 12:15:31 PM,02/01/2002 12:16:44 PM,02/01/2002 12:28:22 PM,02/01/2002 12:41:56 PM,02/01/2002 12:58:27 PM,Other,02/01/2002 01:26:17 PM,200 Block of EVELYN WAY,SF,94127.0,B09,39,8655,2,2,2,True,,1,MEDIC,1,9.0,7.0,West of Twin Peaks,"(37.7426820880698, -122.449417082278)",020320202-M12,20320202,M12,2009404,Medical Incident,02/01/2002,02/01/2002,02/01/2002 12:11:16 PM,02/01/2002 12:15:14 PM,02/01/2002 12:15:31 PM,02/01/2002 12:16:44 PM,02/01/2002 12:28:22 PM,02/01/2002 12:41:56 PM,02/01/2002 12:58:27 PM,Other,02/01/2002 01:26:17 PM,200 Block of EVELYN WAY,SF,94127.0,B09,39,8655,2,2,2,True,,1,MEDIC,1,9.0,7.0,West of Twin Peaks,"(37.7426820880698, -122.449417082278)",020320202-M12


You might be wondering why Adaptive Query Execution didn't broadcast the dataset automatically. The entire dataset is ~59 MiB (its size when cached), which exceeds the default threshold of 10 MB.
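
As a quick check of the arithmetic, the following Python snippet compares the two sizes in bytes. It assumes the dataset is exactly 59 MiB, whereas the figure above is approximate, and it uses 10485760 bytes for the default threshold.

```python
MIB = 1024 * 1024

dataset_bytes = 59 * MIB    # approximate cached size quoted above
threshold_bytes = 10 * MIB  # default spark.sql.autoBroadcastJoinThreshold

print(dataset_bytes)                               # 61865984
print(threshold_bytes)                             # 10485760
print(dataset_bytes > threshold_bytes)             # True
print(round(dataset_bytes / threshold_bytes, 1))   # 5.9
```

The dataset is roughly six times larger than the threshold, so it does not qualify for automatic broadcast. Raising `spark.sql.autoBroadcastJoinThreshold` above that size would be one way to let Spark consider it, at the cost of more memory on the driver and executors.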

**The moral of the story**: if you identify a way to optimize your query (e.g. move the filter before the join, broadcast a table, etc), you should optimize it yourself instead of relying on Catalyst to optimize everything.

&copy; 2021 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>