## Tutorial: Python vs SQL, Temp View, Global View

##### Topics covered
* Spark (Python DataFrame API) vs SQL
* Dynamically passing parameters into SQL queries

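Local Temp View:
* Visible only in the Spark session that created it (in Databricks, the notebook that created it)
* Dropped when that session ends
* `df.createOrReplaceTempView("view_name")`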

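Global Temp View:
* Accessible from any notebook attached to the same cluster
* Dropped when the Spark application ends (e.g., when the cluster restarts or terminates), not when a single notebook's session ends
* Must be queried with the `global_temp` prefix, e.g. `global_temp.view_name`
* `df.createOrReplaceGlobalTempView("view_name")`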

##### Pros and cons
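* Spark (Python): create DataFrames, then use the DataFrame API for any further transformation; Python variables can be used directly
* SQL (`%sql` cells): can't use Python variables in SQL statements, so dynamic values (e.g., a race year passed in through a widget) are harder to pass in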

In [0]:
%run "./includes/config_file_paths"

In [0]:
%run "./includes/common_functions"

In [0]:
race_results_df = spark.read.parquet(f"{presentation_folder_path}/race_results")

In [0]:
race_results_df.createOrReplaceTempView("v_race_results")

In [0]:
%sql
SELECT COUNT(1)
FROM v_race_results
WHERE race_year = 2020

| count(1) |
|---|
| 340 |


In [0]:
race_results_2019_df = spark.sql("SELECT * FROM v_race_results WHERE race_year = 2019") 

In [0]:
ezView(race_results_2019_df, 5, 5)

+---------+--------------------+-------------------+----------------+------------------+
|race_year|           race_name|          race_date|circuit_location|       driver_name|
+---------+--------------------+-------------------+----------------+------------------+
|     2019|  Bahrain Grand Prix|2019-03-31 15:10:00|          Sakhir|   Alexander Albon|
|     2019|Hungarian Grand Prix|2019-08-04 13:10:00|        Budapest|   Nico Hülkenberg|
|     2019|Singapore Grand Prix|2019-09-22 12:10:00|      Marina Bay|Antonio Giovinazzi|
|     2019|  Belgian Grand Prix|2019-09-01 13:10:00|             Spa|      Lance Stroll|
|     2019| Austrian Grand Prix|2019-06-30 13:10:00|       Spielburg|   Charles Leclerc|
+---------+--------------------+-------------------+----------------+------------------+



##### Dynamically Programmed
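* Use an f-string so Python variables are substituted into the query: `spark.sql(f"...")`
* This pairs well with widgets: read the widget value into a Python variable, then format it into the query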
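Widget values come back as strings, and an f-string pastes whatever they contain straight into the SQL text. A minimal guard, sketched here as a hypothetical helper `race_year_query`, is to convert the value to `int` before formatting it, so anything that is not a plain year raises `ValueError` instead of reaching Spark. (On Spark 3.4 and later, `spark.sql` also accepts named parameter markers through its `args` argument, which avoids string formatting altogether.)

```python
def race_year_query(race_year_param: str) -> str:
    """Build the race-year filter for v_race_results from a widget-style string.

    Hypothetical helper: int() accepts only integer text (surrounding whitespace
    is allowed), so input such as "2019 OR 1=1" raises ValueError.
    """
    race_year = int(race_year_param)
    return f"SELECT * FROM v_race_results WHERE race_year = {race_year}"


query = race_year_query("2019")
print(query)  # SELECT * FROM v_race_results WHERE race_year = 2019
# In the notebook, the string would then be passed on: spark.sql(query)
```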

In [0]:
race_year = 2019 # Variable can be passed via widgets!
race_results_dynamic = spark.sql(f"SELECT * FROM v_race_results WHERE race_year = {race_year}")
ezView(race_results_dynamic, 5, 5)

+---------+--------------------+-------------------+----------------+------------------+
|race_year|           race_name|          race_date|circuit_location|       driver_name|
+---------+--------------------+-------------------+----------------+------------------+
|     2019|  Bahrain Grand Prix|2019-03-31 15:10:00|          Sakhir|   Alexander Albon|
|     2019|Hungarian Grand Prix|2019-08-04 13:10:00|        Budapest|   Nico Hülkenberg|
|     2019|Singapore Grand Prix|2019-09-22 12:10:00|      Marina Bay|Antonio Giovinazzi|
|     2019|  Belgian Grand Prix|2019-09-01 13:10:00|             Spa|      Lance Stroll|
|     2019| Austrian Grand Prix|2019-06-30 13:10:00|       Spielburg|   Charles Leclerc|
+---------+--------------------+-------------------+----------------+------------------+



### Global View

In [0]:
race_results_df.createOrReplaceGlobalTempView("gv_race_results")


In [0]:
%sql
SELECT * 
  FROM gv_race_results;

AnalysisException (`gv_race_results` cannot be resolved)

* Fails because a global temp view lives in the `global_temp` database, so the query must use `global_temp.gv_race_results`.

In [0]:
%sql
SHOW TABLES;

| database | tableName | isTemporary |
|---|---|---|
|  | v_race_results | True |


In [0]:
%sql
SHOW TABLES IN global_temp;

| database | tableName | isTemporary |
|---|---|---|
| global_temp | gv_race_results | True |
|  | v_race_results | True |


In [0]:
%sql
SELECT race_year, race_name, race_date, circuit_location, driver_name
FROM global_temp.gv_race_results
LIMIT 5;


| race_year | race_name | race_date | circuit_location | driver_name |
|---|---|---|---|---|
| 2018 | Spanish Grand Prix | 2018-05-13T13:10:00.000+0000 | Montmeló | Carlos Sainz |
| 2018 | British Grand Prix | 2018-07-08T13:10:00.000+0000 | Silverstone | Daniel Ricciardo |
| 2019 | Bahrain Grand Prix | 2019-03-31T15:10:00.000+0000 | Sakhir | Alexander Albon |
| 2019 | Hungarian Grand Prix | 2019-08-04T13:10:00.000+0000 | Budapest | Nico Hülkenberg |
| 2019 | Singapore Grand Prix | 2019-09-22T12:10:00.000+0000 | Marina Bay | Antonio Giovinazzi |
