### Reading Spark Plans with Explain Tutorial

In [0]:
df = spark.table("workspace.default.movies")
df.show(5)

+--------------------+---------+------------+-----------+--------------------+-------+--------+---------+--------+--------+
|               title| industry|release_year|imdb_rating|              studio| budget| revenue|     unit|currency|language|
+--------------------+---------+------------+-----------+--------------------+-------+--------+---------+--------+--------+
|     Pather Panchali|Bollywood|        1955|        8.3|Government of Wes...|70000.0|100000.0|Thousands|     INR| Bengali|
|Doctor Strange in...|Hollywood|        2022|          7|      Marvel Studios|  200.0|   954.8| Millions|     USD| English|
|Thor: The Dark Wo...|Hollywood|        2013|        6.8|      Marvel Studios|  165.0|   644.8| Millions|     USD| English|
|     Thor: Ragnarok |Hollywood|        2017|        7.9|      Marvel Studios|  180.0|   854.0| Millions|     USD| English|
|Thor: Love and Th...|Hollywood|        2022|        6.8|      Marvel Studios|  250.0|   670.0| Millions|     USD| English|
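+--------------------+---------+------------+-----------+--------------------+-------+--------+---------+--------+--------+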

### Print Logical and Physical Plan

In [0]:
from pyspark.sql import functions as F

df = spark.table("workspace.default.movies")
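# Project three columns, then keep only films released in 2010 or later.
# release_year is not in the select list; the analyzer carries it along for
# the filter and drops it afterwards (see the Analyzed Logical Plan below).
df_narrow = df.select("title", "studio", "imdb_rating").filter(F.col("release_year") >= 2010)

# select and filter are narrow transformations, so the plan has no Exchange.
df_narrow.explain("extended")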

== Parsed Logical Plan ==
'Filter '`>=`('release_year, 2010)
+- 'Project ['title, 'studio, 'imdb_rating]
   +- 'UnresolvedRelation [workspace, default, movies], [], false

== Analyzed Logical Plan ==
title: string, studio: string, imdb_rating: string
Project [title#11630, studio#11634, imdb_rating#11633]
+- Filter (release_year#11632L >= cast(2010 as bigint))
   +- Project [title#11630, studio#11634, imdb_rating#11633, release_year#11632L]
      +- SubqueryAlias workspace.default.movies
         +- Relation workspace.default.movies[title#11630,industry#11631,release_year#11632L,imdb_rating#11633,studio#11634,budget#11635,revenue#11636,unit#11637,currency#11638,language#11639] parquet

== Optimized Logical Plan ==
Project [title#11630, studio#11634, imdb_rating#11633]
+- Filter (isnotnull(release_year#11632L) AND (release_year#11632L >= 2010))
   +- Relation workspace.default.movies[title#11630,industry#11631,release_year#11632L,imdb_rating#11633,studio#11634,budget#11635,revenue#11636,
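
The extended output is a sequence of `== ... ==` sections, one per planning stage, and comparing them shows what each stage changed. The following is a minimal sketch that splits such text into sections. `split_sections` and `EXTENDED_TEXT` are hypothetical names introduced here for illustration, and the sample text is an abbreviated excerpt of the output above with the deeper tree lines omitted.

```python
import re

# Abbreviated excerpt of the extended output above (deeper tree lines omitted).
EXTENDED_TEXT = """\
== Parsed Logical Plan ==
'Filter '`>=`('release_year, 2010)
+- 'Project ['title, 'studio, 'imdb_rating]

== Analyzed Logical Plan ==
title: string, studio: string, imdb_rating: string
Project [title#11630, studio#11634, imdb_rating#11633]
+- Filter (release_year#11632L >= cast(2010 as bigint))

== Optimized Logical Plan ==
Project [title#11630, studio#11634, imdb_rating#11633]
+- Filter (isnotnull(release_year#11632L) AND (release_year#11632L >= 2010))
"""

# A section header looks like "== Parsed Logical Plan ==".
_HEADER = re.compile(r"^== (.+) ==$")


def split_sections(plan_text):
    """Map each section title to its non-empty lines (hypothetical helper)."""
    sections = {}
    current = None
    for line in plan_text.splitlines():
        header = _HEADER.match(line)
        if header:
            current = header.group(1)
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


sections = split_sections(EXTENDED_TEXT)

# Show only the Filter line of each stage to see how the predicate evolves.
for name, lines in sections.items():
    filters = [line.strip() for line in lines if "Filter" in line]
    print(name, "->", filters)
```

Read side by side, the Filter lines show the analyzer casting the literal `2010` to `bigint`, and the optimizer folding that cast away and adding an `isnotnull` check on `release_year`.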

### Print only Physical Plan

In [0]:
df_narrow.explain("formatted")

== Physical Plan ==
* ColumnarToRow (4)
+- PhotonResultStage (3)
   +- PhotonProject (2)
      +- PhotonScan parquet workspace.default.movies (1)


(1) PhotonScan parquet workspace.default.movies
Output [4]: [title#11701, release_year#11703L, imdb_rating#11704, studio#11705]
DictionaryFilters: [(release_year#11703L >= 2010)]
Location: PreparedDeltaFileIndex [s3://dbstorage-prod-ftgok/uc/79a99d11-bc4e-43f0-a401-b12e20be6025/fba37f23-14f9-4927-9ade-961e9f768757/__unitystorage/catalogs/175a67df-9974-43f3-a33a-f690ccef30f2/tables/bd69b1e4-307f-4ceb-8369-e74cf80f21d7]
ReadSchema: struct<title:string,release_year:bigint,imdb_rating:string,studio:string>
RequiredDataFilters: [isnotnull(release_year#11703L), (release_year#11703L >= 2010)]

(2) PhotonProject
Input [4]: [title#11701, release_year#11703L, imdb_rating#11704, studio#11705]
Arguments: [title#11701, studio#11705, imdb_rating#11704]

(3) PhotonResultStage
Input [3]: [title#11701, studio#11705, imdb_rating#11704]

(4) ColumnarToRow [co
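
The "no Exchange" observation can also be checked mechanically rather than by eye. The sketch below reads the operator tree of a formatted plan and reports whether any operator name contains `Exchange`. `operators`, `has_exchange`, and `PLAN_TEXT` are hypothetical names introduced for illustration, and the regular expression only assumes the tree-line shapes shown in the output above.

```python
import re

# Operator tree copied from the formatted output above.
PLAN_TEXT = """\
== Physical Plan ==
* ColumnarToRow (4)
+- PhotonResultStage (3)
   +- PhotonProject (2)
      +- PhotonScan parquet workspace.default.movies (1)
"""

# Matches tree lines such as "* ColumnarToRow (4)" or "   +- PhotonProject (2)":
# optional indentation and branch marker, optional "*", the operator name,
# and a trailing "(id)".
_OPERATOR_LINE = re.compile(
    r"^[\s:]*(?:[+:]-\s*)?(?:\*\s*)?([A-Za-z]\w*)\b.*?\((\d+)\)\s*$"
)


def operators(plan_text):
    """Return (operator_id, operator_name) pairs sorted by id (hypothetical helper)."""
    found = []
    for line in plan_text.splitlines():
        match = _OPERATOR_LINE.match(line)
        if match:
            found.append((int(match.group(2)), match.group(1)))
    return sorted(found)


def has_exchange(plan_text):
    """True if any operator name contains 'Exchange' (shuffle or broadcast)."""
    return any("Exchange" in name for _, name in operators(plan_text))


print(operators(PLAN_TEXT))
print(has_exchange(PLAN_TEXT))  # prints False
```

In a notebook, one possible way to obtain the text is to wrap `df_narrow.explain("formatted")` in `contextlib.redirect_stdout`, assuming `explain` prints through Python's standard output in your runtime.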

In [0]:
display(df_narrow)

title,studio,imdb_rating
Doctor Strange in the Multiverse of Madness,Marvel Studios,7.0
Thor: The Dark World,Marvel Studios,6.8
Thor: Ragnarok,Marvel Studios,7.9
Thor: Love and Thunder,Marvel Studios,6.8
Interstellar,Warner Bros. Pictures,8.6
Parasite,,8.5
Avengers: Endgame,Marvel Studios,8.4
Avengers: Infinity War,Marvel Studios,8.4
Captain America: The First Avenger,Marvel Studios,6.9
Captain America: The Winter Soldier,Marvel Studios,7.8
