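## Spark execution plans
Spark does not execute a transformation right away; it waits until an action has to run. In the meantime, each transformation is recorded in the execution plan. The execution plan is what lets Spark recover from failures, and it is also where Spark optimizes transformations.
Spark applies additional optimizations to DataFrames. Before it responds to an action, Spark goes through the following steps: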

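- Parse the query into a logical plan
- Analyze the logical plan, resolving all tables, columns, and functions
- Optimize the logical plan
- Create the physical plan by mapping every step to RDD operations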
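
The idea of recording transformations first and running them only when an action arrives can be sketched in plain Python. This is a toy model, not Spark's implementation: the `LazyFrame` class, its methods, and the single "merge adjacent filters" rule are all hypothetical names invented for illustration, and the parsing and analysis steps are left out entirely.

```python
class LazyFrame:
    """Toy stand-in for a DataFrame (hypothetical, not a Spark class)."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # tuple of (operation name, function) pairs

    def filter(self, predicate):
        # Transformation: only extends the recorded plan, touches no data.
        return LazyFrame(self._rows, self._plan + (("filter", predicate),))

    def map(self, fn):
        # Transformation: only extends the recorded plan, touches no data.
        return LazyFrame(self._rows, self._plan + (("map", fn),))

    def optimized_plan(self):
        # One illustrative optimization: merge adjacent filters into one.
        merged = []
        for op, fn in self._plan:
            if op == "filter" and merged and merged[-1][0] == "filter":
                prev = merged[-1][1]
                merged[-1] = ("filter", lambda x, p=prev, q=fn: p(x) and q(x))
            else:
                merged.append((op, fn))
        return merged

    def collect(self):
        # Action: this is the only place where the plan actually runs.
        plan = self.optimized_plan()
        result = []
        for row in self._rows:
            for op, fn in plan:
                if op == "filter":
                    if not fn(row):
                        break  # row rejected, skip the remaining steps
                else:
                    row = fn(row)
            else:
                result.append(row)  # reached only if no filter rejected the row
        return result


calls = []

def noisy_double(x):
    calls.append(x)  # lets us observe when the function really runs
    return x * 2

frame = (
    LazyFrame(range(10))
    .filter(lambda x: x % 2 == 0)
    .filter(lambda x: x > 2)
    .map(noisy_double)
)

ran_before_action = list(calls)                     # still empty: nothing has run
plan_ops = [op for op, _ in frame.optimized_plan()]  # two filters merged into one
result = frame.collect()                             # the action triggers execution
```

Building `frame` only records three steps. The work happens in `collect()`, after the two filters have been merged into a single pass.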

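### How DataFrames relate to RDDs
A DataFrame is built on top of RDDs, but in practice the RDD is only generated at the very last step. More precisely, the DataFrame first collects all transformations at a high level of abstraction, and the RDD is executed at the end.
You can access `dataframe.rdd`, but doing so is not recommended: it forces Spark to build the actual physical execution plan, which hurts performance.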


## Viewing the execution plan of a real example with explain

In [2]:
from pyspark.sql.functions import *
from pyspark.sql import SparkSession
import matplotlib.pyplot as plt
spark = (
    SparkSession.builder.appName("Oncosolve")
    # Run locally, using all available cores
    .master('local[*]')
    .config("spark.executor.memory", "30g")
    .config("spark.driver.memory", "20g")
    # Use Arrow to speed up conversions between Spark and pandas
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .getOrCreate()
)

In [3]:
%matplotlib inline

In [9]:
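# Read the CSV with a header row and let Spark infer the column types
series = spark.read.option('header', True).option("inferSchema", True) \
        .csv('data/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
series.show()
# Print the parsed, analyzed, and optimized logical plans plus the physical plan
series.explain(True)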


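As the output of `series.explain(True)` shows, the execution plan has four stages: parsing -> analysis -> optimization -> physical planning.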