In [1]:
from pyspark.sql import SparkSession

In [None]:
spark = SparkSession.builder.appName("TempTable Basics").getOrCreate()

In [3]:
df = spark.read.csv("./dataset/orders_wh.csv", header=True, inferSchema=True)
df.show(5)

+--------+-------------------+-----------+---------------+
|order_id|         order_date|customer_id|   order_status|
+--------+-------------------+-----------+---------------+
|       1|2013-07-25 00:00:00|      11599|         CLOSED|
|       2|2013-07-25 00:00:00|        256|PENDING_PAYMENT|
|       3|2013-07-25 00:00:00|      12111|       COMPLETE|
|       4|2013-07-25 00:00:00|       8827|         CLOSED|
|       5|2013-07-25 00:00:00|      11318|       COMPLETE|
+--------+-------------------+-----------+---------------+
only showing top 5 rows



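**`pyspark.sql.DataFrame.createTempView` in PySpark**

- The `createTempView` method in PySpark's DataFrame API registers a DataFrame as a **temporary view** under a name you choose, so that SQL queries run through `spark.sql` can refer to its contents.
- `createTempView` raises an `AnalysisException` if a view with that name already exists. The cells below use `createOrReplaceTempView`, which overwrites an existing view instead and is therefore safe to rerun in a notebook.

**Why Use `createTempView`**
- Lets you analyze or transform the data in a DataFrame with **SQL**, which is often more concise for filters, joins, and aggregations.
- SQL queries against a view are planned and executed by the same Spark engine as DataFrame operations, so they scale the same way.

**Key Features**
- Temporary views are scoped to the `SparkSession` that created them and disappear when that session ends.
- Registering a view does not copy or persist the data; the view is only a name for the DataFrame's query plan.
- `spark.sql` returns a DataFrame, so you can switch freely between SQL and PySpark's DataFrame API.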


In [4]:
df.createOrReplaceTempView("orders")

In [5]:
df_SQLquery = spark.sql("SELECT * FROM orders")
df_SQLquery.show(5)

+--------+-------------------+-----------+---------------+
|order_id|         order_date|customer_id|   order_status|
+--------+-------------------+-----------+---------------+
|       1|2013-07-25 00:00:00|      11599|         CLOSED|
|       2|2013-07-25 00:00:00|        256|PENDING_PAYMENT|
|       3|2013-07-25 00:00:00|      12111|       COMPLETE|
|       4|2013-07-25 00:00:00|       8827|         CLOSED|
|       5|2013-07-25 00:00:00|      11318|       COMPLETE|
+--------+-------------------+-----------+---------------+
only showing top 5 rows



In [6]:
df_readTable = spark.read.table("orders")
df_readTable.show(5)

+--------+-------------------+-----------+---------------+
|order_id|         order_date|customer_id|   order_status|
+--------+-------------------+-----------+---------------+
|       1|2013-07-25 00:00:00|      11599|         CLOSED|
|       2|2013-07-25 00:00:00|        256|PENDING_PAYMENT|
|       3|2013-07-25 00:00:00|      12111|       COMPLETE|
|       4|2013-07-25 00:00:00|       8827|         CLOSED|
|       5|2013-07-25 00:00:00|      11318|       COMPLETE|
+--------+-------------------+-----------+---------------+
only showing top 5 rows

