## Creating Spark DataFrames

A PySpark SQL DataFrame is a distributed collection of data organized into named columns. Under the hood, DataFrames are built on top of RDDs.

### `rdd.toDF()`
The `toDF()` method converts an RDD to a DataFrame. It works on RDDs whose elements are `Row` objects, tuples, or lists, and it accepts an optional list of column names.

In [None]:
# Create an RDD from a list (assumes an active SparkSession named `spark`)
hrly_views_rdd = spark.sparkContext.parallelize([
    ["Betty_White", 288886],
    ["Main_Page", 139564],
    ["New_Year's_Day", 7892],
    ["ABBA", 8154],
])

# Convert the RDD to a DataFrame, naming the columns
hrly_views_df = hrly_views_rdd.toDF(["article_title", "view_count"])

### `DataFrame.show()`

The `show()` method prints the contents of the DataFrame as a table. By default, it shows the first 20 rows and truncates strings longer than 20 characters; pass `truncate=False` to print full values.

In [None]:
hrly_views_df.show(4, truncate=False)

```text
+--------------+----------+
|article_title |view_count|
+--------------+----------+
|Betty_White   |288886    |
|Main_Page     |139564    |
|New_Year's_Day|7892      |
|ABBA          |8154      |
+--------------+----------+
```

### `DataFrame.rdd`

The `rdd` attribute returns the contents of the DataFrame as an RDD of `Row` objects.

In [None]:
# Access DataFrame's underlying RDD
hrly_views_df_rdd = hrly_views_df.rdd

# Check object type
print(type(hrly_views_df_rdd))
# <class 'pyspark.rdd.RDD'>