## Introduction to Creating DataFrames in PySpark

### Links and Resources
- [createDataFrame](https://spark.apache.org/docs/3.5.3/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.createDataFrame.html)
- [Spark Data Types](https://spark.apache.org/docs/3.5.3/sql-ref-datatypes.html)
- [Data Types Class](https://spark.apache.org/docs/3.5.3/api/python/reference/pyspark.sql/data_types.html)

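In this notebook, you will learn how to create DataFrames from Python data structures, specify a schema in three ways (column names only, a DDL-formatted string, and a `StructType`), and inspect the resulting schema. The cells assume that an active `SparkSession` is available as `spark`, as Databricks notebooks and the `pyspark` shell provide.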

In [None]:
# Define a single record as a list of lists (one inner list per row)
data = [['John', 21]]
# Confirm that this is a plain Python list
print(type(data))

In [None]:
# Create a DataFrame from the data structure; with no schema given,
# Spark infers the column types and names the columns _1 and _2
df = spark.createDataFrame(data)
# Display the contents of the DataFrame
df.show()

In [None]:
# Verify the type of the DataFrame
print(type(df))

In [None]:
# Define a data structure with multiple records
data = [('John', 21), ('Amy', 25), ('Anita', 41), ('Rohan', 25), ('Maria', 37)]
# Create a DataFrame from the data structure
df = spark.createDataFrame(data)
# Display the contents of the DataFrame
df.show()

### Adding a Schema with Column Names or a DDL String

In [None]:
# Define column names
column_names = ['name', 'age']
# Create a DataFrame with these column names; the column types are still inferred from the data
df = spark.createDataFrame(data, column_names)
# Display the DataFrame
df.show()

In [None]:
# Define the schema as a DDL-formatted string: comma-separated "name type" pairs
schema = 'name string, age int'
# Create a DataFrame using the defined schema
df = spark.createDataFrame(data, schema)
# Display the DataFrame
df.show()

### Exploring Spark Data Types

In [None]:
# Import PySpark data types
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define a schema using StructType and StructField
# The third argument to StructField is nullable: True allows nulls, False rejects them
schema = StructType([
    StructField('name', StringType(), True),
    StructField('age', IntegerType(), False)
])

# Create a DataFrame with the custom schema
df = spark.createDataFrame(data, schema=schema)
# Display the DataFrame
df.show()

In [None]:
# View the schema of the DataFrame as a StructType object
print(df.schema)

In [None]:
# View the column names and data types as a list of (name, type) tuples
print(df.dtypes)
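
Because `df.dtypes` is a list of `(name, type)` tuples, it can be turned back into a DDL-formatted schema string. The sketch below hard-codes the list that `df.dtypes` returns for the `name`/`age` schema (an assumption made so the snippet runs without Spark) and joins it into a string. `dtypes_to_ddl` is a hypothetical helper name, not part of PySpark.

```python
def dtypes_to_ddl(dtypes):
    """Join (column_name, type_name) pairs into a DDL-style schema string."""
    return ', '.join(f'{name} {type_name}' for name, type_name in dtypes)

# Assumed value of df.dtypes for the name/age schema used in this notebook
dtypes = [('name', 'string'), ('age', 'int')]

ddl = dtypes_to_ddl(dtypes)
print(ddl)  # name string, age int
```

The result has the same form as a schema string accepted by `createDataFrame`. Nullability is not carried over, since `dtypes` does not record it, and column names containing spaces or special characters would need quoting, which this sketch does not handle.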

In [None]:
# Display the schema using the printSchema() method
df.printSchema()

In [None]:
# Display summary statistics (count, mean, stddev, min, max) for each column
df.describe().show()
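
To see where the numbers reported by `describe()` come from, the same statistics can be computed for the `age` values in plain Python. This sketch assumes the five-record dataset used above and uses the sample standard deviation, which is what Spark's `stddev` computes. It is meant as a cross-check, not a replacement for `describe()`.

```python
import statistics

# Ages from the five records used in this notebook
ages = [21, 25, 41, 25, 37]

summary = {
    'count': len(ages),
    'mean': statistics.mean(ages),
    'stddev': statistics.stdev(ages),  # sample standard deviation (n - 1)
    'min': min(ages),
    'max': max(ages),
}

for statistic, value in summary.items():
    print(statistic, value)
```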

### Creating a DataFrame from a List of Dictionaries

In [None]:
# Define data as a list of dictionaries
data = [
    {'name': 'John', 'age': 21},
    {'name': 'Amy', 'age': 25},
    {'name': 'Anita', 'age': 41},
    {'name': 'Rohan', 'age': 25},
    {'name': 'Maria', 'age': 37}
]

# Create a DataFrame from the data; column names are taken from the dictionary keys
df = spark.createDataFrame(data)
# Display the contents of the DataFrame
df.show()
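
When the schema is inferred from dictionaries, the column order in the output may not match the order in which the keys were written. PySpark sorts the keys alphabetically during inference (at least in the 3.x releases), so `age` is expected to appear before `name`. One way to control the order, sketched below, is to convert each dictionary to a tuple in an explicit column order and pass the column names separately. The variable names `records`, `columns`, and `rows` are illustrative.

```python
# A shortened copy of the records above
records = [
    {'name': 'John', 'age': 21},
    {'name': 'Amy', 'age': 25},
]

# The column order we want in the DataFrame
columns = ['name', 'age']

# Build one tuple per record, reading the values in the chosen order;
# a missing key becomes None
rows = [tuple(record.get(col) for col in columns) for record in records]

print(rows)  # [('John', 21), ('Amy', 25)]
```

`rows` and `columns` can then be passed as `spark.createDataFrame(rows, columns)`, the same call form used earlier in this notebook.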

In [None]:
# Define a schema for the DataFrame (same fields as the StructType schema defined earlier)
schema = StructType([
    StructField('name', StringType(), True),
    StructField('age', IntegerType(), False)
])

# Create a DataFrame using the defined schema; dictionary values are matched to fields by key
df = spark.createDataFrame(data, schema=schema)
# Display the DataFrame
df.show()
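
Because `age` is declared with `nullable=False`, a record with a missing or `None` age is expected to make `createDataFrame` raise an error when it verifies the data against the schema. A small pre-check in plain Python can locate such records first. This is a sketch: `find_null_violations` is a hypothetical helper, and the sample records with a missing age are invented for illustration.

```python
def find_null_violations(records, required_fields):
    """Return (row_index, field_name) pairs where a required field is missing or None."""
    return [
        (index, field)
        for index, record in enumerate(records)
        for field in required_fields
        if record.get(field) is None
    ]

# Illustrative records: the second has a None age, the third has no age key
sample = [
    {'name': 'John', 'age': 21},
    {'name': 'Amy', 'age': None},
    {'name': 'Anita'},
]

violations = find_null_violations(sample, required_fields=['age'])
print(violations)  # [(1, 'age'), (2, 'age')]
```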