In PySpark, a DataFrame is a distributed collection of data organized into named columns. It is conceptually similar to a table in a relational database or a DataFrame in pandas. PySpark DataFrames are designed to handle large-scale data processing across distributed computing environments.

To create a DataFrame from static values in PySpark, you typically call the createDataFrame method on a SparkSession (the class is imported from pyspark.sql). Here's a basic example that builds a DataFrame from a list of Row objects:


In [0]:
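from pyspark.sql import SparkSession, Row

# Reuse the notebook's existing session if there is one, otherwise start one
spark = SparkSession.builder.getOrCreate()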

# Define static data
data = [
    Row(id=1, name='Alice', age=25),
    Row(id=2, name='Bob', age=30),
    Row(id=3, name='Charlie', age=35)
]

# Create a DataFrame
df = spark.createDataFrame(data)

# Show the DataFrame
df.show()


# 1. Creating a DataFrame from a List of Tuples

In [0]:
# Define data as a list of tuples
data = [
    (1, "Alice", 25),
    (2, "Bob", 30),
    (3, "Charlie", 35)
]

# Define the schema (column names)
columns = ["id", "name", "age"]

# Create DataFrame
df = spark.createDataFrame(data, schema=columns)

# Show the DataFrame
df.show()


### Note:
Each tuple represents a row.

The columns list supplies only the column names; Spark infers each column's data type from the values.

This method is great for quick testing and small, static datasets.

# 2. Creating a DataFrame from a List of Dictionaries

In [0]:
# Define data as a list of dictionaries
data = [
    {"id": 1, "name": "Alice", "age": 25},
    {"id": 2, "name": "Bob", "age": 30},
    {"id": 3, "name": "Charlie", "age": 35}
]

# Create DataFrame
df = spark.createDataFrame(data)

# Show the DataFrame
df.show()

### Note:

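This method automatically infers the schema: column names come from the dictionary keys and column types from the values. Depending on the Spark version, the columns may appear in sorted key order rather than the order written in the dictionaries.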

It’s particularly useful when you're working with JSON-like structures or data from APIs.