**Method 01**

In [0]:
data = [15, 25, 36, 44, 57, 65, 89, 95, 9]

df4 = spark.createDataFrame([(x,) for x in data], ["Numbers"])
display(df4)

Numbers
15
25
36
44
57
65
89
95
9


**✅ Why use [(x,) for x in data] and not just [x for x in data]?**

- **[(x,) for x in data]** creates a **list of 1-element tuples**:

      [(15,), (25,), (36,), ..., (9,)]

  - **createDataFrame()** expects each element of the input **list to be a row**, that is, a **tuple or list** holding one value per column.

  - Each **1-element tuple** therefore maps to one row of the **one-column schema ["Numbers"]**.

- The **trailing comma** is what makes the tuple: **(x,)** is a tuple containing x, while **(x)** is just x in parentheses.
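
The trailing comma is easy to miss, so here is a minimal plain-Python sketch (no Spark session needed) of the difference between `(x,)` and `(x)`. The names `rows` and `not_rows` are illustrative only and do not appear in the notebook.

```python
data = [15, 25, 36, 44, 57, 65, 89, 95, 9]

# With the trailing comma, each element is a 1-element tuple (one row, one column).
rows = [(x,) for x in data]

# Without it, the parentheses only group the expression: each element stays an int.
not_rows = [(x) for x in data]

print(type(rows[0]).__name__)      # tuple
print(type(not_rows[0]).__name__)  # int
print(not_rows == data)            # True
```

Only `rows` has the shape that a one-column schema such as ["Numbers"] can be matched against.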

In [0]:
df4 = spark.createDataFrame([x for x in data], ["Numbers"])
display(df4)

---------------------------------------------------------------------------
PySparkTypeError                          Traceback (most recent call last)
File <command-3931534879136477>, line 1
----> 1 df4 = spark.createDataFrame([x for x in data], ["Numbers"])
      2 display(df4)

File /databricks/spark/python/pyspark/instrumentation_utils.py:47, in _wrap_function.<locals>.wrapper(*args, **kwargs)
     45 start = time.perf_counter()
     46 try:
---> 47     res = func(*args, **kwargs)
     48     logger.log_success(
     49         module_name, class_name, function_name, time.perf_c

**❌ What happens with [x for x in data]?**

- **[x for x in data]** produces

      [15, 25, 36, 44, 57, 65, 89, 95, 9]

  - This is a **list of integers, not a list of rows**. Spark doesn't know how to treat an **int as a row**; it expects something like **(15,) or [15]** to match the column definition **["Numbers"]**.
- If you try:

      df = spark.createDataFrame([15, 25, 36], ["Numbers"])

- You'll get a **PySparkTypeError** (a plain **TypeError** in older PySpark versions). The exact wording depends on the version, but it is along the lines of:

      Can not infer schema for type: <class 'int'>

  - Spark tries to interpret **15 as a row** and **cannot infer a schema** from a bare int.
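
One way to guard against this mistake is to normalize the input before handing it to createDataFrame(). The helper below is a hypothetical sketch, not part of PySpark: the name `as_rows` is introduced here, and it assumes that any element that is not already a tuple or list should become a one-column row.

```python
def as_rows(values):
    """Wrap bare scalars in 1-element tuples; leave tuples and lists unchanged."""
    rows = []
    for v in values:
        if isinstance(v, (tuple, list)):
            rows.append(v)        # already row-shaped
        else:
            rows.append((v,))     # bare scalar -> one-column row
    return rows

data = [15, 25, 36, 44, 57, 65, 89, 95, 9]
rows = as_rows(data)
print(rows[:3])  # [(15,), (25,), (36,)]

# In a notebook with an active `spark` session, the result could then be passed on:
# df4 = spark.createDataFrame(rows, ["Numbers"])
```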

**✅ Method 02: Alternate using List**

- This works too: each **[x]** is a **one-element list**, which Spark treats as a **single row** with one column.

In [0]:
df4 = spark.createDataFrame([[x] for x in data], ["Numbers"])
display(df4)

Numbers
15
25
36
44
57
65
89
95
9


**✅ Method 03: Alternative using Row**

In [0]:
from pyspark.sql import Row

df = spark.createDataFrame([Row(Numbers=x) for x in data])
display(df)

Numbers
15
25
36
44
57
65
89
95
9


**Summary:**

| Expression | Works? | Reason |
|------------|--------|--------|
| [(x,) for x in data] | ✅ | Tuple per row (1-element row). Spark accepts this. |
| [[x] for x in data] | ✅ | List per row. Spark accepts this as well. |
| [Row(Numbers=x) for x in data] | ✅ | Row per element. The column name comes from the Row field, so no column list is passed. |
| [x for x in data] | ❌ | Just a list of integers; Spark can't treat plain ints as row data. |

**Method 04: List of tuples (single element)**

In [0]:
# Create a sample DataFrame
data = [(1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,), (10,)]
df = spark.createDataFrame(data, ["id"])
display(df)

id
1
2
3
4
5
6
7
8
9
10
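
Typing out ten 1-element tuples by hand is error-prone. As a plain-Python sketch, the same list can be generated instead; both forms below use only built-ins, and the name `data_zip` is introduced here purely for comparison.

```python
# Comprehension form, the same pattern as Method 01.
data = [(i,) for i in range(1, 11)]

# zip() over a single iterable also yields 1-element tuples.
data_zip = list(zip(range(1, 11)))

print(data == data_zip)  # True
print(data[:3])          # [(1,), (2,), (3,)]

# With an active `spark` session: df = spark.createDataFrame(data, ["id"])
```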
