In PySpark, the DataFrame attribute .dtypes returns a list of (column name, data type) tuples, with each type given as a string. This makes it a quick way to inspect a DataFrame's columns and their types.

In [0]:
data = [
    (1, "Alice", 25, 50000.50, True),
    (2, "Bob",   30, 60000.00, False),
    (3, "Cathy", 28, 70000.75, True),
]

columns = ["id", "name", "age", "salary", "is_active"]

df = spark.createDataFrame(data, columns)

df.display()  # Databricks notebooks only; use df.show() in plain PySpark

id,name,age,salary,is_active
1,Alice,25,50000.5,True
2,Bob,30,60000.0,False
3,Cathy,28,70000.75,True


In [0]:
print(df.dtypes)

[('id', 'bigint'), ('name', 'string'), ('age', 'bigint'), ('salary', 'double'), ('is_active', 'boolean')]
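
Because each entry is a (name, type) pair, the list converts directly into a dictionary for lookups by column name. The following is a minimal sketch, not part of the original notebook: it uses the output above as a literal stand-in for df.dtypes so it runs without a Spark session, and the names `dtypes` and `type_of` are illustrative.

```python
# Stand-in for df.dtypes, copied from the output above (assumption: no Spark session here).
dtypes = [
    ("id", "bigint"),
    ("name", "string"),
    ("age", "bigint"),
    ("salary", "double"),
    ("is_active", "boolean"),
]

# dict() accepts any iterable of 2-tuples, so dict(df.dtypes) works the same way.
type_of = dict(dtypes)

print(type_of["salary"])
# .get() avoids a KeyError when the column does not exist.
print(type_of.get("missing_col", "<no such column>"))
```

In a notebook with a live DataFrame, `dict(df.dtypes)` should give the same mapping.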


In [0]:
col_names = [name for name, dtype in df.dtypes]
print(col_names)

['id', 'name', 'age', 'salary', 'is_active']


In [0]:
col_types = [dtype for name, dtype in df.dtypes]
print(col_types)

['bigint', 'string', 'bigint', 'double', 'boolean']
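
The same pairs can also be grouped the other way round, from type to column names. This is a small sketch under the same assumption as the cells above: the list literal stands in for df.dtypes, and `cols_by_type` is a hypothetical name.

```python
from collections import defaultdict

# Stand-in for df.dtypes (assumption: copied from the notebook output).
dtypes = [
    ("id", "bigint"),
    ("name", "string"),
    ("age", "bigint"),
    ("salary", "double"),
    ("is_active", "boolean"),
]

# Collect column names under each type string, preserving column order.
grouped = defaultdict(list)
for name, dtype in dtypes:
    grouped[dtype].append(name)

cols_by_type = dict(grouped)
print(cols_by_type)
```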


In [0]:
df.printSchema()

root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- age: long (nullable = true)
 |-- salary: double (nullable = true)
 |-- is_active: boolean (nullable = true)



In [0]:
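# Type strings as reported by df.dtypes; decimal columns appear as "decimal(p,s)" and are not matched here.
numeric_types = ("tinyint", "smallint", "int", "bigint", "float", "double")
numeric_cols = [name for name, dtype in df.dtypes if dtype in numeric_types]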
print("Numeric columns:", numeric_cols)

Numeric columns: ['id', 'age', 'salary']
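
The filter above matches exact type strings only, so a decimal column, which df.dtypes reports as a string like decimal(10,2), would be skipped. One possible way to cover it is a prefix check. The sketch below is an illustration rather than the notebook's own method: `NUMERIC_TYPES`, `numeric_columns`, and the `sample` list (with its `price` and `qty` columns) are hypothetical.

```python
# Exact type strings for Spark's integral and floating-point types.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def numeric_columns(dtypes):
    """Return the names whose type string is numeric, including decimal(p,s)."""
    return [
        name
        for name, dtype in dtypes
        if dtype in NUMERIC_TYPES or dtype.startswith("decimal")
    ]


# Hypothetical (name, type) pairs in the shape df.dtypes returns.
sample = [
    ("id", "bigint"),
    ("name", "string"),
    ("price", "decimal(10,2)"),
    ("qty", "smallint"),
]

print(numeric_columns(sample))
```

With a live DataFrame, the call would be `numeric_columns(df.dtypes)`.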


### ✅ Summary

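df.dtypes → list of (column name, type string) tuples, e.g. ('id', 'bigint').

df.printSchema() → prints the schema as a tree, including nullability. It shows long where df.dtypes shows bigint; both refer to the same 64-bit integer type.

List comprehensions over df.dtypes let you extract or filter column names and types programmatically.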