Drop Column example is incorrect #14

jdfrost · 2024-02-24T18:07:00Z

Regarding: src/main/scala/com/sparkbyexamples/spark/dataframe/examples/DropColumn.scala

I am running these examples in Azure PySpark 3.3 and I noticed that df.drop('colname') does NOT drop the column from the df dataframe itself. It only returns a new dataframe without that column, so the column is missing only from the value produced by that one statement.

Try these three lines in pyspark:

df.drop("first_name").printSchema()  # prints the schema without the first_name column, same as in your examples

df.drop("first_name")  # run this without displaying output
df.printSchema()  # prints the schema WITH the first_name column

Conclusion: the df.drop('col') statement does NOT change the df dataframe.
