Drop Column example is incorrect #14

Open
jdfrost opened this issue Feb 24, 2024 · 0 comments

jdfrost commented Feb 24, 2024

Regarding item: src/main/scala/com/sparkbyexamples/spark/dataframe/examples/DropColumn.scala

I am running these examples with PySpark 3.3 on Azure, and I noticed that df.drop('colname') does NOT drop the column from the df DataFrame. It only omits the column from the DataFrame returned by that statement.

Try these three lines in PySpark:

df.drop("first_name").printSchema()  # prints the schema without the first_name column, same as in your examples

df.drop("first_name")  # run this without displaying output
df.printSchema()  # prints the schema WITH the first_name column

Conclusion: the df.drop('col') statement does NOT change the df DataFrame.
