1 parent 12b8867 commit c5af5c9
Introduction to PySpark/II. Manipulating data.py
@@ -0,0 +1,27 @@
+"""
+The pyspark.sql module provides optimized data queries for your Spark session.
+
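+| PySpark DataFrame methods and attributes : manipulation |
+
+ - df.withColumn('new_column', df.old_col + 1) -> returns a new DataFrame with 'new_column' added (or replaced, if that name already exists)
+ - df.colName -> accesses the column as a Column object (an attribute, not a method call)
+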
+| PySpark session methods : manipulation |
+
+ - spark.table('name') -> returns a DataFrame with the contents of the table 'name' registered in the .catalog
+
+"""
+
+### Creating columns
+# Create the DataFrame flights (assumes an active SparkSession named spark)
+flights = spark.table('flights')  # load the catalog table 'flights' as a DataFrame
+
+# Show the head
+flights.show()  # prints the first 20 rows by default
+
+# Add duration_hrs from air_time
+flights = flights.withColumn('duration_hrs', flights.air_time / 60)  # minutes -> hours
+
+
+
+### SQL in a nutshell
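The notes in this commit rely on a `spark` session and a `flights` table that already exist in the course environment. The sketch below is one way to try the same steps locally. It is a minimal illustration, not the course's setup: the helper name `add_duration_hrs`, the `demo` function, and the two sample rows are hypothetical, and a local PySpark installation is assumed. It also shows that `withColumn()` returns a new DataFrame and leaves the original untouched, which is why the notes reassign `flights`.

```python
def add_duration_hrs(df, minutes_col="air_time", hours_col="duration_hrs"):
    """Return a new DataFrame with `hours_col` = `minutes_col` / 60.

    DataFrames are immutable, so withColumn() never changes `df` in place;
    the caller has to keep the returned DataFrame.
    """
    # df["air_time"] and df.air_time both yield the same Column object.
    return df.withColumn(hours_col, df[minutes_col] / 60)


def demo():
    """Register a tiny stand-in for the flights table and add the column."""
    # Imported here so the helper above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("demo").getOrCreate()

    # Hypothetical sample rows; the real 'flights' table has more columns.
    sample = spark.createDataFrame(
        [("AS", 132), ("VX", 360)],
        ["carrier", "air_time"],
    )

    # Register the data as a temporary view so spark.table() can find it.
    sample.createOrReplaceTempView("flights")
    flights = spark.table("flights")

    with_hours = add_duration_hrs(flights)

    print(flights.columns)     # original DataFrame: no duration_hrs column
    print(with_hours.columns)  # new DataFrame: duration_hrs appended
    with_hours.show()

    spark.stop()
```

Calling `demo()` in an environment with PySpark installed prints both column lists and the resulting rows. Keeping the transformation in its own function that takes and returns a DataFrame makes it easy to reuse on the real `flights` table.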