Commit c5af5c9

Create II. Manipulating data.py
1 parent 12b8867 commit c5af5c9

1 file changed

+27
-0
lines changed

@@ -0,0 +1,27 @@
"""
The pyspark.sql module provides the DataFrame API and optimized SQL-style queries for your Spark session.

| PySpark attributes : manipulation |

- .withColumn() -> return a new DataFrame with an added (or replaced) column . <takes> ('new_column', old_col + 1), where the second argument is a Column expression
- df.colName -> access a column as a Column object (attribute access, no parentheses)

| PySpark methods : manipulation |

- spark.table() -> return a DataFrame with the contents of a table registered in the .catalog
"""
#|
#|
### Creating columns
# Create the DataFrame flights
flights = spark.table('flights')  # load the catalog table 'flights' as a DataFrame

# Show the head
flights.show()  # prints the first 20 rows by default

# Add duration_hrs from air_time
flights = flights.withColumn('duration_hrs', flights.air_time / 60)  # minutes -> hours
#|
#|
"""
\ SQL in a nutshell /
"""
