
Commit fe44b50

Update III. Getting started with machine learning pipelines.py
1 parent 09cce93 commit fe44b50


1 file changed (+9, -0 lines)


Introduction to PySpark/III. Getting started with machine learning pipelines.py

Lines changed: 9 additions & 0 deletions
@@ -79,6 +79,10 @@
 2 > encode w/ 'OneHotEncoder'.
 carr_encoder = OneHotEncoder(inputCol='carrier_index',outputCol='carrier_fact')
 - 'Pipeline' will take care of the rest.
+-----------------------
+> 'VectorAssembler' -> combine all of the feature columns into a single vector column
+inputCols= ['column_name1','c2','c3']
+outputCol= 'features'
 """
 #|
 #|
@@ -104,3 +108,8 @@
 #|
 #|
 ### Assemble a vector
+# Make a VectorAssembler
+vec_assembler = VectorAssembler(inputCols=["month", "air_time", "carrier_fact", "dest_fact", "plane_age"], outputCol='features')
+#|
+#|
+###
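The notes in this commit describe a chain: an indexed carrier column (`carrier_index`, presumably produced by an earlier StringIndexer step) is one-hot encoded into the vector `carrier_fact`, and VectorAssembler then concatenates it with the numeric columns into `features`. The sketch below mimics that chain in plain Python (no Spark involved) for a single row, so the shape of `features` is easy to see. It assumes Spark's defaults: StringIndexer gives index 0 to the most frequent label, and OneHotEncoder uses `dropLast=True`, so k distinct labels become a vector of length k-1. The helper functions, the toy carrier codes, and the row values are hypothetical, and Spark itself produces `Vector` objects (often sparse) rather than Python lists.

```python
from collections import Counter


def string_index(values):
    """Map each label to an index, most frequent first (like StringIndexer's
    default frequencyDesc ordering). Ties are broken alphabetically in this sketch."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Dense one-hot vector. With drop_last=True (Spark's default) the last
    category is encoded as all zeros, so the vector has num_categories - 1 slots."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


def assemble(*parts):
    """Concatenate scalars and vectors, in order, into one flat feature list
    (the idea behind VectorAssembler)."""
    features = []
    for part in parts:
        if isinstance(part, list):
            features.extend(part)
        else:
            features.append(float(part))
    return features


# Hypothetical toy data: carrier codes for six flights.
carriers = ["AS", "AS", "VX", "UA", "AS", "VX"]
carrier_index = string_index(carriers)  # AS -> 0, VX -> 1, UA -> 2
carrier_fact = one_hot(carrier_index["VX"], len(carrier_index))

# One hypothetical row, in the same column order as vec_assembler's inputCols
# (dest_fact omitted for brevity): month, air_time, carrier_fact, plane_age.
features = assemble(11, 132, carrier_fact, 6)
print(features)  # [11.0, 132.0, 0.0, 1.0, 6.0]
```

Adding `dest_fact` would simply append another one-hot block between `carrier_fact` and `plane_age`, matching the `inputCols` order passed to `vec_assembler`; this is why the order of `inputCols` determines the layout of the `features` vector.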
