Commit 0130f76

Update III. Getting started with machine learning pipelines.py
1 parent d394bd0 commit 0130f76

1 file changed: +12 -2 lines changed

Introduction to PySpark/III. Getting started with machine learning pipelines.py

Lines changed: 12 additions & 2 deletions
@@ -74,8 +74,10 @@
 > pyspark.ml.feature submodule
 'one-hot vectors' -> all elements are zero except for at most one element, which has a value of one (1).
 
-- create a 'StringIndexer'
-- encode w/ 'OneHotEncoder'
+1 > create a 'StringIndexer'.
+carr_indexer = StringIndexer(inputCol='carrier',outputCol='carrier_index')
+2 > encode w/ 'OneHotEncoder'.
+carr_encoder = OneHotEncoder(inputCol='carrier_index',outputCol='carrier_fact')
 - 'Pipeline' will take care of the rest.
 """
 #|
@@ -86,3 +88,11 @@
 #|
 #|
 ### Carrier
+# Create a StringIndexer
+carr_indexer = StringIndexer(inputCol='carrier',outputCol='carrier_index')
+
+# Create a OneHotEncoder
+carr_encoder = OneHotEncoder(inputCol='carrier_index',outputCol='carrier_fact')
+#|
+#|
+### Destination
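
The two steps named in the docstring (index the strings, then one-hot encode the indices) can be illustrated without Spark. The following is a minimal plain-Python sketch of the idea, not the PySpark implementation: the helper names `fit_string_index` and `one_hot` and the sample carrier codes are hypothetical. It assumes the indexer orders labels by descending frequency and that the encoder drops the last category, which is why a vector can contain "at most one" 1 rather than exactly one.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct label to an integer index, most frequent first.

    Ties are broken alphabetically. The ordering is an assumption of this
    sketch, chosen to imitate a frequency-based string indexer.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Return a one-hot list for `index`.

    With drop_last=True the vector has num_categories - 1 slots, so the
    last category is encoded as all zeros (at most one element is 1).
    """
    size = num_categories - 1 if drop_last else num_categories
    vector = [0] * size
    if index < size:
        vector[index] = 1
    return vector


# Hypothetical carrier codes, invented for illustration only.
carriers = ["AA", "UA", "AA", "DL", "UA", "AA"]

# Step 1: string -> index (the role of 'carrier_index' in the diff).
carrier_to_index = fit_string_index(carriers)
carrier_index = [carrier_to_index[c] for c in carriers]

# Step 2: index -> one-hot vector (the role of 'carrier_fact' in the diff).
carrier_fact = [one_hot(i, len(carrier_to_index)) for i in carrier_index]
```

In the committed code the same two roles are played by `carr_indexer` and `carr_encoder`, and a `Pipeline` would run them in order so the intermediate `carrier_index` column never has to be built by hand.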

0 commit comments
