Setting type to category #82
Hi, in many cases I need to set the data type of object columns to "category", like so:

```
df[<object_cols>] = df[<object_cols>].astype("category")
```

How would I go about doing this in pdpipe? I've tried different approaches using `ColumnDtypeEnforcer`, but to no avail...
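For reference, here is a minimal plain-pandas sketch of what the question asks for, outside pdpipe. The object columns are looked up at call time with `DataFrame.select_dtypes` and cast through a `{column: dtype}` mapping passed to `DataFrame.astype`. The helper name `objects_to_category` and the sample frame are illustrative only.

```python
import pandas as pd


def objects_to_category(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame with every object-dtype column cast to 'category'."""
    # Find the object columns at call time instead of hard-coding their labels.
    obj_cols = df.select_dtypes(include="object").columns
    # DataFrame.astype accepts a {column: dtype} mapping; other columns are untouched.
    return df.astype({col: "category" for col in obj_cols})


# Sample frame; dtype=object is set explicitly on the string columns.
df = pd.DataFrame(
    {
        "a": pd.Series(["x", "y", "x"], dtype=object),
        "b": pd.Series(["u", "u", "v"], dtype=object),
        "n": [1, 2, 3],
    }
)
post_df = objects_to_category(df)
print(post_df.dtypes)
```

For a fixed list of labels, the same mapping can be written out directly, e.g. `df.astype({"a": "category", "b": "category"})`.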
OK. `ColumnDtypeEnforcer` should actually support this. For a static column list, this works fine (at least for me), assuming `'a'` and `'b'` are the labels of your object columns:

```python
import pdpipe as pdp

dt_enforcer = pdp.ColumnDtypeEnforcer({'a': 'category', 'b': 'category'})
post_df = dt_enforcer(df)
```

If you want the pipeline stage to convert all object columns to "category", with the columns determined at runtime, do this:

```python
dt_enforcer = pdp.ColumnDtypeEnforcer({pdp.cq.OfDtypes('object'): 'category'})
post_df = dt_enforcer(df)
```

This also works fine for me locally. Note that if you use the same pipeline object several times and you do not want this column list to follow fit/transform logic (that is, to be determined once at fit time and then reused on transform), use:

```python
dt_enforcer = pdp.ColumnDtypeEnforcer({pdp.cq.OfDtypes('object', fittable=False): 'category'})
post_df = dt_enforcer(df)
```

That's it. Let me know if this does not answer your need.
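To illustrate the difference the `fittable` flag makes, below is a minimal pure-pandas sketch. The class `ObjectToCategory` and its `fit_transform`/`transform` methods are hypothetical: they only mimic the behaviour described above (column list fixed at fit time vs. recomputed on every call) and are not pdpipe's implementation.

```python
import pandas as pd


class ObjectToCategory:
    """Hypothetical stage (not pdpipe code) casting object columns to 'category'.

    fittable=True:  the object-column list is computed in fit_transform and
                    reused by later transform calls.
    fittable=False: the list is recomputed from whatever frame is passed in.
    """

    def __init__(self, fittable: bool = True):
        self.fittable = fittable
        self.columns_ = None  # object columns seen at fit time

    @staticmethod
    def _object_columns(df: pd.DataFrame) -> list:
        return list(df.select_dtypes(include="object").columns)

    @staticmethod
    def _cast(df: pd.DataFrame, cols: list) -> pd.DataFrame:
        return df.astype({c: "category" for c in cols})

    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        self.columns_ = self._object_columns(df)
        return self._cast(df, self.columns_)

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        if not self.fittable:
            # Re-evaluate the column list on every call.
            return self._cast(df, self._object_columns(df))
        if self.columns_ is None:
            raise RuntimeError("call fit_transform before transform")
        # Reuse the fit-time list, skipping labels missing from this frame.
        return self._cast(df, [c for c in self.columns_ if c in df.columns])


train_df = pd.DataFrame({"a": ["x", "y"], "b": ["u", "v"]}, dtype=object)
new_df = pd.DataFrame({"a": ["x", "z"], "b": ["u", "u"], "c": ["p", "q"]}, dtype=object)

fitted = ObjectToCategory(fittable=True)
fitted.fit_transform(train_df)
out_fitted = fitted.transform(new_df)  # 'c' was not seen at fit time, stays object

live = ObjectToCategory(fittable=False)
live.fit_transform(train_df)
out_live = live.transform(new_df)  # 'c' is cast as well
```

With `fittable=True`, a column that only appears in later data (`c` here) is left alone, which keeps the output schema stable across calls; with `fittable=False`, every call picks up whatever object columns are present.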