As a user, I often encounter data with common formatting issues whose fixes are relatively complex. For example, removing leading or trailing whitespace usually requires knowing a specific regular expression. We can dramatically improve the user experience for these common tasks with a Data Cleansing component that presents each fix as an easy-to-understand description of the operation and allows multiple operations to be executed in sequence against an array of columns.
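To illustrate the kind of knowledge a user currently needs, here is a minimal sketch of trimming leading and trailing whitespace with a regular expression. It is written in Python purely for illustration and is not part of the proposed API; the names `EDGE_WHITESPACE` and `trim_edges` are hypothetical.

```python
import re

# Matches a run of whitespace at the very start or the very end of a string.
EDGE_WHITESPACE = re.compile(r"^\s+|\s+$")


def trim_edges(value):
    """Remove leading and trailing whitespace, leaving inner spacing intact."""
    return EDGE_WHITESPACE.sub("", value)
```

A cleansing component would hide this pattern behind a named operation, so the user never has to write it.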
A rough outline of the proposed API is as follows:
data_cleansing Vector (Text) Vector (Integer | Text | Regex) -> Table
data_cleansing self (operations=Data_Cleansing) (columns=self.column_names) =
This should support the following operations:
These operations should execute in a consistent order, regardless of the order in which the user selects them. For example, choosing {Punctuation, Special_Characters, Tabs} should execute in the same sequence as choosing {Tabs, Special_Characters, Punctuation}.
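One way to get a selection-independent order is to give every operation a fixed rank and sort the selection by it. The sketch below is in Python for illustration only. The `Operation` enum, the `execution_order` helper, and the ranks themselves are assumptions: the issue requires a consistent order but does not say which one.

```python
from enum import IntEnum


class Operation(IntEnum):
    # Hypothetical ranks: lower values run first. The actual order is not
    # specified in the issue; only its consistency is required.
    TABS = 1
    SPECIAL_CHARACTERS = 2
    PUNCTUATION = 3


def execution_order(selected):
    """Return the selected operations, de-duplicated, in their fixed rank order."""
    return sorted(set(selected))
```

With this approach, the order in which the user selects the operations never reaches the execution step, so both selections in the example above produce the same sequence.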
Operations should apply only to the applicable columns within the selection: Replace_Empty_With_Zero will apply to integer, decimal, and float columns, but not to a selected String column. Skipping an inapplicable column should not produce a warning.
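The type filtering described above can be sketched as follows, again in Python for illustration only. The table is modeled as a plain dictionary of column name to values, column types are passed as a separate dictionary of type names, and empty values are represented as `None`. All of these, along with the names `NUMERIC_TYPES` and `replace_empty_with_zero`, are assumptions rather than the proposed implementation.

```python
# Hypothetical type names for the columns the operation applies to.
NUMERIC_TYPES = {"integer", "decimal", "float"}


def replace_empty_with_zero(table, types, selected):
    """Return a copy of `table` with None replaced by 0 in selected numeric columns."""
    result = {}
    for name, values in table.items():
        if name in selected and types[name] in NUMERIC_TYPES:
            result[name] = [0 if v is None else v for v in values]
        else:
            # Unselected or non-applicable columns pass through unchanged,
            # silently: no warning is raised.
            result[name] = list(values)
    return result


table = {"qty": [1, None], "name": ["a", None]}
types = {"qty": "integer", "name": "text"}
cleaned = replace_empty_with_zero(table, types, ["qty", "name"])
```

Here `cleaned["qty"]` has its empty value replaced by zero, while the selected text column `name` is left untouched.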