Closed as not planned
Labels
Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I wish I could use pandas to anonymize a Series or DataFrame.
Feature Description
Add a new method to Series, named anonymize or hide, that hashes the input and overwrites the column by default:
df.column_one.anonymize()
Further parameters would let you specify the hashing algorithm and whether the original data should be kept:
DataFrame.anonymize(self, inplace=False, hash='md5', override=True)
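To make the proposed call style concrete without changing pandas itself, here is a minimal sketch built on the existing `pandas.api.extensions.register_series_accessor` extension point. The accessor name `anon`, the class name, and the method name `hash` are assumptions made up for this illustration, not part of pandas or of this proposal. Values are converted with `str()` before hashing, so missing values would be hashed as the string "nan".

```python
import hashlib

import pandas as pd


# "anon" is a hypothetical accessor name chosen only for this sketch.
@pd.api.extensions.register_series_accessor("anon")
class AnonymizeAccessor:
    def __init__(self, series):
        self._series = series

    def hash(self, algorithm="md5"):
        """Return a new Series of hex digests; the original Series is not modified."""

        def hide(value):
            # hashlib.new looks the algorithm up by name, e.g. "md5" or "sha256".
            return hashlib.new(algorithm, str(value).encode("utf-8")).hexdigest()

        return self._series.map(hide)


df = pd.DataFrame({"column_one": ["alice", "bob"]})
hashed = df.column_one.anon.hash(algorithm="sha256")
```

Note that a plain hash of low-entropy values (names, phone numbers) can often be reversed by brute force, so this is closer to pseudonymization than to anonymization.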
I sketched (only sketched! :)) what a solution could look like:
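import hashlib

def anonymize(self, column, replace=False, hash_algorithm=hashlib.md5):
    # Hash a single string value and return its hex digest.
    def hide(value: str):
        return hash_algorithm(value.encode('utf-8')).hexdigest()
    if replace:
        # Overwrite the original column with its hashes.
        self[column] = self[column].apply(hide)
    else:
        # Keep the original; the '_anonymized' suffix is an assumed name.
        self[f'{column}_anonymized'] = self[column].apply(hide)
    return self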
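The sketch modifies the DataFrame it is given, while the proposed signature defaults to `inplace=False`. A non-mutating variant could look roughly like the following standalone function. The function name `anonymize_column`, the `suffix` parameter, and its default are hypothetical names introduced here for illustration; `override` mirrors the parameter from the proposed signature.

```python
import hashlib

import pandas as pd


def anonymize_column(df, column, override=True, algorithm="md5", suffix="_hashed"):
    """Return a copy of df with `column` hashed; the input df is left untouched.

    With override=True the hashes replace the column; otherwise they are
    stored in a new column named column + suffix (an assumed convention).
    """

    def hide(value):
        return hashlib.new(algorithm, str(value).encode("utf-8")).hexdigest()

    hashed = df[column].map(hide)
    target = column if override else f"{column}{suffix}"
    # DataFrame.assign returns a new object instead of modifying df.
    return df.assign(**{target: hashed})


df = pd.DataFrame({"name": ["alice", "bob"], "age": [30, 25]})
replaced = anonymize_column(df, "name")
kept = anonymize_column(df, "name", override=False)
print(list(kept.columns))  # ['name', 'age', 'name_hashed']
```

Returning a copy keeps the function usable in method chains via `DataFrame.pipe`, at the cost of extra memory for large frames.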
Alternative Solutions
The package anonymizedf does a similar job but is not integrated into pandas.
Additional Context
No response