ENH:  #47977

@michaeldorner

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I could use pandas to anonymize a Series or DataFrame.

Feature Description

Add a new method to Series, anonymize (or hide), that hashes the values and overwrites the column by default:

df.column_one.anonymize()

Further parameters specify the hashing algorithm and whether the original data should be kept:

DataFrame.anonymize(self, inplace=False, hash='md5', override=True)
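The hash='md5' parameter in the signature above passes the algorithm as a string. A minimal sketch of how such a name could be resolved with the standard library's hashlib.new; the helper name hash_value is hypothetical and not part of pandas:

```python
import hashlib


def hash_value(value, algorithm="md5"):
    """Return the hex digest of str(value) under the named algorithm.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # Reject names that this Python build's hashlib does not know.
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unknown hash algorithm: {algorithm!r}")
    hasher = hashlib.new(algorithm)
    hasher.update(str(value).encode("utf-8"))
    return hasher.hexdigest()


print(hash_value("alice"))
print(hash_value("alice", algorithm="sha256"))
```

Variable-length digests such as shake_128 need a length argument for hexdigest and are not handled by this sketch.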

I sketched (only sketched! :)) what a solution could look like:

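import hashlib

def anonymize(self, column, replace=False, hash_algorithm=hashlib.md5):
    def hide(value: str):
        # Hash the UTF-8 bytes of the string and return the hex digest.
        return hash_algorithm(value.encode('utf-8')).hexdigest()

    if replace:
        # Overwrite the original column with its hashed values.
        self[column] = self[column].apply(hide)
    else:
        # Keep the original column; the new column name is an assumption.
        self[f'{column}_anonymized'] = self[column].apply(hide)
    return self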
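A usage sketch of the idea above as a standalone function rather than a pandas method (pandas has no anonymize method; the function name anonymize_column and the suffix parameter are assumptions made for illustration):

```python
import hashlib

import pandas as pd


def anonymize_column(df, column, replace=False,
                     hash_algorithm=hashlib.md5, suffix="_anonymized"):
    """Return a copy of df with `column` hashed.

    Hypothetical helper: with replace=True the column is overwritten,
    otherwise the hashes go into a new column named column + suffix.
    """
    # Convert every value to str first so non-string columns can be hashed too.
    hashed = df[column].astype(str).map(
        lambda value: hash_algorithm(value.encode("utf-8")).hexdigest()
    )
    result = df.copy()  # leave the caller's DataFrame untouched
    target = column if replace else f"{column}{suffix}"
    result[target] = hashed
    return result


df = pd.DataFrame({"name": ["alice", "bob", "alice"], "score": [1, 2, 3]})
print(anonymize_column(df, "name"))
print(anonymize_column(df, "name", replace=True, hash_algorithm=hashlib.sha256))
```

Equal inputs produce equal digests, so the hashed column can still be used for grouping or joining.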

Alternative Solutions

The package anonymizedf does a similar job but is not integrated into pandas.

Additional Context

No response
