
Why make feature names unique instead of aggregating them? #3241

@VladimirShitov

Description


What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

Hi scanpy team! I have a rather conceptual question. Since the beginning of the single-cell analysis era, one of the standard preprocessing steps has been making feature names unique (e.g. with adata.var_names_make_unique()) by appending suffixes to duplicated names. This is recommended in the scanpy tutorial and in the best practices book. It is clear why identical feature names make subsequent data processing challenging, but why do we handle them this way? Wouldn't it make more sense to aggregate features with identical names by summing their counts? From a biological point of view, the same gene name means the same feature, so why split it into several features and corrupt their names?
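To make the two options concrete, here is a minimal sketch on a toy cells x genes count matrix, using plain numpy and pandas rather than anndata. `suffix_duplicates` is a hypothetical helper that approximates the suffix scheme of `var_names_make_unique()` (the first occurrence keeps its name, later ones become `name-1`, `name-2`, and so on); unlike the real function, it ignores collisions with names that already carry such a suffix. `sum_duplicates` is a hypothetical helper for the aggregation alternative.

```python
import numpy as np
import pandas as pd

# Toy count matrix: 2 cells x 3 features, where "GeneA" appears twice
counts = np.array([
    [1, 2, 0],
    [3, 0, 4],
])
gene_names = ["GeneA", "GeneA", "GeneB"]


def suffix_duplicates(names, sep="-"):
    """Hypothetical helper approximating var_names_make_unique():
    keep the first occurrence, append -1, -2, ... to later duplicates.
    Does not handle names that already end in such a suffix."""
    seen = {}
    out = []
    for name in names:
        if name in seen:
            seen[name] += 1
            out.append(f"{name}{sep}{seen[name]}")
        else:
            seen[name] = 0
            out.append(name)
    return out


def sum_duplicates(counts, names):
    """Hypothetical helper: merge columns sharing a gene name by summing
    their counts, keeping genes in order of first appearance."""
    df = pd.DataFrame(counts, columns=names)
    # Transpose so genes are rows, group identical labels, sum, transpose back
    return df.T.groupby(level=0, sort=False).sum().T


unique_names = suffix_duplicates(gene_names)
aggregated = sum_duplicates(counts, gene_names)

print(unique_names)           # ['GeneA', 'GeneA-1', 'GeneB']
print(aggregated.to_numpy())  # GeneA column now holds 1+2 and 3+0
```

With suffixing, the GeneA counts stay split across two columns under different names; with summing, they end up in a single GeneA column, which assumes that both entries really measure the same gene.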
