Feature suggestion: flexible hierarchical data (json) importer (will implement if interest exists) #12286

@tkluck

Hi there,

I recently wrote a very flexible module for flattening hierarchical (json) data into CSV: https://github.com/tkluck/Text-CSV-Flatten.

A pattern string encodes the exact semantics of the flattening. For example, the pattern .<index>.* flattens in the same way as orient="records" in read_json, and the pattern .*.<index> flattens in the same way as orient="columns".
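To make the two orientations concrete, here is a minimal standard-library sketch of the two JSON layouts. It does not use the pattern syntax or pandas; the helper names `rows_from_records` and `rows_from_columns` are made up for this illustration. It only shows that the two layouts describe the same table, differing in whether the outer level is the row index or the column name.

```python
import json

# The same small table in the two JSON layouts that pandas calls
# orient="records" and orient="columns".
records_json = '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'
columns_json = '{"a": {"0": 1, "1": 3}, "b": {"0": 2, "1": 4}}'


def rows_from_records(data):
    """Outer level is the row index, inner level is the column name."""
    return {str(i): dict(row) for i, row in enumerate(data)}


def rows_from_columns(data):
    """Outer level is the column name, inner level is the row index."""
    rows = {}
    for column, cells in data.items():
        for index, value in cells.items():
            rows.setdefault(index, {})[column] = value
    return rows


by_records = rows_from_records(json.loads(records_json))
by_columns = rows_from_columns(json.loads(columns_json))
```

Both calls produce the same mapping from row index to a dict of column values, which is the sense in which the two patterns differ only in the order of their components.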

The module is quite new and has already been very useful for me: I maintain an internal reporting tool for my employer, in which we expose hierarchical data. This module allows my users to download it as CSV with a minimum of effort on my side (no boilerplate) and on theirs.

I realized that the same thing might be very useful for Pandas. Not only does the pattern flatten the hierarchical data into rows and columns, it also encodes which columns are supposed to be an index. In CSV, this information is lost in the output format, but in a DataFrame it remains meaningful.
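As a rough sketch of how a pattern could carry index information, the toy interpreter below walks nested data according to a pattern and returns both the index names and the flattened rows. This is not the Text-CSV-Flatten implementation: the function name `flatten`, the restriction to `<name>` and `*` components, and the sample data are all assumptions made for illustration, and the real module's pattern language may differ.

```python
def flatten(data, pattern):
    """Toy pattern interpreter (hypothetical, for illustration only).

    "<name>" means: the key at this level is a value of index column `name`.
    "*"      means: the key at this level is a column name.
    """
    parts = [p for p in pattern.split(".") if p]
    index_names = [p[1:-1] for p in parts if p.startswith("<") and p.endswith(">")]
    rows = {}  # maps a tuple of index values to {column name: value}

    def children(node):
        # Treat lists like dicts keyed by position.
        return enumerate(node) if isinstance(node, list) else node.items()

    def walk(node, remaining, index, column):
        if not remaining:
            rows.setdefault(index, {})[column] = node
            return
        head, rest = remaining[0], remaining[1:]
        for key, child in children(node):
            if head.startswith("<") and head.endswith(">"):
                walk(child, rest, index + (key,), column)
            elif head == "*":
                walk(child, rest, index, key)
            else:
                raise ValueError(f"unsupported pattern component: {head!r}")

    walk(data, parts, (), "value")
    flat = [{**dict(zip(index_names, idx)), **cols} for idx, cols in rows.items()]
    return index_names, flat


# Made-up sample data with two levels that act as an index.
sales = {
    "2015": {"EU": {"sales": 10, "cost": 7}, "US": {"sales": 20, "cost": 12}},
    "2016": {"EU": {"sales": 11, "cost": 8}},
}
index_names, flat_rows = flatten(sales, ".<year>.<region>.*")
```

Because the index names come back alongside the rows, a caller using pandas could build the frame with `pd.DataFrame(flat_rows).set_index(index_names)`, whereas a CSV writer would have to treat `year` and `region` as ordinary columns.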

This module fulfils a very similar function to what io.json.read_json, io.json.json_normalize and io.json.nested_to_record do. In fact, I think it can reproduce any of their semantics by just tweaking the pattern string.
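For readers unfamiliar with those helpers, the following standard-library sketch approximates the kind of flattening nested_to_record performs on nested dicts, joining nested keys with a separator. The function name `flatten_dotted` is hypothetical and this is only an approximation of the pandas behaviour, not its implementation.

```python
def flatten_dotted(record, prefix="", sep="."):
    """Collapse nested dicts into one flat dict with joined key names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the joined key as prefix.
            flat.update(flatten_dotted(value, name, sep))
        else:
            flat[name] = value
    return flat


flat = flatten_dotted({"id": 1, "name": {"first": "Ada", "last": "Lovelace"}})
```

Here the nested `name` dict becomes the two flat columns `name.first` and `name.last`, which is one fixed flattening rule; the proposal is that a pattern string could express this rule and others.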

I'd be happy to spend some time adding this to Pandas. If I do, would you be interested in merging it into your next version?
