Description
Hi there,
I recently wrote a very flexible module for flattening hierarchical (json) data into CSV: https://github.com/tkluck/Text-CSV-Flatten.
It encodes the exact semantics of the flattening in a pattern string. For example, the pattern `.<index>.*` flattens in the same way as `orient="records"`. The pattern `.*.<index>` flattens in the same way as `orient="columns"`.
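To make the two patterns concrete, here is a minimal pure-Python sketch of how I read them: the position of `<index>` picks the nesting level whose keys become the row index, and `*` picks the level whose keys become column names. The helper `flatten_two_level` is hypothetical and written only for illustration; it handles just these two patterns on two-level data and is not how Text-CSV-Flatten is implemented.

```python
def _items(node):
    """Yield (key, value) pairs for a dict, or (position, value) pairs for a list."""
    if isinstance(node, dict):
        return node.items()
    return enumerate(node)


def flatten_two_level(data, pattern):
    """Hypothetical helper: flatten two-level nested data into {index: {column: value}}.

    Only the two patterns quoted above are supported (an assumption made
    for this sketch, not a feature list of the real module).
    """
    if pattern == ".<index>.*":
        index_first = True   # outer keys are the index, inner keys are columns
    elif pattern == ".*.<index>":
        index_first = False  # outer keys are columns, inner keys are the index
    else:
        raise ValueError(f"unsupported pattern in this sketch: {pattern!r}")

    rows = {}
    for outer_key, inner in _items(data):
        for inner_key, value in _items(inner):
            if index_first:
                index, column = outer_key, inner_key
            else:
                index, column = inner_key, outer_key
            rows.setdefault(index, {})[column] = value
    return rows


# Same table, two layouts.
records_layout = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
columns_layout = {"a": {"0": 1, "1": 3}, "b": {"0": 2, "1": 4}}

by_records = flatten_two_level(records_layout, ".<index>.*")
by_columns = flatten_two_level(columns_layout, ".*.<index>")
```

Both calls produce one row per index value with columns `a` and `b`; the only difference is that the list positions give integer index values while the dict keys give string ones. A mapping of this shape could then be handed to something like `pandas.DataFrame.from_dict(rows, orient="index")`, which is where keeping the index information would pay off.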
The module is quite new and has already been very useful for me: I maintain an internal reporting tool for my employer, in which we expose hierarchical data. This module allows my users to download it as CSV with a minimum of effort on my side (no boilerplate) and on theirs.
I realized that the same thing might be very useful for Pandas. Not only does the pattern flatten the hierarchical data into rows and columns, it also encodes which columns are supposed to be an index. In CSV, this information is lost in the output format, but in a DataFrame it remains meaningful.
This module fulfils a very similar function to what `io.json.read_json`, `io.json.json_normalize` and `io.json.nested_to_record` do. In fact, I think it can reproduce any of their semantics by just tweaking the pattern string.
I would be happy to spend some time on adding this to Pandas. If I do that, would you be interested in merging it into your next version?