ENH: new orient setting for read_json to support common API format #39913

ryancasburn-KAI · 2021-02-19T17:28:42Z

Is your feature request related to a problem?

I see many APIs return results in the form:

{
  "results": [
    {"a": 1, "b": 2},
    {"a": 3, "b": 4}
  ]
}

This format isn't directly supported by pandas. The data is in the "records" orient, but it is wrapped in an extra outer layer. Currently, to load this data I use the requests module to fetch it over https, then the json module to strip out the outer layer, then feed the result to pd.read_json as text. This feels like overkill: pandas can read from https on its own, but for this format (which is common for APIs) I need multiple other packages and extra lines of code.

The proposed change would turn a three-import, multi-line workaround into a single-import, single-line solution.

Describe the solution you'd like

While I initially described this as a new orient, I don't think that is the best way to implement it. I believe the read_json function should have a new parameter (such as "strip_layer") whose value is the key of that outer layer. In the example above that would be "results". I suggest a separate parameter because what sits inside the outer layer could be in any of several orients, so orient needs to stay available for it. The stripping would happen first, and the remaining data would then be processed as usual.
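A minimal sketch of the behavior described above, written as standalone helpers rather than as a change to pandas. The names `strip_layer` and `read_json_strip_layer` are hypothetical; pandas has no such parameter or function. The pandas import is deferred so that the unwrapping step itself depends only on the standard library.

```python
import io
import json


def strip_layer(json_text, key):
    """Return the JSON text stored under `key` in the top-level object."""
    payload = json.loads(json_text)
    if not isinstance(payload, dict) or key not in payload:
        raise KeyError(f"top-level key {key!r} not found")
    # Re-serialize only the inner payload so any orient can still be applied to it.
    return json.dumps(payload[key])


def read_json_strip_layer(json_text, key, **kwargs):
    """Hypothetical wrapper: unwrap first, then let pandas parse the inner payload."""
    import pandas as pd  # third-party; imported lazily

    # Extra keyword arguments (for example orient="records") go straight to read_json.
    return pd.read_json(io.StringIO(strip_layer(json_text, key)), **kwargs)


SAMPLE = '{"results": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}'
inner = strip_layer(SAMPLE, "results")
```

Under these assumptions, `read_json_strip_layer(SAMPLE, "results")` would hand pandas the same records-oriented text that the manual workaround produces.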

API breaking implications

Need to consider what this means for chunking.
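One way to see the chunking concern: with a non-streaming parser such as the standard json module, the inner list only becomes reachable after the whole wrapper object has been parsed, so chunks can only be cut afterwards. A rough sketch under that assumption, also assuming the value under the key is a list of records; `iter_record_chunks` is a hypothetical helper, not a pandas function.

```python
import json
from itertools import islice


def iter_record_chunks(json_text, key, chunksize):
    """Yield lists of at most `chunksize` records taken from payload[key]."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    # The full document is parsed here; nothing is streamed from the source.
    records = iter(json.loads(json_text)[key])
    while True:
        chunk = list(islice(records, chunksize))
        if not chunk:
            return
        yield chunk


SAMPLE = '{"results": [{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}]}'
chunks = list(iter_record_chunks(SAMPLE, "results", chunksize=2))
```

Each chunk could be passed to pd.DataFrame. For comparison, the existing chunksize argument of read_json can only be used together with lines=True, which the wrapped format does not satisfy.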

Additional context

My current code:

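import pandas as pd
import requests
import json

# url is the API endpoint (not shown here)
data = requests.get(url).json()  # download and parse the full response
data = data["results"]           # strip the outer layer
data = json.dumps(data)          # serialize the inner records back to text
# Newer pandas versions expect a file-like object here, e.g. io.StringIO(data)
data = pd.read_json(data)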

versus my desired code with this improvement:

import pandas as pd

data = pd.read_json(url, strip_layer="results")

Might I suggest this gets added to the IO Method Robustness/Input Types Project?


attack68 · 2021-02-19T18:17:47Z

You don't have to dump and re-read. What about:

data = requests.get(url).json()
data = pd.DataFrame(data["results"])

But I see your point.

ryancasburn-KAI · 2021-02-19T18:35:07Z

True, I wasn't the most efficient in my example. Still need another package either way.
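On the extra-package point: the download and unwrap steps can also be done with the standard library alone, which leaves pandas as the only third-party import. A sketch, assuming the endpoint returns UTF-8 JSON with the records under a known key; `records_from_body` and `fetch_records` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def records_from_body(body, key):
    """Decode a JSON response body (bytes) and return the value under `key`."""
    return json.loads(body.decode("utf-8"))[key]


def fetch_records(url, key, timeout=10):
    """Download `url` with urllib and return the unwrapped records."""
    with urlopen(url, timeout=timeout) as response:
        return records_from_body(response.read(), key)


# Offline check of the unwrapping step with the payload from the first post.
body = b'{"results": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}'
records = records_from_body(body, "results")
```

The list returned by `fetch_records(url, "results")` could then be passed to pd.DataFrame, as in the two-line version above.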

ryancasburn-KAI added the Enhancement and Needs Triage labels on Feb 19, 2021

attack68 added the IO JSON and Styler labels and removed the Needs Triage label on Feb 19, 2021

attack68 removed the Styler label on Jul 11, 2021
