BUG: Handle non-dict items in json_normalize with max_level #62848
Conversation
pre-commit.ci autofix
Thanks for the PR!
pandas/io/json/_normalize.py
new_ds.append({})
continue
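For context, a minimal self-contained sketch of the handling those two quoted lines imply; the surrounding function and names here are illustrative, not the actual pandas source:

from typing import Any

def normalize_records(data: list[Any]) -> list[dict]:
    """Sketch of the originally proposed handling: non-dict items are
    replaced with an empty record and skipped instead of raising
    (illustrative only, not pandas/io/json/_normalize.py)."""
    new_ds: list[dict] = []
    for item in data:
        if not isinstance(item, dict):
            # silently substitute an empty record for non-dict items
            new_ds.append({})
            continue
        new_ds.append(item)
    return new_ds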
The json_normalize function is type hinted as "dict or list of dicts". It seems to me that if this is not adhered to, the method should raise instead of silently ignoring entries.
Thanks for the review @rhshadrach, I see it's violating the "dict or list of dicts" requirement.
So ideally it should raise a TypeError regardless of whether max_level is set or not. In that case, I'm thinking of adding a validation check at the top of the json_normalize function.
I've updated the PR based on your feedback.
The function now raises a TypeError if data is a list containing any non-dict items, enforced before either the max_level=None or max_level=0 path is taken. I have also updated the respective tests and the what's new entry.
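A minimal sketch of what such an up-front check could look like; the helper name and error message are illustrative, not the exact code added in this PR:

def validate_records(data) -> None:
    """Raise if a list input contains non-dict items (illustrative helper,
    not the actual implementation in this PR)."""
    if isinstance(data, list):
        for item in data:
            if not isinstance(item, dict):
                raise TypeError(
                    "Input data must be a dict or a list of dicts, "
                    f"got an item of type {type(item).__name__}"
                )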
Looks good - just a small request on simplifying the test.
Would like to get another eye here; I don't love O(n) validation, but I don't see a better approach, and the time it takes is about 1% of the runtime without it.
Timings
d = [{"id": 12, "size": 20} for _ in range(10_000)]
%timeit pd.json_normalize(d, max_level=0)
# 16.5 ms ± 127 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
def validate(data):
    for item in data:
        if not isinstance(item, dict):
            raise TypeError

%timeit validate(d)
# 167 μs ± 669 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
lgtm
This PR fixes a bug in pd.json_normalize where an AttributeError was raised if max_level was set to an integer and the input data contained NaN or other non-dict items. An entry has been added to doc/source/whatsnew/v3.0.0.rst.
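A short reproduction of the reported behavior as described above (NaN mixed into the list with an integer max_level); the exact exception depends on the pandas version in use:

import numpy as np
import pandas as pd

data = [{"id": 1, "meta": {"x": 1}}, np.nan]

# Before this PR, the NaN entry triggered an AttributeError when max_level
# was an integer; with the change discussed here a TypeError is raised
# instead (behavior as described in this conversation, not verified against
# a released pandas version).
pd.json_normalize(data, max_level=0)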
The fix involves two parts: