-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: eliminate method _write() in json writers #36218
Conversation
pandas/io/json/_json.py
Outdated
class SeriesWriter(Writer): | ||
_default_orient = "index" | ||
if not index and orient == "split": | ||
self.obj = {"name": obj.name, "data": obj.values} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm why do we need to do this now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know. I simply used the original implementation and moved whatever assignments were in _write
into __init__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh OK - I see what was done here. Let's not assign to self.obj
as that will mess with the idempotency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We assign to self.obj
at the moment of the writer initialization, not when calling a method. So, I would argue that it is not harmful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it is confusing to change what self.obj
is for certain orients - would prefer not to do this
If there is a problem with updating |
Yea I don't think this is an improvement as it is better for the subclasses to override the implementation where appropriate rather than modifying the object that gets stored during initialization. Let's keep open for a few days as someone may disagree |
Would you consider another implementation? I created a new property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think this is a better approach
pandas/io/json/_json.py
Outdated
) | ||
|
||
@property | ||
@abstractmethod | ||
def obj_to_write(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an annotation for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -33,7 +35,7 @@ | |||
# interface to/from | |||
def to_json( | |||
path_or_buf, | |||
obj, | |||
obj: NDFrame, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be FrameOrSeries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with FrameOrSeries
is that it does not allow changing the type of return value.
Inside the function to_json
there is a statement, which changes object from series to frame, which is not allowed.
if orient == "table" and isinstance(obj, Series):
obj = obj.to_frame(name=obj.name or "values")
mypy error:
pandas/io/json/_json.py:78: error: Incompatible types in assignment (expression has type "DataFrame", variable has type "FrameOrSeries") [assignment]
I tried to use FrameOrSeriesUnion
, which allows changing type from series to frame, but I get another mypy error.
Apparently for NDFrame
only FrameOrSeries
is a correct casting.
pandas/core/generic.py:2159: error: The erased type of self "Union[pandas.core.frame.DataFrame, pandas.core.series.Series]" is not a supertype of its class "pandas.core.generic.NDFrame" [misc]
So, I have no clue how to deal with typing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd i expected FrameOrSeriesUnion would work for this, am i missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivanovmg can you merge master and update the error message here? I don't think that line number is reflective of the latest code base
I think the error message is stemming from a mixture of using a Union and a TypeVar (the latter being somewhere in generic). Ideally should be cleaned up but I don't think needs to hold up this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we go.
I merged master, then modified pandas/core/generic.py
.
line 2164:
def to_json(
self: FrameOrSeriesUnion,
path_or_buf: Optional[FilePathOrBuffer] = None,
And replaced NDFrame
with FrameOrSeriesUnion
in _json.py
.
The error would be:
pandas/core/generic.py:2164: error: The erased type of self "Union[pandas.core.frame.DataFrame, pandas.core.series.Series]" is not a supertype of its class "pandas.core.generic.NDFrame" [misc]
However, if I do not touch generic.py
, then the following error with be thrown.
pandas/core/generic.py:2437: error: Argument "obj" to "to_json" has incompatible type "NDFrame"; expected "Union[DataFrame, Series]" [arg-type]
@abstractmethod | ||
def obj_to_write(self) -> Union[NDFrame, Dict[str, Any]]: | ||
"""Object to write in JSON format.""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explicitly pass
here
are the keys in the Dict necessarily str, or can they be any Index entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, actually the keys in the dictionary can be of IndexLabel
type.
In FrameWriter
the resulting dictionary keys can be tuples of strings in case of multicolumns.
In other cases the keys seem to be only strings.
nitpick about annotations, otherwise LGTM |
All checks are good after merging with master, except Windows:
@jbrockmendel, unfortunately, I cannot fully resolve the issue with typing. Please see my comment above in relation to |
Thanks @ivanovmg |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Attempted to figure out what is wrong with the issue #36211.
I noticed that the private methods
_write
are defined in each child of classWriter
, which simply re-defineobj
and call the parent_write
method.It turns out that the source of the recursion does not lie there, but I refactored the module slightly by removing the mediator
_write
method. Instead I put the differences between the writers into their__init__
method.It seems that source of the problem with the recursion mentioned in the bug report is in
pandas/_libs/src/ujson/python/objToJSON.c
.