Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Remove literal string input for read_xml #53809

Merged
merged 23 commits into from
Jul 11, 2023
Merged

DEPR: Remove literal string input for read_xml #53809

merged 23 commits into from
Jul 11, 2023

Conversation

rmhowe425
Copy link
Contributor

@rmhowe425 rmhowe425 commented Jun 22, 2023

@rmhowe425 rmhowe425 marked this pull request as draft June 22, 2023 23:48
@mroeschke mroeschke added Deprecate Functionality to remove in pandas IO XML read_xml, to_xml labels Jun 23, 2023
@rmhowe425 rmhowe425 marked this pull request as ready for review June 24, 2023 12:22
@rmhowe425
Copy link
Contributor Author

@mroeschke PR is ready for review!

@@ -298,13 +298,15 @@ Deprecations
- Deprecated constructing :class:`SparseArray` from scalar data, pass a sequence instead (:issue:`53039`)
- Deprecated falling back to filling when ``value`` is not specified in :meth:`DataFrame.replace` and :meth:`Series.replace` with non-dict-like ``to_replace`` (:issue:`33302`)
- Deprecated literal json input to :func:`read_json`. Wrap literal json string input in ``io.StringIO`` instead. (:issue:`53409`)
- Deprecated literal string input to :func:`read_xml`. Wrap literal string/bytes input in ``io.StringIO`` instead. (:issue:`53767`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Deprecated literal string input to :func:`read_xml`. Wrap literal string/bytes input in ``io.StringIO`` instead. (:issue:`53767`)
- Deprecated literal string input to :func:`read_xml`. Wrap literal string/bytes input in ``io.StringIO``/``io.BytesIO`` instead. (:issue:`53767`)

pandas/io/xml.py Outdated
@@ -894,6 +896,9 @@ def read_xml(
string or a path. The string can further be a URL. Valid URL schemes
include http, ftp, s3, and file.

.. deprecated:: 2.1.0
Passing html literal strings is deprecated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you mention wrapping in StringIO/BytesIO as an alternate?

pandas/io/xml.py Outdated
@@ -1119,6 +1125,15 @@ def read_xml(
"""
check_dtype_backend(dtype_backend)

if isinstance(path_or_buffer, str) and "\n" in path_or_buffer:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is \n a reliable way detect if it's literal xml?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use is_file_like (and in your other PRs)

Copy link
Contributor Author

@rmhowe425 rmhowe425 Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke Ah thanks for letting me know about is_file_like!

Are we okay with updating the detection logic to include both is_file_like and the check for "\n"? is_file_like doesn't help differentiate between raw xml input and a url so that's where checking for "\n" could be helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use is_url to detect urls.

@rmhowe425 rmhowe425 requested a review from mroeschke June 27, 2023 00:26
@rmhowe425
Copy link
Contributor Author

@mroeschke Pinging on green! I added multiple checks from common.py rather than just checking for \n 😅

pandas/io/xml.py Outdated
@@ -1119,6 +1127,22 @@ def read_xml(
"""
check_dtype_backend(dtype_backend)

if not any(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would want to check if the argument is a string and not any of these things.

For example, pd.read_xml(123) should not give a deprecation warning

@rmhowe425
Copy link
Contributor Author

@mroeschke Pinging on green

pandas/io/xml.py Outdated
@@ -1127,7 +1127,7 @@ def read_xml(
"""
check_dtype_backend(dtype_backend)

if not any(
if isinstance(path_or_buffer, str) and not any(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply for organization and not a must, can we move this warning logic to the next called method _parse written just before this main read_xml call? This moves it away from the docs lines. Also, in deprecated warning in docs, consider changing html to xml, two different types of documents.

Also, other unit tests other than your new one which reads literal strings are failing with this warning. You may have to check for Pathlike objects as well as string. Be sure to run pytest locally before pushing commit!

Copy link
Contributor Author

@rmhowe425 rmhowe425 Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke the documentation discrepancy is fixed here. Also, unit tests are passing and CI tests are green. Pathlike objects are already being checked in my conditional statement in xml.py

Not sure how you feel about moving the warning logic to _parse. This isn't something that we did in similar PRs. Not sure it matters?

Otherwise, I think this PR is ready for inspection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed this warning would be better suited in _parse as well.

@rmhowe425 rmhowe425 requested a review from mroeschke July 9, 2023 16:03
@rmhowe425
Copy link
Contributor Author

@mroeschke warning logic moved to _parse. Pinging on green

@mroeschke mroeschke added this to the 2.1 milestone Jul 11, 2023
@mroeschke mroeschke merged commit c68449a into pandas-dev:main Jul 11, 2023
36 checks passed
@mroeschke
Copy link
Member

Thanks again @rmhowe425

@rmhowe425 rmhowe425 deleted the dev/depr/literal-str-read_xml branch February 17, 2024 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas IO XML read_xml, to_xml
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DEPR]: Remove literal string/bytes input from read_excel, read_html, and read_xml
3 participants