Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: pd.read_table #21954

Merged
merged 2 commits into from Aug 2, 2018
Merged

Conversation

dahlbaek
Copy link
Contributor

@dahlbaek dahlbaek commented Jul 17, 2018

pd.read_table is deprecated and replaced by pd.read_csv.

  • closes DEPR: read_table #21948
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@WillAyd
Copy link
Member

WillAyd commented Jul 17, 2018

Initial comments:

  • Needs test(s)
  • Since we are replacing read_table with read_csv and a sep of \tyou should reroute the call to read_table to reflect this
  • Needs whatsnew note

@WillAyd WillAyd added this to the 0.24.0 milestone Jul 17, 2018
@WillAyd WillAyd added IO Data IO issues that don't fit into a more specific label Deprecate Functionality to remove in pandas labels Jul 17, 2018
@dahlbaek dahlbaek force-pushed the depr/read_table branch 2 times, most recently from 529c70f to 4f41cd0 Compare July 17, 2018 21:49
@pep8speaks
Copy link

pep8speaks commented Jul 17, 2018

Hello @dahlbaek! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 02, 2018 at 10:48 Hours UTC

@dahlbaek
Copy link
Contributor Author

Added a whatsnew note.

I'm not sure what to do about tests. pd.read_csv has been around for as long as I've been using pandas, so I would suspect that tests are already plentiful.

I changed the warning message so that it will reflect whether or not the user is supplying a value to sep. The result is now similar to what I proposed in #21948 but without the decorator, as suggested by @gfyoung. Result:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"a": [1], "b": [2]})
   ...: df.to_csv("test.csv", sep="\t")

In [3]: pd.read_table("test.csv", index_col=0)
/home/dahlbaek/virtualenvs/pandas-dev/bin/ipython:1: FutureWarning: read_table is deprecated, use read_csv with sep='\t' instead.
  #!/home/dahlbaek/virtualenvs/pandas-dev/bin/python
Out[3]: 
   a  b
0  1  2

In [4]: pd.read_table("test.csv", "\t", index_col=0)
/home/dahlbaek/virtualenvs/pandas-dev/bin/ipython:1: FutureWarning: read_table is deprecated, use read_csv instead.
  #!/home/dahlbaek/virtualenvs/pandas-dev/bin/python
Out[4]: 
   a  b
0  1  2

In [5]: pd.read_table("test.csv", sep="\t", index_col=0)
/home/dahlbaek/virtualenvs/pandas-dev/bin/ipython:1: FutureWarning: read_table is deprecated, use read_csv instead.
  #!/home/dahlbaek/virtualenvs/pandas-dev/bin/python
Out[5]: 
   a  b
0  1  2

In [6]: pd.read_table("test.csv", delimiter="\t", index_col=0)
/home/dahlbaek/virtualenvs/pandas-dev/bin/ipython:1: FutureWarning: read_table is deprecated, use read_csv instead.
  #!/home/dahlbaek/virtualenvs/pandas-dev/bin/python
Out[6]: 
   a  b
0  1  2

In [7]: pd.read_csv("test.csv", "\t", index_col=0)
Out[7]: 
   a  b
0  1  2

In [8]: pd.read_csv("test.csv", sep="\t", index_col=0)
Out[8]: 
   a  b
0  1  2

In [9]: pd.read_csv("test.csv", delimiter="\t", index_col=0)
Out[9]: 
   a  b
0  1  2

@dahlbaek dahlbaek force-pushed the depr/read_table branch 2 times, most recently from 3e229e0 to 01b3b0f Compare July 17, 2018 22:03
@gfyoung
Copy link
Member

gfyoung commented Jul 17, 2018

We definitely have tests for read_table in the codebase. Modify them to assert that a warning is produced.

@dahlbaek dahlbaek force-pushed the depr/read_table branch 14 times, most recently from 2a659b6 to 15dfae8 Compare July 18, 2018 13:18
@dahlbaek
Copy link
Contributor Author

dahlbaek commented Jul 18, 2018

@gfyoung I modified the tests, but am unsure how to proceed with the stacklevel parameter. It seems to me stacklevel=2 gives the correct behaviour. Since the warning message is hardcoded into the function body, this will give the user the file/line of the call to the function producing the warning message, so the user can make suitable adjustments. Example:

$ cat test_warn.py 
import pandas as pd

pd.read_table("test.csv", index_col=0)
$ python test_warn.py 
test_warn.py:3: FutureWarning: read_table is deprecated, use read_csv with sep='\t' instead.
  pd.read_table("test.csv", index_col=0)

This is exactly the behaviour I would want as a user. Nevertheless, pandas.util.testing.assert_produces_warning complains that the stacklevel is not set correctly, which causes failed tests.

Please Advise.

@gfyoung
Copy link
Member

gfyoung commented Jul 18, 2018

@dahlbaek : Pass in check_stacklevel=False in your tests.

@dahlbaek
Copy link
Contributor Author

@dahlbaek : Pass in check_stacklevel=False in your tests.

Perfect, that's what I figured.

"instead.",
FutureWarning, stacklevel=2)
if sep is False:
sep = default_sep
Copy link
Member

@gfyoung gfyoung Jul 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this if condition here?

(yes, I see the other changes that you made for passing in default_sep. This question is more asking why the overall change was needed for this deprecation).

Copy link
Contributor Author

@dahlbaek dahlbaek Jul 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user is setting sep or delimiter, I want to suggest using read_csv, but if neither is set, I also want to suggest passing sep='\t'. This was the simplest solution I could come up with which handles the case correctly where the user is passing sep=',' to read_table.

More generally, the if clause is needed because the definition of read_table is entangled with the definition of read_csv. The cleaner solution would be to just have two separate definitions, but that solution was deemed not worth the effort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...I see. Okay, fair enough.

@dahlbaek
Copy link
Contributor Author

dahlbaek commented Jul 24, 2018

some comments. can you run the test suite and make sure you are catching all deprecation warnings for read_table.

I'm not seeing any FutureWarnings from read_table, but I am getting the following failed test:

(pandas-dev) $ pwd
/home/dahlbaek/git/pandas/pandas/tests
(pandas-dev) $ pytest test_errors.py
=============================== test session starts ===============================
platform linux -- Python 3.6.6, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: /home/dahlbaek/git/pandas, inifile: setup.cfg
plugins: xdist-1.22.2, forked-0.2, cov-2.5.1
collected 12 items                                                                

test_errors.py ..........F.                                                 [100%]

==================================== FAILURES =====================================
________________________________ test_error_rename ________________________________

    def test_error_rename():
        # see gh-12665
        from pandas.errors import ParserError
        from pandas.io.common import CParserError
    
        try:
            raise CParserError()
        except ParserError:
            pass
    
        try:
            raise ParserError()
        except CParserError:
            pass
    
        with catch_warnings(record=True):
            try:
>               raise ParserError()
E               pandas.errors.ParserError

test_errors.py:52: ParserError
======================= 1 failed, 11 passed in 0.07 seconds =======================
(pandas-dev) $ 

Not sure exactly what is happening here…

@gfyoung
Copy link
Member

gfyoung commented Jul 24, 2018

@dahlbaek : That's an odd test failure. Seems to be fine on Travis though. Also, the test logs don't seem to have any deprecation warnings related to read_table AFAICT.

@jreback
Copy link
Contributor

jreback commented Jul 24, 2018

@dahlbaek we don't usually run from the test dir itself, rather from the base dir e.t.
pytest path/to/test. running from the test dir might not find the fixtures.

@jreback
Copy link
Contributor

jreback commented Jul 24, 2018

@jorisvandenbossche @TomAugspurger any objections?

@dahlbaek dahlbaek force-pushed the depr/read_table branch 2 times, most recently from ea4a798 to 5431629 Compare July 25, 2018 00:25
@dahlbaek
Copy link
Contributor Author

@jreback After recreating my virtual environment from scratch, I still get that failed test, even on master, when running from the base dir. Not sure what the cause is, but as pointed out by @gfyoung it seems to be a local problem on my machine.

@jreback
Copy link
Contributor

jreback commented Jul 25, 2018

@dahlbaek see my comments above. The tests are not designed to be run from subdirs, nor are they on our CI. pytest can't find things.

@dahlbaek
Copy link
Contributor Author

dahlbaek commented Jul 25, 2018

@jreback I think I am now running the test the way proposed in the docs, right? I still get the failure (even on master). In any case, this seems unrelated to the current PR. I suppose it is a local phenomenon that I will have to figure out myself…

(pandas-dev) $ pytest pandas/tests/test_errors.py -k test_error_rename
=============================== test session starts ===============================
platform linux -- Python 3.6.6, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: /home/dahlbaek/git/pandas, inifile: setup.cfg
plugins: xdist-1.22.3, forked-0.2, cov-2.5.1
collected 12 items / 11 deselected                                                

pandas/tests/test_errors.py F                                               [100%]

==================================== FAILURES =====================================
________________________________ test_error_rename ________________________________

    def test_error_rename():
        # see gh-12665
        from pandas.errors import ParserError
        from pandas.io.common import CParserError
    
        try:
            raise CParserError()
        except ParserError:
            pass
    
        try:
            raise ParserError()
        except CParserError:
            pass
    
        with catch_warnings(record=True):
            try:
>               raise ParserError()
E               pandas.errors.ParserError

pandas/tests/test_errors.py:52: ParserError
===================== 1 failed, 11 deselected in 0.06 seconds =====================
(pandas-dev) $ 

@jreback
Copy link
Contributor

jreback commented Jul 26, 2018

@dahlbaek hmm that looks like a real error

@dahlbaek
Copy link
Contributor Author

@jreback I'll try to look into it when I have time and make a separate issue.

@dahlbaek
Copy link
Contributor Author

dahlbaek commented Aug 1, 2018

@jreback I re-cloned the repo and installed a fresh python3.7 environment. I don't know what was causing the problem, but now it is gone.

@jreback
Copy link
Contributor

jreback commented Aug 1, 2018

@TomAugspurger if you have any issues here

@gfyoung
Copy link
Member

gfyoung commented Aug 1, 2018

@dahlbaek : This looks good to me, though if you could fix the merge conflicts, that would be good.

- pd.read_table is deprecated and replaced by pd.read_csv.

- add whatsnew note

- change tests to test for warning messages

- change DataFrame.from_csv to use pandas.read_csv instead of
  pandas.read_table

- Change pandas.read_clipboard to use pandas.read_csv instead
  of pandas.read_table
@dahlbaek
Copy link
Contributor Author

dahlbaek commented Aug 2, 2018

@gfyoung Done.

@TomAugspurger
Copy link
Contributor

Pushed a doc formatting fix.

Thanks @dahlbaek!

@TomAugspurger TomAugspurger merged commit 83f6be5 into pandas-dev:master Aug 2, 2018
dberenbaum pushed a commit to dberenbaum/pandas that referenced this pull request Aug 3, 2018
* DEPR: pd.read_table

- pd.read_table is deprecated and replaced by pd.read_csv.

- add whatsnew note

- change tests to test for warning messages

- change DataFrame.from_csv to use pandas.read_csv instead of
  pandas.read_table

- Change pandas.read_clipboard to use pandas.read_csv instead
  of pandas.read_table

* Add sep note to whatsnew
minggli added a commit to minggli/pandas that referenced this pull request Aug 5, 2018
* master: (47 commits)
  Run tests in conda build [ci skip] (pandas-dev#22190)
  TST: Check DatetimeIndex.drop on DST boundary (pandas-dev#22165)
  CI: Fix Travis failures due to lint.sh on pandas/core/strings.py (pandas-dev#22184)
  Documentation: typo fixes in MultiIndex / Advanced Indexing (pandas-dev#22179)
  DOC: added .join to 'see also' in Series.str.cat (pandas-dev#22175)
  DOC: updated Series.str.contains see also section (pandas-dev#22176)
  0.23.4 whatsnew (pandas-dev#22177)
  fix: scalar timestamp assignment (pandas-dev#19843) (pandas-dev#19973)
  BUG: Fix get dummies unicode error (pandas-dev#22131)
  Fixed py36-only syntax [ci skip] (pandas-dev#22167)
  DEPR: pd.read_table (pandas-dev#21954)
  DEPR: Removing previously deprecated datetools module (pandas-dev#6581) (pandas-dev#19119)
  BUG: Matplotlib scatter datetime (pandas-dev#22039)
  CLN: Use public method to capture UTC offsets (pandas-dev#22164)
  implement tslibs/src to make tslibs self-contained (pandas-dev#22152)
  Fix categorical from codes nan 21767 (pandas-dev#21775)
  BUG: Better handling of invalid na_option argument for groupby.rank(pandas-dev#22124) (pandas-dev#22125)
  use memoryviews instead of ndarrays (pandas-dev#22147)
  Remove depr. warning in SeriesGroupBy.count (pandas-dev#22155)
  API: Default to_* methods to compression='infer' (pandas-dev#22011)
  ...
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
* DEPR: pd.read_table

- pd.read_table is deprecated and replaced by pd.read_csv.

- add whatsnew note

- change tests to test for warning messages

- change DataFrame.from_csv to use pandas.read_csv instead of
  pandas.read_table

- Change pandas.read_clipboard to use pandas.read_csv instead
  of pandas.read_table

* Add sep note to whatsnew
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas IO Data IO issues that don't fit into a more specific label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DEPR: read_table
6 participants