Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate from_items #18529

Merged
merged 6 commits into from
Jan 31, 2018
Merged

DEPR: Deprecate from_items #18529

merged 6 commits into from
Jan 31, 2018

Conversation

reidy-p
Copy link
Contributor

@reidy-p reidy-p commented Nov 27, 2017

@jreback jreback added the Deprecate Functionality to remove in pandas label Nov 27, 2017
@jreback
Copy link
Contributor

jreback commented Nov 27, 2017

@reidy-p can you look at the original issue; do we have a replacement for .from_items(...., orient=....); I guess this is the same issue as .from_dict(...., orient=...)

suppose we could just add an optional orient= kwarg to DataFrame() to alleviate these cases.

@jorisvandenbossche @TomAugspurger

since we are doing pretty much everything with the main constructors and not the from_* constructors (ship has sailed as far as that). Then for consistency this makes sense.

@jorisvandenbossche
Copy link
Member

-1 on adding even more to the main DataFrame constructor. The ship has not fully sailed since from_dict and from_items are effectively giving functionality not available in the DataFrame(..).

If there is no actual alternative for .from_items(.., orient=..), then for me that sounds as a reason to not deprecate this one.

Although I suppose you can do it with from_dict(dict(..), orient=..) ?

@jorisvandenbossche
Copy link
Member

cc @wesm as you recently posted an issue related to from_items

@reidy-p
Copy link
Contributor Author

reidy-p commented Nov 27, 2017

Yeah, you can use from_dict(.., orient=) to achieve the same results as from_items(.., orient=) or you could just use DataFrame():

In [1]: DataFrame.from_items([('A', [1, 2]), ('B', [3, 4])], columns=['col1', 'col2'], orient='index')
Out[1]:    
    col1  col2
A     1     2
B     3     4

In [2]: DataFrame([[1, 2], [3, 4]], columns=['col1', 'col2'], index=['A', 'B'])
Out[2]:
    col1  col2
A     1     2
B     3     4

In [3]: df = DataFrame.from_dict(dict([('A', [1, 2]), ('B', [3, 4])]), orient='index')
In [4]: df.columns = ['col1', 'col2'] # from_dict has no columns argument
Out[4]: df
    col1  col2
A     1     2
B     3     4

This PR also seems to be failing a lot of tests because when I replaced from_items with DataFrame(dict()) in some tests (e.g., tests/io/test_stata.py) the order of the columns changes. But the tests all pass on my local branch for some reason. I will investigate it a bit more.

@reidy-p reidy-p force-pushed the from_items_depr branch 2 times, most recently from 2d4e46b to 80dcec6 Compare November 28, 2017 13:37
@codecov
Copy link

codecov bot commented Nov 28, 2017

Codecov Report

Merging #18529 into master will decrease coverage by 0.01%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18529      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%     
==========================================
  Files         164      164              
  Lines       49802    49804       +2     
==========================================
- Hits        45496    45489       -7     
- Misses       4306     4315       +9
Flag Coverage Δ
#multiple 89.13% <83.33%> (ø) ⬆️
#single 40.81% <16.66%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.81% <100%> (-0.1%) ⬇️
pandas/io/stata.py 93.71% <80%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2a0e54b...80dcec6. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 28, 2017

Codecov Report

Merging #18529 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18529      +/-   ##
==========================================
- Coverage   91.62%   91.61%   -0.01%     
==========================================
  Files         150      150              
  Lines       48724    48725       +1     
==========================================
- Hits        44642    44640       -2     
- Misses       4082     4085       +3
Flag Coverage Δ
#multiple 89.98% <100%> (-0.01%) ⬇️
#single 41.74% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.42% <100%> (-0.16%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 238499a...1838f65. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

if we can deprecate .from_items and recommend .from_dict would be great. You might want to recomment DataFrame.from_dict(OrderedDict(...)) for preserving order (and do this in tests).

("FloatCol", [1.25, 2.25, 1.83, 1.92, 0.0000000005]),
("BoolCol", [True, False, True, True, False]),
("StrCol", [1, 2, 3, 4, 5]),
expected = DataFrame.from_dict(OrderedDict({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to change the contents of the OrderedDict from a dict to a list of tuples to ensure that order is maintained to stop the tests failing

@@ -1242,6 +1242,11 @@ def to_records(self, index=True, convert_datetime64=True):
@classmethod
def from_items(cls, items, columns=None, orient='columns'):
"""
DEPRECATED: from_items is deprecated and will be removed in a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we no longer use DEPRECATED (it will fail the linter), instead use ..deprecated (the sphinx directive)

@@ -363,12 +363,12 @@ def test_reader_converters(self):

basename = 'test_converters'

expected = DataFrame.from_items([
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to change some / most constructions with from items entirely (rather than catch the deprecation warning)

@@ -1256,13 +1283,13 @@ def test_constructor_column_duplicates(self):

tm.assert_frame_equal(df, edf)

idf = DataFrame.from_items(
[('a', [8]), ('a', [5])], columns=['a', 'a'])
idf = DataFrame.from_records([(8, 5)],
Copy link
Contributor Author

@reidy-p reidy-p Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that some of the dataframe constructors don't allow duplicated columns while others do so I had to change the from_items in this test to from_records and from_dict(OrderedDict()) to get the test passing. But I'm not sure if it still tests for the original issue correctly (#2079)

All of the other tests in this file using from_items are directly testing from_items so I left the check for the deprecation warning rather than trying to replace from_items with a different constructor.

@jreback jreback added this to the 0.23.0 milestone Jan 31, 2018
@jreback jreback merged commit fb3b237 into pandas-dev:master Jan 31, 2018
@jreback
Copy link
Contributor

jreback commented Jan 31, 2018

thanks!

@reidy-p reidy-p deleted the from_items_depr branch February 2, 2018 22:44
@jreback
Copy link
Contributor

jreback commented Feb 6, 2018

at least 1 deprecation warnings are still showing, can you convert (and see if any more)

pandas/tests/reshape/test_reshape.py::TestGetDummies::()::test_get_dummies_dont_sparsify_all_columns[True]
  C:\projects\pandas\pandas\tests\reshape\test_reshape.py:460: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict()) instead. DataFrame.from_dict(OrderedDict()) may be used to preserve the key order.
    df = DataFrame.from_items([('GDP', [1, 2]), ('Nation', ['AB', 'CD'])])

@jorisvandenbossche
Copy link
Member

We have an example in the documentation about from_items: http://pandas-docs.github.io/pandas-docs-travis/dsintro.html#alternate-constructors
The first one is easy to change to from_dict (or even just DataFrame()), but for the second with orient='index' I don't see a simple alternative:

In [55]: pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])],
   ....:                         orient='index', columns=['one', 'two', 'three'])
   ....: 
Out[55]: 
   one  two  three
A    1    2      3
B    4    5      6

Also, the deprecation warning in that case is not really correct, as simply changing to from_dict(dict(..)) does not work:

In [50]: pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]), orient='index', columns=['one', 'two', 'three'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-e431c313820a> in <module>()
----> 1 pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]), orient='index', columns=['one', 'two', 'three'])

TypeError: from_dict() got an unexpected keyword argument 'columns'

Should we add a columns keyword for this to from_dict ?

@reidy-p
Copy link
Contributor Author

reidy-p commented Feb 19, 2018

@jorisvandenbossche yes, these are good points. As I show above, it is possible to recreate the functionality of from_items(..., orient='index', columns=['one', 'two', 'three']) without from_items but it's not obvious how to do this from the deprecation warning. So I would be in favour of adding a columns keyword to from_dict if possible. from_dict only has three parameters at present anyway.

@jorisvandenbossche
Copy link
Member

I think it would be rather easy to add a columns keyword to from_dict (it is basically passing this through, which it is already doing but with a hardcoded value of None).
That would make this usecase of from_items easier to replace, so I think worth it. @jreback thoughts?

@reidy-p would you like to do a PR for this?

@jreback
Copy link
Contributor

jreback commented Feb 20, 2018

that sounds reasonable to me, adding a kwarg to .from_dict()

@reidy-p
Copy link
Contributor Author

reidy-p commented Feb 20, 2018

@jorisvandenbossche yes I’ll try and do a PR

@PatrickDRusk
Copy link

I haven't seen any mention that the from_dict(OrderedDict(items)... doesn't work in cases where there would be duplicates in the index. Here is a test that would fail:

import pandas
from collections import OrderedDict
def test_from_dict_replacing_from_items_with_duplicates():
    rows = [(1, (2,)), (1, (2,))]
    df1 = pandas.DataFrame.from_items(rows, columns=('a', ), orient='index')
    df2 = pandas.DataFrame.from_dict(OrderedDict(rows), columns=('a', ), orient='index')
    pandas.testing.assert_frame_equal(df1, df2)

karenc added a commit to WildMeOrg/wildbook-ia that referenced this pull request Jul 22, 2020
When running the test `wbia/algo/graph/refresh.py::demo_refresh:0`:

```
Traceback (most recent call last):
  File "/virtualenv/env3/lib/python3.6/site-packages/xdoctest/doctest_example.py", line 556, in run
    exec(code, test_globals)
  File "<doctest:/wbia/wildbook-ia/wbia/algo/graph/refresh.py::demo_refresh:0>", line rel: 3, abs: 218, in <module>
    >>> demo_refresh()
  File "/wbia/wildbook-ia/wbia/algo/graph/refresh.py", line 232, in demo_refresh
    ys = infr.match_state_df(edges)[POSTV].values
  File "/wbia/wildbook-ia/wbia/algo/graph/mixin_groundtruth.py", line 56, in match_state_df
    match_state_df = pd.DataFrame.from_items(
AttributeError: type object 'DataFrame' has no attribute 'from_items'
```

Looking at the panda changelog, it seems `from_items` was removed in
`v1.0.0` and replaced by `from_dict`.

See pandas-dev/pandas#18529.
bluemellophone pushed a commit to WildMeOrg/wildbook-ia that referenced this pull request Jul 24, 2020
When running the test `wbia/algo/graph/refresh.py::demo_refresh:0`:

```
Traceback (most recent call last):
  File "/virtualenv/env3/lib/python3.6/site-packages/xdoctest/doctest_example.py", line 556, in run
    exec(code, test_globals)
  File "<doctest:/wbia/wildbook-ia/wbia/algo/graph/refresh.py::demo_refresh:0>", line rel: 3, abs: 218, in <module>
    >>> demo_refresh()
  File "/wbia/wildbook-ia/wbia/algo/graph/refresh.py", line 232, in demo_refresh
    ys = infr.match_state_df(edges)[POSTV].values
  File "/wbia/wildbook-ia/wbia/algo/graph/mixin_groundtruth.py", line 56, in match_state_df
    match_state_df = pd.DataFrame.from_items(
AttributeError: type object 'DataFrame' has no attribute 'from_items'
```

Looking at the panda changelog, it seems `from_items` was removed in
`v1.0.0` and replaced by `from_dict`.

See pandas-dev/pandas#18529.
bluemellophone pushed a commit to WildMeOrg/wildbook-ia that referenced this pull request Aug 11, 2020
When running the test `wbia/algo/graph/refresh.py::demo_refresh:0`:

```
Traceback (most recent call last):
  File "/virtualenv/env3/lib/python3.6/site-packages/xdoctest/doctest_example.py", line 556, in run
    exec(code, test_globals)
  File "<doctest:/wbia/wildbook-ia/wbia/algo/graph/refresh.py::demo_refresh:0>", line rel: 3, abs: 218, in <module>
    >>> demo_refresh()
  File "/wbia/wildbook-ia/wbia/algo/graph/refresh.py", line 232, in demo_refresh
    ys = infr.match_state_df(edges)[POSTV].values
  File "/wbia/wildbook-ia/wbia/algo/graph/mixin_groundtruth.py", line 56, in match_state_df
    match_state_df = pd.DataFrame.from_items(
AttributeError: type object 'DataFrame' has no attribute 'from_items'
```

Looking at the panda changelog, it seems `from_items` was removed in
`v1.0.0` and replaced by `from_dict`.

See pandas-dev/pandas#18529.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: deprecate DataFrame.from_items ?
4 participants