New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate from_items #18529

Merged
merged 6 commits into from Jan 31, 2018

Conversation

Projects
None yet
4 participants
@reidy-p
Contributor

reidy-p commented Nov 27, 2017

  • closes #17320
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@jreback jreback added the Deprecate label Nov 27, 2017

@jreback

This comment has been minimized.

Contributor

jreback commented Nov 27, 2017

@reidy-p can you look at the original issue; do we have a replacement for .from_items(...., orient=....); I guess this is the same issue as .from_dict(...., orient=...)

suppose we could just add an optional orient= kwarg to DataFrame() to alleviate these cases.

@jorisvandenbossche @TomAugspurger

since we are doing pretty much everything with the main constructors and not the from_* constructors (ship has sailed as far as that). Then for consistency this makes sense.

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Nov 27, 2017

-1 on adding even more to the main DataFrame constructor. The ship has not fully sailed since from_dict and from_items are effectively giving functionality not available in the DataFrame(..).

If there is no actual alternative for .from_items(.., orient=..), then for me that sounds as a reason to not deprecate this one.

Although I suppose you can do it with from_dict(dict(..), orient=..) ?

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Nov 27, 2017

cc @wesm as you recently posted an issue related to from_items

@reidy-p

This comment has been minimized.

Contributor

reidy-p commented Nov 27, 2017

Yeah, you can use from_dict(.., orient=) to achieve the same results as from_items(.., orient=) or you could just use DataFrame():

In [1]: DataFrame.from_items([('A', [1, 2]), ('B', [3, 4])], columns=['col1', 'col2'], orient='index')
Out[1]:    
    col1  col2
A     1     2
B     3     4

In [2]: DataFrame([[1, 2], [3, 4]], columns=['col1', 'col2'], index=['A', 'B'])
Out[2]:
    col1  col2
A     1     2
B     3     4

In [3]: df = DataFrame.from_dict(dict([('A', [1, 2]), ('B', [3, 4])]), orient='index')
In [4]: df.columns = ['col1', 'col2'] # from_dict has no columns argument
Out[4]: df
    col1  col2
A     1     2
B     3     4

This PR also seems to be failing a lot of tests because when I replaced from_items with DataFrame(dict()) in some tests (e.g., tests/io/test_stata.py) the order of the columns changes. But the tests all pass on my local branch for some reason. I will investigate it a bit more.

@codecov

This comment has been minimized.

codecov bot commented Nov 28, 2017

Codecov Report

Merging #18529 into master will decrease coverage by 0.01%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18529      +/-   ##
==========================================
- Coverage   91.35%   91.33%   -0.02%     
==========================================
  Files         164      164              
  Lines       49802    49804       +2     
==========================================
- Hits        45496    45489       -7     
- Misses       4306     4315       +9
Flag Coverage Δ
#multiple 89.13% <83.33%> (ø) ⬆️
#single 40.81% <16.66%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.81% <100%> (-0.1%) ⬇️
pandas/io/stata.py 93.71% <80%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2a0e54b...80dcec6. Read the comment docs.

@codecov

This comment has been minimized.

codecov bot commented Nov 28, 2017

Codecov Report

Merging #18529 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18529      +/-   ##
==========================================
- Coverage   91.62%   91.61%   -0.01%     
==========================================
  Files         150      150              
  Lines       48724    48725       +1     
==========================================
- Hits        44642    44640       -2     
- Misses       4082     4085       +3
Flag Coverage Δ
#multiple 89.98% <100%> (-0.01%) ⬇️
#single 41.74% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.42% <100%> (-0.16%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 238499a...1838f65. Read the comment docs.

@jreback

This comment has been minimized.

Contributor

jreback commented Jan 21, 2018

if we can deprecate .from_items and recommend .from_dict would be great. You might want to recomment DataFrame.from_dict(OrderedDict(...)) for preserving order (and do this in tests).

("FloatCol", [1.25, 2.25, 1.83, 1.92, 0.0000000005]),
("BoolCol", [True, False, True, True, False]),
("StrCol", [1, 2, 3, 4, 5]),
expected = DataFrame.from_dict(OrderedDict({

This comment has been minimized.

@reidy-p

reidy-p Jan 24, 2018

Contributor

I need to change the contents of the OrderedDict from a dict to a list of tuples to ensure that order is maintained to stop the tests failing

@@ -1242,6 +1242,11 @@ def to_records(self, index=True, convert_datetime64=True):
@classmethod
def from_items(cls, items, columns=None, orient='columns'):
"""
DEPRECATED: from_items is deprecated and will be removed in a

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

we no longer use DEPRECATED (it will fail the linter), instead use ..deprecated (the sphinx directive)

@@ -363,12 +363,12 @@ def test_reader_converters(self):
basename = 'test_converters'
expected = DataFrame.from_items([

This comment has been minimized.

@jreback

jreback Jan 25, 2018

Contributor

feel free to change some / most constructions with from items entirely (rather than catch the deprecation warning)

@@ -1256,13 +1283,13 @@ def test_constructor_column_duplicates(self):
tm.assert_frame_equal(df, edf)
idf = DataFrame.from_items(
[('a', [8]), ('a', [5])], columns=['a', 'a'])
idf = DataFrame.from_records([(8, 5)],

This comment has been minimized.

@reidy-p

reidy-p Jan 25, 2018

Contributor

It seems that some of the dataframe constructors don't allow duplicated columns while others do so I had to change the from_items in this test to from_records and from_dict(OrderedDict()) to get the test passing. But I'm not sure if it still tests for the original issue correctly (#2079)

All of the other tests in this file using from_items are directly testing from_items so I left the check for the deprecation warning rather than trying to replace from_items with a different constructor.

@jreback jreback added this to the 0.23.0 milestone Jan 31, 2018

@alefnula alefnula referenced this pull request Jan 31, 2018

Open

ENH: DataFrame.from_xy methods are duplicates #4916

2 of 4 tasks complete

@jreback jreback merged commit fb3b237 into pandas-dev:master Jan 31, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Contributor

jreback commented Jan 31, 2018

thanks!

@jsexauer jsexauer referenced this pull request Jan 31, 2018

Open

DEPR: deprecations from prior versions #6581

0 of 85 tasks complete

@reidy-p reidy-p deleted the reidy-p:from_items_depr branch Feb 2, 2018

@jreback

This comment has been minimized.

Contributor

jreback commented Feb 6, 2018

at least 1 deprecation warnings are still showing, can you convert (and see if any more)

pandas/tests/reshape/test_reshape.py::TestGetDummies::()::test_get_dummies_dont_sparsify_all_columns[True]
  C:\projects\pandas\pandas\tests\reshape\test_reshape.py:460: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict()) instead. DataFrame.from_dict(OrderedDict()) may be used to preserve the key order.
    df = DataFrame.from_items([('GDP', [1, 2]), ('Nation', ['AB', 'CD'])])
@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Feb 19, 2018

We have an example in the documentation about from_items: http://pandas-docs.github.io/pandas-docs-travis/dsintro.html#alternate-constructors
The first one is easy to change to from_dict (or even just DataFrame()), but for the second with orient='index' I don't see a simple alternative:

In [55]: pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])],
   ....:                         orient='index', columns=['one', 'two', 'three'])
   ....: 
Out[55]: 
   one  two  three
A    1    2      3
B    4    5      6

Also, the deprecation warning in that case is not really correct, as simply changing to from_dict(dict(..)) does not work:

In [50]: pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]), orient='index', columns=['one', 'two', 'three'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-e431c313820a> in <module>()
----> 1 pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]), orient='index', columns=['one', 'two', 'three'])

TypeError: from_dict() got an unexpected keyword argument 'columns'

Should we add a columns keyword for this to from_dict ?

@reidy-p

This comment has been minimized.

Contributor

reidy-p commented Feb 19, 2018

@jorisvandenbossche yes, these are good points. As I show above, it is possible to recreate the functionality of from_items(..., orient='index', columns=['one', 'two', 'three']) without from_items but it's not obvious how to do this from the deprecation warning. So I would be in favour of adding a columns keyword to from_dict if possible. from_dict only has three parameters at present anyway.

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Feb 20, 2018

I think it would be rather easy to add a columns keyword to from_dict (it is basically passing this through, which it is already doing but with a hardcoded value of None).
That would make this usecase of from_items easier to replace, so I think worth it. @jreback thoughts?

@reidy-p would you like to do a PR for this?

@jreback

This comment has been minimized.

Contributor

jreback commented Feb 20, 2018

that sounds reasonable to me, adding a kwarg to .from_dict()

@reidy-p

This comment has been minimized.

Contributor

reidy-p commented Feb 20, 2018

@jorisvandenbossche yes I’ll try and do a PR

@reidy-p reidy-p referenced this pull request Feb 20, 2018

Merged

ENH: Add columns parameter to from_dict #19802

4 of 4 tasks complete

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018

@PatrickDRusk

This comment has been minimized.

PatrickDRusk commented Sep 14, 2018

I haven't seen any mention that the from_dict(OrderedDict(items)... doesn't work in cases where there would be duplicates in the index. Here is a test that would fail:

import pandas
from collections import OrderedDict
def test_from_dict_replacing_from_items_with_duplicates():
    rows = [(1, (2,)), (1, (2,))]
    df1 = pandas.DataFrame.from_items(rows, columns=('a', ), orient='index')
    df2 = pandas.DataFrame.from_dict(OrderedDict(rows), columns=('a', ), orient='index')
    pandas.testing.assert_frame_equal(df1, df2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment