BUG: DataFrame from_dict constructor ignores Ordered dict when orient='index' #8425

Closed
aimboden opened this issue Sep 30, 2014 · 8 comments

Comments

@aimboden

commented Sep 30, 2014

Hello,
I have been experimenting with OrderedDicts lately and found a bug in the DataFrame.from_dict constructor. Here is some sample code.

import collections
import pandas as pd

firstrow={}
firstrow['foo'] = 'bar'
firstrow['baz'] = 'buzz'

row1 = pd.Series(firstrow)

secondrow={}
secondrow['foo'] = 'bar2'
secondrow['baz'] = 'buzz2'

row2 = pd.Series(secondrow)

roworder = collections.OrderedDict()

roworder['zShould be first'] = row1
roworder['Should be second'] = row2

# Key order is respected with orient='columns'
df = pd.DataFrame.from_dict(data=roworder, orient='columns')

# ...but not with orient='index': the rows come back sorted alphabetically
incorrectdf = pd.DataFrame.from_dict(data=roworder, orient='index')
# Workaround: build with orient='columns', then transpose
correctdf = df.transpose()

INSTALLED VERSIONS

commit: None
python: 3.3.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr_CH

pandas: 0.14.1
nose: 1.3.4
Cython: 0.20.1
numpy: 1.9.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.0
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.7
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

@jreback

Contributor

commented Sep 30, 2014

Can you make your code runnable (so it can simply be copied and pasted)? You have some undefined variables.

@aimboden

Author

commented Sep 30, 2014

Sorry about that! Should be fine now. If not, will check when back in the office tomorrow.

EDIT: the code now reproduces the above-mentioned bug.

jreback added this to the 0.16 milestone Oct 1, 2014

@jreback

Contributor

commented Oct 1, 2014

@Gimli510 that does look buggy.

A pull request to fix it would be welcome.

You can use your test example above: just step through the code, see where it's breaking, and try a fix.

@aimboden

Author

commented Oct 2, 2014

@jreback I think I found where the bug comes from.
The function _union_indexes calls lib.fast_unique_multiple_list(indexes), which sorts the keys before returning them. Should we carry a flag telling this Cython function not to sort the keys when the indexes list was created from an OrderedDict? I guess there is a cleaner way to do this, but I don't really have any idea how to go about it.

from pandas import lib  # pandas 0.14-era module holding the Cython helpers

# Up to this point, the future index is ordered as it should be.
indexes = [['zShould be first', 'Should be second'], ['zShould be first', 'Should be second']]
# When indexes is a list with more than one item, we hit this path:
# return Index(lib.fast_unique_multiple_list(indexes))

# However,
lib.fast_unique_multiple_list(indexes)
# returns ['Should be second', 'zShould be first']

@jreback

Contributor

commented Oct 2, 2014

I think this should be handled in _extract_index in pandas/core/frame.py. It needs to differentiate between a dict and an OrderedDict.

Maybe add a have_ordered flag in addition to setting have_dicts. Then you can pass it to _union_indexes(indexes, ordered=have_ordered).

Then, if ordered=True is passed (the default is False), you can do an order-preserving unique (so pass the flag into fast_unique_multiple, i.e. don't sort).
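A minimal sketch of the suggested flow, written as plain Python rather than against pandas internals. `union_keys` and `extract_index_sketch` are hypothetical stand-ins for `_union_indexes` and `_extract_index`; the only idea taken from the comment above is to record `have_ordered` when a value is an `OrderedDict` and to skip the sort in that case. The sample `cols` mirror the per-column mappings of row label to value built from the report's example.

```python
from collections import OrderedDict
from itertools import chain


def union_keys(key_lists, ordered=False):
    # Hypothetical stand-in for _union_indexes: OrderedDict.fromkeys keeps
    # each key once, in first-seen order; sort only when no order was given.
    uniques = list(OrderedDict.fromkeys(chain.from_iterable(key_lists)))
    return uniques if ordered else sorted(uniques)


def extract_index_sketch(columns):
    # Hypothetical, simplified _extract_index: collect the keys of every
    # column mapping and remember whether any of them is an OrderedDict.
    indexes = []
    have_dicts = False
    have_ordered = False
    for col in columns:
        if isinstance(col, dict):  # also true for OrderedDict (a subclass)
            have_dicts = True
            have_ordered = have_ordered or isinstance(col, OrderedDict)
            indexes.append(list(col.keys()))
    if not have_dicts:
        return []
    return union_keys(indexes, ordered=have_ordered)


cols = [
    OrderedDict([('zShould be first', 'bar'), ('Should be second', 'bar2')]),
    OrderedDict([('zShould be first', 'buzz'), ('Should be second', 'buzz2')]),
]
print(extract_index_sketch(cols))                     # ['zShould be first', 'Should be second']
print(extract_index_sketch([dict(c) for c in cols]))  # ['Should be second', 'zShould be first']
```

Because `OrderedDict` subclasses `dict`, the existing `isinstance(col, dict)` check still sets `have_dicts`, so the ordered case only adds a flag rather than a new branch.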

@hamedhsn

commented Sep 29, 2015

@jreback
I have implemented what you suggested, but for the last part, how can I pass the flag to fast_unique_multiple? It calls fast_unique_multiple_list(*args, **kwargs), and when I look at lib.pyx it always sorts the list at the end (uniques.sort()).
Any idea?
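One possible answer, sketched in pure Python under the assumption that the Cython helper can simply grow a keyword argument: make the final sort conditional on a `sort` flag and let a wrapper that already forwards `*args, **kwargs` pass it through untouched. Both function names below are hypothetical, not pandas API.

```python
def fast_unique_multiple_list_sketch(lists, sort=True):
    # Pure-Python analogue (hypothetical) of the Cython helper: keep the
    # first occurrence of each value, then sort only if asked to.
    seen = set()
    uniques = []
    for buf in lists:
        for val in buf:
            if val not in seen:
                seen.add(val)
                uniques.append(val)
    if not sort:
        return uniques
    try:
        # sorted() builds a new list, so uniques is untouched if it fails.
        return sorted(uniques)
    except TypeError:
        # Unorderable mix (e.g. str and int on Python 3): keep first-seen order.
        return uniques


def union_lists(*args, **kwargs):
    # A caller that forwards *args/**kwargs passes sort= through unchanged.
    return fast_unique_multiple_list_sketch(*args, **kwargs)


indexes = [['zShould be first', 'Should be second'],
           ['zShould be first', 'Should be second']]
print(union_lists(indexes))              # ['Should be second', 'zShould be first']
print(union_lists(indexes, sort=False))  # ['zShould be first', 'Should be second']
```

Sorting a copy instead of calling `uniques.sort()` in place matters for the fallback: an in-place sort that raises can leave the list partially reordered.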

@alichaudry

commented Feb 17, 2017

@jreback is this still an issue in the current version of pandas? I'm seeing the problem on an older version (v0.16.2) and I'm not sure if it's been addressed in the current one.

df = pd.DataFrame.from_dict(ordered_dict_data, orient='index') 

sorts the index alphabetically. I've been using the following hack to address it:

df = pd.DataFrame.from_dict(ordered_dict_data, orient='columns').T

My hack, however, sorts the columns alphabetically.

For the data that I have, it's easier for me to re-order these columns, so the latter solution works better. To be precise, my data is an OrderedDict of OrderedDicts, so I expect the insertion order of both the index and the columns to be respected. It looks something like this:

from collections import OrderedDict

data = OrderedDict([
    ('a', OrderedDict([('aa', 5), ('bb', 10)])),
    ('b', OrderedDict([('aa', 7), ('bb', 14)])),
    # ...
])

If it's not fixed, I can take a stab at it.
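Until the constructor respects the order, one version-independent workaround is to reindex both axes with the key order taken from the data itself. A minimal sketch, assuming every inner OrderedDict has the same keys as the first one (the column order is read from the first row); the keys are deliberately out of alphabetical order so any sorting would be visible:

```python
from collections import OrderedDict

import pandas as pd

data = OrderedDict([
    ('b', OrderedDict([('bb', 10), ('aa', 5)])),
    ('a', OrderedDict([('bb', 14), ('aa', 7)])),
])

# Row order comes from the outer OrderedDict, column order from the first
# inner OrderedDict (assumes all rows share the same keys).
row_order = list(data)
col_order = list(next(iter(data.values())))

# reindex() puts both axes back into insertion order, whether or not the
# installed pandas version sorted the labels while building the frame.
df = pd.DataFrame.from_dict(data, orient='index').reindex(index=row_order, columns=col_order)
print(df)
```

Unlike the `orient='columns'` plus `.T` hack, this fixes both axes at once, at the cost of one extra reindex.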

@TomAugspurger

Contributor

commented Jul 6, 2018

Still an open issue.

jzwinck added a commit to jzwinck/pandas that referenced this issue Jul 28, 2018

MAINT: refactor from_items() using from_dict(). Fixes pandas-dev#21850
This removes the deprecation warnings introduced in pandas-dev#18262,
by reimplementing DataFrame.from_items() in the recommended
way using DataFrame.from_dict() and collections.OrderedDict.

This eliminates the maintenance burden of separate code for
from_items(), while allowing existing uses to keep working.

A small cleanup can be done once pandas-dev#8425 is fixed.
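The refactor described in this commit message can be sketched roughly as follows. `from_items_sketch` is a hypothetical helper, not the code from the PR, and it covers only the simple case: duplicate keys in `items` are collapsed by `OrderedDict`, and `columns` is honored only with `orient='index'`, since that is the only orientation where `from_dict` accepts it (the `columns` argument needs pandas 0.23 or later).

```python
from collections import OrderedDict

import pandas as pd


def from_items_sketch(items, columns=None, orient='columns'):
    # Hypothetical helper: rebuild from_items()-style construction on top
    # of from_dict(). OrderedDict keeps the (key, values) pairs in input
    # order; duplicate keys are collapsed.
    data = OrderedDict(items)
    if orient == 'index':
        # Keys become row labels; columns= names the resulting columns.
        return pd.DataFrame.from_dict(data, orient='index', columns=columns)
    # Keys become column labels, in insertion order.
    return pd.DataFrame.from_dict(data, orient='columns')


items = [('b', [1, 2]), ('a', [3, 4])]
print(from_items_sketch(items))
print(from_items_sketch(items, columns=['x', 'y'], orient='index'))
```

With list values the `orient='index'` path takes the keys directly as the row index, so this simple case does not depend on the nested-dict ordering problem tracked here.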

jzwinck referenced this issue Jul 28, 2018

Closed

MAINT: refactor from_items() using from_dict() #22094

mazayo added a commit to mazayo/pandas that referenced this issue Jun 15, 2019

mazayo added a commit to mazayo/pandas that referenced this issue Jun 16, 2019

mazayo added a commit to mazayo/pandas that referenced this issue Jun 17, 2019

jreback modified the milestones: Someday, 0.25.0 Jun 21, 2019

jreback modified the milestones: 0.25.0, Contributions Welcome Jul 3, 2019

jreback added a commit that referenced this issue Jul 8, 2019

Projects
None yet
6 participants