Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataFrame.from_dict multilevel dicts with datetime.date objects as keys sets the values as NaN in the DataFrame #19993

Closed
Tracked by #7
hodossy opened this issue Mar 5, 2018 · 11 comments · Fixed by #52913
Assignees
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

Comments

@hodossy
Copy link

hodossy commented Mar 5, 2018

Code Sample, from Jupyter notebook

import pandas as pd
import datetime

example_multilevel_dict = {
  (0, datetime.date(2018, 3, 3)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (0, datetime.date(2018, 3, 4)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (1, datetime.date(2018, 3, 3)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (1, datetime.date(2018, 3, 4)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}
}

# MultiIndex rows are required, but this give multilevel columns
multicolumn_df = pd.DataFrame.from_dict(example_multilevel_dict)
print(multicolumn_df)

# but this is what 'orient' is for, however all values are NaN
multiindex_df = pd.DataFrame.from_dict(example_multilevel_dict, orient='index')
print(multiindex_df)

# but when you transpose the multicolumn_df, you get it right
print(multicolumn_df.T)

# This is not the case when the tuple contains strings instead of dates
ok_multilevel_dict = {
  (0, 'a'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (0, 'b'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (1, 'a'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (1, 'b'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}
}
print(pd.DataFrame.from_dict(ok_multilevel_dict , orient='index'))

Output:

           0                     1           
  2018-03-03 2018-03-04 2018-03-03 2018-03-04
A          5          5          5          5
B       1344       1344       1344       1344
C          0          0          0          0
D         48         48         48         48
E         20         20         20         20
               A   B   C   D   E
0 2018-03-03 NaN NaN NaN NaN NaN
  2018-03-04 NaN NaN NaN NaN NaN
1 2018-03-03 NaN NaN NaN NaN NaN
  2018-03-04 NaN NaN NaN NaN NaN
              A     B  C   D   E
0 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20
1 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20

Problem description

When constructing a DataFrame from a dictionary with date objects from the datetime library, values are set to NaN when using orient='index'. See examples.

Please note that I suppose that using orient='index' should be equal with a transpose.

Expected Output

print(pd.DataFrame.from_dict(example_multilevel_dict , orient='index'))

              A     B  C   D   E
0 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20
1 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.14.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback
Copy link
Contributor

jreback commented Mar 8, 2018

datetime.date are not a first class type and you end up with edge cases. If you look at the transposed version these are converted to a DatetimeIndex.

This code path in .from_dict does not have much coverage and is likely not handling the conversion keys as much as the 'regular' path. You are welcome to have a look.

@jreback jreback added Bug Timeseries Reshaping Concat, Merge/Join, Stack/Unstack, Explode Difficulty Intermediate labels Mar 8, 2018
@jreback jreback added this to the Next Major Release milestone Mar 8, 2018
@williamkhshea
Copy link

williamkhshea commented Mar 24, 2018

@jreback The issue is that when arrays is passed into _arrays_to_mgr the datetime.date in index is not evaluated, while extract_index returns a MultiIndex in which the datetime.date got parsed into Timestamp and is eventually used to match against the arrays, therefore the array cannot be matched based on the index extracted.

screen shot 2018-03-24 at 4 17 41 pm

One way to solve the issue is to parse the keys in arrays into Timestamp first in _arrays_to_mgr before any further operations, but I am not sure whether it is the best way to do it. Any opinions?

@jreback
Copy link
Contributor

jreback commented Mar 24, 2018

not sure of the beat place; see how the from_dict path is handled - this should be similar

@williamkhshea
Copy link

williamkhshea commented Mar 24, 2018

@jreback Wend down the from_dict path rabbit hold and discovered that when MultiIndex is constructed, it convert any datetimelike values, while Index construction doesn't do that. Not sure why the difference in behaviour.

values = maybe_infer_to_datetimelike(values, convert_dates=True)

Setting convert_dates = False fixed this particular issue, though I am not sure if there are any implications in other area.

@jreback
Copy link
Contributor

jreback commented Mar 25, 2018

@williamkhshea you can't change that as this will break a number of existing behavior. The point is that datetime.date are not treated as first class and converted almost immediately. I think a more straightfroward way to to do this is like .from_dict convert the keys.

@williamkhshea
Copy link

@jreback You can't simply convert the keys in .from_dict as single level dict is still using datetime to match.

Index in single level dict
screen shot 2018-03-26 at 6 15 09 am

MultiIndex in multilevel dict
screen shot 2018-03-26 at 6 15 55 am

If datetime.date is not treated as first class I think the way to go is to convert the datetime.date in single level dict when it is constructed and convert the datetime.date keys in .from_dict.

@carlosdeoncedos
Copy link

carlosdeoncedos commented Oct 10, 2018

I have a similar issue, using Jupyter Notebook. It's seems to be related to @hodossy post.

import pandas as pd
import datetime

concept_list = ['a', 'b', 'c']
date_list = [datetime.date(2018, 10, 8), datetime.date(2018, 10, 9), 
             datetime.date(2018, 10, 10)]

dictionary = {'John': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
                         datetime.date(2018, 10, 10): 1}, 
                 'b': {datetime.date(2018, 10, 8): 2,datetime.date(2018, 10, 9): 2, 
                         datetime.date(2018, 10, 10): 2}, 
                 'c': {datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
                         datetime.date(2018, 10, 10): 3}},
        'Alice': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
                         datetime.date(2018, 10, 10): 1}, 
                 'b': {datetime.date(2018, 10, 8): 2, datetime.date(2018, 10, 9): 2, 
                         datetime.date(2018, 10, 10): 2}, 
                 'c':{datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
                         datetime.date(2018, 10, 10): 3}},
        'Rupert': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
                         datetime.date(2018, 10, 10): 1}, 
                  'b': {datetime.date(2018, 10, 8): 2, datetime.date(2018, 10, 9): 2, 
                         datetime.date(2018, 10, 10): 2}, 
                  'c':{datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
                         datetime.date(2018, 10, 10): 3}}}

index_list = [concept_list, date_list]
index = pd.MultiIndex.from_product(index_list, names = ['Concept', 'Date'])

df = pd.DataFrame(dictonary, index = index)
df

The output:

                          John    Alice    Rupert
Concept  Date
a        2018-10-08        NaN     NaN      NaN
         2018-10-09        NaN     NaN      NaN
         2018-10-10        NaN     NaN      NaN

b        2018-10-08        NaN     NaN      NaN
         2018-10-09        NaN     NaN      NaN
         2018-10-10        NaN     NaN      NaN

c        2018-10-08        NaN     NaN      NaN
         2018-10-09        NaN     NaN      NaN
         2018-10-10        NaN     NaN      NaN

Expected output:

                         John  Alice  Rupert
Concept  Date
a        2018-10-08        1     1      1
         2018-10-09        2     2      2
         2018-10-10        3     3      3

b        2018-10-08        1     1      1
         2018-10-09        2     2      2
         2018-10-10        3     3      3

c        2018-10-08        1     1      1
         2018-10-09        2     2      2
         2018-10-10        3     3      3

@afzalhasn
Copy link

afzalhasn commented Nov 11, 2019

My solution is Below.
Is this approach is okay or not?

example_multilevel_dict = {
(0, str(datetime.date(2018, 3, 3))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(0, str(datetime.date(2018, 3, 4))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(1, str(datetime.date(2018, 3, 3))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(1, str(datetime.date(2018, 3, 4))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}
}

pd.DataFrame.from_dict(example_multilevel_dict,orient='index')

OUTPUT:
A B C D E
0 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
1 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20

@afzalhasn
Copy link

Screenshot 2019-11-11 at 11 35 29 PM

@mroeschke mroeschke changed the title DataFrame.from_dict multilevel dicts with datetime objects as keys sets the values as NaN in the DataFrame DataFrame.from_dict multilevel dicts with datetime.date objects as keys sets the values as NaN in the DataFrame Apr 1, 2020
@mroeschke mroeschke added Constructors Series/DataFrame/Index/pd.array Constructors and removed Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 1, 2020
@mroeschke
Copy link
Member

This looks to work on master now. Could use a test

In [33]: print(pd.DataFrame.from_dict(example_multilevel_dict , orient='index'))
              A     B  C   D   E
0 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20
1 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Constructors Series/DataFrame/Index/pd.array Constructors Timeseries labels Jun 19, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@shteken
Copy link
Contributor

shteken commented Apr 25, 2023

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants