New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: convert datetime components (year, month, day, ...) in different columns to datetime #8158

Closed
jorisvandenbossche opened this Issue Sep 2, 2014 · 4 comments

Comments

Projects
None yet
4 participants
@jorisvandenbossche
Member

jorisvandenbossche commented Sep 2, 2014

from SO

I didn't find an issue about this, but it has come up some times at stackoverflow: having columns with integers for year, month, day, hour, ..., how do you convert this to a datetime column/index ?

http://stackoverflow.com/questions/19350806/how-to-convert-columns-into-one-datetime-column-in-pandas

You have the typical solution of adding the columns: pd.to_datetime((df['Y']*10000 + df['M']*100 + df['D']).astype('int'), format='%Y%m%d'), and @unutbu added now a faster solution using numpy's different datetime64 resolutions to that question on SO.

I personally think this would be a nice addition to pandas to have a more native solution for this. But then we need to figure out a nice API. Or we keep it as is, but try to document it more (add as example to docs?)

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Sep 2, 2014

Contributor

I think we tried to doc this some where (or cookbook - don't remember offhand)

how about

pd.to_datetime(DataFrame,
format={'Y' : column_year,
'm' : column_month,
'd' : column_day})

format could also be list of these columns I guess as well (this is more powerful though; a list would have to be unambiguously Ymd)

Contributor

jreback commented Sep 2, 2014

I think we tried to doc this some where (or cookbook - don't remember offhand)

how about

pd.to_datetime(DataFrame,
format={'Y' : column_year,
'm' : column_month,
'd' : column_day})

format could also be list of these columns I guess as well (this is more powerful though; a list would have to be unambiguously Ymd)

@unutbu

This comment has been minimized.

Show comment
Hide comment
@unutbu

unutbu Sep 2, 2014

Contributor

How about adding a helper function like

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)

It would be easier than adding the functionality to pd.to_datetime, and people could apply it wherever they wish. Note that combine64 returns a NumPy array of dtype datetime64[*]. It can be passed to pd.to_datetime to create a DatetimeIndex:

In [196]: combine64(df['Y'], df['M'])
Out[196]: array(['1990-01-01', '1991-02-01', '1992-03-01', '1993-04-01'], dtype='datetime64[D]')

In [197]: pd.to_datetime(combine64(df['Y'], df['M']))
Out[197]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[1990-01-01, ..., 1993-04-01]
Length: 4, Freq: None, Timezone: None
Contributor

unutbu commented Sep 2, 2014

How about adding a helper function like

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)

It would be easier than adding the functionality to pd.to_datetime, and people could apply it wherever they wish. Note that combine64 returns a NumPy array of dtype datetime64[*]. It can be passed to pd.to_datetime to create a DatetimeIndex:

In [196]: combine64(df['Y'], df['M'])
Out[196]: array(['1990-01-01', '1991-02-01', '1992-03-01', '1993-04-01'], dtype='datetime64[D]')

In [197]: pd.to_datetime(combine64(df['Y'], df['M']))
Out[197]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[1990-01-01, ..., 1993-04-01]
Length: 4, Freq: None, Timezone: None
@shoyer

This comment has been minimized.

Show comment
Hide comment
@shoyer

shoyer Sep 4, 2014

Member

I like @unutbu's combine64 helper function. It seems cleaner than shoving support for format dictionaries into to_datetime.

A couple of thoughts:

  1. The name combine64 is too ambiguous. How about combine_datetime instead?
  2. I think it should return a DatetimeIndex directly, since this is the only fully functional datetime array we have access to in pandas.
Member

shoyer commented Sep 4, 2014

I like @unutbu's combine64 helper function. It seems cleaner than shoving support for format dictionaries into to_datetime.

A couple of thoughts:

  1. The name combine64 is too ambiguous. How about combine_datetime instead?
  2. I think it should return a DatetimeIndex directly, since this is the only fully functional datetime array we have access to in pandas.

@jreback jreback added this to the Next Major Release milestone Oct 20, 2015

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Oct 20, 2015

Contributor

@unutbu want to do a PR for this?

Contributor

jreback commented Oct 20, 2015

@unutbu want to do a PR for this?

@jreback jreback modified the milestones: 0.18.1, Next Major Release Apr 25, 2016

jreback added a commit to jreback/pandas that referenced this issue Apr 25, 2016

@jreback jreback closed this in 59082e9 Apr 26, 2016

nps added a commit to nps/pandas that referenced this issue May 17, 2016

ENH: allow construction of datetimes from columns in a DataFrame
closes pandas-dev#8158

Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#12967 from jreback/dates and squashes the following commits:

e18c9cc [Jeff Reback] TST: move .to_datetime() tests to new testing class
7dc9406 [Jeff Reback] ENH: allow construction of datetimes from columns in a DataFrame
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment