Implement datetime columns #1646
Labels
cust-goldmansachs
EPIC ⭐
Big task that may encompass many smaller ones
new feature
Feature requests for new functionality
Comments
IMHO, every option is better than the |
Prerequisite: #1396
Merged
st-pasha added a commit that referenced this issue Feb 18, 2021

Created column type `date32` for storing calendar dates. Currently only the following operations are supported:
- creation from python `datetime.date` objects;
- converting into python;
- `repr()`, i.e. the column can be viewed in a console.

WIP for #1646
Done.
This is a request to add new stypes and functions in support of date/time functionality.
Currently, different packages use different formats for dates and times:
Apache Arrow
- `date[day]` (32bit, days since the Unix epoch)
- `date[ms]` (64bit, milliseconds since the Unix epoch)
- `time[s]` (32bit)
- `time[ms]` (32bit)
- `time[µs]` (64bit)
- `time[ns]` (64bit)
- `timestamp[s]` (64bit)
- `timestamp[ms]` (64bit)
- `timestamp[µs]` (64bit)
- `timestamp[ns]` (64bit)

When reading from csv, `pyarrow` parses date/time columns as `timestamp[s]`. Timestamps with milliseconds are read as strings. Some common date formats like `MM/DD/YYYY` are not recognized.

The arrow<->pandas conversion guide says that all `timestamp[*]` formats are converted to `pd.Timestamp` (`np.datetime64[ns]`), and all `date[*]` become object columns with `datetime.date` items.

Pandas
In pandas, there are standalone classes such as `pd.Timestamp`, `pd.Period`, `pd.DateOffset` and `pd.Timedelta`, but also the column-like `DatetimeIndex` (with dtype `datetime64[ns]`), and also a `Series` with dtype `datetime64[ns]`.

When reading from CSV the datetime columns are not parsed, and remain as strings (objects). Conversion can be performed afterwards using the `to_datetime()` function.

Numpy
In numpy all dates are 64-bit, but with various time units: Y, M, W, D, h, m, s, ms, us, ns, ps, fs, as. All datetimes are based on POSIX time with epoch of 1970-01-01T00:00Z.
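To make the "integer count since the epoch, in some unit" idea concrete, here is a minimal sketch in plain Python. It does not use numpy's API; the helper `to_epoch_count` and the table `UNITS_PER_SECOND` are hypothetical names introduced only for illustration, and only a subset of the units listed above is covered. The last lines show why the choice of unit matters for a signed 64-bit storage: with nanoseconds, the largest representable moment falls in the year 2262.

```python
from datetime import datetime, timedelta, timezone

# POSIX epoch, the reference point for all the counts below.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# How many of each sub-day unit fit into one second (illustrative subset).
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def to_epoch_count(moment, unit):
    """Return the integer number of `unit`s between the epoch and `moment`."""
    delta = moment - EPOCH
    # Work in whole microseconds to avoid floating-point rounding.
    micros = (delta.days * 86_400 + delta.seconds) * 10**6 + delta.microseconds
    if unit == "D":
        return micros // (86_400 * 10**6)
    return micros * UNITS_PER_SECOND[unit] // 10**6


moment = datetime(2021, 2, 18, 12, 0, 0, tzinfo=timezone.utc)
for unit in ("D", "s", "ms", "us", "ns"):
    print(unit, to_epoch_count(moment, unit))

# Largest moment a signed 64-bit nanosecond counter can hold
# (truncated to microseconds, the finest resolution of datetime).
INT64_MAX = 2**63 - 1
latest_ns = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
print(latest_ns)
```

Floor division keeps the counts consistent for moments before the epoch, which come out as negative integers. Coarser units trade resolution for a wider representable range.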
In numpy 1.6 the default format when parsing was `datetime64[us]`; since 1.7 the format is selected based on the string.

Python
The datetime module supports classes `datetime.date`, `datetime.time`, `datetime.datetime` and `datetime.timedelta`. All of these are relatively "heavy" objects, each storing multiple fields: `.date` has year, month, day; `.time` has hour, min, sec, microsec and timezone; etc.

There is also a C equivalent of the `datetime` module; there the `.date` object is 4+N bytes, `.time` is 6+N bytes, and `.datetime` is 10+N bytes, where N=17, a per-object overhead.

See Also