Datetime dev #87

Merged
merged 34 commits into numpy:master on Jun 16, 2011

3 participants

@mwiebe
NumPy member

This pull request has all the datetime work that's not merged yet, with tags:

datetime-autounit:
Determine the datetime unit from the form of the input object instead of defaulting to microseconds

datetime-promotion:
Unifying promotion of M8,M8 and M8,m8 operations

datetime-meta:
Adding support for a generic unit to the datetime unit metadata

datetime-arange:
Implementing np.arange support for the datetime type

datetime-bday:
Implementing the proposed business day API.
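
To illustrate the datetime-autounit item, here is a minimal Python sketch of inferring a unit from the shape of an ISO 8601-style string. The regular expression, the `detect_unit` name, and the digit-count mapping for fractional seconds are assumptions made for illustration; the actual parser in this pull request is written in C and may differ in detail.

```python
import re

# Hypothetical pattern: each nested optional group adds one level of precision.
_ISO = re.compile(
    r"^(?P<Y>[+-]?\d{4})"
    r"(?:-(?P<M>\d{2})"
    r"(?:-(?P<D>\d{2})"
    r"(?:[T ](?P<h>\d{2})"
    r"(?::(?P<m>\d{2})"
    r"(?::(?P<s>\d{2})"
    r"(?:\.(?P<frac>\d+))?"
    r")?)?)?)?)?$"
)


def detect_unit(text):
    """Return the finest unit that the string actually spells out."""
    match = _ISO.match(text)
    if match is None:
        raise ValueError("unrecognized datetime string: %r" % text)
    frac = match.group("frac")
    if frac is not None:
        # Assumed mapping: 1-3 digits -> ms, 4-6 -> us, 7-9 -> ns.
        if len(frac) <= 3:
            return "ms"
        if len(frac) <= 6:
            return "us"
        if len(frac) <= 9:
            return "ns"
        raise ValueError("too many fractional digits for this sketch")
    # Walk from the finest to the coarsest field; the year is always present.
    for unit in ("s", "m", "h", "D", "M", "Y"):
        if match.group(unit) is not None:
            return unit
```

With this sketch, `detect_unit("2011-06")` gives `"M"` and `detect_unit("2011-06-16T12:30")` gives `"m"`, rather than every input defaulting to microseconds.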

Mark Wiebe added some commits Jun 7, 2011
Mark Wiebe ENH: datetime-autounit: Automatically detect the unit for scalar construction
e64b6f1
Mark Wiebe ENH: datetime-autounit: Make 'now' and 'today' only parse with appropriate units

In particular, 'now' needs time-like units, and 'today' needs
date-like units.
6395979
Mark Wiebe ENH: datetime-promotion: Unify datetime/timedelta type promotion
Now it always goes to the more precise unit.
ab2ac7c
Mark Wiebe BUG: datetime: Had int instead of Py_ssize_t for an AsStringAndSize call fdb4190
Mark Wiebe ENH: datetime-arange: Add boilerplate for the specialized datetime_arange
487874a
Mark Wiebe ENH: datetime-arange: Filling in the datetime-specific arange function
Here I've realized that default 'microsecond' units isn't very good,
and would like to make a default 'generic' unit instead.
50261be
Mark Wiebe ENH: datetime-meta: Add generic units as a datetime unit type
This allows integers to convert into timedeltas without binding
to a default unit, so that later when it's combined with another
data type it adopts that type instead of overriding it haphazardly.
This makes things generally more intuitive.
d6c63e3
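
The behaviour described in this commit message can be sketched in a few lines of Python. This is only an illustration: the `Delta` class is hypothetical, `None` stands in for the generic unit, and combining two different concrete units is left out because it needs the full promotion rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delta:
    """Hypothetical timedelta: an integer count plus an optional unit."""
    value: int
    unit: Optional[str] = None  # None stands in for the generic unit

    def __add__(self, other):
        if self.unit is None:
            # A generic operand adopts the other operand's unit.
            unit = other.unit
        elif other.unit is None or other.unit == self.unit:
            unit = self.unit
        else:
            raise TypeError("differing concrete units need full promotion")
        return Delta(self.value + other.value, unit)
```

Here `Delta(3) + Delta(2, "D")` yields `Delta(5, "D")`: the plain integer never forces a default unit onto the result.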
Mark Wiebe ENH: datetime-arange: Use the generic units for parameter conversion 5c16411
Mark Wiebe ENH: datetime-arange: The arange function largely works now 53ab0c1
Mark Wiebe ENH: datetime-arange: Detect the unit when a dtype with generic units is given
98b4c38
Mark Wiebe ENH: datetime-arange: Move the unit metadata promotion to a separate function

This cleans up the implementation of arange a lot, and makes the
promotion rules behave consistently.
dadf6c2
Mark Wiebe ENH: datetime-bday: Remove business days as a datetime metadata unit
The complexity of the operations desired for business days is such
that expressing it as a unit in the datetime doesn't fit naturally.
Instead, an API operating on day-based datetimes appears to be
a superior approach.
c3f963e
Mark Wiebe ENH: datetime-bday: Add datetime_busday.c/.h, start busday_offset function
e24e9d4
Mark Wiebe ENH: datetime-bday: Implement the weekmask part of the busday_offset algorithm
4fced4a
Mark Wiebe ENH: datetime-bday: Connect busday_offset so it can be called from Python
8a8a84a
Mark Wiebe ENH: datetime-autounit: Unit detection working with arrays, fix ufunc reductions

The default NPY_DATETIME type was still in microseconds, because it
wasn't using the NPY_DATETIME_DEFAULTUNIT macro as it should have been.
The reduction functions in ufuncs didn't respect the metadata appropriately.
db62c35
Mark Wiebe TST: datetime-bday: Write some tests for busday_offset 29a3ceb
Mark Wiebe ENH: datetime-bday: Functions to get and normalize a list of holidays 3d8fda5
Mark Wiebe ENH: datetime-bday: Get holidays working with busday_offset b90d182
Mark Wiebe ENH: datetime-bday: Create the np.busdaydef business day definition object
68f1cf2
Mark Wiebe ENH: datetime-bday: Cache the business day count in the weekmask in busdaydef
e164d94
Mark Wiebe ENH: datetime-bday: Move the weekmask and holidays list convertors to busdaydef
f7ae666
Mark Wiebe ENH: datetime-bday: Got busday_count function working 0418574
Mark Wiebe ENH: datetime-bday: Got is_busday function working, completed business day API
84e1f7d
Mark Wiebe DOC: datetime-bday: Document the datetime business day functions 6b5a42a
@charris charris and 1 other commented on an outdated diff Jun 15, 2011
numpy/add_newdocs.py
@@ -5936,6 +5936,268 @@ add_newdoc('numpy.core.multiarray', 'dtype', ('newbyteorder',
##############################################################################
#
+# Datetime-related Methods
+#
+##############################################################################
+
+add_newdoc('numpy.core.multiarray', 'busdaydef',
+ """
+ busdaydef(weekmask='1111100', holidays=None)
@charris
charris Jun 15, 2011

Hmm, the name could be more descriptive, especially as the return object is given the same name. Maybe work "calendar" in there somehow: busdaycal{endar} or some such.

@mwiebe
mwiebe Jun 15, 2011

Maybe "busdaycalendar" for the class name, and "busdaycal" for the parameter name in the functions.

@charris charris and 1 other commented on an outdated diff Jun 15, 2011
numpy/add_newdocs.py
+ holidays : array_like of datetime64[D]
+ An array of dates which should be blacked out from being considered
+ as business days. They may be specified in any order, and NaT
+ (not-a-time) dates are ignored. Internally, this list is normalized
+ into a form suited for fast business day calculations.
+ busdaydef : busdaydef
+ A `busdaydef` object which specifies the business days. If this
+ parameter is provided, neither weekmask nor holidays may be
+ provided.
+ out : array of bool
+ If provided, this array is filled with the result.
+
+ Returns
+ -------
+ out : array of bool
+ An array containing True for each valid business day, and
@charris
charris Jun 15, 2011

The booleans corresponding to the entries in dates?

@mwiebe
mwiebe Jun 15, 2011

Yes, I'll make that more clear.

@charris charris and 1 other commented on an outdated diff Jun 15, 2011
numpy/add_newdocs.py
+
+add_newdoc('numpy.core.multiarray', 'busday_offset',
+ """
+ busday_offset(dates, offsets, roll='raise', weekmask='1111100', holidays=None, busdaydef=None, out=None)
+
+ First adjusts the date to fall on a business day according to
+ the ``roll`` rule, then applies offsets to the given dates
+ counted in business days.
+
+ Parameters
+ ----------
+ dates : array_like of datetime64[D]
+ The array of dates to process.
+ offsets : array_like of integer
+ The array of offsets, which is broadcast with ``dates``.
+ roll : {'raise', 'nat', 'forward', 'following', 'backward', 'preceding', 'modifiedfollowing', 'modifiedpreceding'}
@charris
charris Jun 15, 2011

Not quite sure how to handle the long line.

@mwiebe
mwiebe Jun 15, 2011

Yeah, I'm not so clear on everything about reStructuredText, and the warning messages from Sphinx provide no help...

@charris charris commented on the diff Jun 15, 2011
numpy/core/include/numpy/ndarraytypes.h
@@ -233,7 +233,6 @@ typedef enum {
NPY_FR_Y, /* Years */
@charris
charris Jun 15, 2011

Could these go in npy_common.h?

@mwiebe
mwiebe Jun 15, 2011

Yeah, that sounds fine. I'm wanting to rename away from the 'frequency' nomenclature as well.

@charris charris commented on the diff Jun 15, 2011
numpy/core/include/numpy/ndarraytypes.h
+ NPY_BUSDAY_PRECEDING = NPY_BUSDAY_BACKWARD,
+ /*
+ * Go forward in time to the following business day, unless it
+ * crosses a month boundary, in which case go backward
+ */
+ NPY_BUSDAY_MODIFIEDFOLLOWING,
+ /*
+ * Go backward in time to the preceding business day, unless it
+ * crosses a month boundary, in which case go forward.
+ */
+ NPY_BUSDAY_MODIFIEDPRECEDING,
+ /* Produce a NaT for non-business days. */
+ NPY_BUSDAY_NAT,
+ /* Raise an exception for non-business days. */
+ NPY_BUSDAY_RAISE
+} NPY_BUSDAY_ROLL;
/*
* This is to typedef npy_intp to the appropriate pointer size for
@charris
charris Jun 15, 2011

I've moved that into npy_common for the sort library. Looks like I'll need to do some merging ;)

@mwiebe
mwiebe Jun 15, 2011

I want to do a relatively invasive header shuffle at some point towards managing future ABI compatibility, so we'll have to make sure not to step on each other's toes with that.
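
The roll rules documented in the `NPY_BUSDAY_ROLL` enum comments above can be sketched in plain Python. This is an illustration under stated assumptions, not the C implementation: `apply_roll` and `is_busday` are hypothetical helpers operating on `datetime.date`, the weekmask string is read Monday-first, and 'following' is treated as an alias of 'forward' by analogy with `NPY_BUSDAY_PRECEDING = NPY_BUSDAY_BACKWARD`.

```python
from datetime import date, timedelta


def is_busday(day, weekmask="1111100", holidays=frozenset()):
    """True if `day` is enabled in the Monday-first weekmask and not a holiday."""
    return weekmask[day.weekday()] == "1" and day not in holidays


def apply_roll(day, roll="raise", weekmask="1111100", holidays=frozenset()):
    """Move a non-business day onto a business day according to `roll`."""
    if "1" not in weekmask:
        raise ValueError("weekmask must enable at least one day")
    if is_busday(day, weekmask, holidays):
        return day

    def nearest(direction):
        # Step one day at a time until a business day is found.
        current = day
        while not is_busday(current, weekmask, holidays):
            current += timedelta(days=direction)
        return current

    def same_month(other):
        return (other.year, other.month) == (day.year, day.month)

    if roll == "raise":
        raise ValueError("%s is not a business day" % day)
    if roll == "nat":
        return None  # None stands in for NaT in this sketch
    if roll in ("forward", "following"):
        return nearest(+1)
    if roll in ("backward", "preceding"):
        return nearest(-1)
    if roll == "modifiedfollowing":
        candidate = nearest(+1)
        return candidate if same_month(candidate) else nearest(-1)
    if roll == "modifiedpreceding":
        candidate = nearest(-1)
        return candidate if same_month(candidate) else nearest(+1)
    raise ValueError("unknown roll rule: %r" % roll)
```

For example, Saturday 2011-07-30 rolls to Monday 2011-08-01 under 'following', but to Friday 2011-07-29 under 'modifiedfollowing' because the forward roll would cross the month boundary.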

@charris charris and 1 other commented on an outdated diff Jun 15, 2011
numpy/core/src/multiarray/_datetime.h
@@ -35,6 +47,10 @@ convert_datetimestruct_to_datetime(PyArray_DatetimeMetaData *meta,
const npy_datetimestruct *dts,
npy_datetime *out);
+/* Extracts the month number from a 'datetime64[D]' value */
@charris
charris Jun 15, 2011

Is there an epoch?

@mwiebe
mwiebe Jun 15, 2011

The epoch is currently always January 1, 1970 in the datetime64 dtype.

@charris charris commented on the diff Jun 15, 2011
numpy/core/src/multiarray/_datetime.h
@@ -175,23 +192,48 @@ append_metastr_to_string(PyArray_DatetimeMetaData *meta,
NPY_NO_EXPORT int
get_datetime_iso_8601_strlen(int local, NPY_DATETIMEUNIT base);
-
/*
* Parses (almost) standard ISO 8601 date strings. The differences are:
*
@charris
charris Jun 15, 2011

If there is an official document somewhere, a link could be useful.

@mwiebe
mwiebe Jun 15, 2011

I've just written the business day API documentation so far. With the datetime dtype and API in flux, I didn't want to write too much with a particular point of view until that settled a bit more.

@charris charris commented on the diff Jun 15, 2011
numpy/core/src/multiarray/ctors.c
#include "lowlevel_strided_loops.h"
+#include "_datetime.h"
@charris
charris Jun 15, 2011

What is the reason for the underscore in _datetime.h ?

@mwiebe
mwiebe Jun 15, 2011

That's historical, and seems to be because datetime.h is the CPython header for accessing the Python datetime library objects.

@charris charris and 1 other commented on an outdated diff Jun 15, 2011
numpy/core/src/multiarray/ctors.c
@@ -1694,6 +1690,27 @@ PyArray_FromAny(PyObject *op, PyArray_Descr *newtype, int min_depth,
}
}
}
+ /* Treat datetime generic units with the same idea as flexible strings */
@mwiebe
mwiebe Jun 15, 2011

Sure, I'll add some elaboration in that comment.

@charris charris commented on the diff Jun 15, 2011
numpy/core/src/multiarray/datetime.c
};
-/*
- ====================================================
- }
- == Beginning of section borrowed from mx.DateTime ==
- ====================================================
-*/
-
-/*
- * Functions in the following section are borrowed from mx.DateTime version
- * 2.0.6, and hence this code is subject to the terms of the egenix public
- * license version 1.0.0
- */
@charris
charris Jun 15, 2011

provided, however, that the eGenix.com Public License Agreement is retained in the Software, or in any derivative version of the Software prepared by Licensee.

Do we need to do more?

@mwiebe
mwiebe Jun 15, 2011

I eliminated all the eGenix source code, which is why I removed this. There was some sloppy coding like calculating leap years with floating point, so I just rewrote everything that the eGenix code was being used for.

@charris charris commented on the diff Jun 15, 2011
numpy/core/src/multiarray/datetime.c
@@ -1101,6 +1128,15 @@ parse_datetime_metadata_from_metastr(char *metastr, Py_ssize_t len,
{
char *substr = metastr, *substrend = NULL;
+ /* Treat the empty string as generic units */
@charris
charris Jun 15, 2011

What's the difference between generic and default units?

@mwiebe
mwiebe Jun 15, 2011

Here's the email thread I created about this:

http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056718.html

I'm really liking the resulting behavior this is giving while I'm coding all the datetime tests and playing around with it. The change was to introduce generic units, and to make them the default.

@charris charris and 1 other commented on an outdated diff Jun 15, 2011
numpy/core/src/multiarray/datetime.c
@@ -1503,6 +1504,11 @@ get_datetime_conversion_factor(PyArray_DatetimeMetaData *src_meta,
/* If something overflowed, make both num and denom 0 */
if (denom == 0) {
+ PyErr_Format(PyExc_OverflowError,
+ "Integer overflow getting a conversion factor between "
@charris
charris Jun 15, 2011

Maybe "Integer overflow while computing conversion factor between "

Looks pretty long, a linebreak in there somewhere might help.

@mwiebe
mwiebe Jun 15, 2011

I was under the impression that it wasn't a good idea to put a line break inside exception messages; there are many more places to change if that's desired.

@charris charris commented on the diff Jun 15, 2011
numpy/core/src/multiarray/datetime.c
@@ -1754,75 +1760,129 @@ compute_datetime_metadata_greatest_common_divisor(
/* Compute the GCD of the resulting multipliers */
num = _uint64_euclidean_gcd(num1, num2);
- /* Create and return the metadata capsule */
- dt_data = PyArray_malloc(sizeof(PyArray_DatetimeMetaData));
- if (dt_data == NULL) {
- return PyErr_NoMemory();
- }
-
- dt_data->base = base;
- dt_data->num = (int)num;
- if (dt_data->num <= 0 || num != (npy_uint64)dt_data->num) {
+ /* Fill the 'out_meta' values */
+ out_meta->base = base;
+ out_meta->num = (int)num;
@mwiebe
mwiebe Jun 15, 2011

I'm not sure I understand your question. For the promotion, it returns a common type, which may additionally be checked by other code that it satisfies a casting condition. In datetime promotion, I'm causing promotion across a nonlinear unit boundary to fail, but that's independent of the casting rules.
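
As a rough illustration of GCD-based unit promotion that fails across a nonlinear boundary, here is a Python sketch. It is a simplification under assumptions: metadata is modelled as a `(base, num)` tuple, only a subset of units is listed, and years/months are treated as one linear family with weeks-through-nanoseconds as another. The function name `promote_meta` is hypothetical, and the C code in this pull request may draw the boundary differently.

```python
from math import gcd

# Size of each base unit measured in the finest unit of its family.
MONTH_FAMILY = {"Y": 12, "M": 1}
TIME_FAMILY = {
    "W": 7 * 86400 * 10**9,
    "D": 86400 * 10**9,
    "h": 3600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}


def promote_meta(meta1, meta2):
    """Return a common (base, num) that evenly divides both inputs."""
    (base1, num1), (base2, num2) = meta1, meta2
    for family in (MONTH_FAMILY, TIME_FAMILY):
        if base1 in family and base2 in family:
            break
    else:
        # No fixed ratio exists between, say, months and days.
        raise TypeError("cannot promote %s with %s" % (base1, base2))
    # Use the more precise of the two base units.
    base = base1 if family[base1] <= family[base2] else base2
    scale = family[base]
    # Express both multipliers in the chosen base, then take their GCD.
    scaled1 = num1 * (family[base1] // scale)
    scaled2 = num2 * (family[base2] // scale)
    return (base, gcd(scaled1, scaled2))
```

Under these assumptions, promoting one day with a 12-hour unit gives `("h", 12)`, while promoting months with days raises `TypeError`.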

@charris charris commented on the diff Jun 15, 2011
numpy/core/src/multiarray/datetime.c
@@ -2344,6 +2449,15 @@ parse_iso_8601_date(char *str, int len, npy_datetimestruct *out)
tolower(str[2]) == 'w') {
time_t rawtime = 0;
PyArray_DatetimeMetaData meta;
+
+ /* 'now' only works for units of hours or smaller */
@charris
charris Jun 15, 2011

I assume the 'now' concept is documented somewhere.

@mwiebe
mwiebe Jun 15, 2011

Not yet, but writing documentation is on my todo list.
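
Until that documentation exists, the restriction described in the commit message ('now' needs time-like units, 'today' needs date-like units) might be sketched as follows. The helper name and the unit lists are assumptions for illustration only, and the lists cover just a subset of possible units.

```python
DATE_UNITS = ("Y", "M", "W", "D")
TIME_UNITS = ("h", "m", "s", "ms", "us", "ns")


def check_special_string(text, unit):
    """Return True if `text` is 'now' or 'today' and valid for `unit`.

    Raises ValueError when the special string is used with the wrong
    kind of unit; returns False for any other string.
    """
    word = text.lower()  # the C parser compares with tolower(), so ignore case
    if word == "now":
        if unit not in TIME_UNITS:
            raise ValueError("'now' requires units of hours or smaller")
        return True
    if word == "today":
        if unit not in DATE_UNITS:
            raise ValueError("'today' requires date-like units")
        return True
    return False
```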

@charris charris and 1 other commented on an outdated diff Jun 15, 2011
numpy/core/src/multiarray/datetime.c
+ return NULL;
+ }
+
+ /* Calculate the array length */
+ if (values[2] > 0 && values[1] > values[0]) {
+ length = (values[1] - values[0] + (values[2] - 1)) / values[2];
+ }
+ else if (values[2] < 0 && values[1] < values[0]) {
+ length = (values[1] - values[0] + (values[2] + 1)) / values[2];
+ }
+ else if (values[2] != 0) {
+ length = 0;
+ }
+ else {
+ PyErr_SetString(PyExc_ValueError,
+ "arange: step may not be zero");
@charris
charris Jun 15, 2011

"Cannot" would be better than "may not".
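
The length calculation in the diff above is a ceiling division written with integer arithmetic. A Python transcription, with `start`, `stop`, and `step` standing in for `values[0]`, `values[1]`, and `values[2]`, might look like the sketch below. Python's `//` floors while C's `/` truncates, but in both guarded branches the numerator and the denominator share a sign, so the two agree.

```python
def arange_length(start, stop, step):
    """Number of elements in start, start + step, ... strictly before stop."""
    if step > 0 and stop > start:
        # Ceiling of (stop - start) / step for a positive step.
        return (stop - start + (step - 1)) // step
    if step < 0 and stop < start:
        # Same idea with both numerator and denominator negative.
        return (stop - start + (step + 1)) // step
    if step != 0:
        # The range is empty: the step points away from stop.
        return 0
    raise ValueError("arange: step cannot be zero")
```

The result matches `len(range(start, stop, step))` for integer inputs, which gives a convenient cross-check.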

@charris charris commented on the diff Jun 15, 2011
numpy/core/src/multiarray/datetime_busday.c
@@ -0,0 +1,1302 @@
+/*
@charris
charris Jun 15, 2011

This is as far as I got tonight, I'll try to finish tomorrow.

@mwiebe
mwiebe Jun 15, 2011

Thanks for the reviewing!

@rgommers

Small comment on type descriptions, applies to many items in this commit:
"string" should be "str", "integer" should be "int", and keyword args should have ", optional" appended

NumPy member

I've added another commit which should fix up those conventions.

@rgommers

Like the docstrings. Very complete.

NumPy member

Thanks!

Mark Wiebe added some commits Jun 15, 2011
Mark Wiebe STY: datetime-feedback: Rename np.busdaydef -> np.busdaycalendar
Also rename the busdaydef parameters to busdaycal parameters. This
change was motivated by Chuck's code review feedback.
d117335
Mark Wiebe DOC: datetime-feedback: Applying Ralf's feedback for the parameter conventions
31056d4
@charris charris commented on the diff Jun 16, 2011
numpy/core/src/multiarray/datetime_busday.c
@@ -0,0 +1,1302 @@
+/*
+ * This file implements business day functionality for NumPy datetime.
+ *
+ * Written by Mark Wiebe (mwwiebe@gmail.com)
+ * Copyright (c) 2011 by Enthought, Inc.
+ *
+ * See LICENSE.txt for the license.
+ */
+
+#define PY_SSIZE_T_CLEAN
@charris
charris Jun 16, 2011

You know, I can't find this macro anywhere except with the #define

@mwiebe
mwiebe Jun 16, 2011

It's a signal to Python.h regarding the PyArg_Parse* functions. I've just propagated it from other C files in this case.

@charris
charris Jun 16, 2011

Yeah, I found it. It's going to go away somewhere in 3.*

@charris charris and 1 other commented on an outdated diff Jun 16, 2011
numpy/core/src/multiarray/datetime_busday.c
+ int day_of_week;
+
+ /* Get the day of the week for 'date' (1970-01-05 is Monday) */
+ day_of_week = (int)((date - 4) % 7);
+ if (day_of_week < 0) {
+ day_of_week += 7;
+ }
+
+ return day_of_week;
+}
+
+/*
+ * Returns 1 if the date is a holiday (contained in the sorted
+ * list of dates), 0 otherwise.
+ *
+ * The holidays list should be normalized.
@mwiebe
mwiebe Jun 16, 2011

I'm expanding this comment.
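
The day-of-week calculation in the diff above relies on C's `%` returning a remainder with the sign of the dividend, hence the `+= 7` correction. A Python sketch that emulates the C behaviour makes the need for that correction visible. The helper `c_rem` is hypothetical and exists only to mimic C; dates are integer day counts since 1970-01-01.

```python
def c_rem(a, b):
    """Remainder with the sign of the dividend, like C99's % operator."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def day_of_week(days_since_epoch):
    """Return 0 for Monday through 6 for Sunday.

    Day 4 (1970-01-05) is a Monday, so shifting by 4 puts Monday at 0.
    """
    dow = c_rem(days_since_epoch - 4, 7)
    if dow < 0:
        # Without this step, dates before 1970-01-05 could yield -6..-1.
        dow += 7
    return dow
```

For day 0 (1970-01-01, a Thursday), `c_rem(-4, 7)` is `-4`, and the correction turns it into `3`.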

@charris charris commented on the diff Jun 16, 2011
numpy/core/src/multiarray/datetime_busday.c
+ * The holidays list should be normalized.
+ */
+static npy_datetime *
+find_earliest_holiday_after(npy_datetime date,
+ npy_datetime *holidays_begin, npy_datetime *holidays_end)
+{
+ npy_datetime *trial;
+
+ /* Simple binary search */
+ while (holidays_begin < holidays_end) {
+ trial = holidays_begin + (holidays_end - holidays_begin) / 2;
+
+ if (date < *trial) {
+ holidays_end = trial;
+ }
+ else if (date > *trial) {
@charris
charris Jun 16, 2011

Is the holidays list checked for repeats?

@mwiebe
mwiebe Jun 16, 2011

Yes, that's part of the "normalized" thing.
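
Because the normalized list is sorted and free of repeats, holiday lookups reduce to a binary search. A Python sketch using the standard `bisect` module is shown below. The function names are hypothetical, dates are integer day counts, and whether the C helper returns the first holiday strictly after the date or on-or-after it is not visible in the excerpt, so the on-or-after choice here is an assumption.

```python
from bisect import bisect_left


def is_holiday(day, holidays):
    """True if `day` appears in the sorted, duplicate-free `holidays` list."""
    i = bisect_left(holidays, day)
    return i < len(holidays) and holidays[i] == day


def first_holiday_on_or_after(day, holidays):
    """Return the earliest holiday >= `day`, or None if there is none."""
    i = bisect_left(holidays, day)
    return holidays[i] if i < len(holidays) else None
```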

@charris charris commented on the diff Jun 16, 2011
numpy/core/src/multiarray/datetime_busday.c
+ * + Applies the 'roll' rule to the date to either produce NaT, raise
+ * an exception, or land on a valid business day.
+ * + Adds 'offset' business days to the valid business day found.
+ * + Sets the value in 'out' if provided, or the allocated output array
+ * otherwise.
+ */
+NPY_NO_EXPORT PyArrayObject *
+business_day_offset(PyArrayObject *dates, PyArrayObject *offsets,
+ PyArrayObject *out,
+ NPY_BUSDAY_ROLL roll,
+ npy_bool *weekmask, int busdays_in_weekmask,
+ npy_datetime *holidays_begin, npy_datetime *holidays_end)
+{
+ PyArray_DatetimeMetaData temp_meta;
+ PyArray_Descr *dtypes[3] = {NULL, NULL, NULL};
+
@charris
charris Jun 16, 2011

I don't really mind the blank lines in the variable declarations, but they are a bit non-standard.

@mwiebe
mwiebe Jun 16, 2011

I was trying to break things up to make it more readable, generally.

@charris charris and 1 other commented on an outdated diff Jun 16, 2011
numpy/core/src/multiarray/datetime_busday.c
+ }
+ out = (PyArrayObject *)out_in;
+ }
+
+ ret = business_day_offset(dates, offsets, out, roll,
+ weekmask, busdays_in_weekmask,
+ holidays.begin, holidays.end);
+
+ Py_DECREF(dates);
+ Py_DECREF(offsets);
+ if (allocated_holidays && holidays.begin != NULL) {
+ PyArray_free(holidays.begin);
+ }
+
+ return out == NULL ? PyArray_Return(ret) : (PyObject *)ret;
+fail:
@charris
charris Jun 16, 2011

A blank line above the label would be appropriate. There are other cases down the line.

@charris charris commented on the diff Jun 16, 2011
numpy/core/src/multiarray/datetime_busdaycal.c
+ *
+ * Returns the number of dates left after removing weekmask-excluded
+ * dates.
+ */
+NPY_NO_EXPORT void
+normalize_holidays_list(npy_holidayslist *holidays, npy_bool *weekmask)
+{
+ npy_datetime *dates = holidays->begin;
+ npy_intp count = holidays->end - dates;
+
+ npy_datetime lastdate = NPY_DATETIME_NAT;
+ npy_intp trimcount, i;
+ int day_of_week;
+
+ /* Sort the dates */
+ qsort(dates, count, sizeof(npy_datetime), &qsort_datetime_compare);
@charris
charris Jun 16, 2011

Might want to use the sorting library later.
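
The normalization step discussed in this thread (sort, ignore NaT, drop repeats, drop dates the weekmask already excludes) can be sketched in Python as follows. This is an illustration rather than a transcription of `normalize_holidays_list`: `None` stands in for NaT, dates are integer day counts since 1970-01-01, and `weekmask` is a Monday-first sequence of seven booleans.

```python
NAT = None  # stand-in for NPY_DATETIME_NAT in this sketch


def day_of_week(day):
    # Python's % is already non-negative for a positive modulus.
    return (day - 4) % 7  # day 4 (1970-01-05) is a Monday


def normalize_holidays(holidays, weekmask):
    """Return the sorted, de-duplicated holidays that fall on enabled weekdays."""
    kept = []
    last = NAT
    for day in sorted(d for d in holidays if d is not NAT):
        if day == last:
            continue  # repeated entry
        last = day
        if weekmask[day_of_week(day)]:
            kept.append(day)  # only keep holidays on potential business days
    return kept
```

With a Monday-to-Friday weekmask, `normalize_holidays([4, None, 4, 2, 3, 11], ...)` keeps only `[4, 11]`, since days 2 and 3 fall on a weekend.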

@charris
NumPy member

I really like that you have documented a lot of the C functions (although not quite all); it's refreshing. The code looks very nice overall.

@charris
NumPy member

Have you tested what happens with negative times? There are a fair number of integer divisions scattered about...

@charris charris closed this Jun 16, 2011
@charris charris reopened this Jun 16, 2011
@mwiebe
NumPy member

Thanks again for the feedback. I've tried to test the negative cases by including dates before and after 1970 in many tests, and focusing the tests on potential problem cases.

@charris
NumPy member

It is the integer divisions that concern me since they round towards zero and GCD could be negative, etc.

@mwiebe
NumPy member

The gcd function operates on unsigned ints. I've tried to be pretty careful about the usage of integer division; can you point out the specific cases that are bothering you?

@charris
NumPy member

No, I just know from experience that it can be a problem and wanted to be sure you were aware of it.
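
The general concern can be shown with a small Python sketch. It is not taken from the code in this pull request; `c_div` is a hypothetical helper that mimics C's truncating division so the difference from floor division is visible when converting a negative hour count to days.

```python
def c_div(a, b):
    """Integer division truncating toward zero, as in C99."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def hours_to_days_truncating(hours):
    # One hour before the epoch lands on day 0, which is the wrong day.
    return c_div(hours, 24)


def hours_to_days_flooring(hours):
    days = c_div(hours, 24)
    if hours < 0 and days * 24 != hours:
        # Negative value with a remainder: step down to the earlier day.
        days -= 1
    return days
```

One hour before the epoch belongs to 1969-12-31, which is day -1. The truncating version returns 0, while the corrected version returns -1.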

Mark Wiebe and others added some commits Jun 16, 2011
Mark Wiebe BUG: dtype: Cleanups and fix parsing datetime dtypes with an endian specifier

One fairly major improvement is that parsing with kind and a size now
makes sure there isn't any garbage after the size.
e6bffff
Derek Homeier BUG: Py3k: some of the string type-related failures in numpy/core/tests
MW: I've removed the asbytes part and changed 'S5' to 'S0' from
    Derek's original commit.
e9f4e75
Mark Wiebe BUG: core: promote_types wasn't always returning NBO data types 8019d91
Mark Wiebe BUG: ufunc: Type promotion output must always be in NBO (fixes #1867) afe25c1
Mark Wiebe ENH: datetime-ufunc: Add m8 / m8 -> f8 case to the datetime ufunc operations
2d7d59a
@mwiebe mwiebe merged commit 2d7d59a into numpy:master Jun 16, 2011