PERF: Release GIL on some datetime ops #11263

Merged
merged 1 commit into from Oct 17, 2015

3 participants
Contributor

chris-b1 commented Oct 8, 2015

This is a WIP, but far enough along I thought I'd share and see if the approach was reasonable.

This releases the GIL on most vectorized field accessors (e.g. dt.year) and on conversion to and from Period. There may be other places where it could be done; it would obviously be nice for parsing, but I'm not sure that's possible.

Contributor

jreback commented Oct 8, 2015

ohh nice!

can you share some timings?

@jreback jreback commented on an outdated diff Oct 8, 2015

pandas/src/period.pyx
@@ -164,10 +165,11 @@ def periodarr_to_dt64arr(ndarray[int64_t] periodarr, int freq):
     out = np.empty(l, dtype='i8')
     for i in range(l):
-        if periodarr[i] == iNaT:
-            out[i] = iNaT
-            continue
-        out[i] = period_ordinal_to_dt64(periodarr[i], freq)
+        with nogil:
+            if periodarr[i] == NPY_NAT:
@jreback

jreback Oct 8, 2015

Contributor

move nogil outside the loop

@jreback jreback commented on an outdated diff Oct 8, 2015

pandas/src/period.pyx
     cdef:
         pandas_datetimestruct dts
         date_info dinfo
         float subsecond_fraction
-    if ordinal == iNaT:
+    if ordinal == NPY_NAT: #TODO: does this break anything?
@jreback

jreback Oct 8, 2015

Contributor

NPY_NAT and iNaT are equivalent

NPY_NAT just has a C type declared
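
For readers unfamiliar with the sentinel: both names refer to the smallest signed 64-bit integer, which marks a missing timestamp or period ordinal. Below is a minimal pure-Python sketch of the pass-through check from the diff above. The function name `convert_ordinals` and the `convert` callback are hypothetical stand-ins (the callback takes the place of `period_ordinal_to_dt64`), not part of pandas.

```python
# Sentinel for "not a time": the minimum value of a signed 64-bit integer.
NPY_NAT = -2**63


def convert_ordinals(ordinals, convert):
    """Apply `convert` to each ordinal, passing the NaT sentinel through.

    `convert` is a hypothetical stand-in for period_ordinal_to_dt64.
    """
    out = []
    for ordinal in ordinals:
        if ordinal == NPY_NAT:
            # Missing values are copied as-is and never converted.
            out.append(NPY_NAT)
            continue
        out.append(convert(ordinal))
    return out


# Toy conversion (multiply by 10) just to show the control flow.
result = convert_ordinals([0, NPY_NAT, 2], lambda o: o * 10)
```

Because the comparison is between plain integers, the check itself needs no Python objects, which is what allows the Cython version to run without the GIL.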

Contributor

chris-b1 commented Oct 8, 2015

Here are some timings. The multi-threaded case gets a pretty nice speedup, and the single-threaded case looks about flat.

In [1]: import pandas as pd
   ...: from pandas.util.testing import test_parallel
In [2]: dti = pd.date_range('1900-1-1', periods=100000)

In [3]: def f():
   ...:     for i in range(4):
   ...:         dti.year
In [4]: @test_parallel(4)
   ...: def g():
   ...:     dti.year

In [8]: %timeit f()
10 loops, best of 3: 25.8 ms per loop

In [9]: %timeit g()
100 loops, best of 3: 7.71 ms per loop
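
To make the benchmark easier to read, here is a rough sketch of what a decorator like `test_parallel(4)` does conceptually: it calls the wrapped function once in each of several threads and waits for all of them. The name `run_in_threads` is hypothetical and this is not the pandas implementation, only an illustration using the standard library.

```python
import functools
import threading


def run_in_threads(num_threads):
    """Hypothetical decorator: run the wrapped function once per thread."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            threads = [
                threading.Thread(target=func, args=args, kwargs=kwargs)
                for _ in range(num_threads)
            ]
            for thread in threads:
                thread.start()
            # Block until every thread has finished its call.
            for thread in threads:
                thread.join()
        return wrapper
    return decorator


calls = []


@run_in_threads(4)
def record_call():
    # list.append is thread-safe in CPython, so no extra lock is needed here.
    calls.append(threading.get_ident())


record_call()
```

If `dti.year` held the GIL for its whole duration, the four threads in `g()` would effectively run one after another and `g()` would take about as long as `f()`. The measured 25.8 ms versus 7.71 ms (roughly 3.3x) suggests the threads now overlap.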

@jreback jreback commented on an outdated diff Oct 8, 2015

pandas/tslib.pyx
- pandas_datetime_to_datetimestruct(dtindex[i], PANDAS_FR_ns, &dts)
- out[i] = monthrange(dts.year, dts.month)[1]
+ pandas_datetime_to_datetimestruct(dtindex[i], PANDAS_FR_ns, &dts)
+ out[i] = days_per_month_table[is_leapyear(dts.year)][dts.month-1]
@jreback

jreback Oct 8, 2015

Contributor

It probably makes sense to define the days_per_month lookup as a C function and mark it nogil.
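
The lookup in the diff swaps a `monthrange(...)` call for a table indexed by leap-year flag and month. A pure-Python sketch of that idea follows, checked against the standard library's `calendar.monthrange`. The table contents and the helper `days_in_month` are written here for illustration and are assumed, not copied from the pandas source.

```python
import calendar

# Row 0: common year, row 1: leap year. Columns are January..December.
DAYS_PER_MONTH_TABLE = (
    (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31),
    (31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31),
)


def is_leapyear(year):
    """Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year, month):
    """Table lookup; `month` is 1-based, as in the diff's dts.month - 1."""
    # A bool indexes the outer tuple as 0 (False) or 1 (True).
    return DAYS_PER_MONTH_TABLE[is_leapyear(year)][month - 1]


# Cross-check against the standard library for one leap February.
feb_2000 = days_in_month(2000, 2)
reference = calendar.monthrange(2000, 2)[1]
```

Since the lookup only touches integers and a constant table, the equivalent C-level function can be declared nogil and called from inside a `with nogil:` block.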

@kawochen kawochen commented on the diff Oct 15, 2015

pandas/tslib.pyx
@@ -3849,6 +3849,7 @@ def get_time_micros(ndarray[int64_t] dtindex):
@cython.wraparound(False)
+@cython.boundscheck(False)
def get_date_field(ndarray[int64_t] dtindex, object field):
@kawochen

kawochen Oct 15, 2015

Contributor

If you declared field as char[:] instead would you be able to nogil the whole thing until raise?

@chris-b1

chris-b1 Oct 16, 2015

Contributor

Hmm, I tried that out, but Cython doesn't seem to take a view of strings like that? See http://stackoverflow.com/questions/28203670/how-to-use-cython-typed-memoryviews-to-accept-strings-from-python
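
A plain-Python analogue of the problem, offered as an illustration rather than as Cython's exact behaviour: Python's built-in `memoryview` only accepts objects that expose the buffer protocol, and immutable byte strings expose a read-only buffer. As far as I understand, a non-const `char[:]` typed memoryview asks for a writable buffer, which would explain why a string argument is rejected. The helper `buffer_info` is hypothetical.

```python
def buffer_info(obj):
    """Return (supports_buffer, readonly) for `obj`."""
    try:
        view = memoryview(obj)
    except TypeError:
        # The object does not expose the buffer protocol at all.
        return (False, None)
    return (True, view.readonly)


text_info = buffer_info("year")                    # Python 3 str
bytes_info = buffer_info(b"year")                  # immutable bytes
bytearray_info = buffer_info(bytearray(b"year"))   # mutable bytearray
```

Under Python 3, only the `bytearray` yields a writable view, so keeping `field` as a Python object and comparing it before entering the nogil section is the simpler route.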

jreback added this to the 0.17.1 milestone Oct 16, 2015

Contributor

jreback commented Oct 16, 2015

@chris-b1 looks good. Can you add a whatsnew note (perf) and squash?

chris-b1 changed the title from (WIP) PERF: Release GIL on some datetime ops to PERF: Release GIL on some datetime ops Oct 16, 2015

Contributor

chris-b1 commented Oct 17, 2015

@jreback - updated

@jreback jreback added a commit that referenced this pull request Oct 17, 2015

@jreback jreback Merge pull request #11263 from chris-b1/tslib-gil
PERF: Release GIL on some datetime ops
7e5b223

@jreback jreback merged commit 7e5b223 into pandas-dev:master Oct 17, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Contributor

jreback commented Oct 17, 2015

thanks!

Contributor

jreback commented Oct 20, 2015

@chris-b1 can you add boundscheck(False) where these warnings point (clean, then make again to see them)?

warning: pandas/src/period.pyx:144:24: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:145:23: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:147:55: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:148:19: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:169:24: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:170:19: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:172:15: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:172:53: Use boundscheck(False) for faster access
building 'pandas._period' extension

chris-b1 deleted the chris-b1:tslib-gil branch Oct 21, 2015
