
PERF: releasing the GIL, #8882 #10199

Merged
merged 2 commits into from
Jun 30, 2015
Conversation

@jreback (Contributor) commented May 22, 2015

closes #8882
closes #10139

This is now implemented for all groupbys, including the factorization part (which is actually a big part of the time).

In [2]: N = 1000000

In [3]: ngroups = 1000

In [4]: np.random.seed(1234)

In [5]: df = DataFrame({'key' : np.random.randint(0,ngroups,size=N),
   ...:                 'data' : np.random.randn(N) })
def f():
    df.groupby('key')['data'].sum()

# run consecutively
def g2():
    for i in range(2):
        f()
def g4():
    for i in range(4):
        f()
def g8():
    for i in range(8):
        f()

# run in parallel
@test_parallel(num_threads=2)
def pg2():
    f()

@test_parallel(num_threads=4)
def pg4():
    f()

@test_parallel(num_threads=8)
def pg8():
    f()
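The `test_parallel` decorator above comes from pandas' benchmarking utilities. A minimal stdlib-only sketch of what it does (an assumption for illustration, not pandas' actual implementation) is: run the wrapped function once per thread, concurrently, so GIL-releasing code can show real speedups.

```python
import threading

def test_parallel(num_threads=2):
    """Decorator: run the wrapped function once in each of `num_threads`
    threads, starting them together and waiting for all to finish."""
    def wrapper(fn):
        def inner(*args, **kwargs):
            threads = [
                threading.Thread(target=fn, args=args, kwargs=kwargs)
                for _ in range(num_threads)
            ]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
        return inner
    return wrapper
```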

So we're seeing a nice scaling curve as additional cores are used (compared to some of my previous posts).

In [19]: df
Out[19]: 
   after  before   speedup
2   27.3    52.1  1.908425
4   44.8   105.0  2.343750
8   83.2   213.0  2.560096

@jreback jreback added the Performance Memory or execution speed performance label May 22, 2015
@jreback jreback added this to the 0.17.0 milestone May 22, 2015
@mrocklin (Contributor):
What was the deal with NA and nan? These were Python objects?

@jreback (Contributor, Author) commented May 22, 2015

Apparently, doing something like

cdef ndarray[double_t] foo

foo = np.empty(10, dtype='float64')

foo[0] = np.nan

actually assigns a Python object (and does inference and such), so you MUST hold the GIL; but casting to a double works, as it is then a C-level assignment:

cdef double nan = <double> np.nan
with nogil:
    foo[0] = nan

@ahojnnes:
You can reuse NPY_NAN cross-platform, e.g., https://github.com/scikit-image/scikit-image/blob/master/skimage/feature/_texture.pyx#L11

@jreback (Contributor, Author) commented May 22, 2015

@ahojnnes thanks, I changed it.

@jreback (Contributor, Author) commented May 28, 2015

I updated this. It works for all groupbys now, and it gives a general 5-10% speedup on most operations (even single-threaded), as there is slightly less generated code (under nogil Cython pretty much generates straight C code).

We still need to hold the GIL for the .resize operation, but this happens in fairly limited circumstances (resizing is a function of the number of uniques, for example), so the cost is pretty low.

There IS a solution where I could avoid numpy resize entirely, but it's quite a bit more C code and IMHO not really worth it.
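The cost argument above (resizes scale with the number of uniques, and doubling makes them rare) can be sketched in plain Python/NumPy. This is an illustration of the amortized-doubling pattern, not pandas' actual code; the function name is hypothetical.

```python
import numpy as np

def append_with_doubling(buf, count, value):
    """Store `value` at logical index `count`, doubling `buf` when full.

    With doubling, only O(log n) resizes happen for n appends, so briefly
    re-acquiring the GIL around each resize stays cheap overall.
    Returns the (possibly reallocated) buffer.
    """
    if count == len(buf):
        buf = np.resize(buf, 2 * len(buf))  # the only step that needs the GIL
    buf[count] = value
    return buf
```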

@mrocklin (Contributor):
Cool. I'll try to run a few different groupby operations with dask.dataframe sometime early next week and report back. I'm curious which operations needed the GIL released: was there a core set of functionality, or was it a large volume of trivial work?

@jreback (Contributor, Author) commented May 29, 2015

TL;DR:

There are lots of 'boilerplate' changes in generated.pyx, which contains basically all of the algorithms for the groupby ops themselves (e.g. sum, mean). These were straightforward insertions of with nogil; I just needed to avoid typing the object ones (which for the most part was easy, though a bit tricky on some non-groupby routines, e.g. see #10213).

However, the big speedups came once I fixed the hashtable.pyx factorizer (e.g. the get_labels routine). The trivial solution actually produces a very odd perf issue (see here), which is a bug / documentation issue.

In effect, if you use a nogil function that happens to have a with gil block WITHIN it, the function is way slower than if you inline the exact same code.

I 'fixed' this issue by adding some additional C structures to hold data (basically I changed the low-level implementation); these can be passed around easily, though the downside is that they are a little 'messy' in the code department.

Note to @shoyer: I didn't actually bypass the realloc of memory (e.g. .resize) of numpy arrays, though I do have a solution for that (basically manage a memory block myself, then turn it into a numpy array at the very end, in to_array; easy enough, but I was running into some compile issues, so I dropped it in the end).
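The bypass described here (manage a memory block yourself, then wrap it as a numpy array only at the end) might look roughly like the following ctypes sketch. The class and its details are assumptions for illustration, not the actual implementation; `to_array` mirrors the name mentioned above.

```python
import ctypes
import numpy as np

class GrowableBuffer:
    """Grow a raw C buffer of doubles ourselves (no numpy realloc, so
    nothing here would require re-acquiring the GIL in C), and only
    materialize an ndarray at the very end."""

    def __init__(self, capacity=8):
        self._buf = (ctypes.c_double * capacity)()
        self._n = 0

    def append(self, value):
        if self._n == len(self._buf):
            # manual doubling realloc: allocate a bigger block and copy
            new = (ctypes.c_double * (2 * len(self._buf)))()
            new[: self._n] = self._buf[:]
            self._buf = new
        self._buf[self._n] = value
        self._n += 1

    def to_array(self):
        # wrap/copy into a numpy array only once, at the end
        return np.frombuffer(self._buf, count=self._n).copy()
```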

@jreback force-pushed the gil branch 5 times, most recently from b6eea4b to 4e2acd1 on June 3, 2015
@jreback force-pushed the gil branch 2 times, most recently from d1a08d7 to bb78fef on June 12, 2015
@mrocklin (Contributor):
I think that this is awesome and am afraid of it going stale. Is there something blocking it from being merged into master?

@shoyer (Member) commented Jun 17, 2015

@mrocklin We were waiting to get the 0.16.2 release out first, which happened last Friday. At this point I don't think there are any blockers.

@jreback (Contributor, Author) commented Jun 17, 2015

Nothing blocking. I am going to update and see if I can hit some of #10213.

I have to do a bit of odd code generation because, for example, you want to simultaneously support nogil (for numeric dtypes) and the GIL (for object dtype), but nogil is a context manager (and NOT a function, which has some odd perf characteristics). IOW, you have to worry about formatting, which is a bit annoying.
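The formatting problem (``with nogil:`` opens a block, so a templated body must be re-indented only when the block is used) can be shown with a toy renderer. This is a hypothetical sketch, not pandas' actual code generator:

```python
def render(body, use_nogil):
    """Emit `body` wrapped in a `with nogil:` block (re-indented), or
    emit it unchanged for object dtypes that must keep the GIL."""
    if not use_nogil:
        return body
    indented = "\n".join("    " + line for line in body.splitlines())
    return "with nogil:\n" + indented
```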

@jreback (Contributor, Author) commented Jun 17, 2015

Actually, I do need to add a doc note to the whatsnew (and maybe a small mention in the enhancingperf section as well).

@jreback jreback self-assigned this Jun 17, 2015
@mrocklin (Contributor):
Hrm, yes, unfortunate that you can't manipulate the lock directly

@mrocklin (Contributor):
Thought I'd caress this PR with a gentle ping.

ping

@jreback (Contributor, Author) commented Jun 26, 2015

is that a euphemism for harassment?

@mrocklin (Contributor):
Kind and loving harassment, yes.

@jreback (Contributor, Author) commented Jun 26, 2015

Any commentary @jorisvandenbossche @shoyer @cpcloud @TomAugspurger?

Going to have a followup for more GIL stuff. It's actually tricky making sure it's doing (and you are measuring) exactly the right stuff.


We are releasing the global-interpreter-lock (GIL) on some cython operations.
This will allow other threads to run simultaneously during computation, potentially allowing performance improvements
from multi-threading. Notably ``groupby`` and some indexing operations are a benefit to this. (:issue:`8882`)
Review comment (Member):
i'd say "benefit from this"

# compile time specialization of the fused types
# as the cross-product is generated, but we cannot assign float->int
# the types that don't pass are pruned
if (vector_data is Int64VectorData and sixty_four_bit_scalar is int64_t) or (
@jreback (Author) commented on this hunk:
@cpcloud I got this to work after puzzling over what Cython does with multiple fused types. It creates the cross-product, but you generally need only certain specializations; it handles this beautifully as a compile-time prune, so there is no perf hit.
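A condensed sketch of the pattern being discussed, with hypothetical type and function names following the hunk above (`append_data` stands in for the real per-type helper):

```cython
ctypedef fused vector_data:
    Int64VectorData
    Float64VectorData

ctypedef fused sixty_four_bit_scalar:
    int64_t
    float64_t

cdef void append(vector_data *d, sixty_four_bit_scalar v) nogil:
    # Cython generates the full cross-product of specializations, but the
    # mismatched pairs (e.g. Int64VectorData with float64_t) fail this
    # compile-time check and are pruned, so there is no runtime cost.
    if ((vector_data is Int64VectorData and sixty_four_bit_scalar is int64_t) or
            (vector_data is Float64VectorData and sixty_four_bit_scalar is float64_t)):
        append_data(d, v)
```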

jreback added a commit that referenced this pull request Jun 30, 2015
@jreback jreback merged commit 16a44ad into pandas-dev:master Jun 30, 2015
@sinhrks (Member) commented Jun 30, 2015

Great job!!

@mrocklin (Contributor):
Woohoo!

Pulls from master, checks benchmarks.

@jorisvandenbossche (Member):
@mrocklin Can we already look forward to a new awesome blogpost? :-)

@mrocklin (Contributor):
Best I have to offer at the moment are the slowly growing dask.dataframe docs

@jorisvandenbossche (Member):
I should definitely check it out once I find some time!

@scari (Contributor) commented Jun 30, 2015

Wow! Great job! 👍🏻

jreback pushed a commit that referenced this pull request Jul 15, 2016
- [x] tests added / passed
- [x] passes ``git diff upstream/master | flake8 --diff``

Rebased version of #10229, which was [actually not](https://github.com//pull/10229#issuecomment-131470116) fixed by #10199. Nothing particularly relevant; just wanted to delete this branch locally and noticed it still applies: you'll judge what to do with it.

Author: Pietro Battiston <me@pietrobattiston.it>

Closes #13594 from toobaz/fix_checkunique and squashes the following commits:

a63bd12 [Pietro Battiston] CLN: Initialization coincides with mapping, hence with uniqueness check
Labels
Performance Memory or execution speed performance
Successfully merging this pull request may close these issues.

Release the GIL in Cython code
8 participants