
Update setting data pointers for Cython 3 #34014

Closed
1 of 2 tasks
jreback opened this issue May 5, 2020 · 34 comments
Labels
CI Continuous Integration

Comments

@jreback
Contributor

jreback commented May 5, 2020

Need to update the following locations

  • _libs/window/aggregations.pyx: bufarr.data
  • _libs/reduction.pyx: chunk.data

https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=34888&view=logs&j=33ccdd52-c922-5ef2-8209-78215e36d994&t=046ffbec-62c2-54e3-e88a-7745f4292bb6&l=457

just started failing.

cc @pandas-dev/pandas-core @pandas-dev/pandas-triage

if anyone has insights

@jreback jreback added Bug Needs Triage Issue that has not been reviewed by a pandas team member CI Continuous Integration and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 5, 2020
@jreback jreback added this to the 1.1 milestone May 5, 2020
@alimcmaster1
Member

alimcmaster1 commented May 5, 2020

Cython alpha was released today: https://pypi.org/project/Cython/#history

Locally I get the error below when building the extensions:

        buf = <float64_t *>arr.data
        bufarr = np.empty(win, dtype=float)
        oldbuf = <float64_t *>bufarr.data
        for i in range((win - offset), (N - offset)):
            buf = buf + 1
            bufarr.data = <char *>buf
                 ^
------------------------------------------------------------

pandas/_libs/window/aggregations.pyx:1385:18: Assignment to a read-only property

cc @scoder

@WillAyd
Member

WillAyd commented May 6, 2020

My guess is this is the result of cython/cython#3571

@WillAyd
Member

WillAyd commented May 6, 2020

So IIUC assigning the data pointer has been deprecated since 2013. A quick fix might be to not use the Cython alpha in CI, but in any case we probably need to fix this.

Not sure there is a true replacement in the more modern API that allows for assigning the data pointer at least according to the following mailing list:

https://mail.python.org/pipermail/numpy-discussion/2013-November/068172.html

ref cython/cython#2498 and ping @rgommers as well

@scoder

scoder commented May 6, 2020

That looks like the one evil hack that smells a bit when you implement it but otherwise works perfectly, and that drops on your feet as soon as you've forgotten about it. You already CC-ed NumPy, they are the right people to ask here. CC-ing @mattip as well, who came up with the property wrapper for these fields for Cython.

I'm happy that you didn't ask for an upstream revert of the change. :)

@bashtage
Contributor

bashtage commented May 6, 2020

arr = self.arr
chunk = self.dummy
dummy_buf = chunk.data
chunk.data = arr.data
labels = self.labels

is also very fishy. It also seems to be at the root of some very strange behavior when groupby apply is used in statsmodels/statsmodels#6603. In that example, the wrong data ended up in the OLS regression because the Series appears to carry the wrong buffer somehow. This does not seem to be safe when the applied function defers any operations.

@mattip
Contributor

mattip commented May 6, 2020

Maybe rewrite that to use bufarr = PyArray_SimpleNewFromData(1, &win, NPY_FLOAT64, buf) instead of bufarr.data = ...? As for the line pointed to by @bashtage, that also needs a rewrite to be safe.
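For reference, PyArray_SimpleNewFromData is a NumPy C-API call; the same "wrap an existing buffer in a fresh ndarray instead of mutating bufarr.data" idea can be sketched at the Python level with np.ndarray's buffer argument. This is a minimal illustration of the approach, not the actual pandas fix; the variable names here are made up:

```python
import numpy as np

# Instead of mutating bufarr.data to point into `values`, build a
# NEW zero-copy ndarray over the desired region of `values`.
values = np.arange(10, dtype=np.float64)
win = 3
offset = 2

# Python-level analogue of PyArray_SimpleNewFromData: wrap `values`'
# buffer starting at byte offset `offset * itemsize`, length `win`.
bufarr = np.ndarray(shape=(win,), dtype=np.float64,
                    buffer=values, offset=offset * values.itemsize)

assert np.shares_memory(bufarr, values)   # no copy was made
assert list(bufarr) == [2.0, 3.0, 4.0]
```

The key property is that the wrapper array shares memory with the original, so reads and writes go through to the same data without any pointer reassignment.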

@mattip
Contributor

mattip commented May 6, 2020

The Reducer class was added in a commit in response to gh-309 in 2011. Perhaps it could be rewritten to use nditer from python or the Array Iterator from C (if the former gives performance problems).
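A minimal Python-level sketch of the nditer suggestion (illustrative only, not the pandas code): with the external_loop flag the iterator hands back contiguous 1-D buffers rather than one scalar at a time, which is the usual way to cut per-element overhead without touching raw data pointers.

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)

# external_loop yields contiguous 1-D chunks; for a C-contiguous
# array iterated in default order, the whole buffer arrives as
# one chunk, so reductions can run over large contiguous spans.
chunks = [chunk.copy() for chunk in np.nditer(arr, flags=["external_loop"])]
total = sum(c.sum() for c in chunks)

assert total == 66.0  # sum of 0..11
```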

@mattip
Contributor

mattip commented May 6, 2020

Of course you could just write PyArray_BYTES(chunk) = arr.data, although that would not solve any possible bugs or race conditions due to reusing chunk inappropriately.

@jreback
Contributor Author

jreback commented May 6, 2020

ok I think we need to pin the Cython version again. @alimcmaster1 would you mind doing a PR for this?

We should certainly address these older code paths. IIRC @jbrockmendel was almost able to remove this entire module, but hit some perf issues.

@TomAugspurger
Contributor

If we don't fix these for 1.1.0 (or 1.0.4), we might consider putting an upper bound on Cython in our pyproject.toml: https://github.com/pandas-dev/pandas/blob/master/pyproject.toml. I think that's the only situation where this would affect users: people installing pandas from source after Cython 3.0 is released.

@TomAugspurger TomAugspurger added the Blocker Blocking issue or pull request for an upcoming release label May 13, 2020
@jbrockmendel
Member

was almost able to remove this entire module, but has some perf issues.

Yah, we need to decide how big a perf penalty we're willing to accept in exchange for losing the maintenance burdens in this module.

@bashtage
Contributor

In the case of dummy_buf, the question is whether the code path is correct for any function that defers evaluation outside of the apply, which can produce incorrect results. This seems like the bigger issue.

@WillAyd
Member

WillAyd commented May 14, 2020

Do you have a code sample to reproduce that issue? Can see if #34080 fixes it once green

@bashtage
Contributor

import statsmodels.api as sm
import pandas as pd
import numpy as np

np.random.seed(42)

Ys = pd.DataFrame(np.random.randn(4,10))
regressors = np.random.randn(10,2)

def regress1(s):
    return sm.OLS(s.values, regressors).fit()

results = pd.DataFrame(Ys).apply(regress1, axis='columns')
print([res.rsquared for res in results])

correct = [ sm.OLS(Ys.T[col], regressors).fit().rsquared for col in Ys.T]
print(correct)

Return

[-inf, -inf, -inf, -inf]
[0.4362196642459433, 0.011922654209731376, 0.31130804443466864, 0.16468214939479164]
c:\git\statsmodels\statsmodels\regression\linear_model.py:1702: RuntimeWarning: divide by zero encountered in double_scalars
  return 1 - self.ssr/self.uncentered_tss

The first return may vary, since the data appears to come from an uninitialized array, which contains arbitrary values.

@bashtage
Contributor

Not that simple, but it was how this issue was discovered.

@TomAugspurger
Contributor

What's the status here? We have a few places that we need to update before we can build pandas with Cython 3.x (can we compile a list in the original post? I'll start it).

Is there anything that has to be done here for pandas 1.1.0, other than perhaps setting an upper bound for Cython in our pyproject.toml?

@TomAugspurger TomAugspurger changed the title CI: numpydev build failing Update setting data pointers for Cython 3.0 Jun 17, 2020
@TomAugspurger TomAugspurger changed the title Update setting data pointers for Cython 3.0 Update setting data pointers for Cython 3 Jun 17, 2020
@WillAyd
Member

WillAyd commented Jun 17, 2020

I think we need to invest some time to get things to work with Cython 3, specifically this comment: #34014 (comment). I started down that path in #34080 but never saw it through.

@jbrockmendel
Member

jbrockmendel commented Jun 17, 2020

Yah, we need to decide how big a perf penalty we're willing to accept in exchange for losing the maintenance burdens in this module.

Re-upping this. Just ripping out libreduction entirely asvs affected are:

       before           after         ratio
     [e6e08890]       [d03335bc]
     <master>         <cy30>
+     4.98±0.03ms       42.5±0.5ms     8.54  groupby.Apply.time_scalar_function_single_col
+      8.59±0.2ms       59.4±0.4ms     6.92  frame_methods.Apply.time_apply_ref_by_name
+      17.9±0.1ms        113±0.7ms     6.32  groupby.Apply.time_scalar_function_multi_col
+         144±1ms         466±20ms     3.23  groupby.MultiColumn.time_lambda_sum
+        77.4±1ms         239±10ms     3.09  groupby.MultiColumn.time_col_select_lambda_sum
+         276±2μs          534±4μs     1.93  groupby.GroupByMethods.time_dtype_as_group('int', 'std', 'transformation')
+         277±1μs          535±6μs     1.93  groupby.GroupByMethods.time_dtype_as_group('int', 'std', 'direct')
+      49.7±0.6ms         92.1±1ms     1.85  frame_methods.Iteration.time_iteritems_indexing
+         270±2μs          495±1μs     1.83  groupby.GroupByMethods.time_dtype_as_field('int', 'std', 'direct')
+         277±2μs          507±2μs     1.83  groupby.GroupByMethods.time_dtype_as_group('float', 'std', 'direct')
+       279±0.7μs          507±2μs     1.82  groupby.GroupByMethods.time_dtype_as_group('float', 'std', 'transformation')
+         271±2μs          494±2μs     1.82  groupby.GroupByMethods.time_dtype_as_field('int', 'std', 'transformation')
+         267±2μs          421±2μs     1.58  groupby.GroupByMethods.time_dtype_as_field('float', 'std', 'transformation')
+       267±0.7μs          419±4μs     1.57  groupby.GroupByMethods.time_dtype_as_field('float', 'std', 'direct')
+         315±1ms          437±2ms     1.39  frame_methods.Nunique.time_frame_nunique
+       129±0.4ms          175±4ms     1.36  frame_methods.Apply.time_apply_user_func
+         749±6μs          983±3μs     1.31  groupby.GroupByMethods.time_dtype_as_group('int', 'sem', 'transformation')
+         731±3μs          952±6μs     1.30  groupby.GroupByMethods.time_dtype_as_field('int', 'sem', 'transformation')
+         731±2μs         947±30μs     1.30  groupby.GroupByMethods.time_dtype_as_field('int', 'sem', 'direct')
+        964±20μs      1.24±0.01ms     1.28  arithmetic.MixedFrameWithSeriesAxis.time_frame_op_with_series_axis0('ne')
+         751±2μs          960±6μs     1.28  groupby.GroupByMethods.time_dtype_as_group('float', 'sem', 'direct')
+    10.00±0.05ms       12.7±0.1ms     1.27  frame_methods.Apply.time_apply_lambda_mean
+         752±2μs          956±4μs     1.27  groupby.GroupByMethods.time_dtype_as_group('float', 'sem', 'transformation')
+     11.2±0.08ms       13.9±0.2ms     1.24  frame_methods.Apply.time_apply_np_mean
+         736±5μs          880±6μs     1.19  groupby.GroupByMethods.time_dtype_as_field('float', 'sem', 'direct')
+         732±7μs          868±3μs     1.19  groupby.GroupByMethods.time_dtype_as_field('float', 'sem', 'transformation')
+        809±40ns        938±100ns     1.16  index_cached_properties.IndexCache.time_is_monotonic('RangeIndex')
+      2.17±0.2μs       2.46±0.2μs     1.13  index_cached_properties.IndexCache.time_is_all_dates('IntervalIndex')
+      74.1±0.5ms       83.6±0.4ms     1.13  groupby.Groups.time_series_groups('int64_large')
+     10.8±0.02ms      12.2±0.07ms     1.12  groupby.MultiColumn.time_col_select_numpy_sum
+        812±20ns         903±30ns     1.11  index_cached_properties.IndexCache.time_is_monotonic('Int64Index')
+     1.30±0.03μs       1.44±0.2μs     1.11  index_cached_properties.IndexCache.time_values('DatetimeIndex')
+     1.93±0.05μs       2.14±0.1μs     1.11  index_cached_properties.IndexCache.time_inferred_type('MultiIndex')
+        457±10ns         507±20ns     1.11  index_cached_properties.IndexCache.time_is_all_dates('RangeIndex')
+        445±20ns         493±50ns     1.11  index_cached_properties.IndexCache.time_is_unique('RangeIndex')
+     1.91±0.01ms       2.11±0.5ms     1.11  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'count')
+     1.38±0.04μs      1.53±0.08μs     1.10  index_cached_properties.IndexCache.time_is_all_dates('MultiIndex')
+     2.24±0.01ms       2.47±0.4ms     1.10  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'sum')
+     2.26±0.01ms       2.48±0.4ms     1.10  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'int', 'sum')
+     3.27±0.01ms       3.60±0.5ms     1.10  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'std')
-      45.9±0.5ms       41.3±0.3ms     0.90  io.csv.ReadCSVCategorical.time_convert_post
-         552±2μs          495±2μs     0.90  groupby.GroupByMethods.time_dtype_as_group('object', 'unique', 'direct')
-         554±1μs          493±2μs     0.89  groupby.GroupByMethods.time_dtype_as_group('object', 'unique', 'transformation')
-     14.8±0.08ms       13.1±0.3ms     0.89  groupby.Nth.time_frame_nth('float64')
-     6.07±0.04ms      5.33±0.05ms     0.88  stat_ops.SeriesMultiIndexOps.time_op(0, 'skew')
-        29.7±2ms       25.7±0.3ms     0.87  stat_ops.FrameMultiIndexOps.time_op(0, 'mad')
-      30.6±0.3ms      26.3±0.08ms     0.86  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('any', 1)
-      2.17±0.3μs       1.86±0.1μs     0.86  index_cached_properties.IndexCache.time_inferred_type('CategoricalIndex')
-     5.39±0.03ms      4.59±0.03ms     0.85  index_object.SetOperations.time_operation('int', 'symmetric_difference')
-        516±20μs          434±9μs     0.84  arithmetic.NumericInferOps.time_subtract(<class 'numpy.int32'>)
-      6.74±0.2ms       5.53±0.2ms     0.82  stat_ops.FrameMultiIndexOps.time_op([0, 1], 'std')
-      20.7±0.5ms       16.6±0.3ms     0.80  stat_ops.FrameMultiIndexOps.time_op(0, 'skew')
-     7.75±0.06ms      6.15±0.04ms     0.79  stat_ops.FrameMultiIndexOps.time_op(0, 'sem')
-      1.28±0.01s       1.01±0.02s     0.79  groupby.Apply.time_copy_function_multi_col
-         518±3ms          403±3ms     0.78  groupby.Apply.time_copy_overhead_single_col
-     4.85±0.04ms      3.33±0.01ms     0.69  stat_ops.FrameMultiIndexOps.time_op(1, 'std')
-     4.90±0.02ms      3.32±0.01ms     0.68  stat_ops.FrameMultiIndexOps.time_op(0, 'std')
-         519±1ms        347±0.8ms     0.67  groupby.Transform.time_transform_lambda_max
-         188±2ms          104±2ms     0.55  groupby.TransformEngine.time_dataframe_cython

Updated: the ASVs shown are for removing all of libreduction, not just Reducer.

@bashtage
Contributor

Can't you use PyArray_New to create a zero-copy array from an existing array without reassigning the data pointer?

@mroeschke
Member

Not immediately. It looks to affect rolling.apply but I can try to play around with it

@jbrockmendel
Member

AFAICT this code

    cdef move(self, int start, int end):
        """
        For slicing
        """
        self.buf.data = self.values.data + self.stride * start
        self.buf.shape[0] = end - start

should be equivalent (ex the creation of a new ndarray object) to

    cdef move(self, int start, int end):
        self.buf = self.values[start:end]

but when I try this I get a bunch of test failures and a segfault that I can't reproduce in isolation. Am I misunderstanding what move is doing?
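At the NumPy Python level, slicing does produce a zero-copy view, which is what the proposed replacement relies on. The subtlety, sketched below with made-up names, is that rebinding self.buf creates a new ndarray object, whereas mutating buf.data updated the existing object in place; any caller holding a reference to the old buf would keep seeing the old object. Whether that is the actual cause of the failures here is an open question:

```python
import numpy as np

# Slicing shares memory with the parent, like the pointer arithmetic did.
values = np.arange(10, dtype=np.float64)
buf = values[2:5]
assert np.shares_memory(buf, values)

# The catch: rebinding creates a NEW ndarray object. A caller that
# cached a reference to the old buf keeps the stale object, whereas
# the old buf.data assignment mutated the cached object in place.
old_buf = buf
buf = values[5:8]          # rebinding, as in the proposed move()
assert old_buf is not buf  # the cached reference went stale
```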

@jbrockmendel
Member

Is this a blocker for 1.1?

@WillAyd
Member

WillAyd commented Jul 6, 2020 via email

@TomAugspurger
Contributor

TomAugspurger commented Jul 6, 2020

I think if we don't update this for 1.1, then we'll want to set an upper bound on Cython in our pyproject.toml so that users can compile pandas 1.1 from source after Cython 3.0 is released.

@bashtage
Contributor

bashtage commented Jul 7, 2020

To advocate for fixing sooner rather than later, in addition to being a technical issue for future Cython, it is also a bug when the function being applied has any deferred operations.

@jbrockmendel
Member

@bashtage attempts so far have not gone great, xref #34997, extra eyeballs may help out

@jreback jreback modified the milestones: 1.1, Contributions Welcome Jul 11, 2020
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jul 23, 2020
We know that pandas doesn't work with Cython 3.0
(pandas-dev#34213,
pandas-dev#34014)

This sets the maximum supported version of Cython in our pyproject.toml
to ensure that pandas 1.1.0 can continue to be built from source without
Cython pre-installed after Cython 3.0 is released.
jreback pushed a commit that referenced this issue Jul 23, 2020
@tacaswell tacaswell mentioned this issue Aug 11, 2020
5 tasks
@fangchenli fangchenli mentioned this issue Sep 12, 2020
2 tasks
fangchenli added a commit to fangchenli/pandas that referenced this issue Oct 9, 2020
fangchenli added a commit to fangchenli/pandas that referenced this issue Oct 10, 2020
fangchenli added a commit to fangchenli/pandas that referenced this issue Oct 10, 2020
@fangchenli
Member

Here is one attempt that does not set the data pointer in the move method.

cdef class Slider:
    cdef:
        ndarray values, buf
        Py_ssize_t stride
        char *orig_data

    def __init__(self, ndarray values, ndarray buf):
        assert values.ndim == 1
        assert values.dtype == buf.dtype

        if not values.flags.contiguous:
            values = values.copy()

        self.values = values.copy()  # the original values would be modified if not copied
        self.buf = buf

        self.stride = values.strides[0]
        self.orig_data = self.buf.data

        self.buf.data = self.values.data
        self.buf.strides[0] = self.stride

    cdef move(self, int start, int end):
        # self.buf.data = self.values.data + self.stride * start
        self.buf.shape[0] = end - start  # length must be set first
        self.buf[:] = self.values[start:end]

    cdef reset(self):
        self.buf.data = self.orig_data
        self.buf.shape[0] = 0

The two pointer-setting statements in __init__ and reset cannot be removed; otherwise there are segfaults and wrong-memory free errors. Somehow, we have to keep the original buf. It won't work if we do something like self.buf = buf.copy() or self.buf = self.values[start:end].

Unfortunately I don't know enough about Cython and the other parts of the code to figure this out. We might need to look at how the Slider's buf is used to find another way to do this.
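One reason the workaround in this move differs from the old pointer swap can be seen from plain Python: buf[:] = src copies src into buf's own memory, while assigning buf.data made buf alias src's memory, so later writes to the source showed through. A minimal sketch with made-up names (not the pandas code):

```python
import numpy as np

values = np.arange(6, dtype=np.float64)
buf = np.empty(3, dtype=np.float64)

buf[:] = values[1:4]                      # copy, as in the new move()
assert not np.shares_memory(buf, values)  # buf keeps its own storage

view = values[1:4]                        # what the pointer swap emulated
assert np.shares_memory(view, values)     # same storage, zero copy
view[0] = 99.0
assert values[1] == 99.0                  # writes show through the alias
```

So code that relied on buf staying aliased to the source (or on downstream objects caching buf's buffer) behaves differently under the copying version.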

@tacaswell
Contributor

I spent some more time looking at what this is doing today and came up with the following notes (I apologize if I am saying really obvious things)

The Slider and BlockSlider classes are implementing views into a numpy array by:

  • taking in two arrays
  • stashing the pointer to one of them
  • on demand mutating the second array to point at the (offset) pointer to the first and adjusting the strides to give the window
  • on the way out the old guts of the second array are put back

The mutated array is then used to update "cached objects" in their calling class by updating the pandas-side block manager details. I suspect that this is the source of the statsmodels issues mentioned above, as the code is aggressively changing things underneath the eventual user-exposed objects.

The change that has broken things is that Cython now disallows replacing the guts of a numpy array (which seems fair!). My guess is that the performance gains come both from avoiding memory thrashing and from not falling back to the Python layer. The Cython docs say that when you use [] on a numpy array it falls back to Python (I assume because the inputs are too variable), which is probably the source of the major performance regressions.

I am not super clear on how the numpy nditer interface works, but it looks like it is focused on getting an iterator over single elements, or at least fixed steps through the array, whereas this code needs iteration over variable-size windows.
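For comparison, NumPy does offer a zero-copy answer for fixed-size windows (sliding_window_view, NumPy >= 1.20), and variable-size windows can at least be plain views via slicing; whether either fits the reduction code's performance constraints is an open question. A sketch with made-up window bounds:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

arr = np.arange(6, dtype=np.float64)

# FIXED-size windows: zero-copy, no pointer games.
windows = sliding_window_view(arr, window_shape=3)
assert windows.shape == (4, 3)
assert np.shares_memory(windows, arr)

# VARIABLE-size windows: each slice is itself a zero-copy view.
bounds = [(0, 2), (2, 6)]                 # hypothetical window bounds
sums = [arr[s:e].sum() for s, e in bounds]
assert sums == [1.0, 14.0]
```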

It looks like the way to do this is with memoryviews ( https://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html#efficient-indexing-with-memoryviews ), but those seem to require knowing what the type is up front.

My suspicion is that the right solution here is to do something like what @mattip suggested above and in def move use the pointers we have to the underlying data and fabricate new numpy arrays of just the sub-section that is needed.

These classes appear to only be used internally to the reduction module, so I do not think there are any back-compatibility concerns with completely re-writing them.

attn @scoder for guidance on which of these methods (or one I do not see) is the best path.

@tacaswell
Contributor

also, I think this should be milestoned to an actual release.

@jreback
Contributor Author

jreback commented Nov 14, 2020

also, I think this should be milestoned to an actual release.

w/o a dedicated resource to do this it's impossible to actually say it will be in a particular release

@jbrockmendel
Member

also, I think this should be milestoned to an actual release.

This would certainly be nice.

Is there any indication that Cython will be releasing 3.0 anytime soon? The mailing list has been quiet; I haven't been following the issue tracker recently.

I still think #34997 was a reasonable approach, but i never got it working. More eyeballs may be helpful.

IIRC one of the hangups was that the non-cython path behaves slightly differently from the cython path when it comes to unwrapping results at each step in the loop. If that were resolved, then we would at least have the option of ripping out the cython version altogether.

@mroeschke mroeschke removed the Blocker Blocking issue or pull request for an upcoming release label Aug 7, 2021
@fangchenli
Member

Closed by #43505

@tacaswell
Contributor

Well, "rip it all out" is one way to fix it !
