
PERF: block-wise arithmetic for frame-with-frame #32779

Merged: 63 commits, May 19, 2020

Conversation

@jbrockmendel (Member) commented Mar 17, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@jbrockmendel (Member Author)

Did numpy recently change something that would make np.errstate(all="ignore") not suppress a DeprecationWarning?
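For context on the question above (this sketch is editorial, not part of the PR): `np.errstate` only controls numpy's floating-point error handling, while a `DeprecationWarning` is dispatched through Python's `warnings` machinery, so the two are orthogonal and `errstate` cannot suppress it.

```python
import warnings

import numpy as np

# np.errstate only governs floating-point error handling (divide,
# over/underflow, invalid); it never touches Python-level warnings
# such as DeprecationWarning, which go through the warnings module.
with np.errstate(all="ignore"):
    result = np.float64(1.0) / np.float64(0.0)  # no RuntimeWarning emitted

# Suppressing a DeprecationWarning requires the warnings machinery instead:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    warnings.warn("deprecated", DeprecationWarning)  # silently swallowed
```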

@jreback (Contributor) left a comment

looking pretty good. do you have benchmarks?

if you can create some helper functions as indicated, that would be great.

nbs = blk._split_op_result(res_values)
res_blks.extend(nbs)
continue

Contributor

can you add some comments here (general + line comments as needed)


if not isinstance(blk_vals, np.ndarray):
# 1D EA
assert len(locs) == 1, locs
Contributor

might be clearer to extract, for example, this section (368 to 374) into a function, so the structure of what you are doing is more apparent.

@jreback jreback added Performance Memory or execution speed performance Internals Related to non-user accessible pandas implementation labels Mar 19, 2020
@jreback (Contributor) commented Mar 19, 2020

also this has a linked issue IIRC? (you just commented today on it, I think)

@jbrockmendel (Member Author)

will add comments as requested

before implementing helper functions, I'm going to wait for the where/putmask PRs (#32769,#32791, #32846) to go through, as I think that will allow us to simplify this in a more elegant way

@pep8speaks commented Mar 25, 2020

Hello @jbrockmendel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-17 21:35:14 UTC

@jbrockmendel (Member Author)

Refactored the new function here so as to have one main execution path. This makes the debugging assertions clunkier, but what's left is much smoother than the previous implementation.

@TomAugspurger (Contributor) left a comment

Seems nice overall. Can you ensure we have ASVs for this (I think we do) and post the results?

result = masked_arith_op(left, right, op)
with warnings.catch_warnings():
# suppress warnings from numpy about element-wise comparison
warnings.simplefilter("ignore", DeprecationWarning)
Contributor

What warning are we suppressing here, and is this future-proof? If this is the usual

In [2]: np.array([1, 2]) == 'a'
/Users/taugspurger/.virtualenvs/dask-dev/bin/ipython:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  #!/Users/taugspurger/Envs/dask-dev/bin/python
Out[2]: False

then in the future the result will be array([False, False]). Is that what we want?

Member Author

this warning-catching was necessary to make the npdev build pass; I'm going to see if I can revert it

Member Author

looks like catching this is needed in some form, but I can move the catching so as to cast a less wide net.
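One way to cast that narrower net (an editorial sketch, not the PR's actual code: the helper name is hypothetical) is to filter only on the specific warning message, and only around the single comparison that triggers it, rather than a blanket `simplefilter` over the whole operation.

```python
import warnings

import numpy as np

def eq_with_narrow_suppression(a, b):
    # Hypothetical helper: suppress only numpy's "elementwise comparison"
    # warning, and only around this one comparison. `message` is a regex
    # matched against the warning text; category=Warning covers both the
    # DeprecationWarning and FutureWarning variants numpy has emitted.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", message="elementwise comparison", category=Warning
        )
        return a == b

res = eq_with_narrow_suppression(np.array([1, 2]), "a")
```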

pandas/core/ops/__init__.py (4 resolved review threads)
@jbrockmendel (Member Author) commented Apr 1, 2020

Just pushed with some new ASVs; about 200x speedup in the best-case scenario:

arr = np.random.randn(10 ** 6).reshape(500, 2000).astype(np.float64)
df = pd.DataFrame(arr)
df[1000] = df[1000].astype(np.float32)
df.iloc[:, 1000:] = df.iloc[:, 1000:].astype(np.float32)

df2 = pd.DataFrame(arr)
df2[1000] = df2[1000].astype(np.int64)
df2.iloc[:, 500:1500] = df2.iloc[:, 500:1500].astype(np.int64)

df._consolidate_inplace()
df2._consolidate_inplace()

In [35]: %timeit df + df2                                                                                                                                                                         
572 ms ± 55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # <-- master
113 ms ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <-- PR

In [38]: %timeit df + df                                                                                                                                                                                
527 ms ± 69.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # <-- master
2.51 ms ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- PR

@jbrockmendel (Member Author)

gentle ping

@jorisvandenbossche (Member)

Sorry for the slow follow-up, will take a look at your latest changes tomorrow!

@jorisvandenbossche (Member)

just updated with a simpler variant of #33597 that is effectively operating column-wise in pretty-precisely those situations where doing so will avoid copies being made.

That indeed nicely avoids copies in some cases!
However, the example that I gave above (third paragraph at #32779 (comment)) is still using "take" instead of slice: so suppose a left df with "int, int, float, int" columns and a right df with "int, int, int, int" columns (eg because of a NaN being present). Such a case is still copying a part of the right block.
Now, I suppose further special cases can be added to _slice_take_blocks_ax0 to also avoid a copy here.


Personally, I still think that with something a little bit simpler (eg the "only do block-wise in case of identical block layout"), we can reduce the complexity quite a bit while at the same time preserving the performance speedup for the majority of use cases of wide frames.
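The slice-versus-take distinction discussed above boils down to numpy indexing semantics: a basic slice of a block's values returns a view into the same buffer, while fancy indexing ("take") materializes a fresh copy. A minimal illustration in plain numpy (not pandas internals):

```python
import numpy as np

values = np.arange(12).reshape(3, 4)  # stand-in for a 2D block's values

sliced = values[0:2]    # basic slice -> view, shares the block's memory
taken = values[[0, 1]]  # fancy indexing ("take") -> freshly allocated copy

sliced[0, 0] = 99   # mutation is visible through `values`
taken[0, 1] = -1    # leaves `values` untouched
```

This is why `_slice_take_blocks_ax0` needs special cases to stay zero-copy: only when the requested locations form a contiguous slice can it hand back views.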

operator.gt,
operator.ge,
operator.lt,
operator.le,
Member

Is it needed to test it with all those different ops?
I think what we are interested in catching with the benchmark is the general code handling wide dataframes (how the dispatching to the actual op is done, dispatching to block/column instead of series, etc.), not the actual op, right? For those aspects they all use the same code, so testing all ops seems like overkill (it just adds to the number of benchmarks needlessly, making it harder to run the benchmark suite and interpret the results).

Member Author

Is it needed to test it with all those different ops?

No strong opinion here.

making it harder to run the benchmark suite

Yah, this is a hassle. Best guess is eventually we're going to land on a "throw hardware at the problem" solution, but that doesn't help the local-running troubles.

Member

No strong opinion here.

Then let's keep only 2 of them or so, I would say

Member Author

how about four: one comparison, one logical, floordiv (for which we have special handling), and one other arithmetic
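A sketch of what that trimmed parameterization could look like, in ASV's `params`/`setup`/`time_*` style (the class and method names below are illustrative, not the PR's actual benchmark code):

```python
import operator

import numpy as np
import pandas as pd

class FrameWithFrameOps:
    # Hypothetical ASV-style benchmark: one comparison, one logical op,
    # floordiv (special-cased in pandas), and one other arithmetic op.
    params = [operator.add, operator.floordiv, operator.gt, operator.and_]
    param_names = ["op"]

    def setup(self, op):
        df = pd.DataFrame(np.random.randn(10, 1000))
        # logical (bitwise) ops need boolean data, not floats
        self.df = df > 0 if op is operator.and_ else df

    def time_op_frame_with_frame(self, op):
        op(self.df, self.df)
```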

asv_bench/benchmarks/arithmetic.py (resolved)
pandas/core/ops/blockwise.py (resolved)
pandas/core/internals/managers.py (resolved)
warn = PerformanceWarning if box_with_array is not pd.DataFrame else None
warn = None
if box_with_array is not pd.DataFrame or tz_naive_fixture is None:
warn = PerformanceWarning
Member

Do you know why this changed? We do / do not raise a warning on the array-level?

Member Author

We don't raise a warning when operating op(arr, length_1_object_array), which turns out to be the case described here (in reverse)

@jorisvandenbossche (Member)

Expanding on the last paragraph of my comment above:

Personally, I still think that with something a little bit simpler (eg the "only do block-wise in case of identical block layout"), we can reduce the complexity quite a bit while at the same time preserving the block-wise performance speedup for the majority of use cases of wide frames (in the assumption that wide dataframes typically have uniform dtypes).

Now, if both Jeff and Tom are fine with the current PR instead of my reduced proposal, I am not going to further block it. I made my stance clear, but if the majority prefers otherwise, I go with that.

@jbrockmendel (Member Author)

@jorisvandenbossche thanks for the new round of comments. Looking into the non-slicing case you pointed out now.

@jbrockmendel (Member Author)

Updated to avoid copy in the case Joris identified.

gentle ping @jreback @TomAugspurger

@TomAugspurger (Contributor)

I haven't been able to stay up to date with this. I wouldn't wait around for me.

@jbrockmendel (Member Author)

Travis is green, just hasn't updated the icon here

@jbrockmendel (Member Author)

rebased+green

blocks = []
for i, ml in enumerate(slobj):
nb = blk.getitem_block([ml], new_mgr_locs=i)
print(nb.shape, np.values.shape)
Contributor

extra print here; can you use a list comprehension here

Member Author

updated+green
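The suggested rewrite, sketched with a stand-in for Block.getitem_block (the helper below is illustrative, not pandas' actual API) and with the stray debug print dropped:

```python
import numpy as np

def getitem_block(values, locs):
    # stand-in for Block.getitem_block: fancy-index so the result stays 2D
    return values[locs]

values = np.arange(12).reshape(3, 4)  # a block's 2D values
slobj = [2, 0, 1]                     # manager locations to pick out

# loop form from the diff, minus the debug print
blocks = []
for i, ml in enumerate(slobj):
    blocks.append(getitem_block(values, [ml]))

# equivalent list comprehension, as the reviewer suggested
blocks2 = [getitem_block(values, [ml]) for ml in slobj]
```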

elif only_slice:
# GH#33597 slice instead of take, so we get
# views instead of copies
for i, ml in zip(taker, mgr_locs):
Contributor

can you use a list comprehension

Member Author

above on 1317 I can; here it is less clean

# else:
# assert res_values.shape == lvals.shape, (res_values.shape, lvals.shape)

for nb in nbs:
Contributor

can you do this here

return new_mgr


def _reset_block_mgr_locs(nbs: List["Block"], locs):
Contributor

returns None

Contributor

though I would actually return the List['Block'] here as conceptually simpler to grok

Member Author

though I would actually return the List['Block'] here as conceptually simpler to grok

that can give the misleading impression that args are not being altered in place.
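The convention at issue, sketched with a minimal stand-in (FakeBlock is illustrative, not pandas' actual Block class): a helper that mutates its arguments in place returns None, mirroring list.sort, precisely so that callers aren't misled into thinking the inputs were left unchanged.

```python
from typing import List

class FakeBlock:
    # minimal stand-in for an internals Block, just the mgr_locs attribute
    def __init__(self, mgr_locs: int):
        self.mgr_locs = mgr_locs

def _reset_block_mgr_locs(nbs: List[FakeBlock], locs) -> None:
    # mutates each block in place; returning None signals the side effect,
    # the same convention as list.sort / random.shuffle in the stdlib
    for nb in nbs:
        nb.mgr_locs = locs[nb.mgr_locs]

locs = [10, 11, 12]
blocks = [FakeBlock(0), FakeBlock(2)]
result = _reset_block_mgr_locs(blocks, locs)
```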

@jreback jreback merged commit b9ad20a into pandas-dev:master May 19, 2020
@jreback (Contributor) commented May 19, 2020

ok thanks @jbrockmendel

let's see how this goes
