Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add matmul to DataFrame, Series #19035

Merged
merged 5 commits into from
Mar 30, 2018
Merged

ENH: Add matmul to DataFrame, Series #19035

merged 5 commits into from
Mar 30, 2018

Conversation

bnaul
Copy link
Contributor

@bnaul bnaul commented Jan 2, 2018

Fixes #10259. Not sure about the exact set of types that should be supported, for now I just copied the classes that are referenced in dot.

@codecov
Copy link

codecov bot commented Jan 2, 2018

Codecov Report

Merging #19035 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19035      +/-   ##
==========================================
- Coverage   91.83%   91.82%   -0.02%     
==========================================
  Files         152      152              
  Lines       49260    49269       +9     
==========================================
+ Hits        45240    45241       +1     
- Misses       4020     4028       +8
Flag Coverage Δ
#multiple 90.21% <100%> (-0.02%) ⬇️
#single 41.9% <50%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/series.py 93.87% <100%> (+0.01%) ⬆️
pandas/core/frame.py 97.19% <100%> (ø) ⬆️
pandas/plotting/_converter.py 65.07% <0%> (-1.74%) ⬇️
pandas/core/generic.py 95.85% <0%> (ø) ⬆️
pandas/util/_decorators.py 82.4% <0%> (+0.14%) ⬆️
pandas/util/testing.py 84.73% <0%> (+0.2%) ⬆️
pandas/core/nanops.py 96.69% <0%> (+0.39%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0a00365...c036ed0. Read the comment docs.

@gfyoung gfyoung added Compat pandas objects compatability with Numpy or Python functions Numeric Operations Arithmetic, Comparison, and Logical operations Python 3.5 labels Jan 2, 2018
@@ -2067,6 +2067,58 @@ def test_dot(self):
with tm.assert_raises_regex(ValueError, 'aligned'):
df.dot(df2)

def test_matmul(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference issue number.

@@ -1625,6 +1625,13 @@ def dot(self, other):
else: # pragma: no cover
raise TypeError('unsupported type: %s' % type(other))

def __matmul__(self, other):
from pandas.core.frame import DataFrame
dot_types = (Index, DataFrame, Series, np.ndarray, list)
Copy link
Member

@gfyoung gfyoung Jan 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your tests need to ensure that for every supported dot type, this operation works. I don't think that's the case currently in your changes.

@@ -866,6 +866,12 @@ def dot(self, other):
else: # pragma: no cover
raise TypeError('unsupported type: %s' % type(other))

def __matmul__(self, other):
dot_types = (Index, DataFrame, Series, np.ndarray, list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this should become a common resource, since you use it in multiple places.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew note in other enhancements

dot_types = (Index, DataFrame, Series, np.ndarray, list)
if not isinstance(other, dot_types):
return NotImplemented
return self.dot(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to specify the dot_types here, these are already handled in .dot (though maybe not tested fully). if something is rejected it should be there

@@ -2067,6 +2067,58 @@ def test_dot(self):
with tm.assert_raises_regex(ValueError, 'aligned'):
df.dot(df2)

def test_matmul(self):
a = DataFrame(np.random.randn(3, 4), index=['a', 'b', 'c'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of writing a new test, just parameterize test_dot on dot/matmul as this is pretty much the same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will need to skip the matmul tests if not PY35

@@ -837,6 +837,32 @@ def test_dot(self):
pytest.raises(Exception, a.dot, a.values[:3])
pytest.raises(ValueError, a.dot, b.T)

def test_matmul(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@bnaul bnaul force-pushed the matmul branch 2 times, most recently from 5519e13 to d5110ed Compare January 6, 2018 23:12
@bnaul
Copy link
Contributor Author

bnaul commented Jan 6, 2018

Per @jreback 's comment I changed the implementation to just run dot directly and rely on its type verification; for an unsupported type dot will throw TypeError and __matmul__ will catch and return NotImplemented (per #10259 (comment)).

Re: Python<3.5: it seems like we can still run the test for .__matmul__ even though the @ operator doesn't exist, right?

@@ -1596,6 +1596,12 @@ def dot(self, other):
else: # pragma: no cover
raise TypeError('unsupported type: %s' % type(other))

def __matmul__(self, other):
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u add a docstring

@@ -874,6 +874,12 @@ def dot(self, other):
else: # pragma: no cover
raise TypeError('unsupported type: %s' % type(other))

def __matmul__(self, other):
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc string

@@ -2075,41 +2075,42 @@ def test_clip_with_na_args(self):
self.frame)

# Matrix-like

@pytest.mark.parametrize('dot_fn', [DataFrame.dot, DataFrame.__matmul__])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u add a test that hits the not implemented error

@@ -202,6 +202,7 @@ Other Enhancements
- ``Resampler`` objects now have a functioning :attr:`~pandas.core.resample.Resampler.pipe` method.
Previously, calls to ``pipe`` were diverted to the ``mean`` method (:issue:`17905`).
- :func:`~pandas.api.types.is_scalar` now returns ``True`` for ``DateOffset`` objects (:issue:`18943`).
- :class:`DataFrame` and :class:`Series` now support matrix multiplication (```@```) operator (:issue:`10259`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say for >= 3.5

@@ -1596,6 +1596,12 @@ def dot(self, other):
else: # pragma: no cover
raise TypeError('unsupported type: %s' % type(other))

def __matmul__(self, other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u update the .dot doc string that ‘@‘ operator is supported in >=3.5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean update the .dot doc string? I believe that is done

@jreback
Copy link
Contributor

jreback commented Feb 10, 2018

can you rebase / update

@bnaul bnaul force-pushed the matmul branch 2 times, most recently from 36a0cb8 to 419c9d2 Compare February 19, 2018 20:02
@bnaul
Copy link
Contributor Author

bnaul commented Feb 19, 2018

@jreback rebased and added docstrings.

I also tried to add a test for the NotImplemented case but I realized that the way .dot() is currently structured makes it impossible to actually reach that code: if other is not a series or data frame, it gets passed to rvals = np.asarray(other); so the subsequent type-checking will always finish by elif isinstance(rvals, np.ndarray) and short-circuit the TypeError. This is arguably a slight flaw in .dot() but I don't think this particular PR alters the behavior at all.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments. lgtm otherwise. yeah I think the np.asarray(other) call is actually ok.

def __matmul__(self, other):
""" Matrix multiplication using binary `@` operator in Python>=3.5 """
try:
return self.dot(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so let's remove the try/except then

def __matmul__(self, other):
""" Matrix multiplication using binary `@` operator in Python>=3.5 """
try:
return self.dot(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@bnaul
Copy link
Contributor Author

bnaul commented Feb 20, 2018

removed!

@jreback jreback added this to the 0.23.0 milestone Feb 20, 2018
@jreback
Copy link
Contributor

jreback commented Feb 20, 2018

lgtm. ping on green!

@bnaul
Copy link
Contributor Author

bnaul commented Feb 20, 2018

@jreback 💚

@@ -895,28 +895,30 @@ def test_count(self):
ts.iloc[[0, 3, 5]] = nan
assert_series_equal(ts.count(level=1), right - 1)

def test_dot(self):
@pytest.mark.parametrize('dot_fn', [Series.dot, Series.__matmul__])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer operator.matmul (which corresponds to the @ operator) to Series.__matmul__.

@shoyer
Copy link
Member

shoyer commented Feb 20, 2018

You really should implement __rmatmul__ here, so series @ ndarray works as well as ndarray @ series. This will be pretty surprising otherwise.

@shoyer
Copy link
Member

shoyer commented Feb 20, 2018

I also tried to add a test for the NotImplemented case but I realized that the way .dot() is currently structured makes it impossible to actually reach that code: if other is not a series or data frame, it gets passed to rvals = np.asarray(other); so the subsequent type-checking will always finish by elif isinstance(rvals, np.ndarray) and short-circuit the TypeError. This is arguably a slight flaw in .dot() but I don't think this particular PR alters the behavior at all.

I don't think this is a great excuse for not implementing NotImplemented for __matmul__. There's no need for a protocol for .dot() since the implementation has been explicitly indicated.

That said, it seems that pandas doesn't currently defer to other objects for any arithmetic operations, so I don't see any particularly good reasons to start here first.

@jreback
Copy link
Contributor

jreback commented Mar 16, 2018

can you rebase and update

@bnaul bnaul force-pushed the matmul branch 2 times, most recently from 9922a78 to 41bc7b1 Compare March 17, 2018 19:38
@pep8speaks
Copy link

pep8speaks commented Mar 17, 2018

Hello @bnaul! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 27, 2018 at 23:15 Hours UTC

@bnaul bnaul force-pushed the matmul branch 2 times, most recently from e391293 to 236bae4 Compare March 17, 2018 19:47
@bnaul
Copy link
Contributor Author

bnaul commented Mar 17, 2018

Added __rmatmul__ as @shoyer suggested. Because the functionality is sufficiently different from .dot now I separated the two tests.

Series.__rmatmul__ seems fairly uncomplicated but DataFrame is a bit more so. If anyone has thoughts on a better way to implement it I'd be keen to hear it.

@jreback
Copy link
Contributor

jreback commented Mar 25, 2018

@bnaul can you rebase

@@ -1596,6 +1596,12 @@ def dot(self, other):
else: # pragma: no cover
raise TypeError('unsupported type: %s' % type(other))

def __matmul__(self, other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

@jreback
Copy link
Contributor

jreback commented Mar 25, 2018

@jbrockmendel do these belong in ops.py rather as defined in-line?

@jbrockmendel
Copy link
Member

do these belong in ops.py rather as defined in-line?

I think the PR defines the methods in the right place. Defining things directly in the class makes for more obvious code than pinning from afar.

The tests might belong in test_arithmetic. That could go either way.

@bnaul can I get you to add a couple of mixed-dtype tests? e.g. a DataFrame with a mix of integer/float columns.

@bnaul
Copy link
Contributor Author

bnaul commented Mar 27, 2018

@jbrockmendel What would a mixed dtype test case test exactly? I can't think of a scenario where everything wouldn't just get coerced to floats.

@jbrockmendel
Copy link
Member

What would a mixed dtype test case test exactly? I can't think of a scenario where everything wouldn't just get coerced to floats.

exactly that it does get coerced correctly. Some methods don't play nicely with mixed-dtypes

@bnaul
Copy link
Contributor Author

bnaul commented Mar 28, 2018

Makes sense, lmk if you think the tests I added missed anything

@jreback I think I already made the change you requested

@jbrockmendel
Copy link
Member

Looks good, thanks

@jreback jreback merged commit d7a4f5b into pandas-dev:master Mar 30, 2018
@jreback
Copy link
Contributor

jreback commented Mar 30, 2018

thanks @bnaul nice patch!

kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants