
custom dispatchers #4

Merged
merged 3 commits into from Feb 15, 2021

Conversation

RikVoorhaar
Contributor

I implemented custom dispatchers as discussed in #3.


I wrote some tests to confirm it works as expected. One potential caveat: functions like np.stack can also eat a one-dimensional array, in which case they act like the identity function. In that case the backend is inferred from the first element of this one-dimensional array by join_array_dispatcher. This works correctly for all supported backends except sparse, where the backend is inferred as numpy. This is not the intended usage of stack anyway, so I don't think it's a problem.
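To illustrate the identity-like behaviour (a quick plain-numpy check; the sparse caveat itself is not shown here):

```python
import numpy as np

# np.stack normally joins a sequence of arrays along a new axis:
x = np.stack([np.zeros(3), np.ones(3)])
print(x.shape)  # (2, 3)

# But it also accepts a single one-dimensional array, treating it as a
# sequence of 0-d arrays, so the result is effectively the input itself:
y = np.stack(np.array([1.0, 2.0, 3.0]))
print(np.array_equal(y, [1.0, 2.0, 3.0]))  # True
```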


For join_array_dispatcher we could optionally check whether args[0][0] even exists in the first place, and if not just return args[0]. This is only relevant if you want to supply empty sequences or zero-dimensional arrays (which numpy doesn't allow, for example).
If so, one can check this with hasattr(args[0], '__getitem__'). This is a bit nicer than a try / except block, especially since we would have to catch both IndexError and TypeError.


It seems my auto-formatter black was wreaking havoc on your code. The biggest thing it did was replace single quotes with double quotes everywhere. If this bothers you, I will do a commit with only my actual changes. Otherwise, do you use a particular auto-formatter for this project?


I also took the liberty of adding translations for np.take. I'm a bit confused about the syntax of make_translator. For the torch implementation of take (i.e. index_select), both the names and the order of the arguments are different, and I'm not 100% sure I did it correctly (at least it seems to work).

@pep8speaks

pep8speaks commented Feb 14, 2021

Hello @RikVoorhaar! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-02-15 16:00:15 UTC

@codecov

codecov bot commented Feb 14, 2021

Codecov Report

Merging #4 (03c66a4) into master (1a7c9a3) will decrease coverage by 0.27%.
The diff coverage is 97.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #4      +/-   ##
==========================================
- Coverage   97.58%   97.30%   -0.28%     
==========================================
  Files           1        1              
  Lines         455      482      +27     
==========================================
+ Hits          444      469      +25     
- Misses         11       13       +2     
Impacted Files Coverage Δ
autoray/autoray.py 97.30% <97.95%> (-0.28%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1a7c9a3...03c66a4. Read the comment docs.

Owner

@jcmgray jcmgray left a comment


Thanks very much for the PR! Looks generally great.

Regarding your first couple of points, I've suggested something that should dispatch on args[0] or args[0][0] or args[0][0][0] etc., but I haven't tested it. Let me know your thoughts.

Happy to keep the black formatting, although at maybe some later point it would be nice to resolve the two conflicts with PEP8 (my usual linter); it sounds like changing the line length from 80 to 79 is a common fix.

I think you figured/guessed the make_translator call correctly. It's not very clear, but the idea is that you take the ordered args you want (i.e. the numpy syntax) as the left tuple items and translate them into the backend args plus desired defaults (the right tuple items).
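As a toy illustration of that left/right tuple idea, here is a hypothetical stand-in for make_translator (autoray's real function may have a different signature and behavior), applied to numpy's take(a, indices, axis) versus torch's index_select(input, dim, index):

```python
# Hypothetical sketch of the arg-translation idea, NOT autoray's
# actual make_translator implementation.
def make_translator(pairs):
    # pairs: [(numpy_name, (backend_name, default)), ...],
    # listed in numpy argument order.
    def translate(*args, **kwargs):
        # start from the backend defaults (right tuple items)
        out = {be_name: default for _, (be_name, default) in pairs}
        # positional args are interpreted in numpy order ...
        for (np_name, (be_name, _)), arg in zip(pairs, args):
            out[be_name] = arg
        # ... and keyword args are renamed numpy -> backend
        renames = {np_name: be_name for np_name, (be_name, _) in pairs}
        for k, v in kwargs.items():
            out[renames.get(k, k)] = v
        return out

    return translate

# numpy: take(a, indices, axis) -> torch: index_select(input, dim, index)
translate_take = make_translator([
    ("a", ("input", None)),
    ("indices", ("index", None)),
    ("axis", ("dim", 0)),
])
print(translate_take("A", [0, 2], axis=1))
# {'input': 'A', 'index': [0, 2], 'dim': 1}
```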

@@ -75,7 +75,10 @@ def do(fn, *args, like=None, **kwargs):
<tf.Tensor: id=91, shape=(3, 3), dtype=float32>
"""
if like is None:
backend = infer_backend(args[0])
dispatch_arg = _DISPATCHERS.get(fn, default_dispatcher)(
Owner

If using the defaultdict this line can just be _DISPATCHERS[fn](....

Comment on lines 432 to 436
def join_array_dispatcher(*args, **kwargs):
    # NOTE: To be super safe, we could do a try / except, or
    # hasattr(args[0], '__getitem__'). hasattr is then probably better,
    # since there would be multiple types of errors to catch
    return args[0][0]
Owner

One solution here that might solve the two problems you mention is:

def join_array_dispatcher(arg0, *args, **kwargs):
    if infer_backend(arg0) == 'builtins':
        return join_array_dispatcher(arg0[0])
    return arg0

though you will still get an error with e.g. an empty tuple.

Contributor Author

This is also not perfect. It requires additional lookups, whereas the situation where args[0][0] is not defined is rare and not intended usage (at least within the numpy API). Furthermore, it will fail when the sequence is not of builtins type, for example a numpy object array containing e.g. torch tensors.
Granted, that's not a common use case, but in those cases it will silently convert the tensors to the wrong type, and I think it's better to just throw an error, because in any case you should not call these functions with an empty tuple or a one-dimensional array.

I think we should go for something like

def join_array_dispatcher(*args, **kwargs):
    try:
        return args[0][0]
    except (TypeError, IndexError):
        raise ValueError(
            "Array joining functions should not be "
            "called with an empty argument."
        )

This way the user gets an informative error when calling with a wrong argument. The overhead of the error handling should be less than that of the additional lookup you proposed, although I haven't timed it.
This will still silently convert to the numpy backend when called with a one-dimensional sparse array, but I think that's fine.
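A quick standalone check of how this dispatcher behaves (restating the sketch so it runs on its own):

```python
# Sketch of the agreed-upon dispatcher logic (not autoray's exact code).
def join_array_dispatcher(*args, **kwargs):
    try:
        # normal case: return the first array of the joined sequence,
        # from which the backend is then inferred
        return args[0][0]
    except (TypeError, IndexError):
        raise ValueError(
            "Array joining functions should not be "
            "called with an empty argument."
        )

# Normal usage: dispatch on the first element of the sequence.
print(join_array_dispatcher([[1, 2], [3, 4]]))  # [1, 2]

# Empty sequence: an informative error instead of a bare IndexError.
try:
    join_array_dispatcher(())
except ValueError as e:
    print(e)
```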

Owner

Sure thing, I'm happy with this logic also.

Comment on lines 450 to 455
# einsum takes a string as its first argument, but the backend should
# be inferred from the second argument.
def einsum_dispatcher(*args, **kwargs):
    if isinstance(args[0], str):
        return args[1]
    return args[0]
Owner

Just to elaborate here, einsum has the two following formats:

einsum('ab,bc->ca', x, y)
einsum(x, [0, 1], y, [1, 2], [2, 0])

thus line 455, which is currently uncovered by tests. I think it's fine as is, just thought I'd mention it.
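The equivalence of the two formats can be checked with plain numpy:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
y = np.arange(12).reshape(3, 4)

# String subscripts format:
a = np.einsum('ab,bc->ca', x, y)

# Interleaved operand / index-list format:
b = np.einsum(x, [0, 1], y, [1, 2], [2, 0])

print(np.array_equal(a, b))  # True
print(a.shape)  # (4, 2)
```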

Contributor Author

Ah, right, I forgot about the second format. I think it's worth updating the tests to check whether this works as intended for all libraries.

== ar.infer_backend(C2)
== ar.infer_backend(C3)
== backend
)
Owner

Is this a black preference (no final newline)?

Contributor Author

No, that's an honest mistake. black is an auto-formatter to be used in conjunction with the pycodestyle linter, which also wants a newline at the end of a file. Most of the time pycodestyle and the PEP8 guidelines agree; the biggest difference is the default maximum line width, which is 79 for PEP8. However, on a 1080p screen you can easily fit two editors next to each other at a line width of ~100 characters, so many people consider this limit a bit too strict.

@jcmgray
Owner

jcmgray commented Feb 14, 2021

Another very minor point, but all the do('stack', ..., like=...) instances can also be nicely simplified in the readme and tests now.

@RikVoorhaar
Contributor Author

Another very minor point, but all the do('stack', ..., like=...) instances can also be nicely simplified in the readme and tests now.

good point, I'll update tests and readme

@RikVoorhaar
Contributor Author

I made the changes, please have another look.
Unfortunately I don't have everything set up to run the tests locally on this machine, but if some tests fail I'll fix them soon.

@RikVoorhaar
Contributor Author

The only test that fails is test_linalg_solve[float32-numpy], and only on one machine. Maybe loosen rtol a little for this test?

@jcmgray
Owner

jcmgray commented Feb 15, 2021

All looks good, thanks very much for those changes and the PR! I'll tweak that unrelated failing test a bit.

@jcmgray jcmgray merged commit 5fdcbdf into jcmgray:master Feb 15, 2021