consider: should UDF implementations be scoped to a backend? #8748

Open
1 task done
NickCrews opened this issue Mar 22, 2024 · 1 comment
Labels
feature Features or general enhancements

Comments

@NickCrews
Contributor

NickCrews commented Mar 22, 2024

Is your feature request related to a problem?

I have this UDF:

@ibis.udf.scalar.builtin
def damerau_levenshtein(left: str, right: str) -> int:
    ...

This only works on DuckDB (or any other backend that has a builtin function called damerau_levenshtein).

I have a library function such as def address_similarity(a1: ir.StringValue, a2: ir.StringValue) -> ir.FloatingValue. Internally, it wants to use the Damerau-Levenshtein string edit distance to calculate the score. But when a user hands me an abstract expression, I don't know which backend they plan to execute it on. If they execute it on DuckDB, the builtin UDF works fine. On any other backend, I would want to fall back to a Python or PyArrow UDF. But I can't know which one to pick at expression creation time!
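
For the fallback path, a pure-Python edit distance could look roughly like the sketch below. It implements the optimal string alignment variant of Damerau-Levenshtein (adjacent transpositions count as one edit, but a substring is never edited twice). The name osa_distance is made up for illustration, and its results are not guaranteed to match a backend's builtin on every input.

```python
def osa_distance(left: str, right: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Hypothetical helper for illustration; not part of ibis or DuckDB.
    """
    rows, cols = len(left) + 1, len(right) + 1
    # dist[i][j] = number of edits to turn left[:i] into right[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of left[:i]
    for j in range(cols):
        dist[0][j] = j  # insert every character of right[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if left[i - 1] == right[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as one edit.
            if (
                i > 1
                and j > 1
                and left[i - 1] == right[j - 2]
                and left[i - 2] == right[j - 1]
            ):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]
```

A function like this could be wrapped as a Python UDF on backends without a builtin, at the cost of running row by row in Python.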

Describe the solution you'd like

Spitballing here:

# Other args like name, database, etc. aren't allowed here; this only creates the contract on the ibis side.
@ibis.udf.scalar(signature=...)
def damerau_levenshtein(left: str, right: str) -> int: ...

# now we plug in implementations...
@damerau_levenshtein.builtin(backends=["duckdb", ...], name="damerau_levenshtein", database=...)
def _damerau_levenshtein_duckdb(): ...

# backends=None means use this as the fallback
@damerau_levenshtein.python(backends=None, database=...)
def _damerau_levenshtein_udf(s1: str, s2: str) -> int:
    return somelib.damlev(s1, s2)

def address_similarity(a1, a2):
    return damerau_levenshtein(a1, a2)

The old APIs should keep working as they do now; I don't think they need to change.

What version of ibis are you running?

main

What backend(s) are you using, if any?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@NickCrews NickCrews added the feature Features or general enhancements label Mar 22, 2024
@NickCrews NickCrews changed the title consider: should builtin UDFs be scoped to a backend? consider: should UDF implementations be scoped to a backend? Mar 22, 2024
@cpcloud
Member

cpcloud commented Apr 15, 2024

It seems like this is something that many folks want and are asking about.

I think we should try to include this in the 10.0.0 release.

Projects
Status: backlog
Development

No branches or pull requests

2 participants