Add user-interface for Runtime UDF registration #261

Closed
pearu opened this issue Sep 10, 2019 · 7 comments

Comments

@pearu
Contributor

pearu commented Sep 10, 2019

The latest OmniSciDB release contains support for Runtime UDF registration. Python users can use the rbc package to register Python functions as Runtime UDFs in the OmniSciDB server.

This issue proposes adding a user interface for Runtime UDF registration to pymapd. The proposal consists of the following tasks:

  • Add a __call__ method to the Connection class so that a Connection instance can be used as a decorator on Python functions.
  • Add a UDF registration call (provided by rbc) to all query methods, such as select_ipc_gpu.
  • Add the rbc package as an optional dependency of pymapd. While rbc is a pure Python package, it depends on the following packages: numba, llvmlite>=0.29, tblib, thriftpy2, and six. Making rbc optional means that it would be imported only within the Connection.__call__ method, so pymapd would remain functional even when rbc is not installed.
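
A minimal sketch of how these three tasks might fit together, assuming a simplified stand-in Connection class rather than pymapd's real one. The names _require, optional_udf_module, _pending_udfs, registered, and _register_pending_udfs are hypothetical, and the registration step is a placeholder that only records function names; it does not use rbc's actual API.

```python
import importlib


def _require(module_name):
    """Import an optional dependency lazily, with a helpful error message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for Runtime UDF support"
        ) from exc


class Connection:
    """Hypothetical stand-in for pymapd's Connection (illustration only)."""

    # Name of the optional package that provides UDF registration.
    optional_udf_module = "rbc"

    def __init__(self):
        self._pending_udfs = []  # (signature, function) pairs not yet registered
        self.registered = []     # names the pretend server already knows about

    def __call__(self, signature):
        # The optional dependency is imported here only, so the rest of the
        # class keeps working when the package is not installed.
        _require(self.optional_udf_module)

        def decorator(func):
            self._pending_udfs.append((signature, func))
            return func  # the Python function itself is left unchanged

        return decorator

    def _register_pending_udfs(self):
        # Placeholder for the registration call that rbc would provide.
        while self._pending_udfs:
            _signature, func = self._pending_udfs.pop(0)
            self.registered.append(func.__name__)

    def select_ipc_gpu(self, query):
        # Every query method first registers any UDFs defined since the
        # previous query, then runs the query (faked here as a string).
        self._register_pending_udfs()
        return f"executed: {query}"
```

Deferring registration until the next query keeps decoration cheap and lets several UDFs be sent together, at the cost of surfacing registration errors at query time rather than at definition time.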

As a result, here is what a typical workflow using Runtime UDFs would look like:

from pymapd import connect
con = connect(user="admin", password="HyperInteractive", host="localhost", dbname="omnisci")

# Define a Runtime UDF that computes the sum of its arguments:
@con('int32(int32, int32)')
def totaldelay(depdelay, arrdelay):
    return depdelay + arrdelay

# Running the query is preceded by a UDF registration call (when UDFs have been defined).
df = con.select_ipc_gpu("SELECT depdelay, arrdelay, totaldelay(depdelay, arrdelay) FROM flights_2008_10k LIMIT 100")

Are there any suggestions or concerns about the proposed UI for Runtime UDFs?

@randyzwitch
Contributor

My biggest concern about adding these as dependencies is that we're trying to maintain pip/PyPI installation. Are any of these packages conda only?

@pearu
Contributor Author

pearu commented Sep 10, 2019

All these packages are available via pip. rbc itself is available via pip under the name rbc-project.

@randyzwitch
Contributor

Are some of them dependent on Cython? That's the biggest issue that caused us to rewrite the IPC code: people were having a hard time installing pymapd via pip (especially on Windows).

@pearu
Contributor Author

pearu commented Sep 10, 2019

thriftpy2 has an optional Cython dependency that is disabled on non-Linux platforms.
I guess the best answer comes from trying it out on Windows.

@pearu
Contributor Author

pearu commented Sep 12, 2019

I have updated rbc to work on Windows. Also, installing rbc via pip seems to work fine (tested on Windows).

@randyzwitch
Contributor

Please go ahead and start working on this @pearu, and before we merge we can do a final pip/conda & linux/osx/windows check
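
As a rough aid for that final check, a small script along these lines could be run in each pip/conda environment on each platform. It assumes the import names match the package names listed earlier in this issue; missing_modules and OPTIONAL_DEPS are hypothetical helpers, and the script only reports what is importable without installing or verifying versions.

```python
import importlib.util
import platform

# Import names assumed from the dependency list given in this issue.
OPTIONAL_DEPS = ["rbc", "numba", "llvmlite", "tblib", "thriftpy2", "six"]


def missing_modules(names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report the platform alongside any modules that are not importable.
print(platform.system(), missing_modules(OPTIONAL_DEPS))
```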

@randyzwitch
Contributor

This was implemented in #272
