Sparklyr Support Future Frontend. #1935

Open
harryprince opened this issue Feb 28, 2019 · 2 comments

harryprince commented Feb 28, 2019

Related to futureverse/future#286. The idea comes from the future package.

The future package provides a parallel framework that works across multiple backends, including socket, fork, MPI, multiprocess, and so on.

I would like to be able to use the future package with the spark_apply function.

harryprince changed the title from "Support Future Frontend." to "Sparklyr Support Future Frontend." on Feb 28, 2019

javierluraschi (Collaborator) commented Mar 9, 2019

Ah, this would be really interesting! Spark recently introduced the concept of a barrier, which is meant to be used with deep learning workflows but is generic enough that we can use it for anything.

I created a work item a while ago that tracks support for barriers in sparklyr here: #1791

If we could support barrier execution in sparklyr, a user would be free to use Spark executors for whatever they want, including using the future package.

See also: Barrier Execution Mode.

harryprince (Author) commented Apr 17, 2019

Ah, this would be really interesting! So, Spark recently introduced the concept of a barrier which is meant to be used with deep learning workflows but is generic enough that we can use for anything. ...

@javierluraschi Barrier mode lets Spark support MPI-style mechanisms with a better failover design, which is great for machine learning. PySpark pandas UDFs also seem pretty powerful. Could we implement something similar in a dplyr way? It is a commonly requested operation.

https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
