sparklyr and SparkR - the future? #502
Just checked out @nwstephens's talk at Spark Summit East where he answers some questions regarding sparklyr vs. SparkR towards the end. I've been thinking about this and am just a little bit concerned about how things might play out.
While I'm a fan of having different ways to solve a problem, sometimes it gets in the way of collaboration. As an example, most of the work I've done involves data manipulation/transformation, and it's been difficult for dplyr and data.table "speakers" to work together on the same task. What I'm worried about is that we'll have a "sparklyr camp" and a "SparkR camp" come next year, and we'll further factionalize the community and in turn discourage new data scientists from picking up R.
Would be interested to get some thoughts from the main contributors and results of discussions with the SparkR folks.
Some of the main motivations behind why we decided to write
All in all, we wanted
That said, we certainly haven't reached that goal yet: there are things SparkR currently does better than sparklyr (for example, parallel execution of R code across Spark nodes), but we hope to get there in the future. And while you won't be able to (easily) write code that uses SparkR and sparklyr at the same time, there's nothing stopping users from using them independently to mutate datasets in the same data store, so I think it's still an overall net win for users.
Follow up regarding:
This is no longer the case; later in 2017 we introduced