
Add pyspark model base class #8

Closed
schuderer opened this issue Jun 16, 2019 · 1 comment · Fixed by #145
Labels: enhancement (New feature or request)

Comments

schuderer (Owner) commented Jun 16, 2019

Using the plugin system from #7, add support for pyspark. Possibly very little user-model-related work is necessary, except maybe a convenience base class to deal with the spark context.

@schuderer schuderer added the enhancement New feature or request label Jun 16, 2019
@schuderer schuderer modified the milestone: Plugin System Jun 16, 2019
@schuderer schuderer added this to To do in Prioritized User Issues via automation May 17, 2021
schuderer (Owner, Author) commented:

Actually, I found that it makes much more sense to just use the (currently experimental) Spark DataSource (#145). It creates the connection and the spark object/context for you based on the config. Furthermore, once you have a spark dataframe in hand (which the Spark DataSource provides through get_spark_dataframe()), you usually don't need to handle the spark context explicitly any more anyway (unless you want to store spark DFs; ideally, a data sink should be created for that purpose. Let me know if you need this, and I'll take a look at it).
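For illustration, a minimal sketch of what this could look like from the model code's side. Only get_spark_dataframe() is taken from the description above; the datasource name, the config keys, and the exact ModelMakerInterface signature are assumptions:

```python
# Hypothetical config excerpt (the keys under the datasource are assumptions
# about the experimental Spark DataSource):
#
# datasources:
#   my_spark_data:
#     type: spark
#     query: SELECT col_a, col_b FROM some_table

import mllaunchpad


class MyModelMaker(mllaunchpad.ModelMakerInterface):
    def create_trained_model(self, model_conf, data_sources, data_sinks, old_model=None):
        # The Spark DataSource sets up the connection and spark context from
        # the config; the model code only asks it for the dataframe.
        df = data_sources["my_spark_data"].get_spark_dataframe()
        # From here on it is plain pyspark, with no explicit context handling.
        counts = df.groupBy("col_a").count()
        model = counts.collect()  # placeholder for real training logic
        return model
```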

So, in essence, handling the spark context yourself runs against the gist of what ML Launchpad tries to achieve: getting the I/O out of your hair as much as possible. That's what the Spark DataSource should provide.

If you absolutely need a spark context (and are not querying a spark dataframe), you can configure a dummy spark datasource (e.g. with the query 'select 1 as dummy'). That SQL query does not even get executed unless you call get_spark_dataframe() or get_dataframe(), but you still have access to the connection through data_source.spark.
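A sketch of that workaround; the query string and the .spark attribute come from the paragraph above, while the datasource name, the type value, and the function signature are hypothetical:

```python
# Hypothetical config excerpt:
#
# datasources:
#   spark_context_only:
#     type: spark
#     query: select 1 as dummy

def train_with_spark_context(model_conf, data_sources, data_sinks):
    ds = data_sources["spark_context_only"]
    # The dummy query is never executed as long as we don't call
    # get_spark_dataframe() or get_dataframe(); we only borrow the session.
    spark = ds.spark
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df.show()
```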

Prioritized User Issues automation moved this from To do to Done Oct 18, 2021