Actually, I found that it makes much more sense to just use the (currently experimental) Spark DataSource from #145: it creates the connection and the spark object/context for you based on the config. Furthermore, once you have a spark dataframe (which the Spark DataSource provides through get_spark_dataframe()), you usually don't need to handle the spark context explicitly any more anyway (unless you want to store spark DFs, in which case a dedicated data sink should ideally be created for this -- let me know if you need that, and I'll take a look at it).
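For illustration, here is a rough sketch of how this might look in user model code (not the final API): it assumes ML Launchpad hands the configured data sources to your code as a dict, and the datasource name and column handling below are made up; get_spark_dataframe() is the accessor mentioned above.

```python
# Sketch only: "my_spark_data" is a hypothetical datasource name from the config;
# data_sources is assumed to be the dict of data sources ML Launchpad passes in.
spark_df = data_sources["my_spark_data"].get_spark_dataframe()
spark_df.printSchema()                    # a regular pyspark DataFrame from here on
sample = spark_df.limit(1000).toPandas()  # e.g. pull a small sample locally if needed
```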
So, in essence, explicit spark context handling goes against the gist of what ML Launchpad tries to achieve: getting I/O out of your hair as much as possible. That's what the Spark DataSource is meant to provide.
If you absolutely need a spark context (and are not querying a spark dataframe), you can configure a dummy spark datasource (e.g. with the query 'select 1 as dummy'). That SQL query does not even get executed unless you call get_spark_dataframe() or get_dataframe(), but you still have access to the connection through data_source.spark.
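Something along these lines (a sketch only; the datasource name "spark_dummy" and the exact type string for the experimental Spark DataSource are assumptions on my part):

```python
# Hypothetical config excerpt:
#
#   datasources:
#     spark_dummy:
#       type: spark            # assumed type name for the experimental Spark DataSource
#       query: select 1 as dummy
#
# The dummy query is never executed unless you call get_dataframe() or
# get_spark_dataframe(); the datasource is only used here to reach the session:
spark = data_sources["spark_dummy"].spark               # the connection mentioned above
df = spark.createDataFrame([(1, "a")], ["id", "val"])   # use it like any SparkSession
```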
Using the system from #7: support pyspark (possibly very little user-model-related stuff necessary, except maybe a convenience base class to use for dealing with the spark context).