convert spark dataframe to Optimus #513

Closed
sophia-wright-blue opened this issue Apr 26, 2019 · 7 comments

2 participants
@sophia-wright-blue
commented Apr 26, 2019

Thanks for releasing this amazing repo. I have a very basic question:

I'd like to read in my data and create a DataFrame with Spark, then run a few data preprocessing steps on that Spark DataFrame. I'd then like to use the resulting Spark DataFrame with the functionality available in Optimus. How do I convert a Spark DataFrame so I can use it as if it had been created by reading in the data directly with Optimus?

Hope that makes sense. Thanks,

@issue-label-bot

commented Apr 26, 2019

Issue-Label Bot is automatically applying the label feature_request to this issue, with a confidence of 0.56. Please mark this comment with 👍 or 👎 to give our bot feedback!


@argenisleon

Member

commented Apr 26, 2019

Hi @sophia-wright-blue,

We are working on something like this:

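```python
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session and load the data with plain PySpark
spark = SparkSession.builder.appName('abc').getOrCreate()
df = spark.read.csv('data/foo.csv', header=True)

from optimus import Optimus

# Initialize Optimus with the existing Spark session
op = Optimus(spark)

# From here on, df has all Optimus functionality (except DL)
df.table()
```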

What do you think?

@sophia-wright-blue

Author

commented Apr 27, 2019

That'd be very helpful. Currently I use the library handyspark (https://github.com/dvgodoy/handyspark), which has something similar; there we need to do:

```python
hdf = sdf.toHandy()
```

where `sdf` is a Spark DataFrame.

I'm assuming there is also a plan to convert back to a Spark DataFrame?
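One common reason a library needs a "convert back" step is that it wraps the original object instead of extending it. Below is a minimal, generic sketch of that wrapper pattern. All class and function names are hypothetical, and this is not a description of handyspark's internals.

```python
class BaseFrame:
    """Hypothetical stand-in for a Spark DataFrame."""

    def __init__(self, rows):
        self.rows = rows

    def count(self):
        return len(self.rows)


class WrappedFrame:
    """Hypothetical wrapper adding helpers around a BaseFrame."""

    def __init__(self, base):
        self._base = base

    def describe_rows(self):
        # Extra functionality that exists only on the wrapper
        return f"{self._base.count()} rows"

    def unwrap(self):
        # Explicit step back to the original object
        return self._base


def to_wrapped(base):
    """Hypothetical conversion function that wraps a BaseFrame."""
    return WrappedFrame(base)


sdf = BaseFrame([1, 2, 3])
wdf = to_wrapped(sdf)
print(wdf.describe_rows())         # 3 rows
print(isinstance(wdf, BaseFrame))  # False: code expecting a BaseFrame can't take wdf
print(wdf.unwrap() is sdf)         # True
```

With this design, existing code that expects the original type needs the unwrapped object, which is where a "convert back" step comes from.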

Looking forward to this change. Thanks again for releasing this repo; I hope to see more updates to Optimus.

@argenisleon

Member

commented Apr 29, 2019

Thanks for the feedback @sophia-wright-blue,
I am not sure how handyspark works, but in Optimus a df is always a Spark DataFrame: Optimus monkey-patches the Spark DataFrame to add new functions.
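Monkey patching means attaching new methods to an existing class at runtime rather than wrapping its instances. The sketch below uses a plain Python class as a hypothetical stand-in for `pyspark.sql.DataFrame`, and `preview` is an invented method, not part of Optimus. It only illustrates why a patched object never needs converting back, even when it was created before the patch was applied.

```python
class DataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame."""

    def __init__(self, rows):
        self.rows = rows

    def count(self):
        return len(self.rows)


# A DataFrame created by "plain" code, before any patching happens
df = DataFrame([("a", 1), ("b", 2), ("c", 3)])


def preview(self, limit=2):
    """Hypothetical extra method: the first `limit` rows as text."""
    return "\n".join(str(row) for row in self.rows[:limit])


# Monkey patching: attach the function to the existing class.
# Methods are looked up on the class, so existing instances gain it too.
DataFrame.preview = preview

print(df.preview())            # ('a', 1) and ('b', 2), one per line
print(type(df) is DataFrame)   # True: same type, nothing to convert back
```

Because the object's type never changes, the same df can keep flowing through existing PySpark-style code before and after the extra methods are used.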

Can you explain a little why you need to convert back to a Spark dataframe?

@sophia-wright-blue

Author

commented Apr 29, 2019

I have an existing PySpark workflow, and I'd like to keep using its code for data pre-processing and post-processing while adding all of the functionality available in Optimus. Thanks for the prompt reply, @argenisleon.

@argenisleon argenisleon self-assigned this Apr 29, 2019

@argenisleon

Member

commented Apr 29, 2019

You do not have to change anything to add Optimus to your workflow. I'll be pushing this feature today

@sophia-wright-blue

Author

commented Apr 30, 2019

Thanks for the prompt replies, @argenisleon. I'll follow #516.
