
Feature Request: df.apply() #350

Open
aamirkhan34 opened this issue Jun 1, 2021 · 4 comments
@aamirkhan34

Requesting the feature df.apply() . I did not find any issues regarding this.

Thanks.

@sethmlarson
Contributor

This is unlikely to be implemented, as it's no more efficient than ed_df.to_pandas().apply(). Is there a particular use-case that would be more efficient to implement and desirable?
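To illustrate the equivalence being described, the pattern below can be sketched with pandas alone; the DataFrame contents and column name are illustrative, and an eland user would obtain pdf via ed_df.to_pandas(), which pulls every matching document client-side first:

```python
import pandas as pd

# Illustrative data; with eland this frame would come from
# pdf = ed_df.to_pandas(), materializing all documents locally.
pdf = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

# apply() runs an arbitrary Python function on the client, which is
# exactly why an eland df.apply() could not run inside the cluster.
doubled = pdf["price"].apply(lambda x: x * 2)
print(doubled.tolist())  # [20.0, 40.0, 60.0]
```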

@aamirkhan34
Author

Thanks @sethmlarson.
The ed_df.to_pandas() method is slow and might run out of memory for larger samples. I want to harness the power of our Elasticsearch cluster to process the eland DataFrame using an apply method. This would be very efficient for our process.

What do you think?

Thanks.

@sethmlarson
Contributor

Unfortunately apply is very generic, you can basically pass it anything and we can't transform arbitrary Python functions into an Elasticsearch query. Is there some operation(s) in particular you're interested in?
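For contrast, a reduction such as a column mean *can* be pushed down, because it maps directly onto an Elasticsearch "avg" aggregation. A minimal sketch of the kind of query body such a pushdown produces; the field name "price" and the helper mean_query are illustrative assumptions, not from this thread:

```python
import json

def mean_query(field):
    """Build an Elasticsearch search body computing the average of `field`."""
    return {
        "size": 0,  # return no documents, only the aggregation result
        "aggs": {f"{field}_avg": {"avg": {"field": field}}},
    }

# Roughly what an eland expression like ed_df["price"].mean()
# would be translated into before being sent to the cluster.
print(json.dumps(mean_query("price")))
```

Arbitrary Python callables passed to apply() have no such translation, which is the limitation described above.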

@kxbin
Contributor

kxbin commented Aug 3, 2021

> Thanks @sethmlarson.
> The ed_df.to_pandas() method is slow and might run out of memory for larger samples. I want to harness the power of our Elasticsearch cluster to process the eland DataFrame using an apply method. This would be very efficient for our process.
>
> What do you think?
>
> Thanks.

Yeah, I also found that the ed_df.to_pandas() method is very slow for larger samples.

So maybe we can process the data in batches, like this:

pd_df_iterator = ed_df.to_pandas_in_batch(batch_size=1000)
for pd_df in pd_df_iterator:
    pd_df.apply(func)  # func: whatever per-batch function you need

After testing, I found that the speed has increased a lot, because the amount of data in each batch is bounded by batch_size.
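The batching pattern itself can be sketched in plain Python, independent of eland; iter_batches below is a hypothetical stand-in for the proposed to_pandas_in_batch(), and summing each batch stands in for the per-batch apply():

```python
def iter_batches(rows, batch_size=1000):
    """Yield successive slices of `rows`, each at most batch_size long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# Process each batch independently so peak memory is bounded by
# batch_size rather than by the full result set.
results = []
for batch in iter_batches(list(range(10)), batch_size=4):
    results.append(sum(batch))  # stand-in for pd_df.apply(...)

print(results)  # [6, 22, 17]
```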

Here is a pull request to handle this situation:
Add to_pandas_in_batch() DataFrame API #369
