
Implementation for parallel `read_excel` #467

Open
modin-bot opened this issue Feb 8, 2019 · 4 comments

Comments

@modin-bot

commented Feb 8, 2019

馃 This is a bot message 馃

feature_requests@modin.org has been sent an email requesting parallel implementation for read_excel.

Note: Issues are created only once per method.

@devin-petersohn

Member

commented Feb 8, 2019

Currently, we default to pandas on this.

There may be a relatively simple way to do this, similar to how read_parquet and read_hdf are implemented. read_excel has a usecols parameter that we can use to read a subset of the columns and distribute the reading that way.
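A minimal sketch of the column-partitioned idea, assuming plain pandas `read_excel` (its `usecols` parameter accepts a list of 0-indexed column positions) and a thread pool standing in for Modin's execution engine. `split_columns`, `read_in_column_chunks`, and `read_excel_column_parallel` are hypothetical names for illustration, not part of Modin's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def split_columns(n_columns: int, n_parts: int) -> List[List[int]]:
    """Split column positions 0..n_columns-1 into at most n_parts contiguous chunks."""
    if n_columns <= 0:
        return []
    n_parts = max(1, min(n_parts, n_columns))
    base, extra = divmod(n_columns, n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks get one additional column each.
        size = base + (1 if i < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def read_in_column_chunks(
    read_columns: Callable[[List[int]], T], n_columns: int, n_workers: int = 4
) -> List[T]:
    """Call read_columns once per column chunk, concurrently, keeping column order."""
    chunks = split_columns(n_columns, n_workers)
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        # pool.map preserves input order, so result i covers chunks[i].
        return list(pool.map(read_columns, chunks))


def read_excel_column_parallel(path: str, n_workers: int = 4):
    """Hypothetical pandas-backed helper: one read_excel call per column chunk."""
    import pandas as pd  # imported lazily so the helpers above need only the stdlib

    # Read the header plus at most one data row just to learn the column count.
    n_columns = len(pd.read_excel(path, nrows=1).columns)
    parts = read_in_column_chunks(
        lambda cols: pd.read_excel(path, usecols=cols), n_columns, n_workers
    )
    # Every part shares the same default RangeIndex, so they align side by side.
    return pd.concat(parts, axis=1)
```

The thread pool only keeps the sketch self-contained; pure-Python parsing is largely serialized by the GIL, so a real implementation would more likely submit each chunk as a remote task (for example on Ray). Each chunk is also a separate `read_excel` call that opens and parses the workbook again, so the speedup depends on how the file is laid out.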

If Excel files are not column-oriented, this may not be faster. If that is the case, we may want to use an approach similar to the read_csv reader.
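A row-partitioned alternative, loosely in the spirit of splitting a CSV into row blocks, could be sketched with `read_excel`'s `skiprows` and `nrows` parameters. This assumes the number of data rows is already known (`n_rows` is a hypothetical input; obtaining it cheaply is a separate problem), and `split_rows` / `read_excel_row_parallel` are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_rows(n_rows: int, n_parts: int) -> List[Tuple[int, int]]:
    """Return (first_data_row, row_count) pairs covering rows 0..n_rows-1 contiguously."""
    if n_rows <= 0:
        return []
    n_parts = max(1, min(n_parts, n_rows))
    base, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, size))
        start += size
    return ranges


def read_excel_row_parallel(path: str, n_rows: int, n_workers: int = 4):
    """Hypothetical helper; n_rows counts data rows, excluding the header row."""
    import pandas as pd  # lazy import: split_rows above needs only the stdlib

    # Only the header matters here; it supplies the column names for every block.
    columns = list(pd.read_excel(path, nrows=1).columns)

    def read_block(block: Tuple[int, int]):
        start, size = block
        # Skip the header row plus every data row that precedes this block.
        return pd.read_excel(
            path, header=None, names=columns, skiprows=1 + start, nrows=size
        )

    ranges = split_rows(n_rows, n_workers)
    if not ranges:
        return pd.DataFrame(columns=columns)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        parts = list(pool.map(read_block, ranges))
    return pd.concat(parts, ignore_index=True)
```

`skiprows` only drops rows from the result; the reader most likely still parses the preceding rows of the sheet, so blocks near the end of a large file do most of the work. That is one reason the row-oriented approach would need benchmarking before adoption.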

I am posting as Help Wanted because we have some good examples of parallel readers already implemented.

@devin-petersohn devin-petersohn referenced this issue Jun 5, 2019

Open

Modin project suggestions #658

6 of 22 tasks complete
@sunnyjiechao


commented Jul 1, 2019

@devin-petersohn Thanks, I am trying to contribute to this issue. Do you have any advice about the case "If Excel files are not column-oriented"? Also, I don't understand what kind of Excel file is column-oriented and what kind is not. Could you show me an example of each?

@devin-petersohn

Member

commented Jul 1, 2019

@sunnyjiechao You can ignore that statement here. I think it makes sense to first get something similar to the read_hdf functionality implemented; we can tune the performance once we have a good working implementation.

If you would like to implement this, take a look at read_hdf and its codepath to get a sense of how to implement read_excel. It should be quite similar.

Let me know if you have any questions on the implementation (those questions are best directed to the Discourse board: https://discuss.modin.org).

@sunnyjiechao


commented Jul 2, 2019

Sure, thanks

3 participants