New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix typo in tutorial_data_conversion.rst #69

Merged
merged 1 commit into from Jan 14, 2017

Conversation

Projects
None yet
3 participants
@rmzelle
Contributor

rmzelle commented Jan 13, 2017

By the way, could you add some example code on how to exactly open a multiple sheet Excel workbook and save its sheets to CSV files? It's not really clear to me what the right incantation is for pyexcel.save_book_as.

I'm currently using pyexcel.save_as(file_name=inputfilepath, dest_file_name=outputfilepath), but that only saves the first sheet.

@codecov-io

This comment has been minimized.

codecov-io commented Jan 13, 2017

Current coverage is 98.58% (diff: 100%)

Merging #69 into master will not change coverage

@@             master        #69   diff @@
==========================================
  Files            61         61          
  Lines          5601       5601          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           5522       5522          
  Misses           79         79          
  Partials          0          0          

Powered by Codecov. Last update 4906d51...bef96ee

@chfw chfw merged commit 4d4b6db into pyexcel:master Jan 14, 2017

3 checks passed

codecov/patch Coverage not affected when comparing 4906d51...bef96ee
Details
codecov/project 98.58% (+0.00%) compared to 4906d51
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@chfw

This comment has been minimized.

Member

chfw commented Jan 14, 2017

Surely, will do. Stay tuned.

@rmzelle rmzelle deleted the rmzelle:patch-1 branch Jan 14, 2017

chfw added a commit that referenced this pull request Jan 16, 2017

@chfw

This comment has been minimized.

Member

chfw commented Jan 16, 2017

Does this section, What would happen if I save a multi sheet book into “csv” file help clarify save_book_as? If not clear, could you elaborate a bit? I can write more.

@rmzelle

This comment has been minimized.

Contributor

rmzelle commented Jan 16, 2017

Thanks. That helps. I'm still a bit confused by pyexcel.save_book_as(bookdict=content, dest_file_name="myfile.csv") at http://pyexcel.readthedocs.io/en/latest/tutorial04.html#what-would-happen-if-i-save-a-multi-sheet-book-into-csv-file, since http://pyexcel.readthedocs.io/en/latest/generated/pyexcel.save_book_as.html?highlight=save_book_as doesn't mention a bookdict parameter.

For my little project, I'm hoping to convert specific sheets from .xlsx/.xlsm files to CSV, so I can put the data under version control. If I can't select a specific source sheet with pyexcel.save_as or pyexcel.Book(content).save_as("myfile.csv"), maybe I can do something like:

import pyexcel
sheet = pyexcel.get_sheet(file_name="example.xlsx", sheet_name="Sheet2")
sheet.save_as("example-sheet2.csv")

? Or, if I want to save all sheets in an Excel file to individual CSV files,

import pyexcel
book = pyexcel.get_book(file_name="example.xlsx")
sheets = book.to_dict()
for key, item in sheets.items():
    sheet = { key: sheets[key] }
    sheet.save_as(key + ".csv")

? Does that look reasonable?

@chfw

This comment has been minimized.

Member

chfw commented Jan 16, 2017

the last line "keywords – additional keywords can be found at pyexcel.get_book()" would bring you to the get_book()'s parameters :).

for 1st part of your code, you can try:

import pyexcel as p
p.save_as(file_name="example.xlsx", sheet_name="Sheet2", dest_file_name="example-sheet2.csv")

for 2nd part of your code:

import pyexcel as p
book = p.get_book(file_name="example.xlsx")
for sheet in book:
    sheet.save_as(sheet.name+'.csv')

or:

import pyexcel as p
p.save_book_as(file_name="example.xlsx", dest_file_name="example.csv")

What's more, alternatively you could choose pyexcel-cli, which is not well documented so far but you can use help message to get around. For 1st part of your code, you could do this:

pyexcel transcode --sheet-name Sheet2 example.xlsx example-sheet2.csv

for 2nd part of your code, you could the following instead:

pyexcel transcode example.xlsx example.csv

For the help with pyexcel-cli, you can do pyexcel transcode --help

@rmzelle

This comment has been minimized.

Contributor

rmzelle commented Jan 16, 2017

Awesome, thanks so much. I'll try out those suggestions.

chfw added a commit that referenced this pull request Jan 16, 2017

more on #69: provide more detailed documention to save_as and save_bo…
…ok_as. will find out later how to manage the same doc string in multiple places.
@rmzelle

This comment has been minimized.

Contributor

rmzelle commented Jan 18, 2017

Sorry to turn this into a support ticket, but one more question: do you know if pyexcel offers a faster Excel-to-CSV conversion than pyexcel.save_book_as()? It took 5 minutes to convert a 3.2 Mb .xlsx Excel file (35k rows, 7 columns) to CSV with the line below:

pyexcel.save_book_as(file_name="example.xlsx", dest_file_name="example.csv")

Is that kind of performance to be expected? Doing the conversion in Excel 2010 only takes a few seconds. (i7-5600U, 16 GB RAM, Python 3.4.4 64-bits)

@chfw

This comment has been minimized.

Member

chfw commented Jan 18, 2017

To be honest, the performance evaluation is not systematically done on readers.

So far pyexcel-xls and pyexcel-xlsx can read xlsx file. The latter one may be faster as it uses read-only mode. You can try switch the library and see. Pip unstall one and pip install the other.

@chfw

This comment has been minimized.

Member

chfw commented Jan 19, 2017

I wondered if pyexcel-xlsx made a difference or not. Meanwhile you may also try out xlsx2csv, csvkit as well.

@rmzelle

This comment has been minimized.

Contributor

rmzelle commented Jan 20, 2017

I wondered if pyexcel-xlsx made a difference or not. Meanwhile you may also try out xlsx2csv, csvkit as well.

I haven't had time to take another look, but I was already using "pyexcel-xlsx", I think (I installed "pyexcel" and "pyexcel-xlsx" with pip, but only imported "pyexcel" in my script).

Thanks a lot for the links to those other two libraries. I hadn't seen those before. Especially the description of xlsx2csv ("it is fast, and works for huge xlsx files") looks promising.

@chfw

This comment has been minimized.

Member

chfw commented Jan 20, 2017

I am sorry to hear that. pyexcel-xlsx uses openpyxl in read-only mode, which is the best the 3rd party library could do. If you do not mind, could you please update me on the performance of xlsx2csv?

@chfw

This comment has been minimized.

Member

chfw commented Dec 5, 2017

@rmzelle , could you please try using pyexcel-xlsxr? which I hope it should match the performance of xlsx2csv.

@rmzelle

This comment has been minimized.

Contributor

rmzelle commented Dec 5, 2017

@rmzelle , could you please try using pyexcel-xlsxr? which I hope it should match the performance of xlsx2csv.

@chfw, thanks. I don't know when I will have time to return to this project and test your new tool (sorry). FWIW, I did get good performance with xlsx2csv.py.

@chfw

This comment has been minimized.

Member

chfw commented Dec 5, 2017

Are you dealing with some public dataset? I may do the performance benchmarking myself.

@rmzelle

This comment has been minimized.

Contributor

rmzelle commented Dec 5, 2017

No, sorry, it's a private database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment