Wish: Input / output for Open Document Spreadsheet (ODS) #2311

lbeltrame · 2012-11-21T09:00:06Z

It would be nice to have, along with XLS and XLSX, also support for ODS (used notably by OpenOffice.org and Libreoffice, but not only that).

The main problem I saw is that there's a plethora of libraries to handle that: however not sure which is the "best" one (with a licensing that would fit with pandas).

paulproteus · 2012-12-16T07:31:44Z

It seems to me the best option for this would be:

Start with a simple test case for input and output, and handle that using this library: http://simple-odspy.sourceforge.net/ -- sample code: http://simple-odspy.sourceforge.net/?q=node/5
If there are more complex cases where simple-odspy is not enough, one can use the more heavyweight http://pypi.python.org/pypi/odfpy library

This would be a rather substantial bit of work.

A good start for the test cases would be to look at pandas/io/tests/test_excel.py and write test cases that mirror what is in that file.

H0R5E · 2014-09-23T10:16:07Z

+1 for this feature.

Working on an open-source data wrangling project and I would like to be able to interact with spreadsheet software that is not Excel.

jtratner · 2014-09-23T15:24:28Z

@H0R5E for now, you could just convert the resulting excel file to ODS (it's pretty much a simple set of numbers, nothing more): http://stackoverflow.com/questions/15257032/python-convert-excel-file-xls-or-xlsx-to-from-ods or you could use json as an intermediary and use tablib to export to ODS: https://pypi.python.org/pypi/tablib

robertmuil · 2014-11-13T11:01:20Z

+1 for this!

davidovitch · 2014-11-16T22:04:19Z

I want to tune in with some additional notes:

odfpy is now living on Github, and seems to get some attention now and then.
I bumped into ezodf, which, contrary to simple-odspy, seems to be still under development. It does not depend on odfpy. The docs are here, and some examples here.

Based on some initial looks, ezodf has a straight forward interface. For example, loading an ods spreadsheet into a DataFrame could go as follows:

import pandas as pd
import ezodf
doc = ezodf.opendoc('somefile.ods')

print("Spreadsheet contains %d sheets.\n" % len(doc.sheets))
for sheet in doc.sheets:
    if sheet.name == 'data':
        data = sheet
    print("Sheet name: '%s'" % sheet.name)
    print("Size of Sheet : (rows=%d, cols=%d)" % (sheet.nrows(), sheet.ncols()) )
    print("-"*40)

# a random cell
pos = (row, col)
data.get_cell(pos).display_form

df_dict = {}
for i, col in enumerate(data.columns()):
    # col is a list that contains all the cells (rows of that column)
    # assume the column name is on the first row
    colname = col[0].display_form
    df_dict[colname] = []
    print(colname)
    for j, row in enumerate(col):
        # ignore the header
        if j == 0:
            continue
        else:
            df_dict[col_name].append(row.display_form)
# and convert to a DataFrame
df = pd.DataFrame(df_dict)

I have no idea how odfpy and ezodf compare performance wise, or how their different memory footprints are for very large spreadsheets. I haven't tried to use odfpy because that seems more complicated to get going with. I also have not checked how robust the writing process of ezodf is. I'll try to see if I can get something similar going as for pandas/io/tests/test_excel.py.

I will also mention @sorenroug from odfpy and @T0ha from ezodf just in case someone from upstream is interested in this discussion.

davidovitch · 2014-11-18T22:24:25Z

After a short exploration, it seems doable to create an ods prototype reader based on the structure of pandas/io/excel.py using the ezodf module. I've forked pandas here and will implement an ods prototype reader in the io-ods branch.

jtratner · 2014-11-29T02:33:29Z

@davidovitch - one thing to keep in mind if you choose to handle both writing and reading: you can register an ODS writer in the excel module (just implementing the required methods as listed in the code) and then you can just add a simple subclass of the excel writer tests and get a ton of test cases for free. Just a thought.

davidovitch · 2015-04-30T09:28:11Z

I would like to give an update on the current status. My fork can currently read ODF ODS files, but writing is not implemented yet. The corresponding pull request is still pending review, and it has been a lot more work than I originally anticipated.

Just a few days ago I bumped into yet another library to read/write spreadsheet: pyexcel. What is different about this library is that it aims at creating a single API for all the different read/write libraries, and it builds on all the existing libraries out there (ezodf, xlrd, etc). This is actually kind of similar to what pandas currently has, and to what I am trying to extend in PR #9070. So I am wondering, would it make sense to use a single API library (with optional dependencies to all the relevant readers/writers for the different spreadsheet types) instead of developing something similar, but tailored to pandas use case? I am speculating that a lot of code in io/excel.py could be removed when relying on pyexcel, but it adds yet another dependency. At this stage I am not sure how all different edge cases on data types and other magic happening inside the spreadsheet interpretation differs between the current pandas implementation and pyexcel.

davidovitch · 2015-09-02T07:28:40Z

The ods reader is not ready for the upcoming 0.17 release. PR #9070 is closed (see the PR for a technical discussion), and a new improved PR will be made by someone at some point in the future. I have a working version in the ezodf_reader branch of my pandas fork in case someone wants to have a look at it. Suggestions and improvements are welcome :-)

H0R5E · 2015-09-16T10:08:03Z

@davidovitch thanks for your efforts! Its a shame that that reading and writing to ODF is not better supported in python, otherwise I'm sure this would be less of an ordeal.

PBrockmann · 2015-11-02T13:30:06Z

Working on how to structure data for paleoclimatology community, I would like to promote ODS as the standard format. Then of course, pandas with this possible reading and writing at ODS format would become the natural way to start nice analysis.

+2 and more for this feature that seems so closed to be released.

naught101 · 2015-11-02T21:19:24Z

You should almost certainly be using netcdf for paleoclimate data. It's more or less industry standard. Checkout the python package 'xray' for a nice way to interface with it.

On 3 November 2015 12:30:20 am AEDT, Patrick Brockmann notifications@github.com wrote:

Working on how to structure data for paleoclimatology community, I
would like to promote ODS as the standard format. Then of course,
pandas with this possible reading and writing at ODS format would
become the natural way for IO and start nice analysis.

+2 and more for this feature that seems so closed to be released.

Reply to this email directly or view it on GitHub:
#2311 (comment)

Sent from my Android device with K-9 Mail. Please excuse my brevity.

PBrockmann · 2015-11-02T23:10:07Z

@naught101: I know quite well netCDF format and some conventions like CF (Climate and Forecast) widely used in the Earth System Modelling community but it seems that the PaleoClimate one has also good argument to keep working with others standard. An article that is under discussion describes this motivation and promote a JSON-LD format. Read https://www.authorea.com/users/17200/articles/19163/_show_article and please feel free to contribute it.
In my understanding, the data themselves are not problematic, the difficulty comes more from the metadata that are in many aspect hierachical. Netcdf attributs are in this case not very well designed.
On the other hand, describe those metadata with a simple spreadsheet with a dotted notation in an open source format like ODS (I will do a proposition in this way) could be interesting because it will not radically change the behaviour of PaleoClimate scientists that widelly work with spreadsheets.
All this, to say that if pandas could allow IO from ODS, it could be a good motivation to start their analysis with pandas.

shoyer · 2015-11-03T06:27:56Z

@PBrockmann If your metadata is hierarchical, maybe it would be appropriate to use the hierarchical features of netCDF4/HDF5? Just a thought. In any case, I think we agree regardless that ODS support would be valuable for pandas.

hnykda · 2015-11-28T14:50:10Z

Is there any progress on this? Thanks

jreback · 2015-11-29T22:31:42Z

@hnykda there is a PR #9070 that is not far away, but needs some work if you'd like to pick it up.

hnykda · 2015-11-30T09:37:28Z

Will take a look at it. Thank you for pointing it out.

jameserrico · 2017-01-21T04:41:03Z

On a similar note has anyone considered Google Sheets? There are some limits to the size of a sheet (http://gappstips.com/google-sheets/google-spreadsheet-limitations/) that probably make it only usable for some cases but there are quite a lot of datasets that would fit within this restrictions and the ease of collaboration with Sheets is nice.

detrout · 2017-06-08T23:37:46Z

@jameserrico There's a library https://github.com/embr/gcat that can read from google drive. I have a fork that I updated to work with python3 https://github.com/detrout/gcat but upstream hasn't merged the pull-request

It knows how to return pandas data frames. I.e. I have some code like this:

    book = gcat.get_file(book_name, fmt='pandas_excel')
    experiments = book.parse('Experiments', header=0)`

detrout · 2017-06-16T04:41:40Z

I have a standalone reader styled like pandas.io.excel with tests.
https://github.com/detrout/pandasodf

It uses odfpy as that seems like its still maintained. Any comments, or should I try reformatting into a pull request creating pandas/io/odf.py?

I thought the .read_odf method seemed more reasonable, than overloading .read_excel

naught101 · 2017-06-16T04:48:22Z

I thought the .read_odf method seemed more reasonable, than overloading .read_excel

Makes sense. Perhaps there could be a read_spreadsheet that calls read_csv, read_excel, read_odf etc, depending on the extension?

detrout · 2017-06-16T17:51:50Z

@naught101 The general reader might have some difficulties since there are differences in what the various formats support. CSV/TSV only have a single table, while Excel & ODF can have multiple tables per file.

Also I'm currently contemplating passing the raw odfpy xml cell node for the converters callable, so those would be quite difficult to write if you didn't know the source file type.

davidovitch · 2017-08-07T07:13:31Z

I thought the .read_odf method seemed more reasonable, than overloading .read_excel

@detrout FYI, maybe things have changed since, but at the time of PR #9070 there was a very clear decision by the pandas devs not to do that and to keep read_excel for both the Excel and Open Document Format families.

TrigonaMinima · 2019-02-21T08:24:00Z

@jreback is #9070 the right starting point on this or do you suggest starting with something else?

jreback · 2019-02-21T09:13:22Z

yes that issue is a good starting point

note that the internal code org has changed a lot since then

H0R5E · 2019-07-03T22:35:11Z

Great to see the ODS reader implemented. Well done! I'm not sure this should have been closed though, as writing still isn't supported.

jreback · 2019-07-03T22:36:47Z

i would open a new issue for write support (if it’s really needed)

jtratner mentioned this issue Sep 19, 2014

Bloomberg Hackathon #8323

Closed

jreback added the Good as first PR label Sep 21, 2014

davidovitch mentioned this issue Dec 13, 2014

ENH: Open Document Format ODS support in read_excel() #9070

Closed

TomAugspurger added the good first issue label Oct 11, 2017

jreback removed the Difficulty Novice label Dec 15, 2017

jschendel mentioned this issue Jun 13, 2018

Feature request: adding read_ods function #21468

Closed

detrout mentioned this issue Feb 27, 2019

Class to read OpenDocument Tables #25427

Merged

jreback modified the milestones: Someday, 0.25.0 Jul 3, 2019

jreback closed this as completed in #25427 Jul 3, 2019

solstag mentioned this issue Jul 3, 2019

Wish: Write support for Open Document Spreadsheet (ODS) #27222

Closed

marcelonp mentioned this issue Aug 30, 2019

Update documentation on pd.read_excel, to reflect the fact that support for OpenDocument files is available #28231

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wish: Input / output for Open Document Spreadsheet (ODS) #2311

Wish: Input / output for Open Document Spreadsheet (ODS) #2311

lbeltrame commented Nov 21, 2012

paulproteus commented Dec 16, 2012

H0R5E commented Sep 23, 2014

jtratner commented Sep 23, 2014

robertmuil commented Nov 13, 2014

davidovitch commented Nov 16, 2014

davidovitch commented Nov 18, 2014

jtratner commented Nov 29, 2014

davidovitch commented Apr 30, 2015

davidovitch commented Sep 2, 2015

H0R5E commented Sep 16, 2015

PBrockmann commented Nov 2, 2015

naught101 commented Nov 2, 2015

PBrockmann commented Nov 2, 2015

shoyer commented Nov 3, 2015

hnykda commented Nov 28, 2015

jreback commented Nov 29, 2015

hnykda commented Nov 30, 2015

jameserrico commented Jan 21, 2017

detrout commented Jun 8, 2017

detrout commented Jun 16, 2017

naught101 commented Jun 16, 2017

detrout commented Jun 16, 2017

davidovitch commented Aug 7, 2017 •

edited

Loading

TrigonaMinima commented Feb 21, 2019

jreback commented Feb 21, 2019

H0R5E commented Jul 3, 2019 •

edited

Loading

jreback commented Jul 3, 2019

Wish: Input / output for Open Document Spreadsheet (ODS) #2311

Wish: Input / output for Open Document Spreadsheet (ODS) #2311

Comments

lbeltrame commented Nov 21, 2012

paulproteus commented Dec 16, 2012

H0R5E commented Sep 23, 2014

jtratner commented Sep 23, 2014

robertmuil commented Nov 13, 2014

davidovitch commented Nov 16, 2014

davidovitch commented Nov 18, 2014

jtratner commented Nov 29, 2014

davidovitch commented Apr 30, 2015

davidovitch commented Sep 2, 2015

H0R5E commented Sep 16, 2015

PBrockmann commented Nov 2, 2015

naught101 commented Nov 2, 2015

PBrockmann commented Nov 2, 2015

shoyer commented Nov 3, 2015

hnykda commented Nov 28, 2015

jreback commented Nov 29, 2015

hnykda commented Nov 30, 2015

jameserrico commented Jan 21, 2017

detrout commented Jun 8, 2017

detrout commented Jun 16, 2017

naught101 commented Jun 16, 2017

detrout commented Jun 16, 2017

davidovitch commented Aug 7, 2017 • edited Loading

TrigonaMinima commented Feb 21, 2019

jreback commented Feb 21, 2019

H0R5E commented Jul 3, 2019 • edited Loading

jreback commented Jul 3, 2019

davidovitch commented Aug 7, 2017 •

edited

Loading

H0R5E commented Jul 3, 2019 •

edited

Loading