Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way to convert an XLSX/ODS file with formula in to a csv file such that it provides insights in to the formula being used #94

Open
kjambunathan opened this issue Jul 17, 2021 · 7 comments

Comments

@kjambunathan
Copy link
Owner

kjambunathan commented Jul 17, 2021

Add a way to convert an XLSX/ODS file with formula in to a csv file such that it provides insights in to the formula being used

This issue is filed on behalf of @ouboub

This is a sister bug of Add a command to convert an Org table to xlsx · Issue #10 · kjambunathan/org-mode-ox-odt.

See this thread export org table to other formats (gnumeric or scalc or xlsx)

And also this issue import of xlsx failed in https://hub.stenci.la · Issue #1030 · stencila/stencila

An example ods file attached to that file is also attached to this issue

Exper-sten.ods

@kjambunathan
Copy link
Owner Author

kjambunathan commented Jul 17, 2021

Add a way to convert an XLSX/ODS file with formula in to a csv file such that it provides insights in to the formula being used

(Exper-sten.ods will be used for the purpose of this example.)

First I outline a solution with LibreOffice UI and then outline how that solution may be carried out in command line.

Solution using LibreOffice UI

Enable Macro Recording in LibreOffice

Do LibreOffice Menubar -> Tools -> Options -> LibreOfficeDev -> Advanced -> Optional Features -> Enable macro recording (may be limited) -> Turn On this checkbox

Screenshot from 2021-07-17 12-02-55

Save XLSX/ODS fil as CSV file with the following options.

Calc's Menubar -> File -> Save as -> choose File type as TextCSV(.csv) & also turn on Edit Filter Settings -> Save -> Confirm File Format as Use TextCSV Format -> enable Save Cell formulas instead of calculated values -> DONE.

See screenshot for details

Screenshot from 2021-07-17 12-08-22

Screenshot from 2021-07-17 12-13-55

Here is a CSV file created with the above procedure:

Exper-sten.csv

Now repeat step (2) with macro recording enabled

Calc Menubar -> Tools -> Macros -> Record Macro.

Once the ODS file is saved as CSV, save the macro to your Basic Library at some location. Now inspect the Basic Code created by the macro recorder. Here is the code that LibreOffice created for me:

REM  *****  BASIC  *****


sub Main
rem ----------------------------------------------------------------------
rem define variables
dim document   as object
dim dispatcher as object
rem ----------------------------------------------------------------------
rem get access to the document
document   = ThisComponent.CurrentController.Frame
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")

rem ----------------------------------------------------------------------
dim args1(2) as new com.sun.star.beans.PropertyValue
args1(0).Name = "URL"
args1(0).Value = "file:///home/kjambunathan/Downloads/junk/Exper-sten1.csv"
args1(1).Name = "FilterName"
args1(1).Value = "Text - txt - csv (StarCalc)"
args1(2).Name = "FilterOptions"
args1(2).Value = "44,34,76,1,,0,false,true,false,true,false"

dispatcher.executeDispatch(document, ".uno:SaveAs", "", 0, args1())


end sub

Make a note of the FilterOptions in the above Basic code which is

"44,34,76,1,,0,false,true,false,true,false"

We will be passing the above option to the command line.

Solution using Command Line

For comparing the results of conversion using UI and command line, rename Exper-sten.ods to test.ods. Use this test file with command line.

libreofficedev7.2 -convert-to csv:"Text - txt - csv (StarCalc)":44,34,76,1,,0,false,true,false,true,false test.ods

(I use LibreOffice-distributed nightly build. So, the command for me is libreofficedev7.2. In standard Debian-packaged LibreOffice, replace it with soffice. (I think) the binary is called soffice.bin on Windows. It is also called differently on Macs. Use whatever is appropriate for your installation.)

Here is the test.csv file produced by above command

test.csv

Now open a test.org buffer within Emacs, and do M-x org-table-import on the above file. Here is the test.org created.

test.org.txt

Now when in org-mode buffer do Menu bar -> Tbl -> Show Col/Row Numbers bound to C-c } or equivalently M-x org-table-toggle-coordinate-overlays

(custom-set-variables
 '(org-table-use-standard-references t))

This is a screenshot of what you will get. You will get a overview of the formulae used.

Screenshot from 2021-07-17 12-42-19

Screenshot from 2021-07-17 12-42-00

@kjambunathan
Copy link
Owner Author

kjambunathan commented Jul 17, 2021

libreofficedev7.2 -convert-to csv:"Text - txt - csv (StarCalc)":44,34,76,1,,0,false,true,false,true,false test.ods

The recipe shared here is a slightly expanded version of How do I export and import formulas to CSV? [closed] - Ask LibreOffice.

It looks like a reverse process of exporting formulae-embedded CSV -> regular formulae-embedded XLSX/ODS is also possible. Need to figure out whether it is possible via command line, and what the command line options are.

@ouboub
Copy link

ouboub commented Jul 17, 2021 via email

@kjambunathan
Copy link
Owner Author

kjambunathan commented Jul 18, 2021

tidyxl-https://nacnudus.github.io/tidyxl/articles/tidyxl.html -, a R package is an interesting package that is worth trying

It could help you deal with a "wall-of-formula-text", and help you get your head around it.

The package documentation claims that it should be able to identify what it calls as shared formulas. I am quoting from the documentation

Shared formulas

Cells A20 and A21 illustrate how formulas are normalised before
being written to file, using the formula_ref and formula_group
columns. When there is a group of cells whose formulas only
differ by cell reference (e.g. “=A1+1”, “=A2+1”, “=A3+1”, etc.),
only one formula in each group is written to the file, so tidyxl
infers what the formulas in the other cells in the group must be,
from their relative positions.

Going by that claim it should be able to "group" the formulae that occur in column J at cells J4, J5 etc. But it doesn't. It looks like a bug or an incomplete feature to be. You may find it worthwhile to file a bug against the tidyxl package. The benefit that you get is that you will know which formulae are merely a repetition of a templat, and which aren't. This way you can focus on the "unique" formulae. The benefit to the tidyxl package is that they get to see a usecase. (Btw, tidyxl seems to be associated with the UK government)

test.xlsx

rsession.txt

not-recognised-as-formula-group.txt

I thought I will share this with you ... even though it has nothing to do with the ODT exporter as such.

@kjambunathan
Copy link
Owner Author

kjambunathan commented Jul 18, 2021 via email

@ouboub
Copy link

ouboub commented Jul 12, 2022

The other way around, from CVS-->ods with formulas.

I started with

| First |   |     |
|-------+---+-----|
|     3 | 6 |   9 |
|     1 | 3 |   4 |
|     1 | 2 |   3 |
|     8 | 9 |  17 |
|   100 | 1 | 101 |
#+TBLFM: $3=vsum($1..$2);f-2::@2$3=vsum(@2$1..@2$2)

and that I add to change to


|   3 | 6 | =SUM($A1:$B1) |
|   1 | 3 | =SUM($A2:$B2) |
|   1 | 2 | =SUM($A3:$B3) |
| 100 | 1 | =SUM($A4:$B4) |

and then to export to csv, and then to important to LO, it worked.
I wounder whether there could be any faster method, maybe a lisp function that converts

vsum($1..$2); to =SUM($A1:$B1)

I will also asks this on the org-mailing list. Not that I make myself too many illustions.

@ouboub
Copy link

ouboub commented Jul 13, 2022

Thanks, I attach a zip file with a README file that explains the purpose of each file
HG-CVS-Formular.zip

Repository owner deleted a comment from ouboub Oct 11, 2023
Repository owner deleted a comment from ouboub Oct 11, 2023
kjambunathan added a commit that referenced this issue Oct 13, 2023
…ToTSV, OptimizeColumnWidth): New

* contrib/odt/LibreOffice/src/OrgModeUtilities.bas
(OptimizeColumnWidth): Add support for optimizing column width in
spreadsheets.

(CreateODTFileWithAllFactoryStyles): Create a ODT file with all
vanilla LibreOffice styles.  In a single shot, one can see the XML
representation of all factory styles that LibreOffice ships with.
The styles that are of specific interest to the current state of
the ODT exporter are: (a) the modern List Styles (b) the table
template definition, and its related styles.

(ODSExportToFormulaTSV): Export a ODS or other spreadsheet document to
TSV file; unlike, the regular "data-only" TSV export, the exported
TSV file will have the actual Calc formula in formula cells.

(ODSExportToTSV, ODSExportToCSV): Export a ODS or other spreadsheet
document to a "data-only" TSV or CSV formats.

(ODSDoExportToTSV): Workhorse routine that does the above TSV / CSV
conversions.

(fnWhichComponent, OptimizeColumnWidthOfSheet, CreateIndexMark)
(RepeatText, AnchoringParaText, AddPageNumber)
(ApplyTableTemplate, ApplyTableTemplates): Misc. helper functions
used in above previous three macro entry points.

`ODSExportToTSV' addresses #94 and #235.

`CreateODTFileWithAllFactoryStyles' is related to #251 (Upgrading
of ODT exporter's default `styles.xml' to the current state of art
XML definitions)

`OptimizeColumnWidth' fixes #245.

With this change, one can accomplish the following:

- Deprecate or re-define current ODT deployed list styles
  `OrgNumberedList', `OrgBulletedList'--these are "legacy" list
  definitions--to to modern list styles.

  This in turn /may/ allow one fine-grained control over how lists are
  indented. See #56 and #79.

- New LO-supplied styles `List 1', `List 1 start', `List 1 Cont' and
  `List 1 End' etc can be used as starting points for paragraph styles
  that can be used within lists.  See #250.

- Import fancy table cell definitions used in autoformatting of
  Tables.  See #199.

* lisp/ox-ods.el (org-ods-import-spreadsheet-file): New command.
Export all sheets of ODS file in to TSV format using the Basic Macro
`ODSExportToTSV'.  When visiting an `org-mode' file, import the
exported TSV files as well.

This change addresses #94 and #235.

* lisp/ox-odt.el (org-odt-convert-capabilities): Add support for export to TSV format(s)

* lisp/ox-odt.el (org-odt-map): New pcase pattern.

(org-odt-convert-capabilities): Introduce new options
`:lo-export-filter-options' and `:out-file-extension' in conversion
rules.  These options refine the earlier `%x' format specifier.
Revise entry for "odt" to "doc" conversion.  Add entry for "ods" to
"formula.tsv" and "ods" to "tsv" conversion.

(org-odt-convert-processes): Remove the `%x' specifier for in the
command line for LibreOffice entry.

(org-odt-do-convert): Modified.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants