The deal here is that Excel doesn't write every formula to disk. Formulas that differ only in their cell references (e.g. =A1 and =A2, which are the same bar a row offset) are written only once, and other cells refer to that one instance. The application is expected to parse the 'master' formula and then reconstruct the others from their position relative to it.
That's fine for spreadsheet applications, which have to parse the formula anyway for calculation, but for tidyxl, which leaves the formulas as strings, it's a pain.
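To make "reconstruct the others from their relative position" concrete, here is a minimal C++ sketch of shifting a single reference. It assumes plain A1-style references with optional $ anchors (no sheet names, ranges or R1C1 notation). CellRef, parse_ref, format_ref and offset_ref are hypothetical names for illustration, not part of tidyxl or of Excel's file format. The offset is the distance from the cell holding the master formula to the cell being reconstructed, and $-anchored parts don't move.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical representation of an A1-style reference such as $B3.
struct CellRef {
  bool col_absolute;  // true if the column is anchored with '$'
  int col;            // 1-based column number (A = 1)
  bool row_absolute;  // true if the row is anchored with '$'
  int row;            // 1-based row number
};

// Parse "A1", "$A1", "A$1" or "$A$1" (uppercase letters) into a CellRef.
CellRef parse_ref(const std::string& s) {
  CellRef ref{false, 0, false, 0};
  std::size_t i = 0;
  if (i < s.size() && s[i] == '$') { ref.col_absolute = true; ++i; }
  while (i < s.size() && std::isupper(static_cast<unsigned char>(s[i]))) {
    // Column letters are bijective base 26: A = 1, Z = 26, AA = 27.
    ref.col = ref.col * 26 + (s[i] - 'A' + 1);
    ++i;
  }
  if (i < s.size() && s[i] == '$') { ref.row_absolute = true; ++i; }
  while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
    ref.row = ref.row * 10 + (s[i] - '0');
    ++i;
  }
  if (ref.col == 0 || ref.row == 0 || i != s.size()) {
    throw std::invalid_argument("not an A1 reference: " + s);
  }
  return ref;
}

// Turn a CellRef back into text, restoring any '$' anchors.
std::string format_ref(const CellRef& ref) {
  std::string letters;
  for (int c = ref.col; c > 0; c = (c - 1) / 26) {
    letters.insert(letters.begin(), static_cast<char>('A' + (c - 1) % 26));
  }
  return (ref.col_absolute ? "$" : "") + letters +
         (ref.row_absolute ? "$" : "") + std::to_string(ref.row);
}

// Shift the relative parts of a reference by the distance between the
// master cell and the cell being reconstructed; anchored parts stay put.
std::string offset_ref(const std::string& text, int row_offset, int col_offset) {
  CellRef ref = parse_ref(text);
  if (!ref.row_absolute) ref.row += row_offset;
  if (!ref.col_absolute) ref.col += col_offset;
  if (ref.row < 1 || ref.col < 1) {
    // This sketch simply refuses references that fall off the sheet.
    throw std::out_of_range("reference shifted off the sheet: " + text);
  }
  return format_ref(ref);
}
```

For example, if the master formula =A1 belongs to B2 and the dependent cell is B3, the offset is one row and zero columns, so offset_ref("A1", 1, 0) returns "A2". Throwing on an off-sheet result is just this sketch's choice.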
There are a few open-source parsers around, which I refer to in comments below. Most are handwritten. The only one that uses a parser generator and a grammar is XLParser, which targets analysis rather than calculation. Microsoft publishes a grammar in Excel (.xlsx) Extensions to the Office Open XML SpreadsheetML File Format (p. 24), but it is 25 pages long and horrible.
The minimum-viable parser for tidyxl would simply separate cell references from the rest of the formula, offset them, and put the pieces back together. I've written the grammar, so I just need to design a formula object to hold the pieces and handle the offsetting.
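One possible shape for that formula object, sketched in C++: store the formula as a list of pieces, each either literal text or a cell reference, and rebuild it for another cell by shifting only the references. Piece, SharedFormula, tokenize, looks_like_ref and shift_ref are hypothetical names, and the tokenizer is a rough stand-in for a real grammar. It copies "..." strings and '...' sheet names verbatim, and it treats an uppercase letters-then-digits word as a reference unless it is followed by ( (a function such as LOG10) or ! (a sheet name). It does not handle R1C1 notation, whole-column or whole-row ranges such as A:A, or structured references.

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical piece of a formula: either literal text or a cell reference.
struct Piece {
  bool is_ref;
  std::string text;
};

// Optional '$', 1-3 uppercase column letters, optional '$', row digits.
bool looks_like_ref(const std::string& word) {
  static const std::regex pattern(R"(\$?[A-Z]{1,3}\$?[0-9]+)");
  return std::regex_match(word, pattern);
}

// Characters that can appear in refs, numbers and names like _xlfn.CONCAT.
bool is_word_char(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '$' || c == '_' || c == '.';
}

// Shift one reference by (drow, dcol), leaving '$'-anchored parts unchanged.
std::string shift_ref(const std::string& ref, int drow, int dcol) {
  std::size_t i = 0;
  bool col_abs = ref[i] == '$';
  if (col_abs) ++i;
  int col = 0;
  while (i < ref.size() && std::isalpha(static_cast<unsigned char>(ref[i]))) {
    col = col * 26 + (ref[i++] - 'A' + 1);  // bijective base 26: A=1, AA=27
  }
  bool row_abs = ref[i] == '$';
  if (row_abs) ++i;
  int row = std::stoi(ref.substr(i));
  if (!col_abs) col += dcol;
  if (!row_abs) row += drow;
  if (col < 1 || row < 1) throw std::out_of_range("reference shifted off the sheet: " + ref);
  std::string letters;
  for (int c = col; c > 0; c = (c - 1) / 26) {
    letters.insert(letters.begin(), static_cast<char>('A' + (c - 1) % 26));
  }
  return (col_abs ? "$" : "") + letters + (row_abs ? "$" : "") + std::to_string(row);
}

// Split a formula into literal-text pieces and reference pieces.
std::vector<Piece> tokenize(const std::string& f) {
  std::vector<Piece> pieces;
  std::string text;  // literal text not yet pushed
  auto flush = [&]() {
    if (!text.empty()) { pieces.push_back({false, text}); text.clear(); }
  };
  std::size_t i = 0;
  while (i < f.size()) {
    char c = f[i];
    if (c == '"' || c == '\'') {
      // Copy a string literal or quoted sheet name verbatim; a doubled
      // quote character inside it is an escaped quote.
      std::size_t j = i + 1;
      while (j < f.size()) {
        if (f[j] == c) {
          if (j + 1 < f.size() && f[j + 1] == c) { j += 2; continue; }
          break;  // closing quote
        }
        ++j;
      }
      j = (j < f.size()) ? j + 1 : j;  // include the closing quote
      text += f.substr(i, j - i);
      i = j;
    } else if (is_word_char(c)) {
      std::size_t j = i;
      while (j < f.size() && is_word_char(f[j])) ++j;
      std::string word = f.substr(i, j - i);
      // A word followed by '(' is a function, one followed by '!' a sheet.
      bool is_name = j < f.size() && (f[j] == '(' || f[j] == '!');
      if (!is_name && looks_like_ref(word)) { flush(); pieces.push_back({true, word}); }
      else { text += word; }
      i = j;
    } else {
      text += c;
      ++i;
    }
  }
  flush();
  return pieces;
}

// Hypothetical formula object: tokenize the master formula once, then
// render it for a cell drow rows and dcol columns away from the master.
class SharedFormula {
 public:
  explicit SharedFormula(const std::string& master) : pieces_(tokenize(master)) {}
  std::string render(int drow, int dcol) const {
    std::string out;
    for (const Piece& p : pieces_) out += p.is_ref ? shift_ref(p.text, drow, dcol) : p.text;
    return out;
  }
 private:
  std::vector<Piece> pieces_;
};
```

For instance, SharedFormula("A1+$B$2*SUM(C1:C3)").render(1, 0) gives "A2+$B$2*SUM(C2:C4)". The formulas here have no leading = because that is how they are written in the worksheet XML. Keeping the pieces per master formula means each dependent cell costs one pass over a short list rather than a fresh parse.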
Parsers
A C++ parser (MIT licence), which I have forked for a couple of fixes and wrapped in an R package.
XLParser's C# parser-generator grammar, and my attempt to port it to C++/Rcpp using the header-only PEGTL parser generator. Another parser generator I could use is Boost (C++), which is included in the BH package and used in readr, but it has similar downsides to PEGTL: it backtracks only after 'successful' intermediate matches have already run their side effects, which is too late. My port now also has a much simpler parser that just separates cell references from the rest of a formula.
xlsx_cell_expr_begin function.
Recursive shared formula. Uh-oh.
getRevV7, rather than using range_translate.