Skip to content

Putting plain data into excel sheet

foglcz edited this page Apr 23, 2013 · 28 revisions

Now we will finally get to the point, where we generate some real excel badass. The important file is xl/worksheets/sheet1.xml file. Open it within your favourite XML editor, search for the text you written into the sheet and... Notice that there's none.

Intermezzo

Excel uses shared strings concept for all strings within the entire file. While that makes sense from automated standpoint, all we want to do is to generate the worksheet & don't get insane at the same time.

We don't have to use the sharedStrings, and the XSL Excel engine does not support it. We generate all strings directly within the worksheet instead, as it makes navigation & maintenance a lot easier.

Abandoning the shared strings concept might seem as a bad idea, while it isn't. It does nothing to Excel file validity, so your reports will be valid (unlike when generating Excel 2003 XML format & forcing it via extension of the file) - therefore, the experience to your users will be completely seamless.

Also, when this kind of file get's re-saved by Excel, the nice people from Microsoft will restore the shared strings concept for us. So abandoning it actually affects only initial export from your system; handling of the file by subsequent users will 'repare' this defect.

It would be pain in the ass to achieve shared strings within XSLT (ie. having it within the engine) - while having the navigation within worksheets still sane for a developer working with it. Therefore, we abandoned the concept and don't want to look back. Ever.

For the transformation between shared-strings worksheet and inline worksheet, we have developed a tiny script in php. So, let's convert the worksheet into human-readable format:

  1. Copy xl/worksheets/sheet1.xml into utils direcotry, and name it as sheet.xml
  2. Copy xl/sharedStrings.xml into utils directory
  3. $ php sharedStringsConvertor.php

This will produce parsedSheet.xml, which we are going to use as a blueprint for your sheet. As first step should be only producing valid excel worksheet from the system, we will copy/paste it's contents into template.xsl file, into the sheetContents variable:

<?xml version='1.0'?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:include href="2012.engine.xsl" />
    <xsl:include href="static.xsl" />

    <xsl:template match="/">
        <xsl:call-template name="generate_excel">
            <xsl:with-param name="themes"><xsl:copy-of select="$asset_theme" /></xsl:with-param>
            <xsl:with-param name="styles"><xsl:copy-of select="$asset_styles" /></xsl:with-param>
            <xsl:with-param name="sheetContents">
                <!-- contents of the parsedSheet.xml -->
            </xsl:with-param>
        </xsl:call-template>
    </xsl:template>
</xsl:stylesheet>

There are two things to watch out at this point - for so-called rIds at the very end of the worksheet:

<hyperlinks>
    <hyperlink ref="H8" r:id="rId1"/>
    <hyperlink ref="H5:L5" r:id="rId2" display="@foglcz"/>
    <hyperlink ref="H6:L6" r:id="rId3" display="in/foglcz"/>
</hyperlinks>
<printOptions horizontalCentered="1" verticalCentered="1"/>
<pageMargins left="0.70866141732283472" right="0.70866141732283472" top="0.74803149606299213" bottom="0.74803149606299213" header="0.31496062992125984" footer="0.31496062992125984"/>
<pageSetup paperSize="9" orientation="landscape" r:id="rId4"/>
<drawing r:id="rId5"/>

First, the hyperlinks are not yet supported, so get rid of them. (there's an issue for that.) Second:

  • the drawing MUST have rId1
  • the pageSetup MUST have rId2

So, we will rewrite above into following:

<printOptions horizontalCentered="1" verticalCentered="1"/>
<pageMargins left="0.70866141732283472" right="0.70866141732283472" top="0.74803149606299213" bottom="0.74803149606299213" header="0.31496062992125984" footer="0.31496062992125984"/>
<pageSetup paperSize="9" orientation="landscape" r:id="rId2"/>
<drawing r:id="rId1"/>

If you don't have any of these, don't worry. They're optional. More on the page setup & drawings on separate sections of this documentation.

Moving forward

Recompile! Recopy binary files! Repackage into zip! Rename the zip! Done!

The goal is to generate .xlsx file, which Excel finds comfortable - ie. does not scream about invalid formats. That's entirely possible, it's just tidy work.

Now, the entire generation happens within the sheetContents variable. There's several XML options, that are listed in respective section, but for following will suffice:

  • all rows are <row r="{row_number}">
  • all cells within respective rows are shown as <c r="{cell_number}" s="style_number_dont_touch_this">, where cell_number is AE150 - so, the generation is particularly easy.
  • when you want to have text or anything within the cell, here's the snippet:
<c r="it_should_already_be_there" s="you_dont_want_to_touch_this" t="inlineStr_also_not_relevant">
    <is>
        <t>YOUR TEXT BELONGZ TO UZ</t>
    </is>
</c>

One reason for creating the excel file in advance is to save a shit load of time by not-having to deal with styles, thinking about structure etc. If you have put placeholders into respective parts of the sheet, which you want to fill dynamically from the .xml file, then all you essentially need to do is to search for those placeholders & replace them with <xsl:value-of> elements.

It was important to keep the file sill-valid up until now, because now you will immediately see, if you made some typo which excel has to correct. You don't need to deal with all the dimensions & stuff, you just append the for-each and all the markup you are used to.

Oh, and if you actually make a typo, you have to trace back your steps. I've never seen more cryptic messages than here - and hurra, there's no debugger. I mean - even Office SDK does not help - it actually says "Cannot open, it's invalid, idiot!" and that's it.

Just do re-check every few-editations, and you'll be fine. Once you start doing more than one thing at a time (eg. you don't recheck after filling the contents and start messing up with images), then you'll have trouble on your hands.

Anyway, there should be excel being sent out from your system now.

Generic editing options

The <c>'s attribute t defines type of value. inlineStr is inline string, e is function, and there's couple more in documentation.

  • For standard text, use: <c t="inlineStr"><is><t>the_text_inhere</t></is></c>
  • For numeric values, feel free to use inlineStr. There's no added value in using correct syntax.
  • For functions, eg sum(), use: <c t="e"><f ca="1">SUM(...)</f></c>
  • For date, we recommend: <c s="if_its_date_theres_something_here"><f ca="1">DATEVALUE("2013-04-22")</f></c>

These should deal with those 99,999% of cases.

Oh - about the date hack - if you would use the date type, Excel would expect float, which is not unix timestamp, it's some-kind-of-weirdo-microsoft-thingie. We honestly have no idea what Microsoft is thinking on this one, but we kinda didn't come-up with XSL helper function, which would generate it correctly. DATEVALUE does the job pretty well anyway.

Let's move on.

Navigation

  1. Preparing excel files
  2. Preparing the first output
  3. Putting plain data into excel sheet
  4. Appending images
  5. Creating multiple worksheets
  6. Printer settings
  7. Macro-enabled workbooks
  8. Under the hood
  9. Engine parameters documentation
  10. Best practice for writing Excel XML
  11. Authors, license, contributing

Clone this wiki locally