---
layout: post
title: "Splitting XML Well with XSLT 2"
permalink: splitting-xml-well-with-xslt-2.html
categories: [xml, xslt, xslt2, split]
---
<p>I recently needed to split the result set from a <a
href="http://lucene.apache.org/solr/">Solr</a> query into a collection
of smaller <code>add</code> requests for POSTing into a
different core.
There are ways to make the split work with text processing tools
(<code>split</code> and friends), but an ad hoc approach can always
trip over some markup, so it's just better to use XML tooling.
<a href="http://www.w3.org/TR/xslt20/">XSLT 2</a> makes it
easy to do the right thing, and that's no coincidence: grouping and
writing multiple output documents are exactly what
<a href="http://www.w3.org/TR/xslt">XSLT 1</a> lacked.</p>
<p>First up is grouping in chunks of 2000 records:</p>
<pre class="code"><xsl:for-each-group select="/response/result/doc"
group-by="round(position() div 2000)">
...
</xsl:for-each-group></pre>
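<p>Because the grouping key is computed from each record's position, it's worth checking how a candidate expression actually partitions the records. This small Python sketch is a model of the XPath arithmetic, not XSLT itself. It counts how many of 5000 hypothetical records land under each key for <code>round(position() div 2000)</code> and for <code>(position() - 1) idiv 2000</code>, assuming XPath's <code>round()</code> rounds halves toward positive infinity (modeled here as <code>math.floor(x + 0.5)</code>).</p>

```python
import math


def xpath_round(x):
    """Model of XPath round(): halves round toward positive infinity,
    so round(0.5) = 1 and round(2.5) = 3 (unlike Python's built-in round)."""
    return math.floor(x + 0.5)


def round_key(position, size):
    # Models group-by="round(position() div size)"
    return xpath_round(position / size)


def idiv_key(position, size):
    # Models group-by="(position() - 1) idiv size"; positions are 1-based,
    # so the operand is never negative and Python's // matches XPath idiv.
    return (position - 1) // size


def chunk_sizes(n_records, size, key):
    """Count how many records (1-based positions) land under each key."""
    counts = {}
    for position in range(1, n_records + 1):
        k = key(position, size)
        counts[k] = counts.get(k, 0) + 1
    return counts


if __name__ == "__main__":
    print(chunk_sizes(5000, 2000, round_key))  # {0: 999, 1: 2000, 2: 2000, 3: 1}
    print(chunk_sizes(5000, 2000, idiv_key))   # {0: 2000, 1: 2000, 2: 1000}
```

<p>With <code>round()</code>, positions 1 to 999 all round to 0 and position 5000 rounds up to a key of its own, so the chunks come out as 999, 2000, 2000, and 1 records. Integer division on the zero-based position gives full chunks of 2000, with only the last one short.</p>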
<p>Inside the loop, writing each chunk to a file named for its
grouping key takes just one <code>xsl:result-document</code>:</p>
<pre class="code"><xsl:result-document href="{current-grouping-key()}_out.xml">
<add>
<xsl:for-each select="current-group()">
<doc>
<xsl:apply-templates />
</doc>
</xsl:for-each>
</add>
</xsl:result-document></pre>
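<p>For comparison, the same chunk-and-write logic can be modeled outside XSLT. This Python sketch uses only the standard library's <code>xml.etree.ElementTree</code>. The function name <code>split_solr_response</code> is hypothetical, and the sketch assumes the usual Solr XML response layout (<code>response/result/doc</code>). Because the templates behind <code>xsl:apply-templates</code> aren't shown here, it simply copies each <code>doc</code>'s child elements unchanged rather than guessing at how they should be rewritten.</p>

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def split_solr_response(response_xml, out_dir, chunk_size=2000):
    """Hypothetical helper: split the doc elements of a Solr XML response
    into add documents of at most chunk_size docs each, one file per chunk.

    Files are named KEY_out.xml, where KEY is the 0-based chunk index,
    i.e. (position - 1) // chunk_size for 1-based record positions.
    Each doc's child elements are copied unchanged (no field renaming).
    """
    root = ET.fromstring(response_xml)        # the response element
    docs = root.findall("./result/doc")       # mirrors /response/result/doc
    written = []
    for start in range(0, len(docs), chunk_size):
        key = start // chunk_size             # 0, 1, 2, ...
        add = ET.Element("add")
        for doc in docs[start:start + chunk_size]:
            out_doc = ET.SubElement(add, "doc")
            out_doc.extend(list(doc))         # copy the doc's children as-is
        path = os.path.join(out_dir, f"{key}_out.xml")
        ET.ElementTree(add).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written


if __name__ == "__main__":
    # Build a hypothetical five-record response in memory.
    response = ET.Element("response")
    result = ET.SubElement(response, "result", name="response")
    for i in range(5):
        doc = ET.SubElement(result, "doc")
        ET.SubElement(doc, "str", name="id").text = str(i)

    with tempfile.TemporaryDirectory() as out_dir:
        for path in split_solr_response(ET.tostring(response), out_dir, chunk_size=2):
            count = len(ET.parse(path).getroot().findall("doc"))
            print(os.path.basename(path), count)
```

<p>Run on that five-record sample with <code>chunk_size=2</code>, it writes <code>0_out.xml</code>, <code>1_out.xml</code>, and <code>2_out.xml</code>, holding two, two, and one records respectively.</p>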
<p>And that's it. The only catch is that you need an XSLT 2 processor,
and the superlative <a
href="http://saxon.sourceforge.net/">Saxon</a> (from <a
href="http://www.saxonica.com/">Saxonica</a>) is my default choice.</p>