XML Splitter is a small library that splits one XML document into separate parts. As an example, consider the following XML structure:
<notes>
<note scope="private">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body><![CDATA[Don't forget me this <b>weekend</b>!]]></body>
</note>
<note scope="public">
<to>Pete</to>
<from>Sarah</from>
<heading>Urgent</heading>
<body><![CDATA[Don't forget me this <b>weekend</b>!]]></body>
</note>
</notes>
If you want to split this list of notes into its individual note elements (e.g. to store each of them in a separate file), you can use this library.
See the usage section for a brief overview of how to use xml-splitter.
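To make the idea of splitting concrete, here is a minimal sketch that uses only the StAX API shipped with the JDK. It is not xml-splitter's implementation; the class name `StaxSplitSketch` and its `split` method are made up for illustration. It copies every element with a given local name into its own in-memory fragment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of xml-splitter.
public class StaxSplitSketch {

    /** Returns one XML fragment per element whose local name equals nodeName. */
    public static List<String> split(String xml, String nodeName) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        List<String> parts = new ArrayList<String>();
        StringWriter buffer = null;
        XMLEventWriter writer = null;
        int depth = 0; // nesting depth inside the element currently being copied

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            // Open a new fragment when a split element starts and none is open yet.
            if (writer == null && event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(nodeName)) {
                buffer = new StringWriter();
                writer = outputFactory.createXMLEventWriter(buffer);
            }
            if (writer != null) {
                writer.add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0) {
                    // The split element is complete: finish this fragment.
                    writer.flush();
                    writer.close();
                    parts.add(buffer.toString());
                    writer = null;
                }
            }
        }
        reader.close();
        return parts;
    }
}
```

Calling `StaxSplitSketch.split(xml, "note")` on the notes document above would yield two fragments, one per note. The library adds file output, statistics, and event hooks on top of this basic idea.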
The most recent release is xml-splitter 0.2.0, released May 01, 2017.
<dependency>
<groupId>com.github.spucman</groupId>
<artifactId>xml-splitter</artifactId>
<version>0.2.0</version>
</dependency>
compile 'com.github.spucman:xml-splitter:0.2.0'
Download the latest release from the Maven Central repository and add the jar file to your classpath.
xml-splitter is compiled against JDK 7+ and has slf4j as its only required dependency. Nevertheless, it is recommended to use Woodstox as the StAX implementation.
At the moment there are two ways to use xml-splitter: as a file-based splitter or as an in-memory splitter.
To use them transparently in your project, program against the XmlSplitter interface with one of the following implementations:
- FileStaxNodeSplitter
- InMemoryStaxNodeSplitter
The following examples only cover the FileStaxNodeSplitter, because its in-memory sibling is much simpler ;).
For the upcoming samples we assume the following XML structure:
<listElement>
<global>globalValue</global>
<global1>globalValue1</global1>
<element id="0">
<name>name0</name>
<address>address0</address>
<to>to0</to>
<from>from0</from>
<email>email0</email>
<other>other0</other>
<stuff>stuff0</stuff>
</element>
<element id="1">
<name>name1</name>
<address>address1</address>
...
</element>
...
</listElement>
We have a simple list of elements, along with some arbitrary elements at the top level.
To split this sample into its individual element nodes, configure xml-splitter like this:
FileStaxNodeSplitter fileStaxNodeSplitter = new FileStaxNodeSplitter();
fileStaxNodeSplitter.setOutputFolder("path/to/output/folder");
fileStaxNodeSplitter.setSplittingNodeName(new QName("element"));
fileStaxNodeSplitter.init();
XmlSplitStatistic statistic;
try (InputStream is = JavaXmlSplitterTest.class.getResourceAsStream("/xml/testInput.xml")) {
    statistic = fileStaxNodeSplitter.split("simpleElementSplit", is);
}
So what is happening up there?
- you create an instance of FileStaxNodeSplitter
- you define where the result files should be stored
- you define the element at which you want to split
- you initialize the FileStaxNodeSplitter, which makes sure that the specified folder exists on your file system
- last but not least, you call split with your base target name and an InputStream of the XML structure
The generated output will look like this:
<?xml version='1.0' encoding='UTF-8'?>
<element id="0">
<name>name0</name>
<address>address0</address>
<to>to0</to>
<from>from0</from>
<email>email0</email>
<other>other0</other>
<stuff>stuff0</stuff>
</element>
The file name is the base target name followed by the number of the list element, so in this case the file name would
be simpleElementSplit_0.xml
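The naming pattern can be sketched as a small helper. This is an illustration only: the class `SplitFileNames` is hypothetical, and it assumes that the counter starts at 0 and increases by one per element; only the `_0` name is confirmed by the example above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of xml-splitter.
public class SplitFileNames {

    /** Builds "<baseName>_<index>.xml", following the pattern described above. */
    public static String fileNameFor(String baseName, int index) {
        return baseName + "_" + index + ".xml";
    }

    /** Resolves the expected file inside the configured output folder. */
    public static Path resolve(String outputFolder, String baseName, int index) {
        return Paths.get(outputFolder).resolve(fileNameFor(baseName, index));
    }
}
```

A helper like this can be handy in tests that check whether the expected result files were written to the output folder.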
If each result should again be wrapped in a parent element, as it was before the split, you can do that in a very simple way:
FileStaxNodeSplitter fileStaxNodeSplitter = new FileStaxNodeSplitter();
fileStaxNodeSplitter.setOutputFolder("path/to/output/folder");
fileStaxNodeSplitter.setSplittingNodeName(new QName("element"));
fileStaxNodeSplitter.setDocumentEventHandler(new XmlSurroundingNodeDocumentEventHandler(new QName("parentElement")));
fileStaxNodeSplitter.init();
...
This sample looks very similar to the previous one, with just one additional call: setDocumentEventHandler.
With this hook you can modify your output XML by adding custom content at the following events:
- afterStartDocument
- beforeEndDocument
- finishedDocument
One implementation already exists: the XmlSurroundingNodeDocumentEventHandler, which simply takes
one QName and places it around the result.
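The effect of such a surrounding handler can be sketched with plain JDK StAX. The `DocumentHooks` interface below is a hypothetical stand-in defined only for this example; the library's actual handler interface and method signatures may differ. The sketch shows how writing a start tag in afterStartDocument and the matching end tag in beforeEndDocument wraps the split result.

```java
import java.io.StringWriter;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Illustration only; none of these types belong to xml-splitter.
public class SurroundingHookSketch {

    /** Hypothetical stand-in for the library's event handler type. */
    interface DocumentHooks {
        void afterStartDocument(XMLStreamWriter writer) throws XMLStreamException;
        void beforeEndDocument(XMLStreamWriter writer) throws XMLStreamException;
    }

    /** Returns hooks that open the given node first and close it last. */
    static DocumentHooks surroundWith(final QName node) {
        return new DocumentHooks() {
            @Override
            public void afterStartDocument(XMLStreamWriter writer) throws XMLStreamException {
                writer.writeStartElement(node.getLocalPart());
            }

            @Override
            public void beforeEndDocument(XMLStreamWriter writer) throws XMLStreamException {
                writer.writeEndElement();
            }
        };
    }

    /** Writes a single text-only element and invokes the hooks around it. */
    public static String render(QName wrapper, String elementName, String text)
            throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        DocumentHooks hooks = surroundWith(wrapper);
        writer.writeStartDocument();
        hooks.afterStartDocument(writer);   // the wrapper opens here
        writer.writeStartElement(elementName);
        writer.writeCharacters(text);
        writer.writeEndElement();
        hooks.beforeEndDocument(writer);    // the wrapper closes here
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }
}
```

Under these assumptions, `render(new QName("parentElement"), "name", "name0")` produces the name element nested inside a parentElement element.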
Now our xml will look like this:
<?xml version='1.0' encoding='UTF-8'?>
<parentElement>
<element id="0">
<name>name0</name>
<address>address0</address>
<to>to0</to>
<from>from0</from>
<email>email0</email>
<other>other0</other>
<stuff>stuff0</stuff>
</element>
</parentElement>
Sometimes splitting alone is not enough: each result element must also carry additional information from a higher level of your XML tree.
FileStaxNodeSplitter fileStaxNodeSplitter = new FileStaxNodeSplitter();
fileStaxNodeSplitter.setOutputFolder("path/to/output/folder");
fileStaxNodeSplitter.setSplittingNodeName(new QName("element"));
fileStaxNodeSplitter.setGlobalDataCollectorNameList(Lists.newArrayList(new QName("global"), new QName("global1")));
XmlSurroundingNodeDocumentEventHandler eventHandler = new XmlSurroundingNodeDocumentEventHandler();
eventHandler.setNode(new QName("parentElement"));
eventHandler.setGlobalValueList(Lists.<QName>newArrayList(new QName("global1")));
fileStaxNodeSplitter.setDocumentEventHandler(eventHandler);
fileStaxNodeSplitter.init();
...
As before, we define our FileStaxNodeSplitter in the same way.
The difference lies in setGlobalDataCollectorNameList on the FileStaxNodeSplitter and setGlobalValueList on the XmlSurroundingNodeDocumentEventHandler.
On the FileStaxNodeSplitter you define which data you want to reuse later on (e.g. in an event handler).
On the XmlSurroundingNodeDocumentEventHandler you define which of the previously collected data
should be added to your result XML.
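The collecting step can be pictured with a short sketch based on the JDK's StAX reader. It is not the library's code; the class `GlobalValueSketch` and its `collect` method are hypothetical, and the sketch assumes that the collected elements contain text only.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of xml-splitter.
public class GlobalValueSketch {

    /** Collects the text of every element whose local name is contained in wanted. */
    public static Map<String, String> collect(String xml, Set<String> wanted)
            throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Map<String, String> values = new LinkedHashMap<String, String>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && wanted.contains(reader.getLocalName())) {
                String name = reader.getLocalName();
                // getElementText() reads up to the matching end tag (text-only elements).
                values.put(name, reader.getElementText());
            }
        }
        reader.close();
        return values;
    }
}
```

For the sample structure, collecting `global` and `global1` would yield `globalValue` and `globalValue1`; an event handler can then decide which of these values to write into each result file.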
If we now look at our new result, it looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<parentElement>
<global1>globalValue1</global1>
<element id="0">
<name>name0</name>
<address>address0</address>
<to>to0</to>
<from>from0</from>
<email>email0</email>
<other>other0</other>
<stuff>stuff0</stuff>
</element>
</parentElement>
This section briefly covers the same configurations we have done before in Java, this time via Spring XML files. For convenience, all XML node names are defined as beans beforehand and are just referenced in the different sections.
<bean id="globalValue" class="javax.xml.namespace.QName">
<constructor-arg name="localPart" value="global"/>
</bean>
<bean id="globalValue1" class="javax.xml.namespace.QName">
<constructor-arg name="localPart" value="global1"/>
</bean>
<bean id="tagElement" class="javax.xml.namespace.QName">
<constructor-arg name="localPart" value="element"/>
</bean>
<bean id="tagParentElement" class="javax.xml.namespace.QName">
<constructor-arg name="localPart" value="parentElement"/>
</bean>
For a detailed explanation of this sample, see the Java configuration section.
<bean id="simpleFileStaxNodeSplitter" class="com.github.spuchmann.xml.splitter.stax.FileStaxNodeSplitter" init-method="init">
<property name="outputFolder" value="#{ T(com.google.common.io.Files).createTempDir().getAbsolutePath() }"/>
<property name="splittingNodeName" ref="tagElement"/>
</bean>
For a detailed explanation of this sample, see the Java configuration section.
<bean id="surroundingFileStaxNodeSplitter" class="com.github.spuchmann.xml.splitter.stax.FileStaxNodeSplitter"
init-method="init">
<property name="outputFolder" value="#{ T(com.google.common.io.Files).createTempDir().getAbsolutePath() }"/>
<property name="splittingNodeName" ref="tagElement"/>
<property name="documentEventHandler">
<bean class="com.github.spuchmann.xml.splitter.stax.XmlSurroundingNodeDocumentEventHandler">
<property name="node" ref="tagParentElement"/>
</bean>
</property>
</bean>
For a detailed explanation of this sample, see the Java configuration section.
<bean id="globalValueFileStaxSplitter" class="com.github.spuchmann.xml.splitter.stax.FileStaxNodeSplitter"
init-method="init">
<property name="outputFolder" value="#{ T(com.google.common.io.Files).createTempDir().getAbsolutePath() }"/>
<property name="splittingNodeName" ref="tagElement"/>
<property name="globalDataCollectorNameList">
<list>
<ref bean="globalValue"/>
<ref bean="globalValue1"/>
</list>
</property>
<property name="documentEventHandler">
<bean class="com.github.spuchmann.xml.splitter.stax.XmlSurroundingNodeDocumentEventHandler">
<property name="node" ref="tagParentElement"/>
<property name="globalValueList">
<list>
<ref bean="globalValue1"/>
</list>
</property>
</bean>
</property>
</bean>
The xml-splitter is released under version 2.0 of the Apache License.