A simple XML file splitter, written in Rust.
The file splitter expects the input file to have multiple elements hanging off a root node, e.g.
<catalog>
<record>...</record>
<record>...</record>
<record>...</record>
<record>...</record>
</catalog>

./xml_file_splitter \
--input input.xml.gz \
--element entry \
--chunk-size 1000 \
--output-prefix path/to/output/dir/chunk

The output will be saved in the directory path/to/output/dir/ with file names of the form chunk_00001.xml, chunk_00002.xml, chunk_00003.xml, and so on.
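
The naming scheme described here can be sketched in a few lines of Rust. This is only an illustration, not the tool's actual source: the function name `chunk_file_name` is hypothetical, and the sketch assumes that numbering starts at 1 and is zero-padded to five digits, as the example file names suggest.

```rust
/// Builds the name of the `index`-th output file for a given prefix.
/// Hypothetical helper for illustration; not taken from the tool's source.
/// Assumes 1-based numbering, as in chunk_00001.xml.
fn chunk_file_name(prefix: &str, index: usize) -> String {
    // `{:05}` zero-pads the index to five digits.
    format!("{}_{:05}.xml", prefix, index)
}
```

Called with the prefix path/to/output/dir/chunk and index 1, this sketch produces path/to/output/dir/chunk_00001.xml.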
Produce gzip-compressed output files:
./xml_file_splitter \
--input input.xml.gz \
--element entry \
--chunk-size 1000 \
--output-prefix path/to/output/dir/chunk \
--gzip

Output:
path/to/output/dir/chunk_00001.xml.gz
path/to/output/dir/chunk_00002.xml.gz
path/to/output/dir/chunk_00003.xml.gz

Parameters:
- input: the input file; should be a gzipped XML file.
- element (default: entry): the XML tag name to collect for the output files.
- chunk-size (default: 100000): the number of elements per output file.
- output-prefix (default: part): the output path and the prefix to use for each output file. The suffix _nnnnn.xml (_nnnnn.xml.gz with gzip) is appended to the prefix, where nnnnn is the zero-padded number of the file in the sequence.
- gzip: whether the output files should be gzip-compressed; include the parameter to produce gzipped output.
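
To make the defaults concrete, here is a minimal Rust sketch of how the documented parameters might be modelled. It is an assumption-laden illustration rather than the tool's real code: the `SplitterOptions` struct and its methods are hypothetical names, and `file_count` assumes that the last output file simply holds whatever elements remain.

```rust
/// Hypothetical options struct mirroring the documented parameters.
#[derive(Debug, Clone, PartialEq)]
struct SplitterOptions {
    input: String,         // gzipped XML input file
    element: String,       // XML tag name to collect
    chunk_size: usize,     // elements per output file (must be > 0)
    output_prefix: String, // path and prefix of each output file
    gzip: bool,            // gzip-compress the output files
}

impl SplitterOptions {
    /// Creates options for `input` using the documented defaults.
    fn with_input(input: &str) -> Self {
        SplitterOptions {
            input: input.to_string(),
            element: "entry".to_string(),
            chunk_size: 100_000,
            output_prefix: "part".to_string(),
            gzip: false,
        }
    }

    /// Name of the `index`-th output file (1-based, five-digit zero padding).
    fn output_file_name(&self, index: usize) -> String {
        let ext = if self.gzip { "xml.gz" } else { "xml" };
        format!("{}_{:05}.{}", self.output_prefix, index, ext)
    }

    /// Number of output files for `total_elements` matching elements.
    /// Assumption: the final file holds the remainder.
    fn file_count(&self, total_elements: usize) -> usize {
        (total_elements + self.chunk_size - 1) / self.chunk_size
    }
}
```

Under these assumptions, a chunk size of 1000 and 2500 matching elements would give three output files, the last one holding 500 elements.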
- Add gzip parameter to produce gzip-compressed output files.