
Reading structures from XML

incoder edited this page Oct 17, 2019 · 15 revisions

Suppose you have a program whose configuration must be stored in a file that can be edited by hand. There are several configurations, only one of which is in use at a time, but you don't want to use more than one file to store them. The classic solution for this task is a properties or ini file. The core problem with that approach is that a user can make a typo in the file, and if the file is not trivial, the user cannot check for the error before running the program.

A well-known alternative to a properties file is XML. An XML file can be validated, for example against a DTD or an XML Schema, before your program starts.

We will use the following XML for demonstration purposes:

<?xml version="1.0" encoding="UTF-8"?>
<configurations>
	<configuration id="0" enabled="true">
		<name>Test configuration 0</name>
	</configuration>
	<configuration id="1" enabled="false">
		<name>Test configuration 1</name>
	</configuration>
</configurations>

We would like to read those configurations into the following plain data structure:

struct configuration
{
	std::size_t id;
	bool enabled;
	std::string name;
};

Since the file holds more than one configuration, we can store them in an STL container such as std::vector.

The IO XML parser is a streaming parser. Unlike the DOM approach, you don't have to parse the whole XML document into a DOM tree and then extract the data you are interested in from that tree. The DOM approach has many advantages when you work with documents of non-fixed format, such as word processor or spreadsheet files, but it uses a large amount of memory.

The IO parser is also a pull parser. Unlike the SAX approach, you don't have to provide any callbacks; instead you ask the parser for the next event, much as you do with an iterator. Streaming pull parsing refers to a programming model in which a client application calls methods on an XML parsing library when it needs to interact with an XML infoset; that is, the client only gets (pulls) XML data when it explicitly asks for it.

How to parse the XML using IO:

  • Open a file and construct an XML reader
  • Define a function that reads XML attributes and nested tags; this function calls the parser, not vice versa
  • Read XML elements and their attributes into structures, and save them into a std::vector

The code looks like the following:

// lexical_cast_traits is used to convert characters into std::size_t.
// It is optional and can be replaced with boost::lexical_cast, std::stringstream, std::strtoull, etc.
typedef io::xml::lexical_cast_traits<std::size_t> size_t_cast;
// the same for the bool type
typedef io::xml::lexical_cast_traits<bool> bool_cast;

// Reads a single configuration structure from the XML reader
// \param rd an unsafe XML reader wrapper used to parse the XML stream data
static configuration read_config(io::unsafe<io::xml::reader>& rd)
{
	configuration ret;
	// take the next start element event and check that it is <configuration>
	io::xml::start_element_event sev = rd.next_expected_tag_begin("","configuration");
	// obtain the id="123" attribute value
	io::const_string tmp = sev.get_attribute("","id").first;
	ret.id = size_t_cast::from_string( tmp.data() );
	// obtain the enabled="true|false" attribute value
	tmp = sev.get_attribute("","enabled").first;
	ret.enabled = bool_cast::from_string( tmp.data() );
	// read the value of the nested <name>name</name> tag
	rd.next_expected_tag_begin("","name");
	ret.name = std::string( rd.next_characters().data() );
	rd.next_expected_tag_end("","name");
	// check that the next event is the </configuration> tag end
	rd.next_expected_tag_end("","configuration");
	return ret;
}

int main(int argc, const char** argv)
{
	// open the configuration XML file and construct an XML source
	io::file sf("test-config.xml");
	std::error_code ec;
	io::xml::s_source src = io::xml::source::create(ec, sf.open_for_read(ec) );
	// check the error code, and exit the program if there was an error
	io::check_error_code( ec );
	// construct the low-level streaming parser
	io::xml::s_event_stream_parser psr = io::xml::event_stream_parser::open(ec, std::move(src) );
	io::check_error_code( ec );
	try {
		// the unsafe wrapper is used to avoid checking an error code
		// after each XML parsing operation that may fail
		io::unsafe<io::xml::reader> rd( std::move(psr) );
		// go to the XML root element <configurations>; if the root element has another name,
		// i.e. we are parsing a wrong or corrupt XML file, the reader will throw
		rd.next_expected_tag_begin("","configurations");
		// this vector will contain the XML parsing result
		std::vector<configuration> configurations;

		// read all configurations from the XML stream one by one
		rd.to_next_state();
		while( rd.is_tag_begin_next() ) {
			configurations.emplace_back( read_config(rd) );
			rd.to_next_state();
		}
		// check the name of the closing XML root element
		rd.next_expected_tag_end("","configurations");

		// display the results field by field, since no operator<< is defined for configuration here
		for(const configuration& cnf: configurations) {
			std::cout << '\t' << cnf.id << ' ' << std::boolalpha << cnf.enabled << ' ' << cnf.name << std::endl;
		}
	} catch(const std::exception& exc) {
		// on any parsing error, write it to the error stream and exit the application
		std::cerr << exc.what() << std::endl;
		return -1;
	}
	return 0;
}

Complete example

A more complex example with a nested list of elements.

The XML:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"></neighbor>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Parsing code

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <system_error>
#include <vector>

#include <files.hpp>
#include <xml_reader.hpp>
#include <xml_lexcast.hpp>

typedef io::xml::lexical_cast_traits<std::size_t> size_t_cast;
typedef io::xml::lexical_cast_traits<int> int_cast;
typedef io::xml::lexical_cast_traits<uint16_t> short_cast;


struct neighbor
{
	io::const_string name;
	char direction;
};

struct country {
	io::const_string name;
	int rank;
	uint16_t year;
	std::size_t gdppc;
	std::vector<neighbor> neighbors;
};

static neighbor read_neighbor(io::unsafe<io::xml::reader>& rd)
{
	io::xml::start_element_event sel = rd.next_tag_begin();
	// if the element is not self-closing, e.g. <neighbor name="Austria" direction="E"></neighbor>,
	// skip to </neighbor>
	if( !sel.empty_element() )
		rd.next_tag_end();
	neighbor ret;
	ret.name = sel.get_attribute("","name").first;
	ret.direction = sel.get_attribute("","direction").first.data()[0];
	return ret;
}

static country read_country(io::unsafe<io::xml::reader>& rd)
{
	io::xml::start_element_event ev = rd.next_tag_begin();

	country ret;

	// read name from attribute
	ret.name = ev.get_attribute("","name").first;

	// read rank from tag
	ev = rd.next_tag_begin();
	ret.rank = int_cast::from_string( rd.next_characters().data() );
	rd.next_tag_end();

	// read year
	ev = rd.next_tag_begin();
	ret.year = short_cast::from_string( rd.next_characters().data() );
	rd.next_tag_end();

	// read gdppc
	ev = rd.next_tag_begin();
	ret.gdppc = size_t_cast::from_string( rd.next_characters().data() );
	rd.next_tag_end();

	do {
		if( rd.is_characters_next() ) {
			if( !rd.next_characters().blank() ) {
				throw std::runtime_error("XML markup error");
			}
		}
		if( rd.is_tag_begin_next() )
			ret.neighbors.emplace_back( read_neighbor(rd) );
	} while( !rd.is_tag_end_next() );

	// drop to </country>
	rd.next_tag_end();

	return ret;
}

std::ostream& operator<<(std::ostream& to,const country& cnt)
{
	to << " name: " << cnt.name;
	to << "\n rank: " << cnt.rank;
	to << "\n year: " << cnt.year;
	to << "\n gdppc: " << cnt.gdppc;
	if( ! cnt.neighbors.empty() ) {
		to << "\n " << cnt.neighbors.size() << " neighbors:" << std::endl;
		for(const neighbor& n: cnt.neighbors) {
			to << "\t name: " << n.name;
			to << " direction: " << n.direction;
			to << std::endl;
		}
	}
	return to;
}


int main(int argc, const char** argv)
{

	io::file sf("countries.xml");
	if( !sf.exist() ) {
		std::cerr << sf.path() << " does not exist" << std::endl;
		return -1;
	}
	std::error_code ec;
	io::xml::s_event_stream_parser psr = io::xml::event_stream_parser::open(ec, sf.open_for_read(ec) );
	io::check_error_code( ec );

	io::unsafe<io::xml::reader> rd( std::move(psr) );
	// go to <data>
	io::xml::start_element_event start_el = rd.next_tag_begin();
	if( !start_el.name().equal("","data") ) {
		std::cerr << "Unexpected element: " << start_el.name().local_name() << std::endl;
		return -1;
	}

	std::vector<country> countries;
	do {
		if( rd.is_characters_next() ) {
			if( !rd.next_characters().blank() ) {
				throw std::runtime_error("XML markup error");
			}
		}
		if( rd.is_tag_begin_next() )
			countries.emplace_back( read_country(rd) );
	} while( !rd.is_tag_end_next() );
	// drop to </data>
	rd.next_tag_end();

	// show result
	std::cout << countries.size() << " countries read from XML" << std::endl;

	for(const auto& cnt: countries) {
		std::cout << cnt;
	}

	return 0;
}