
Reference

incoder edited this page Jan 7, 2021 · 10 revisions

(Under development)

About IO

IO is a satellite (moon) of Jupiter and, at the same time, an abbreviation of Input/Output. The library is designed to provide universal C++ input and output for well-known industry-standard data formats, such as textual XML and JSON or binary ASN.1 and Google Protocol Buffers.

At the moment only XML is implemented. Other data formats are under construction.

IO is a cross-platform, cross-compiler C++11+ run-time library. At the moment the following configurations are supported (tested):

Building and installing library

IO is a run-time library that can be used as a static or dynamic (DLL or shared) library. To use it, you need to build it first. The build options are explained in README.md. This section explains the prerequisites.

Microsoft Windows with GCC (MinGW64) and MSYS2

Download and install MSYS2 if you have not done so yet. Install the GNU Compiler Collection (GCC) as described in the
MSYS2 documentation.

Then install the GNU libiconv and GnuTLS development packages using pacman:

pacman -S mingw-w64-x86_64-gnutls

Now you can build the library using GNU make or CMake as described in README.md.

Microsoft Windows with Microsoft Visual C++ (Visual Studio)

To build the library with the Microsoft C++ compiler you need Visual Studio or Visual Studio Build Tools version 15 or newer with the C++ compiler option installed. Building with nmake is described in README.md. If you'd like to build with CMake, it is recommended to use cmake-gui.

Linux/POSIX

Make sure that you have the following installed:

  1. GCC with the G++ packages. Version 4.7 is the minimum for C++11; GCC 5 or newer is recommended
  2. The development package of the GnuTLS library. Version 3.0 is the minimum required version
  3. The pkg-config package

Now you can build with GNU make or CMake as described in README.md.

Smart pointer with intrusive reference counting strategy

C++ Technical Report 1 introduced a generic smart pointer library into the C++ standard library. The interfaces were taken from the Boost smart pointers. However, the intrusive reference counting strategy is implemented through make_shared/enable_shared_from_this instead of the boost::intrusive_ptr template. make_shared/enable_shared_from_this together with shared_ptr have a benefit: they are generic and work with legacy code without modifying it. At the same time, they use 8 or 16 additional bytes per smart reference on 32-bit and 64-bit CPU architectures respectively. IO is a new library, so it can save some memory by using boost::intrusive_ptr.

If you are using Boost in your project, you can define the IO_HAS_BOOST macro and let the IO build system know about Boost (CMake will pick it up automatically if Boost is available). Otherwise IO will use an embedded intrusive_ptr extracted from Boost.
IO provides type definitions for smart references to avoid long type names like boost::intrusive_ptr<io::read_channel> or multiple unreadable definitions like auto rch = f.open_for_read(ec);

Short names always follow the pattern s_[reference_type_name], for example s_read_channel.

IO also provides a base class called io::object, which simplifies implementing classes that use the intrusive reference counting strategy.

If you are implementing your own interface which is expected to be used with an intrusive smart pointer, you can use the following technique:

 class foo: public io::object {
 public:

    constexpr foo() noexcept:
       io::object()
    {}
 
   void bar() 
   {
     // TODO: do something
   }

   virtual ~foo() noexcept override
   {}

 };
 ...
 DECLARE_IPTR(foo);
 ...
 s_foo buz( new foo );
 buz->bar();
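
To illustrate why the intrusive strategy saves memory, here is a minimal standalone sketch that uses neither IO nor Boost. The names ref_counted, handle and widget are hypothetical and are not the io::object or boost::intrusive_ptr implementations; the point is only that the counter lives inside the object, so each smart reference is a single raw pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical base class: the reference counter is stored in the object itself.
class ref_counted {
public:
    ref_counted() noexcept : refs_(0) {}
    ref_counted(const ref_counted&) = delete;
    ref_counted& operator=(const ref_counted&) = delete;
    virtual ~ref_counted() noexcept = default;

    void add_ref() noexcept { refs_.fetch_add(1, std::memory_order_relaxed); }
    // Destroys the object when the last reference goes away.
    void release() noexcept {
        if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
    }
    std::size_t use_count() const noexcept { return refs_.load(std::memory_order_relaxed); }

private:
    std::atomic<std::size_t> refs_;
};

// Hypothetical smart reference: holds one raw pointer and nothing else.
template <class T>
class handle {
public:
    handle() noexcept : p_(nullptr) {}
    explicit handle(T* p) noexcept : p_(p) { if (p_) p_->add_ref(); }
    handle(const handle& o) noexcept : p_(o.p_) { if (p_) p_->add_ref(); }
    handle(handle&& o) noexcept : p_(o.p_) { o.p_ = nullptr; }
    // Copy-and-swap covers both copy and move assignment.
    handle& operator=(handle o) noexcept { std::swap(p_, o.p_); return *this; }
    ~handle() noexcept { if (p_) p_->release(); }

    T* operator->() const noexcept { return p_; }
    T* get() const noexcept { return p_; }

private:
    T* p_;
};

// Example payload type used only for this illustration.
struct widget : ref_counted {
    int value = 42;
};
```

Because the counter is part of the object, handle<widget> has the size of a plain pointer, whereas a shared_ptr typically also carries a pointer to a separate control block.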

Byte buffer

A byte buffer is a dynamic memory array container backed by a uint8_t data array. Unlike std::vector or other STL containers, the byte buffer is designed especially for input and output operations.

Difference from std::vector<uint8_t>

  • A byte buffer is non-copyable but movable. E.g. if you need to pass a buffer as a function parameter, you should use a reference or an rvalue reference;
  • Unlike std::vector, the byte buffer growth strategy, i.e. memory allocation and reallocation, is under your direct control;
  • Unlike STL containers, byte buffer methods never throw, including std::bad_alloc, and can be used when exception support is turned off by compiler options;
  • A byte buffer has no begin and end iterators; position and last iterators exist instead. Position points to the first byte where data can be pushed, and last points just past the last filled byte of the buffer. The flip method moves the position iterator to the starting address of the memory block and leaves the last iterator at its current position. This allows you to efficiently implement write and read sequences which need a temporary memory block.
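
The position/last/flip behaviour described above can be sketched with a small standalone class. This is a simplified illustration under assumptions, not the io::byte_buffer implementation: mini_buffer and all of its members are hypothetical, it has a fixed capacity, and put reports overflow through its return value so that the caller decides whether and how to grow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical fixed-capacity buffer with "position" and "last" markers.
class mini_buffer {
public:
    // Never throws: on allocation failure the buffer simply has capacity 0.
    static mini_buffer allocate(std::size_t capacity) noexcept {
        mini_buffer ret;
        ret.data_.reset(new (std::nothrow) std::uint8_t[capacity]);
        ret.capacity_ = ret.data_ ? capacity : 0;
        return ret;
    }
    // Movable; declaring the move constructor makes the class non-copyable.
    mini_buffer(mini_buffer&& o) noexcept
        : data_(std::move(o.data_)), capacity_(o.capacity_),
          position_(o.position_), last_(o.last_) {
        o.capacity_ = o.position_ = o.last_ = 0;
    }
    // Appends n bytes at position; returns false when they do not fit.
    bool put(const void* src, std::size_t n) noexcept {
        if (n == 0) return true;
        if (n > capacity_ - position_) return false;
        std::memcpy(data_.get() + position_, src, n);
        position_ += n;
        if (position_ > last_) last_ = position_;
        return true;
    }
    // Moves position back to the start, leaving last where it is.
    void flip() noexcept { position_ = 0; }
    const std::uint8_t* position() const noexcept { return data_.get() + position_; }
    // Number of bytes between position and last.
    std::size_t length() const noexcept { return last_ - position_; }

private:
    mini_buffer() noexcept : capacity_(0), position_(0), last_(0) {}
    std::unique_ptr<std::uint8_t[]> data_;
    std::size_t capacity_, position_, last_;
};
```

After a series of put calls position and last coincide, so length() is zero; after flip() the bytes between position and last are exactly the data that was written and can be handed to a write operation.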

Typical usage of byte_buffer:

 std::error_code ec;
 io::byte_buffer buff = io::byte_buffer::allocate(ec, 1024);
 io::check_error_code(ec);
 buff.put(123); // puts integer binary data
 buff.put("Hello world!"); // puts C zero ending string binary
 buff.put(123.456); // puts float binary data
 buff.flip(); // moves position pointer into buffer begin, leaving last on its current position
 out->write(ec, buff.position.get(), buff.size() ); // writes content between position and last iterators into writeable channel
 io::check_error_code(ec);

or

 std::error_code ec;
 io::byte_buffer buff = io::byte_buffer::allocate(ec, 1024);
 // read some bytes from readable channel
 std::size_t read = in->read(ec, buff.position.get(), buff.capacity() );
 io::check_error_code(ec); // check for IO error, throw / abort process if there were any
 buff.move(read);  // move position iterator on read bytes count
 buff.flip(); // moves position pointer into buffer begin, leaving last on its current position
 out->write(ec, buff.position.get(), buff.size() ); // writes content between position and last iterators into writeable channel
 io::check_error_code(ec);

Error handling

Generally, the IO API uses the C++ standard library system error functionality (std::error_code) for handling errors, rather than exceptions. If a function can fail for some reason, for example hardware or networking issues during input/output or an out-of-memory state, it takes a reference to a std::error_code as its first argument. You can handle the error according to your requirements. When you simply want to stop the program on an error, you can use the check_error_code(std::error_code&) function. In case of an error, this function throws std::system_error when exceptions are enabled; otherwise it prints the error message to the error stream and exits the program with std::exit, using the error code value as the process exit code. Windows GUI applications show the error in a MessageBoxEx pop-up dialog instead of the error stream.
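
The error-code-first convention and the "stop on error" policy can be sketched with the standard library alone. The functions fake_read and stop_on_error below are hypothetical stand-ins written for this illustration; they are not the IO functions, and the MessageBoxEx branch for Windows GUI applications is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <system_error>

// Hypothetical fallible operation: reports failure through ec, never throws.
inline std::size_t fake_read(std::error_code& ec, bool fail) noexcept {
    if (fail) {
        ec = std::make_error_code(std::errc::io_error);
        return 0;
    }
    ec.clear();
    return 128; // pretend 128 bytes were read
}

// Hypothetical analogue of the policy described above:
// throw when exceptions are available, otherwise print and exit.
inline void stop_on_error(const std::error_code& ec) {
    if (!ec) return;
#if defined(__cpp_exceptions) || defined(__EXCEPTIONS) || defined(_CPPUNWIND)
    throw std::system_error(ec);
#else
    std::fprintf(stderr, "%s\n", ec.message().c_str());
    std::exit(ec.value());
#endif
}
```

A caller that wants to recover inspects ec directly after the call; a caller that does not care passes it to the checking function and moves on.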

Generic Input and output interface

IO input output design principles

  • A generic API for input/output operations wherever data comes from or goes to, without any specific buffer classes;
  • Resource Acquisition Is Initialization (RAII): a call to a constructor or factory method obtains the input/output resource, and the destructor call closes it;
  • Input/output errors are not an exceptional case and should be easily handled and processed, without unexpected abnormal termination;
  • A call to any input or output method should be exception safe, i.e. it should not throw (noexcept) and should guarantee not to throw;
  • There should be an option to choose how input/output errors are handled, i.e. to use or not to use C++ exceptions;
  • Textual input/output should be built on top of the binary API, without specific constructor flags;
  • Smart pointers (smart references) are used for polymorphic class objects, rather than raw pointers, references, or scoped resource ownership like foo(std::ofstream&).

Channel abstraction

The IO input/output API is based on a channel abstraction. A channel is a pure virtual class which provides binary input/output capabilities.

  • read_channel - provides basic synchronous read operation;
  • write_channel - provides basic synchronous write operation;
  • read_write_channel - combines the read and write interfaces, for resources which can be used for reading and writing at the same time, like network sockets or pipes;
  • random_access_channel - extends read_write_channel with read/write position moving operations for the random access resources such as files or shared memory blocks.
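
As a rough illustration of the channel idea, the sketch below defines two pure virtual interfaces and one in-memory resource that implements both, similar in spirit to a pipe. The names byte_source, byte_sink and memory_pipe, and their exact signatures, are assumptions made for this example and are not the IO channel classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <system_error>
#include <vector>

// Hypothetical read interface: returns the number of bytes read, 0 at end of data.
struct byte_source {
    virtual ~byte_source() noexcept = default;
    virtual std::size_t read(std::error_code& ec, std::uint8_t* to, std::size_t max) noexcept = 0;
};

// Hypothetical write interface: returns the number of bytes accepted.
struct byte_sink {
    virtual ~byte_sink() noexcept = default;
    virtual std::size_t write(std::error_code& ec, const std::uint8_t* from, std::size_t n) noexcept = 0;
};

// One resource usable for reading and writing at the same time.
class memory_pipe final : public byte_source, public byte_sink {
public:
    std::size_t read(std::error_code& ec, std::uint8_t* to, std::size_t max) noexcept override {
        ec.clear();
        const std::size_t n = std::min(max, data_.size() - pos_);
        if (n != 0) std::memcpy(to, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }
    std::size_t write(std::error_code& ec, const std::uint8_t* from, std::size_t n) noexcept override {
        try {
            data_.insert(data_.end(), from, from + n);
        } catch (const std::bad_alloc&) {
            // Out of memory is reported through ec instead of an exception.
            ec = std::make_error_code(std::errc::not_enough_memory);
            return 0;
        }
        ec.clear();
        return n;
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
};
```

Code that only needs to read can accept a byte_source reference and code that only needs to write can accept a byte_sink reference, regardless of what the underlying resource is.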

Unsafe wrapper template

If you don't want to call check_error_code each time you call a read or write method, the API provides an unsafe wrapper template. For example, code without unsafe looks like the following:

   std::error_code ec;
   std::size_t read = in->read(ec, array, bytes);
   io::check_error_code(ec);
   std::size_t written = 0, wrt;
   while(written != read) {
     wrt = out->write(ec, array, read-written);
     io::check_error_code(ec);
     written += wrt;
     array += wrt;
   }

The same code can be implemented using the unsafe template as follows:

  io::unsafe<io::read_channel> src( std::move(in) );
  io::unsafe<io::write_channel> dst( std::move(out) );
  std::size_t read = src.read(ec, array, bytes);
  std::size_t written = 0, wrt;
  while(written != read) {
    wrt = dst.write(ec, array, read-written);
    written += wrt;
    array += wrt;
 }
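
One possible way to build such a wrapper is sketched below. It is an assumption about the general technique, not the io::unsafe implementation: the hypothetical checked template owns a channel, creates the std::error_code internally and converts a failure into std::system_error, so the calling code contains no explicit checks. flaky_channel is a toy channel used only to exercise the wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <system_error>
#include <utility>

// Hypothetical checking wrapper around any type with read(ec, ptr, size).
template <class Channel>
class checked {
public:
    explicit checked(Channel ch) : ch_(std::move(ch)) {}

    // Forwards to the channel and throws when the error code was set.
    std::size_t read(std::uint8_t* to, std::size_t max) {
        std::error_code ec;
        const std::size_t ret = ch_.read(ec, to, max);
        if (ec) throw std::system_error(ec);
        return ret;
    }

private:
    Channel ch_;
};

// Toy channel: either fills the buffer with 0x2A or reports an I/O error.
struct flaky_channel {
    bool fail;
    std::size_t read(std::error_code& ec, std::uint8_t* to, std::size_t max) noexcept {
        if (fail) {
            ec = std::make_error_code(std::errc::io_error);
            return 0;
        }
        ec.clear();
        for (std::size_t i = 0; i < max; ++i) to[i] = 0x2A;
        return max;
    }
};
```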

Transferring data between read and write channels.

If you need a simple operation like copying a file or writing socket input into a file, you can use the transfer function. transfer takes a source read channel and a destination write channel, together with an error code and a temporary buffer size, as function arguments and runs the data transfer loop.
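
A transfer loop of this kind can be sketched as follows. The function copy_all and the two toy types are hypothetical and written for this illustration only; the real transfer signature may differ. The sketch shows the two details that such a loop has to get right: stopping on an error or end of data, and retrying after a partial write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical transfer loop: copies until the source is drained or ec is set.
// Returns the total number of bytes written.
template <class Source, class Sink>
std::size_t copy_all(std::error_code& ec, Source& src, Sink& dst, std::size_t buffer_size) {
    std::vector<std::uint8_t> tmp(buffer_size);
    std::size_t total = 0;
    for (;;) {
        const std::size_t got = src.read(ec, tmp.data(), tmp.size());
        if (ec || got == 0) break;               // error or end of data
        std::size_t done = 0;
        while (done != got) {                    // a single write may be partial
            const std::size_t wrt = dst.write(ec, tmp.data() + done, got - done);
            if (ec) return total;
            if (wrt == 0) {                      // no progress: give up instead of spinning
                ec = std::make_error_code(std::errc::io_error);
                return total;
            }
            done += wrt;
            total += wrt;
        }
    }
    return total;
}

// Toy source reading from a string.
struct string_source {
    std::string data;
    std::size_t pos;
    std::size_t read(std::error_code& ec, std::uint8_t* to, std::size_t max) {
        ec.clear();
        std::size_t n = data.size() - pos;
        if (n > max) n = max;
        for (std::size_t i = 0; i < n; ++i) to[i] = static_cast<std::uint8_t>(data[pos + i]);
        pos += n;
        return n;
    }
};

// Toy sink accepting at most two bytes per call, to force partial writes.
struct trickle_sink {
    std::string data;
    std::size_t write(std::error_code& ec, const std::uint8_t* from, std::size_t n) {
        ec.clear();
        if (n > 2) n = 2;
        data.append(reinterpret_cast<const char*>(from), n);
        return n;
    }
};
```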

Compatibility with C++ standard library streams

The <stream.hpp> header contains templates which can be used to quickly build std::istream and std::ostream streams on top of read and write channels.

There are pre-defined type definitions for char, wchar_t, char16_t and char32_t streams.
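
The usual standard library technique for this kind of adapter is a custom std::streambuf. The sketch below is a generic, char-only illustration of that technique and makes no claim about how <stream.hpp> is implemented; sink_streambuf and string_sink are hypothetical names, and the sink is assumed to expose write(const char*, size) returning the number of characters accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical unbuffered stream buffer that forwards characters to a sink.
template <class Sink>
class sink_streambuf final : public std::streambuf {
public:
    explicit sink_streambuf(Sink& sink) : sink_(sink) {}

protected:
    // Called for every single character because no put area is set up.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const char c = traits_type::to_char_type(ch);
        return sink_.write(&c, 1) == 1 ? ch : traits_type::eof();
    }
    // Called for character sequences; forwards the whole block at once.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        return static_cast<std::streamsize>(sink_.write(s, static_cast<std::size_t>(n)));
    }

private:
    Sink& sink_;
};

// Toy sink collecting everything into a string.
struct string_sink {
    std::string data;
    std::size_t write(const char* from, std::size_t n) {
        data.append(from, n);
        return n;
    }
};
```

An std::ostream constructed with a pointer to such a buffer then supports the usual formatted output operators, while the bytes end up in the underlying sink.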

// TBD

Files

The most common use of an input/output system is working with files. The files.hpp header provides an operating-system-dependent file descriptor API. The file class has the same API on Microsoft Windows and on POSIX-like operating systems (GNU/Linux, Mac OS X, FreeBSD etc.), but different implementations.

// TBD

Network Sockets

IO provides a TCP/IP socket channel. The implementation is based on the system sockets API, i.e. Winsock2 on MS Windows and Berkeley sockets on POSIX. At the moment only a synchronous TCP client-side socket channel is implemented.

The sockets.hpp header declares the networking interfaces.

For example:

std::error_code ec;
const io::net::socket_factory *sf = io::net::socket_factory::instance(ec);
io::check_error_code(ec); // check for system error
io::s_socket tcp_socket = sf->client_tcp_socket(ec, "google.com", 80); // creates blocking client tcp socket
io::check_error_code(ec); // check for the system or network error
io::s_read_write_channel raw_ch = tcp_socket->connect(ec); // connect to the server 
io::check_error_code(ec);  // check for connection error, such as connection timeout 

SSL and TLS security channels (not provided for MS VC++)

If you need secured, encrypted TCP/IP, i.e. TLS/SSL sockets, you can use the <net/secure_channel.hpp> implementation. This implementation is built on top of the GnuTLS library.

Generally, obtaining a secure connection channel looks like the following:

// FIXME : refactor the API to simplify

const io::net::secure::service *sec_service = io::net::secure::service::instance(ec);
io::check_error_code( ec );
io::s_read_write_channel raw_ch = tcp_socket->connect(ec); // connects the blocking client tcp socket to the server
io::check_error_code( ec );
io::s_read_write_channel sch = sec_service->new_client_connection(ec, std::move(raw_ch) ); // TLS handshake 

// TBD

Uniform Resource Identifier (URI)

IO has a class for working with Uniform Resource Identifiers (URI/URL). The interface can be found in the <net/uri.hpp> header. The class is able to split (parse) a URI into sections and contains a list of ports for well-known protocols.
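
To make "splitting a URI into sections" concrete, here is a deliberately simplified standalone sketch. It is not the <net/uri.hpp> class: uri_parts, default_port and split_uri are hypothetical, only "scheme://host[:port][/rest]" is recognised, and user info, IPv6 literals and percent-encoding are ignored. The default port table lists only three well-known schemes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result type; "path" keeps any query and fragment attached.
struct uri_parts {
    std::string scheme, host, path;
    std::uint16_t port;
};

// Default ports for a few well-known schemes, 0 when unknown.
inline std::uint16_t default_port(const std::string& scheme) {
    if (scheme == "http") return 80;
    if (scheme == "https") return 443;
    if (scheme == "ftp") return 21;
    return 0;
}

// Returns false when there is no "scheme://" prefix, no host, or a bad port.
inline bool split_uri(const std::string& text, uri_parts& out) {
    const std::string::size_type sep = text.find("://");
    if (sep == std::string::npos || sep == 0) return false;
    out.scheme = text.substr(0, sep);
    const std::string::size_type auth_begin = sep + 3;
    std::string::size_type auth_end = text.find_first_of("/?#", auth_begin);
    if (auth_end == std::string::npos) auth_end = text.size();
    const std::string authority = text.substr(auth_begin, auth_end - auth_begin);
    const std::string::size_type colon = authority.rfind(':');
    if (colon == std::string::npos) {
        out.host = authority;
        out.port = default_port(out.scheme);
    } else {
        out.host = authority.substr(0, colon);
        const std::string digits = authority.substr(colon + 1);
        if (digits.empty()) return false;
        unsigned long port = 0;
        for (char c : digits) {
            if (c < '0' || c > '9') return false;
            port = port * 10 + static_cast<unsigned long>(c - '0');
            if (port > 65535) return false;
        }
        out.port = static_cast<std::uint16_t>(port);
    }
    out.path = (auth_end < text.size() && text[auth_end] == '/') ? text.substr(auth_end) : "/";
    return !out.host.empty();
}
```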

// TBD: description and usage examples, default scheme ports etc

Console framework

IO provides console read and write channels for console (terminal) mode. To access the console API, include the console.hpp header. The IO console has several advantages over the standard library std::cin/std::cout/std::cerr streams.

The console framework has the following advantages

  • The locale is fully under your control, i.e. neither the C library locale nor a default locale set with imbue is used.
  • Support for TTY colors
  • Support for Unicode input and output, including Windows console applications
  • You can output a huge amount of text to the console without repeatedly flushing stream buffers when they overflow

The console has the following traits

  • On Windows, piping does not work, i.e. myapp.exe >> log.txt will produce a 0-byte file;
  • On Windows, a GUI application will allocate a console, i.e. the application windows plus an additional console window;
  • If you put binary data such as floats or integers directly into the console binary channels without converting them to string values, the result is undefined;

Alternative cin, cout and cerr

The io::console class provides console streams similar to those of the standard library. All of them support UTF-8 input data; character set conversion (transcoding) is done automatically.

Text

Character sets and UNICODE

// TBD: better descriptions examples etc.

IO contains an API for converting string data between different code pages. Conversion is built on top of the iconv raw C API. POSIX libc/libc++ provides iconv out of the box, whereas MS Windows needs iconv as an additional DLL. If you simply need to convert between const char*/const wchar_t*/const char16_t*/const char32_t* raw C character arrays, you can use the transcode family of functions found in the charsetcvt.hpp header. The API for standard library strings (std::string) can be found in the text.hpp header.

Non cryptographic string hashing

If you need predictable and fast non-cryptographic hash functions for strings or arrays of raw data, you can find an API in the hashing.hpp header. The hash_bytes function provides a 32-bit MurmurHash function on 32-bit CPU architectures and Google CityHash on 64-bit CPU architectures.
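
For readers unfamiliar with this class of functions, the sketch below shows what a non-cryptographic byte hash looks like. Note that it uses the 32-bit FNV-1a algorithm, chosen here only because it fits in a few lines; it is not the MurmurHash or CityHash code behind hash_bytes and produces different values. The function name fnv1a_32 is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32-bit FNV-1a over an arbitrary byte range (illustration only).
inline std::uint32_t fnv1a_32(const void* data, std::size_t size) noexcept {
    const std::uint8_t* p = static_cast<const std::uint8_t*>(data);
    std::uint32_t h = 2166136261u;   // FNV offset basis
    for (std::size_t i = 0; i < size; ++i) {
        h ^= p[i];                   // mix in one byte
        h *= 16777619u;              // multiply by the FNV prime (mod 2^32)
    }
    return h;
}
```

Such a hash is deterministic and fast, which makes it suitable for hash tables, but it offers no protection against deliberately constructed collisions.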

// TBD: example

Constant string

const_string is a container for a dynamically allocated raw C-style zero-terminated string. const_string is neither std::string nor C++17 string_view. It is intended for storing UTF-8 characters. The benefits of this class are the following:

  • Immutability, so it can be used as a class field
  • Works like an intrusive smart pointer
  • Ability to convert into a mutable standard library string of UTF-8, UTF-16[LE|BE] and UTF-32[LE|BE] (not exception safe)

Find out more

Save memory with dynamic string pooling

If your program is expected to work with a large number of identical strings allocated in dynamic memory, you can use IO string pooling. String pooling is built on top of std::unordered_map and stores cached_string objects. cached_string is in many aspects similar to const_string, but a few functions are implemented differently. For example, comparing two cached_string objects compares the underlying pointers instead of calling std::strcmp.
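
The pooling idea itself can be sketched in a few lines of standard C++. This is a simplified illustration and not the IO string pool: string_pool is a hypothetical class, it uses std::unordered_set of std::string rather than cached_string objects, and it hands out plain pointers. It relies on the standard guarantee that pointers to elements of unordered containers stay valid when the container rehashes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical pool: equal strings are stored once and share one address.
class string_pool {
public:
    // Returns a pointer valid for the lifetime of the pool.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }
    std::size_t size() const noexcept { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};
```

Once two strings come from the same pool, an equality check reduces to comparing two pointers, with no character-by-character comparison.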

// TBD: example

XML

IO contains functionality for reading and writing the eXtensible Markup Language (XML) data format. IO XML differs from most other C/C++ libraries for XML parsing and XML processing.

What is inside

  • Java-like Streaming API for XML parsing, StAX (pull parsing API)
  • No C/C++ dependencies on other XML libraries, i.e. IO XML is not a wrapper around expat/msxml/libxml2 etc.
  • XML reader API to simplify reading XML into POD structures or a primitive classes
  • XML parsing and XML reading work with the exceptions and RTTI compiler options turned off
  • Writing XML from POD classes using template meta-programming techniques, XML format can be specified i.e. use tags or tag attributes
  • Generating XSD schema from POD types using template meta-programming techniques
  • Auto detecting latin1/ASCII/CP-1252/UTF-8/UTF-[16|32][LE|BE] XML file encoding
  • Lexical cast API for XML

Differences from full XML processors

This is a non-validating parser, i.e. no XML structure validation using DTD or XSD is provided yet.

XML syntax is validated, i.e. there are checks for valid XML characters, a correct XML prologue and initial section, correct XML names, and W3C attribute rules such as only one attribute with the same name and a balanced root node.

XML Parsing with StAX XML (XML Pull Parsing)

A pull API for parsing XML. Unlike SAX or SAX-like parsers (for example expat), you don't have to pass any callbacks to the parser. The parsing flow is fully under your control. This API is useful if you need to process huge XML files, or need to extract only specific data from a huge XML document. Memory used internally is limited to a maximum of 16 MiB. Initially the parser uses a memory buffer of one OS page (4 KiB in most cases); the buffer grows exponentially each time the parser needs more data, until the 16 MiB limit is reached.
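
The difference between pull and callback-based parsing is easiest to see in code. The sketch below is a toy tokenizer written for this illustration, not the IO StAX parser: toy_pull_parser and the event enumeration are hypothetical, the whole input is held in a string, and only plain start tags, end tags and text are recognised (no attributes, comments, entities or validation; malformed input simply ends the document).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

enum class event { start_element, end_element, characters, end_of_document };

// Hypothetical pull tokenizer: the caller asks for the next event when ready.
class toy_pull_parser {
public:
    explicit toy_pull_parser(std::string text) : text_(std::move(text)), pos_(0) {}

    // Advances to the next event; value() then holds the tag name or the text.
    event next() {
        value_.clear();
        if (pos_ >= text_.size()) return event::end_of_document;
        if (text_[pos_] != '<') {
            // Character data runs up to the next tag or the end of input.
            std::size_t end = text_.find('<', pos_);
            if (end == std::string::npos) end = text_.size();
            value_ = text_.substr(pos_, end - pos_);
            pos_ = end;
            return event::characters;
        }
        const std::size_t close = text_.find('>', pos_);
        if (close == std::string::npos) {   // unterminated tag: stop parsing
            pos_ = text_.size();
            return event::end_of_document;
        }
        const bool is_end = (pos_ + 1 < close) && text_[pos_ + 1] == '/';
        const std::size_t name_begin = pos_ + (is_end ? 2 : 1);
        value_ = text_.substr(name_begin, close - name_begin);
        pos_ = close + 1;
        return is_end ? event::end_element : event::start_element;
    }
    const std::string& value() const noexcept { return value_; }

private:
    std::string text_;
    std::string value_;
    std::size_t pos_;
};
```

Because the caller drives the loop, it can stop as soon as the element of interest has been seen, which is what makes the pull style convenient for extracting a small piece of data from a large document.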

// TBD

Reading XML into POCO classes or raw C structures

There is a facade on top of the event reader API to simplify reading data into POCO structures. A complete parsing example with commentaries can be found

// TBD

Writing XML and generating XSD schema from POD types

// TBD

The library uses template meta-programming for reflection-like serialization of POCO into XML. All you need to do is provide the required XML structure to the XML writer; a complete example can be found at xml_marshalling. When you have C++ RTTI on, which is a good idea for debugging builds, you can also generate an XSD schema. NOTE! This functionality is not exception safe, unlike most parts of the library.