Updated Turbo CSV : https://github.com/gopi487krishna/turbo-csv-2
turbo_csv is a header-only, simple, flexible, and fast CSV parser written for the Boost.xml competency test
C++20 (currently C++20 is required in only a few places in turbo_csv; this restriction will be removed in the future)
The design of turbo_csv consists of four main components:
- Reader
- Dialect
- Record
- File Reader
Reader :
- Manager class responsible for directing the other components and supplying them with the appropriate data
- Responsible for providing an external interface for the user to interact with the other components
- Allows the programmer to plug in custom components (custom dialect, file_reader) for parsing CSV files
Dialect :
- Specifies the CSV dialect to be used for parsing.
- A dialect allows us to set the following properties associated with a CSV file:
- record_seperator
- field_seperator
- ignore_characters (characters to skip and not include in the record, such as \t)
- escape_character (e.g. ")
- turbo_csv allows custom dialect classes, provided they follow the dialect interface (a sketch is shown below)
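A minimal sketch of what such a class could look like is shown below. The member names mirror the properties listed above but are assumptions; copy the exact interface from the dialect class shipped in the include folder.

#include <vector>

// Hypothetical custom dialect for tab-separated files; member names are assumed
// and must be matched against the bundled dialect's actual interface.
struct tsv_dialect {
    static constexpr char record_seperator = '\n';                 // records end at a newline
    static constexpr char field_seperator  = '\t';                 // fields are separated by tabs
    inline static const std::vector<char> ignore_characters{'\r'}; // skipped while parsing
    static constexpr char escape_character = '"';                  // quotes fields containing separators
};

// Assumed plug-in mechanism (verify against the reader's declaration):
// turbo_csv::reader<tsv_dialect> tsv_reader("path/data.tsv");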
Record :
- Stores a record of csv file
- Does not parse the raw record data until it is first used (lazy)
- Provides operations on the fields of records
- Allows user to query metadata associated with the record
File Reader:
- Reads data from the file and supplies it to the reader as required
- turbo_csv allows custom FileReader classes that must follow the interface but can be implemented in any way (e.g. a memory-mapping technique or a multithreaded input buffer); see the skeleton sketched below
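As an illustration, a hypothetical skeleton for a plain buffered reader is shown below; the method name and signature are assumptions, so match them against the file reader class bundled with the library (csv_file_reader.hpp).

#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical custom file reader; the real interface must be copied from the
// bundled file reader before plugging this into turbo_csv.
class buffered_file_reader {
public:
    explicit buffered_file_reader(const std::string& path) : stream_(path) {}

    // Assumed interface: return the next chunk of raw bytes, empty once the
    // file is exhausted.
    std::string read_chunk(std::size_t max_bytes) {
        std::string chunk(max_bytes, '\0');
        stream_.read(chunk.data(), static_cast<std::streamsize>(max_bytes));
        chunk.resize(static_cast<std::size_t>(stream_.gcount()));
        return chunk;
    }

private:
    std::ifstream stream_;
};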
As turbo-csv is a header-only library, making it part of a project takes only a few seconds:
- Download the source from the releases section
- Unzip the source
- Place the include folder in some directory
- Set the compiler include path to point to that directory
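For example, with GCC or Clang (the paths here are placeholders):
g++ -std=c++20 -I /path/to/turbo-csv/include main.cpp -o main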
Another way to include turbo-csv is to add the turbo-csv directory (includes, tests, benchmarks) as a subdirectory in CMake.
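A rough sketch of that approach (the my_app target and the checkout directory name are placeholders; check turbo-csv's own CMakeLists for any exported targets):

# Hypothetical snippet: pull in turbo-csv's CMake project and expose its headers.
add_subdirectory(turbo-csv-2)
target_include_directories(my_app PRIVATE turbo-csv-2/include)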
If you want to run the bundled tests, execute the following inside the turbo-csv folder:
mkdir build
cd build
cmake ../
cmake --build .
ctest
If you want an exact copy of my development environment while testing, a .devcontainer directory and Dockerfile are supplied in the turbo-csv directory. This allows VS Code to open the folder in a container (defined by me) containing all the necessary dependencies (vcpkg, boost, cmake), VS Code extensions, and configuration settings (cmake, IntelliSense, etc.) 😲
Note that gk487/cpp_base_image:latest currently does not have the benchmark package in vcpkg, so you will have to install benchmark using the command:
/vcpkg/vcpkg install benchmark:x64-linux
A new reader instance requires the file path as its argument. Along with that, there is also an optional treat_first_record_as_header parameter which allows you to treat the first record as the header record.
turbo_csv::reader csv_reader("path/filename.csv");
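If the first record is a header, the flag can presumably be passed at construction, something along these lines (check the documentation tests in the tests folder for the exact signature):
turbo_csv::reader csv_reader("path/filename.csv", true); // assumed: treat_first_record_as_header = true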
The next() function allows you to read the next record from the file. If there are no more records left in the file, next() returns an empty record (this can be tested using the is_empty() method of record).
auto rec=csv_reader.next();
Multiple records can be read from the file in two ways:
- Calling next() repeatedly until an empty record is found:
while(true){
    auto& rec = csv_reader.next();
    if(rec.is_empty()){ break; }
}
- Using range based for loop (Much cleaner)
for(auto& rec:csv_reader){ record_count++; }
turbo_csv also allows accessing records randomly using the [] operator. If the reader has already cached the record at the given index, it simply returns that record; otherwise it keeps reading records until the record at that index is reached.
// Getting the total number of fields in the 3rd row/record
csv_reader[3].get_field_count();
To read a particular column from the CSV file, simply call get_column<T>(). get_column() will process and deserialize the data in parallel.
csv_reader.get_column<double>(1);
//Same as above
csv_reader.get_column<double>(csv_reader.get_indexof("Period"));
Some of the methods are:

| Method | Description | Example |
| --- | --- | --- |
| get_field<T>(index) | Deserializes and returns the field data at "index" | rec.get_field<double>(1) |
| get_fields() | Returns all the field values as a vector of string views | rec.get_fields() |
| get_field_count() | Returns the number of fields in the record | rec.get_field_count() |
| get_raw_size() | Returns the raw size of the record in bytes | rec.get_raw_size() |
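Putting the pieces together, a minimal end-to-end sketch using only the calls shown above might look like this (the header path, file name, and field index are placeholders):

#include <iostream>
#include "turbo_csv.hpp" // adjust to the actual header name/path from the include folder

int main() {
    turbo_csv::reader csv_reader("path/filename.csv");

    // Lazily iterate over all records and print the second field of each as a double.
    for (auto& rec : csv_reader) {
        if (rec.get_field_count() > 1) {
            std::cout << rec.get_field<double>(1) << '\n';
        }
    }
}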
There is also an experimental_reader in turbo_csv.hpp which supports two-way multithreaded input buffers for better performance. It is still buggy, so it is advised to use it for experimental purposes only.
csv_file_reader.hpp is the implementation of the multithreaded input buffering system. The code is still buggy and of poor quality, so please do not use experimental_reader for anything other than testing.
The users of this library can write their own custom dialect classes to support a number of variations in the CSV format.
Similarly, a custom file reader can also be written, for example to memory-map the data.
To learn more about using the library, please refer to the documentation tests (all tests are documentation tests) in the tests folder.