Binary I/O #83

ardasener · 2022-03-23T11:25:05Z

SBFF (SparseBase File Format) (naming is open for debate)

This pull request adds a custom binary format for input and output of Format objects.

Goals

This format is designed with the following goals in mind:

As simple as possible
Easy to read elsewhere without needing the library
As efficient and as small as possible
Able to deal with architectural differences
Type safe

Specifications

Overview

The general structure of the file format is shown in the figure below.

As the figure shows, each file contains three kinds of entities:

File header (only one at the very top)
Array header (multiple)
Array (multiple)

File Header

The file header is a JSON object encoded in ASCII and padded with space characters to exactly 1 KB in length. The object contains the following fields:

name: Name of the written structure (for example: CSR, COO)
array_count: Number of arrays written to this file
dimensions: Dimensions of the structure
endian: Either "little" or "big" depending on the architecture's byte order
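
As a rough illustration of the header layout described above, here is a minimal sketch that builds the padded file header by hand. It is not the code in this pull request: the function names (`DetectEndian`, `PadHeader`, `MakeFileHeader`) are hypothetical, 1 KB is taken to mean 1024 bytes, and writing `dimensions` as a JSON array of integers is an assumption, since the exact encoding is not spelled out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: "1KB" means 1024 bytes.
constexpr std::size_t kHeaderSize = 1024;

// Reports the byte order of the machine running this code.
inline std::string DetectEndian() {
  const std::uint32_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);  // lowest-addressed byte of the probe
  return first_byte == 1 ? "little" : "big";
}

// Pads a JSON text with spaces so that it is exactly kHeaderSize bytes long.
inline std::string PadHeader(std::string json) {
  if (json.size() > kHeaderSize) {
    throw std::length_error("header JSON exceeds the fixed header size");
  }
  json.append(kHeaderSize - json.size(), ' ');
  assert(json.size() == kHeaderSize);
  return json;
}

// Builds a padded file header (hypothetical helper).
// Assumption: `name` contains no characters that need JSON escaping.
inline std::string MakeFileHeader(const std::string& name,
                                  std::size_t array_count,
                                  const std::vector<std::size_t>& dimensions) {
  std::string json = "{\"name\":\"" + name + "\",\"array_count\":" +
                     std::to_string(array_count) + ",\"dimensions\":[";
  for (std::size_t i = 0; i < dimensions.size(); ++i) {
    if (i > 0) json += ",";
    json += std::to_string(dimensions[i]);
  }
  json += "],\"endian\":\"" + DetectEndian() + "\"}";
  return PadHeader(json);
}
```

Because the header has a fixed size, a reader can always load the first 1024 bytes, trim the trailing spaces, and hand the remainder to any JSON parser.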

Array Header

Array headers are identical to file headers in structure (a JSON object encoded in ASCII and padded). However, they contain different fields:

name: Name of the array (for example: for a CSR this could be row_ptr, col or vals)
type: Could be "signed", "unsigned" or "float" depending on the type of the array
type_size: Number of bytes used to represent one entity of the array (for example: for a double array this is 8)
array_size: Number of entities inside the array (i.e., the length of the array)
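
To show how the `type` and `type_size` fields could be derived from a C++ element type, here is a small sketch using `<type_traits>`. `TypeName` and `MakeArrayHeaderJson` are hypothetical names, not functions from this pull request, and the sketch produces only the unpadded JSON text (padding would work the same way as for the file header).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Maps an arithmetic element type to one of the "type" values listed above.
template <typename T>
std::string TypeName() {
  static_assert(std::is_arithmetic<T>::value,
                "this sketch only handles arithmetic element types");
  if (std::is_floating_point<T>::value) return "float";
  return std::is_signed<T>::value ? "signed" : "unsigned";
}

// Builds the unpadded JSON text of an array header (hypothetical helper).
// Assumption: `name` contains no characters that need JSON escaping.
template <typename T>
std::string MakeArrayHeaderJson(const std::string& name,
                                std::size_t array_size) {
  assert(!name.empty());
  return "{\"name\":\"" + name + "\",\"type\":\"" + TypeName<T>() +
         "\",\"type_size\":" + std::to_string(sizeof(T)) +
         ",\"array_size\":" + std::to_string(array_size) + "}";
}
```

Storing both the type category and its size lets a reader check that the array it is about to load matches the element type it expects, which is one way the type-safety goal could be enforced.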

Array

The actual data of the arrays is written directly to the disk. Depending on the OS this is done in one of two ways:

On Windows, the arrays will be cast to a char pointer and written using an output stream in standard C++.
On UNIX, the arrays will be cast in the same way but written using PIGO's faster write routines.

Since PIGO does not support Windows, reading and writing these files will be slower on Windows. Unfortunately, there isn't much we can do about this.
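
For reference, the standard C++ path described above (cast to a char pointer, then write through an output stream) could look roughly like the sketch below. `WriteArray` and `ReadArray` are hypothetical names, and the PIGO path is deliberately left out. The functions take generic streams, so they work with a `std::ofstream` or `std::ifstream` opened with `std::ios::binary` as well as with in-memory streams.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Writes the raw bytes of `values` to `out` without any conversion.
template <typename T>
void WriteArray(std::ostream& out, const std::vector<T>& values) {
  static_assert(std::is_trivially_copyable<T>::value,
                "raw byte I/O requires a trivially copyable element type");
  out.write(reinterpret_cast<const char*>(values.data()),
            static_cast<std::streamsize>(values.size() * sizeof(T)));
}

// Reads `array_size` elements of type T from `in`.
// Throws if the stream ends before all bytes are available.
template <typename T>
std::vector<T> ReadArray(std::istream& in, std::size_t array_size) {
  static_assert(std::is_trivially_copyable<T>::value,
                "raw byte I/O requires a trivially copyable element type");
  std::vector<T> values(array_size);
  in.read(reinterpret_cast<char*>(values.data()),
          static_cast<std::streamsize>(array_size * sizeof(T)));
  if (!in) {
    throw std::runtime_error("array data is shorter than its header claims");
  }
  assert(static_cast<std::size_t>(in.gcount()) == array_size * sizeof(T));
  return values;
}

// In-memory round trip, useful for a quick check without touching the disk.
template <typename T>
std::vector<T> RoundTripInMemory(const std::vector<T>& values) {
  std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
  WriteArray(buffer, values);
  return ReadArray<T>(buffer, values.size());
}
```

Note that this sketch copies bytes as they are in memory. A reader on a machine whose byte order differs from the `endian` field in the file header would still need to swap the bytes of each element after reading.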

src/sparsebase/sparse_file_format.h

AmroAlJundi

The names of the Read and Write functions in sparse_file_format should be clearer.

Also, I recommend merging this PR with #82 (i.e., closing that one and moving its PR comments here) since the code for the latter is already merged here.

ardasener · 2022-03-23T16:10:15Z

As suggested by @AmroAlJundi, all the changes from the pull request #82 are also here due to a necessary merge. So that pull request is closed and all the features discussed there are part of this request.

ardasener and others added 14 commits March 13, 2022 20:02

Added PigoReader classes and changed the SparseReader class name to just Reader

bda4ab0

Mostly finished the binary io system

2237450

Added dimensions to the file header, moved endianness to the file header, added instantiations

e02c600

Added dimensions to the file header, moved endianness to the file header, added instantiations

f60f236

Added instantiations, added missing PIGO header

74d4886

Added pigo header to the install

fc9c388

Added pigo and json headers to the install

18f38e4

Added non-PIGO classes for file i/o since PIGO does not support Windows

47c3169

Fixed explicit instantiation of PigoMTXReader

3231499

Added OpenMP CMake package

51bf2a4

Made PIGO compilation conditional (and disabled it for Windows)

8c19c70

Merge branch 'feature/pigo_integration' into feature/binary_io

3d77cba

# Conflicts:
# src/CMakeLists.txt
# src/generate_explicit_instantiations.py
# src/sparsebase/sparse_reader.cc
# src/sparsebase/sparse_reader.h

Switched to using the USE_PIGO variable, small tweaks

15ae0f0

Added the missing OpenMP line to the CMakeLists

6071910

ardasener requested a review from AmroAlJundi March 23, 2022 11:25

ardasener added the priority: soon, state: pending, and type: feature labels Mar 23, 2022

AmroAlJundi reviewed Mar 23, 2022

AmroAlJundi suggested changes Mar 23, 2022

Changed Read and Write function names to be more descriptive

dbb1727

ardasener requested a review from AmroAlJundi March 23, 2022 16:07

ardasener mentioned this pull request Mar 23, 2022

PIGO Input Integration #82

Closed

ardasener added the state: review needed label and removed the state: pending label Mar 23, 2022

AmroAlJundi approved these changes Mar 23, 2022

ardasener merged commit 8c55640 into develop Mar 23, 2022

ardasener deleted the feature/binary_io branch March 23, 2022 16:21

SinanEkm pushed a commit that referenced this pull request Aug 19, 2024

Merge pull request #83 from SU-HPC/feature/binary_io

2d55f1a

Binary I/O
