OPF file format for datasets

LibDEEP uses the same dataset format as LibOPF. The LibOPF package contains a directory, LibOPF/tools, with several useful tools for working with this format:

txt2opf: a program to convert OPF files written in ASCII format to binary format.
opf2txt: a program to convert OPF files written in binary format to ASCII format.
opf_check: a program to check whether a file is in the OPF required format.
opf2svm: a program to convert binary OPF files to LibSVM format.
svm2opf: a program to convert LibSVM files to binary OPF format.

The original dataset and its parts (training, evaluation, and test sets) must be in the following BINARY file format:

<# of samples> <# of labels> <# of features>
<0> <label> <feature 1 from element 0> <feature 2 from element 0> ...
<1> <label> <feature 1 from element 1> <feature 2 from element 1> ...
.
.
<i> <label> <feature 1 from element i> <feature 2 from element i> ...
<i+1> <label> <feature 1 from element i+1> <feature 2 from element i+1> ...
.
.
<n-1> <label> <feature 1 from element n-1> <feature 2 from element n-1> ...

The first number of each line, <0>, <1>, ... <n-1>, is a sample identifier (for n samples in the dataset). It is only used in the case of precomputed distances, but it must always be specified. For unlabeled datasets (unsupervised OPF), please use label 0 for all samples.
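The layout rules above can be checked mechanically on the ASCII form before conversion. The following is a minimal sketch, not the logic of opf_check: the function name `check_opf_text` is hypothetical, and the accepted label range (0 up to the number of labels) is an assumption inferred from this page.

```python
def check_opf_text(text):
    """Return a list of problems found in an ASCII OPF listing (empty if none).

    Hypothetical helper for illustration; it is not part of LibOPF.
    Non-numeric fields raise ValueError instead of being reported.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or len(lines[0]) != 3:
        return ["header must have 3 fields: samples, labels, features"]
    n_samples, n_labels, n_feats = (int(v) for v in lines[0])
    rows = lines[1:]
    problems = []
    if len(rows) != n_samples:
        problems.append(f"expected {n_samples} samples, found {len(rows)}")
    for i, fields in enumerate(rows):
        # Each row holds: identifier, label, then n_feats feature values.
        if len(fields) != 2 + n_feats:
            problems.append(f"row {i}: expected {2 + n_feats} fields, found {len(fields)}")
            continue
        if int(fields[0]) != i:
            problems.append(f"row {i}: identifier is {fields[0]}, expected {i}")
        # Assumption: label 0 means unlabeled, otherwise labels run 1..n_labels.
        if not 0 <= int(fields[1]) <= n_labels:
            problems.append(f"row {i}: label {fields[1]} outside 0..{n_labels}")
        for value in fields[2:]:
            float(value)  # raises ValueError if a feature is not numeric
    return problems


listing = """2 2 3
0 1 0.5 0.1 0.9
1 2 0.4 0.2 0.8
"""
print(check_opf_text(listing))
```

An empty list means the listing is consistent with its header; each string in a non-empty list describes one violation.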

Example: Suppose that you have a dataset with 5 samples distributed into 3 classes, with 2 samples from label 1, 2 samples from label 2, and 1 sample from label 3. Each sample is represented by a feature vector of size 2. The OPF file should then look like this:

5 3 2
0 1 0.21 0.45
1 1 0.22 0.43
2 2 0.67 1.12
3 2 0.60 1.11
4 3 0.79 0.04

Comment #1: Note that the file must be binary, with no blank spaces between values. The ASCII representation above is for illustration only.

Comment #2: The first line of the file, 5 3 2, contains, respectively, the dataset size, the number of labels (classes), and the number of features in each feature vector. Each remaining line contains a sample identifier (an integer from 0 to n-1, where n is the dataset size), the sample's label, and its feature values.
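To make the binary layout concrete, here is a minimal sketch that packs the five-sample example into bytes. This page does not state field widths or byte order, so the code assumes 32-bit integers for the counts, identifiers, and labels, 32-bit floats for the features, and little-endian order. The function name `pack_opf` is hypothetical. For real datasets, the txt2opf tool remains the reference way to produce binary files.

```python
import struct


def pack_opf(samples, n_labels):
    """Pack (label, features) pairs into an OPF-style binary blob.

    Assumptions (not confirmed by this page): 32-bit ints, 32-bit floats,
    little-endian byte order. Identifiers are assigned as 0..n-1.
    """
    n_feats = len(samples[0][1])
    # Header: <# of samples> <# of labels> <# of features>
    blob = struct.pack("<3i", len(samples), n_labels, n_feats)
    for ident, (label, feats) in enumerate(samples):
        if len(feats) != n_feats:
            raise ValueError(f"sample {ident} has {len(feats)} features, expected {n_feats}")
        # Record: <id> <label> <feature 1> ... <feature n_feats>
        blob += struct.pack(f"<2i{n_feats}f", ident, label, *feats)
    return blob


example = [
    (1, [0.21, 0.45]),
    (1, [0.22, 0.43]),
    (2, [0.67, 1.12]),
    (2, [0.60, 1.11]),
    (3, [0.79, 0.04]),
]
blob = pack_opf(example, n_labels=3)
print(len(blob))                       # 12-byte header + 5 records of 16 bytes
print(struct.unpack_from("<3i", blob)) # header fields read back
```

Under these assumptions the example occupies 92 bytes: 12 for the header plus 16 for each of the 5 samples, with no separators between values.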
