Utility to convert from CSV data files to VW Vowpal Wabbit format.
go get github.com/stmichaelis/datatools/csv2vw to download and build the conversion tool. Run
csv2vw -h from the GOPATH bin directory to see all comandline options.
csv2vw -i input.csv -o output.csv -l 2 -i ID_ATTRIBUTE -n 4,5,6 -r 8
input.csv to VW format using column with index 2 as training attribute, column named
ID_ATTRIBUTE as tag, treat columns 4,5,6 as nominal values and remove column 8 from output file.
-i="": -i <Inputfile>
-o="": -o <Outputfile>
-header=true: Use first line in CSV file as header.
-l="0": Column index (starting with 0) or column name (needs header=true) of label attribute. Set to -1 to use last value per row as label. For numerical column names prepend with _. Example: -l _23 for using column named 23 as label column instead of column with index 23.
-id="-1": Column index (starting with 0) or column name (needs header=true) of id (tag) attribute. For numerical column names prepend with _. Set to -1 for data without an index attribute.
-n="": Column indices (starting with 0) or column names (needs header=true) of nominal/categorial attributes. List separated by comma. Example: -n 1,2,5,6. For numerical column names prepend with _.
-r="": Remove comlumns with given indices (starting with 0) or column names (needs header=true) . List separated by comma. Example: -r 2,5,8,9. For numerical column names prepend with _.
-w="-1": Column index (starting with 0) or column name (needs header=true) of importance weight column. For numerical column names prepend with _. Set to -1 for data without a importance column.
-ncpu=0: Set number of cores to use. Values <= 0 use all availabe cores. Important: Set to 1 for preserving the same example order as in the input data set.
-nolabel=false: Omit label in output data for performing prediction only in VW.
-headernames=false: Use names of header columns as attribute names instead of index number. May increase output file size. Implies -header=true.
The headernames flag may be helpful in case you want to process csv files with a different number and ordering of columns, e.g. train and test files where the column indices differ due to a missing label.
-q=false: Quiet mode.
Copyright (C) 2015 Stefan Michaelis http://www.stefan-michaelis.name
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.