Skip to content

A simple command-line program for splitting CSV files.

License

Notifications You must be signed in to change notification settings

mifrandir/csv-split

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

94 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CSV Split

Build Tests

A fast command line program to split CSV files.

Usage

$ csv-split -h
Split CSV files by lines

VERSION: v0.0.2

USAGE:
        csv-split [OPTIONS] <FILE_TO_SPLIT>

OPTIONS:
        -n, --new-file-name <NEW_FILE>                          Name of the new files. This will be appended with an incremented number (default: "split")
        -e, --exclude-headers                                   Exclude headers in new files (default: false)
        -l, --line-count <COUNT>                                Number of lines per file (default: 1)
        -d, --delimiter <DELIMITER>                             Character used for column separation (default: ',')
        -r, --remove-columns <COL1>[<DELIM><COL2><DELIM>...]    Specify column names to be removed during processing.
        -i, --include-remainders                                Include remainder rows in the split files (default: false).
        -h, --help                                              Display this message

Simply run the executable with the desired inputs and flags. Then it will create a set of files of the form <NEW_FILE><INDEX>.csv that contain the rows of the original file in the range [INDEX+2,INDEX+2+COUNT) where line 1 is the input header.

The program processes the input line by line using getline, which is provided by GCC. Therefore

  1. the memory usage is only affected by the line length and desired split length and
  2. the time complexity is approximately O(L*I*W) where L is the number of lines in the input, I is the number of columns in the input and W is the number of characters in the longest word in the input. This is obviously an upper bound, so the runtime is going to be more accurately represented if W is chosen to be the average word length in the input.

Installation

If you are using Arch Linux, you can use the AUR package.

Otherwise, it is recommended to use the latest release. You can simply download the binary and run it or use the archive and extract it and then follow the other instructions.

If you are absolutely sure you want the newest (yet unstable) version, you can also clone the repository and change into directory:

$ git clone https://github.com/miltfra/csv-split
$ cd csv-split

To create the binary in bin/:

$ make

To install the binary to /usr/local/bin/ (requires root priviliges):

# make install

To install the binary to local directory (e.g. $HOME/.local/bin):

$ make DESTDIR=$HOME/.local/bin install-local

Installations can be undone with the correspodning uninstall and uninstall-local commands.

To force a complete rebuild, use

$ make clean
$ make build

Testing

You can verify that everything compiled successfully by running

$ make check

after make.

If you want to verify that the software behaves as intended, run

$ make test

and check the output. If you find a bug, please file a bug report on GitHub.

About

This program is heavily inspired by imartingraham/csv-split. It makes use of the already existing interface and aims to improve on speed by switching from Ruby to C.

This software has not yet been fully tested and should therefore be used with care.