Vim plugin: Highlight columns in CSV and TSV files and run queries in SQL-like language
Clone or download
Latest commit 654c541 Sep 27, 2018

README.md

Overview

Rainbow CSV has 2 main features:

  • Highlight csv columns in different rainbow colors.
  • Provide SELECT and UPDATE queries in RBQL: SQL-like transprogramming query language.

There are 2 ways to enable csv columns highlighting:

  1. CSV autodetection based on file content. File extension doesn't have to be .csv or .tsv
  2. Manual CSV delimiter selection with :RainbowDelim command (So you can use rainbow_csv for non-table files, e.g. to highlight function arguments in different colors)

To run an RBQL query either press F5 or enter the query in vim command line e.g. :Select a1, a2

Extension is written in pure vimscript/python, no additional libraries required.

Demonstration of rainbow_csv highlighting and RBQL queries

demo_screencast

The demo table is demo/movies.tsv. There are also some other test datasets in python/test_datasets. In this demo python expressions were used, but JavaScript is also available.

Plugin description

Rainbow highlighting for non-table files

You can use rainbow highlighting and RBQL even for non-csv/tsv files.
E.g. you can highlight records in log files, one-line xmls and other delimited records.
You can even highlight function arguments in your programming language using comma as a delimiter for :RainbowDelim command. And you can always turn off the rainbow highlighting using :NoRainbowDelim command.

Here is an example of how to extract some fields from a bunch of uniform single-line xmls:

demo_xml_screencast

Mappings

Key Action
F5 Start query editing for the current csv file
F5 Execute currently edited query

Commands

:Select ...

Allows to enter RBQL select query as vim command. e.g. :Select a1, a2 order by a1

:Update ...

Allows to enter RBQL update query as vim command. e.g. :Update a1 = a1 + " " + a2

:RainbowDelim

Mark current file as a table and highlight it's columns in rainbow colors. Character under the cursor will be used as a delimiter. The delimiter will be saved in a config file for future vim sessions.

You can also use this command for non-csv files, e.g. to highlight function arguments
in source code in different colors. To return back to original syntax highlighting run :NoRainbowDelim

:RainbowDelimQuoted

Same as :RainbowDelim but allows delimiters inside fields if the field is double quoted by rules of Excel / RFC 4180

:RainbowMonoColumn

Mark the current file as rainbow table with a single column without delimiters. You will be able to run RBQL queries on it using a1 column variable.

:NoRainbowDelim

This command will disable rainbow columns highlighting for the current file. Use it to cancel :RainbowDelim, :RainbowDelimQuoted and :RainbowMonoColumn effects or when autodection mechanism has failed and marked non-table file as a table

:RainbowName <name>

Assign any name to the current table. You can use this name in join operation instead of the table path. E.g.

JOIN customers ON a1 == b1

intead of:

JOIN /path/to/my/customers/table ON a1 == b1

Configuration

g:rcsv_colorpairs

List of color name pairs to customize rainbow highlighting.
Each entry in the list is a pair of two colors: the first color is for terminal mode, the second one is for GUI mode.
Example:

let g:rcsv_colorpairs = [['red', 'red'], ['blue', 'blue'], ['green', 'green'], ['magenta', 'magenta'], ['NONE', 'NONE'], ['darkred', 'darkred'], ['darkblue', 'darkblue'], ['darkgreen', 'darkgreen'], ['darkmagenta', 'darkmagenta'], ['darkcyan', 'darkcyan']]

g:rbql_output_format

Default: input
Allowed values: tsv, csv, input

Format of RBQL result set tables.

  • input: same format as the input table
  • tsv: doesn't allow quoted tabs inside fields.
  • csv: is Excel-compatible and allows quoted commas.

Essentially format here is a pair: delimiter + quoting policy. This setting for example can be used to convert files between tsv and csv format:

  • To convert csv to tsv: 1. open csv file. 2. :let g:rbql_output_format='tsv' 3. :Select *
  • To convert tsv to csv: 1. open tsv file. 2. :let g:rbql_output_format='csv' 3. :Select *

g:rbql_backend_language

Default: python

Scripting language to use in RBQL expression. Either 'js' or 'python' To use JavaScript add let g:rbql_backend_language = 'js' to .vimrc

g:disable_rainbow_csv_autodetect

csv autodetection mechanism can be disabled by setting this variable value to 1.
Manual delimiter selection would still be possible.

g:rcsv_max_columns

Default: 30

Autodetection will fail if buffer has more than g:rcsv_max_columns columns.
You can increase or decrease this limit.

Optional "Header" file feature

Rainbow csv allows you to create a special "header" file for any of your spreadsheet table files. It must have the same name as the table file but with ".header" suffix (e.g. for "table.tsv" table the header file is "table.tsv.header"). The only purpose of header file is to provide csv column names. It is also possible to use :RainbowSetHeader <file_name> command to set a differently named file as a header for the current table.

Installation

Install with your favorite plugin manager.
If you want to use RBQL with JavaScript expressions, make sure you have Node.js installed

RBQL (RainBow Query Language) Description

RBQL is a technology which provides SQL-like language that supports SELECT and UPDATE queries with Python or JavaScript expressions.

Official Site

Main Features

  • Use Python or Java Script expressions inside SELECT, UPDATE, WHERE and ORDER BY statements
  • Result set of any query immediately becomes a first-class table on it's own.
  • Output entries appear in the same order as in input unless ORDER BY is provided.
  • Input csv/tsv spreadsheet may contain varying number of entries (but select query must be written in a way that prevents output of missing values)
  • Works out of the box, no external dependencies.

Supported SQL Keywords (Keywords are case insensitive)

  • SELECT [ TOP N ] [ DISTINCT [ COUNT ] ]
  • UPDATE [ SET ]
  • WHERE
  • ORDER BY ... [ DESC | ASC ]
  • [ [ STRICT ] LEFT | INNER ] JOIN
  • GROUP BY
  • LIMIT N

All keywords have the same meaning as in SQL queries. You can check them online

RBQL-specific keywords, rules and limitations

  • JOIN statements must have the following form: <JOIN_KEYWORD> (/path/to/table.tsv | table_name ) ON ai == bj
  • UPDATE SET is synonym to UPDATE, because in RBQL there is no need to specify the source table.
  • UPDATE has the same meaning as in SQL, but it also can be considered as a special type of SELECT query.
  • TOP and LIMIT have identical meaning. Use whichever you like more.
  • DISTINCT COUNT is like DISTINCT, but adds a new column to the "distinct" result set: number of occurences of the entry, similar to uniq -c unix command.
  • STRICT LEFT JOIN is like LEFT JOIN, but generates an error if any key in left table "A" doesn't have exactly one matching key in the right table "B".

Special variables

Variable Name Variable Type Variable Description
a1, a2,..., a{N} string Value of i-th column
b1, b2,..., b{N} string Value of i-th column in join table B
NR integer Line number (1-based)
NF integer Number of fields in line

Aggregate functions and queries

RBQL supports the following aggregate functions, which can also be used with GROUP BY keyword:
COUNT(), MIN(), MAX(), SUM(), AVG(), VARIANCE(), MEDIAN()

Limitations

  • Aggregate function are CASE SENSITIVE and must be CAPITALIZED.
  • It is illegal to use aggregate functions inside Python (or JS) expressions. Although you can use expressions inside aggregate functions. E.g. MAX(float(a1) / 1000) - legal; MAX(a1) / 1000 - illegal.

Examples of RBQL queries

With Python expressions

  • select top 100 a1, int(a2) * 10, len(a4) where a1 == "Buy" order by int(a2)
  • select * order by random.random() - random sort, this is an equivalent of bash command sort -R
  • select top 20 len(a1) / 10, a2 where a2 in ["car", "plane", "boat"] - use Python's "in" to emulate SQL's "in"
  • select len(a1) / 10, a2 where a2 in ["car", "plane", "boat"] limit 20
  • update set a3 = 'US' where a3.find('of America') != -1
  • select * where NR <= 10 - this is an equivalent of bash command "head -n 10", NR is 1-based')
  • select a1, a4 - this is an equivalent of bash command "cut -f 1,4"
  • select * order by int(a2) desc - this is an equivalent of bash command "sort -k2,2 -r -n"
  • select NR, * - enumerate lines, NR is 1-based
  • select * where re.match(".*ab.*", a1) is not None - select entries where first column has "ab" pattern
  • select a1, b1, b2 inner join ./countries.txt on a2 == b1 order by a1, a3 - an example of join query
  • select distinct count len(a1) where a2 != 'US'
  • select MAX(a1), MIN(a1) where a2 != 'US' group by a2, a3

With JavaScript expressions

  • select top 100 a1, a2 * 10, a4.length where a1 == "Buy" order by parseInt(a2)
  • select * order by Math.random() - random sort, this is an equivalent of bash command sort -R
  • select top 20 a1.length / 10, a2 where ["car", "plane", "boat"].indexOf(a2) > -1
  • select a1.length / 10, a2 where ["car", "plane", "boat"].indexOf(a2) > -1 limit 20
  • update set a3 = 'US' where a3.indexOf('of America') != -1
  • select * where NR <= 10 - this is an equivalent of bash command "head -n 10", NR is 1-based')
  • select a1, a4 - this is an equivalent of bash command "cut -f 1,4"
  • select * order by parseInt(a2) desc - this is an equivalent of bash command "sort -k2,2 -r -n"
  • select NR, * - enumerate lines, NR is 1-based
  • select a1, b1, b2 inner join ./countries.txt on a2 == b1 order by a1, a3 - an example of join query
  • select distinct count a1.length where a2 != 'US'
  • select MAX(a1), MIN(a1) where a2 != 'US' group by a2, a3

FAQ

How does RBQL work?

Python module rbql.py parses RBQL query, creates a new python worker module, then imports and executes it.

Explanation of simplified Python version of RBQL algorithm by example.

  1. User enters the following query, which is stored as a string Q:
    SELECT a3, int(a4) + 100, len(a2) WHERE a1 != 'SELL'
  1. RBQL replaces all a{i} substrings in the query string Q with a[{i - 1}] substrings. The result is the following string:
    Q = "SELECT a[2], int(a[3]) + 100, len(a[1]) WHERE a[0] != 'SELL'"
  1. RBQL searches for "SELECT" and "WHERE" keywords in the query string Q, throws the keywords away, and puts everything after these keywords into two variables S - select part and W - where part, so we will get:
    S = "a[2], int(a[3]) + 100, len(a[1])"
    W = "a[0] != 'SELL'"
  1. RBQL has static template script which looks like this:
    for line in sys.stdin:
        a = line.rstrip('\n').split('\t')
        if %%%W_Expression%%%:
            out_fields = [%%%S_Expression%%%]
            print '\t'.join([str(v) for v in out_fields])
  1. RBQL replaces %%%W_Expression%%% with W and %%%S_Expression%%% with S so we get the following script:
    for line in sys.stdin:
        a = line.rstrip('\n').split('\t')
        if a[0] != 'SELL':
            out_fields = [a[2], int(a[3]) + 100, len(a[1])]
            print '\t'.join([str(v) for v in out_fields])
  1. RBQL runs the patched script against user's data file:
    ./tmp_script.py < data.tsv > result.tsv

Result set of the original query (SELECT a3, int(a4) + 100, len(a2) WHERE a1 != 'SELL') is in the "result.tsv" file. It is clear that this simplified version can only work with tab-separated files.

Is this technology reliable?

It should be: RBQL scripts have only 1000 - 2000 lines combined (depending on how you count them) and there are no external dependencies. There is no complex logic, even query parsing functions are very simple. If something goes wrong RBQL will show an error instead of producing incorrect output, also there are currently 5 different warning types.

Standalone CLI Apps

You can also use two standalone RBQL Apps: with JavaScript and Python backends

rbql-js

Installation:

$ npm i rbql

Usage:

$ rbql-js --query "select a1, a2 order by a1" < input.tsv

rbql-py

Installation:

$ pip install rbql

Usage:

$ rbql-py --query "select a1, a2 order by a1" < input.tsv

Generic info

Comparison of Rainbow CSV technology with traditional graphical column alignment

Advantages

  • WYSIWYG
  • Familiar editing environment of your favorite text editor
  • Zero-cost abstraction: Syntax highlighting is essentially free, while graphical column alignment can be computationally expensive
  • High information density: Rainbow CSV shows more data per screen because it doesn't insert column-aligning whitespaces.
  • Works with non-table and semi-tabular files (text files that contain both table(s) and non-table data like text)
  • Ability to visually associate two same-colored columns from two different windows. This is not possible with graphical column alignment

Disadvantages

  • Rainbow CSV technology may be less effective for CSV files with many (> 10) columns
  • Current Rainbow CSV implementations do not support newlines inside double-quoted csv fields. Adding multiline fields support is technically possible under certain conditions but would impair other Rainbow CSV features and advantages.

References

Rainbow CSV in other editors:

RBQL

  • rbql-js CLI App for Node.js - npm
  • rbql-py CLI App in python

RBQL alternatives:

Related vim plugins:

Rainbow CSV name and original implementation was significantly influenced by rainbow_parentheses Vim plugin.

There also exists an old vim syntax file csv_color which, despite it's name, can highlight only *.tsv files.
And, of course, there is csv.vim