Skip to content

svaloumas/gowc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

gowc

CI codecov Go Report Card License

Just another GNU wc clone, written in Go.

Overview

gowc is a simple, zero-dependency command line tool for counting bytes, characters, words and newlines in each given file. It leverages the language's built-in support for concurrency by processing the given input files in chunks. The buffer size of each chunk is configurable and can be set via -bs, --buffer-size flag. It reads one chunk ahead while processing the previously read one.

Installation

make build-linux

Other available options: build-mac, build-win

Usage

By default, gowc will count lines, words, and bytes. You can specify the counters you'd like by using the available flags and options from the table below.

Flag Description
-c, --bytes Print the byte counts
-m, --chars Print the character counts
-l, --lines Print the newline counts
-l, --lines Print the newline counts
-w, --words Print the word counts
-L, --max-line-length Print the length of the longest line
-h, --help Display help and exit
-V, --version Output version information and exit
Option Description
-bs, --buffer-size Configure the buffer size of each chunk to be processed (defaults to 4096)
--files-from Read input from the files specified by a newline-terminated list of filenames in the given file
gowc [FLAGS] [OPTIONS] [FILE]...

Performance

hyperfine is used to perform the benchmarks. The file used is a 595MB CSV with 5m rows.

# New lines only count
$ hyperfine  --warmup 3 './gowc -l -bs 1000000 ./5mSalesRecords.csv' 'wc -l ./5mSalesRecords.csv'                                                                        [±main ●]
Benchmark 1: ./gowc -l -bs 1000000 ./5mSalesRecords.csv
  Time (mean ± σ):     160.2 ms ±   6.5 ms    [User: 118.5 ms, System: 126.6 ms]
  Range (min … max):   148.8 ms … 167.4 ms    17 runs
 
Benchmark 2: wc -l ./5mSalesRecords.csv
  Time (mean ± σ):     494.3 ms ±  12.3 ms    [User: 397.0 ms, System: 93.8 ms]
  Range (min … max):   480.8 ms … 517.6 ms    10 runs
 
Summary
  './gowc -l -bs 1000000 ./5mSalesRecords.csv' ran
    3.08 ± 0.15 times faster than 'wc -l ./5mSalesRecords.csv'

# Default lines, words and bytes count
hyperfine  --warmup 3 './gowc -bs 1000000 ./5mSalesRecords.csv' 'wc ./5mSalesRecords.csv'                                                                              [±main ●]
Benchmark 1: ./gowc -bs 1000000 ./5mSalesRecords.csv
  Time (mean ± σ):      1.542 s ±  0.008 s    [User: 1.554 s, System: 0.239 s]
  Range (min … max):    1.532 s …  1.559 s    10 runs
 
Benchmark 2: wc ./5mSalesRecords.csv
  Time (mean ± σ):      2.045 s ±  0.009 s    [User: 1.946 s, System: 0.097 s]
  Range (min … max):    2.033 s …  2.058 s    10 runs
 
Summary
  './gowc -bs 1000000 ./5mSalesRecords.csv' ran
    1.33 ± 0.01 times faster than 'wc ./5mSalesRecords.csv'

# Word only count
$ hyperfine  --warmup 3 './gowc -w -bs 1000000 ./5mSalesRecords.csv' 'wc -w ./5mSalesRecords.csv'                                                                        [±main ●]
Benchmark 1: ./gowc -w -bs 1000000 ./5mSalesRecords.csv
  Time (mean ± σ):      1.537 s ±  0.012 s    [User: 1.548 s, System: 0.240 s]
  Range (min … max):    1.520 s …  1.566 s    10 runs
 
Benchmark 2: wc -w ./5mSalesRecords.csv
  Time (mean ± σ):      2.041 s ±  0.011 s    [User: 1.941 s, System: 0.097 s]
  Range (min … max):    2.029 s …  2.063 s    10 runs
 
Summary
  './gowc -w -bs 1000000 ./5mSalesRecords.csv' ran
    1.33 ± 0.01 times faster than 'wc -w ./5mSalesRecords.csv'

# Characters only count
$ hyperfine  --warmup 3 './gowc -m -bs 1000000 ./5mSalesRecords.csv' 'wc -m ./5mSalesRecords.csv'                                                                        [±main ●]
Benchmark 1: ./gowc -m -bs 1000000 ./5mSalesRecords.csv
  Time (mean ± σ):     751.9 ms ±   6.4 ms    [User: 707.1 ms, System: 149.5 ms]
  Range (min … max):   741.9 ms … 759.5 ms    10 runs
 
Benchmark 2: wc -m ./5mSalesRecords.csv
  Time (mean ± σ):      5.667 s ±  0.094 s    [User: 5.539 s, System: 0.113 s]
  Range (min … max):    5.578 s …  5.794 s    10 runs
 
Summary
  './gowc -m -bs 1000000 ./5mSalesRecords.csv' ran
    7.54 ± 0.14 times faster than 'wc -m ./5mSalesRecords.csv'

# Multiple files
$ hyperfine  --warmup 3 './gowc -bs 1000000 ./5mSalesRecords.csv ./5mSalesRecords.csv' 'wc ./5mSalesRecords.csv ./5mSalesRecords.csv'                [±main ●]
Benchmark 1: ./gowc -bs 1000000 ./5mSalesRecords.csv ./5mSalesRecords.csv
  Time (mean ± σ):      1.698 s ±  0.009 s    [User: 3.271 s, System: 0.515 s]
  Range (min … max):    1.684 s …  1.708 s    10 runs
 
Benchmark 2: wc ./5mSalesRecords.csv ./5mSalesRecords.csv
  Time (mean ± σ):      4.082 s ±  0.013 s    [User: 3.886 s, System: 0.192 s]
  Range (min … max):    4.062 s …  4.102 s    10 runs
 
Summary
  './gowc -bs 1000000 ./5mSalesRecords.csv ./5mSalesRecords.csv' ran
    2.40 ± 0.01 times faster than 'wc ./5mSalesRecords.csv ./5mSalesRecords.csv

Tests

Run the test suite.

make test