phrase_counter

How to run:

from sources:

$ go run parse.go path/to/file1 path/to/file2

or

$ cat path/to/file | go run parse.go

from binary:

$ ./parse-{{ arch }} path/to/file1 path/to/file2

or

$ cat path/to/file | ./parse-{{ arch }}

{{ arch }} is either 'amd64-linux' or 'arm64' (Mac M1).

The amd64-linux binary was built on an arm64 system using Go cross-compilation:

$ GOOS=linux GOARCH=amd64 go build -o parse-amd64-linux parse.go

Docker

to build:

$ docker build -t phrase_counter ./

to run with stdin as input source:

$ cat file.txt | docker run -i phrase_counter

to run with file(s) as argument(s):

$ docker run phrase_counter "/phrase_counter/samples/file1.txt" "/phrase_counter/samples/fileN.txt"

Performance

Two algorithms were considered:

sequential: parse.go - data sources are processed one after another
concurrent: parse-conc.go - data sources are processed concurrently, then the results are merged

The sequential algorithm is more efficient on large volumes (~1000 Moby Dicks). In the concurrent version, merging the intermediate results takes more than half of the total run time.
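
To illustrate why the merge step can dominate, here is a minimal sketch of the two approaches described above (one source after another versus concurrent counting followed by a merge). It is not the code from parse.go or parse-conc.go: the function names countWords, countSequential and countConcurrent are hypothetical, and it counts single whitespace-separated words rather than phrases to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countWords tallies whitespace-separated words in one data source.
// (Hypothetical helper; the real tool counts phrases.)
func countWords(text string) map[string]int {
	counts := make(map[string]int)
	for _, w := range strings.Fields(text) {
		counts[w]++
	}
	return counts
}

// countSequential processes the sources one after another,
// writing straight into a single shared map.
func countSequential(sources []string) map[string]int {
	total := make(map[string]int)
	for _, src := range sources {
		for _, w := range strings.Fields(src) {
			total[w]++
		}
	}
	return total
}

// countConcurrent counts each source in its own goroutine,
// then merges the per-source maps into one result.
func countConcurrent(sources []string) map[string]int {
	partials := make([]map[string]int, len(sources))
	var wg sync.WaitGroup
	for i, src := range sources {
		wg.Add(1)
		go func(i int, src string) {
			defer wg.Done()
			partials[i] = countWords(src) // each goroutine writes only its own slot
		}(i, src)
	}
	wg.Wait()

	// Merge step: every key of every partial map is visited a second time.
	total := make(map[string]int)
	for _, p := range partials {
		for w, n := range p {
			total[w] += n
		}
	}
	return total
}

func main() {
	sources := []string{"call me Ishmael", "call me maybe"}
	fmt.Println(countSequential(sources)["call"], countConcurrent(sources)["call"]) // prints "2 2"
}
```

In this sketch the concurrent version builds one map per source and then walks all of them again during the merge, so every distinct key is hashed at least twice. That extra pass is one plausible reason the merge accounts for so much of the run time on large inputs.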

TODO (what to improve)

more tests (too much time was spent looking for the most efficient algorithm)
make it more user-friendly: "--help", argument checks, etc.

Known bugs

Unicode support on Linux. Development was done on arm64, where the code handles Unicode correctly. On Linux, processing the unicode-test.txt file produces 3 formatting errors.
NOT A BUG: words with punctuation symbols inside them and no surrounding space(s) are not split, e.g. H.M.S, a:b, a-b, don't, etc.
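
The "not a bug" behavior above can be reproduced with a small sketch. It assumes a tokenizer that splits on whitespace only and trims punctuation from the ends of each token; the function name tokenize is hypothetical and this is not necessarily how parse.go does it.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits text on whitespace only, then trims punctuation from
// both ends of each token. Punctuation inside a token is left untouched,
// so "H.M.S", "a:b", "a-b" and "don't" each stay a single word.
func tokenize(text string) []string {
	var words []string
	for _, field := range strings.Fields(text) {
		w := strings.TrimFunc(field, unicode.IsPunct)
		if w != "" { // skip tokens that were punctuation only
			words = append(words, w)
		}
	}
	return words
}

func main() {
	fmt.Println(tokenize(`H.M.S Beagle, a:b and a-b; "don't"!`))
	// prints: [H.M.S Beagle a:b and a-b don't]
}
```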
