Programming exercise/presentation for the Fun-Club meetup on 2013-02-28
Erlang
Latest commit 29b352b by @martinrehfeld on Feb 24, 2013: "parallel: one process per 50 lines, frequency in ets"
real    0m0.513s
user    0m0.928s
sys     0m0.066s

README

Here is the task:
----------------------------

You'll be given a simple text file written in English (like this one: http://www.textlibrary.com/download/moby-dic.txt).

Your task is to write a little program that counts the occurrences of words and prints the 10 most frequent words with their number of occurrences to stdout, like so (numbers are not correct!):

$ mysolution < moby-dic.txt
the: 50123
of: 10236
and: 9999
to: 4024
a: 3901
in: 2561
that: 2400
i: 2331
was: 2114
he: 1738
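A minimal Erlang sketch of the counting and ranking step, assuming the text has already been split into lowercase words following the rules in "What is a word?". The module name `wf_top` and its functions are hypothetical. It folds the words into an immutable map, so it assumes OTP 19 or later (for `maps:update_with/4`), and it breaks ties alphabetically so the output is deterministic.

```erlang
-module(wf_top).
-export([count/1, top/2, print/1]).

%% Fold the word list into an immutable map of Word => Count.
%% maps:update_with/4 inserts 1 for a new word, otherwise increments.
count(Words) ->
    lists:foldl(
      fun(W, Acc) -> maps:update_with(W, fun(N) -> N + 1 end, 1, Acc) end,
      #{}, Words).

%% Return the N most frequent {Word, Count} pairs, highest count first.
%% Sorting on {-Count, Word} orders ties alphabetically.
top(Counts, N) ->
    Keyed = lists:sort([{-C, W} || {W, C} <- maps:to_list(Counts)]),
    [{W, -NegC} || {NegC, W} <- lists:sublist(Keyed, N)].

%% Print pairs in the "word: count" format of the task description.
print(Pairs) ->
    lists:foreach(fun({W, C}) -> io:format("~s: ~B~n", [W, C]) end, Pairs).
```

With a word list in hand, `wf_top:print(wf_top:top(wf_top:count(Words), 10))` prints ten lines in the `word: count` format shown above.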

What is a word?
---------------------
- we'll assume that a word consists only of the characters a-z
- we don't distinguish uppercase and lowercase, so it's okay to convert everything to lowercase
- everything that is not in a-z can be considered a word boundary, which makes it easy to deal with commas, colons and the like.
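One way to apply these rules in Erlang is a single pass over the input binary that lowercases A-Z on the fly and treats every other byte as a boundary. This is a sketch; the module name `wf_words` is hypothetical, and the bytes of multi-byte UTF-8 characters simply act as boundaries.

```erlang
-module(wf_words).
-export([words/1]).

%% Split a binary into lowercase words made only of a-z.
%% Any byte outside A-Z/a-z acts as a word boundary.
words(Bin) when is_binary(Bin) ->
    words(Bin, [], []).

%% Acc holds the letters of the current word (reversed);
%% Words holds the finished words (reversed).
words(<<C, Rest/binary>>, Acc, Words) when C >= $a, C =< $z ->
    words(Rest, [C | Acc], Words);
words(<<C, Rest/binary>>, Acc, Words) when C >= $A, C =< $Z ->
    words(Rest, [C + 32 | Acc], Words);          % $A + 32 =:= $a
words(<<_, Rest/binary>>, Acc, Words) ->
    words(Rest, [], flush(Acc, Words));          % boundary: end current word
words(<<>>, Acc, Words) ->
    lists:reverse(flush(Acc, Words)).

%% Emit the current word, if any, as a binary.
flush([], Words) -> Words;
flush(Acc, Words) -> [list_to_binary(lists:reverse(Acc)) | Words].
```

Under these rules an apostrophe is a boundary too, so "It's" yields the two words `it` and `s`.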

What is the minimal requirement?
---------------------------------------
1. Write a minimal solution in your language that solves the task for moby-dic.txt
2. Make your solution presentable (comment your source or prepare a little slide)
3. Be able to explain in a few sentences:
- how your solution works
- what dependencies it has (non standard libraries etc)
- in what way your solution benefits from something special about your language
- and what its drawbacks are (if there are any)

What else can be done (optional)?
---------------------------------------------
Performance:
- Benchmark your solution with regard to running time. Either use the time command (man time) or even show us how to benchmark in your language.
- What is your solution spending time with? IO? Garbage collection?
- Improve that.
- Benchmark again...
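In Erlang, `timer:tc/1` times a zero-arity fun and returns `{Microseconds, Result}`, and `erlang:statistics(garbage_collection)` exposes a node-wide garbage collection counter. The hypothetical `wf_bench` module below wraps both as a sketch: `measure/2` repeats a run and reports the minimum and median time, and `gc_count/1` reports how many collections happened while the fun ran. Because the counter is node-wide, collections in other processes are included, so treat it as a rough signal rather than a profile.

```erlang
-module(wf_bench).
-export([measure/2, gc_count/1]).

%% Run Fun Runs times and return {MinMicros, MedianMicros}.
%% For an even number of runs the lower of the two middle values is used.
measure(Fun, Runs) when is_function(Fun, 0), is_integer(Runs), Runs > 0 ->
    Times = lists:sort([element(1, timer:tc(Fun)) || _ <- lists:seq(1, Runs)]),
    {hd(Times), lists:nth((Runs + 1) div 2, Times)}.

%% Count node-wide garbage collections that happen while Fun runs.
%% Other processes' collections are included, so this is only a rough signal.
gc_count(Fun) when is_function(Fun, 0) ->
    {Before, _, _} = erlang:statistics(garbage_collection),
    _ = Fun(),
    {After, _, _} = erlang:statistics(garbage_collection),
    After - Before.
```

For example, `wf_bench:measure(fun() -> my_solution:run(Bin) end, 10)`, where `my_solution:run/1` is a stand-in for your own entry point.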

Memory consumption:
- It is fine to read moby-dic.txt all at once into memory, but what if we give you a corpus that exceeds your machine's memory? Fix this and tell us about it.
- Since this is functional programming: What data structure did you use? Is it functional? Does it trigger heavy allocation and garbage collection? Find out and tell us about it.
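A sketch that touches both points, using an ETS table for the counts (the approach named in this repository's latest commit message; the code below is an illustration, not the repository's implementation). It reads the input one line at a time with `io:get_line/2`, so, assuming lines are of reasonable length, memory grows with the number of distinct words rather than with the size of the corpus. The counts live in a private ETS table, where `ets:update_counter/4` increments a counter in place. That is not a functional data structure: it avoids building a new map version per word and keeps the counts off the process heap, at the price of mutable state owned by one process. The module name `wf_stream` is hypothetical; it assumes OTP 20 or later (for `string:lowercase/1`) and an io device in binary mode, e.g. after `io:setopts(standard_io, [binary])` when reading stdin.

```erlang
-module(wf_stream).
-export([count_device/1, top/2]).

%% Count words from an io device (opened in binary mode) line by line.
%% Returns the ETS table holding {Word, Count} tuples.
count_device(Dev) ->
    Tab = ets:new(word_counts, [set, private]),
    loop(Dev, Tab).

loop(Dev, Tab) ->
    case io:get_line(Dev, "") of
        eof ->
            Tab;
        {error, Reason} ->
            ets:delete(Tab),
            erlang:error(Reason);
        Line ->
            %% update_counter/4 inserts {W, 0} for a new word, then adds 1.
            lists:foreach(
              fun(W) -> ets:update_counter(Tab, W, {2, 1}, {W, 0}) end,
              words(Line)),
            loop(Dev, Tab)
    end.

%% Split on anything outside A-Z/a-z, drop empty pieces, lowercase the rest.
words(Line) ->
    [string:lowercase(W)
     || W <- re:split(Line, "[^A-Za-z]+", [{return, binary}]), W =/= <<>>].

%% The N most frequent {Word, Count} pairs; ties ordered alphabetically.
top(Tab, N) ->
    Keyed = lists:sort([{-C, W} || {W, C} <- ets:tab2list(Tab)]),
    [{W, -NegC} || {NegC, W} <- lists:sublist(Keyed, N)].
```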