wordCount

Exercise for a FunClub meetup

You'll be given a simple text file written in English (like this one: http://www.textlibrary.com/download/moby-dic.txt).

Your task is to write a little program that counts the occurrences of words and prints the 10 most frequent words with their number of occurrences to stdout, like so (the numbers are not correct!):

    $ mysolution < moby-dic.txt
    the: 50123
    of: 10236
    and: 9999
    to: 4024
    a: 3901
    in: 2561
    that: 2400
    i: 2331
    was: 2114
    he: 1738
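A minimal Python sketch of the whole task, assuming the word definition given below (lowercase everything, a word is a run of a-z) and that reading all of stdin at once is acceptable. The names `top_words` and `main` are hypothetical, not part of the exercise.

```python
import re
import sys
from collections import Counter

# Assumption: a word is a maximal run of the letters a-z after lowercasing;
# every other character (digits, punctuation, whitespace) is a word boundary.
WORD_RE = re.compile(r"[a-z]+")


def top_words(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent words in text with their counts."""
    counts = Counter(WORD_RE.findall(text.lower()))
    # most_common sorts by descending count; ties keep first-seen order.
    return counts.most_common(n)


def main() -> None:
    # Reads the whole input into memory, which is fine for moby-dic.txt.
    for word, count in top_words(sys.stdin.read()):
        print(f"{word}: {count}")


if __name__ == "__main__":
    main()
```

Saved under a name of your choice (say `wordcount.py`, a hypothetical name), it runs as `python wordcount.py < moby-dic.txt`. Because `Counter.most_common` breaks ties by first occurrence, words with equal counts are listed in the order they first appear in the text.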

What is a word?

  • we'll assume that a word consists only of the characters a-z
  • we don't distinguish between uppercase and lowercase, so it's okay to convert everything to lowercase
  • every character that is not in a-z can be treated as a word boundary, which makes it easier for you to deal with commas, colons and the like.
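The word definition above fits in a single regular expression. The Python sketch below (the helper name `words` is hypothetical) shows what the rules imply for apostrophes, hyphens, digits and non-ASCII letters:

```python
import re

# Hypothetical helper implementing the rules above: lowercase everything,
# then keep only maximal runs of a-z; anything else separates words.
_WORD_RE = re.compile(r"[a-z]+")


def words(text: str) -> list[str]:
    return _WORD_RE.findall(text.lower())


# Apostrophes, hyphens, digits and punctuation all split words.
print(words("Call me Ishmael. Don't say Moby-Dick, 1851!"))
# → ['call', 'me', 'ishmael', 'don', 't', 'say', 'moby', 'dick']
```

So "don't" counts as "don" and "t", and an accented letter such as "é" also acts as a boundary ("café" yields "caf"). Under the simplified definition above, that is expected behaviour.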

What is the minimal requirement?

  1. Write a minimal solution in your language that solves the task for moby-dic.txt.
  2. Make your solution presentable (comment your source code or prepare a short slide).
  3. Be able to explain in a few sentences:
     • how your solution works
     • what dependencies it has (non-standard libraries etc.)
     • in what way your solution benefits from something special about your language
     • and what its drawbacks are (if any)

What else can be done (optional)?

Performance:

  • Benchmark the running time of your solution. Either use the time command (man time) or show us how to benchmark in your language.
  • What is your solution spending time with? IO? Garbage collection?
  • Improve that.
  • Benchmark again...
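One way to benchmark in Python without external tools: time repeated runs with `time.perf_counter`, repeat with the cyclic garbage collector switched off to see whether GC matters, and use `cProfile` to see which step dominates. This is a sketch; `count_words`, `best_time` and the synthetic sample text are assumptions standing in for your own solution and moby-dic.txt.

```python
import cProfile
import gc
import pstats
import re
import time
from collections import Counter

WORD_RE = re.compile(r"[a-z]+")


def count_words(text: str) -> Counter:
    # Stand-in for your solution: lowercase, split on non a-z, count.
    return Counter(WORD_RE.findall(text.lower()))


def best_time(text: str, repeats: int = 5, disable_gc: bool = False) -> float:
    """Best wall-clock time in seconds over `repeats` runs of count_words."""
    best = float("inf")
    for _ in range(repeats):
        if disable_gc:
            gc.disable()  # take the cyclic garbage collector out of the picture
        try:
            start = time.perf_counter()
            count_words(text)
            best = min(best, time.perf_counter() - start)
        finally:
            if disable_gc:
                gc.enable()
    return best


if __name__ == "__main__":
    # Synthetic stand-in corpus (an assumption); for the real thing use
    # e.g. open("moby-dic.txt").read() instead.
    sample = "Call me Ishmael. Some years ago, never mind how long precisely. " * 20000
    print(f"with GC:    {best_time(sample):.4f} s")
    print(f"without GC: {best_time(sample, disable_gc=True):.4f} s")
    # Profile one run to see whether lowercasing, regex scanning or counting dominates.
    with cProfile.Profile() as profiler:
        count_words(sample)
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

Taking the best of several runs filters out noise from other processes. From the shell, `time mysolution < moby-dic.txt` additionally splits the total into user and sys time; a large sys share is a rough hint that I/O (system calls) matters.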

Memory consumption:

  • It is fine to read moby-dic.txt into memory all at once, but what if we give you a corpus that exceeds your machine's memory? Fix this and tell us about it.
  • Since this is functional programming: what data structure did you use? Is it functional? Does it trigger heavy allocation and garbage collection? Find out and tell us about it.
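A Python sketch touching both points, under stated assumptions: the streaming version reads fixed-size chunks, so only one chunk plus the table of distinct words has to be in memory, and `tracemalloc` compares its peak allocation with the read-everything approach. The function names, the chunk size and the synthetic corpus are hypothetical. Note that `collections.Counter` is a mutable hash table, not a persistent (functional) data structure; in Haskell a typical counterpart would be `Data.Map.Strict`, a persistent balanced tree.

```python
import io
import re
import string
import tracemalloc
from collections import Counter
from typing import Callable, TextIO

WORD_RE = re.compile(r"[a-z]+")


def count_all_at_once(stream: TextIO) -> Counter:
    # Reads the whole corpus into one string; findall then builds a list
    # of every word, so peak memory grows with the size of the input.
    return Counter(WORD_RE.findall(stream.read().lower()))


def count_streaming(stream: TextIO, chunk_size: int = 1 << 16) -> Counter:
    # Peak memory is roughly one chunk plus the Counter, whose size
    # depends only on the number of *distinct* words.
    counts: Counter = Counter()
    tail = ""  # trailing letters: possibly a word split across two chunks
    while chunk := stream.read(chunk_size):
        text = (tail + chunk).lower()
        head = text.rstrip(string.ascii_lowercase)  # cut before trailing letters
        counts.update(WORD_RE.findall(head))
        tail = text[len(head):]
    if tail:
        counts[tail] += 1  # the last word of the input
    return counts


def peak_bytes(count: Callable[[TextIO], Counter], corpus: str) -> int:
    """Peak memory (bytes) traced by tracemalloc while counting corpus."""
    stream = io.StringIO(corpus)  # created before tracing starts
    tracemalloc.start()
    try:
        count(stream)
        return tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    corpus = "The whale, the whale! " * 100_000  # synthetic stand-in corpus
    assert count_all_at_once(io.StringIO(corpus)) == count_streaming(io.StringIO(corpus))
    print("all at once:", peak_bytes(count_all_at_once, corpus), "bytes")
    print("streaming:  ", peak_bytes(count_streaming, corpus), "bytes")
```

For a real file, pass an open file object, e.g. `count_streaming(sys.stdin).most_common(10)`. If even the set of distinct words does not fit in memory, this sketch is not enough; one option would be to spill partial counts to disk and merge them afterwards.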