Triplet Challenge

Triplet Challenge's goal is to encourage people to implement a small task in the programming language of their choice, aiming at speed and low memory consumption. It was born from reading this post about the tech challenge of an interview. The point of this experiment is to see how even a simple challenge can be implemented in many different ways, with a wide variety of results. It is interesting to see how much we can squeeze out of the CPU to get the program to complete its task as fast as possible while maintaining a low memory footprint.

If you have any doubts or suggestions about the challenge, please do not hesitate to open an issue to discuss it.

Definition of the problem

First, let's define what we consider a triplet for this challenge: a triplet is a tuple of 3 consecutive words in the text.

  1. The program takes a UTF-8 file in English as its only parameter, e.g. ./triplet_challenge pg2009.txt.
  2. The output of the program is the top 3 triplets of words used, along with the number of times they are repeated:
    this is fine - 1337
    call me maybe - 567
    bring them back - 133
    
  3. Since we assume English, ignore any non-alphanumeric character except '. This means this - is?= fine should be interpreted as this is fine.
  4. Triplets are case-insensitive. Hence, CaLL mE mAyBE and call me maybe should be considered the same. A short sketch applying these rules follows this list.
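
To make the rules above concrete, here is a minimal, unoptimized sketch in C++ (an illustration only, not the reference implementation and not a competitive entry). It treats any byte that is not an ASCII alphanumeric character or ' as a separator, lowercases the words, counts every window of 3 consecutive words and prints the top 3. Treating non-ASCII UTF-8 bytes as separators is a simplification of this sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

int main(int argc, char* argv[]) {
    if (argc != 2) {
        std::cerr << "usage: " << argv[0] << " <file>\n";
        return 1;
    }

    // The stream stays open until the end of main (see "How to compete", step 5).
    std::ifstream file(argv[1]);
    std::string text((std::istreambuf_iterator<char>(file)),
                     std::istreambuf_iterator<char>());

    // Rules 3 and 4: keep alphanumeric characters and ', lowercase everything,
    // split on anything else (non-ASCII bytes are treated as separators here).
    std::vector<std::string> words;
    std::string word;
    for (unsigned char c : text) {
        if ((c < 128 && std::isalnum(c)) || c == '\'') {
            word.push_back(static_cast<char>(std::tolower(c)));
        } else if (!word.empty()) {
            words.push_back(word);
            word.clear();
        }
    }
    if (!word.empty()) {
        words.push_back(word);
    }

    // Count every window of 3 consecutive words.
    std::unordered_map<std::string, std::size_t> counts;
    for (std::size_t i = 0; i + 2 < words.size(); ++i) {
        ++counts[words[i] + ' ' + words[i + 1] + ' ' + words[i + 2]];
    }

    // Rule 2: print the top 3 triplets along with their counts.
    std::vector<std::pair<std::string, std::size_t>> top(counts.begin(), counts.end());
    const std::size_t n = std::min<std::size_t>(3, top.size());
    std::partial_sort(top.begin(), top.begin() + n, top.end(),
                      [](const auto& a, const auto& b) { return a.second > b.second; });
    for (std::size_t i = 0; i < n; ++i) {
        std::cout << top[i].first << " - " << top[i].second << '\n';
    }
    return 0;
}
```

This is only meant to pin down the rules; a competitive entry would avoid the intermediate word vector and the per-triplet string keys.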

Requirements

  1. You can use any programming language, as long as it can run on Linux. Why Linux? Because the Continuous Integration system we all use runs on a Linux machine, so we all play in the same context. Take into account that languages that require a virtual machine (e.g. Java, C#, Python, JavaScript, Ruby) will most likely incur a non-negligible startup time and RAM overhead. Feel free to tweak the parameters of the virtual machine to optimize that.
  2. You can't use any 3rd-party library for the task, even a simple one. That means no npm, pip, conan, or the like. The reasoning is that the task is small enough to be done with what the language itself offers, and that others can easily understand what's happening under the hood in the vanilla version of the language, without any extra knowledge of frameworks or libraries.
  3. The file we will use to benchmark our programs is the free book The Origin of Species by Means of Natural Selection by Charles Darwin. The file is already included in the repo, so there is no need to download it separately. Its size is 1270822 bytes, containing 212199 words and 157858 triplets. The output of my solution for this file is:
    of the same - 320
    the same species - 130
    conditions of life - 125
    
  4. Do not make any assumptions regarding the content or size of the file.

How to compete

  1. Fork this repo.
  2. Sign in to Travis CI using your GitHub credentials. Enable the triplet_challenge project in the Travis CI repositories list so that it starts tracking the project and building new commits. If you don't see your triplet_challenge project there yet, try the Sync account button on the top left.
  3. Create a new branch for your work, e.g. git checkout -b java. Do not commit to the master branch, so that you can easily fetch updates from the original repo, rebase, etc. Also, this makes it easier for others to add your remote to their existing repo and check out your work.
  4. Implement your solution in the language of your choice. Keep unit tests and everything else in a separate directory, since the src directory is the one used (by default) to count the lines of code of your solution. Please ensure you license all your code under the GPLv3 license and give credit if you are modifying an existing solution. One of this exercise's goals is that people can learn from others' code and improve it, without simply taking advantage of it.
  5. Make sure you close the file pg2009.txt only at the very end of your application, since this is used to benchmark your algorithm.
  6. Test your solution against the pg2009.txt file, write unit tests if needed, etc.
  7. Once you are happy with your solution, modify the .travis.yml file that Travis CI will use to build and benchmark your solution. Take into account that all our solutions will use the same hardware running Ubuntu 16.04 on a 2-core Intel(R) Xeon(R) CPU @ 2.30GHz with 7.5GB of RAM. You may need a few attempts to get the Travis CI configuration right if you have never used it before. I'd suggest using an Ubuntu 16.04 Docker container to test which packages need to be installed.
  8. Push your new branch to your GitHub fork.
  9. Check the build on https://travis-ci.org/[user]/triplet_challenge/builds. Pay attention to the log to see whether it succeeded or not and check out the benchmark results.
  10. If you want to participate in the public ladder, create a PR to the original triplet_challenge repo or open an issue with all the details needed to add a new row with your results.
  11. Feel free to have a different solution for each category in the ladder. The easiest way is to create a new branch to compete in each category.

Benchmark on Travis CI

The script part of the Travis CI configuration is the one that compiles your program. This is the part you should modify, along with the addons:apt:packages that you need to install on Ubuntu. The after_success part should never be modified, since it ensures we all execute the benchmark under the same conditions. This part does the following:

  1. Report the CPU info, in case it ever changes.
  2. Count the number of lines of code in the src directory using cloc.
  3. Benchmark the speed using hyperfine. We run the application 3 times before the benchmark to ensure the IO is cached (warm-up runs) and then compute the average of the next 30 runs. In this step we inject the triplet_challenge_stats library to hijack the open calls, so that we measure the time your solution takes without counting the time it takes to boot the VM, initialize the runtime, etc. A previous implementation stopped the timer in the close function, but now the timer stops when the shared library is unloaded, to make sure we all play by the same rules: e.g. Rust and its std::fs::read_to_string opens the file, reads its contents and closes it right away.
  4. Benchmark the memory usage using GNU time. We consider the maximum resident set size as the value to compare against. As a side note, Valgrind Massif has proven not to work well (or even at all) with a wide variety of languages that use a VM, such as NodeJS, C#, and Python. Even with languages with no VM and a small runtime, such as Rust, it doesn't seem to work properly.
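
The source of the injected library is not reproduced here, but the mechanism is standard symbol interposition. Below is a hypothetical sketch of the idea, assuming a glibc-based Linux system and LD_PRELOAD injection (the actual triplet_challenge_stats library may differ in its details): a shared library overrides open, starts a timer on the first call, forwards to the real libc open via dlsym(RTLD_NEXT, ...), and reports the elapsed time from a destructor that runs when the library is unloaded, i.e. when the measured program exits.

```cpp
// Hypothetical sketch of the interposition idea, not the actual
// triplet_challenge_stats code. Example build/run with assumed file names:
//   g++ -std=c++17 -shared -fPIC stats.cpp -o libstats.so -ldl
//   LD_PRELOAD=./libstats.so ./triplet_challenge pg2009.txt
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1  // needed for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <fcntl.h>
#include <sys/types.h>

#include <chrono>
#include <cstdarg>
#include <cstdio>

namespace {
std::chrono::steady_clock::time_point g_start;
bool g_started = false;

// Runs when the shared library is unloaded, i.e. when the measured program
// exits. This is where the timer effectively stops.
__attribute__((destructor)) void report() {
    if (!g_started) {
        return;
    }
    const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - g_start);
    std::fprintf(stderr, "elapsed: %lld ms\n",
                 static_cast<long long>(elapsed.count()));
}
}  // namespace

// Hijacked open(): start the timer on the first call, then forward to libc.
extern "C" int open(const char* path, int flags, ...) {
    if (!g_started) {
        g_start = std::chrono::steady_clock::now();
        g_started = true;
    }

    using open_fn = int (*)(const char*, int, ...);
    static open_fn real_open =
        reinterpret_cast<open_fn>(dlsym(RTLD_NEXT, "open"));

    mode_t mode = 0;
    if (flags & O_CREAT) {  // only then does open() take a third argument
        va_list args;
        va_start(args, flags);
        mode = static_cast<mode_t>(va_arg(args, int));
        va_end(args);
    }
    return real_open(path, flags, static_cast<int>(mode));
}
```

A real interposer would likely also cover variants such as open64, openat, or fopen; the sketch only shows why the measured window can start at the first open call and end when the shared library is unloaded.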

Ladder

There are 3 main areas I think are interesting to compare: speed, memory, and lines of code (without comments):

Speed

| #  | User       | Language      | Algorithm/Process Time | Memory      | LOC | Source | Build |
|----|------------|---------------|------------------------|-------------|-----|--------|-------|
| 1  | pausan     | C             | 6.65/7.6 ms            | 4.91 MiB    | 650 | Repo   | Build |
| 2  | caragones  | C++           | 21.5/26.8 ms           | 7.79 MiB    | 377 | Repo   | Build |
| 3  | pausan     | Rust          | 21.98/23.3 ms          | 8.22 MiB    | 205 | Repo   | Build |
| 4  | pamarcos   | Rust          | 23.32/24.5 ms          | 9.28 MiB    | 141 | Repo   | Build |
| 5  | caragones  | C++           | 31.0/33.8 ms           | 9.86 MiB    | 442 | Repo   | Build |
| 6  | pausan     | Go            | 37.2 ms                | 15.84 MiB   | 148 | Repo   | Build |
| 7  | pamarcos   | C++           | 102.03/104.2 ms        | 24.37 MiB   | 207 | Repo   | Build |
| 8  | pamarcos   | C++           | 118.43/121.2 ms        | 12.81 MiB   | 162 | Repo   | Build |
| 9  | pausan     | Go            | 124.8 ms               | 19.37 MiB   | 84  | Repo   | Build |
| 10 | pausan     | Nim           | 173.9 ms               | 21.64 MiB   | 70  | Repo   | Build |
| 11 | pausan     | Dart (native) | 176.4 ms               | 56.6.64 MiB | 77  | Repo   | Build |
| 12 | pausan     | NodeJS        | 295.8 ms               | 88.772 MiB  | 140 | Repo   | Build |
| 13 | pausan     | NodeJS        | 363.5 ms               | 70.728 MiB  | 78  | Repo   | Build |
| 14 | pausan     | Python        | 469.5 ms               | 42.690 MiB  | 66  | Repo   | Build |
| 15 | javsanchez | C#            | 848.6 ms               | -           | 100 | Repo   | Build |
| 16 | javsanchez | Python        | 1199 ms                | 8.425 MiB   | 47  | Repo   | Build |

Memory

| #  | User       | Language      | Memory      | Algorithm/Process Time | LOC | Source | Build |
|----|------------|---------------|-------------|------------------------|-----|--------|-------|
| 1  | pausan     | C             | 4.91 MiB    | 6.65/7.6 ms            | 650 | Repo   | Build |
| 2  | caragones  | C++           | 7.79 MiB    | 21.5/26.8 ms           | 377 | Repo   | Build |
| 3  | pausan     | Rust          | 8.22 MiB    | 21.98/23.3 ms          | 205 | Repo   | Build |
| 4  | javsanchez | Python        | 8.425 MiB   | 1199 ms                | 47  | Repo   | Build |
| 5  | pamarcos   | Rust          | 9.28 MiB    | 23.32/24.5 ms          | 141 | Repo   | Build |
| 6  | caragones  | C++           | 9.86 MiB    | 31.0/33.8 ms           | 442 | Repo   | Build |
| 7  | pamarcos   | C++           | 12.81 MiB   | 118.43/121.2 ms        | 162 | Repo   | Build |
| 8  | pausan     | Go            | 15.84 MiB   | 37.2 ms                | 148 | Repo   | Build |
| 9  | pausan     | Go            | 19.37 MiB   | 124.8 ms               | 84  | Repo   | Build |
| 10 | pausan     | Nim           | 21.64 MiB   | 173.9 ms               | 70  | Repo   | Build |
| 11 | pamarcos   | C++           | 24.37 MiB   | 102.03/104.2 ms        | 207 | Repo   | Build |
| 12 | pausan     | Python        | 42.690 MiB  | 469.5 ms               | 66  | Repo   | Build |
| 13 | pausan     | Dart (native) | 56.6.64 MiB | 176.4 ms               | 77  | Repo   | Build |
| 14 | pausan     | NodeJS        | 70.728 MiB  | 363.5 ms               | 78  | Repo   | Build |
| 15 | pausan     | NodeJS        | 88.772 MiB  | 295.8 ms               | 140 | Repo   | Build |
| 16 | javsanchez | C#            | -           | 848.6 ms               | 100 | Repo   | Build |

Lines of code

| #  | User       | Language      | LOC | Algorithm/Process Time | Memory      | Source | Build |
|----|------------|---------------|-----|------------------------|-------------|--------|-------|
| 1  | luismedel  | C#            | 11  | N/A                    | N/A         | Repo   | N/A   |
| 2  | javsanchez | Python        | 47  | 1199 ms                | 8.425 MiB   | Repo   | Build |
| 3  | pausan     | Python        | 66  | 469.5 ms               | 42.690 MiB  | Repo   | Build |
| 4  | pausan     | Nim           | 70  | 173.9 ms               | 21.64 MiB   | Repo   | Build |
| 5  | pausan     | Dart (native) | 77  | 176.4 ms               | 56.6.64 MiB | Repo   | Build |
| 6  | pausan     | NodeJS        | 78  | 363.5 ms               | 70.728 MiB  | Repo   | Build |
| 7  | pausan     | Go            | 84  | 124.8 ms               | 19.37 MiB   | Repo   | Build |
| 8  | javsanchez | C#            | 100 | 848.6 ms               | -           | Repo   | Build |
| 9  | pamarcos   | Rust          | 141 | 23.32/24.5 ms          | 9.28 MiB    | Repo   | Build |
| 10 | pausan     | NodeJS        | 140 | 295.8 ms               | 88.772 MiB  | Repo   | Build |
| 11 | pausan     | Go            | 148 | 37.2 ms                | 15.84 MiB   | Repo   | Build |
| 12 | pamarcos   | C++           | 162 | 118.43/121.2 ms        | 12.81 MiB   | Repo   | Build |
| 13 | pausan     | Rust          | 205 | 21.98/23.3 ms          | 8.22 MiB    | Repo   | Build |
| 14 | pamarcos   | C++           | 207 | 102.03/104.2 ms        | 24.37 MiB   | Repo   | Build |
| 15 | caragones  | C++           | 377 | 21.5/26.8 ms           | 7.79 MiB    | Repo   | Build |
| 16 | caragones  | C++           | 442 | 31.0/33.8 ms           | 9.86 MiB    | Repo   | Build |
| 17 | pausan     | C             | 650 | 6.65/7.6 ms            | 4.91 MiB    | Repo   | Build |

License

GPLv3

Copyright (C) 2020 Pablo Marcos Oltra

triplet_challenge is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

triplet_challenge is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with triplet_challenge. If not, see http://www.gnu.org/licenses/.
