Add initial Elixir implementation #44

crbelaus · 2016-03-25T11:40:30Z

I've added a basic Elixir implementation. The steps are explained to make clear what the script is doing in each step.

It is currently the slowest implementation across all the tested languages :( I'm betting it is caused by Elixir creating and destroying the dictionary in each iteration, but I'm not really sure about it.
Hope someone with more knowledge can help improve this solution.

juditacs · 2016-03-25T12:26:58Z

No worries, it's good to have a new one :)

Unfortunately Elixir is not available in the main repos

E: Unable to locate package elixir

Could you please fix this in the Dockerfile? Also, there is no need for sudo.

crbelaus · 2016-03-25T14:49:00Z

Dockerfile updated to install Elixir using the instructions on their website :)

crbelaus · 2016-03-25T15:21:53Z

The results are looking like this.

Rank	Experiment	CPU seconds	User time	Maximum memory
1	rust/wordcount/wordcount	4.06	3.98	331916
2	cpp/wc_vector	5.66	5.51	247084
3	python/wordcount_py2gabor.py	6.73	6.64	247900
4	go/bin/wordcount	7.65	7.53	370876
5	cpp/wc_hash_nosync	9.94	7.71	331788
6	php7.0 php/wordcount.php	10.24	8.06	267984
7	python/wordcount_py2.py	11.07	10.88	561716
8	cpp/wc_baseline_hash	14.25	11.91	348084
9	java -classpath java WordCount	15.41	14.8	841320
10	java -classpath java WordCountEntries	15.78	15.12	834452
11	mono csharp/WordCountList.exe	17.69	12.65	335900
12	perl/wordcount.pl	20.58	20.41	452276
13	python/wordcount_py3.py	20.98	20.76	490012
14	cpp/wc_baseline	21.12	18.63	361000
15	php5.6 php/wordcount.php	23.65	21.14	791796
16	nodejs javascript/wordcount.js	25.11	25.04	381696
17	julia julia/wordcount.jl	28.06	27.9	514804
18	nodejs javascript/wordcount2.js	33.14	29.85	360360
19	bash/wordcount.sh	62.62	70.42	12848
20	elixir elixir/wordcount.ex	151.82	154.96	1303708

NobbZ · 2016-03-26T01:55:48Z

elixir/wordcount.ex

+|> Enum.reduce(%{}, fn (line, acc) ->
+  line
+  |> String.strip
+  |> String.split(~r/\s/)


You should omit String.strip and pass trim: true to String.split; that will probably make it a little bit faster.
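
A minimal sketch of dropping the separate strip step, assuming the option meant here is `trim: true` of `String.split/3`. The sample line is made up for illustration and is not taken from the benchmark data.

```elixir
# Hypothetical input line with leading, trailing, and repeated whitespace.
line = "  foo  bar baz \n"

# trim: true removes every empty string from the result, so the line
# does not have to be stripped before it is split.
words = String.split(line, ~r/\s/, trim: true)

IO.inspect(words)
```

Because empty strings are dropped by the split itself, a later step that filters out empty words may also become unnecessary.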

Ilyes512 · 2016-03-26T09:53:17Z

I have only taken a glance at the solution, but it reminded me of an Exercism exercise. You could browse the solutions there to come up with some ideas. Here is mine: http://exercism.io/submissions/a06a51b9c4f54177a29401d0df74b959

NobbZ · 2016-03-26T10:01:00Z

It's similar in spirit, but one of the biggest differences is the type and size of the input, so we need some more optimisations here.

I've already pulled the branch and am inspecting it further.

For now I'm trying to speed up this single-process implementation, since a single process is what the rules ask for. We can add a multi-process version later on to show Elixir's strengths.

I don't expect that we will get into the top ten with a single process, though.

The Dict module is now deprecated in favour of the Map module.

We are using pattern matching for the anonymous functions, which makes the code more readable and yields a small performance improvement.

crbelaus · 2016-03-26T11:00:09Z

@NobbZ I've updated the pull request with your suggestions (thank you big time!). I've also started to use Map instead of Dict, since the latter is deprecated.
I've observed a small performance improvement after using pattern matching for the anonymous functions, but we are still in last place :(

I'm taking a look at the possibility of using Streams to delay the computation until it is really necessary and avoid creating intermediate lists.

NobbZ · 2016-03-26T11:14:10Z

I'm still struggling with setting everything up. After realizing that my project folder isn't shared between Docker and the host system but is freshly cloned from git, I have now tested your branch once, but running scripts/test_all.sh results in a lot of failing tests for nearly every other language.

crbelaus · 2016-03-26T11:16:42Z

@NobbZ to test my pull request I deleted the /wordcount directory in the container. Then, to share the project folder with the container, launch it like this:

docker run -it -v $PWD:/wordcount wordcount bash

We now use streams to process the input before creating a dictionary to count appearances. This lets us delay computation and reduces the number of intermediate lists created.
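
The idea can be sketched roughly as follows. This is not the script from the pull request: the `lines` list is a hypothetical stand-in for the standard-input stream, and the word-splitting rule is an assumption.

```elixir
# Hypothetical sample input; a real script would more likely read lines
# lazily, for example with IO.stream(:stdio, :line).
lines = ["the quick fox\n", "the lazy dog\n"]

counts =
  lines
  # Stream.flat_map is lazy: words are produced one line at a time,
  # so no intermediate list of all words is built.
  |> Stream.flat_map(&String.split(&1, ~r/\s/, trim: true))
  # Enum.reduce forces the stream and accumulates the counts in a map.
  |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)

IO.inspect(counts)
```

Only the final `Enum.reduce/3` call pulls data through the pipeline; the `Stream` step before it merely describes the transformation.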

NobbZ · 2016-03-26T16:27:01Z

Have you already run a new benchmark after the latest changes? Any noticeable improvements?

NobbZ · 2016-03-26T16:28:20Z

Oh, and I just realized that you are doing all the work at compile time. This is considered to be much slower. I will look into that issue when I'm home.

crbelaus · 2016-03-26T16:40:26Z

I've measured a small performance improvement, but nothing too noticeable :(

NobbZ · 2016-03-27T01:48:20Z

Okay, still the last in the list, but this is the result of my last run:

$ time cat data/huwikisource-latest-pages-meta-current.xml | elixir elixir/wordcount.ex > /dev/null

real    2m0.904s
user    1m57.025s
sys     0m1.200s

Run on an “Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz” with 8 GiB of memory.

When I started I was at roughly 2m30s user time. I will extract a patch after I've got some sleep (~8 h).

1. Removed the `Stream.reject/2` call and moved its functionality into the `Enum.reduce/3` (via a pattern match).
2. Removed the explicit conversion of the map to a list. Maps are already proper `Enum`s and can be fed into `Enum.sort/2`.
3. Prepared an IO-list and fed it into `IO.puts/1`. This saves us some function calls (less interference by the scheduler) and also reduces the number of buffer flushes.
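
A rough sketch of points 2 and 3, not the actual patch. The sample `counts` map, the tab-separated output format, and the sort order (descending count, then word) are assumptions made for illustration. `IO.write/1` is used here instead of `IO.puts/1` so that no extra newline is appended after the last line.

```elixir
# Hypothetical word counts; in the real script these come from the reduce step.
counts = %{"the" => 2, "fox" => 1, "dog" => 1}

output =
  counts
  # A map can be passed to Enum.sort/2 directly; it is enumerated as
  # {key, value} tuples, so no explicit Map.to_list/1 call is needed.
  |> Enum.sort(fn {w1, c1}, {w2, c2} -> c1 > c2 or (c1 == c2 and w1 <= w2) end)
  # Build an IO-list (nested lists of binaries) instead of concatenating strings.
  |> Enum.map(fn {word, count} -> [word, "\t", Integer.to_string(count), "\n"] end)

# A single write call emits the whole IO-list at once.
IO.write(output)
```

The IO-list is never flattened into one large binary by the script itself; the IO layer accepts the nested structure as it is.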

NobbZ · 2016-03-27T12:10:06Z

OK, I already wrote it, but it seems as if I forgot to submit... And I totally screwed my local stuff, I'll try something else…

NobbZ · 2016-03-27T12:14:02Z

OK, I created a PR to @belaustegui's repo: belaustegui/wordcount#1. After he has merged it over there, it should appear here.

Improved performance of elixir solution a bit

crbelaus · 2016-03-27T12:49:32Z

This last patch has made a great performance improvement. 30 seconds faster!!!!
Thank you very much @NobbZ

Rank	Experiment	CPU seconds	User time	Maximum memory
1	rust/wordcount/wordcount	4.1	4.04	331772
2	cpp/wc_vector	5.97	5.8	248148
3	python/wordcount_py2gabor.py	6.84	6.74	247692
4	go/bin/wordcount	7.8	7.62	370864
5	php7.0 php/wordcount.php	10.26	7.94	267588
6	cpp/wc_hash_nosync	10.88	8.3	331728
7	scala -J-Xmx2g -classpath scala Wordcount	11.68	18.0	755312
8	python/wordcount_py2.py	12.39	11.37	562364
9	cpp/wc_baseline_hash	14.93	12.48	348076
10	java -classpath java WordCount	16.0	15.0	839192
11	java -classpath java WordCountEntries	16.4	14.99	836576
12	mono csharp/WordCountList.exe	18.06	12.82	339608
13	python/wordcount_py3.py	21.66	21.4	489732
14	cpp/wc_baseline	21.68	19.23	360916
15	perl/wordcount.pl	21.86	21.65	452288
16	php5.6 php/wordcount.php	23.95	21.41	791572
17	nodejs javascript/wordcount.js	25.51	25.4	402320
18	julia julia/wordcount.jl	30.56	29.11	515452
19	nodejs javascript/wordcount2.js	34.05	30.62	376292
20	haskell/WordCount	37.07	36.76	1059820
21	bash/wordcount.sh	60.78	67.72	13044
22	elixir elixir/wordcount.ex	121.66	120.83	1259244

juditacs · 2016-03-27T12:52:31Z

Please see this issue: #50

I'm looking forward to your contribution, let me know if it's ready to be merged.

NobbZ · 2016-03-27T15:29:23Z

I'm still fiddling around locally, but currently I can't report any further improvements. The very last thing that comes to my mind is a rewrite of the way we currently read the input, but I can't start that today; after all, it's Easter and time for family. So I think it can get merged for now, once @belaustegui has rebased.

Conflicts: Dockerfile

Using `IO.puts` caused an empty line to be added at the end of the output, which broke the tests.

crbelaus · 2016-03-27T16:07:17Z

I think that it is ready to be merged.
Thank you both!!!

juditacs · 2016-03-27T16:09:31Z

scripts/build.sh

@@ -4,6 +4,10 @@ cd cpp
 g++ wordcount_baseline.cpp -std=c++11 -o wordcount_baseline -O3
 g++ wordcount.cpp -std=c++11 -o wordcount -O3

+cd ../elixir
+# Elixir has to run the script to compile it (http://stackoverflow.com/questions/35722248/the-command-elixirc-is-compiling-and-executing-the-code)
+echo "wadus" | elixir elixir/wordcount.ex > /dev/null


Is this necessary?

Nope, in fact it's not, because no binary or anything similar is created.

The way it is used now is very bad meta-programming voodoo and does not need any precompilation.

So these 3 lines can be removed for now.

But while experimenting locally I have already transferred this unidiomatic use of the meta-programming capabilities into a proper mix project, which also creates a proper executable (this doesn't give any performance improvement at all!). But it is much cleaner.

But I will only continue with that version of mine once this has been merged and I have figured out how to manually control IO streams in a more efficient way ;) The standard library does a good job of keeping manual flushing away from me ;)

Can you remove it before merging?

I think it's easier if @belaustegui removes it himself; creating another PR on his fork or providing a patch file here might produce a lot of unnecessary noise. Also, he still seems to be actively following this issue.

juditacs · 2016-03-27T16:13:11Z

Tests pass, will merge soon.

I know you'll hate me because of this, but I added a helper file for the evaluator script: https://github.com/juditacs/wordcount#adding-a-new-program

This way the contributors are listed for each file (git blame needs to know the source file).

Could you please add a line for elixir?

NobbZ · 2016-03-27T16:24:17Z

Since there is currently no binary created, is that line still needed?

juditacs · 2016-03-27T16:25:42Z

Good idea, I'll fix it. No, don't add anything. Can I merge?

crbelaus · 2016-03-27T17:22:35Z

Of course. Thanks!

juditacs · 2016-03-27T19:05:24Z

I'll merge this and we can remove those lines later.

crbelaus added 6 commits March 25, 2016 12:04

Add script for wordcount in Elixir

ca018ec

Fix elixir script to pass all tests

92147a1

Add required infrastructure for Elixir script

3d21e6b

Merge branch 'elixir-script'

a3740ef

Fix rust compilation

7314bd1

Don't change to base dir after compilation

d07654c

Fix elixir installation

491cece

NobbZ reviewed Mar 26, 2016

crbelaus added 3 commits March 26, 2016 11:32

Remove unnecesary String.strip call

2f7ea9f

Substitute Dict usages with Map

66e92bb

The Dict module is now deprecated in favour of the Map module.

Use double headed annonymous functions

dae556d

We are using pattern matching for the anonymous functions, which makes the code more readable and yields a small performance improvement.

Use streams to process input

3982dd4

We now use streams to process the input before creating a dictionary to count appearances. This lets us delay computation and reduces the number of intermediate lists created.

NobbZ mentioned this pull request Mar 26, 2016

Reorganize setup #47

Open

Merge pull request #1 from NobbZ/elixir

8aeb641

Improved performance of elixir solution a bit

crbelaus added 2 commits March 27, 2016 18:03

Merge remote-tracking branch 'upstream/master'

878a36f

Conflicts: Dockerfile

Fix Elixir tests

abeee02

Using `IO.puts` caused an empty line to be added at the end of the output, which broke the tests.

juditacs reviewed Mar 27, 2016

juditacs merged commit 7ad19d7 into juditacs:master Mar 27, 2016
