Add initial Elixir implementation #44

Merged
merged 15 commits into from
Mar 27, 2016

crbelaus
Contributor

I've added a basic Elixir implementation. Each step is commented to make clear what the script is doing.

It is currently the slowest implementation among all the tested languages :( I'm betting it is caused by Elixir creating and destroying the dictionary in each iteration, but I'm not really sure about it.
I hope someone with more knowledge can help improve this solution.

@juditacs
Owner

No worries, it's good to have a new one :)

Unfortunately Elixir is not available in the main repos

E: Unable to locate package elixir

Could you please fix this in the Dockerfile? Also, there is no need for sudo.

@crbelaus
Contributor Author

Dockerfile updated to install Elixir using the instructions on their website :)

@crbelaus
Contributor Author

The results look like this:

| Rank | Experiment | CPU seconds | User time | Maximum memory |
|---|---|---|---|---|
| 1 | rust/wordcount/wordcount | 4.06 | 3.98 | 331916 |
| 2 | cpp/wc_vector | 5.66 | 5.51 | 247084 |
| 3 | python/wordcount_py2gabor.py | 6.73 | 6.64 | 247900 |
| 4 | go/bin/wordcount | 7.65 | 7.53 | 370876 |
| 5 | cpp/wc_hash_nosync | 9.94 | 7.71 | 331788 |
| 6 | php7.0 php/wordcount.php | 10.24 | 8.06 | 267984 |
| 7 | python/wordcount_py2.py | 11.07 | 10.88 | 561716 |
| 8 | cpp/wc_baseline_hash | 14.25 | 11.91 | 348084 |
| 9 | java -classpath java WordCount | 15.41 | 14.8 | 841320 |
| 10 | java -classpath java WordCountEntries | 15.78 | 15.12 | 834452 |
| 11 | mono csharp/WordCountList.exe | 17.69 | 12.65 | 335900 |
| 12 | perl/wordcount.pl | 20.58 | 20.41 | 452276 |
| 13 | python/wordcount_py3.py | 20.98 | 20.76 | 490012 |
| 14 | cpp/wc_baseline | 21.12 | 18.63 | 361000 |
| 15 | php5.6 php/wordcount.php | 23.65 | 21.14 | 791796 |
| 16 | nodejs javascript/wordcount.js | 25.11 | 25.04 | 381696 |
| 17 | julia julia/wordcount.jl | 28.06 | 27.9 | 514804 |
| 18 | nodejs javascript/wordcount2.js | 33.14 | 29.85 | 360360 |
| 19 | bash/wordcount.sh | 62.62 | 70.42 | 12848 |
| 20 | elixir elixir/wordcount.ex | 151.82 | 154.96 | 1303708 |

|> Enum.reduce(%{}, fn (line, acc) ->
line
|> String.strip
|> String.split(~r/\s/)
Contributor

You should omit `String.strip` and pass `trim: true` to `String.split`; it will probably make it a little bit faster.
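A small sketch of the suggestion, assuming it refers to the `trim: true` option of `String.split/3` (which removes empty strings from the result) as a replacement for the separate strip step. `String.trim/1` stands in for `String.strip/1`, which later Elixir versions deprecate in its favour; the module and function names are hypothetical.

```elixir
defmodule SplitSketch do
  # Variant under review: trim the line first, then split on single
  # whitespace characters. Consecutive spaces still produce "" entries.
  def trim_then_split(line) do
    line
    |> String.trim()
    |> String.split(~r/\s/)
  end

  # Suggested variant (assumed intent): no separate trim step; the :trim
  # option makes String.split/3 drop every empty string from the result.
  def split_trimmed(line) do
    String.split(line, ~r/\s/, trim: true)
  end
end
```

The two are not strictly equivalent: `trim: true` also drops the empty strings produced by consecutive whitespace, which is what a word count wants anyway. `String.split/1` without a pattern behaves similarly for whitespace-separated input.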

@Ilyes512

I have only taken a glance at the solution, but it reminded me of an exercism exercise. You could browse the solutions there to get some ideas. Here is mine: http://exercism.io/submissions/a06a51b9c4f54177a29401d0df74b959

@NobbZ
Contributor

NobbZ commented Mar 26, 2016

It's similar in spirit, but one of the biggest differences is the type and size of the input, so we need some more optimisations here.

I've already pulled it and am doing further inspections.

For now I'm trying to speed up this single-process implementation, since a single process is what the rules require. We can add a multiprocess version later on to show Elixir's strengths.

I don't expect that we will get into the top ten with a single process, though.


The Dict module is now deprecated in favour of the Map module.
We are using pattern matching for the anonymous functions, which
makes the code more readable and yields a small performance improvement.
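A small sketch of both points in this commit message, with hypothetical names: a `Map` accumulator updated through `Map.update/4`, and an anonymous function that pattern-matches its `{word, count}` argument. It illustrates the idea only and is not the code from the PR.

```elixir
defmodule MapSketch do
  # Map.update/4 stores 1 for a word seen for the first time, otherwise it
  # applies the increment function to the existing count.
  def count(words) do
    Enum.reduce(words, %{}, fn word, counts ->
      Map.update(counts, word, 1, &(&1 + 1))
    end)
  end

  # The anonymous function destructures each {word, count} map entry in its
  # head instead of reading fields with elem(pair, 0) and elem(pair, 1).
  def lines(counts) do
    Enum.map(counts, fn {word, count} -> "#{word}\t#{count}" end)
  end
end
```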
@crbelaus
Contributor Author

@NobbZ I've updated the pull request with your suggestions (thanks a lot!). I've also switched from Dict to Map, since the former is deprecated.
I've observed a small performance improvement after using pattern matching for the anonymous functions, but we are still in last place :(

I'm taking a look at the possibility of using Streams to delay the computation until it is really necessary and avoid creating intermediate lists.
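A minimal sketch of the stream-based idea, assuming the input arrives as an enumerable of lines; `StreamSketch` and its functions are hypothetical names. `Stream.flat_map/2` only describes the work, and nothing is computed until `Enum.reduce/3` consumes the stream.

```elixir
defmodule StreamSketch do
  # Lazily maps lines to words; no list of all words is materialised here.
  def words(lines) do
    Stream.flat_map(lines, &String.split(&1, ~r/\s/, trim: true))
  end

  # Enum.reduce/3 forces the stream one word at a time and builds only the
  # final map of counts.
  def count(lines) do
    lines
    |> words()
    |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
  end
end
```

With real input, `IO.stream(:stdio, :line) |> StreamSketch.count()` would read standard input line by line.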

@NobbZ
Contributor

NobbZ commented Mar 26, 2016

I'm still struggling with setting everything up. After realizing that my project folder isn't shared between Docker and the host system but freshly cloned from git, I have now tested your branch once, but running scripts/test_all.sh results in a lot of failing tests for nearly every other language.

@crbelaus
Contributor Author

@NobbZ to test my pull request, I deleted the /wordcount directory in the container. Then, to share the project folder with the container, launch it like this:

docker run -it -v $PWD:/wordcount wordcount bash

We now use streams to process input before creating a dictionary
to count appearances. This allows us to delay computation and reduces
the number of intermediate lists created.
@NobbZ NobbZ mentioned this pull request Mar 26, 2016
@NobbZ
Contributor

NobbZ commented Mar 26, 2016

Have you already done a new benchmark after doing the latest changes? Any recognizable improvements?

@NobbZ
Contributor

NobbZ commented Mar 26, 2016

Oh, and I just realized that you are doing all the work at compile time, which is considered to be much slower. I will look into that issue when I'm home.
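A minimal sketch of the structure this points towards, assuming the current script evaluates its pipeline directly inside the `defmodule` body (code there runs while the module is being compiled). The module name `WordCountScript`, the `run/1` and `main/0` split, and the `word<TAB>count` output format are hypothetical; `Enum.frequencies/1` is used for brevity and requires Elixir 1.10 or later.

```elixir
# Layout being warned about (sketch only): work placed directly in the
# module body is evaluated while the module is being compiled.
#
#   defmodule WordCount do
#     IO.stream(:stdio, :line) |> ...
#   end

defmodule WordCountScript do
  # Counting step: splits each line on whitespace and tallies the words.
  def run(lines) do
    lines
    |> Stream.flat_map(&String.split/1)
    |> Enum.frequencies()
  end

  # Runtime entry point: reads standard input line by line and prints one
  # "word<TAB>count" line per entry (format assumed).
  def main do
    IO.stream(:stdio, :line)
    |> run()
    |> Enum.each(fn {word, count} -> IO.puts("#{word}\t#{count}") end)
  end
end
```

The script would then end with a top-level `WordCountScript.main()` call after the module definition, so the counting happens at runtime.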

@crbelaus
Contributor Author

I've measured a small performance improvement, but nothing too noticeable :(

@NobbZ
Contributor

NobbZ commented Mar 27, 2016

Okay, still the last in the list, but this is the result of my last run:

$ time cat data/huwikisource-latest-pages-meta-current.xml | elixir elixir/wordcount.ex > /dev/null

real    2m0.904s
user    1m57.025s
sys 0m1.200s

Run on a “Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz” with 8 GiB of Memory.

When I started I was at roughly 2m30s of user time. I will extract a patch after I get some sleep (~8 h).

1. Removed the `Stream.reject/2` call and moved its functionality into the
   `Enum.reduce/3` (doing a pattern match).
2. Removed the explicit conversion of the map to a list. Maps already
   implement `Enumerable` and can be fed directly into `Enum.sort/2`.
3. Prepared an IO list and fed it into `IO.puts/1`. This saves us some
   function calls (less interference from the scheduler) and also reduces
   the number of buffer flushes.
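The three changes can be sketched roughly as follows. This is not the actual patch: the module name `PatchSketch`, the sort order (count descending, then word ascending) and the `word<TAB>count` line format are assumptions.

```elixir
defmodule PatchSketch do
  # 1. No separate Stream.reject/2: a multi-clause anonymous function skips
  #    empty strings inside the reduce itself.
  def count(words) do
    Enum.reduce(words, %{}, fn
      "", acc -> acc
      word, acc -> Map.update(acc, word, 1, &(&1 + 1))
    end)
  end

  # 2. The map goes straight into Enum.sort/2 without an explicit
  #    conversion to a list. Order: count descending, then word ascending.
  def sort(counts) do
    Enum.sort(counts, fn {w1, c1}, {w2, c2} ->
      c1 > c2 or (c1 == c2 and w1 <= w2)
    end)
  end

  # 3. One iodata structure for the whole output: lines are separated by
  #    "\n" and no trailing newline is added here.
  def to_iodata(sorted) do
    sorted
    |> Enum.map(fn {word, count} -> [word, ?\t, Integer.to_string(count)] end)
    |> Enum.intersperse(?\n)
  end
end
```

Chained as `words |> PatchSketch.count() |> PatchSketch.sort() |> PatchSketch.to_iodata() |> IO.puts()`, the whole result is written in one call, and `IO.puts/1` appends the final newline itself.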
@NobbZ
Contributor

NobbZ commented Mar 27, 2016

OK, I had already written it, but it seems I forgot to submit it... And I totally messed up my local setup, so I'll try something else…

@NobbZ
Contributor

NobbZ commented Mar 27, 2016

OK, I created a PR against @belaustegui's repo: belaustegui/wordcount#1. Once he has merged it there, it should appear here.

Improved performance of elixir solution a bit
@crbelaus
Contributor Author

This last patch brought a great performance improvement: 30 seconds faster!
Thank you very much @NobbZ

| Rank | Experiment | CPU seconds | User time | Maximum memory |
|---|---|---|---|---|
| 1 | rust/wordcount/wordcount | 4.1 | 4.04 | 331772 |
| 2 | cpp/wc_vector | 5.97 | 5.8 | 248148 |
| 3 | python/wordcount_py2gabor.py | 6.84 | 6.74 | 247692 |
| 4 | go/bin/wordcount | 7.8 | 7.62 | 370864 |
| 5 | php7.0 php/wordcount.php | 10.26 | 7.94 | 267588 |
| 6 | cpp/wc_hash_nosync | 10.88 | 8.3 | 331728 |
| 7 | scala -J-Xmx2g -classpath scala Wordcount | 11.68 | 18.0 | 755312 |
| 8 | python/wordcount_py2.py | 12.39 | 11.37 | 562364 |
| 9 | cpp/wc_baseline_hash | 14.93 | 12.48 | 348076 |
| 10 | java -classpath java WordCount | 16.0 | 15.0 | 839192 |
| 11 | java -classpath java WordCountEntries | 16.4 | 14.99 | 836576 |
| 12 | mono csharp/WordCountList.exe | 18.06 | 12.82 | 339608 |
| 13 | python/wordcount_py3.py | 21.66 | 21.4 | 489732 |
| 14 | cpp/wc_baseline | 21.68 | 19.23 | 360916 |
| 15 | perl/wordcount.pl | 21.86 | 21.65 | 452288 |
| 16 | php5.6 php/wordcount.php | 23.95 | 21.41 | 791572 |
| 17 | nodejs javascript/wordcount.js | 25.51 | 25.4 | 402320 |
| 18 | julia julia/wordcount.jl | 30.56 | 29.11 | 515452 |
| 19 | nodejs javascript/wordcount2.js | 34.05 | 30.62 | 376292 |
| 20 | haskell/WordCount | 37.07 | 36.76 | 1059820 |
| 21 | bash/wordcount.sh | 60.78 | 67.72 | 13044 |
| 22 | elixir elixir/wordcount.ex | 121.66 | 120.83 | 1259244 |

@juditacs
Owner

Please see this issue: #50

I'm looking forward to your contribution, let me know if it's ready to be merged.

@NobbZ
Contributor

NobbZ commented Mar 27, 2016

I'm still fiddling around locally, but currently I can't report any further improvements. The only remaining idea that comes to mind is rewriting the way we currently read the input, but I can't start on that today; after all, it's Easter and family time. So I think it can be merged for now, after @belaustegui has rebased.

Using `IO.puts` caused an empty line to be added at the end of the
output, which broke the tests.
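One way to avoid that extra line, sketched under the assumption that every output line is built with its own trailing `\n`: write the iodata with `IO.write/2`, which adds nothing, instead of `IO.puts/2`, which always appends one more newline. The names and the line format are hypothetical, and the PR's actual fix may differ.

```elixir
defmodule OutputSketch do
  # Each entry carries its own trailing "\n", so the iodata already ends
  # with a newline.
  def to_iodata(sorted) do
    Enum.map(sorted, fn {word, count} -> [word, ?\t, Integer.to_string(count), ?\n] end)
  end

  # IO.write/2 emits the data unchanged; IO.puts/2 would add a second "\n"
  # at the end, i.e. an empty last line.
  def emit(sorted, device \\ :stdio) do
    IO.write(device, to_iodata(sorted))
  end
end
```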
@crbelaus
Contributor Author

I think that it is ready to be merged.
Thank you both!!!

@@ -4,6 +4,10 @@ cd cpp
g++ wordcount_baseline.cpp -std=c++11 -o wordcount_baseline -O3
g++ wordcount.cpp -std=c++11 -o wordcount -O3

cd ../elixir
# Elixir has to run the script to compile it (http://stackoverflow.com/questions/35722248/the-command-elixirc-is-compiling-and-executing-the-code)
echo "wadus" | elixir elixir/wordcount.ex > /dev/null
Owner

Is this necessary?

Contributor

Nope, in fact it's not, because no binary or anything else is created.

The way it is used now is very bad metaprogramming voodoo and does not need any precompilation.

So these 3 lines can be removed for now.

While experimenting locally, I already moved this unidiomatic use of the metaprogramming capabilities into a proper Mix project, which also creates a proper executable (this doesn't give any performance improvement at all!), but it is much cleaner.

But I will only continue with that version once this has been merged and I have figured out how to control IO streams manually in a more efficient way ;) The standard library does a good job of keeping manual flushing away from me ;)

Owner

Can you remove it before merging?

Contributor

I think it's easier if @belaustegui removes it himself; creating another PR on his fork or providing a patch file here might produce a lot of unnecessary noise. Also, he still seems to be actively following this issue.

@juditacs
Owner

Tests pass, will merge soon.

I know you'll hate me because of this, but I added a helper file for the evaluator script: https://github.com/juditacs/wordcount#adding-a-new-program

This way the contributors are listed for each file (git blame needs to know the source file).

Could you please add a line for elixir?

@NobbZ
Contributor

NobbZ commented Mar 27, 2016

Since there is currently no binary created, is that line still needed?

@juditacs
Owner

Good idea, I'll fix it. No, don't add anything. Can I merge?

@crbelaus
Contributor Author

Of course. Thanks!

@juditacs
Owner

I'll merge this and we can remove those lines later.

@juditacs juditacs merged commit 7ad19d7 into juditacs:master Mar 27, 2016
4 participants