Dlang solution #16

Open
wants to merge 7 commits into base: fastest-implementations-print-or-count

Conversation

renatoathaydes
Owner

Adding a D solution to the print-or-count version of the problem.

@cyrusmsk

Can you test this solution please?

import std.stdio;
import std.outbuffer;
import std.ascii;
import std.algorithm;
import std.range;
import std.array;
import std.container.array;
import std.conv;

alias Dictionary = string[][ubyte[]];

ubyte char_to_digit(char ch)
{
	final switch (ch.toLower)
	{
	case 'e':
		return 0;
	case 'j', 'n', 'q':
		return 1;
	case 'r', 'w', 'x':
		return 2;
	case 'd', 's', 'y':
		return 3;
	case 'f', 't':
		return 4;
	case 'a', 'm':
		return 5;
	case 'c', 'i', 'v':
		return 6;
	case 'b', 'k', 'u':
		return 7;
	case 'l', 'o', 'p':
		return 8;
	case 'g', 'h', 'z':
		return 9;
	}
}

immutable(ubyte[]) word_to_number(string word)
{
	auto res = word
		.filter!(a => a.isAlpha)
		.map!(a => char_to_digit(cast(char)a))
		.map!(a => cast(ubyte)(a + '0'))
		.array;
	return cast(immutable(ubyte[])) res;
}

Dictionary load_dict(string words_file)
{
	Dictionary dict;
	foreach (word; File(words_file).byLineCopy)
	{
		auto key = word_to_number(word);
		dict[key] ~= word;
	}
	return dict;
}

bool lastItemIsDigit(Array!(string) words)
{
	if (words.empty)
		return false;
	const back = words.back();
	return back.length == 1 && back[0].isDigit;
}

void find_translation(
	string res_opt,
	ref size_t counter,
	char[] num,
	immutable(ubyte[]) digits,
	Array!(string) words,
	Dictionary dict,
	OutBuffer writer)
{
	if (digits.length == 0)
	{
		handle_solution(res_opt, counter, num, words, writer);
		return;
	}
	bool found_word = false;
	foreach (i; 0 .. digits.length)
	{
		auto key = digits[0 .. i + 1];
		auto rest_of_digits = digits[i + 1 .. $];
		auto found_words = key in dict;
		if (found_words !is null)
		{
			found_word = true;
			foreach (word; *found_words)
			{
				words.insertBack(word);
				find_translation(res_opt, counter, num, rest_of_digits, words, dict, writer);
				words.removeBack();
			}
		}
	}
	if (found_word)
		return;
	auto last_is_digit = words.lastItemIsDigit;
	if (!last_is_digit)
	{
		auto digit = [(digits[0] - '0').to!string];
		words.insertBack(digit);
		find_translation(res_opt, counter, num, digits[1 .. $], words, dict, writer);
		words.removeBack();
	}
}

void handle_solution(string res_opt, ref size_t counter, char[] num, Array!(string) words, OutBuffer writer)
{
	if (res_opt == "count")
	{
		counter++;
		return;
	}
	writer.write(num);
	if (words.empty)
	{
		writeln(writer.toString());
		writer.clear();
		return;
	}
	foreach (word; words)
	{
		writer.writef(" %s", word);
	}
	writeln(writer.toString());
	writer.clear();
}

void main(string[] args)
{
	auto res_opt = args.length > 1 ? args[1] : "print";
	auto words_file = args.length > 2 ? args[2] : "tests/words.txt";
	auto input_file = args.length > 3 ? args[3] : "tests/numbers.txt";

	size_t counter;
	auto dict = load_dict(words_file);

	auto writer = new OutBuffer();
	writer.reserve(4096);

	Array!(string) words;
	words.reserve(128);

	foreach (num; File(input_file).byLine)
	{
		immutable(ubyte[]) digits = num
			.filter!(a => a.isDigit)
			.map!(a => cast(ubyte) a)
			.array;
		find_translation(res_opt, counter, num, digits, words, dict, writer);
	}
	if (res_opt == "count")
		writeln(counter);
}

@renatoathaydes
Owner Author

Hi @cyrusmsk. There is an error in your solution: it does not pass the tests in ./benchmark.sh.
Please try running the benchmark script, perhaps modifying it to run only your solution. To test it manually, I normally run it at the root of the project without arguments, which prints a few solutions. Compare that against the output of my current solution or the Java/Common Lisp ones. If it is not the same (order does not matter), you have a problem. If it is the same, it could be that some edge case is only being picked up by the benchmark test.

@renatoathaydes
Owner Author

It took me embarrassingly long to notice why it was wrong: the solutions were missing the colon between the number and the words :D.

I've run your solution but had to abort it as it was taking too long... here are the first few results:

./dencoder-cyrusmsk,232050688,250
./dencoder-cyrusmsk,232050688,3779
./dencoder-cyrusmsk,232050688,21931
./dencoder-cyrusmsk,232050688,47906

I am profiling it right now to see why it's slow, but I am guessing your code has a few tiny things that should be easy to fix - like allocating a string for each replacement digit, using a string instead of an enum for res_opt, and flushing the buffer for every solution...
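For illustration, a minimal sketch (my own, not code from the PR) of two of those fixes: a shared table of digit strings so nothing is allocated per replacement digit, and an enum so res_opt is not compared as a string for every solution:

enum OutputOption { print, count }

// one shared immutable string per digit, instead of a fresh allocation each time
static immutable string[10] digitStrings =
	["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"];

// hypothetical usage inside find_translation:
//     words.insertBack(digitStrings[digits[0] - '0']);
// and inside handle_solution:
//     if (res_opt == OutputOption.count) { counter++; return; }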

@renatoathaydes
Owner Author

@cyrusmsk

Callgrind for your solution after 1min running (using the count option):

--------------------------------------------------------------------------------
10,550,536,299 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                      file:function
--------------------------------------------------------------------------------
5,847,563,560 (55.42%)  ???:_d_isbaseof2 [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-cyrusmsk]
1,284,129,198 (12.17%)  ???:_d_dynamic_cast [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-cyrusmsk]
  369,510,423 ( 3.50%)  ???:rt.aaA.Impl.findSlotLookup(ulong, scope const(void*), scope const(TypeInfo)) inout [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-cyrusmsk]
  360,651,828 ( 3.42%)  ???:dencoder.find_translation(immutable(char)[], ref ulong, char[], immutable(ubyte[]), std.container.array.Array!(immutable(char)[]).Array, immutable(char)[][][const(ubyte)[]], std.outbuffer.OutBuffer)'2 [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-cyrusmsk]
  341,701,856 ( 3.24%)  ./string/../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:__memcmp_avx2_movbe [/usr/lib/x86_64-linux-gnu/libc.so.6]
  260,570,661 ( 2.47%)  ???:core.internal.hash.bytesHash!(false).bytesHash(scope const(ubyte)[], ulong) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-cyrusmsk]
  237,178,288 ( 2.25%)  ???:_aaInX [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-cyrusmsk]
  222,865,398 ( 2.11%)  ???:_D6object10getElementFNaNbNeNkMNgC8TypeInfoZNgQn [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-cyrusmsk]
  212,252,760 ( 2.01%)  ???:object.getArrayHash(scope const(TypeInfo), scope const(void*), const(ulong)).hasCustomToHash(scope const(TypeInfo)) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-cyrusmsk]

Nearly all the time is spent on _d_isbaseof2 and _d_dynamic_cast?! Not sure why... comparing the bytes' hashes also takes considerable time, but that's expected. Trying to improve on it. Let me know if you want to try something.

FYI: the solution as posted (plus printing the missing :) is around 5x slower than the fastest D solution so far:

./dencoder-gdc count words-quarter.txt phones_200.txt   5.69s  user 0.01s system 100% cpu 5.702 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                11 MB
page faults from disk:     0
other page faults:         2485
(gdc-4.8.5)➜  prechelt-phone-number-encoding git:(dlang-int128) ✗ time ./dencoder-cyrusmsk count words-quarter.txt phones_200.txt 
12739646
./dencoder-cyrusmsk count words-quarter.txt phones_200.txt   27.98s  user 0.01s system 99% cpu 28.021 total
avg shared (code):         0 KB
avg unshared (data/stack): 0 KB
total (sum):               0 KB
max memory:                11 MB
page faults from disk:     0
other page faults:         2483

I am compiling with GDC now because the code runs a bit faster than with LDC.

@cyrusmsk

cyrusmsk commented Jan 15, 2024

Yes, I forgot to add a small improvement.

import std.stdio;
import std.outbuffer;
import std.ascii;
import std.algorithm;
import std.range;
import std.array;
import std.container.array;
import std.conv;

alias Dictionary = string[][ubyte[]];

ubyte char_to_digit(char ch)
{
	final switch (ch.toLower)
	{
	case 'e':
		return 0;
	case 'j', 'n', 'q':
		return 1;
	case 'r', 'w', 'x':
		return 2;
	case 'd', 's', 'y':
		return 3;
	case 'f', 't':
		return 4;
	case 'a', 'm':
		return 5;
	case 'c', 'i', 'v':
		return 6;
	case 'b', 'k', 'u':
		return 7;
	case 'l', 'o', 'p':
		return 8;
	case 'g', 'h', 'z':
		return 9;
	}
}

immutable(ubyte[]) word_to_number(string word)
{
	auto res = word
		.filter!(a => a.isAlpha)
		.map!(a => char_to_digit(cast(char)a))
		.map!(a => cast(ubyte)(a + '0'))
		.array;
	return cast(immutable(ubyte[])) res;
}

Dictionary load_dict(string words_file)
{
	Dictionary dict;
	foreach (word; File(words_file).byLineCopy)
	{
		auto key = word_to_number(word);
		dict[key] ~= word;
	}
	return dict;
}

bool lastItemIsDigit(Array!(string) words)
{
	if (words.empty)
		return false;
	const back = words.back();
	return back.length == 1 && back[0].isDigit;
}

void find_translation(
	string res_opt,
	ref size_t counter,
	char[] num,
	immutable(ubyte[]) digits,
	ref Array!(string) words,
	ref Dictionary dict,
	ref OutBuffer writer)
{
	if (digits.length == 0)
	{
		handle_solution(res_opt, counter, num, words, writer);
		return;
	}
	bool found_word = false;
	foreach (i; 0 .. digits.length)
	{
		auto key = digits[0 .. i + 1];
		auto rest_of_digits = digits[i + 1 .. $];
		auto found_words = key in dict;
		if (found_words !is null)
		{
			found_word = true;
			foreach (word; *found_words)
			{
				words.insertBack(word);
				find_translation(res_opt, counter, num, rest_of_digits, words, dict, writer);
				words.removeBack();
			}
		}
	}
	if (found_word)
		return;
	auto last_is_digit = words.lastItemIsDigit;
	if (!last_is_digit)
	{
		auto digit = [(digits[0] - '0').to!string];
		words.insertBack(digit);
		find_translation(res_opt, counter, num, digits[1 .. $], words, dict, writer);
		words.removeBack();
	}
}

void handle_solution(string res_opt, ref size_t counter, char[] num, ref Array!(string) words, ref OutBuffer writer)
{
	if (res_opt == "count")
	{
		counter++;
		return;
	}
	writer.writef("%s:",num);
	if (words.empty)
	{
		writeln(writer.toString());
		writer.clear();
		return;
	}
	foreach (word; words)
	{
		writer.writef(" %s", word);
	}
	writeln(writer.toString());
	writer.clear();
}

void main(string[] args)
{
	auto res_opt = args.length > 1 ? args[1] : "print";
	auto words_file = args.length > 2 ? args[2] : "tests/words.txt";
	auto input_file = args.length > 3 ? args[3] : "tests/numbers.txt";

	size_t counter;
	auto dict = load_dict(words_file);

	auto writer = new OutBuffer();

	Array!(string) words;

	foreach (num; File(input_file).byLine)
	{
		immutable(ubyte[]) digits = num
			.filter!(a => a.isDigit)
			.map!(a => cast(ubyte) a)
			.array;
		find_translation(res_opt, counter, num, digits, words, dict, writer);
	}
	if (res_opt == "count")
		writeln(counter);
}

This code uses : after the number. The main difference from the previous version:

  • In the functions I've added "ref" to the heavy data structures, to let the compiler use the data more efficiently.

I also tried changing Array!string to string[] and got the same speed. The previous version took about 20s, so this one should be around 11s on my machine.

@cyrusmsk

cyrusmsk commented Jan 15, 2024

word_to_number is using a lot of casts, which is unnecessary. It should be possible to rewrite this function without them. The issue was that the filter/map pipeline produces dchar instead of char (string auto-decoding) and the AA requires an immutable key. But it is probably possible to simplify this function.
Also, the .array calls allocate memory. In theory the algorithm could be modified to work on ranges instead of allocated arrays.
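As an illustration only (my sketch, not code from this PR), a largely cast-free variant: iterating with foreach (char c; word) walks the string's code units directly, so no auto-decoding to dchar happens and only the integer-promotion cast remains. It reuses char_to_digit from the code above.

import std.ascii : isAlpha;

immutable(ubyte)[] word_to_number(string word)
{
	ubyte[] res;
	res.reserve(word.length);
	foreach (char c; word) // code units, no dchar auto-decoding
		if (c.isAlpha)
			res ~= cast(ubyte)(char_to_digit(c) + '0');
	return res.idup; // immutable copy, usable as the AA key
}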

@renatoathaydes
Owner Author

Because the code was spending a lot of time on hashing, I tried to make sure the hash was pre-computed by using a Key struct for the keys.
Ended up with this modification of your code:

import std.stdio;
import std.outbuffer;
import std.ascii;
import std.algorithm;
import std.range;
import std.array;
import std.container.array;
import std.conv;

struct Key {
    const ubyte[] value;
    const ulong hash;
    
    this(const ubyte[] value) {
        this.value = value;
        this.hash = hashOf(value);
    }
    
    bool opEquals()(auto ref const Key other) const {
        return this.value == other.value;
    }
    
    ulong toHash() nothrow @safe {
        return hash;
    }
}

alias Dictionary = string[][immutable(Key)];

ubyte char_to_digit(char ch)
{
	final switch (ch.toLower)
	{
	case 'e':
		return 0;
	case 'j', 'n', 'q':
		return 1;
	case 'r', 'w', 'x':
		return 2;
	case 'd', 's', 'y':
		return 3;
	case 'f', 't':
		return 4;
	case 'a', 'm':
		return 5;
	case 'c', 'i', 'v':
		return 6;
	case 'b', 'k', 'u':
		return 7;
	case 'l', 'o', 'p':
		return 8;
	case 'g', 'h', 'z':
		return 9;
	}
}

immutable(Key) word_to_number(string word)
{
	ubyte[] res = word
		.filter!(a => a.isAlpha)
		.map!(a => char_to_digit(cast(char)a))
		.map!(a => cast(ubyte)(a + '0'))
		.array;
	return cast(immutable(Key)) Key(res);
}

Dictionary load_dict(string words_file)
{
	Dictionary dict;
	foreach (word; File(words_file).byLineCopy)
	{
		dict[word_to_number(word)] ~= word;
	}
	return dict;
}

bool lastItemIsDigit(Array!(string) words)
{
	if (words.empty)
		return false;
	const back = words.back();
	return back.length == 1 && back[0].isDigit;
}

void find_translation(
	OutputOption res_opt,
	ref size_t counter,
	char[] num,
	immutable(ubyte[]) digits,
	Array!(string) words,
	Dictionary dict,
	OutBuffer writer)
{
	if (digits.length == 0)
	{
		handle_solution(res_opt, counter, num, words, writer);
		return;
	}
	bool found_word = false;
	foreach (i; 0 .. digits.length)
	{
		auto key = digits[0 .. i + 1];
		auto k = Key(key);
		auto found_words = (cast(immutable(Key))k) in dict;
		if (found_words !is null)
		{
			found_word = true;
			auto rest_of_digits = digits[i + 1 .. $];
			foreach (word; *found_words)
			{
				words.insertBack(word);
				find_translation(res_opt, counter, num, rest_of_digits, words, dict, writer);
				words.removeBack();
			}
		}
	}
	if (found_word)
		return;
	auto last_is_digit = words.lastItemIsDigit;
	if (!last_is_digit)
	{
		auto digit = [(digits[0] - '0').to!string];
		words.insertBack(digit);
		find_translation(res_opt, counter, num, digits[1 .. $], words, dict, writer);
		words.removeBack();
	}
}

enum OutputOption { PRINT, COUNT  }

void handle_solution(OutputOption res_opt, ref size_t counter, char[] num, Array!(string) words, OutBuffer writer)
{
	if (res_opt == OutputOption.COUNT)
	{
		counter++;
		return;
	}
	writer.write(num);
	writer.write(':');
	if (words.empty)
	{
		writeln(writer.toString());
		writer.clear();
		return;
	}
	foreach (word; words)
	{
		writer.writef(" %s", word);
	}
	writeln(writer.toString());
	writer.clear();
}

void main(string[] args)
{
	auto res_opt = args.length > 1 ? args[1] : "print";
	auto words_file = args.length > 2 ? args[2] : "tests/words.txt";
	auto input_file = args.length > 3 ? args[3] : "tests/numbers.txt";
	
	OutputOption opt;
	final switch (res_opt) {
	case "print": opt = OutputOption.PRINT; break;
	case "count": opt = OutputOption.COUNT; break;
	}

	size_t counter;
	auto dict = load_dict(words_file);

	auto writer = new OutBuffer();
	writer.reserve(4096);

	Array!(string) words;
	words.reserve(128);

	foreach (num; File(input_file).byLine)
	{
		immutable(ubyte[]) digits = num
			.filter!(a => a.isDigit)
			.map!(a => cast(ubyte) a)
			.array;
		find_translation(opt, counter, num, digits, words, dict, writer);
	}
	if (opt == OutputOption.COUNT)
		writeln(counter);
}

This made the code a little bit slower :(.

The profiler still blames the same two functions:

5,385,824,251 (51.52%)  ???:_d_isbaseof2 [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
1,183,221,768 (11.32%)  ???:_d_dynamic_cast [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]

But I don't know where these are coming from... any idea?

word_to_number is using a lot of casts, which is unnecessary.

That doesn't matter, because it is only used when loading the dictionary... Nearly all of the time is spent in the find_translation function, not loading the dictionary.

Your current solution (named src/d/dencoder) vs. the fastest D solution (dencoder-gdc):

Proc,Run,Memory(bytes),Time(ms)
===> src/d/dencoder
src/d/dencoder,232058880,239
src/d/dencoder,232058880,3829
src/d/dencoder,232058880,17320
===> ./dencoder-gdc
./dencoder-gdc,221339648,66
./dencoder-gdc,232153088,1021
./dencoder-gdc,232153088,5008

@cyrusmsk

Can you add refs to find_translation and handle_solution arguments please?

@renatoathaydes
Owner Author

@cyrusmsk unless you can make your solution run 5x faster, I'm sorry but it's not fast enough.

@renatoathaydes
Owner Author

I've run the current fastest D solution against Rust again, this time building D with release-nobounds as there's a lot of bounds checking going on...

That makes it a bit better, as did switching to GDC... but overall, it's nowhere near Rust:

Proc,Run,Memory(bytes),Time(ms)
===> ./rust
./rust,25178112,51
./rust,25178112,276
./rust,25178112,1034
./rust,25178112,2126
./rust,11067392,4512
./rust,11067392,33590
===> src/d/dencoder
src/d/dencoder,221179904,77
src/d/dencoder,232042496,857
src/d/dencoder,232042496,3914
src/d/dencoder,232042496,7904
src/d/dencoder,221179904,15470
src/d/dencoder,221179904,99989

My conclusion is that for this task, D is at the same level as Java and Common Lisp (it probably beats both by a small margin), but it is around 3 or 4 times slower than Rust.

@cyrusmsk

cyrusmsk commented Jan 15, 2024

Maybe. Or perhaps the algorithm just needs to be changed for D with a GC.
I saw there are algorithms in the repo implemented in Go; maybe D could do better with that version of the algorithm.

I'm sure D could be as fast as Rust (at least in betterC mode) - the question is just the implementation of the algorithm.

@renatoathaydes
Owner Author

I'm sure D could be as fast as Rust (at least in betterC mode) - the question is just the implementation of the algorithm.

The Rust and D algorithms are as close as they can get. Please show me if you can make it even closer as I can't see how.

I do agree that rewriting the D code in betterC would possibly make it reach Rust's speed, but I am not going to do it because that would require a complete change in how the solution is written... this problem requires associative arrays and growable arrays, both of which are difficult to do in betterC, as well as either a BigInt implementation or int128 (not sure if int128 would even work in betterC?), and most of D's standard library wouldn't help (you would probably need C libraries, defeating the purpose of using D).

@ssvb

ssvb commented Jan 15, 2024

You can also try https://gist.github.com/ssvb/147e7ca4a4fde729fe99e5f1c487aa8d and https://gist.github.com/ssvb/38a14d3d7323ecfaf580e5482f657068

The latter variant supports extreme types of input, such as the input generated by the following Ruby script:

#!/usr/bin/env ruby

evildict_gen = Enumerator.new do |e|
  def evil_helper(e, prefix, todo)
    ["c", "i", "v", "C", "I", "V"].each do |letter|
      if todo > 0
        evil_helper(e, prefix + letter, todo - 1)
      else
        e.yield prefix + letter
      end
    end
  end
  0.upto(50) {|n| evil_helper(e, "", n) }
end

File.write("evildict.txt", evildict_gen.take(75000).sort_by(&:downcase).join("\n") + "\n")
File.write("evilinput.txt", "6" * 50 + "\n")
$ time ./phone_numbers_optimized_dp count evildict.txt evilinput.txt 
318690589760729696504368487120721446255204392100481024

real	0m2.200s
user	0m2.215s
sys	0m0.000s

@cyrusmsk

cyrusmsk commented Jan 15, 2024

On an M1 Mac, I generated with the Ruby script a 75,000-word dictionary and an input of eight '6' digits. Results (seconds):
Renato's fastest code: 1.90
My port of the Rust solution: 3.59
Siarhei's first link: 1.98
Siarhei's second link (with long, and the 1[2] factor): 0.85

@renatoathaydes
Owner Author

The long solution is not acceptable as it overflows even within the original problem's bounds.
If that were OK, Rust and Java would also get several times faster.

@cyrusmsk

The long solution is not acceptable as it overflows even within the original problem's bounds.
If that were OK, Rust and Java would also get several times faster.

On my example, with BigInt the speed is the same. But I don't have Java installed to try it on the official tests.

@ssvb

ssvb commented Jan 15, 2024

The long solution is not acceptable as it overflows even within the original problem's bounds. If that were OK, Rust and Java would also get several times faster.

The Rust solution uses usize for the same counter and the Java solution uses int.

I'll modify my solution to "switch gears" from long to BigInt automatically for best speed in the default configuration. But it is expected to win even with the current BigInt implementation.
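For what it's worth, the general "switch gears" pattern could look roughly like this in D (my illustration, not ssvb's actual code), using core.checkedint to detect the overflow:

import core.checkedint : addu;
import std.bigint : BigInt;

// hypothetical counter: adds into a ulong until an addition overflows,
// then falls back to BigInt for the rest of the run
struct GearedCounter
{
	ulong small;
	BigInt big;
	bool overflowed;

	void add(ulong n)
	{
		if (!overflowed)
		{
			bool ov;
			immutable s = addu(small, n, ov);
			if (!ov) { small = s; return; }
			overflowed = true;
			big = BigInt(small);
		}
		big += n;
	}
}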

@renatoathaydes
Owner Author

renatoathaydes commented Jan 16, 2024

@ssvb I was talking about the type of the keys, not the counter. The key must be at least 90 bits IIRC, which is why I used Int128. Please run the benchmarks I've included to check your solution; running just a few examples like you're doing will give you a wrong impression of how your algorithm performs. You can remove Common Lisp, but you'll need Java and Rust because both are used for generating inputs (of various characteristics, which gives you a much better view of overall performance) and for the performance monitoring.

@renatoathaydes
Owner Author

@ssvb I see that you're implementing your own hash table and doing some pretty advanced stuff to get some speed... I hope you understand that doing that goes against the spirit of this exercise, which is to write a solution that's readable and idiomatic in each language, not just one that runs as fast as possible regardless. Even if you could make this code run very, very fast, there's no point once you go to such great lengths.

If you really want a very fast solution in D that's still readable, I suggest you use a Trie solution like I did in Java. With that algorithm, Java was faster than Rust (because Rust was using the hash table solution, like D in this PR), so I expect a D solution based on that will be faster than even the Java solution (though if you also write Rust using Trie, Rust will still lead by a large margin given all we've learned about performance so far).
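For concreteness, a digit-keyed trie in D might look roughly like this (my sketch of the general idea, not a port of the Java solution):

final class TrieNode
{
	TrieNode[10] children; // one child per digit 0 .. 9
	string[] words;        // dictionary words whose encoding ends at this node

	// digits here are plain 0 .. 9 values, not ASCII '0' .. '9'
	void insert(const(ubyte)[] digits, string word)
	{
		if (digits.length == 0)
		{
			words ~= word;
			return;
		}
		immutable d = digits[0];
		if (children[d] is null)
			children[d] = new TrieNode;
		children[d].insert(digits[1 .. $], word);
	}
}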

@ssvb

ssvb commented Jan 16, 2024

@renatoathaydes

I see that you're implementing your own hash table and doing some pretty advanced stuff to get some speed... I hope you understand that doing that goes against the spirit of this exercise

How does this go against the spirit of this exercise? The https://flownet.com/ron/papers/lisp-java/instructions.html page sets the rules:

  • Please don't try to look up the original until you are done coding.
  • I will accept entries for one week (i.e. until midnight next Monday). This is not a super-hard deadline, but I do want to keep this from dragging on indefinitely.
  • Work as carefully as you would as a professional software engineer and deliver a correspondingly high grade program. Specifically, thoroughly comment your source code (design ideas etc.).
  • We will automatically test your program with hundreds of thousands of phone numbers and it should not make a single mistake, if possible -- in particular it must not crash. Take yourself as much time as is required to ensure correctness.
  • Your program need not be robust against incorrect formats of the dictionary file or the phone number file.
  • Please use only a single source program file, not several source modules. And please work alone.

So basically the idea was that each participant spends up to one week implementing their solution. People were expected to work alone instead of doing direct line-by-line conversions of the already existing C++ or Java solutions from the earlier experiment. And they were encouraged to work the way one would on professional, commercial-grade software, rather than treating it as a simplistic, idiomatic lazy exercise.

which is to write a solution that's readable and idiomatic in each language, not just that runs as fast as possible regardless.

What makes you think so? I think it's possible to achieve an optimal balance between code readability and performance without dumbing it down too much.

Even if you could make this code run very very fast, there's no point once you go to great lengths like that.

Implementing slightly more sophisticated algorithms actually isn't too difficult. Why is it necessary to do half-hearted work?

I spent around 2 hours getting the initial simplistic solution: https://gist.github.com/ssvb/abe821b3cdba7fcb7f3c3cecde864153
I tried it with your test framework, but it wasn't compatible with your updated rules (which separated the count and print subtasks). That's why two more revisions of my solution followed. It took around 5-6 hours total (on and off over the span of 2 days). But it's still more of a preview and I have some ideas for improving it further. Then I planned to add detailed comments to finish it up.

If you really want a very fast solution in D that's still readable, I suggest you use a Trie solution like I did in Java. With that algorithm, Java was faster than Rust (because Rust was using the hash table solution, like D in this PR), so I expect a D solution based on that will be faster than even the Java solution

Thanks for this suggestion. But I still prefer to design my own solution based on my own judgement. Others are encouraged to try different algorithms in their favourite programming languages, just like you did with your optimized Trie solution for Java.

(though if you also write Rust using Trie, Rust will still lead by a large margin given all we've learned about performance so far).

I think that this is a common stereotype and we can prove it wrong :-) Rust people can also port my D code to Rust if they want to.

@ssvb

ssvb commented Jan 16, 2024

I was talking about the type of the keys, not the counter. The key must be at least 90 bits IIRC which is why I used Int128.

Int128 is not enough to handle longer words. They can be up to 50 characters (unless, of course, you relaxed this rule in your setup). And there's the word Arbeitslosenversicherungsbeitra"ge in the German dictionary, which is not compatible with your Int128 solution. Check the test input from #17 (comment).

@renatoathaydes
Owner Author

@ssvb

What makes you think so? I think that it's possible to achieve optimal balance between code readability and performance without dumbing it down too much.

Your entry is ignoring D's standard data structures and just doing everything from scratch. I really hope you don't write code like that in the real world. Every entry needs to balance performance with readability, and the code should look like what people write in the real world, not what you write for absolute performance. Please read my blog post about the problem and the intended scope of this "challenge", or Prechelt's actual paper if you want more details. Performance is just one of the things that were considered, and not even the main one (but understandably, most people who engage, like you, only care about showing how clever their solutions can be).

Int128 is not enough to handle longer words.

You're right, I messed up, and unfortunately there is no test that picks this up (I had forgotten about the maximum length of each word; it's been a long time since I worked on this problem).

I am using 4 bits for each character because only 'a' .. 'z' are allowed according to the rules. Still, that means Int128 can hold only 32 characters (128 / 4). I will have to make the key longer in my latest D solution to make it acceptable (which will just make it even slower), perhaps by using two int128 (though we only need 200 bits: 50 characters at 4 bits each). The Common Lisp and Rust solutions were within the rules as far as I can see.
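For illustration (an assumption about a possible layout, not what the repository actually does), packing digits at 4 bits each into 256 bits covers the 200 bits a 50-character word needs:

struct WideKey
{
	ulong[4] bits;  // 4 x 64 = 256 bits >= 50 digits x 4 bits
	ubyte length;   // digits packed so far; also disambiguates leading zeros
	// (opEquals/toHash for AA use omitted in this sketch)

	void push(ubyte digit) // digit in 0 .. 9, fits in 4 bits
	{
		immutable pos = length * 4;
		bits[pos / 64] |= (cast(ulong) digit) << (pos % 64);
		length++;
	}
}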

I think that this is a common stereotype and we can prove it wrong

You're only showing that D can run fast if you're up to rolling your own data structures in C-like D style. Now translate your solution to Rust, which can easily be done, and see how it goes.

Don't get me wrong, you're using very clever stuff in your solution, especially the way you found to avoid computing a full hash over all the digits when adding each digit - that speeds things up immensely, and if performance were of utmost importance (it isn't for this challenge), it should have been done in all solutions... But in this "D analysis" I am doing here, that's the kind of stuff I don't care about at all - all I cared about was finding out how to port the Common Lisp and Rust solutions to D and how that would perform, as I was interested in learning where D is good and where it's bad (which I definitely did), how readable the D code looks (very readable if you stick with conventional style) and how it performs using approximately the same code (much closer to Common Lisp and Java than to Rust).

Every time you show a benchmark to people, they immediately say "ah, but that's not the same algorithm", or "it's comparing libraries, not the language", or whatever... so when I compare languages, I try to avoid those problems by translating solutions line by line, as closely as possible, between the different languages, and then trying to change the code to use each language's idioms and stdlib as much as possible so it looks clean and idiomatic. That's where I stop, as that's all that interests me.

So, go ahead and call my work "half-hearted work", I don't really care... but don't expect me to give a damn about how clever your D solution was.

@renatoathaydes
Owner Author

I want to clarify something: I came back to this problem years after I actually "ran the study" just to use D on it as I got interested in D for random reasons.
I then asked on the D Forum if people could show me why the D code was so slow compared to Rust and even Common Lisp - despite seemingly using the same algorithm almost exactly.

What I didn't ask was for people to participate in the study with their own entries - sorry @ssvb if you got that impression! I hope you at least had fun spending all these hours on this.

@renatoathaydes
Owner Author

For the record: I fixed my int128 solution by using a special Key struct and pre-computing the hashes... and also incrementally building the hashes as solutions are searched for.
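The incremental part, sketched very roughly (my illustration, not the actual Key struct in the linked solution): each new digit extends the previous hash, so looking up every prefix of a number costs O(1) hashing per step instead of rehashing the whole prefix.

struct IncrementalKey
{
	ulong hash = 14_695_981_039_346_656_037UL; // FNV-1a 64-bit offset basis, as an example

	// a real key would also carry the digits (e.g. an Int128 of them) for opEquals,
	// since hashes alone can collide
	IncrementalKey extended(ubyte digit) const
	{
		IncrementalKey next = this;
		next.hash = (next.hash ^ digit) * 1_099_511_628_211UL; // FNV-1a prime
		return next;
	}

	size_t toHash() const nothrow @safe { return cast(size_t) hash; }
}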

So, here is my latest solution.

Here's its performance compared with Rust:

Proc,Run,Memory(bytes),Time(ms)
===> ./rust
./rust,23986176,24
./rust,24346624,118
./rust,24543232,525
./rust,24346624,1027
./rust,8011776,3830
./rust,8486912,18403
===> src/d/dencoder
src/d/dencoder,13615104,30
src/d/dencoder,24723456,374
src/d/dencoder,24592384,1750
src/d/dencoder,24788992,3472
src/d/dencoder,11517952,10235
src/d/dencoder,11517952,51211

Here's how @ssvb's solution performs:

Proc,Run,Memory(bytes),Time(ms)
===> ./rust
./rust,23986176,24
./rust,24461312,135
./rust,24494080,514
./rust,24526848,981
./rust,8175616,3647
./rust,8011776,15404
===> src/d/dencoder
src/d/dencoder,16433152,30
src/d/dencoder,16613376,125
src/d/dencoder,16613376,468
src/d/dencoder,16613376,941
src/d/dencoder,5570560,12
src/d/dencoder,6701056,18

Not going to lie: this is freaking amazing. I am not sure how the solution works to be honest, as I can't really understand the code ... but if this is within the specifications of the problem, this is probably as fast as it gets :)!

The last two runs use the count option, but still, how can it find millions of solutions in 12ms??? I couldn't find anything wrong with the counts though, so I assume it's correctly counting all solutions as it should.
