Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CalculateAverage_JesseVanRooy (Submission 1) #335

Merged
merged 5 commits into from
Jan 14, 2024

Conversation

Jesse-Van-Rooy
Copy link
Contributor

Check List:

  • Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
  • All formatting changes by the build are committed
  • Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
  • Output matches that of calculate_average_baseline.sh
  • Execution time: 2.7s
  • Execution time of reference implementation: 2m26.472s

Hardware: AMD Ryzen 7 5700G 8C16T, 32GB RAM

Description: See how a very simplistic scalar version stacks up to manually vectorized solutions

@Jesse-Van-Rooy Jesse-Van-Rooy changed the title CalculateAverage_jessevanrooy (Submission 1) CalculateAverage_JesseVanRooy (Submission 1) Jan 11, 2024
@Jesse-Van-Rooy
Copy link
Contributor Author

My Github username contains dashes, but dashes are not allowed in .java file names


static void process(MemorySegment memorySegment, ThreadResult threadResult) {
// initialize hash table
final int[] keys = new int[MAP_SIZE];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this resized somewhere? There can be up to 10K station names.

Copy link
Contributor Author

@Jesse-Van-Rooy Jesse-Van-Rooy Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it just wasn't properly sized for the requirements.
Fixed it.
It doesn't seem to affect performance on my machine, so it should still be good to try.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder though how this passed the 10K test case there is. Could you revert it to the previous value and see why that's the case? Maybe a bug still somewhere?

Copy link
Contributor Author

@Jesse-Van-Rooy Jesse-Van-Rooy Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't, I forgot to run that test ... embarrasing.

I ran some of the tests manually intelliJ because I had trouble with UTF-8 encoding of the output when diffing it with the expected result. (I'm running this on Windows).

I now fixed the UTF-8 issues by adding '-Dsun.stdout.encoding=UTF-8' to the calculate_average_.sh file (now all 11 tests succeed on my machine).
This also revealed that adding %n does not print the expected \n in the output (it prints \r\n instead of \n), so I changed that too.

In order to verify that my output matches the output of the baseline program I had to do similar changes to the baseline program & script, but did not commit those as we are probably not expected to change these files.

I do hope the -Dsun.stdout.encoding=UTF-8 does not mess with the output on other OSes, but I would not expect it to.

If this doesn't affect the outcome on other OSes at all, you could consider adding this to the run script of the baseline and changing the println in the baseline script to a print, while manually adding the \n. This would improve compatibility on Windows.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the thing, it passed that test for me, also with that old map size. Just tried it again by changing the size back to a value smaller than 10K. Can you reproduce this? Something seems odd here.

Copy link
Contributor Author

@Jesse-Van-Rooy Jesse-Van-Rooy Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you probably only changed the map size.
In that case it would work because there are still multiple thread using that map, and since the 10k test has exactly 10k lines, it would never give more than 4096 lines to one thread, hence not exceeding the map size (we use a growing hashmap for the combination step, so there is no problem there either).
If you also change the VALUE_CAPACITY to the previous value, it would fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought about this some more:

Is it the case that the original solution (MAP_SIZE = 4096 and VALUE_CAPACITY = 512) worked for you?

Because in that case it still should have flagged a test failure on an 8 core machine (from the Readme: "Programs are run from a RAM disk (i.o. the IO overhead for loading the file from disk is not relevant), using 8 cores of the machine.").

Could it be the case that the test is being run with more cores then? Because the only way I see that the VALUE_CAPACITY = 512 solution works is if there are 20+ threads at work. And that would only happen if 'Runtime.getRuntime().availableProcessors()' returns more than 20.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes indeed. Tests run with 32 cores, evaluation with 8 then, as per the rules.

@gunnarmorling
Copy link
Owner

00:04.066, nice!

@gunnarmorling gunnarmorling merged commit 30987d7 into gunnarmorling:main Jan 14, 2024
@gunnarmorling
Copy link
Owner

@Jesse-Van-Rooy, it seems something is still wrong with hashes: when running against a 1B rows file with 10K keys, results are off. To reproduce, you can create a file via ./1brc/create_measurements3.sh 1000000000 and compare against the output of the baseline (or the current leaders to be faster).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants