Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create hashes using string builder instead of big integers #22

Closed
KilianB opened this issue Dec 31, 2018 · 5 comments
Closed

Create hashes using string builder instead of big integers #22

KilianB opened this issue Dec 31, 2018 · 5 comments
Labels
enhancement New feature or request
Milestone

Comments

@KilianB
Copy link
Owner

KilianB commented Dec 31, 2018

During hash creation a bigInteger object will be created for each bit present. Currently no fancy operation take place during it's creation process that justify this overhead.
Afterwards the xor operation on arbitrary hashes is required for the hamming distance calculation.

Benchmark if it's more performant to use a stringbuilder.

BigInteger hash = BigInteger.ZERO;
for (int x = 0; x < width; x++) {
	for (int y = 0; y < height; y++) {
		if (pixelValue[x][y] < compareAgainst) {
			hash = hash.shiftLeft(1);
		} else {
			hash = hash.shiftLeft(1).add(BigInteger.ONE);
		}
	}
}
return hash;

vs new

StringBuilder sb = new StringBuilder(hashResolution);
for (int x = 0; x < width; x++) {
	for (int y = 0; y < height; y++) {
		if (pixelValue[x][y] < compareAgainst) {
			sb.append(0);
		} else {
			sb.append(1);
		}
	}
}
return new BigInteger(sb.toString(),2);

In cooperation with #18

@KilianB KilianB added the enhancement New feature or request label Dec 31, 2018
@KilianB KilianB added this to the 3.0.0 milestone Dec 31, 2018
@KilianB
Copy link
Owner Author

KilianB commented Dec 31, 2018

for smaller than 64 bit hashes we could also fall back to long variables and do primitive bit shifting.

@KilianB
Copy link
Owner Author

KilianB commented Jan 6, 2019

jmh benchmarks.
Big Integer is the current implementation. Performance decreases with increased hash length

BenchmarkModeCntScoreErrorUnits
kilianB.HashCreationBenchmark.benchmarkAverageHash32BigIntegerthrpt25152626,545± 2741,163ops/s
kilianB.HashCreationBenchmark.benchmarkAverageHash32BigIntegerStringBuilderthrpt25158690,519± 2111,147ops/s
kilianB.HashCreationBenchmark.benchmarkAverageHash32MutableBigIntegerthrpt25157546,217± 2026,677ops/s
kilianB.HashCreationBenchmark.benchmarkAverageHash128BigIntegerthrpt25104932,391± 1387,338ops/s
kilianB.HashCreationBenchmark.benchmarkAverageHash128BigIntegerStringBuilderthrpt25117392,216± 1035,051ops/s
kilianB.HashCreationBenchmark.benchmarkAverageHash128MutableBigIntegerthrpt25116409,836± 2436,276ops/s
kilianB.HashCreationBenchmark.benchmarkAverageHash5000BigIntegerthrpt251387,469± 6,534ops/s
kilianB.HashCreationBenchmark.benchmarkAverageHash5000BigIntegerStringBuilderthrpt257690,280± 96,870 ops/s
kilianB.HashCreationBenchmark.benchmarkAverageHash5000MutableBigIntegerthrpt257715,949± 49,220ops/s

Mutable big integer requires method handles and reflection access which is not warranted for the performance difference compare to the stringbuilder approach (as well as unit tests to guarantee that the implementation is working as expected).

A new StringBuilder is created for every hash creation. Caching a stringbuilder would result again in a performance gain at the cost of thread safety. In relation file IO is much more heavy and the treadoff isn't worth it.

Migration to either stringbuilder or mutable big int allows to cut the first pass needed to estimate the hash length on an algorithm basis

@KilianB
Copy link
Owner Author

KilianB commented Jan 7, 2019

fixed with afc026d

@KilianB KilianB closed this as completed Jan 7, 2019
@KilianB KilianB reopened this Jan 8, 2019
@KilianB
Copy link
Owner Author

KilianB commented Jan 9, 2019

Using a custom hash builder which manipulates a byte array directly we can gain even more performance d9ef1f8
Now even the 5000 bit hash runs with 13.000 hashes created / second up from 1.3 k

@KilianB
Copy link
Owner Author

KilianB commented Jan 16, 2019

fixed with v 3.0.0

@KilianB KilianB closed this as completed Jan 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant