Cryptonight variant 2 #4218

Merged
merged 1 commit into from Sep 11, 2018

SChernykh commented Aug 3, 2018

Contains two modifications to improve ASIC resistance: shuffle and integer math.

Shuffle makes use of the whole 64-byte cache line instead of 16 bytes only, making Cryptonight 4 times more demanding for memory bandwidth.
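To make the "whole 64-byte cache line" point concrete, here is a minimal sketch of how the four 16-byte chunks of one cache line relate to each other by address. It is not taken from the pull request: the function name `cache_line_chunks` is hypothetical, and the actual shuffle may visit or combine the chunks differently.

```c
#include <stddef.h>

/*
 * Illustrative only: given the byte offset of one 16-byte chunk in the
 * scratchpad, list all four 16-byte chunks of the 64-byte cache line
 * that contains it.
 */
void cache_line_chunks(size_t offset, size_t chunks[4])
{
    size_t base = offset & ~(size_t)0xF;  /* round down to a 16-byte boundary */

    chunks[0] = base;         /* the chunk that is accessed anyway */
    chunks[1] = base ^ 0x10;  /* flipping bits 4 and 5 never leaves */
    chunks[2] = base ^ 0x20;  /* the 64-byte aligned line */
    chunks[3] = base ^ 0x30;
}
```

Touching all four offsets instead of only `chunks[0]` is what would multiply the memory traffic per iteration by four.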

Integer math adds 64:32 bit integer division followed by 64 bit integer square root, adding large and unavoidable computational latency to the main loop.

A more or less complete description of these changes, along with performance numbers, is here: https://github.com/SChernykh/xmr-stak-cpu/blob/master/README.md

Discussion that preceded this pull request: SChernykh/xmr-stak-cpu#1

Gingeropolous Aug 3, 2018

Contributor

Can you clarify whether this would brick an existing ASIC or just slow it down? I.e., if an ASIC does exist for the current PoW (Cryptonight Variant 1), would these modifications make the ASIC useless or would it just decrease performance?

SChernykh Aug 3, 2018

Contributor

It would definitely brick existing ASICs (if there are any): they don't have integer division and square root and they access memory in 16-byte chunks, not 64-byte chunks.

el00ruobuob Aug 3, 2018

Contributor

And new ASICs will be even more expensive to build and operate.

SChernykh Aug 3, 2018

Contributor

New ASICs will also be many times slower because of the nature of these changes.

SChernykh Aug 3, 2018

Contributor

Well, that depends on the ASIC implementation. We don't know for sure how their internals work. But according to a lot of scientific papers about hardware implementations of division and square root, yes, it looks like 16 times slower.

@vtnerd

A quick pass at the code semantics.

SChernykh Aug 3, 2018

Contributor

Regarding concerns about using floating point to calculate the integer square root: there is only 1 place in the algorithm where a rounding error happens (the sqrt() call itself), but this rounding error is too small to cause any trouble. This code was actually tested and confirmed correct for all 48-bit integers that can possibly be an input to it.

moneromooo-monero Aug 3, 2018

Contributor

I see the >> 16 reducing the range, so yes, it seems fine. Never mind my comment then.

vtnerd Aug 3, 2018

Contributor

Regarding concerns about using floating point to calculate the integer square root: there is only 1 place in the algorithm where a rounding error happens (the sqrt() call itself), but this rounding error is too small to cause any trouble. This code was actually tested and confirmed correct for all 48-bit integers that can possibly be an input to it.

The issue would be portability, which cannot be easily tested. What platforms were tested for this?

SChernykh Aug 3, 2018

Contributor

The issue would be portability, which cannot be easily tested. What platforms were tested for this?

x86 and ARM (64 bit), also OpenCL on AMD and NVIDIA GPUs - all hash the same. And I also added test vectors in this pull request, so buildbot has already tested it on Windows, Linux and MacOS (32 and 64 bit) - no discrepancies.

Portability won't be a problem here. It will run the same on any IEEE-754 compliant hardware: C standard is pretty strict when it comes to integer <-> double conversions and sqrt() implementation. So once it's confirmed correct (which it is) on one platform supporting IEEE-754, it will run the same on all platforms with IEEE-754 support.

iamsmooth Aug 3, 2018

Contributor

Have you measured relative power usage on CPUs and GPUs? The linked document above only discusses speed (H/s), not power (H/W).

That's another efficiency factor to consider along with speed. If the power usage doesn't increase much, that is a good result. If it does, then some of the hypothesized relative gain against ASICs may be illusory (and that would perhaps suggest looking at some variant that preserves a bit more power efficiency).

Certainly still valid in terms of 'bricking' of existing designs.

SChernykh Aug 3, 2018

Contributor

Power increase is minimal. It's a bit higher than the old Cryptonight, but nothing dramatic. I've just asked people to give me numbers: SChernykh/xmr-stak-cpu#1 (comment)

SChernykh Aug 3, 2018

Contributor

@iamsmooth I did a quick test on my Radeon RX 560: HWINFO64 reports "GPU chip power" to be 34.2 watts for original Cryptonight at 477 H/S and 38.7 watts for Cryptonight variant 2 at 447 H/S. I can't measure power at the wall - it's usually much larger than what HWINFO64 reports, but the absolute difference in wattage should be similar.

iamsmooth Aug 3, 2018

Contributor

Portability won't be a problem here. It will run the same on any IEEE-754 compliant hardware: C standard is pretty strict when it comes to integer <-> double conversions and sqrt() implementation

This requires making sure that the compiler is actually configured to perform floating point in a strictly compliant manner. That (configuration) is not entirely portable, although I guess there aren't too many compilers we are dealing with in practice. Maybe this should be noted in documentation somewhere, for reference of future ports/updates.

That said, it's pretty hard to believe the specific case of integer <-> double conversion would ever change. I would expect more risk on other operations.

SChernykh Aug 3, 2018

Contributor

This requires making sure that the compiler is actually configured to perform floating point in a strictly compliant manner.

This code only needs 48 bits of precision for integer -> double conversion and sqrt. IEEE-754 double precision guarantees 52 explicitly stored mantissa bits (53 bits of precision). Even if some implementations are not fully compliant and give a slightly larger error for sqrt, such as an occasional error in the last bit (for example, because some corners are cut when rounding), it will still be good enough.

Edit: I'll add test vectors that go through one of the edge cases (N^2-1, N^2, N^2+1) for sqrt in the main loop. I'll first have to make something like a vanity generator, but for test vectors :) This will help catch non-compliant implementations.
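The edge cases named above can also be checked directly on a square-root routine, independently of full hash test vectors. The following is a sketch under assumptions, not code from the pull request: `isqrt_ref`, `isqrt_round_bug` and `count_edge_mismatches` are hypothetical names, and the implementation under test is passed in as a function pointer.

```c
#include <stdint.h>

/* Pure-integer floor(sqrt(n)) by binary search; a slow reference. */
uint64_t isqrt_ref(uint64_t n)
{
    uint64_t lo = 0, hi = (uint64_t)1 << 32;  /* lo^2 <= n < hi^2 */

    while (hi - lo > 1) {
        uint64_t mid = lo + (hi - lo) / 2;    /* mid < 2^32, mid^2 fits */
        if (mid * mid <= n)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Deliberately wrong: rounds to nearest instead of rounding down. */
uint64_t isqrt_round_bug(uint64_t n)
{
    uint64_t r = isqrt_ref(n);
    return (n - r * r > r) ? r + 1 : r;
}

/*
 * For every N in [first, last], with 1 <= first <= last < 2^32, check
 * impl() at N^2-1, N^2 and N^2+1 and count the wrong answers.
 */
uint64_t count_edge_mismatches(uint64_t (*impl)(uint64_t),
                               uint64_t first, uint64_t last)
{
    uint64_t bad = 0;

    for (uint64_t n = first; n <= last; ++n) {
        uint64_t sq = n * n;
        bad += impl(sq - 1) != n - 1;  /* just below a perfect square */
        bad += impl(sq) != n;          /* exactly a perfect square */
        bad += impl(sq + 1) != n;      /* just above a perfect square */
    }
    return bad;
}
```

Passing a floating-point based routine as `impl` should return 0 on a compliant platform; the rounding variant is flagged at every N^2-1 for N >= 2, which is the kind of off-by-one these edge cases are meant to expose.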

iamsmooth Aug 3, 2018

Contributor

@SChernykh I agree there are not likely to be any problems here on sane platforms. Rather, I was replying to the specific point that what the C standard requires of the compiler and what the compiler actually does are not always the same thing. Even when a compiler does implement strict standard compliance (not always), it may require specific options.

On that note, I noticed this in https://en.wikipedia.org/wiki/C_data_types

The actual size and behavior of floating-point types also vary by implementation. The only guarantee is that long double is not smaller than double, which is not smaller than float

So the assumption of 48/52 bits of precision is not a given. As noted above (or maybe in the commit reviews), IEEE-754 is not strictly required (though is often the case in practice, at least with the right compiler options).

Again, a good resolution of this is to clearly note the (new) dependencies in the developer documentation, so anyone porting can be made aware of them.
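One way to record such a dependency next to the code, in addition to documentation, is a compile-time check. The sketch below is an assumption about how that could look, not something in this pull request: it uses only the standard `<float.h>` macros and C11 `_Static_assert`, and the helper name `double_round_trips_48_bits` is made up. It cannot detect a `sqrt()` that is not correctly rounded, nor compiler options that relax floating-point semantics (GCC's `-ffast-math` is one such option), so test vectors are still needed.

```c
#include <float.h>
#include <stdint.h>

/* Fail the build on platforms where double is too narrow for the
 * 48-bit inputs discussed in this thread. */
_Static_assert(FLT_RADIX == 2,
               "double must be a binary floating-point type");
_Static_assert(DBL_MANT_DIG >= 53,
               "double needs at least a 53-bit significand");

/* Runtime sanity check: the largest 48-bit value must survive a
 * round trip through double unchanged. Returns 1 on success. */
int double_round_trips_48_bits(void)
{
    uint64_t x = ((uint64_t)1 << 48) - 1;
    volatile double d = (double)x;  /* volatile: force a real double store */

    return (uint64_t)d == x;
}
```

A port to an unusual platform would then fail at build time instead of silently producing different hashes.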

MoneroCrusher Aug 4, 2018

So practically and effectively this makes FPGAs 4x slower, making them much worse than GPUs in terms of $/hash (see the Xilinx FPGA, 22 kH/s on CN7 for $4-5k). ASICs would be 16x slower: still better than GPUs in terms of production cost per hash, but they'll be useless within a couple of months and won't break even.

So this makes GPUs & CPUs the best thing to mine with, if Monero keeps the strict 6 months fork schedule.
Is this correct?

SChernykh Aug 4, 2018

Contributor

Yes, it's correct. At least 4 times slowdown compared to Cryptonight v1 for all kinds of ASIC/FPGA.

MoneroCrusher Aug 4, 2018

@SChernykh would it be possible for you to try to estimate ASIC production cost and real production time if the community funded a Bitmain CN 220 kH/s miner for you (or is there any good teardown video)? They will still be an order of magnitude better in price/hash and much better in power/hash. If Bitmain can tape one out in 1 month, there would still be a big problem. Everyone in the forums states 6 months of production time from tape-out. How did everyone arrive at that number? Maybe it would be good to get several opinions.
@moneromooo-monero
How is it determined & who determines the last minute changes prior to fork? It would be good to know. And how is it made sure that persons with access to that info don't get bribed by Bitmain to pass it on?

moneromooo-monero Aug 4, 2018

Contributor

Someone will probably come up with some small simple change, either myself, othe, smooth, or vtnerd, and we'll discuss it and post it on GitHub, and if it passes review, it gets merged.

SChernykh Aug 4, 2018

Contributor

I'm not a hardware expert, no need to send Bitmain ASIC to me. I haven't seen any teardown videos for their CN miner however.

How is it determined & who determines the last minute changes prior to fork?

@moneromooo-monero I'll also do one really small change near the fork. I had it in my plans since the beginning. It will be small, won't affect performance in any way and will also improve ASIC resistance a bit. So there will be two changes from two different sources.

MoneroCrusher Aug 4, 2018

@SChernykh @moneromooo-monero okay, fair enough. It should be noted that this information is probably worth tens of millions to Bitmain and that they'll try anything to get it (personal speculation), so it would be good if we had a way of preventing them from getting it earlier than the general public does, a few days before the fork.
As the current method implies trust in a few people (not that I don't trust you, but that's not the idea).
Maybe actively propagate changes up until 1-2 days before fork from all community members and then randomly choose one based on a pre-defined monero block hash number (last number or letter corresponding to 1 version of a proposed change). I'm sure there are better ways though.

Thanks for your efforts and for taking decentralization so seriously, unlike other chains...! I'll be sure to point all my GPUs & CPUs to Monero, as always :-)

SChernykh Aug 4, 2018

Contributor

Two days before the fork is too little. Variant 2 should be finalized 2 weeks before the fork, together with all major miner software pull requests. Everyone should have enough time to upgrade.

MoneroCrusher Aug 4, 2018

True, that's the other side. So ASIC manufacturers will also have an additional 2 weeks.
Wondering if there's a way to find out how long it took Bitmain & Baikal for their old CNv0 ASICs from planning to end product.

Edit:
I did some calculations with my own imaginary assumptions.
10'000 ASICs like their old X3 miner would result in 220 MH/s, or about 45% of our current network. 10'000 ASICs is not a lot for a multi-billion-dollar company like Bitmain.
Let's assume 1 ASIC costs $500 to produce, including R&D (they sold for low $k and the S9 used to cost several $k; look at their prices now. I think they're still selling them for a profit, and they only accommodate their prices to crypto income, not actual production cost).
So if the $500 assumption is true, then that's a $5M investment, which is nothing for Bitmain. That $5M investment would yield 45% of XMR (daily emission is 3024 XMR), therefore 1360.8 XMR go to Bitmain, or $163'296 in daily proceeds. They would consume 4.65 MW of power, which at an assumed price of 3c per kWh would mean daily costs of $3'348 and a net profit of $159'948. That would mean a break-even time of 31.26 days. With the new algo it's 125 days if they use external memory, or 500.16 days if they use on-chip memory (or they'll just cram more on there if they have cheap access to it, as I also don't know what type of memory they use).
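The arithmetic in this estimate can be reproduced with a few lines of C. This is only a sketch of the comment's own back-of-the-envelope model: the function names are hypothetical, all inputs are the commenter's assumptions (the $163'296 figure implies an XMR price of $120), and, like the comment, it simply divides the net daily profit by the slowdown factor.

```c
/* Daily electricity cost in USD for a constant load given in megawatts. */
double daily_power_cost_usd(double megawatts, double usd_per_kwh)
{
    return megawatts * 1000.0 * 24.0 * usd_per_kwh;  /* MW -> kW, 24 h */
}

/*
 * Days until the hardware cost is recovered. `slowdown` is 1.0 for the
 * current algorithm, or 4.0 / 16.0 for the slowdowns discussed in the
 * thread; the whole net profit is scaled by it, as in the comment.
 */
double break_even_days(double capex_usd, double daily_revenue_usd,
                       double daily_cost_usd, double slowdown)
{
    double net_per_day = (daily_revenue_usd - daily_cost_usd) / slowdown;

    return capex_usd / net_per_day;
}
```

For example, `break_even_days(5e6, 163296.0, daily_power_cost_usd(4.65, 0.03), 1.0)` gives roughly 31.26 days; with a slowdown of 4.0 or 16.0 it gives roughly 125.04 and 500.16 days, matching the figures quoted above.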

Personally I think it would make sense to add an official "surprise PoW tweak fork" once or twice a year to add further uncertainty to their economic models. The date would be determined by the block hash at a predefined block height, plus the time the devs need to make a small tweak.

MoneroCrusher Aug 4, 2018

@SChernykh
I have another question: Does it make sense, hardware-wise, for an ASIC manufacturer to implement your code in a prototype ASIC and then wait for the 2 tweaks 2 weeks before fork so they would only have to do small adjustments to their ASIC before HF and therefore they would be ready to start production within the first month of the HF?
Just trying to view this from every angle a greedy company not caring about Blockchain would.

SChernykh Aug 4, 2018

Contributor

These changes are big from a hardware point of view; they'll require a completely new design. I think they won't even be ready with a proper design before the fork: they'll have to spend a lot of time optimizing div+sqrt logic for low latency before starting mass production.

P.S. And I still think that even optimized low latency logic will not be fast enough to ROI in 6 months.

philipma1957 Aug 4, 2018

This is exciting news as I mine Monero7 or as some call it MoneroV1
I mostly use Ryzen 1800 and 2700 CPUs and some Threadrippers, with a few RX 560s.

I will post this thread on my bitcointalk thread.

SChernykh Aug 5, 2018

Contributor

Monero7 or as some call it MoneroV1

The mining algorithm is called Cryptonight variant 1, CryptonightV1 or CNv1. "7" is the current protocol version number, not the algorithm name. Many people, including whattomine and many other sites, have confused them since the beginning. The new algorithm will be called Cryptonight variant 2, CryptonightV2 or CNv2.

SChernykh Sep 30, 2018

Contributor

the square root code is perfectly correct, and matches your integer square root perfectly.

Just LOL. There is literally an error in every line in your code. Did you even try to compile it and run on sample inputs? You did not.

philtimmes Sep 30, 2018

I gave simplified code... It is not targeting the sample inputs, just showing simpler (and computationally cheaper) ways to do exactly what you are attempting in the proposed patch. Add to that the fact that I did not enclose my code in code markup, which should tell you clearly that it is example code... Would you like me to submit a patch that works?

SChernykh Sep 30, 2018

Contributor

So are you trying to make a faster implementation? Whatever, calling "pow" in C++ is not faster than calling "sqrt". And calling "sqrt" is slower than calling the SSE intrinsic for sqrt.

philtimmes Sep 30, 2018

But one uses more cache lines than the other. The difference in runtime is trivial, but the impact on the performance of the code that follows is substantial. And not all miners will have the sqrt SSE intrinsic now, will they?

@SChernykh

SChernykh Sep 30, 2018

Contributor

@philtimmes You're not skilled/competent enough for a productive discussion, it seems. Do more research and check the actual miner code:
https://github.com/SChernykh/xmr-stak-cpu - my version
https://github.com/xmrig/xmrig/tree/dev - latest xmrig with variant 2 support

Try to make this code even 0.1% faster, and if you succeed, then we can talk.

P.S. All 64-bit x86 CPUs have the SSE sqrt instruction, since SSE2 is part of the x86-64 baseline.

@philtimmes

philtimmes Sep 30, 2018

Skilled / competent enough? You say that with 0 indication of who I am... or even checking what I wrote with more veracity than a copy-paste... I proposed a patch, and that was ignored...
Again, I would be more than happy to submit a patch (for the portable / non-SIMD side) along with before and after results.

@SChernykh

SChernykh Sep 30, 2018

Contributor

@philtimmes The only thing you have proposed so far is a slow sqrt implementation with a bug in every line. You're right, I have 0 indication of who you are, so until I see some actual working code from you, it will stay like this. Don't think I'm grumpy, but I've had a lot of time wasted recently in conversations like this.

@philtimmes

philtimmes Sep 30, 2018

Yessir...
I will submit a patch shortly.

@plavirudar

plavirudar Sep 30, 2018

> I gave simplified code... It is not targeting the sample inputs, just showing simpler (and less computationally expensive) ways to do exactly what you are attempting in the proposed patch. Add to that the fact that I have not enclosed my code in code markup, which should tell you clearly that it is example code... Would you like me to submit a patch that works?

Under what set of circumstances would you submit a patch that does not work (and be expected to be taken seriously)?

@philtimmes

philtimmes Sep 30, 2018

@plavirudar While I admit there are 2 ways to read what I wrote, I would assume you could see both of them.

@SChernykh

SChernykh Sep 30, 2018

Contributor

Portable version needs improvement actually. Let's wait for what @philtimmes comes up with.

@notgiven688

notgiven688 Oct 1, 2018

@SChernykh An optimization for the portable code would be nice for WebAssembly, since there is no SIMD available there at the moment. By the way:

Converting to double before doing the division, i.e. `const uint64_t division = (uint64_t)((double)dividend / (double)divisor);`, improved the overall speed of the hash function by about 1-2% in my test cases. The problem is that it does not yield the correct hash in all cases. Someone (maybe myself) needs to check whether this is a possible route for a portable optimization.
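
One mechanism that can make the double-based shortcut disagree with integer division is sketched below. It assumes IEEE 754 binary64 `double` (53 significant bits) and a 32-bit divisor, following the "64:32 bit integer division" wording in the pull request description. The function names `div_exact` and `div_via_double` are hypothetical, and this is not a diagnosis of the specific inputs that produced wrong hashes in the tests mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Reference result: plain integer division, always exact. */
static uint64_t div_exact(uint64_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    return dividend / divisor;
}

/* The shortcut quoted above. A 64-bit dividend above 2^53 may not be
   representable as a double, so it is rounded before the division even
   starts, and the quotient is rounded once more. Note that converting a
   double of 2^64 or more back to uint64_t is undefined behavior, which
   can happen for dividends near UINT64_MAX with divisor 1. */
static uint64_t div_via_double(uint64_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    return (uint64_t)((double)dividend / (double)divisor);
}
```

With a dividend of 2^53 + 1 = 9007199254740993 and a divisor of 1, `div_exact` returns 9007199254740993 while `div_via_double` returns 9007199254740992, because the dividend is rounded to 2^53 during conversion. Small inputs such as 100 / 7 agree on both paths, which is consistent with the shortcut passing many test cases and still failing some.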

@miki-bgd-011

miki-bgd-011 commented Oct 3, 2018

@SChernykh Please update the original cpuminer at https://github.com/hyc/cpuminer-multi

@SChernykh

SChernykh Oct 3, 2018

Contributor

@miki-bgd-011 This repository looks abandoned. xmrig and xmr-stak already have optimized CPU versions, so I feel it would be a waste of time to add support there.

@miki-bgd-011

miki-bgd-011 commented Oct 3, 2018

:( ok

@SChernykh

SChernykh Oct 3, 2018

Contributor

I mean, I can add C code, but it will be significantly slower than what xmrig and xmr-stak have. Adding assembler versions will be a lot more work and not worth it.

@miki-bgd-011

miki-bgd-011 Oct 3, 2018

I understand and agree.

@hyc

hyc Oct 3, 2018

Contributor

@SChernykh Fwiw - no, not abandoned, I use this code still. Some people still like small C projects with few external dependencies...

@SChernykh

SChernykh Oct 3, 2018

Contributor

@hyc Ok, I'll add support later this week, but don't expect high performance from this.

@miki-bgd-011

miki-bgd-011 commented Oct 3, 2018

YAY!!

@SChernykh

Contributor

SChernykh commented Oct 5, 2018

@miki-bgd-011

miki-bgd-011 Oct 5, 2018

@SChernykh Thank you, works just fine on http://killallasics.moneroworld.com/.

@hyc I've noticed the miner wastes CPU when the connection to the pool is not established. It would be nice to fix that, so that mining starts only once the connection to the pool has been successfully established :)

brandonlehmann added a commit to brandonlehmann/node8-multi-hashing that referenced this pull request Oct 15, 2018

Cryptonight variant 2 support + tests (zone117x#64)
Reference code: monero-project/monero#4218

(cherry picked from commit f8d6b6b)

Cryptonight variant 2 - final version (zone117x#66)

Reference code: monero-project/monero#4404

(cherry picked from commit c130750)

Added additional tests

@madscientist159

madscientist159 Oct 15, 2018

I know it's a bit late to change this, but it looks like the new algorithm is going to lock the network onto closed / locked Intel and AMD CPUs (ME/PSP concerns + secure boot etc.). The new algorithm has (inadvertently?) inserted a rather nasty FPU performance microbench that is knocking non-x86 CPUs, including owner controllable ones, out of consideration (half the hashrate in many cases).

My main concern is that we've exchanged one kind of ASIC (mining ASIC) for another (Windows / x86 locked CPUs). I am not convinced the network can be properly secured when only one CPU architecture, controlled by a fairly hostile duopoly, is economical for mining.

@kio3i0j9024vkoenio

kio3i0j9024vkoenio Oct 15, 2018

> I know it's a bit late to change this, but it looks like the new algorithm is going to lock the network onto closed / locked Intel and AMD CPUs (ME/PSP concerns + secure boot etc.). The new algorithm has (inadvertently?) inserted a rather nasty FPU performance microbench that is knocking non-x86 CPUs, including owner controllable ones, out of consideration (half the hashrate in many cases).
>
> My main concern is that we've exchanged one kind of ASIC (mining ASIC) for another (Windows / x86 locked CPUs). I am not convinced the network can be properly secured when only one CPU architecture, controlled by a fairly hostile duopoly, is economical for mining.

You sound like an unhappy ASIC owner.

@madscientist159

madscientist159 Oct 15, 2018

>> I know it's a bit late to change this, but it looks like the new algorithm is going to lock the network onto closed / locked Intel and AMD CPUs (ME/PSP concerns + secure boot etc.). The new algorithm has (inadvertently?) inserted a rather nasty FPU performance microbench that is knocking non-x86 CPUs, including owner controllable ones, out of consideration (half the hashrate in many cases).
>> My main concern is that we've exchanged one kind of ASIC (mining ASIC) for another (Windows / x86 locked CPUs). I am not convinced the network can be properly secured when only one CPU architecture, controlled by a fairly hostile duopoly, is economical for mining.
>
> You sound like an unhappy ASIC owner.

Nope, no ASIC here 😄 Small fleet of POWER9 machines that are far more secure than your x86 AMD/Intel controlled stuff though!
