Support of CryptoNight v8 ReverseWaltz based on CryptoNight V8 #234

Merged: EDDragonWolf merged 5 commits into master from feature/cryptonight_waltz_support on Mar 4, 2019

@EDDragonWolf
Contributor

EDDragonWolf commented Feb 26, 2019

Added support of following tweaks of CryptoNight hashing algorithms:

  • CryptoNight Waltz - same as CryptoNight, but with 3/4 of the iterations (variant=0, modifier=1)
  • CryptoNight v7 Waltz - same as CryptoNight v7, but with 3/4 of the iterations (variant=1, modifier=1)
  • CryptoNight v8 Waltz - same as CryptoNight v8, but with 3/4 of the iterations (variant=2, modifier=1)
  • CryptoNight v8 Reverse Waltz - same as CryptoNight v8, but with 3/4 of the iterations and with a reversed shuffle operation (variant=2, modifier=2)
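
As a rough illustration of the list above, the mapping from modifier to iteration count could be sketched as follows. The enum names, the helper functions, and the baseline of 0x80000 main-loop iterations are assumptions made for this sketch, not identifiers taken from the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values mirror "modifier=1" and "modifier=2" in the list. */
enum { CN_MOD_NONE = 0, CN_MOD_WALTZ = 1, CN_MOD_REVERSE_WALTZ = 2 };

/* Assumed baseline: 0x80000 (524288) main-loop iterations for unmodified CryptoNight. */
#define CN_BASE_ITERS 0x80000u

/* Returns the main-loop iteration count for a given modifier. */
static uint32_t cn_iterations(int modifier)
{
    assert(modifier >= CN_MOD_NONE && modifier <= CN_MOD_REVERSE_WALTZ);
    /* Both Waltz flavours run 3/4 of the baseline iterations. */
    return modifier == CN_MOD_NONE ? CN_BASE_ITERS : CN_BASE_ITERS / 4u * 3u;
}

/* Only Reverse Waltz changes the shuffle order, and only v8 (variant 2) has a shuffle. */
static int cn_uses_reversed_shuffle(int variant, int modifier)
{
    return variant == 2 && modifier == CN_MOD_REVERSE_WALTZ;
}
```

Under these assumptions, cn_iterations(CN_MOD_WALTZ) and cn_iterations(CN_MOD_REVERSE_WALTZ) both give 0x60000, and only the (variant=2, modifier=2) combination selects the reversed shuffle.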

We plan to use CryptoNight v8 Reverse Waltz as our new PoW algorithm.

Note: Hard Fork 12 is scheduled for block 299200 (~2019-03-07T05:00:00+00).

Closes #208
Closes #223
Closes #224

@EDDragonWolf
Contributor Author

@jagerman, I wasn't satisfied with the test I added in the previous comment, so I added a performance test here.
Our results:
[image: performance chart]
Note:
CNv8-origin - the current implementation of cn_slow_hash from master, without any new code.
CNv8 ReverseWaltz #1 - the implementation in this PR.
CNv8 ReverseWaltz #2 - an implementation with a separate version of the store operations for each variant, without additional reads. Something like:

#define VARIANT2_SHUFFLE_ADD_SSE2(base_ptr, offset) \
  do if (variant >= 2) \
  { \
    __m128i chunk1 = _mm_load_si128((__m128i *)((base_ptr) + ((offset) ^ 0x10))); \
    __m128i chunk2 = _mm_load_si128((__m128i *)((base_ptr) + ((offset) ^ 0x20))); \
    __m128i chunk3 = _mm_load_si128((__m128i *)((base_ptr) + ((offset) ^ 0x30))); \
    if (modifier & CN_MODIFIER_REVERSE) { \
      _mm_store_si128((__m128i *)((base_ptr) + ((offset) ^ 0x10)), _mm_add_epi64(chunk1, _b1)); \
      _mm_store_si128((__m128i *)((base_ptr) + ((offset) ^ 0x20)), _mm_add_epi64(chunk3, _b)); \
      _mm_store_si128((__m128i *)((base_ptr) + ((offset) ^ 0x30)), _mm_add_epi64(chunk2, _a)); \
    } else { \
      _mm_store_si128((__m128i *)((base_ptr) + ((offset) ^ 0x10)), _mm_add_epi64(chunk3, _b1)); \
      _mm_store_si128((__m128i *)((base_ptr) + ((offset) ^ 0x20)), _mm_add_epi64(chunk1, _b)); \
      _mm_store_si128((__m128i *)((base_ptr) + ((offset) ^ 0x30)), _mm_add_epi64(chunk2, _a)); \
    } \
  } while (0)

There is some difference between CNv8 and CNv8-origin, and between the Waltz variants, but to me it looks more like measurement error than a performance reduction.
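
To make the difference between the two store orders easier to see, here is a scalar model of the shuffle in the macro above. It is only a sketch: chunk_t, add64 and shuffle_add are hypothetical names, each 16-byte chunk is modelled as two 64-bit lanes, and the array index idx ^ 1, idx ^ 2, idx ^ 3 stands in for offset ^ 0x10, ^ 0x20, ^ 0x30.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-byte chunk modelled as two 64-bit lanes (the lanes _mm_add_epi64 adds). */
typedef struct { uint64_t lo, hi; } chunk_t;

static chunk_t add64(chunk_t x, chunk_t y)
{
    chunk_t r = { x.lo + y.lo, x.hi + y.hi };  /* lane-wise wrapping add */
    return r;
}

/* line[0..3] stands for the four 16-byte chunks of one 64-byte scratchpad line;
 * idx is the chunk that was just accessed. */
static void shuffle_add(chunk_t line[4], unsigned idx,
                        chunk_t a, chunk_t b, chunk_t b1, int reverse)
{
    assert(idx < 4);
    /* Load all three neighbours before any store, as the macro does. */
    chunk_t c1 = line[idx ^ 1], c2 = line[idx ^ 2], c3 = line[idx ^ 3];
    if (reverse) {
        line[idx ^ 1] = add64(c1, b1);  /* reversed: chunk1 stays in place */
        line[idx ^ 2] = add64(c3, b);
    } else {
        line[idx ^ 1] = add64(c3, b1);  /* original v8 order */
        line[idx ^ 2] = add64(c1, b);
    }
    line[idx ^ 3] = add64(c2, a);       /* identical in both orders */
}
```

In this model the two orders differ only in which of the first and third neighbours is written to the ^1 and ^2 slots; the ^3 slot receives chunk2 plus a either way.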

@jagerman
Contributor

jagerman commented Feb 26, 2019

@EDDragonWolf - there is another gain to be had here by "loop hoisting" the conditional: move it outside the hashing loop so that, inside each loop, it is a compile-time constant. This macro would work, I think:

#define VARIANT2_SHUFFLE_ADD_SSE2_REVERSE(base_ptr, offset, REVERSE_STEP) \
  do if (variant >= 2) \
  { \
    __m128i chunk1 = _mm_load_si128((__m128i *)((base_ptr) + ((offset) ^ (REVERSE_STEP ? 0x10 : 0x30)))); \
    __m128i chunk2 = _mm_load_si128((__m128i *)((base_ptr) + ((offset) ^ 0x20))); \
    __m128i chunk3 = _mm_load_si128((__m128i *)((base_ptr) + ((offset) ^ (REVERSE_STEP ? 0x30 : 0x10)))); \
    _mm_store_si128((__m128i *)((base_ptr) + ((offset) ^ 0x10)), _mm_add_epi64(chunk1, _b1)); \
    _mm_store_si128((__m128i *)((base_ptr) + ((offset) ^ 0x20)), _mm_add_epi64(chunk3, _b)); \
    _mm_store_si128((__m128i *)((base_ptr) + ((offset) ^ 0x30)), _mm_add_epi64(chunk2, _a)); \
  } while (0)

You'll need to pass this REVERSE_STEP constant through the post_aes macro as a macro argument as well, then add a separate loop (with a compile-time constant!) for the reverse version:

    if(useAes)
    {

        if(modifier & CN_MODIFIER_REVERSE) {
            for(i = 0; i < iters; i++)
            {
                pre_aes();
                _c = _mm_aesenc_si128(_c, _a);
                post_aes(1);
            }
        }
        else {
            for(i = 0; i < iters; i++)
            {
                pre_aes();
                _c = _mm_aesenc_si128(_c, _a);
                post_aes(0);
            }
        }
    }

It's quite possible that the compiler is already doing this optimization, but given the size of the code in the loop it's quite possible that it isn't (or that it would only do so at -O3).
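
The idea can be shown on a toy loop, independent of the hashing code. In the sketch below, MIX_STEP and both functions are hypothetical and the arithmetic is arbitrary; the only point is that the flag is tested once, outside the loop, and each loop body sees a literal 0 or 1. Compilers usually call this transformation loop unswitching, and GCC's -funswitch-loops pass is among those enabled at -O3.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy loop body: REVERSE is always a literal 0 or 1 at the expansion site,
 * so the ?: can be folded away inside each specialised loop. */
#define MIX_STEP(state, i, REVERSE) \
  do { \
    (state) = ((state) ^ (uint64_t)(i)) * \
              ((REVERSE) ? 0x9E3779B97F4A7C15ull : 0xC2B2AE3D27D4EB4Full); \
  } while (0)

/* Baseline: the flag is tested on every iteration. */
static uint64_t mix_branch_inside(uint64_t state, size_t iters, int reverse)
{
    size_t i;
    for (i = 0; i < iters; i++) {
        if (reverse) MIX_STEP(state, i, 1); else MIX_STEP(state, i, 0);
    }
    return state;
}

/* Hoisted: the flag is tested once, then one of two specialised loops runs. */
static uint64_t mix_branch_hoisted(uint64_t state, size_t iters, int reverse)
{
    size_t i;
    if (reverse) {
        for (i = 0; i < iters; i++) MIX_STEP(state, i, 1);
    } else {
        for (i = 0; i < iters; i++) MIX_STEP(state, i, 0);
    }
    return state;
}
```

Both functions compute the same result for the same inputs; whether the hoisted form is measurably faster depends on the compiler and flags, which is why it is worth benchmarking rather than assuming.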

@EDDragonWolf
Contributor Author

Thanks, @jagerman. I had not thought about optimizing it this way. I added changes based on your suggestion, with minor fixes.

@EDDragonWolf
Contributor Author

Also, we have thought about using -O3 as the default compilation option. However, we need to check that it has no side effects. I created an issue for it (#235). @jagerman, if you are interested and have time, please take a look and add your suggestions there. Thanks.

@jagerman
Contributor

I don't like -O3 unless it shows tangible benefits -- and very often it doesn't. Because it tends to make object code much larger there are a lot of code bases that end up slower rather than faster under -O3.

@EDDragonWolf
Contributor Author

@jagerman, please add your comment to issue #235. Any opinion is very important to us.

@EDDragonWolf EDDragonWolf merged commit fec3ac9 into master Mar 4, 2019
@EDDragonWolf EDDragonWolf deleted the feature/cryptonight_waltz_support branch March 4, 2019 12:55
@SChernykh
Contributor

Any test pool available?

@EDDragonWolf
Contributor Author

@SChernykh
Sorry for the delay; the HF on testnet was badly delayed and it has only just forked.
Testnet mining pool: http://3.83.140.241/
If you need some testnet wallet addresses:
FAaegMUw5YV9GcwNGwJsyLdc1jkVRnNWcX3zEd5e1Nmci8HmGQGt3J3NUjeWi19WQi9t52mAwxHCXUSkcufmmU7CMVpjACG
FB4ZejF4V3w8qhRgxQVENyKc8aCmgP4whaUQhZuw7zwnb5rdgEKFq1G5gbGnhUCBXKPHF3bYLDqZD5e7JG7i2Wf3LwNmXDu
F8WjfGHDBqkhtSy674bz1tjaBooPnFEgvF92ooYbrCCNCagzbT8SxogS2PiW3LKuEMhGrE6V2YJP3CgLeENd53JZLExetb6

psychocrypt pushed a commit to psychocrypt/xmr-stak that referenced this pull request Mar 7, 2019
Added support of CryptoNight v8 Reverse Waltz (named cryptonight_v8_reversewaltz here) - equal to CryptoNight v8 but with 3/4 iterations of CryptoNight v8 and with reversed shuffle operation

We plan to use CryptoNight v8 Reverse Waltz as new PoW algorithm for Graft (graft-project/GraftNetwork#234).
psychocrypt added a commit to psychocrypt/xmr-stak that referenced this pull request Mar 7, 2019
rebased version of fireice-uk#2261
psychocrypt pushed a commit to psychocrypt/xmr-stak that referenced this pull request Mar 7, 2019
rebased version of fireice-uk#2261
Dead2 pushed a commit to Dead2/CryptoGoblin that referenced this pull request Mar 9, 2019
rebased version of #2261
gnagel pushed a commit to gnagel/xmr-stak that referenced this pull request Mar 23, 2019
rebased version of fireice-uk#2261