
Remove tx forwarding #22

Closed
wants to merge 31 commits

Conversation

rldleblanc

This is the basic work for removing the transaction code from the share chain. It is built on my previous work for the cashaddr/segwit mining addresses, and both will be activated with share version 34. I still have some fixes to do on the web interface side for the cashaddr/segwit updates. If you only want to see the TX-removal changes, compare the remove_tx_forwarding branch against my p2pool_fixes branch. This is for issue #19.

Robert LeBlanc and others added 12 commits September 20, 2018 15:02
Enables support for bech32 and p2sh addresses.
Bech32 support is defined by HUMAN_READABLE_PART and P2sh support is
defined by SEGWIT_ADDRESS_VERSION.

Cherry-picked from d2abd263b62043427e1691d49558cb0781c6193d
Fixed up for Bitcoin and Litecoin.
-Robert LeBlanc <robert@leblancnet.us>
the pubkey hash doesn't have enough information to accurately
reconstruct the address.

This isn't complete, but at least isn't completely failing. Still need
to run test for the different coins and make sure that upgrading works
fine.
@jtoomim
Owner

jtoomim commented Dec 28, 2018

git fetch origin; git rebase origin/1mb_segwit

You'll need to manually resolve the merge conflict in bitcoincash_testnet.py.

@jtoomim
Owner

jtoomim commented Dec 28, 2018

I don't see any differences here compared to p2pool-fixes. Perhaps you forgot to git push?

Robert LeBlanc and others added 13 commits December 28, 2018 10:39
@rldleblanc
Author

I pushed, but I forgot to commit! Then I moved branches to rebase everything and had to go back to the snapshot I took overnight. It should all be there now; rldleblanc@25bb251 is the isolated change set for removing the TXs. I had two nodes run all night with a miner on each and there were no problems. I'm sure I'm missing something (the code is quite ... complex), so another set of eyes would be helpful.

@jtoomim
Owner

jtoomim commented Jan 10, 2019

I told Jason Cox (@jasonbcox maybe?) that p2pool needs to be able to reserve the coinbase size with each GBT request, and I don't think he believed me. He tried to say that it could be done as a command-line option instead of an RPC argument. Grumble grumble.

IIRC, the tx unhexlifying/hashing code is mostly C library calls anyway, so it probably won't be much different in PyPy. When I added caching of txids to helper.py:getwork(...) (1161221#diff-8fb6114f1943345da603ca32f0fcd78eR81), I noticed that CPython was only about 2x slower than PyPy.

If it's taking 60 ms for a 1 MB block, that would be about 1.8 sec for a 32 MB block on a cold cache, or 240 ms on a warm cache. That might be tolerable, especially if PyPy cuts that down another 50%, as long as there aren't many other places in the code that are O(n) or O(n log n) vs block size. Above 32 MB will probably require more improvements.

The first $2k of the bounty has been sent as a milestone payment.

@jtoomim
Owner

jtoomim commented Jan 10, 2019

@kr1z1s I replied to that email last night. tl;dr: it's just an issue with how hashrate is reported when multiple nodes are mining to the same address. It's only a UI issue. That user does actually have 22 PH/s of hashrate among all of his nodes.

@rldleblanc
Author

The thing about passing the coinbase size on each request is that, as the number of payouts changes, we can make the block as full as possible instead of having to reserve the maximum size just in case. When we change the payout scheme (more like ckpool's), which I expect to get to, we will have a smaller coinbase and more room for transactions. If the default only reserves room for something like 4 TXs, I'm sure most people won't bother changing it (I think P2pool and ckpool are the only pools with many payout addresses).

Unfortunately, I could not get away from the O(n) cost of calculate_merkle_link, but I got rid of all the lambda expressions used for those tiny operations (lambdas are not the most efficient). PyPy may get some improvement, but I'm not sure how much. I expect about the same improvement potential with merkle_hash. I do know that calculate_merkle_link is called twice back to back, and that most subsequent calls see either the same number of TXs or a slight increase. I'm hoping the new TXs are only tacked on the end so that we get a lot of cache hits; if so, the overall improvement will be significant. I don't know about merkle_hash yet; I've only identified it as an area of improvement and hope to have a better idea tomorrow. I'm also not sure about the third area I identified; I'm not expecting huge improvements there, but I haven't looked in great detail yet.
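
For reference, a lambda-free merkle-link calculation of the kind being described might look roughly like this (a minimal sketch, not the actual p2pool implementation; caching is omitted and the helper names are illustrative):

import hashlib

def dsha256(data):
    # Bitcoin-style double SHA-256 on raw bytes
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def calculate_merkle_link(leaf_hashes, index):
    """Return the sibling hashes linking leaf `index` to the merkle root.

    `leaf_hashes` is a list of 32-byte binary hashes; one pass per tree level,
    no lambdas, pairs hashed with a plain list comprehension."""
    branch = []
    level = list(leaf_hashes)
    pos = index
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last hash on odd levels
            level.append(level[-1])
        branch.append(level[pos ^ 1])      # the sibling at this level
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        pos //= 2
    return branch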

Thanks for the bounty!

@jtoomim
Owner

jtoomim commented Jan 10, 2019

I've got a lot of thoughts on payout schemes. I'll put some of them into separate issues when I get a chance. Quick version of one idea: I'd like to add an option for people to "buy" the right to a p2pool payout from miners. If you are owed 1 BCH per block for the next 1.6 blocks (expected value), you might be willing to sell me those possible future payouts for 1.58 BCH (e.g. a 2% fee, PPS-equivalent). This would allow the elimination of block variance from p2pool.

calculate_merkle_link is necessarily O(n log n) as long as the input is a list of transactions. Can't avoid that. But that's the only place that really needs to be n log n. Parsing the GBT results can be O(n), and everything else can be O(log n) or better. But just because it can be O(log n) doesn't mean that the current code works that way.

I'm hoping the new TXs are only tacked on the end so that we get a lot of cache hits.

That definitely does not happen with CTOR. It also does not usually happen with non-BCH coins. Usually (non-BCH), transactions are roughly sorted by fee/kB, not by arrival time.

However, if two transactions were inserted prior to the node you're calculating, then the bottom level of the merkle tree will still be in your cache; if displaced by 4 transactions, then the bottom two levels will be in your cache. So if you calculate the merkle tree, then insert 1 transaction, recalculate the merkle tree, then insert 1 more transaction, and recalculate, 49% of the third recalculation will be cached, 49.9% of the 4th, and 74.9% of the 5th, 6th, 7th, 8th, and 87.4% of the 9th, etc.

By the way, using a python dictionary to memoize some_slow_function(large_input) works pretty well. Python's default non-cryptographic hash function is pretty fast, and the hashtable implementation is also pretty fast, especially relative to serialization, hex decoding, and SHA256ing. This is probably what you're doing, but if it's not, feel free to try it.
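
For illustration, that kind of dictionary memoization might look like this (a hypothetical helper, not p2pool's actual code):

import binascii, hashlib

_tx_hash_cache = {}

def cached_tx_hash(tx_hex):
    """Memoize the hex-decode + double-SHA256 of a serialized transaction,
    keyed on the raw hex string; a dict lookup is cheap compared to the
    SHA256 work it avoids. (A real node would want to bound the cache.)"""
    cached = _tx_hash_cache.get(tx_hex)
    if cached is None:
        raw = binascii.unhexlify(tx_hex)
        cached = hashlib.sha256(hashlib.sha256(raw).digest()).digest()
        _tx_hash_cache[tx_hex] = cached
    return cached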

Something else that ought to be done: p2pool's transaction objects need to either keep a cached copy of the serialized format and length, or they need to never be unpacked at all. (I can't think of any reason why p2pool would need to do this (https://github.com/jtoomim/p2pool/blob/1mb_segwit/p2pool/bitcoin/data.py#L123) except with the coinbase transaction, to be honest. And yet, p2pool does it with every transaction. Sigh.) If you do grep -rn bitcoin_data.tx_type . in the p2pool source tree, there are a lot of matches that look sketchy, especially since veqtrus's SegWit code seems to have borked the packed_size caching in some circumstances.

@rldleblanc
Author

I reworked some code to just do the packing/unpacking manually and I'm getting good results without the cache now. I'm packing into a struct only once and saving that in the merkle tree and unpacking only the result, so much less time spent doing unnecessary packing/unpacking. I'm sure the whole code base could use some of that optimization, but let's start where it hurts the most.

With LRU cache (CPython):

Num TXs: 2837
Old calculate_merkle_link():         116.678 ms
New calculate_merkle_link() run #1:   21.111 ms
New calculate_merkle_link() run #2:   10.385 ms
New calculate_merkle_link() run #3:   10.561 ms

Without LRU cache (CPython):

Num TXs: 2837       
Old calculate_merkle_link():         116.379 ms
New calculate_merkle_link() run #1:   12.834 ms
New calculate_merkle_link() run #2:   13.318 ms                                                                  
New calculate_merkle_link() run #3:   13.225 ms

Execution time:

Num TXs: 2837
Old calculate_merkle_link run #0:  124.855 ms.
Old calculate_merkle_link run #1:  111.246 ms.
Old calculate_merkle_link run #2:  108.213 ms.
Old calculate_merkle_link    avg:  114.771 ms.

New calculate_merkle_link run #0:   13.263 ms.
New calculate_merkle_link run #1:   12.961 ms.
New calculate_merkle_link run #2:   13.519 ms.
New calculate_merkle_link    avg:   13.248 ms.
@rldleblanc
Author

rldleblanc commented Jan 11, 2019

If you start an isolated node on the Bitcoin network, it starts out with share version 17, but there are segwit TXs which need a newer version. It can't work its way up to the new version because of the assert. For my testing I just commented out all the old share versions and have it start with share version 34.

p2pool/data.py

-class SegwitMiningShare(BaseShare):
+class Share(BaseShare):
     VERSION = 34
     VOTING_VERSION = 34
     SUCCESSOR = None

-class NewShare(BaseShare):
-    VERSION = 33
-    VOTING_VERSION = 33
-    SUCCESSOR = SegwitMiningShare
-
-class PreSegwitShare(BaseShare):
-    VERSION = 32
-    VOTING_VERSION = 32
-    SUCCESSOR = SegwitMiningShare
-
-class Share(BaseShare):
-    VERSION = 17
-    VOTING_VERSION = 17
-    SUCCESSOR = SegwitMiningShare
+#class NewShare(BaseShare):
+#    VERSION = 33
+#    VOTING_VERSION = 33
+#    SUCCESSOR = SegwitMiningShare
+#
+#class PreSegwitShare(BaseShare):
+#    VERSION = 32
+#    VOTING_VERSION = 32
+#    SUCCESSOR = SegwitMiningShare
+#
+#class Share(BaseShare):
+#    VERSION = 17
+#    VOTING_VERSION = 17
+#    SUCCESSOR = SegwitMiningShare


-share_versions = {s.VERSION:s for s in [SegwitMiningShare, NewShare, PreSegwitShare, Share]}
+share_versions = {s.VERSION:s for s in [Share]}
+#share_versions = {s.VERSION:s for s in [SegwitMiningShare, NewShare, PreSegwitShare, Share]}

@kr1z1s

kr1z1s commented Jan 11, 2019

I know that. Of course I know about it.
I ran a node on the live network. Not isolated

@rldleblanc
Author

Oh, I see, you are mining with a segwit address. Yes, until the shares are upgraded to v34, a segwit mining address does not work, since the share does not carry the right information to support it. Any shares mined will pay out to the pool operator's address until v34 becomes active; then the shares will slowly transition over.

@kr1z1s

kr1z1s commented Jan 11, 2019

But voting should work.
If I use your code, the shares should be version 34.

@rldleblanc
Author

What coin are you mining? I'm only running this code on my Litecoin node, and I only have ~1% of the hashrate, which is far from the 90% needed to transition to the new share version. I don't think jtoomim is running any of this code. So unless you have >90% of the hashrate, I wouldn't expect the shares to upgrade any time soon.

@kr1z1s

kr1z1s commented Jan 11, 2019

I tried it on Bitcoin.

@rldleblanc
Author

So, my code will follow the voting, and with jtoomim not having released the code yet, you may be the only one voting for v34. I think after I get these last performance issues nailed down, jtoomim may be more willing to release a new version.

@rldleblanc
Author

@jtoomim One chunk of code is packing all the transactions just to figure out the length of each one, to make sure that we aren't including too many transactions. Doesn't the transaction weight give us the same info? If we can use weight to calculate block size, I should be able to chop off a good 50-60% of the time in generate_transaction (i.e. 241 ms out of a total run of 420 ms). If we can use weights, then maybe it's time to retire the old getmemorypool code, since I don't think it includes weight.

@jtoomim
Owner

jtoomim commented Jan 11, 2019

@rldleblanc Weight and size are different. Weight is 4x of the non-witness data size plus the witness data size. Size is just the non-witness data size. Without knowing how much of a transaction is witness data, you can't convert between weight and size. The two are not interchangeable, and if you change that code, BTC p2pool will probably either start generating invalid blocks or making blocks that don't include as many transactions as possible, or both.

Calculating size for non-segwit transactions is easy. You just take the hex-encoded string you get from the getblocktemplate RPC (the "data" field) and divide its length by 2. There's no need to deserialize into a bitcoin_data.tx_type object at all if all you want is the size. P2pool currently unpacks transactions it gets from getblocktemplate, and then repacks them for nearly every place that p2pool uses those transactions. I propose not unpacking them at all unless explicitly needed. Since getblocktemplate calls also return the weight of each transaction, we can just use that number and don't need to do any unpacking for that either.

Bitcoin Core now returns both the txid and the hash in getblocktemplate results. The hash should allow us to avoid doing any SHA256 as long as it's provided. Bitcoin Unlimited provides the hash field, but not the txid. I don't know if anything fails to provide the hash field; I suspect not. All that we will need to do is hex->binary decoding of the hash (and possibly the tx itself), and merkle tree calculation. Doing this will require changing the p2pool/bitcoin/helper.py:getwork() function to return a different data format for the "transactions" key instead of unpacked_transactions, and will also require modifying end-users of that data accordingly.
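
A rough sketch of consuming those getblocktemplate entries without deserializing anything (the helper name is made up; the field names are the GBT ones discussed above):

import binascii

def summarize_gbt_tx(gbt_tx):
    """Extract size, weight and binary hash from one entry of getblocktemplate's
    "transactions" array without unpacking the transaction itself."""
    # "data" is the hex-encoded serialized tx, so the size in bytes is half its length
    size = len(gbt_tx['data']) // 2
    # segwit-capable daemons report "weight" directly; no fallback assumed here
    weight = gbt_tx.get('weight')
    # "hash" (the wtxid) only needs hex->binary decoding, no SHA256
    tx_hash = binascii.unhexlify(gbt_tx['hash']) if 'hash' in gbt_tx else None
    return size, weight, tx_hash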

I'm not saying this change has to be done before this PR is viable; I'm just saying that if we want to be getting rid of unnecessary operations on transaction data from the p2pool code, we should start at the very beginning of the processing pipeline.

@rldleblanc
Author

@jtoomim I understand that weight and size are not interchangeable, but from what I can tell in the code, we just want to make sure we are not creating blocks too big and we can do that with weight alone, right?

From what I understand, we can also determine whether a TX is a segwit TX just by comparing the 'hash' and 'txid' fields from getblocktemplate (if they are the same, it is non-segwit), instead of unpacking the whole thing to check. If a coin daemon only provides one of them, can we assume that segwit is not supported, as in the case of BCH?

The whole packing class is pretty cool, but I can't think of a way of caching it (there's nothing solid enough to build a cache key on), and we are constantly packing and unpacking even just to get the size, since there is no state in those objects. I can't think of any reason we'd want to look at the guts of a TX. I'd like to make P2pool dumber in this sense and rip out all that stuff that's not used (to my knowledge). I think it's the only way we are going to get the performance.

I feel like I've been through most of the code enough now that I have a pretty good grasp on things. Do you want to try to clean things out properly (what we talked about above, plus removing all the different share-version code) and basically relaunch P2pool? I'd also like to get it moved to Python 3 just so that we aren't stuck on a dying branch of Python. Let me know what you think; I like taking sledgehammers to code! :)

@jtoomim
Owner

jtoomim commented Jan 11, 2019

Not all coin daemons will provide weight information in getblocktemplate results. You basically only get the "weight" on coin daemons that support segwit. On those platforms, yes, a weight check alone is sufficient.

ABC provides both "txid" and "hash" even though it does not support segwit.

Yeah, the packing code is a pretty example of recursive object-oriented inheritance code that we just don't need to use for transactions. It will still be used for share objects and the coinbase, though.

I'm definitely game for a sledgehammer/relaunch strategy. P2pool is definitely too smart by half.

Python3 support would be fine as long as it doesn't break pypy/twisted/zope.interface dependencies. You should probably check that those will work together on pypy3 before doing any code conversion.

@rldleblanc
Author

Not all coin daemons will provide weight information in getblocktemplate results. You basically only get the "weight" on coin daemons that support segwit. On those platforms, yes, a weight check alone is sufficient.

Well, if segwit daemons always provide weight, then the case where there is no weight is still easy, because all transactions would be non-segwit: we can just multiply the size by 4 and set our own weights, so that the rest of the code only deals with weight. Does that sound right?

ABC provides both "txid" and "hash" even though it does not support segwit.

In this case txid and hash are always equal, so it still passes the test easily.
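
Under those assumptions, the checks could be as simple as this sketch (illustrative helpers only):

def is_segwit_tx(gbt_tx):
    # A tx is segwit iff its wtxid (the "hash" field) differs from its "txid";
    # if the daemon reports only one of the two, assume segwit is unsupported
    # (as with BCH).
    txid, wtxid = gbt_tx.get('txid'), gbt_tx.get('hash')
    return txid is not None and wtxid is not None and txid != wtxid

def tx_weight(gbt_tx):
    # Prefer the daemon-reported weight; if the daemon doesn't provide one,
    # every byte is non-witness data, so weight is just 4x the serialized size.
    return gbt_tx.get('weight', 4 * (len(gbt_tx['data']) // 2))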

Yeah, the packing code is a pretty example of recursive object-oriented inheritance code that we just don't need to use for transactions. It will still be used for share objects and the coinbase, though.

I was thinking about this today: do we really need it for the share objects? I was thinking of moving to JSON (as long as we are keeping things in hex); then we don't have to do all the versioning stuff when we need to add something new. The coinbase probably still needs it, or we could just write a much simpler packer, since it would only be needed for the coinbase.

I'm definitely game for a sledgehammer/relaunch strategy. P2pool is definitely too smart by a half.

I feel it is really the only way to get the big wins we want. The P2pool code has been good, but as Forrest said, it was only a proof of concept.

Python3 support would be fine as long as it doesn't break pypy/twisted/zope.interface dependencies. You should probably check that those will work together on pypy3 before doing any code conversion.

I was thinking about moving from twisted to asyncio, but that may be too much, as most of the rewrite I think we need doesn't touch much of the twisted code. In rewriting the code, I'd prefer to break things up into bite-sized chunks and write tests for them so I know when things are breaking. I don't see many compatibility issues with Python 3, but the devil is always in the details.

@jtoomim
Owner

jtoomim commented Jan 11, 2019

JSON is not very space-efficient. cPickle might be a decent choice, with the only major disadvantage that it's Python-only. Protobuf could also work.

Yeah, I think asyncio is a mostly orthogonal change and shouldn't be lumped in with this.

Versioning will be needed whenever we change the rules determining share validity. I don't think JSON or protobuf or cPickle or other extensible serialization formats will save us that problem.

@rldleblanc
Author

JSON is not very space-efficient. cPickle might be a decent choice, with the only major disadvantage that it's Python-only. Protobuf could also work.

I was thinking of it for peer-to-peer communication, but we need it for storage as well. If space is an issue, what about running it through LZO? It is simple, widely compatible, and super fast. It would be easy enough to inspect the share files and p2p communication, and it would only be slightly less space-efficient than the binary packing (it may actually end up more space-efficient, because right now we are packing a bunch of '0's). I much prefer something that can easily be debugged by a human if needed and that is widely compatible, if that can be had for only a slight expense. Right now 3,000 BTC shares are using about 40 MB on disk (the new share version); even if JSON is 4x that, it still doesn't seem bad.

Versioning will be needed whenever we change the rules determining share validity. I don't think JSON or protobuf or cPickle or other extensible serialization formats will save us that problem.

Yes, versioning is still needed; what this gets rid of is the nasty binary splicing we are doing now whenever we need to add or remove something from a share. It becomes a simple matter of adding to or removing from a dict, and checking is just as easy: if x in share.... The versioning still enforces the rules; we just don't have to know the binary format in order to know which version the share is (a chicken-and-egg problem).
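
As a rough illustration of that dict-style approach (the field names are invented for the example):

import json

def parse_share(raw):
    """Parse a JSON-serialized share; the version field still gates the
    validity rules, but adding or removing a field is just a dict key,
    not binary splicing."""
    share = json.loads(raw)
    if share['version'] >= 34:
        assert 'payout_address' in share    # hypothetical v34-only field
    has_segwit = 'segwit_data' in share     # optional fields: "if x in share"
    return share, has_segwit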

@jtoomim
Owner

jtoomim commented Jan 11, 2019

I was thinking cPickle for p2p as well as storage. Space is more of an issue with p2p than with storage. We don't want to be sending 2x as much data in the latency-critical code path if we can avoid it. Keep in mind that each node may have 10 or 20 peers. If a share object is 50 kB serialized in JSON, that means 500 kB or 1,000 kB of traffic to upload that share. On a 10 Mbps pipe, that can take about a second. Not good.

cPickle is fast, efficient, and reliable. I think it's quite unlikely that we'll ever need to debug the serialization itself. cPickle supports recursive objects and all sorts of Python-specific stuff, so serializing our shares should just be a matter of doing cPickle.dumps(share).
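
For what it's worth, the size comparison is easy to measure; a quick sketch (pickle here stands in for Python 2's cPickle, and the share dict is made up):

import json, pickle, os

share = {'version': 34,
         'prev_hash': os.urandom(32).hex(),
         'merkle_link': [os.urandom(32).hex() for _ in range(12)]}

as_json = json.dumps(share).encode()
as_pickle = pickle.dumps(share, protocol=pickle.HIGHEST_PROTOCOL)
print(len(as_json), len(as_pickle))   # compare serialized sizes of the two formats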

@rldleblanc
Author

rldleblanc commented Jan 12, 2019 via email

@jtoomim
Owner

jtoomim commented Jan 12, 2019

Okay, maybe pickle isn't a good choice. https://www.smartfile.com/blog/python-pickle-security-problems-and-solutions/ Perhaps prototyping in JSON, then switching to Protobuf later would be the best choice.

Multiple threads with the GIL in place can still make sense. We're sensitive to latency, but generally not throughput. As long as long-running slow tasks don't block the latency-critical stuff, we should be fine. If there are any tasks where throughput does matter, they need to be done either in a subprocess or in a C module that releases the GIL, but I doubt that will be necessary.

That said, most of the longer-running things you listed are latency-sensitive. Validation can be a background process, though.

@jtoomim
Owner

jtoomim commented Jun 18, 2019

hey @rldleblanc I'd like to get some of this stuff merged in soon. Which of these PRs do you think are ready for merging?

Note: someone recently started mining v34 shares on BCH p2pool, and I don't know if they're running #22 or #14 or something else. We may need to change to v35 just to be certain that everyone is running the same code.

ctrl-f "Desired version rates":
http://ml.toom.im:9348/static/graphs.html?Week

@rldleblanc
Author

rldleblanc commented Jun 18, 2019 via email

@jtoomim
Owner

jtoomim commented Jul 1, 2019

I manually merged this into jtoomim/p2pool master branch. I will be leaving the 1mb_segwit branch on v33 in case people need to downgrade for some reason.

Thanks a lot for writing this.
