Remove tx forwarding #22
Conversation
Enables support for bech32 and p2sh addresses. Bech32 support is defined by HUMAN_READABLE_PART and P2sh support is defined by SEGWIT_ADDRESS_VERSION. Cherry-picked from d2abd263b62043427e1691d49558cb0781c6193d Fixed up for Bitcoin and Litecoin. -Robert LeBlanc <robert@leblancnet.us>
and Bech32 addresses.
the pubkey hash doesn't have enough information to accurately reconstruct the address. This isn't complete, but at least isn't completely failing. Still need to run tests for the different coins and make sure that upgrading works fine.
You'll need to manually resolve the merge conflict in bitcoincash_testnet.py. |
I don't see any differences here compared to p2pool-fixes. Perhaps you forgot to git push? |
c574dbb to 25bb251
I pushed, but I forgot to commit! Then I moved branches to rebase everything and had to go back to the snapshot I took overnight. It should all be there now. rldleblanc@25bb251 has the isolated changes for removing the TXs. I had two nodes run all night with a miner on each and there were no problems. I'm sure I'm missing something (the code is quite ... complex), so another set of eyes would be helpful. |
I told Jason Cox (@jasonbcox maybe?) that p2pool needs to be able to reserve the coinbase size with each GBT request, and I don't think he believed me. He tried to say that it could be done as a command-line option instead of a RPC argument. Grumble grumble. IIRC, the tx unhexlifying/hashing code is mostly C library calls anyway, so it probably won't be much different in PyPy. When I added caching of txids to helper.py:getwork(...) (1161221#diff-8fb6114f1943345da603ca32f0fcd78eR81), I noticed that CPython was only about 2x slower than PyPy. If it's taking 60 ms for a 1 MB block, that would be about 1.8 sec for a 32 MB block on a cold cache, or 240 ms on a warm cache. That might be tolerable, especially if PyPy cuts that down another 50%, as long as there aren't many other places in the code that are O(n) or O(n log n) vs block size. Above 32 MB will probably require more improvements. The first $2k of the bounty has been sent as a milestone payment. |
@kr1z1s I replied to that email last night. tl;dr: it's just an issue with how hashrate is reported when multiple nodes are mining to the same address. It's only a UI issue. That user does actually have 22 PH/s of hashrate among all of his nodes. |
The thing about passing the coinbase size on each request is that as the number of payouts changes, we can get as full a block as possible instead of having to reserve the max size just in case. When we change the payout scheme (more like ckpool; I expect to get to this), we will have a smaller coinbase and more room for transactions. If the default is left at something like 4 TXs, I'm sure most people won't bother with it (I think P2pool and ckpool are the only ones with many payout addresses). Unfortunately, I could not get away from the O(n) for calculate_merkle_link, but I got rid of all the lambda expressions for the really small operations (lambda is not the most efficient). PyPy may get some improvement, but I'm not sure how much. I expect to get about the same improvement with merkle_hash. I do know that calculate_merkle_link is called twice back to back, and that many of the calls after that have either the same number of TXs or a slight increase. I'm hoping the new TXs are only tacked onto the end so that we get a lot of cache hits; if so, the improvement will be significant overall. I don't know about merkle_hash yet; I've only identified it as an area of improvement and hope to have a better idea tomorrow. I'm not sure about the third area I identified; I'm not expecting huge improvements, but I haven't looked in great detail yet. Thanks for the bounty! |
I've got a lot of thoughts on payout schemes. I'll put some of them into separate issues when I get a chance. Quick version of one idea: I'd like to add an option for people to "buy" the right to a p2pool payout from miners. If you are owed 1 BCH per block for the next 1.6 blocks (expected value), you might be willing to sell me those possible future payouts for 1.58 BCH (e.g. a 2% fee, PPS-equivalent). This would allow the elimination of block variance from p2pool. calculate_merkle_link is necessarily O(n log n) as long as the input is a list of transactions. Can't avoid that. But that's the only place that really needs to be n log n. Parsing the GBT results can be O(n), and everything else can be O(log n) or better. But just because it can be O(log n) doesn't mean that the current code works that way.
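For anyone following along, here is a minimal generic sketch of what a merkle-link calculation looks like: hash sibling pairs level by level and record the sibling of the node you're tracking. This is an illustration only, not p2pool's actual calculate_merkle_link; the function and variable names are made up.

```python
import hashlib

def dbl_sha256(data):
    # Bitcoin-style double SHA256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_link(hashes, index):
    """Return the sibling hashes needed to connect hashes[index] (usually the
    coinbase at index 0) to the merkle root. Generic sketch, not p2pool code."""
    branch = []
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])      # Bitcoin duplicates the last hash on odd-sized levels
        branch.append(level[index ^ 1])  # sibling of the node we're tracking
        level = [dbl_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        index //= 2
    return branch
```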
That definitely does not happen with CTOR. It also does not usually happen with non-BCH coins. Usually (non-BCH), transactions are roughly sorted by fee/kB, not by arrival time. However, if two transactions were inserted prior to the node you're calculating, then the bottom level of the merkle tree will still be in your cache; if displaced by 4 transactions, then the bottom two levels will be in your cache. So if you calculate the merkle tree, then insert 1 transaction, recalculate the merkle tree, then insert 1 more transaction, and recalculate, 49% of the third recalculation will be cached, 49.9% of the 4th, and 74.9% of the 5th, 6th, 7th, 8th, and 87.4% of the 9th, etc. By the way, using a python dictionary to memoize some_slow_function(large_input) works pretty well. Python's default non-cryptographic hash function is pretty fast, and the hashtable implementation is also pretty fast, especially relative to serialization, hex decoding, and SHA256ing. This is probably what you're doing, but if it's not, feel free to try it. Something else that ought to be done: p2pool's transaction objects need to either keep a cached copy of the serialized format and length, or they need to never be unpacked at all. (I can't think of any reason why p2pool would need to do this (https://github.com/jtoomim/p2pool/blob/1mb_segwit/p2pool/bitcoin/data.py#L123) except with the coinbase transaction, to be honest. And yet, p2pool does it with every transaction. Sigh.) If you do |
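On the memoization suggestion, a minimal sketch of the dictionary approach described above. Here some_slow_function is a stand-in for whatever expensive serialize/hash step is being cached, and the size cap is an arbitrary assumption, not anything in p2pool:

```python
import hashlib

def some_slow_function(large_input):
    # stand-in for the real expensive call (e.g. serialize + double SHA256)
    return hashlib.sha256(hashlib.sha256(large_input).digest()).digest()

_cache = {}
_MAX_ENTRIES = 10000  # arbitrary cap so the cache can't grow without bound

def memoized_slow_function(large_input):
    # The dict keys on large_input itself; Python's built-in hash() and
    # hashtable lookups are cheap relative to serialization, hex decoding,
    # or SHA256. large_input must be hashable (str/bytes/tuple, not a list).
    if large_input not in _cache:
        if len(_cache) >= _MAX_ENTRIES:
            _cache.clear()  # crude eviction; an LRU would be smarter
        _cache[large_input] = some_slow_function(large_input)
    return _cache[large_input]
```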
I reworked some code to just do the packing/unpacking manually and I'm getting good results without the cache now. I'm packing into a struct only once and saving that in the merkle tree and unpacking only the result, so much less time spent doing unnecessary packing/unpacking. I'm sure the whole code base could use some of that optimization, but let's start where it hurts the most. With LRU cache (CPython):
Without LRU cache (CPython):
|
Execution time. Num TXs: 2837
Old calculate_merkle_link run #0: 124.855 ms
Old calculate_merkle_link run #1: 111.246 ms
Old calculate_merkle_link run #2: 108.213 ms
Old calculate_merkle_link avg: 114.771 ms
New calculate_merkle_link run #0: 13.263 ms
New calculate_merkle_link run #1: 12.961 ms
New calculate_merkle_link run #2: 13.519 ms
New calculate_merkle_link avg: 13.248 ms
If you start an isolated node on the Bitcoin network, then it starts out with share version 17, but there are segwit TXs which need a newer version. But it can't work its way up to the new version because of the assert. I just commented out all the old share versions and have it start with share version 34 for my testing. p2pool/data.py
|
I know that. Of course I know about it. |
Oh, I see, you are mining with a segwit address. Yes, until the shares are upgraded to v34, the segwit mining address does not work as the share does not have the right information to support it. Any shares mined will go to the pool operator address until v34 becomes active, then the shares will slowly transition over. |
But voting should work. |
What coin are you mining? I'm only running this code on my Litecoin node and I only have ~1% of the hashrate, which is far from the 90% needed to transition to the new share version. I don't think jtoomim is running any of the code. So unless you have >90% of the hashrate, I wouldn't expect the shares to upgrade any time soon. |
I tried on bitcoin |
So, my code will follow the voting, and with jtoomim not having released the code yet, you may be the only one voting for v34. I think that after I get these last performance issues nailed down, jtoomim may be more willing to release a new version. |
@jtoomim One chunk of code is packing all the transactions to figure out the length of each transaction to make sure that we aren't including too many transactions. Doesn't the transaction weight give us the same info? If we can use weight to calculate block size, then I should be able to chop off a good 50-60% of the time in generate_transaction (i.e. 241ms out of a total run of 420ms). If we can use weights, then maybe it would be time to retire the old getmemorypool code since I don't think it includes weight. |
@rldleblanc Weight and size are different. Weight is 4x of the non-witness data size plus the witness data size. Size is just the non-witness data size. Without knowing how much of a transaction is witness data, you can't convert between weight and size. The two are not interchangeable, and if you change that code, BTC p2pool will probably either start generating invalid blocks or making blocks that don't include as many transactions as possible, or both. Calculating size for non-Segwit transactions is easy. You just take the hex-encoded string you get from the getblocktemplate RPC (the "data" field), and divide the length of that string by 2. There's no need to deserialize into a bitcoind_data.tx_type object at all if all you want is the size. P2pool currently unpacks transactions it gets from getblocktemplate, and then repacks them for nearly every place that p2pool uses those transactions. I propose not unpacking them at all unless explicitly needed. As getblocktemplate calls return the weight of the transaction too, we can just use that number, and don't need to do any unpacking for that either. Bitcoin Core now returns both the txid and the hash in getblocktemplate results. The hash should allow us to avoid doing any SHA256 as long as it's provided. Bitcoin Unlimited provides the hash field, but not the txid. I don't know if anything fails to provide the hash field; I suspect not. All that we will need to do is hex->binary decoding of the hash (and possibly the tx itself), and merkle tree calculation. Doing this will require changing the p2pool/bitcoin/helper.py:getwork() function to return a different data format for the "transactions" key instead of unpacked_transactions, and will also require modifying end-users of that data accordingly. I'm not saying this change has to be done before this PR is viable; I'm just saying that if we want to be getting rid of unnecessary operations on transaction data from the p2pool code, we should start at the very beginning of the processing pipeline. |
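To make that concrete, here is a rough sketch of working directly from a getblocktemplate "transactions" entry instead of deserializing it. The field names follow Bitcoin Core's getblocktemplate output; whether "weight", "txid", and "hash" are present varies by coin daemon, so treat their availability as an assumption:

```python
import binascii

def tx_summary(tx_entry):
    """Pull size, weight, and hash out of one getblocktemplate 'transactions'
    entry without deserializing the transaction. Sketch only, not p2pool code."""
    raw_hex = tx_entry['data']          # hex-encoded serialized transaction
    size = len(raw_hex) // 2            # two hex characters per byte
    weight = tx_entry.get('weight')     # only present on segwit-capable daemons
    # Core's 'hash' field is the witness hash; fall back to 'txid' if it's missing
    tx_hash = tx_entry.get('hash', tx_entry.get('txid'))
    raw = binascii.unhexlify(raw_hex)   # binary form, only needed when assembling a block
    return {'size': size, 'weight': weight, 'hash': tx_hash, 'raw': raw}
```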
@jtoomim I understand that weight and size are not interchangeable, but from what I can tell in the code, we just want to make sure we are not creating blocks that are too big, and we can do that with weight alone, right? From what I understand, we can also determine whether a TX is a segwit TX just by comparing the 'hash' and 'txid' fields from getblocktemplate (if they are the same, it is non-segwit) instead of unpacking the whole thing to check. If a coin daemon only provides one, can we assume that segwit is not supported, as in the case of BCH? The whole packing class is pretty cool, but I can't think of a way of caching it (can't find anything solid enough to create a hash on), and we are constantly packing and unpacking even just to get the size since there is no state in those objects. I can't think of any reason we want to look at the guts of a TX. I'd like to make P2pool dumber in this sense and rip out all that stuff that's not used (to my knowledge). I think it is the only way we are going to get the performance. I feel like I've been through most of the code enough now that I have a pretty good grasp on things. Do you want to try to really clean things out (what we talked about above, plus removing all the different share version code) and basically relaunch P2pool? I'd also like to get it moved to Python3 just so that we aren't stuck on a dying branch of Python. Let me know what you think, I like taking sledgehammers to code! :) |
Not all coin daemons will provide weight information in getblocktemplate results. You basically only get the "weight" on coin daemons that support segwit. On those platforms, yes, a weight check alone is sufficient. ABC provides both "txid" and "hash" even though it does not support segwit. Yeah, the packing code is a pretty example of recursive object-oriented inheritance code that we just don't need to use for transactions. It will still be used for share objects and the coinbase, though. I'm definitely game for a sledgehammer/relaunch strategy. P2pool is definitely too smart by a half. Python3 support would be fine as long as it doesn't break pypy/twisted/zope.interface dependencies. You should probably check that those will work together on pypy3 before doing any code conversion. |
Well, if segwit-capable daemons always provide weight, then it's still easy to calculate a weight when there is no weight field, because in that case all transactions would be non-segwit: we can just multiply the size by 4 and set our own weights so the rest of the code only deals with weight (see the sketch below). Does that sound right?
In that case txid and hash are always equal, so it still passes the test easily.
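Roughly what I have in mind, continuing the sketch above. The key assumption (mine, not confirmed anywhere) is that a missing weight field means the daemon doesn't support segwit, so every byte is non-witness and weight = 4 × size:

```python
def tx_weight(tx_entry):
    # Use the daemon-supplied weight if present; otherwise assume no segwit
    # support, in which case weight is exactly 4x the serialized size.
    if 'weight' in tx_entry:
        return tx_entry['weight']
    return 4 * (len(tx_entry['data']) // 2)

def is_segwit(tx_entry):
    # A transaction carries witness data iff its wtxid ('hash') differs from
    # its txid. If the daemon only reports one of the two, assume non-segwit.
    txid = tx_entry.get('txid')
    wtxid = tx_entry.get('hash')
    return txid is not None and wtxid is not None and txid != wtxid
```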
I was thinking about this today: do we really need it for the share objects? I was thinking of moving to JSON (as long as we are keeping things in hex); then we don't have to do all the versioning stuff when we need to add something new. The coinbase probably still needs it, or we just write a much simpler packer, since it would only be needed for the coinbase.
I feel it is really the only way to get the big wins we want. The P2pool code has been good, but as Forrest said, it was only a proof of concept.
I was thinking about moving from twisted to asyncio, but that may just be too much, as most of the rewrite I think we need doesn't touch much of the twisted code. For the rewrite, I'd prefer to break things up into bite-sized chunks and write tests for them so I know when things are breaking. I don't see many compatibility issues with Python3, but the devil is always in the details. |
JSON is not very space-efficient. cPickle might be a decent choice, with the only major disadvantage that it's Python-only. Protobuf could also work. Yeah, I think asyncio is a mostly orthogonal change and shouldn't be lumped in with this. Versioning will be needed whenever we change the rules determining share validity. I don't think JSON or protobuf or cPickle or other extensible serialization formats will save us that problem. |
I was thinking for peer-to-peer communication, but we need storage as well. If space is an issue, what about running it through lzo? It is simple, widely compatible, and super fast. It would be easy enough to inspect the share files and p2p communication, and it would only be slightly less space efficient than the binary packing (it may actually be even more space efficient, because we are packing a bunch of '0's). I much prefer to use something that can be easily debugged by a human if needed and is widely compatible, if that can be accomplished for only a very slight expense. Right now 3,000 BTC shares use about 40MB on disk (with the new share version); if JSON is 4x that, it still doesn't seem bad.
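Just to get a feel for the tradeoff, a quick sketch that round-trips a share-like dict through JSON and compresses it. zlib stands in for LZO here because it's in the standard library, and the share fields are made up for illustration:

```python
import json
import zlib

# Made-up share-like payload; hex fields padded the way real shares tend to be
share = {
    'version': 34,
    'prev_share_hash': '00' * 32,
    'coinbase': '01' * 120,
    'nonce': 12345678,
    'pubkey_hash': 'ab' * 20,
}

raw_json = json.dumps(share, separators=(',', ':')).encode('utf-8')
compressed = zlib.compress(raw_json, 9)
print('json bytes:', len(raw_json))
print('compressed bytes:', len(compressed))

# Round-trip back for human-readable debugging
restored = json.loads(zlib.decompress(compressed).decode('utf-8'))
assert restored['version'] == 34
```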
Yes, versioning is still needed; what it gets rid of is the nasty binary splicing stuff we are doing now when we need to add/remove something from the share. It would just be a simple matter of adding/removing from a dict. Checking is just as easy. |
I was thinking cPickle for p2p as well as storage. Space is more of an issue with p2p than with storage. We don't want to be sending 2x as much data in the latency-critical code path if we can avoid it. Keep in mind that each node may have 10 or 20 peers. If a share object is 50 kB serialized in JSON, that means 500 kB or 1,000 kB of traffic to upload that share. On a 10 Mbps pipe, that can take about a second. Not good. cPickle is fast, efficient, and reliable. I think it's quite unlikely that we'll ever need to debug the serialization itself. cPickle supports recursive objects and all sorts of Python-specific stuff, so serializing our shares should just be a matter of doing cPickle.dumps(share). |
I understand the size issue with communicating with peers. Is cPickle stable between machines of different endianness and different versions of Python? I know pickle had problems in the past. I'll run some tests, see how things look, and share the results. I'd like to design things with at least two threads (real threads), where one is dedicated to time-sensitive things and the other to longer-running processes (work generation, peer communication, share validation, and block submission). Not sure how to make it all work yet so the GIL doesn't have to be shared, but I'm still thinking about it. |
Okay, maybe pickle isn't a good choice. https://www.smartfile.com/blog/python-pickle-security-problems-and-solutions/ Perhaps prototyping in JSON, then switching to Protobuf later would be the best choice. Multiple threads with the GIL in place still can make sense. We're sensitive to latency, but generally not throughput. As long as long-running slow tasks don't block the latency-critical stuff, we should be fine. If there are any tasks where throughput might matter, they need to be done either in a subprocess or in a C module that releases the GIL. But I doubt that will be necessary. That said, most of the longer-running things you listed are latency-sensitive. Validation can be a background process, though. |
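A minimal sketch of that split using plain stdlib threading, with placeholder task names (not p2pool functions): slow work goes on a queue serviced by a daemon thread, so the latency-critical path never waits on it. Anything genuinely throughput-bound would still want a subprocess or a C module that releases the GIL.

```python
import threading
import queue  # this module is named Queue on Python 2

background_jobs = queue.Queue()

def full_validation(share):
    # placeholder for a slow, non-latency-critical task
    pass

def background_worker():
    # drains the queue forever; runs in its own thread
    while True:
        share = background_jobs.get()
        try:
            full_validation(share)
        finally:
            background_jobs.task_done()

worker = threading.Thread(target=background_worker)
worker.daemon = True
worker.start()

def on_new_share(share):
    # latency-critical path: hand the slow work off and return immediately
    background_jobs.put(share)
```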
Hey @rldleblanc, I'd like to get some of this stuff merged in soon. Which of these PRs do you think are ready for merging? Note: someone recently started mining v34 shares on BCH p2pool, and I don't know if they're running #22 or #14 or something else. We may need to change to v35 just to be certain that everyone is running the same code. ctrl-f "Desired version rates": http://ml.toom.im:9348/static/graphs.html?Week |
It's probably not the Python3 code; I haven't had time to test and I haven't heard from anyone. It's most likely my other branch that has the updated address code, but it could be the code with the share confirmation taken out, as that was the same share version. I finally found a job and started a little over a month ago; it is requiring us to relocate, and between flying back and forth and trying to get the house ready to sell, I haven't had any time to work on the code. The Bay Area is too expensive to run my rigs, so I can only do GPU mining on testnet once I finally get moved out there in another two months. The last five months have been really crazy in my personal life. |
I manually merged this into jtoomim/p2pool master branch. I will be leaving the 1mb_segwit branch on v33 in case people need to downgrade for some reason. Thanks a lot for writing this. |
These are the basics for removing the transaction code from the share chain. It is built on my previous work for the cashaddr/segwit mining addresses, and both will be activated with share version 34. I still have some fixes on the web interface side for the cashaddr/segwit updates. If you want to see only the TX removal changes, compare the remove_tx_forwarding branch against my p2pool_fixes branch. This is for issue #19.