
Minimale Viable Tx Pool #898

Merged
merged 3 commits into ethereum:master from christoph/feat/tx-pool on Jun 21, 2018

Conversation

cburgdorf
Contributor

@cburgdorf cburgdorf commented Jun 12, 2018

What was wrong?

As specified in #866, we need a minimal viable tx pool that may not work as a proper pool at first but will at least relay transactions to peers. This PR serves as a stub to signal that I'm on it.

How was it fixed?

Implemented a minimal viable transaction pool that simply relays transactions among connected peers.
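At a high level the relay logic looks roughly like this (a condensed sketch stitched together from the diff excerpts and review comments below; the method name _handle_tx and the helper _add_txs_to_bloom are illustrative, not necessarily what the PR uses):

from typing import List, cast

async def _handle_tx(self, peer: ETHPeer, txs: List[BaseTransactionFields]) -> None:
    # Remember that the sending peer has seen these txs...
    self._add_txs_to_bloom(peer, txs)
    # ...then relay them to every other connected peer that hasn't.
    for receiving_peer in self._peer_pool.peers:
        receiving_peer = cast(ETHPeer, receiving_peer)
        if receiving_peer is peer:
            continue
        filtered_txs = self._filter_tx_for_peer(receiving_peer, txs)
        if filtered_txs:
            receiving_peer.sub_proto.send_transactions(filtered_txs)
            self._add_txs_to_bloom(receiving_peer, filtered_txs)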

Cute Animal Picture

put a cute animal picture link inside the parentheses

@@ -23,6 +26,9 @@ def __init__(self, chain_config: ChainConfig) -> None:
self._node_key = chain_config.nodekey
self._node_port = chain_config.port
self._max_peers = chain_config.max_peers
self._tx_pool = TxPool()
self.add_service(self._tx_pool)
Contributor
Shouldn't the light node also propagate the transaction pool? If so, this can go in Node instead of FullNode.
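A sketch of what that move could look like, assuming Node is the shared base class (the field assignments are copied from the diff above; the base class name is illustrative):

class Node(BaseService):  # hypothetical base; whatever Node already derives from
    def __init__(self, chain_config: ChainConfig) -> None:
        super().__init__()
        self._node_key = chain_config.nodekey
        self._node_port = chain_config.port
        self._max_peers = chain_config.max_peers
        # Wired up here, both full and light nodes would run the pool.
        self._tx_pool = TxPool()
        self.add_service(self._tx_pool)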

Contributor Author
@cburgdorf cburgdorf Jun 13, 2018

That was, in fact, one of the questions I had in mind. I wasn't sure whether a light node relays transactions the same way a full node does. Thanks for jumping in to clarify!

@cburgdorf cburgdorf force-pushed the christoph/feat/tx-pool branch 2 times, most recently from 6dac181 to 82efd80 Compare June 14, 2018 12:45
receiving_peer = cast(ETHPeer, receiving_peer)
if receiving_peer is not peer:
    receiving_peer.sub_proto.send_transactions(
        self._filter_tx_for_peer(receiving_peer, msg)
    )
Member

The filtering should probably happen first, and then only send if the result of filtering is not an empty list.
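Concretely, something like this (reusing the names from the snippet above):

filtered_txs = self._filter_tx_for_peer(receiving_peer, msg)
if filtered_txs:
    receiving_peer.sub_proto.send_transactions(filtered_txs)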

Contributor Author

Yes, good catch. I plan to cover all of that with tests as well.

@cburgdorf cburgdorf force-pushed the christoph/feat/tx-pool branch 5 times, most recently from 3b14e4e to 9c7b43d Compare June 18, 2018 15:51
@property
def hash(self) -> bytes:
return keccak(rlp.encode(self))


class BaseTransaction(BaseTransactionCommonFields, BaseTransactionCommonMethods):
Member

Not for this PR but I think we could drop the Common from these names.

p2p/rlp.py Outdated
from evm.rlp.transactions import BaseTransaction

# from eth_typing import (
# Address
Member

why commented out?

Contributor Author

Refactoring leftover. Will fix.

p2p/eth.py Outdated
@@ -40,7 +41,7 @@ class NewBlockHashes(Command):

class Transactions(Command):
_cmd_id = 2
-    structure = sedes.CountableList(P2PTransaction)
+    structure = sedes.CountableList(BaseTransactionCommonFields)
Member

Would you be willing to extract these changes to their own PR? They seem like they can stand on their own.

Contributor Author

They are related, because I need hash() on the P2PTransaction, and getting rid of P2PTransaction altogether seems like a good thing to me. But I'm happy to move that into a separate commit to make the core of the tx pool change stand out better.

return [val for val in txs if self._construct_bloom_entry(peer, val) not in self._bloom]

def _construct_bloom_entry(self, peer: BasePeer, tx: BaseTransactionCommonFields) -> bytes:
return "{!r}-{}".format(peer.remote, tx.hash).encode()
Member

This should have a localized salt of some sort; otherwise it will be possible for peers to craft bloom filter collisions. Not a big deal in terms of security, but still something I think we should mitigate. The pool can just generate a single random value on instantiation that it appends to this message.
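A sketch of that mitigation (the salt attribute name is illustrative):

import uuid

# in __init__, one random value per pool instance:
self._bloom_salt = str(uuid.uuid4())

# then mixed into every bloom entry, so peers can't precompute collisions:
def _construct_bloom_entry(self, peer: BasePeer, tx: BaseTransactionCommonFields) -> bytes:
    return "{!r}-{}-{}".format(peer.remote, tx.hash, self._bloom_salt).encode()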

def __init__(self, peer_pool: PeerPool) -> None:
super().__init__()
self._peer_pool = peer_pool
self._bloom = BloomFilter()
Member

Can we be explicit about the bloom filter parameters, and add a comment here documenting what our collision rates are likely to be with the given parameters?
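For example (the concrete numbers here are illustrative, not from this PR):

from bloom_filter import BloomFilter

# Sized for ~1M distinct (peer, tx) entries; at that capacity roughly
# 0.1% of lookups are false positives, i.e. a tx is wrongly assumed to
# be known to a peer and not relayed to it.
self._bloom = BloomFilter(max_elements=1000000, error_rate=0.001)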

Contributor Author

what our collision rates are likely to be with the given parameters

¯\_(ツ)_/¯

MockPeerPoolWithConnectedPeers,
)

# TODO: This whole file should live in the trinity tests but can't
Contributor Author

@pipermerriam any advice on how to deal with this? My limited Python fu tells me the problem is that most of the directories within tests aren't actually Python packages (no __init__.py), which is why I can't import across them (..relative.lookup also doesn't work).

But I read somewhere that NOT making the test directories proper Python packages is actually the right way to handle them, because of some pytest madness. Any advice on how to handle this?

Member

Maybe relocate peer_helpers to something like p2p.utils.testing.peer_helpers. That would cause it to be shipped with the actual p2p module, but I don't see that as a big deal as long as the overhead remains small (i.e. not including large fixture files or the like).
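Tests would then import the helpers from the installed package, e.g. (assuming MockPeerPoolWithConnectedPeers lives in peer_helpers, as the test excerpt below suggests):

from p2p.utils.testing.peer_helpers import MockPeerPoolWithConnectedPeers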

Contributor Author

Mmh... not terribly excited about that 😅 I dug a bit further and noticed that tests already had an __init__.py but tests/p2p did not. So I tried what would happen if we make that a package too, and this seems to work: 8827809

Some things are still failing, but I think that could be worked out. Maybe we leave this out of this PR, though, and I address it in a follow-up? See the layout sketch below.
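For reference, the layout being described (reconstructed from the comments above):

tests/
    __init__.py        # was already present
    p2p/
        __init__.py    # newly added in 8827809, makes tests/p2p importable
        ...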

Contributor Author

I slightly changed that comment and plan to address the test refactoring in a follow-up PR, so as not to grow this PR with more unrelated changes.

@cburgdorf cburgdorf force-pushed the christoph/feat/tx-pool branch 2 times, most recently from e55cf13 to 8fd4cfc Compare June 18, 2018 20:58
@cburgdorf cburgdorf changed the title from "WIP: Minimale Viable Tx Pool" to "Minimale Viable Tx Pool" on Jun 18, 2018
p2p/server.py Outdated
self.logger.info("Running server...")
upnp_dev = await self._discover_upnp_device()
async def _make_discovery(self,
upnp_device: Awaitable[upnpclient.Device]) -> DiscoveryProtocol:
Contributor

Passing in a single upnp device will not be sufficient for discovery. Take a look at the proposed refactor here, which loops over multiple devices: #905

Contributor Author
@cburgdorf cburgdorf Jun 19, 2018

Yep, that commit basically just tries to solve the problem of not having the PeerPool available at the instantiation of the Server. It's a small refactor that doesn't change the rest of the process. I can rebase in case #905 lands first.

p2p/server.py Outdated
await self._add_nat_portmap(upnp_dev)
external_ip = upnp_dev.WANIPConn1.GetExternalIPAddress()['NewExternalIPAddress']

addr = Address(external_ip, self.port, self.port)
Member

Just realized this will collide with our eth_typing.Address. Just pointing it out.

Contributor Author

Address is a pretty generic name so I expect it to exist in many different modules. That said, in this module only kademlia.Address is used. Do you want me to change that to kademlia.Address to make that more explicit?
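That change would look like this (a sketch; it assumes Address is importable from the p2p.kademlia module):

from p2p import kademlia

# qualifying the name leaves no room for a mixup with eth_typing.Address
addr = kademlia.Address(external_ip, self.port, self.port)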

Member

No, just wanted to point it out as we'll probably make the mistake eventually.

def __init__(self, peer_pool: PeerPool) -> None:
super().__init__()
self._peer_pool = peer_pool
self._bloom = BloomFilter(max_elements=10000)
Member

Some quick checks suggest 10,000 transactions is equivalent to about 90 blocks at current mainnet volume.

Let's bump this to something like 1 million, which looks like it will consume about 1 MB of memory. This also suggests that a rolling bloom filter (maybe per peer) is going to become a priority very quickly.
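As a sanity check on those numbers, the standard bloom filter sizing formula m = -n·ln(p) / (ln 2)² gives (our back-of-the-envelope, not from the thread; p is an assumed false-positive rate):

import math

n = 1000000  # expected entries
p = 0.01     # assumed false-positive rate
m_bits = -n * math.log(p) / math.log(2) ** 2
print(m_bits / 8 / 1e6)  # -> ~1.2 (MB), in line with the ~1 MB estimate above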

Contributor Author

Will bump it up to 1M. I'd also like to check what geth and Parity do, whether they use a rolling bloom filter or something else, and how exactly. But yeah, I can make that a high priority.
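One common rolling scheme, for reference, keeps two alternating filters: add to the current one, check both, and rotate when the current one fills up. A sketch (ours, not necessarily what geth or Parity actually do):

from bloom_filter import BloomFilter

class RollingBloom:
    def __init__(self, max_elements: int = 1000000, error_rate: float = 0.001) -> None:
        self._max_elements = max_elements
        self._error_rate = error_rate
        self._count = 0
        self._current = BloomFilter(max_elements=max_elements, error_rate=error_rate)
        self._previous = BloomFilter(max_elements=max_elements, error_rate=error_rate)

    def add(self, entry: bytes) -> None:
        self._current.add(entry)
        self._count += 1
        if self._count >= self._max_elements:
            # Rotate: recent entries stay queryable for one more generation
            # while memory stays bounded at two filters.
            self._previous = self._current
            self._current = BloomFilter(
                max_elements=self._max_elements, error_rate=self._error_rate)
            self._count = 0

    def __contains__(self, entry: bytes) -> bool:
        return entry in self._current or entry in self._previous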

@cburgdorf cburgdorf force-pushed the christoph/feat/tx-pool branch 3 times, most recently from a14972c to ebad10f Compare June 20, 2018 20:26
@cburgdorf
Contributor Author

@pipermerriam @gsalgado This is now rebased on top of the refactored PeerPool. I also bumped up the bloom filter size as suggested.


for receiving_peer in self._peer_pool.peers:
receiving_peer = cast(ETHPeer, receiving_peer)
if receiving_peer is not peer:
Member

I've found that in loops like this the following is easier to read. Thoughts?

for thing in many_things:
    if not some_required_condition(thing):
        continue

    do_for_loop_body_logic(thing)

# instead of
for thing in many_things:
    if some_required_condition(thing):
        do_for_loop_body_logic(thing)

Basically, rather than making the context deeper, try to keep it as shallow as possible by putting the early-exit conditions first and then keeping the main loop body at the shallow level.

For bodies with a single conditional this isn't a huge improvement, but when we get to having two conditional checks:

for thing in many_things:
    if not some_required_condition(thing):
        continue
    if some_forbidden_condition(thing):
        continue

    do_for_loop_body_logic(thing)

# instead of
for thing in many_things:
    if some_required_condition(thing):
        if not some_forbidden_condition(thing):
            do_for_loop_body_logic(thing)

Contributor Author
@cburgdorf cburgdorf Jun 21, 2018

I can live with that. I'm personally more used to something like this (Python pseudocode):

things.filter(lambda thing: thing.is_cool())
      .filter(lambda thing: thing.rocks())
      .filter(lambda thing: thing.does_it_kitten())
      .for_each(lambda thing: do_stuff(thing))

I find this far more readable, but I'm not sure if something similar exists. I guess with lambdas being restricted to single expressions in Python, this style probably doesn't work so well in this ecosystem.

Contributor Author

Oh, and I do agree that rightward drift is ugly and should be avoided. I'm just not that used to the loop-skipping style.

Member

You can accomplish this pretty easily with toolz/cytoolz but the result is far less readable.

import functools
import operator

from toolz import pipe

tuple(pipe(
    things,
    functools.partial(filter, operator.methodcaller('is_cool')),
    functools.partial(filter, operator.methodcaller('rocks')),
    functools.partial(filter, operator.methodcaller('does_it_kitten')),
    functools.partial(map, do_stuff)
))

I suspect there is a data pipeline library out there somewhere which allows for a smoother api.

Contributor Author

but the result is far less readable

Well...yes...that is.....GOD DAMN UGLY hahaha!

It's fine, it's the ecosystem I signed up for. If that's the price I pay for working on exciting stuff, I'm happy to pay it ;)

def _filter_tx_for_peer(
        self,
        peer: BasePeer,
        txs: List[BaseTransactionFields]) -> List[BaseTransactionFields]:

    return [val for val in txs if self._construct_bloom_entry(peer, val) not in self._bloom]
Member

nitpick

I find these easier to read as

return [
    val 
    for val 
    in txs 
    if self._construct_bloom_entry(peer, val) not in self._bloom
]
# or if you're more into brevity

return [
    val for val in txs 
    if self._construct_bloom_entry(peer, val) not in self._bloom
]

Contributor Author

Yep, same here, I'm more used to

txs.filter(lambda tx: self._construct_bloom_entry(peer, tx) not in self._bloom)

I like the second of your suggestions more than the first.

@cburgdorf cburgdorf merged commit 82022a7 into ethereum:master Jun 21, 2018