
chainfee: introduce filterManager and use it for fee floor #8418

Merged (2 commits, Mar 21, 2024)

Conversation

Crypt-iQ (Collaborator)

This commit introduces a filterManager that uses our outbound peers' feefilter values to determine an acceptable minimum feerate that ensures successful transaction propagation. Under the hood, a moving median is used as it is more resistant to shocks than a moving average (a minimal sketch of this approach follows the list below).

Mostly looking for an approach ACK on:

  • the choice of a moving median compared to a moving average
  • the choice of exclusively using outbound peers' feefilter to avoid manipulation by an inbound attacker. The downside is that we get fewer data points
  • the choice of constants (polling interval and the moving window) which will affect the overall result
    • a quicker polling interval means more up-to-date data but at the cost of taking up the window size with potentially stale data
    • a larger moving window would give the moving median more "memory", but may end up carrying irrelevant data
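
For illustration, a minimal sketch of the moving-median idea described above (hypothetical names, not the PR's actual code): each poll contributes the median of the outbound peers' feefilter values, and the fee floor is the median of the last few polls.

```go
package chainfee

import "sort"

// movingMedianWindow is a hypothetical illustration of the approach in
// the PR description: each poll contributes the median of the outbound
// peers' feefilter values, and the fee floor is the median of the last
// `size` polls.
type movingMedianWindow struct {
	window []SatPerKWeight
	size   int
}

// medianOf returns the middle value (upper of the two middles for an
// even count) without modifying the caller's slice.
func medianOf(vals []SatPerKWeight) SatPerKWeight {
	sorted := append([]SatPerKWeight(nil), vals...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	return sorted[len(sorted)/2]
}

// addPoll records the median of one poll's outbound feefilter values,
// evicting the oldest entry once the window is full.
func (m *movingMedianWindow) addPoll(outboundFilters []SatPerKWeight) {
	if len(outboundFilters) == 0 {
		return
	}
	m.window = append(m.window, medianOf(outboundFilters))
	if len(m.window) > m.size {
		m.window = m.window[1:]
	}
}

// feeFloor returns the median of the recorded per-poll medians, or 0 if
// nothing has been recorded yet.
func (m *movingMedianWindow) feeFloor() SatPerKWeight {
	if len(m.window) == 0 {
		return 0
	}
	return medianOf(m.window)
}
```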


return minFee
}

if filterFee > minFee {
Member

Hmm, I wonder if we want to allow the user to override this if they really want to publish at that low fee. I guess ultimately, they can specify the sat_per_vbyte field in that case.

Collaborator (Author)

Setting sat_per_vbyte wouldn't override this because most call sites end up invoking EstimateFeePerKW or comparing against RelayFeePerKW. Both of these call into the filterManager.

lnwallet/chainfee/estimator.go (two resolved, outdated review threads)
values []SatPerKWeight
valuesMtx sync.RWMutex

fetchFunc func(*rpcclient.Client) ([]SatPerKWeight, error)
Member

I think we can abstract the fetching further. Assuming there's some closure that holds the client, it just becomes a normal callback. Then we also don't need to worry below about whether the backend is bitcoind or not.

Member

Another route is making the fetchFunc into an interface, then having one version for btcd and the other for bitcoind. Then when we make the filterManager in the respective fee estimators, we pick the one based on the target backend.

Collaborator (Author)

Made it into a callback that holds the client, lmk what you think
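
For illustration, a minimal sketch of the callback shape (illustrative names; assumes the package's SatPerKWeight type and the fetchBitcoindFilters helper discussed later in this review): the backend-specific RPC client is captured by a closure, so the filterManager only depends on a plain function.

```go
package chainfee

import "github.com/btcsuite/btcd/rpcclient"

// fetchFeeFiltersFunc is the backend-agnostic callback the filterManager
// would depend on; the RPC client is captured by a closure instead of
// being passed in on every call.
type fetchFeeFiltersFunc func() ([]SatPerKWeight, error)

// wrapBitcoindFetcher (hypothetical name) adapts the bitcoind-specific
// fetcher into the generic callback by closing over the client.
func wrapBitcoindFetcher(client *rpcclient.Client) fetchFeeFiltersFunc {
	return func() ([]SatPerKWeight, error) {
		return fetchBitcoindFilters(client)
	}
}
```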

continue
}

// The feefilter is already in units of sat/KvB.
Member

Ahh, didn't know we diverged there re displayed units.

Collaborator (Author)

Yup, Core displays as a float

return false
}

// TODO: ignore values outside of a certain band (configurable?)
Member

As in values above our max accepted fee?

Collaborator (Author)

Yeah. I think we need some sort of ceiling, but I don't know what to base it off of...

@saubyk added this to the v0.18.0 milestone on Jan 29, 2024
@Crypt-iQ marked this pull request as ready for review on January 31, 2024 17:59
@yyforyongyu self-requested a review on February 1, 2024 15:25
@saubyk requested review from ProofOfKeags and removed the request for yyforyongyu on February 1, 2024 15:28
@yyforyongyu (Collaborator) left a comment

I think this can be put inside PublishTransaction; then every time we broadcast, we get the minfeefilter values from peers. If we are going with an averaging approach, we would need to kick out the outliers first, so I guess the median value is better, though we could make this configurable so that percentiles can be used.

On the other hand, I think we want fresh data when publishing txns, because the minfeefilter is only relevant by the time we send txns to our peers?

return median, nil
}

type bitcoindPeerInfoResp struct {
Collaborator

feel like this should be part of chain under btcwallet/chain

Collaborator

Absolutely.

@saubyk (Collaborator) commented Feb 1, 2024

tho we could make this configurable, that we can use percentiles.

+1 for configurability and use of percentiles.
Much better control IMO if we have a degree of confidence that at rate x, y% of peers will accept the transaction; y can then be a configurable value the user can play with, with a default of e.g. 70%.
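
For illustration, a sketch of the percentile idea (hypothetical helper, not part of the PR, assuming the package's SatPerKWeight type): picking the pct-th percentile of the outbound peers' feefilter values means roughly pct% of those peers should relay a transaction paying at least the returned rate.

```go
package chainfee

import (
	"math"
	"sort"
)

// filterPercentile returns the feerate at the pct-th percentile
// (nearest-rank) of the given feefilter values. pct = 50 reduces to the
// median; pct = 70 would be the suggested default above.
func filterPercentile(filters []SatPerKWeight, pct float64) SatPerKWeight {
	if len(filters) == 0 {
		return 0
	}

	sorted := append([]SatPerKWeight(nil), filters...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	idx := int(math.Ceil(pct/100*float64(len(sorted)))) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(sorted) {
		idx = len(sorted) - 1
	}

	return sorted[idx]
}
```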

@ProofOfKeags (Collaborator) left a comment

...moving median compared to a moving average...

So there are actually two dimensions being measured here, and the PR as written uses the median for both of them. The way the code reads now, we first collapse our set of peer fee filters down to a median value and then we add it to the moving median window. So with that in mind, knowing that we use the moving median/average to reduce time-based jitter, what are we smoothing? Outlier malicious peers are already controlled for by the time we even get to the time-series data because we take the median there. As such I'm not sure we need to use a moving median to control for it, as any jitter that shows up in the time series will inherently be jitter in the median node's report and thus more likely already reflect broader market conditions.

...exclusively using outbound peers' feefilter...

So thinking more about this, I understand that if we use outbound we have a bit more selectivity when it comes to who we trust to implicitly set our fees for us. The issue is that just because they are outbound connections doesn't mean that they won't be malicious. It does prevent drive-bys, but simply knowing this someone could serve a different fee filter for outbound connections vs inbound ones. However, let's suppose that it does, do we lose anything other than data points? I don't think so. So I think we can fix a lower sample count by just instructing bitcoind to have more outbound peers using addnode and the like. Alternatively, if that's too much effort, we can always take up to the same number of samples and prefer the ones that are outbound over the ones that are inbound.

I do think that it's probably a good idea to have an escape hatch here as I don't believe bitcoind has a minimum count of nodes it is connected to outbound. So at minimum we probably need to handle the zero case.

...polling interval and the moving window...

The moving window size probably doesn't need to be very large as I expect that most of the relevant smoothing will come from cohort aggregation rather than time-series aggregation.

A quicker polling rate is probably better, and we can either extend the window or use time-bucketing so that older data is kept at a lower resolution. That said, like I said in the above commentary, I don't think we need to get cute with the windowing calcs because I expect the important info to come from cohort aggregation.

@yyforyongyu (Collaborator)

Just realized GetPeerInfo is already implemented here:
https://github.com/btcsuite/btcd/blob/master/rpcclient/net.go#L354

Since this is a feefilter msg sent by our peers and we just cache it, I don't think calling GetPeerInfo periodically is necessary as 1) it's not sending new requests to peers, so no new info gained, and 2) we are just reading a cache, so no concern performance-wise. At least that's how btcd handles it.

That being said, I don't think we are sending MsgFeeFilter to our peers in btcd? cc @Roasbeef

@Crypt-iQ (Collaborator, Author) commented Feb 2, 2024

Much better control IMO, if we have a degree of confidence that at x rate y% of peers will accept the transaction
and then y can be a configurable value which user can play with, with a default of 70% e.g.

If we do this, we need a ceiling, because otherwise if a user sets this to 100%, their LND node will set this to a high value and they will lose their money to fees.

So with that in mind, knowing that we use moving median/average to reduce time based jitter, what are we smoothing?

This was laziness on my part, I didn't feel like adding more values to the moving window.

As such I'm not sure we need to use a moving median to control for it as any jitter that shows up in the time-series will inherently be jitter in the median node's report and thus more likely already reflect broader market conditions.

That's true, the only benefit here would be to smooth out any shocks. But as you say if there are shocks it reflects what the broader network is doing and we should probably react to them.

The issue is that just because they are outbound connections doesn't mean that they won't be malicious.

It is much harder for a malicious entity to be the majority of our outbound connections than it is for them to be the majority of our inbound connections (trivial). Assuming the user is running bitcoind, they would need to pollute our addrman table and hope that the next time we start up, their nodes are chosen for outbound connections. It's not impossible, but if we're going to use data from our peers it should be the outbound peers. This is another reason we should try to enforce sensible bounds on the data we receive.

It does prevent drive-by's but simply knowing this someone could serve a diff fee filter for outbound connections vs inbound ones.

For this to have any effect, a lot of nodes would have to be running this patch to serve different outbound feefilters. This would result in transaction relay failing for a lot of txns. Also, this type of coordination would probably not go unnoticed. For example, I have a script that would detect this sort of broader network behavior. Again, this isn't impossible for somebody to pull off, but we should have sensible limits in place to mitigate any risk here. Deciding on what those will be might be tricky.

However, let's suppose that it does, do we lose anything other than data points? I don't think so.

I don't follow, can you clarify?

So I think we can fix a lower sample count by just instructing bitcoind to have more outbound peers using addnode and the like.

We could do this, bitcoind will let us do 8 addnode connections (https://github.com/bitcoin/bitcoin/blob/5b8c5970bdfc817cac9b59f699925c4426c59b61/src/net.h#L1488). However, I think lnd instructing bitcoind how to behave is unprecedented for our codebase.

Alternatively if that's too much effort we can always take up to the same number of samples and prefer the ones that are outbound over the ones that are inbound.

I don't think we should ever use inbound data

A quicker polling rate is probably better and we can either extend the window or time-bucketing to make it such that older data is at a lower resolution. That said, like I said in the above commentary, I don't think we need to get cute with the windowing calcs because I expect the important info to come from cohort aggregation.

Yup I agree.

Since this is a feefilter msg sent by our peers and we just cache it, I don't think calling GetPeerInfo periodically is necessary as 1) it's not sending new requests to peers, so no new info gained, and 2) we are just reading a cache, so no concern performance-wise. At least that's how btcd handles it.

Our bitcoind peers will periodically send us feefilters (https://github.com/bitcoin/bitcoin/blob/5b8c5970bdfc817cac9b59f699925c4426c59b61/src/net_processing.cpp#L5463-L5472) so it makes sense to poll.
Btcd doesn't send out a feefilter, but the majority of the network is bitcoind so we don't need to worry about it unless we plan to implement it on the btcd side.
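
For illustration, a minimal sketch of the periodic polling being discussed (hypothetical function, reusing the fetchFeeFiltersFunc callback sketched earlier): every interval, the peers' current feefilter values are fetched via the backend callback and handed to an update function.

```go
package chainfee

import "time"

// pollFeeFilters periodically fetches the peers' cached feefilter
// values and passes them to update. It returns when quit is closed.
func pollFeeFilters(interval time.Duration, fetch fetchFeeFiltersFunc,
	update func([]SatPerKWeight), quit <-chan struct{}) {

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			filters, err := fetch()
			if err != nil {
				// Skip this round on RPC errors.
				continue
			}
			update(filters)

		case <-quit:
			return
		}
	}
}
```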

@yyforyongyu (Collaborator)

I don't think we need polling for minfeefilter, just like we don't poll for mempoolminfee. By the time we broadcast the tx, two scenarios may exist,

  • that the polled data is up-to-date and we can use it to guide us to send the tx.
  • that the polled data is outdated, making it less useful to guide us.

If we don't do polling we won't have this issue. As of today, publishing a tx already involves two RPC calls,

  1. call testmempoolaccept
  2. call sendrawtransaction

We might as well call getpeerinfo after testmempoolaccept and process the fresh data.

Re inbound and outbound peers, I find this post and this post helpful. In short, I agree with @Crypt-iQ that we should only use outbound peers.

@ProofOfKeags (Collaborator)

I don't follow, can you clarify?

I was agreeing with you in that I don't think there is a real negative consequence, other than just a smaller sample size, if we go with outbound-only data.

However, I think lnd instructing bitcoind how to behave is unprecedented for our codebase.

Hmm yeah I agree that maybe this is a route we'd like to avoid. I do think if we are restricting to outbound only, we need to ensure we always have enough of those connections. Do we know bitcoind's behavior here?

I don't think we should ever use inbound data

ACK

@Crypt-iQ (Collaborator, Author) commented Feb 5, 2024

On second thought I think keeping the moving window is good, because if we suddenly lose the majority of our outbound peers and only have one or two peers advertising 1000 sats/KvB (about 10% of the network will return this), we'll potentially be unable to broadcast our transactions.

Do we know bitcoind's behavior here?

It attempts to keep 8 outbound connections by default. IIRC this limit can be raised

@morehouse (Collaborator)

I don't think we need polling for minfeefilter, just like we don't poll for mempoolminfee. By the time we broadcast the tx, two scenarios may exist,

  • that the polled data is up-to-date and we can use it to guide us to send the tx.
  • that the polled data is outdated, making it less useful to guide us.

If we don't do polling we won't have this issue. As of today, publishing a tx already involves two RPC calls,

  1. call `testmempoolaccept`
  2. call `sendrawtransaction`

We might as well call getpeerinfo after testmempoolaccept and process the fresh data.

If we don't adjust our min relay feerate ahead of time (i.e. before we need to go onchain), we won't have a commitment transaction that can actually relay.

I think we really need to do polling so that we can do update_fee negotiation long before we ever call PublishTransaction.

@Crypt-iQ changed the title from "[WIP] chainfee: introduce filterManager and use it for fee floor" to "chainfee: introduce filterManager and use it for fee floor" on Feb 5, 2024
@morehouse (Collaborator)

Mostly looking for an approach ACK on:

* the choice of a moving median compared to a moving average

ACK.

I don't think the configurable percentile idea should be implemented right now. Let's keep things simple and reduce configuration until we have a strong reason to do that.

Median sounds like a good default -- 50% of our outbound peers are expected to relay our min fee transactions.

Moving median seems good to smooth out spikiness caused by peer disconnects.

* the choice of exclusively using outbound peers' feefilter to avoid manipulation by an inbound attacker. The downside is that we get less datapoints

I agree outbound peers are preferred.

Perhaps we should set a minimum number of data points we want (e.g., 3?). If we can't get that many data points from outbound connections, we consider some inbound connections to make up the difference.

* the choice of constants (polling interval and the moving window) which will affect the overall result
  
  * a quicker polling interval means more up-to-date data but at the cost of taking up the window size with potentially stale data
  * a larger moving window would give the moving median more "memory", but may end up carrying irrelevant data

5 minute polling interval seems reasonable to me.

I wonder if we should reduce the size of the moving window though. A size of 25 would mean that we use the median relay feerate over the past ~13 blocks. If feerates happen to be consistently increasing, we could miss HTLC deadlines because our moving window was too slow to update.
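
(For reference, the arithmetic behind the ~13-block figure: 25 samples at a 5-minute polling interval cover 25 × 5 = 125 minutes, which at roughly 10 minutes per block is about 12.5 blocks.)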

@yyforyongyu (Collaborator)

If we don't adjust our min relay feerate ahead of time

Every time we create transactions, when estimating fees, we already call EstimateFeePerKW to get the latest min relay fee.

On second thought I think keeping the moving window is good because if we suddenly lose the majority of our outbound peers and only have one or two peers 1000 sats/KvB (about 10% of the network will return this), we'll potentially be unable to broadcast our transactions.

Could you elaborate a bit? If we only have two peers with 1000 sats/kvB, doesn't it mean the polling data won't be useful because the only peers we have won't accept our transactions?

It attempts to keep 8 outbound connections by default. IIRC this limit can be raised

The max number of outbound connections is independently limited to 10.

@Crypt-iQ (Collaborator, Author) commented Feb 5, 2024

Every time we create transactions, when estimating fees, we already call EstimateFeePerKW to get the latest min relay fee.

If we instantaneously query for feefilters rather than poll, we run into an issue if we only have one or two peers at the time of the query.

Could you elaborate a bit? If we only have two peers with 1000 sats/kvB, doesn't it mean the polling data won't be useful because the only peers we have won't accept our transactions?

So this is what the moving window would help us with: it would smooth out jitter caused by some of our outbound peers disconnecting. If our two remaining outbound peers advertise 1000 sats/KvB rather than the network majority of ~20,000 sats/KvB, we'll create txns at >= 1000 sats/KvB instead of >= ~20,000 sats/KvB, since we won't have any memory of when we had 8 outbound peers.

The max number of outbound connections is independently limited by 10.

Two of these are block-relay-only and won't send us feefilter values.

Perhaps we should set a minimum number of data points we want (e.g., 3?). If we can't get that many data points from outbound connections, we consider some inbound connections to make up the difference.

I'm wary about this because it could open the door to some malicious behavior. It might be better to just have few data points rather than some inbound peers making us pay too much or too little in fees. bitcoind does actively try to maintain 8 outbound connections so I would expect that we'd have 8 outbounds most of the time.

I wonder if we should reduce the size of the moving window though. A size of 25 would mean that we use the median relay feerate over the past ~13 blocks. If feerates happen to be consistently increasing, we could miss HTLC deadlines because our moving window was too slow to update.

Yeah 13 blocks might be too much. What block value do you (or anybody else) have in mind?

@yyforyongyu (Collaborator)

If our two remaining outbound peers advertise 1000 sats/KvB rather than the network majority of ~20,000 sats/KvB

Cool, I was thinking the opposite: that the network majority is 500 sats/kvB and we have two peers at 1000 sats/kvB. By polling we'd have the memory of 500 sats/kvB, which gives us a feerate that won't be accepted by our peers.

Under the same assumed condition, that we only have two peers, if we don't poll, it means in the worst case we still have 2 peers relaying the tx for us. If we poll, the worst case is we cannot get our tx relayed.

I think we should instead focus on maintaining X outbound peers - maybe add an outbound peers check in healthcheck.

@morehouse (Collaborator)

Perhaps we should set a minimum number of data points we want (e.g., 3?). If we can't get that many data points from outbound connections, we consider some inbound connections to make up the difference.

I'm wary about this because it could open the door to some malicious behavior. It might be better to just have few data points rather than some inbound peers making us pay too much or too little in fees. bitcoind does actively try to maintain 8 outbound connections so I would expect that we'd have 8 outbounds most of the time.

If we have 8 outbound peers all the time, then either approach behaves the same. But we should think about corner cases too. What do we do if there's 0 outbound peers? Or what if there's only 1 and its feefilter is an outlier (say 20x higher than actual min relay fee)?

Yeah 13 blocks might be too much. What block value do you (or anybody else) have in mind?

3 blocks?

@yyforyongyu (Collaborator)

If we have 8 outbound peers all the time, then either approach behaves the same.

I don't think it's the same - it's more design and hence more complicated. Plus, when 8 peers all increase their feefilter in a short time, polling won't work, as we don't have the fresh data.

I also don't think the number of outbound connections is something we can or should control. We can only inform users and suggest they take care of their bitcoind.

@ProofOfKeags (Collaborator)

I wonder if we should reduce the size of the moving window though. A size of 25 would mean that we use the median relay feerate over the past ~13 blocks. If feerates happen to be consistently increasing, we could miss HTLC deadlines because our moving window was too slow to update.

The other option would be to use an EMA instead of an SMA here, this will cause us to respond more aggressively to recent history which will increase jitter but still respond to a dynamic environment. If the primary jitter we are trying to smooth is coming from peer churn, then I think an EMA would be sufficient as it would still have some smoothing properties, but any MA will lag the current "correct" reading (by definition). So in an increasing fee environment no matter what [X]MA we go with we will have to cope with the fact that we'll be behind the trend.
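
For illustration, a minimal sketch of the EMA alternative (hypothetical type; assumes the package's SatPerKWeight is an integer feerate): each new per-poll median is blended into the running estimate with weight alpha, so recent readings dominate while older history decays geometrically.

```go
package chainfee

// ewmaFeeFloor illustrates using an EMA instead of a simple moving
// window over the per-poll medians.
type ewmaFeeFloor struct {
	alpha   float64 // e.g. 0.3: higher = more responsive, more jitter
	current SatPerKWeight
	primed  bool
}

// update blends the latest per-poll median into the running estimate
// and returns the new fee floor.
func (e *ewmaFeeFloor) update(pollMedian SatPerKWeight) SatPerKWeight {
	if !e.primed {
		e.current, e.primed = pollMedian, true
		return e.current
	}

	blended := e.alpha*float64(pollMedian) + (1-e.alpha)*float64(e.current)
	e.current = SatPerKWeight(blended)

	return e.current
}
```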

@Crypt-iQ (Collaborator, Author) commented Feb 5, 2024

If we have 8 outbound peers all the time, then either approach behaves the same. But we should think about corner cases too. What do we do if there's 0 outbound peers? Or what if there's only 1 and its feefilter is an outlier (say 20x higher than actual min relay fee)?

We could skip a round if we don't have enough data points, but not sure of all the implications

@ProofOfKeags (Collaborator)

We could skip a round if we don't have enough data points, but not sure of all the implications

If we skip the round we should interpolate the data, solving for the reading we would need to keep the output value constant. This is so that we don't distort the measurement for future readings.
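
For illustration, a sketch of the interpolation idea, extending the hypothetical movingMedianWindow from the sketch near the top of this thread: when a poll yields too few data points, recording the current output value itself leaves the reported fee floor unchanged while still advancing the window.

```go
// skipRound records the current output value in place of a missed poll,
// which by construction keeps the reported fee floor constant and avoids
// distorting the window for future readings.
func (m *movingMedianWindow) skipRound() {
	if len(m.window) == 0 {
		return
	}
	m.window = append(m.window, m.feeFloor())
	if len(m.window) > m.size {
		m.window = m.window[1:]
	}
}
```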

@morehouse (Collaborator)

I don't think it's the same - it's more design hence more complicated. Plus when 8 peers all increase their feefilter in a short time, polling won't work as we don't have the fresh data.

Good point. It's important not to undershoot the relay fee for commitment transactions. A moving window will undershoot during rising fees, while an on-demand polling strategy may undershoot if peers have disconnected right before we do the poll.

Instead of a moving average, how about a max-of-medians in the lookback window? That would mean that the latest data point is used in a rising fee environment, while older data is used for smoothing when fees are falling or oscillating.
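
For illustration, a sketch of the max-of-medians suggestion (hypothetical helper, assuming the package's SatPerKWeight type): keep the per-poll medians from the lookback window and use the largest, so a rising fee environment is driven by the newest (highest) reading while falling or oscillating fees are smoothed by older data.

```go
package chainfee

// maxOfMedians returns the largest of the per-poll medians kept in the
// lookback window, or 0 if the window is empty.
func maxOfMedians(window []SatPerKWeight) SatPerKWeight {
	var max SatPerKWeight
	for _, m := range window {
		if m > max {
			max = m
		}
	}
	return max
}
```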

Another consideration is that we may want to set the commitment fee rate at some % above the min relay fee, to provide some protection if relay fees rise soon after we craft the commitment transaction.

@yyforyongyu (Collaborator)

IMO it's at most 8 points of data and I don't think we should over design here. I feel the only thing we should fix is to make sure the majority of the outbound peers can relay our txns, and let users handle maintaining X amount of peers themselves since it's out of our control? We also shouldn't care about how our peers relay the txns, hence the network condition here - that's the job for mempoolminfee?

@yyforyongyu (Collaborator) left a comment

The goal is to get the tx accepted by our peers,

  • normal condition, no fee spikes, no malicious peers: no difference between polling data and realtime data, tho the actual feerate may differ, the end effect stays the same.
  • fee spikes, no malicious peers: polling data is likely to fail because history doesn't reflect the present.
  • too few peers: say there're only 2 peers, polling data is likely to miss, resulting in the tx being rejected by the 2 peers.
  • malicious peers: nothing we can do, nor can we distinguish them; it's the bottom layer's responsibility to get it right, and iirc the outbound peers are chosen at random. Since we are using the median here, it's unlikely the attacker can control up to 5 outbound peers.
  • too few peers and malicious: polling helps, until that window is filled up with the malicious data, which means roughly 1 hour later it will behave the same as realtime data. Plus there's nothing we can do atm but inform users to manage their connections.

My main blocker is that the history of the 8 peers doesn't reflect their present preferences, not to mention we may be facing a different group of 8 peers. This doesn't mean the above issues cannot be solved - we can extend the crawler made by @Crypt-iQ and even manage the peer connections in btcwallet based on our understanding of the feefilter distribution, freeing ourselves from using outbound peers only by applying randomized peer selection, while keeping the current PR simple.

It helps because if we use on-demand querying and have one data point, we shouldn't use just that one data point. Instead we can use the lookback window (which would store polling data for the last 3 blocks).

I'm thinking the opposite, if we only have one data point then we only have one peer, then that's the only peer we can rely on, assuming there are no inbound peers, which I think is the case by default. In addition, this is an error state, and we should implement #8487 to mitigate it.

// filterManager uses our peer's feefilter values to determine a
// suitable feerate to use that will allow successful transaction
// propagation.
filterManager *filterManager
Collaborator

I think we can just extend minFeeManager to implement all the new methods?

lnwallet/chainfee/filtermanager.go (resolved, outdated review thread)
}

func fetchBitcoindFilters(client *rpcclient.Client) ([]SatPerKWeight, error) {
resp, err := client.RawRequest("getpeerinfo", nil)
Collaborator

Should be replaced with GetPeerInfo

Collaborator (Author)

I don't think we can use this because feefilter in btcjson.GetPeerInfoResult has it as a uint64 and not a float64

Collaborator

Can we document that here so that someone coming in later doesn't try to "fix" it?

Collaborator

hmm ok, does this mean btcd has been interpreting this field wrong all this time when receiving it from its peers...

Collaborator

Double checked - I think btcd is behaving as expected, and I also think bitcoind is sending this msg using an int but somehow printing it using a float. According to the specs, this should be an int64 field.

Tested it by connecting to this weird node, "[2a02:c206:2016:2394::1]:8333", that has a feefilter set to 0.09170997...

  • From bitcoind: "minfeefilter": 0.09170997
  • From btcd: "feefilter": 9170997

Collaborator (Author)

That's per the BIP; bitcoind converts it to a float when using getpeerinfo but uses an int on the p2p layer.
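
For reference, a small sketch of the unit mismatch being discussed (hypothetical helper, assuming the package's SatPerKVByte type): bitcoind's getpeerinfo reports minfeefilter as a float in BTC/kvB, while the p2p feefilter message (and btcd's getpeerinfo) carries an integer number of sat/kvB, so 0.09170997 BTC/kvB corresponds to 9170997 sat/kvB.

```go
package chainfee

import "math"

// btcPerKvBToSatPerKVByte converts bitcoind's float BTC/kvB
// representation of minfeefilter into integer sat/kvB, matching the
// value carried in the p2p feefilter message (1 BTC = 1e8 sat).
func btcPerKvBToSatPerKVByte(btcPerKvB float64) SatPerKVByte {
	return SatPerKVByte(math.Round(btcPerKvB * 1e8))
}
```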


// checkFilterBounds returns false if the filter is unusable and true if it is.
func checkFilterBounds(filter SatPerKVByte) bool {
// Ignore values of 0 and MaxSatoshi. A value of 0 likely means that
Collaborator

Would be nice to have some logs here

lnwallet/chainfee/filtermanager.go (three resolved review threads, one outdated)
@lightninglabs-deploy

@Roasbeef: review reminder
@morehouse: review reminder
@Crypt-iQ, remember to re-request review from reviewers when ready

@Crypt-iQ (Collaborator, Author) commented Mar 1, 2024

I've addressed the main contention of the moving median by removing that logic and just tracking a median without any history. I'll address the other comments later. There are two TODOs left in the code:

  • cap the value that we'll use from our peers' feefilter when deciding a minimum fee
  • potentially ignore feefilter values outside of a certain band

In the meantime, I'll run the code with both bitcoind & btcd backends.

@yyforyongyu (Collaborator) left a comment

Approach ACK 🙌

lnwallet/chainfee/estimator.go (two resolved, outdated review threads)
lnwallet/chainfee/filtermanager.go (two resolved review threads, one outdated)
@ProofOfKeags (Collaborator) left a comment

A few nits and organizational recommendations.

lnwallet/chainfee/filtermanager.go (several resolved review threads)
}

var outboundPeerFilters []SatPerKWeight
for _, peerResp := range resp {
Collaborator

I'd like to cite this for loop as an example of one of the many classes of for loops we can permanently eliminate if we merge #8483.

There's nothing wrong with this code, just could have condensed 186-205 into a few filter and map operations.

@Crypt-iQ force-pushed the feefilter_0117 branch 2 times, most recently from 80f126d to ef14577 on March 12, 2024 17:34
@Crypt-iQ (Collaborator, Author)

Latest change is that the median feefilter is now capped when used in RelayFeePerKW by the 1-conf block target feerate. This prevents an adversary with the majority of outbound connections from forcing us to use a high median feefilter value.
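
For illustration, a minimal sketch of the capping rule just described (hypothetical helper; the PR applies the cap where RelayFeePerKW uses the filter value): the median feefilter is clamped to the 1-conf block target feerate, bounding how far malicious outbound peers can push the fee floor.

```go
package chainfee

// cappedFeeFloor clamps the median feefilter to the 1-conf block target
// feerate so that peers cannot force an arbitrarily high fee floor.
func cappedFeeFloor(medianFilter, oneConfFee SatPerKWeight) SatPerKWeight {
	if medianFilter > oneConfFee {
		return oneConfFee
	}
	return medianFilter
}
```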

There are three scenarios to consider here:

  • block feerates are steadily high (malicious):
    • the attacker can make us use the going 1-conf block feerate as our minimum fee. This is not ideal, but is fine.
  • block feerates trending upwards (non-malicious):
    • bitcoind's estimatesmartfee lags (not sure about btcd), so our minimum fee result will be capped at a lower block inclusion rate than the current one (and may be less than our peers' feefilter values). This isn't worse than what we have now.
  • block feerates trending downwards (malicious):
    • since estimatesmartfee lags, an attacker can make our minimum fee be as high as the 1-conf block feerate even though feerates are trending down. This is also not ideal, but the worst-case is identical to case 1 and should be fine.

@ProofOfKeags (Collaborator) left a comment

I'm good to ship this as is. Ideally we fix the last few nits in the suggestions but there's nothing pressing that's wrong here so 🫡

EDIT: The linter is failing, though fyi.

@yyforyongyu (Collaborator) left a comment

I think it's almost good to go, except we need some unit tests.

return 0, errNoData
}

return f.median.UnsafeFromSome(), nil
Collaborator

Q: should UnsafeFromSome only be used in testing @ProofOfKeags ?

Collaborator

This is a situation where you should do UnwrapOrErr and combine it with the previous if block

Collaborator (Author)

done


Comment on lines 215 to 218
// Log the new median.
f.median = fn.Some(med(feeFilters))
log.Debugf("filterManager updated moving median to: %v",
f.median.UnsafeFromSome().FeePerKVByte())
Collaborator

Yeah if you could just allocate a stack variable here for the answer rather than using UnsafeFromSome that would be preferred.

Collaborator (Author)

done

@Crypt-iQ (Collaborator, Author)

Added tests

@ProofOfKeags (Collaborator) left a comment

Send it.

@yyforyongyu (Collaborator) left a comment

LGTM🎉

@guggero (Collaborator) commented Mar 20, 2024

Needs a rebase.

This commit introduces a filterManager that uses our outbound peers'
feefilter values to determine an acceptable minimum feerate that
ensures successful transaction propagation. Under the hood, a moving
median is used as it is more resistant to shocks than a moving average.
@ProofOfKeags merged commit b575460 into lightningnetwork:master on Mar 21, 2024 (25 of 27 checks passed)