Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add debug logging when NodeFilters fail #69

Merged
merged 8 commits into from
Oct 5, 2022

Conversation

IvanVergiliev
Copy link
Contributor

Description

High-level notes:

  1. The filters are pretty verbose now. This is because each boolean that would cause the filter to fail is checked separately, so that it can be reported accurately in the logs. The verbosity is fairly annoying, but it makes it very easy to check and verify the routing logic simply by looking at the logs. We can revert back to a more succinct version when we're more used to working with these and we're okay with dropping the logging.
  2. The IsHealthy filter is broken down into separate Peers and Syncing filters. When implemented in a single function, it was unnecessarily complicated to properly handle all the failure cases and add logging accordingly. Breaking it down makes each filter fairly simple on its own, and it's also easy to combine them because we can reuse the AndFilter 🙂
  3. The NodeFilters access some *Check internals. For example, the HasEnoughPeers filter needs to access the err field of the PeerCheck object. This seems tolerable for now, but I'm totally open to changing it.
  4. The IsPassing methods of the various Checkers are gradually becoming redundant. As I move the corresponding logic to NodeFilters, I don't think any production code uses the IsPassing methods anymore. I didn't remove them here because it would've required moving some more code around to get the tests to work and I didn't want to make the PR too huge.

Type of change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • 😎 New feature (non-breaking change which adds functionality)
  • ⁉️ Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • ⚒️ Refactor (no functional changes)
  • 📖 Documentation (updating or adding docs)

How Has This Been Tested?

Deployed to staging and made sure I see the logging messages I expect for various <request, upstream> combinations.


checkIsHealthy := upstreamStatus.BlockHeightCheck.GetError() == nil
isClose := upstreamStatus.BlockHeightCheck.GetBlockHeight()+f.maxBlocksBehind >= maxHeight
zap.L().Debug("Upstream too far behind global max height!", zap.Uint64("UpstreamHeight", upstreamHeight), zap.Uint64("MaxHeight", maxHeight))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add upstreamConfig.ID to these debug logs? To correlate which upstream the filter is acting upon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Was indeed already planning to do it as I browsed through the logs a bunch of times and had a bit of a hard time figuring out which message was about which upstream, especially when there were concurrent requests in-flight.

@IvanVergiliev IvanVergiliev merged commit d53f9c2 into main Oct 5, 2022
@IvanVergiliev IvanVergiliev deleted the productionize-stuff branch October 5, 2022 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants