-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't shutdown lit on accounts service critical error, and register to status server #642
Don't shutdown lit on accounts service critical error, and register to status server #642
Conversation
terminal.go
Outdated
"keeping litd running: %v", err, | ||
case errFunc := <-g.nonCriticalErrQueue.ChanOut(): | ||
serviceName, err := errFunc() | ||
if serviceName == accounts.ACCOUNTS_SERVICE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't really sure how to do this in a clean way, to be able to tell which subsystem the error is sent from. I'm not a big fan of the channel returning a func() (string, error)
, but couldn't figure out a better solution. Let me know if you have a better alternative!
I guess one alternative would let it be a queue.ConcurrentQueue[error]
queue, and send it to the accounts service only and no other service. But that kind of defeats the purpose of having a queue in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is why I think it would be cool to have a lit subsystem manager type thing which will handle receiving errors from the various subsystems (just accounts for now) and setting the status correctly, logging etc. Then you would not need this work around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving some comments for now - just want to gauge what the consensus of the best direction is.
I think ideally we want things to be as generic as possible so that this pattern can easily be applied to other LiT sub-systems
accounts/service.go
Outdated
s.Lock() | ||
s.disable() | ||
s.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest letting disable
handle the locking and unlocking of the mutex itself. And then you can have another disableUnsafe
method which does not grab the mutex so that you can use that one in places where the mutex is already acquired.
func (s *InterceptorService) disable() {
s.Lock()
defer s.Unlock()
s.disableUnsafe()
}
func (s *InterceptorService) disableUnsafe() {
s.isEnabled = false
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it's a bit of a weird pattern with the defer
and returnErr
(pretty cool idea in general though!).
I would make it a bit more explicit by just replacing any return fmt.Errorf()
with return disableAndErrorf(...)
:
disableAndErrorf := func (format string, a ...any) error {
s.disable()
return fmt.Errorf(format, a...)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it's a bit of a weird pattern with the
defer
andreturnErr
(pretty cool idea in general though!). I would make it a bit more explicit by just replacing anyreturn fmt.Errorf()
withreturn disableAndErrorf(...)
:disableAndErrorf := func (format string, a ...any) error { s.disable() return fmt.Errorf(format, a...) }
Ok got it!
The reason I added this pattern was that I was a bit afraid that people will forget to disable the account service if the functions that need to disable the service errors.
I push this fix in separate fixup commits now, but will squash them if you think it looks good :)
accounts/service.go
Outdated
// | ||
// NOTE: The store lock MUST be held as either a read or write lock when calling | ||
// this method. | ||
func (s *InterceptorService) requireRunning() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just use the IsRunning() method instead of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left this out, due to the discussion below :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which discussion?
Why not just have something like this?
func (s *Inter...) isRunning() bool {
return s.Enabled
}
and then
use this throughout?
if !isRunning() {
return ErrAccountServiceDisabled
}
to me, returning an error from a function indicates failure of the function itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry. I ment the discussion that we need to check if the service is running under the same lock, and not drop the lock to then reacquire it, so we still need two different versions of this function which is why I kept this function before.
But agree that it's confusing with the error, so I changed the PR now to instead include a isRunningUnsafe
function that doesn't acquire the lock, and use that function instead where it is needed :)
terminal.go
Outdated
"keeping litd running: %v", err, | ||
case errFunc := <-g.nonCriticalErrQueue.ChanOut(): | ||
serviceName, err := errFunc() | ||
if serviceName == accounts.ACCOUNTS_SERVICE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is why I think it would be cool to have a lit subsystem manager type thing which will handle receiving errors from the various subsystems (just accounts for now) and setting the status correctly, logging etc. Then you would not need this work around
terminal.go
Outdated
// is disabled, we still want to mark that the service should be stopped | ||
// so that we can properly shut it down (canceling the ctx, and the | ||
// accounts store) in the shutdownSubServers call. | ||
g.accountServiceShouldBeStopped = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need it to be stopped if the accounts mode is disabled though?
I also think the previous name for the variable is fine btw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah agree that this is just strange. Perhaps we should just close everything in the accountService
immediately if the service is disabled :)!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added that we now stop the account service directly if it is disabled instead!
Thanks for the review @ellemouton! The main issues I see with just having one outer check that checks if a requests should be allowed through or not based on if the service is running or not, and then not checking in any of the internal functions in the server itself, is that:
This is also why it's very important that we often Now it's most clear why we'd need to check if the service is already running before we execute the |
ok cool! thanks for the explanation @ViktorTigerstrom - that makes sense. I didnt think about point 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a first pass. Great work! Looks pretty good, but have a few suggestions to make the diff even smaller.
accounts/service.go
Outdated
s.Lock() | ||
s.disable() | ||
s.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it's a bit of a weird pattern with the defer
and returnErr
(pretty cool idea in general though!).
I would make it a bit more explicit by just replacing any return fmt.Errorf()
with return disableAndErrorf(...)
:
disableAndErrorf := func (format string, a ...any) error {
s.disable()
return fmt.Errorf(format, a...)
}
accounts/service.go
Outdated
account.Payments[hash] = &PaymentEntry{ | ||
Status: lnrpc.Payment_UNKNOWN, | ||
FullAmount: fullAmt, | ||
if !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we could ever end up here with an all-zero payment hash (e.g. because it wasn't set in one of the calls). Not sure if we should check for that at least before we associate the payment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'm not completely sure what that I understood this correctly. I think this should be fixed with the fix for this other feedback:
#642 (comment)
But please double check if I misunderstood what you ment with this comment, and let me know if I need to fix something else :)
Thanks a lot for the reviews @ellemouton & @guggero! Working on addressing them. Though I'd like some more feedback on how we should surface the errors to the status server, so would just like to get consensus on that. Thanks for the suggestion on using a callback instead @guggero, will definitely change to that! Though would like to know if we should introduce the sub system server like @ellemouton suggested, to make it more generalizable for future sub systems we might add, or if we should leave that for now and do that later if needed? I think we will still need the internal |
I'd say that if we are on a bit of a time limit with this rn, then we can worry about making things more generalisable in a follow up 👍 Since it sounds like that would take some design work |
Was about to suggest the same! Even though turning everything into subservers sounds great, might be overkill for this PR. So let's see how the diff looks like with just the callback for now. |
Thanks! Will leave it out for now then, and then we can address it in a follow-up later :) |
1d3a4b0
to
1a40522
Compare
Addressed the latest feedback, rebased on main now that the status server PR has been merged, and added unit tests! |
Hmm sorry, looking into to addressing the In terms of
|
ceffb37
to
acbe1dd
Compare
Fixed CI errors! |
7d3af76
to
56bd496
Compare
Updated the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Can you please apply the fixup commits and I'll do a final review on the completed diff?
terminal.go
Outdated
} else { | ||
g.statusMgr.SetRunning(subservers.ACCOUNTS) | ||
stopAccountService() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This else
seems to be wrong... Why would we stop the service if we haven't started it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So agree that this is a bit confusing. The reason why we need to do this is that s.accountService.Stop()
closes the contexts and db store which are opened when we create the account service with accounts.NewService
in the beginning of the LightningTerminal.start
function. This is done regardless of if the accounts service is disabled
or if we fail to start it, so we need to close the contexts and db store either way, so this is why we run the stopAccountService
function either way.
I changed the name of the stopAccountService
function to closeAccountService
to better represent what it actually does to make it more understandable. Let me know though if you have a suggestion of how I could do this better :)
56bd496
to
f23a2d2
Compare
Thanks for the review at @guggero! Addressed the feedback and squashed the commits :)! Snap though, I notice that the |
f23a2d2
to
67c80de
Compare
Fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks a lot! Only nits left, LGTM 🎉
4c81420
to
aafd459
Compare
Thanks @guggero 🎉!! Addressed you're latest feedback with the last push :) |
aafd459
to
841aa33
Compare
Added a small fix that ensures that the node runner can't stop the accounts service while a request by an account user is being processed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Only nits/style comments from my side.
The only one I think we should maybe defs address here is converting the new "accounts-mode" option to a "Accounts.Disable" boolean instead?
accounts/service.go
Outdated
// | ||
// NOTE: The store lock MUST be held as either a read or write lock when calling | ||
// this method. | ||
func (s *InterceptorService) requireRunning() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which discussion?
Why not just have something like this?
func (s *Inter...) isRunning() bool {
return s.Enabled
}
and then
use this throughout?
if !isRunning() {
return ErrAccountServiceDisabled
}
to me, returning an error from a function indicates failure of the function itself.
Ensure that we don't stop the service while we're processing a request. This is especially important to ensure that we don't stop the service exactly after a user has made an rpc call to send a payment we can't know the payment hash for prior to the actual payment being sent (i.e. Keysend or SendToRoute). This is because if we stop the service after the send request has been sent to lnd, but before TrackPayment has been called, we won't be able to track the payment and debit the account.
Add the accounts service to status manager. This will allow us to query the status of the accounts service and see if it is running or not. For incoming gRPC requests to the accounts service, we also use the status manager to check if the accounts service is running or not to determine if we should let the request through or not.
841aa33
to
294e9d0
Compare
Thanks a lot for the the review @ellemouton 🎉 🚀!! Addressed the latest feedback, and left some comments for some of the comments which weren't addressed :) |
This PR is based on #541
In this PR we ensure that if the accounts service critically errors, we won't shutdown LiT, but instead log the error and reject any new requests to the accounts service.
We also add the accounts service to the status manager, and use the status from the status manager to determine if we should allow gRPC requests for the accounts service through or not.
Finally, we also add a config option to allow users to disable the accounts service, which ensures that the accounts service isn't started if the config option is set to
disable
.TODO: