Error Handling Rethink #398
Comments
@rjrodger what release did you want to tackle error handling/logging for? |
@AdrianRossouw It'll be major for sure. |
We're not on the latest release because, well, time & priority. Also, some of my remarks may be related to how our stack is built, so they may not be relevant. |
I would like a major bump to remove the timeout on messages, and possibly add it back as a "filter" or plugin for some very specific microservices. I think the timeout is awesome for entry-point microservices, and maybe for Chairo. |
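To make the "timeout as a filter" idea concrete, here is a minimal sketch in plain JavaScript. It is not Seneca's API: `withTimeout`, the `ACTION_TIMEOUT` code and the `getUser` action are hypothetical names used only for illustration. The assumption is that only entry-point actions get wrapped, while internal actions run without a timer.

```javascript
'use strict'

// Hypothetical helper, not part of Seneca: wraps a callback-style action so
// the caller receives an error if no reply arrives within `ms` milliseconds.
function withTimeout(action, ms) {
  return function timedAction(msg, done) {
    let finished = false

    const timer = setTimeout(() => {
      if (finished) return
      finished = true
      const err = new Error('action timed out after ' + ms + 'ms')
      err.code = 'ACTION_TIMEOUT' // illustrative error code
      done(err)
    }, ms)

    action(msg, (err, out) => {
      if (finished) return // reply arrived after the timeout fired: drop it
      finished = true
      clearTimeout(timer)
      done(err, out)
    })
  }
}

// Only the entry-point action opts in; internal actions stay untimed.
const getUser = (msg, done) => done(null, { id: msg.id })
const timedGetUser = withTimeout(getUser, 1000)

timedGetUser({ id: 7 }, (err, out) => {
  console.log(err, out)
})
```

With this shape the timeout is a per-action decision rather than a global default, which is the trade-off the comment above argues for.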
@mcollina Do you want seneca to hang in the case of a block at the other side? I'm trying to work out how we could do this and want to be as clear as possible. |
Just a heads up @Wardormeur manages CoderDojo's Zen platform. |
Added related issues. |
The timeouts were responsible for cascading failures at higher loads on a system that I worked on before. Poor design decisions around how entities were used meant that each entity load / list would fan out into multiple actions loading more entities, until eventually the node callback queue would get so long that the time it took to respond to an action would surpass the hardcoded seneca message timeout. So instead of things failing reasonably (as in, one can reason about it) in one place, as Matteo suggests, it would fail all over the place in a way that had no useful debugging info. |
Please also note: http://senecajs.org/contribute/principles.html - Continuity
Error handling rethink != major breakage is acceptable.
The challenge here is not just to rethink the error handling, but also to ensure you have a backwards compatibility path. That means code that depends on the old way still needs to work, by using an option, or a plugin.
The rule for major releases is: one line of code (typically a new plugin) is all you need to make existing code work. It is also acceptable to have an option flag to revert to previous behaviour.
It is an absolutely core value of the Seneca community that existing users are respected and that we do not break existing apps. This does make the work of fixing error handling harder, I know :)
/cc @mcdonnelldean @AdrianRossouw @mcollina @davidmarkclements @pelger @geek |
@rjrodger - I wonder if, for each area, we can have a two-step approach?
Step 1 - unencumbered reimagining
here thoughts can run free, protocols/implementations do not consider backwards compatibility
this would be "seneca-next-next" ... postmodern seneca
Step 2 - shimming
once we have nailed down "the best way" then we create an adapter that delivers
If necessary, tweaks may then be made to the pristine implementation, but only in support of the shim.
this would be "seneca-next" |
I tend to agree with @davidmarkclements, with a catch. |
I agree with the principle but we don't have the time or resources for the above. Realistically, what is going to happen is we bludgeon our way through this until we have something smaller and more composable, and around version 6 we can start blue sky thinking. We need to be a little more careful here. For instance, @Wardormeur has a fairly big system (CoderDojo) that we will need to run some tests against to ensure we break as little as possible. Basically it must be reasonably backwards compatible within a version. We cannot expect a user to have to spend two days changing code to make v.next work; that's just not going to fly with customers who have built big systems. |
To be clear, I'm saying we supply our shims wholesale - v.next is v.next.next + shim. The judgement call is on whether it will save more time and resources in the long run to |
@mcdonnelldean in my opinion error handling is a big-picture issue and needs good improvements. Are you planning to include it in the next roadmap? |
@StarpTech I don't think there is anyone here that doesn't agree Error handling needs to be sorted, hence this Issue item. Seneca 3.0 is just out, we have an obligation to folk to ensure current modules are in working order; standard practice for any toolkit this big. Once this work is done we will re-examine the roadmap. |
Hi @mcdonnelldean I'm only asking because I can't find it in 4.0, 5.0... |
That's because it hasn't been planned yet. As mentioned already, we are still bedding down 3.0 and making sure everything is ok there. After that we will look at what's next for further versions.
Kindest Regards, Dean
On 2 Sep 2016, at 07:20, Dustin wrote: Hi @mcdonnelldean I just ask this because I cant find it in 4.0, 5.0... |
Then I don't understand the prioritisation but thanks for clarification. |
Any updates? No progress in 2 months. |
@mcdonnelldean @rjrodger anything? |
This is not the way to go. There are lots of people who have issues with that and they don't know where they are going. No comment, no roadmap and too little documentation to work on it. |
Thanks for making this amazing framework. I found it an odd choice to make errors passed in the first argument of the callback fatal, but I didn't take the time to write and share an awesome framework so I can't complain. My solution was to create a wrapper function for Seneca that can be used everywhere and then include this line inside the wrapper function somewhere after Seneca had been instantiated:
It still leaves me with the problem of determining how to handle fatal errors, but this way my whole team doesn't have to be retrained on a new way to utilize callbacks that doesn't follow the norms for other libraries. Again thanks. Great framework. |
We also have
Going forward, it would be nice if fatal-type errors that seneca throws itself (such as act_not_found) have a meta key attached for
On that note on returning I find I need to do a lot of checks right now for
Here would be our ideal pattern --
seneca.add('role:entity,cmd:get', (args, done) => {
  someEntity.load(args.id, (err, entity) => {
    if (err) { return done(new DatabaseError(err)) }
    if (!entity) { return done(new NotFoundError(args.id)) }
    done(null, entity)
  })
})
And in the calling function, typically a seneca-web action using seneca-web-adapter-express - which now passes errors to next (yay):
seneca.add('role:web,cmd:get', (msg, done) => {
  if (msg.args.user == null) { return done(new UnauthorizedError()) }
  if (msg.args.user.role !== 'specific-role') { return done(new ForbiddenError()) }
  const id = msg.args.query.id
  if (!id) { return done(new ValidationError('id must be provided')) }
  seneca.act('role:entity,cmd:get', {id}, done)
})
And in our express app, we handle all the errors from anywhere along the action calls:
app.use((err, req, res, next) => {
  // headersSent lives on the response object in Express, not the request
  if (res.headersSent) { return next(err) }
  if (err instanceof DatabaseError) { return res.status(500).send('db error') }
  if (err instanceof NotFoundError) { return res.status(404).send('couldnt find ' + err.message) }
  if (err instanceof ForbiddenError) { return res.status(403).send('you cant do that') }
  if (err instanceof UnauthorizedError) { return res.status(401).send('please login') }
  if (err instanceof ValidationError) { return res.status(400).send(err.message) }
  res.status(500).send('unexpected')
})
This is a moderately simple example - omitted is the localization of errors into different languages, performing replacements on specific keys in the error context, using parambulator to return the validation errors.... and a bunch of other stuff I've probably missed. For this pattern to work, we need to do a bunch of extra stuff to get around |
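The pattern above relies on error classes that the comment does not define. A minimal sketch of how they could look follows. The `AppError` base class, the `status` field and the `toResponse` helper are assumptions made for illustration, not something taken from the thread or from Seneca. The mapping is also a compact variant of the express handler: it returns the error message as the body instead of the custom strings.

```javascript
'use strict'

// Hypothetical base class: each error carries the HTTP status it maps to.
class AppError extends Error {
  constructor(message, status) {
    super(message)
    this.name = this.constructor.name
    this.status = status
  }
}

class DatabaseError extends AppError {
  constructor(cause) {
    super('db error', 500)
    this.cause = cause // keep the underlying driver error for logging
  }
}
class NotFoundError extends AppError {
  constructor(id) { super(String(id), 404) }
}
class ForbiddenError extends AppError {
  constructor() { super('you cant do that', 403) }
}
class UnauthorizedError extends AppError {
  constructor() { super('please login', 401) }
}
class ValidationError extends AppError {
  constructor(message) { super(message, 400) }
}

// Hypothetical helper: turns any error into a { status, body } pair.
// Anything that is not an AppError is treated as unexpected.
function toResponse(err) {
  if (err instanceof AppError) {
    return { status: err.status, body: err.message }
  }
  return { status: 500, body: 'unexpected' }
}

console.log(toResponse(new ValidationError('id must be provided')))
```

Note that `instanceof` checks only hold while the error object stays in the same process. Once an error is serialized to JSON and rebuilt on the other side of a transport, its prototype is gone, so a check on `err.name` or a similar plain field would be needed there.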
Hello, all. First of all, I'd like to thank everyone involved in creating this great framework. I found this topic when fighting with
I created a simple repo demonstrating my problem: https://github.com/agarbund/seneca-error-issue Could someone please explain to me what I am doing wrong here? |
Is there more to this in the past year and a half? Intensive Googling is leading me in circles. The FAQs reference
Are the workarounds here "working" so the thread is dead?
We love many parts of Seneca, but the lack of clarity in how best to even think about errors causes some major issues in our program. The explanation of existing error handling reads as a very literal interpretation of microservices where each individual plugin would also be an individually hosted container that dies and restarts independently of other containers. That can't be correct as hosting would be nightmarish. Am I missing some fundamental Node.js process understanding? Correct me if I'm wrong but a FATAL = SIGTERM and is going to crash the container along with all the plugins in the container, not just the individual node process?
Our specific problems arise in a hosted K8 environment. When one pod dies due to a fatal error (typically TIMEOUT), all pods of that same type eventually fail as the message broker attempts to get a satisfactory response back. Yes, they restart and heal, but nothing is timely and the app is down for the time it takes for all pods to restart. The net effect is the same as if we had a more monolithic setup. The app is unusable.
Intuitively this feels like an incorrect configuration/setup/pattern in our code. That said, there are quite a few moving pieces in a modern hosted environment so who knows! It's challenging for a small team to have experience with the issues that arise going from one to many plugins, local to small distribution, etc. We can't be the only team feeling this way.
What would be very helpful to relative newbies like myself is a written explanation of how to think about this issue and why the workarounds "work". I'm hesitant to just start throwing
Thanks to all who have contributed their time and energy to this. I know it's easy to just ask questions, but for the sake of future readers, it does seem time to stir the pot.
I've admitted my lack of understanding so I'm happy to answer any clarifying questions if it can add to the thread. |
Crickets... |
Do you have even a remote idea as to what may be causing the errors? Like, is it a JS runtime error, a null pointer of some kind? Or timeouts? Or the database?
Are the errors happening in your application actions?
Do your actions work at some point and then fail, so you know the code works (kind of) and then something happens?
-omar
On Sun, May 12, 2019 at 11:35 AM GL ***@***.***> wrote:
Crickets...
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#398 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACHMRKT5NWQHPDRLBOVC7LPVBPNTANCNFSM4CAVOW6A>
.
--
Omar Gonzalez
s9tpepper@apache.org
|
@s9tpepper timeouts are common. |
But I wasn’t asking if they’re common.
I was asking Jarek for more details, to get more context.
And they’re not that common if you know what’s timing out and you fix it.
-omar
On Sun, May 12, 2019 at 4:38 PM GL ***@***.***> wrote:
@s9tpepper <https://github.com/s9tpepper> timeouts are common.
|
Sorry, my email client messed up my view.
If you have timeouts a lot then something is taking too long. You should adjust the timeout settings on your Seneca instance so the actions have adequate time to do their work.
If you are using the default web transport, you must change the transport timeout as well.
-omar
|
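Something like:
const Seneca = require('seneca')
const seneca = Seneca({
  timeout: 10000, // action timeout: 10 secs
  transport: {
    // transport timeouts are set per transport type
    web: { timeout: 10000 },
    tcp: { timeout: 10000 }
  }
})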
-omar
|
Error handling needs a rethink. My thoughts here:
A lot of why error handling is bad is due to cryptic messages. We need to rethink the sort of messages we are returning. Gitter is littered with people asking about gate executor timeout questions because the messages are basically unreadable if you don't know the code.
We are also way too safe, in my opinion. We shouldn't be trapping errors; the system ends up in very peculiar states. I'm a firm believer in fail fast and restart. From Gitter and issues it does seem people's expectation is fail fast; we really need to consider this.
Bear in mind, our own tests are in the state they are in due to weird error handling. If we had untrapped exceptions and proper err flow we could cut our test suite in half and end up with more coverage.
Contributors and users, please feel free to add pain points and constructive feedback.
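As a rough illustration of "untrapped exceptions and proper err flow", the sketch below shows one possible reading in plain JavaScript. The `dispatch` function is hypothetical and is not how Seneca dispatches actions. The assumption is that errors handed to the callback are ordinary results for the caller, while a thrown exception is a bug that is deliberately not caught.

```javascript
'use strict'

// Hypothetical dispatcher, not Seneca's implementation.
// Err flow: an error passed to the callback is forwarded to the caller as-is.
// Fail fast: there is no try/catch, so a throw inside the action propagates
// and, if nothing further up catches it, ends the process.
function dispatch(action, msg, reply) {
  let replied = false
  action(msg, (err, out) => {
    if (replied) return // ignore a second callback from a misbehaving action
    replied = true
    if (err) return reply(err)
    reply(null, out)
  })
}

// An expected failure travels through the callback and can be handled.
dispatch(
  (msg, done) => done(new Error('entity not found: ' + msg.id)),
  { id: 1 },
  (err, out) => {
    console.log(err ? 'handled: ' + err.message : out)
  }
)
```

A supervisor (a process manager or container orchestrator) would then be responsible for restarting the process after a crash, which is the "restart" half of fail fast and restart.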