NWBib down due to problems on quaoar2 #302
Comments
The issue seems to be that quaoar2 has not (yet) recovered from its critical state. The lobid API is configured to access the cluster via quaoar3, so it kept working. NWBib accessed the cluster via quaoar2 for its classification data, so only NWBib titles failed.

Configured NWBib to access the cluster via quaoar3, see http://nwbib.de/HT018866841. Also deleted old indexes on the cluster via quaoar3 to free up space (lack of space seems to be the original issue on quaoar2).

The load of the elasticsearch process on quaoar2 is high, so I'll leave it running and we'll see whether the changes made via quaoar3 propagate and quaoar2 sorts itself out.
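For picking the old indexes to delete, a minimal sketch could look like the following. It assumes index names end in a `-YYYYMMDD` date suffix (a hypothetical naming scheme, not confirmed in this thread), and the function name `old_indexes` and the `keep_days` threshold are made up for illustration. It only selects candidates; the actual deletion would still be done against the cluster.

```python
from datetime import date, datetime, timedelta


def old_indexes(names, today, keep_days=14):
    """Return index names whose trailing YYYYMMDD suffix is older than keep_days.

    Assumption: indexes are named like '<prefix>-YYYYMMDD' (hypothetical scheme).
    Names without such a suffix are never returned.
    """
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for name in names:
        suffix = name.rsplit("-", 1)[-1]
        # Only accept a strict 8-digit suffix before trying to parse it.
        if len(suffix) != 8 or not suffix.isdigit():
            continue
        try:
            created = datetime.strptime(suffix, "%Y%m%d").date()
        except ValueError:
            continue  # 8 digits, but not a valid calendar date
        if created < cutoff:
            stale.append(name)
    return stale


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    names = ["lobid-resources-20160301", "lobid-resources-20160325", "nwbib"]
    print(old_indexes(names, today=date(2016, 3, 30)))
```

Keeping the selection separate from the deletion makes it possible to review the candidate list before anything is removed.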
It doesn't look like quaoar2 will recover. Nagios mail from 30.03.2016, 04:17:
After manual deletion of some indexes, a restart of elasticsearch on quaoar2, and some time, the cluster is now back to green status. Closing.

Attempted restart on quaoar2 with:

But it did not start up, checked with:

Came back after:

Opened hbz/lobid-resources#67 to avoid this kind of problem in the future.
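To confirm the green status without logging in to a node, the Elasticsearch `_cluster/health` endpoint can be queried over HTTP. The sketch below is one way to do that; the base URL `http://quaoar3:9200` (default port, plain HTTP, no authentication) and the function names are assumptions, not taken from the actual setup.

```python
import json
from urllib.request import urlopen


def cluster_status(health_json):
    """Extract the 'status' field ('green', 'yellow' or 'red') from a
    _cluster/health JSON response body."""
    return json.loads(health_json)["status"]


def fetch_cluster_status(base_url, timeout=5):
    """Ask one node for the cluster health and return the status string.

    base_url is assumed to look like 'http://quaoar3:9200' (hypothetical).
    Raises an exception from urllib if the node is unreachable.
    """
    with urlopen(base_url + "/_cluster/health", timeout=timeout) as resp:
        return cluster_status(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumed host and port; adjust to the real node address.
    print(fetch_cluster_status("http://quaoar3:9200"))
```

Because the health is a property of the whole cluster, asking a healthy node such as quaoar3 also covers the state seen from quaoar2 once it has rejoined.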
Over Easter (26-28 March), several Nagios messages concerning quaoar2 and also emphytos came in.
The first critical one (26.03.2016, 16:07):
From then on, CRITICAL messages like that one repeated for quaoar2.
The first regarding emphytos (26.03.2016, 23:58):
I realized yesterday evening that NWBib was down and asked @jschnasse to restart the Play app this morning. NWBib is up again, but strangely the detail view for NWBib resources doesn't work, e.g. http://nwbib.de/HT018866841. For other hbz01 resources it does work, e.g. http://nwbib.de/HT018715226.
Besides resolving this bug, I need some more documentation and probably training to deal with something like this myself.