Health check of an instance #18922

Closed
MorrisJobke opened this issue Sep 9, 2015 · 37 comments

Comments

@MorrisJobke
Contributor

MorrisJobke commented Sep 9, 2015

@butonic and I had thought about this many times: It would be nice to have a health check for an instance. This could check for stuff like:

This is maybe something for a cleanup step or just for info to start investigations.

I started this ticket to collect possible candidates of weird entries and to be able to check other existing instances for the same symptoms.

cc @karlitschek @PVince81 @nickvergessen @schiesbn @icewind1991 @rullzer @Xenopathic Opinions on this?

@oparoz
Contributor

oparoz commented Sep 9, 2015

  • wrong media types
  • broken thumbnails
  • files which think they're folders

@schiessle
Contributor

Great idea. Some more ideas:

  • encrypted files where "encrypted" is set to '0' in the file cache
  • unencrypted files where "encrypted" is set to '1' in the file cache

@butonic
Member

butonic commented Sep 9, 2015

Some of the checks do take a while. For the implementation I would recommend a section in the admin settings that has buttons to trigger individual checks as well as a check all button.
It should be possible to get the SQL that is executed to check for problems, as well as any SQL that might have been generated to clean up inconsistencies, before executing it. That won't be possible for all checks, but anyway.
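
A minimal sketch of that idea, assuming a heavily simplified `oc_filecache` table in an in-memory SQLite database. `HealthCheck`, `preview`, `run` and `QUOTED_ETAGS` are hypothetical names used for illustration only, not part of ownCloud. Each check carries its detection SQL and, where a cleanup exists, the repair SQL, so both can be shown to the admin before anything is executed.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HealthCheck:
    name: str
    detect_sql: str                   # SELECT returning the offending rows
    repair_sql: Optional[str] = None  # None when no automatic cleanup exists


# Hypothetical example check: etags wrapped in double quotes.
QUOTED_ETAGS = HealthCheck(
    name="quoted-etags",
    detect_sql=(
        "SELECT fileid, etag FROM oc_filecache "
        "WHERE etag LIKE '\"%' OR etag LIKE '%\"'"
    ),
    repair_sql=(
        "UPDATE oc_filecache SET etag = TRIM(etag, '\"') "
        "WHERE etag LIKE '\"%' OR etag LIKE '%\"'"
    ),
)


def preview(check):
    """Return the SQL a check would run, without touching the database."""
    lines = [f"-- detect ({check.name})", check.detect_sql]
    if check.repair_sql is not None:
        lines += [f"-- repair ({check.name})", check.repair_sql]
    return "\n".join(lines)


def run(conn, check, repair=False):
    """Run the detection query; repair only when explicitly requested."""
    rows = conn.execute(check.detect_sql).fetchall()
    if repair and rows and check.repair_sql is not None:
        conn.execute(check.repair_sql)
        conn.commit()
    return rows


def demo():
    # Assumed schema: only the two columns this check needs.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE oc_filecache (fileid INTEGER PRIMARY KEY, etag TEXT)")
    conn.executemany(
        "INSERT INTO oc_filecache VALUES (?, ?)",
        [(1, "5a1b"), (2, '"77ff"')],
    )
    found = run(conn, QUOTED_ETAGS, repair=True)
    remaining = run(conn, QUOTED_ETAGS)
    return found, remaining
```

Calling `preview(QUOTED_ETAGS)` yields the exact statements for review; `run(..., repair=False)` is the default so that a check never modifies data unless asked to.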

Also checks for:

  • files for nonexistent storages
  • share entries for nonexistent files
  • home storages starting with local::
  • incorrectly formatted etags (should not start or end with double quotes)
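
The first three checks in this list could be sketched as plain detection queries. The schema below is a simplified assumption (`oc_storages` with `numeric_id` and `id`, `oc_filecache.storage` pointing at `numeric_id`, `oc_share.file_source` pointing at `fileid`); a real instance has more columns and may differ between versions. `CHECKS` and `run_checks` are hypothetical names.

```python
import sqlite3

# Detection-only queries; nothing here modifies data.
CHECKS = {
    # File cache rows whose storage no longer exists.
    "files_without_storage": """
        SELECT fc.fileid FROM oc_filecache fc
        LEFT JOIN oc_storages s ON s.numeric_id = fc.storage
        WHERE s.numeric_id IS NULL
        ORDER BY fc.fileid
    """,
    # Shares pointing at a file that is not in the file cache.
    "shares_without_file": """
        SELECT sh.id FROM oc_share sh
        LEFT JOIN oc_filecache fc ON fc.fileid = sh.file_source
        WHERE fc.fileid IS NULL
        ORDER BY sh.id
    """,
    # Candidates only: every storage id starting with local:: is listed.
    # Deciding which of them is really a user's home storage would need
    # the data directory path, which this sketch does not model.
    "local_storage_ids": """
        SELECT numeric_id FROM oc_storages
        WHERE id LIKE 'local::%'
        ORDER BY numeric_id
    """,
}


def run_checks(conn):
    """Return {check name: list of offending ids}."""
    return {
        name: [row[0] for row in conn.execute(sql)]
        for name, sql in CHECKS.items()
    }


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE oc_storages (numeric_id INTEGER PRIMARY KEY, id TEXT);
        CREATE TABLE oc_filecache (fileid INTEGER PRIMARY KEY, storage INTEGER, path TEXT);
        CREATE TABLE oc_share (id INTEGER PRIMARY KEY, file_source INTEGER);
        INSERT INTO oc_storages VALUES (1, 'home::alice'), (2, 'local::/var/data/bob/');
        INSERT INTO oc_filecache VALUES (10, 1, 'files/a.txt'), (11, 99, 'files/b.txt');
        INSERT INTO oc_share VALUES (100, 10), (101, 555);
    """)
    return run_checks(conn)
```

In the sample data, file 11 references the missing storage 99, share 101 references the missing file 555, and storage 2 uses a `local::` id.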

@karlitschek
Contributor

Very good idea. Should this be part of the repair script? I guess some of the issues are repairable and others not?

@MorrisJobke
Contributor Author

Very good idea. Should this be part of the repair script? I guess some of the issues are repairable and others not?

This is more for non-repairable stuff. If something can be repaired, we can do that instead of showing it, but this should primarily help to detect problems that have not yet occurred but could be bad.

@karlitschek
Contributor

True, but maybe it should be added to the same occ command. A user doesn't know if something can be repaired or not.

@tflidd
Contributor

tflidd commented Sep 9, 2015

  • remove old tables (i.e. from old apps that are not used any more)

@MorrisJobke
Contributor Author

remove old tables (i.e. from old apps that are not used any more)

There is already a repair step for this. Do you have any specific tables in mind? Then we can add them.

@nickvergessen
Contributor

@butonic

share entries for nonexistent files

This is done from 8.1 or 8.2 onwards with a cron job.

@rullzer
Contributor

rullzer commented Sep 11, 2015

Yes, this would be great. But! We must be careful here. Automatic repairs need to be very, very well analyzed before they are done! And in some cases it might be best just to advise people to post an issue here if the step reports that something is broken.

I would suggest making sure this can only be run from the CLI. Timeouts are dangerous when handling potentially complex or long-running tasks.

@butonic
Member

butonic commented Sep 11, 2015

Agreed. Let us start with a health check first. Keep repair steps in occ.

@MorrisJobke
Contributor Author

  • detecting filecache entries that have a parent in a different storage -> we have seen this, but can't find the reason
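
Such entries could be found with a self-join on the file cache. This is only a sketch, assuming a simplified `oc_filecache` with `fileid`, `storage` and `parent` columns and a root entry whose `parent` is -1; `find_cross_storage_parents` is a hypothetical helper name.

```python
import sqlite3

# Join every entry to its parent and keep the pairs whose storages differ.
CROSS_STORAGE_PARENT_SQL = """
    SELECT child.fileid, child.storage, parent.storage
    FROM oc_filecache AS child
    JOIN oc_filecache AS parent ON parent.fileid = child.parent
    WHERE child.storage <> parent.storage
    ORDER BY child.fileid
"""


def find_cross_storage_parents(conn):
    """Return (fileid, own storage, parent's storage) for each mismatch."""
    return conn.execute(CROSS_STORAGE_PARENT_SQL).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE oc_filecache (
            fileid INTEGER PRIMARY KEY,
            storage INTEGER,
            parent INTEGER
        );
        -- Root of storage 1, a folder below it, and an entry that claims
        -- to live in storage 2 while its parent is in storage 1.
        INSERT INTO oc_filecache VALUES (1, 1, -1), (2, 1, 1), (3, 2, 2);
    """)
    return find_cross_storage_parents(conn)
```

Root entries have no matching parent row, so the inner join leaves them out of the result automatically.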

@PVince81
Contributor

@oparoz
Contributor

oparoz commented Sep 18, 2015

  • Find a way to detect if oc_file_locks is full of rubbish by comparing the list of cached files to the actual list of files

@nickvergessen
Contributor

@oparoz that is more of a repair step? 😉

@oparoz
Contributor

oparoz commented Sep 18, 2015

@nickvergessen It's both. First you need to know that something is completely wrong.

  1. As a user, you get all the weird messages about files being locked, so you suspect something is wrong
  2. You notify the admin
  3. As an admin, you test the health of your system
  4. You see that something is wrong
  5. You run the repair step

No?

@nickvergessen
Contributor

Well, repair steps don't need "first you need to know": you just run them and they fix it.
The health check is for cases where we can't fix stuff automatically.

@MorrisJobke
Contributor Author

  • LDAP entries that were created before a change was made in the LDAP settings (e.g. username attribute, home folder naming rule, ...)

@MorrisJobke MorrisJobke self-assigned this Oct 20, 2015
@PVince81
Contributor

@MorrisJobke
Contributor Author

child share entries without parent (repair step raised here: #20130)

Nice one :)

@MorrisJobke
Contributor Author

@MorrisJobke
Contributor Author

@nickvergessen
Contributor

duplicate tags per user #20952

That should not be a health check, it needs a fix in the db layer and a pre-update script to fix it.

@MorrisJobke
Contributor Author

That should not be a health check, it needs a fix in the db layer and a pre-update script to fix it.

Correct - but we should also find out the reason. We can drop it from here if it is not needed anymore - just collecting stuff.

@cdamken
Contributor

cdamken commented Feb 19, 2016

@bboule This is related to https://github.com/owncloud/enterprise/issues/832#issuecomment-143771260 Shouldn't it have the same milestone?

@nickvergessen
Contributor

@cdamken this is an overview ticket with things that could be implemented in little steps...

@bboule

bboule commented Feb 22, 2016

I think we need to loop @MTRichards into this as well as it seems product related

@MorrisJobke
Contributor Author

I have some alpha-grade code that needs some love: https://github.com/owncloud/serverhealth. It already has some first tests.

@PVince81
Contributor

PVince81 commented Mar 4, 2016

@PVince81
Contributor

  • detect unmigrated legacy storages, in case the warnings were ignored...

@PVince81
Contributor

@pmaier1

@PVince81 PVince81 added this to the backlog milestone Jan 27, 2017
@PVince81
Contributor

repair parent-child relationships (non-matching path or storage): #28253

@PVince81
Contributor

PVince81 commented Jul 3, 2017

  • repair mime type of non-folders that do have children

@jvillafanez FYI ^
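
Detecting these entries before repairing them might look like the sketch below. It assumes a simplified schema in which `oc_filecache.mimetype` is a numeric id into `oc_mimetypes` and folders carry the mime type `httpd/unix-directory`; `find_non_folders_with_children` is a hypothetical helper name.

```python
import sqlite3

FOLDER_MIMETYPE = "httpd/unix-directory"

# Entries that have at least one child but are not typed as a folder.
NON_FOLDER_PARENTS_SQL = """
    SELECT p.fileid, m.mimetype
    FROM oc_filecache AS p
    JOIN oc_mimetypes AS m ON m.id = p.mimetype
    WHERE m.mimetype <> ?
      AND EXISTS (SELECT 1 FROM oc_filecache AS c WHERE c.parent = p.fileid)
    ORDER BY p.fileid
"""


def find_non_folders_with_children(conn):
    """Return (fileid, mimetype) for non-folder entries that have children."""
    return conn.execute(NON_FOLDER_PARENTS_SQL, (FOLDER_MIMETYPE,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE oc_mimetypes (id INTEGER PRIMARY KEY, mimetype TEXT);
        CREATE TABLE oc_filecache (
            fileid INTEGER PRIMARY KEY,
            parent INTEGER,
            mimetype INTEGER
        );
        INSERT INTO oc_mimetypes VALUES (1, 'httpd/unix-directory'), (2, 'text/plain');
        -- Entry 2 is typed text/plain but entry 3 names it as its parent.
        INSERT INTO oc_filecache VALUES (1, -1, 1), (2, 1, 2), (3, 2, 2);
    """)
    return find_non_folders_with_children(conn)
```

A repair would then set the mime type of the reported entries to the folder type, but that step is deliberately left out here.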

@PVince81
Contributor

PVince81 commented Jul 3, 2017

  • delete stray file cache entries that have no matching files and are not accessible through any parents. Normally entries accessible through parents are already cleared by occ files:scan --all, but inaccessible ones aren't and need a different approach.
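
The "not accessible through any parents" half of this could be detected with a recursive query that walks down from the storage roots. The sketch assumes a simplified `oc_filecache` in which root entries have `parent = -1`, and it does not check whether a matching file exists on disk. `find_unreachable_entries` is a hypothetical helper name, and recursive CTEs need SQLite 3.8.3 or later.

```python
import sqlite3

# Collect everything reachable from a root, then report the rest.
# UNION (rather than UNION ALL) de-duplicates rows, so the walk also
# terminates if the parent links happen to form a cycle.
UNREACHABLE_SQL = """
    WITH RECURSIVE reachable(fileid) AS (
        SELECT fileid FROM oc_filecache WHERE parent = -1
        UNION
        SELECT c.fileid
        FROM oc_filecache AS c
        JOIN reachable AS r ON c.parent = r.fileid
    )
    SELECT fileid FROM oc_filecache
    WHERE fileid NOT IN (SELECT fileid FROM reachable)
    ORDER BY fileid
"""


def find_unreachable_entries(conn):
    """Return the ids of entries that no chain of parents links to a root."""
    return [row[0] for row in conn.execute(UNREACHABLE_SQL)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE oc_filecache (fileid INTEGER PRIMARY KEY, parent INTEGER);
        -- 1 is a root with descendants 2 and 3. Entry 4 points at the
        -- missing parent 50, and 5 hangs below 4.
        INSERT INTO oc_filecache VALUES (1, -1), (2, 1), (3, 2), (4, 50), (5, 4);
    """)
    return find_unreachable_entries(conn)
```

The query only reports candidates; deleting them would still require confirming that no file backs the entry.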

@ownclouders
Contributor

Hey, this issue has been closed because the label status/STALE is set and there were no updates for 7 days. Feel free to reopen this issue if you deem it appropriate.

(This is an automated comment from GitMate.io.)

@PVince81
Contributor

PVince81 commented Jan 15, 2018

@ownclouders
Contributor

Hey, this issue has been closed because the label status/STALE is set and there were no updates for 7 days. Feel free to reopen this issue if you deem it appropriate.

(This is an automated comment from GitMate.io.)
