As a user I'd like to replace failed drives in a RAID with new ones #737

Closed
therealprof opened this issue Jul 14, 2015 · 4 comments · Fixed by #1971

@therealprof

For some reason unknown to me, there does not seem to be an obvious way to replace a failed drive that is part of a RAID pool with a new one and resync the pool.

Steps to reproduce:

  1. Set up a RAID-1 pool
  2. Shut down the system
  3. Pull a drive
  4. Reboot the system
  5. Have a look around and discover that there's no indication of a degraded RAID and no way to remove the failed drive or add a new one in its place (a rough sketch of the manual btrfs fallback follows below)
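Until the UI grows support for this, here is a minimal sketch of how one might do the replacement by hand with stock btrfs tools. The pool label (mypool), mount point, device names, and devid are examples, not anything Rockstor-specific:

```sh
# Mount the pool degraded so btrfs tolerates the missing member
mount -o degraded LABEL=mypool /mnt2/mypool

# Find the devid of the missing device
btrfs filesystem show /mnt2/mypool

# Option A: replace the missing devid (2 here) with the new drive
btrfs replace start 2 /dev/sdc /mnt2/mypool
btrfs replace status /mnt2/mypool

# Option B: add the new drive, then drop the missing one (kicks off a rebuild)
btrfs device add /dev/sdc /mnt2/mypool
btrfs device delete missing /mnt2/mypool
```

Option A is usually preferable since it rebuilds directly onto the new drive; Option B forces a rebalance across the pool. Either way, none of this is reflected in Rockstor's own state, which is presumably part of why the UI offers no path for it.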

iFloris commented Jul 28, 2015

Just ran into a situation comparable to the one @therealprof describes here.
My steps were the following:

  • RAID-6 pool, running for a few weeks.
  • One drive failed.
  • Shares were no longer mountable.
  • A bunch of errors in the web UI, such as: Unknown internal error doing a GET to /api/pools?page=1&format=json&page_size=15&count= and Another pool(rockstor_rockstor) has a Share with this same name(home) as this pool(everyraid). This configuration is not supported. You can delete one of them manually with this command: btrfs subvol delete /mnt2/[pool name]/home
  • Deleted the dropped drive from the disk view.
  • Searched for, but did not find, a way to repair or resize the RAID pool.
  • Tried to delete the pool and start from scratch :(
  • The system complained that I need to delete the shares first.
  • Tried to delete the shares; the system complained the filesystem is in read-only mode.
  • Tried the suggested terminal commands to remove the shares; they errored out as well (see the sketch below).

End result, as in step 5 described by @therealprof: the filesystem gets stuck in read-only mode, the shares are inaccessible, and there is no (visible?) way to repair it.
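For reference, a rough sketch of the shell-level checks this situation seems to call for, assuming the pool mounts at /mnt2/everyraid as the error above suggests (the read-only state will likely block the delete):

```sh
# Show all btrfs filesystems and flag any missing devices
btrfs filesystem show

# Per-device error counters for the pool
btrfs device stats /mnt2/everyraid

# List the subvolumes (Rockstor shares) on the pool
btrfs subvolume list /mnt2/everyraid

# Delete the conflicting subvolume named in the error, if it is expendable
# (this fails while the filesystem is mounted read-only)
btrfs subvolume delete /mnt2/everyraid/home
```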

Next step: reinstall Rockstor from USB and start over.

@schakrava
Member

Thanks @iFloris and @therealprof for your detailed comments. I am holding off until the 4.2 kernel to test DR scenarios, including this one. The prediction is sometime mid-August. Once the behavior is consistent in the kernel, we can add support in Rockstor. @gkadillak is working on a useful alert framework, so it's all coming together. Thanks for your patience.


iFloris commented Jul 29, 2015

@schakrava Thanks, sounds great! Other than a slight inconvenience, it is not a problem for me, as I currently only use Rockstor to store backups from other machines and to do some testing. (Off topic: your last two sentences sounded like 80's A-Team Hannibal in my head.)

@therealprof
Author

@iFloris For me, Rockstor still feels quite immature for production use, so, like you, I'm making sure I have plenty of fresh backups around (remote, and local on external drives) so I can fully restore any valuable data quickly. That also let me work around the broken RAID issue: I simply removed everything manually from the database and started fresh from a backup.
