VERIFY(BSWAP_32(sa_hdr_phys->sa_magic) == SA_MAGIC) failed #157

Closed
quackylad opened this issue Aug 18, 2012 · 4 comments

@quackylad

Hi all,

I've just built a hybrid (amd64 kernel with x86 user land) Gentoo system on an OCZ RevoDrive 3 X2, which I'm using with mvsas built into the kernel (rather than as a module). The RevoDrive is a PCIe card with 4x60GB SSDs that I'm using with ZFS for striping. Sadly, I can't get GRUB2 to boot off the RevoDrive (unknown filesystem), so I've put /boot onto an ATA drive.

My first problem is that the 2012.1 LiveDVD saw my ATA drive as sda and my RevoDrive as sd[b-e], but once inside the initrd the ATA drive is sde, so the zpool I created is now on sd[a-d] rather than sd[b-e]. I work around the problem by shelling out and running:

zpool export rpool
zpool import -f rpool
zfs umount -a
zfs umount -a
exit

but I have to do that every time I boot. Is there an easier way?
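
For reference, a generic way to check how the kernel has shuffled the device names on a given boot (these commands assume the pool name rpool from above and nothing else specific to this setup):

ls -l /dev/disk/by-id/    # stable names -> current sdX mapping
zpool status rpool        # which device names the pool was imported with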

My second problem is more serious. Using:

kernel 3.3.8
spl-0.6.0-rc10
zfs-0.6.0-rc10

I can panic the kernel at any time because of a problem in my filesystem. I built my system mostly using the ZFS install script by rayo on GitHub and it was stable, but writing a lot of data to ZFS seems to have caused a problem (I was emerging gnome-3.4 from the overlay).

In /var/tmp/portage I have a folder called .unmerge. If I run:

cd /var/tmp/portage
ls -la
echo .unmerge/*

Everything is OK, but any of these three commands panics my kernel:

echo .unmerge//
ls .unmerge
rm -r .unmerge

Giving me:

VERIFY(BSWAP_32(sa_hdr_phys->sa_magic) == SA_MAGIC) failed
SPLError: 12693:0:(sa.c:1281:sa_build_index()) SPL PANIC

Scrubbing doesn't seem to see or fix the problem (zpool status -v reports no data errors).
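
For completeness, the scrub in question is just the standard one (again assuming the pool name rpool):

zpool scrub rpool
zpool status -v rpool    # check scrub progress and any listed data errors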

Any suggestions or more information I can provide?

Kind regards,

Paul

@quackylad
Author

Okay, I've shortened the workaround for problem one to:

zpool export rpool
zpool import -fN rpool
exit

I've also found why problem two occurred in the first place. It seems my DIMMs have a problem around the 5017MB and 14GB marks (thanks to memtest86+ on the Gentoo DVD), so when emerging GNOME I must have used more than 5GB of memory, which caused the corruption in the first place.

Sadly, I've tried using just one of the four DIMMs, and then another DIMM, and that didn't work. I've also dropped their speed from 1866 to 1800 to 1600 to 1333 to 1066 to 800, which didn't help, and downgraded my motherboard's BIOS to the one I first flashed when I got the kit back in April, which hasn't helped either. I'm sure I ran memtest86+ back then just to check the box out. So now I have to check that the DIMMs are compatible with the motherboard. Oh, the joys of building your own PC.

Paul

@behlendorf
Contributor

Right, I doubt ZFS will be able to fix the second issue. The data was likely damaged in memory and then written to disk with a good checksum, so to ZFS scrubbing the disk everything looks fine. There's nothing that can easily be done about that other than perhaps some manual repair.

You could try building ZFS with that VERIFY disabled and then see if you can remove the offending files; once they're removed the VERIFY could be re-enabled. Do you want to leave this bug open, or shall we just chalk it up to bad hardware?
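
A minimal sketch of what that could look like, assuming the check is the one in sa_build_index() in module/zfs/sa.c (as the panic message indicates); the exact surrounding code may differ in rc10, so treat this as illustrative only:

/* Hypothetical local patch: downgrade the panic to a warning so the
 * damaged file can be unlinked, then revert once it is gone. */
if (BSWAP_32(sa_hdr_phys->sa_magic) != SA_MAGIC) {
        /* Stock code: VERIFY(BSWAP_32(sa_hdr_phys->sa_magic) == SA_MAGIC); */
        cmn_err(CE_WARN, "sa_build_index: unexpected sa_magic 0x%x",
            (unsigned int)sa_hdr_phys->sa_magic);
}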

@quackylad
Author

I worked around the second issue: I moved all the files to my ATA drive, destroyed the ZFS pool, and rebuilt the host from scratch with a kernel command line of mem=4096M, so there's no more corruption.
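
For reference, a sketch of how that kernel parameter can be set persistently on a GRUB2 install (the paths assume a standard Gentoo GRUB2 layout; adjust as needed):

# /etc/default/grub -- cap usable RAM below the faulty region
GRUB_CMDLINE_LINUX="mem=4096M"

# then regenerate the config
grub-mkconfig -o /boot/grub/grub.cfg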

The first issue is still there though: every time I boot I have to export/import the zpool. Perhaps if I had built the zpool using /dev/disk/by-id/wwn* it would have worked, but does ZFS search that folder for all of its miscellaneous actions?

I've just gotten GNOME 3.4 working and my in-development app going, so I don't want to break it all again just to see whether by-id/wwn would fix it. Has anybody else tried it?

Paul

@behlendorf
Contributor

OK, I'm closing this issue because the crash was determined to be due to bad memory; there's not a lot we can do about that. The other issue, about the path names, can be resolved by using the recommended by-id paths.
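
Switching an existing pool over doesn't require rebuilding it; a sketch, assuming the pool is still named rpool:

zpool export rpool
zpool import -d /dev/disk/by-id rpool
zpool status rpool    # should now list the persistent wwn-*/ata-* names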
