This repository has been archived by the owner on Dec 5, 2022. It is now read-only.

dpkg corruption errors when using vmware_fusion provider #24

Closed
phinze opened this issue Jul 10, 2013 · 25 comments

Comments

@phinze

phinze commented Jul 10, 2013

It is possibly compatible with the VMware providers as well but I haven't tried yet.

Alas, this seems not to be the case. And the behavior is weird.

I was setting up a multi-vm environment so at first I thought these errors were coming from contention on the shared apt cache dir, but then I realized I can reproduce it with a fresh single machine.

Steps to reproduce

  1. Use a simple Vagrantfile something like this
  2. Spin up a fresh machine using the official precise64 vmware_fusion box
  3. Attempt to install a package (I tried vim-nox)

Expected Behavior

Happiness, sunshine, flowers

Actual Behavior

Package corruption errors left and right!

dpkg-deb (subprocess): data: internal gzip read error: '<fd:4>: invalid block type'
dpkg-deb: error: subprocess <decompress> returned error exit status 2
dpkg: error processing /var/cache/apt/archives/perl_5.14.2-6ubuntu2.3_amd64.deb (--unpack):
 short read on buffer copy for backend dpkg-deb during `./usr/lib/perl/5.14.2/auto/Encode/JP/JP.so'
Preparing to replace perl-base 5.14.2-6ubuntu2.1 (using .../perl-base_5.14.2-6ubuntu2.3_amd64.deb) ...

I'm guessing it has something to do with the way shared folders are implemented in VMWare, but I haven't had a chance to look into it further. I'll follow up here if I figure anything else out.

@patcon
Collaborator

patcon commented Jul 10, 2013

@phinze!!!!

Hello.

@fgrehm
Owner

fgrehm commented Jul 11, 2013

Unfortunately I haven't got access to the vmware plugin, so can't be of any help here =/

@patcon
Collaborator

patcon commented Jul 11, 2013

Hm. I don't have vmware to reproduce the issue, but do you think it's worth running with VAGRANT_LOG=debug vagrant up and tracking down exactly which command is failing?

Also, seems relevant:
http://raphaelhertzog.com/2011/06/27/deciphering-one-of-dpkgs-weirdest-errors-short-read-on-buffer-copy/

If you manually clean the apt cache after the VM is up, does that make the error go away? Just wondering if we can narrow this down any more without one of us buying vmware :)

cc: @mitchellh just as a random thought, but is this a common dilemma? i.e. plugins not being compatible with/tested against the vmware provider? If so, have you considered an initiative to get vmware plugins to the primary maintainer of plugins that meet certain criteria (x number of stars, or whatever)?

@tmatilai
Contributor

I have VMware Fusion and the provider plugin license, but I'm not sure how much spare time I'll have in the next couple of weeks.

I've been using this plugin with Fusion without problems, but only for gem cache as I already had local apt-cacher-ng running.

@patcon
Collaborator

patcon commented Jul 11, 2013

You're a rockstar @tmatilai

@tmatilai
Contributor

I could not reproduce the corruption, at least not yet. My guess is that this is not a Fusion issue, but a multi-vm issue. If the VMs were provisioned at the same time, some packages in the cache might have been corrupted, as not all of Apt's lock files are shared between the guests. The readme already warns about issues with using a shared cache between multiple running machines.

@phinze, you can just remove the ~/.vagrant.d/cache/precise64/apt/perl_5.14.2-6ubuntu2.3_amd64.deb from the cache and try again. Or nuke the whole cache as the mighty @patcon suggested. :)

@phinze
Author

phinze commented Jul 11, 2013

@patcon!!! my how the open source tables turn!! 😀

@tmatilai thanks for helping out - i just nuked the cache again to be sure i can repro locally, so let's see if i can spit out some cleaner steps:

Prereqs

  • the box at http://files.vagrantup.com/precise64_vmware.box installed as precise64
  • vagrant plugin vagrant-cachier (0.2.0) installed

Steps

  1. Nuke cache at ~/.vagrant.d/cache/precise64/
  2. make a barebones Vagrantfile that contains config.cache.auto_detect = true
  3. vagrant up --provider=vmware_fusion
  4. ssh in and attempt sudo apt-get update && sudo apt-get install -y vim-nox
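
For reference, a barebones Vagrantfile matching step 2 might look like the sketch below. The box name precise64 and the auto_detect setting come from the steps above; using config version "2" is an assumption about the Vagrant release in use.

```ruby
# Minimal Vagrantfile sketch for reproducing the issue.
# Assumes Vagrant 1.1+ (config version "2") with vagrant-cachier installed.
Vagrant.configure("2") do |config|
  # Box name as installed in the prereqs above.
  config.vm.box = "precise64"

  # Let vagrant-cachier detect and share the apt cache (and other buckets).
  config.cache.auto_detect = true
end
```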

Error I see

Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  liblua5.1-0 libperl5.14 libruby1.8 perl perl-base perl-modules tcl8.5
Suggested packages:
  perl-doc libterm-readline-gnu-perl libterm-readline-perl-perl libpod-plainer-perl tclreadline cscope vim-doc
The following NEW packages will be installed:
  liblua5.1-0 libperl5.14 libruby1.8 tcl8.5 vim-nox
The following packages will be upgraded:
  perl perl-base perl-modules
3 upgraded, 5 newly installed, 0 to remove and 77 not upgraded.
Need to get 0 B/13.5 MB of archives.
After this operation, 12.7 MB of additional disk space will be used.
(Reading database ... 23067 files and directories currently installed.)
Preparing to replace perl 5.14.2-6ubuntu2.1 (using .../perl_5.14.2-6ubuntu2.3_amd64.deb) ...
Unpacking replacement perl ...
Preparing to replace perl-base 5.14.2-6ubuntu2.1 (using .../perl-base_5.14.2-6ubuntu2.3_amd64.deb) ...
Unpacking replacement perl-base ...
dpkg-deb (subprocess): data: internal gzip read error: '<fd:4>: data error'
dpkg-deb: error: subprocess <decompress> returned error exit status 2
dpkg: error processing /var/cache/apt/archives/perl-base_5.14.2-6ubuntu2.3_amd64.deb (--unpack):
 subprocess dpkg-deb --fsys-tarfile returned error exit status 2
No apport report written because the error message indicates an issue on the local system
Errors were encountered while processing:
 /var/cache/apt/archives/perl-base_5.14.2-6ubuntu2.3_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

Sometimes the crucial line is this instead:

dpkg-deb (subprocess): data: internal gzip read error: '<fd:4>: invalid stored block lengths'

Important Details

  • Host is OSX 10.8.4
  • Host disk is encrypted with FileVault (<-- possibly the key to the problem maybe could be might be...?)

@phinze
Author

phinze commented Jul 11, 2013

Wow so this gets even freakier. I compared the hexdumps of a downloaded file from the host and the messed up file in the apt cache. There are just chunks of zeros in the bad file.

https://gist.github.com/phinze/f56e26549121c9d208d3

I'm thinking we're probably looking at an upstream bug here. Just a guess, but I don't think vagrant-cachier is the culprit here. 😉
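
The hexdump comparison described above can be approximated with a short script. This is only a sketch, not the method actually used in the gist: the helper name zeroed_pages is hypothetical, and the 4096-byte page size is an assumption taken from the size of the zeroed regions reported later in this thread.

```ruby
# Sketch: find pages that are all zeros in a suspect copy of a file
# but differ from the same page in a known-good copy.
PAGE_SIZE = 4096 # assumed page size

# Returns the byte offsets of pages in `bad` that are entirely zero
# and do not match the corresponding page in `good`.
def zeroed_pages(good, bad, page_size = PAGE_SIZE)
  offsets = []
  offset = 0
  while offset < bad.bytesize
    bad_page = bad.byteslice(offset, page_size)
    good_page = good.byteslice(offset, page_size)
    all_zero = bad_page.each_byte.all?(&:zero?)
    offsets << offset if all_zero && bad_page != good_page
    offset += page_size
  end
  offsets
end

# Usage: ruby zeroed_pages.rb good.deb cached.deb
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  good = File.binread(ARGV[0])
  bad = File.binread(ARGV[1])
  zeroed_pages(good, bad).each do |off|
    puts format("zeroed page at offset 0x%08x", off)
  end
end
```

Run it against a freshly downloaded .deb from the host and the copy sitting in the shared apt cache; any offsets printed mark pages that were never written.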

@phinze
Author

phinze commented Jul 11, 2013

UGH confirmed VMWare Fusion Bug.

From the "Shared Folders" developer himself:

I have rebuilt your test application in Ubuntu VMs and can reproduce this issue.
I have so far discovered that it is appears that not all the pages to be written are sent from the client.
It seems that after a long sequence of reads requests from the client the next page which gets sent from the client to be written has skipped one which then leaves an unwritten empty region of 4096 (which will be all zeros) bytes.

So in addition to prove this is a client side issue as I can reproduce this with Workstation on a Windows host as well as our Fusion products.

http://communities.vmware.com/thread/438804?start=0&tstart=0

💩 💩 💩 💩 💩 💩 💩 💩 💩 💩 💩

@tmatilai
Contributor

@phinze wow, amazing work investigating this! I confess that in my limited time window last night I couldn't wait for installing the perl package as my connection to the US mirror was really slow for some reason. Seems that a smaller package didn't trigger the bug. But sure enough, now I can reproduce it with the very same vim-nox/perl package. I'm very sorry for my false testimony.

While waiting for VMware to fix it, a workaround can be to use vagrant-cachier >= 0.2.0 and use NFS mounts (config.cache.enable_nfs = true). Again a quick test, but seemed to work as expected with vim-nox.

EDIT: To use the NFS mounts with Fusion you also need to add config.vm.network :private_network configuration, even if Fusion already sets up "host-only networking". I have reported this to Mitchell, but it is certainly not a show stopper.
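
A sketch of the NFS workaround described above, assuming vagrant-cachier >= 0.2.0. The enable_nfs and private_network settings come from this comment; the box name and the IP address are placeholders, not values taken from the thread.

```ruby
# Vagrantfile sketch: use NFS for the cache mounts instead of
# VMware shared folders.
Vagrant.configure("2") do |config|
  config.vm.box = "precise64" # placeholder box name

  # NFS needs an explicit private network, even under Fusion.
  # The IP below is an arbitrary example address.
  config.vm.network :private_network, ip: "192.168.50.4"

  config.cache.auto_detect = true
  config.cache.enable_nfs = true
end
```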

@fgrehm
Owner

fgrehm commented Jul 11, 2013

tks for all the help guys :)
shall we add a note to our readme pointing people to this thread and close the issue?

@patcon
Collaborator

patcon commented Jul 11, 2013

Oh man, classic triaging :)

What about throwing an actual warning in the logs when vmware is detected and nfs isn't enabled? Perhaps linking to the upstream vmware issue url and suggesting nfs? Might make more sense if this is a long-term issue (it's been open 4 months already)...
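
The check proposed here could be sketched roughly as follows. This is not the plugin's actual code: the function name shared_folder_warning and the message wording are hypothetical, and matching providers by a "vmware" name prefix is an assumption.

```ruby
# Sketch of the proposed warning logic (hypothetical helper, not
# part of vagrant-cachier).
UPSTREAM_THREAD = "http://communities.vmware.com/thread/438804?start=0&tstart=0"

# Returns a warning string when a VMware provider is used without NFS,
# or nil when no warning is needed.
def shared_folder_warning(provider, nfs_enabled)
  return nil unless provider.to_s.start_with?("vmware")
  return nil if nfs_enabled

  "VMware shared folders may corrupt large cached files " \
    "(see #{UPSTREAM_THREAD}). Consider setting config.cache.enable_nfs = true."
end

message = shared_folder_warning(:vmware_fusion, false)
warn message if message
```

Returning nil instead of printing directly keeps the check easy to call from wherever the plugin does its logging.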

@phinze
Author

phinze commented Jul 11, 2013

@patcon I like that idea. We can always pull the warning out if it gets fixed, but this prevents future issues from being opened and time from being wasted by Our Valuable Users ™️

If I get a free minute I'll try to throw something together. Anybody else can feel free to grab it too.

@fgrehm
Owner

fgrehm commented Aug 3, 2013

@phinze @tmatilai just to double check, the plugin really works fine under vmware with NFS enabled? I want to add a note to the README while we don't have the warning in place but I wanted to make sure it actually works :)

@tmatilai
Contributor

tmatilai commented Aug 3, 2013

@fgrehm yes it worked for me when I tested. But I'm not using the apt bucket normally...

@phinze
Author

phinze commented Aug 3, 2013

Yup with NFS in place everything worked for me.


@fgrehm
Owner

fgrehm commented Aug 3, 2013

@phinze @tmatilai tks guys! I'll add a note to the readme before releasing 0.3.0 :)

@fgrehm
Owner

fgrehm commented Aug 5, 2013

I've added a note on 5b5befa and I'm closing this in favor of #37 :)

@nelsonjchen

It looks like the developer for the shared folders feature has tracked down the bug! Unfortunately, it doesn't look like a new release of the VMWare Tools with the fix has been released at this time.

@dogweather

FYI, VMWare has the fix, but will not release it: https://communities.vmware.com/message/2359275#2359275

IMO, this is not a good way to treat customers.

@tmatilai
Contributor

Err, I read that message as saying that the fix will be included in the next release. Am I missing something?

@dogweather

@tmatilai This bug was reported early last year and breaks the shared directories feature in VMWare Fusion+Linux. Something you only find out after you've purchased VMWare (This problem is more sweeping than just a Vagrant issue).

To add insult to injury, VMWare has actually written a fix for this bug, but is not releasing it immediately to their customers.

@tmatilai
Contributor

Well, the bug seems to only affect big files in some cases. I have personally never experienced it (except when reproducing this issue) even though I use Fusion on a daily basis.

I also find it fair that they want to test the fix properly before releasing. Sure I too would appreciate more frequent releases.

@nelsonjchen

VMWare Fusion was just updated to 6.0.3 on the 14th. I noticed it went ahead and downloaded tools. I don't have a test case to run that reproduces the original issue to see if it is fixed now but is this still happening?

@dogweather

Got the update, and am getting much better results. Syncing seems to be working well with the vm ware vagrant boxes.

6 participants