Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows 10 Insider: Cannot repeatedly write an image with a mountable partition #883

Closed
6 of 10 tasks
ptr727 opened this issue Jan 8, 2017 · 14 comments
Closed
6 of 10 tasks
Assignees

Comments

@ptr727
Copy link

ptr727 commented Jan 8, 2017

Checklist

  • I looked at https://github.com/pbatard/rufus/wiki/FAQ to see if my question has already been answered.
  • I performed a search in the issue tracker for similar issues, using keywords relevant to my problem.
  • I clicked the Log button in Rufus and copy/pasted the log into the line that says <FULL LOG> below.
  • The log I am copying is the FULL log, starting with the line Rufus version: x.y.z - I have NOT removed any part of it.

Additionally (if applicable):

  • I ran a bad blocks check, by clicking the "bad blocks" check box in Rufus, and confirmed that my USB is not defective
  • I also tried one or more of the following:
    • Using a different USB drive
    • Plugging the USB into a different port
    • Running Rufus on a different computer
  • If using an ISO image, I clicked on the # button (at the bottom of the Rufus interface), to compute the MD5, SHA1 and SHA256 checksums, which are therefore present in the log I copied. I confirmed, by performing an internet search, that these values match the ones from the official image.

Issue description

The problem appears to be that if the disk layout in a compressed disk image is similar to the disk layout on the destination drive, the image contents are not written to disk.

On Win10 Pro x64 Insider Fast Ring, build 14986.

Insert USB key, run diskpart, and clean the disk.
Download a LibreELEC generic disk image, e.g.: http://releases.libreelec.tv/LibreELEC-Generic.x86_64-7.0.3.img.gz
Write the compressed image to the USB stick, interesting to note the disk write spends a long time at 99.9%
The result is a 32MB FAT and 32MB Ext4 partitions, and the rest of the drive unallocated.
Eject the drive.
Install LibreELEC, the result is an install for LE 7.0.3.

Re-insert the drive, cancel the Windows prompt to format the partition.
Download a newer LibreELEC generic disk image, e.g.: http://releases.libreelec.tv/LibreELEC-Generic.x86_64-7.90.010.img.gz
Write the compressed image to the USB stick, acknowledge the multiple partitions warning.
The write completes near instantaneously, as if the image contents are never written.
Eject the drive.
Install LibreELEC, the result is again that for LE 7.0.3, and not 7.9.0.

Insert USB key, run diskpart, and clean the disk.
Write the 7.90 image again, and this time the install is that for 7.90.

Log files are in the attached zip:

Log

rufus.clean.703.log, clean install
rufus.clean.703.partitions.log, partitions after clean install
rufus.730.790.log, write 790 image on top of 703 disk

Logs.zip

@pbatard
Copy link
Owner

pbatard commented Jan 8, 2017

Thanks for the report. I'll take a look at it (and try to replicate your issue) when I get a chance.

I can however already answer one of your questions:

interesting to note the disk write spends a long time at 99.9%

That's because the progression is based on the compressed input rather than the decompressed output. This means that, if the compressed image ends with section that is a few GB of zeroed data then, because that data will occupy next to no space in the compressed source, the progress will barely move even though Rufus will still be writing all these bytes to the USB. Thus the progress may seem frozen even though a lot of data is actually being processed.

I tried to see if there was a way to have the progress based on the uncompressed output, rather than the source, but the issue is that few compression formats include the total size of the decompressed output (which, of course, we need before we can set up a progress based on that output). Especially, if I recall correctly, gzip is one of the formats that doesn't have the decompressed size.

So, to be able to produce a more user-friendly progress, that would be based on the decompressed gzip output, we would first have to decompress the whole stream, to find its total size, and then decompress it a second time to write to USB... which isn't exactly optimal. Thus, we simply base progress on the amount of compressed bytes read, which may result in apparent slowdowns, when a large amount of data that compresses very well is processed, but ultimately results in the fastest completion.

@pbatard pbatard self-assigned this Jan 8, 2017
@pbatard
Copy link
Owner

pbatard commented Jan 9, 2017

Okay, I've been trying to replicate your issue, but no luck so far.

The problem appears to be that if the disk layout in a compressed disk image is similar to the disk layout on the destination drive, the image contents are not written to disk.

No, that's a wrong assumption. If that was the case, the many people who have rewritten similar images over and over would have picked that issue a long time ago, and I would be able to easily replicate your findings, which I am not able to do (Yes, I started with a clean in diskpart and booted LE 7.0.3 before writing the LE 7.9.0 image, but, no, I'm not going to go as far as installing LE to test this, as, if the issue has anything to do with LE altering the USB during installation, then there's something wrong with LE in my book).

Instead, your rufus.730.790.log indicates that you are actually getting a write error:

Writing Compressed Image...
write
\\?\Volume{2c5a4d82-cc52-11e6-bf3a-c724bc1f81f5}\ was already mounted as F:\
Remounted \\?\Volume{2c5a4d82-cc52-11e6-bf3a-c724bc1f81f5}\ on F:\

The write above is a message issued by the decompression routine when it fails to write data. Granted, this is not a very explicit message (I am mostly reusing code here, so it wasn't exactly my idea to make it that terse), and I'll make sure a more explicit error is provided in future versions of Rufus, but it does indicate that there was a write error.

This could be due to multiple things. So, the first thing I'd like to ask you is to double check the SHA-1 of your LibreELEC-Generic.x86_64-7.90.010.img.gz, which you can do in Rufus by clicking the # button at the bottom. It should be c104a511317b93d05dfc65dd4da519f026eac247.

Then, I will ask you to see what happens when you try to write an uncompressed LibreELEC-Generic.x86_64-7.90.010.img, as this will not be using the decompression library write routines. You can uncompress your image using 7-zip.

Finally, if you're still seeing an issue, I would encourage you to see if you have a security solution that could interfere with USB access. There exists a number of security software out there that keeps a lock on USB drivers (which they may do if the drive cannot be mounted in Windows, in order to check boot records, etc.), which may explain why the decompression library fails early.

@ptr727
Copy link
Author

ptr727 commented Jan 10, 2017

You're right, I described the symptoms, not the cause, the main UI did not reporting an error, and no "error" text in the log threw me off.

I tried to write 7.90 over 7.90, and I got the same "write" line, which I now assume means an error, and the UI does not report and error.

Deleting partitions...
Writing Compressed Image...
write
Remounted \\?\Volume{e7ff376c-717e-11e4-bed4-80ee7341a48d}\ on F:\

After the write the disk is no longer recognized in Windows, so I assume something was written, but not enough to result in a valid file system.

Speculation, maybe timing, maybe related to multiple partitions one FAT and recognized and one Ext4 and not recognized, maybe automatic drive letter assignment changes.

The SHA1 of the image is c104a511317b93d05dfc65dd4da519f026eac247, looks like a match.
This machine runs only built in Defender, no 3rd party security.

I did a diskpart clean, wrote the uncompressed image, removed and remounted, and wrote the uncompressed image again.
This time I got a main UI write error:

Deleting partitions...
Writing Image...
write error: [0x00000002] The system cannot find the file specified.
  RETRYING...
write error: [0x00000005] Access is denied.
  RETRYING...
write error: [0x00000005] Access is denied.
  RETRYING...
write error: [0x00000005] Access is denied.
Remounted \\?\Volume{2c5a4d82-cc52-11e6-bf3a-c724bc1f81f5}\ on F:\
\\?\Volume{2c5a4d82-cc52-11e6-bf3a-c724bc1f81f5}\ was already mounted as F:\
Re-mounted volume as 'F:' after error

If I again do a diskpart clean, and then write the image, compressed or uncompressed, all is well.
Speculation, maybe clean the disk before writing, and for the compressed image case, report the error?

Logs attached for 790 over 790, one for the compressed image, and one for the uncompressed image.
logs1.zip

@pbatard
Copy link
Owner

pbatard commented Jan 10, 2017

Thanks for the update.

maybe clean the disk before writing,

Well, the thing is, I used to do just that... and it was causing problems. So I'm reluctant to revert that change, especially as I can't replicate the issue, if the result is that it will help one person... and I will revert to receiving e-mails from people who can no longer write images.

I'd rather understand more precisely why, for your specific configuration, you are seeing such an error, when (as far as I know) nobody else seems to (since the layout of the LE images is common enough that I suspect other people would report a similar issue if it was that common).

Also, please be mindful that you are using an insider version of Windows (i.e. BETA), and I am acutely aware, as has happened in the past, that Microsoft is no stranger to screwing their USB stack in these builds.

So, one thing I'd really like you to test is: Do you see the same issue with the latest non-insider release of Windows.

I may try to replicate the issue on an insider build on my end, but I can't promise anything.

Also, have you tried with a different USB than SanDisk Cruzer? It seems to me that a lot of people are reporting issues/annoyances with these drives (even when not using Rufus at all) that disappear when using a different make/model, so, even as I have seen no verifiable proof that SanDisk Cruzer drives might actually be problematic (and to be fair, I do the bulk of my testing with SanDisk Extreme drives, so I don't have much of an issue with SanDisk as a brand), I'd encourage you to test with a different drive if you can.

Another thing I'd like to know, since it seems cleaning the drive is somewhat involved, is: What happens if you try Alt-Z to zero the drive, before attempting to write the image a second time? Be mindful that, since it writes the whole drive, as opposed to part of it, the zeroing operation may take a little time to complete.

and for the compressed image case, report the error?

Yes. I already implied that I wasn't happy with the error message for compressed images, so that's something I'll fix in 2.12.

@ptr727
Copy link
Author

ptr727 commented Jan 10, 2017

I've noticed the problem with my Kingston drives as well, and I'll try to reproduce on a mainline Win10 build.

I also noticed that if I write, multiple partition warning, write, fail, and write again, the second write succeeds, so it could really just be timing related.
Speculate, write over partition table, windows reacts to drive change, write fails, write again, no reaction, write succeeds.

Can you give me some pointers to where the code is that fails, and I can debug that and see how timing changes, while running e.g. procmon.

@pbatard
Copy link
Owner

pbatard commented Jan 10, 2017

so it could really just be timing related.

I doubt that, as the image writing code is fairly dumb (since there's nothing that complex about copying bytes from an image onto a physical drive).

If I were to bet on something, and considering that LE creates an ESP (EFI System Partition), which Windows tries to hide most of the time (which is yet another very stupid decision from Microsoft, as it creates all kind of annoyances), I suspect that it could have to do with Insider builds going even further and taking a lock or something on some of the ESP files, when they see an ESP. Still, it's just a wild guess.

Can you give me some pointers to where the code is that fails

For uncompressed images, it's this one. You'll notice that it's the same code as the one that's used for zeroing a drive, which is why I'd like you to try Alt-Z.

The part of the code that calls on WriteDrive() is this one. Feel free to hack away! 😉

@pbatard
Copy link
Owner

pbatard commented Jan 10, 2017

Oh, and actually, I remembered wrong when I said that I removed the call to cleaning the drive before writing an image. What actually happened is that I had to add an extra call to zap partitions, before I could call on the same IOCTL as the one Microsoft's diskpart uses to clean drives (IOCTL_DISK_CREATE_DISK) because there is a bug in Windows that may render a drive unformattable (unless you use Linux or a non Windows platform) if you just try to duplicate what diskpart does...

For more on how Windows is super brittle when it comes to actually being able to write to a drive, see this. You may try to remove the ClearMBRGPT() call there to see if that helps.

@ptr727
Copy link
Author

ptr727 commented Jan 10, 2017

Ok, I'll see if I can come up with a way to make it work reliably on my machine, then you can evaluate the risk for a broader audience.

@ptr727
Copy link
Author

ptr727 commented Jan 15, 2017

I did some more testing.

Most LE users would be using the LE USB-SD creator app, I just prefer to use Rufus as a general tool.
The LE USB creator app; writes the image much slower than Rufus, reports the correct % completion (not stuck at 99.9% for a long time), and does not have any issues re-writing the same image (this issues).
The app is here: http://releases.libreelec.tv/LibreELEC.USB-SD.Creator.Win32.exe
The code is here: https://github.com/LibreELEC/usb-sd-creator

During debugging the error in Rufus happens after writing several successful blocks of data writes in WriteDrive().
When wb = 1048576 I can see from notification activity that Windows recognized the FAT partition and assigned it a drive letter, as soon as the drive letter was assigned, the next WriteFile() fails with access denied.

The LE USB creator app is written in Qt, and maybe I'll try to create a debuggable project, but for now I just followed the reasonably simple code flow:

creator.cpp : Creator::writeFlashButtonClicked()

...
bool unmounted = devEnumerator->unmountDevicePartitions(destination);
qint64 deviceSize = devEnumerator->getSizeOfDevice(destination);
...
uncompressedImageSize = getUncompressedImageSize();
ui->flashProgressBar->setValue(0);
ui->flashProgressBar->setMaximum(uncompressedImageSize);
...
emit proceedToWriteImageToDevice(imageFile.fileName(), destination);
...
connect(this, SIGNAL(proceedToWriteImageToDevice(QString,QString)), diskWriter, SLOT(writeImageToRemovableDevice(QString,QString)));

diskwriter.cpp : DiskWriter::writeImageToRemovableDevice(const QString &filename, const QString &device)
if (!open(device)) {

diskwriter_windows.cpp : bool DiskWriter_windows::open(const QString &device)
hVolume = getHandleOnVolume(device, GENERIC_WRITE);
if (!getLockOnVolume(hVolume)) {
...
ok = DeviceIoControl(handle, FSCTL_LOCK_VOLUME, NULL, 0, NULL, 0, &junk, NULL);
...
hRawDisk = getHandleOnDevice(device, GENERIC_WRITE);

...
writeUncompressedImage(filename, device);

diskwriter.cpp : DiskWriter::writeUncompressedImage(const QString &filename, const QString& device)
writeGzCompressedImage(filename, device);

diskwriter.cpp : DiskWriter::writeGzCompressedImage(const QString &filename, const QString& device)
ok = WriteFile(hRawDisk, data.constData(), data.size(), &byteswritten, NULL);
this->sync();
FlushFileBuffers(hVolume);
this->close();
removeLockOnVolume(hVolume);

The logic looks reasonably similar to Rufus, but there is no volume announcement during the image write in LE, and LE works every time.

@pbatard
Copy link
Owner

pbatard commented Jan 16, 2017

I did some more testing.

Sadly, you're not reporting on the testing I asked, which was to confirm whether it is limited to Windows 10 insider (it is) and whether using Alt-Z before attempting to rewrite the data can be used as a workaround (it can).

I have already been burned with some of Windows 10 insider super aggressive policies that Microsoft introduces to test waters (similarly, last time, it had to do with aggressively trying to keep a lock on USB devices, which actually resulted in the safe ejection mechanism being broken), and a lot of people complaining that Rufus didn't work, whereas other applications did, and asking me to fix something that didn't need fixing, because Microsoft eventually identified that they went a bit too far, and fixed the issue they introduced before release.

All this to say that, while I appreciate your pointers, you'll excuse me for not seeing that issue as that high a priority, when it is confined to Insider Builds, and Microsoft is known to break stuff, that may affect an application but not another, in the Insider fast track.

So, for the time being, and especially since I have other stuff to keep me busy, I think I'll take a wait and see approach regarding this Insider specific problem...

The LE USB creator app (...) reports the correct % completion (not stuck at 99.9% for a long time)

That is because they limit themselves to dealing with only two types of compressed images: .gz and .zip. Sadly, Rufus doesn't have that luxury. Instead it also needs to handle .Z, .lzma, .xz, .bz2, and so on. While I may have not have remembered quite precisely which formats provide easy access to decompressed size (it's actually .gz and .zip), I very much remember that I had a serious look at it, and found it wasn't worth the effort if popular image compression formats like .bz2 couldn't be handled. I still have some vague plan that, maybe, when I have nothing better to do, I may try to report progress against the decompressed data size, for compression format that lend themselves to it, but this is hardly a priority...

the error in Rufus happens (w)hen wb = 1048576

Or 1024x1024 = 1MB which is the amount of free space most images/disks tend to reserve before the first mountable partition... and which allows me to deduce, as I suspected, that the issue has to do with Windows Insider trying, a bit too aggressively, to mount the first logical drive it can mount (which is the ESP, starting at the 1MB mark) as soon as it can, with the effect of preventing any physical drive access that also happens to be fall within that logical drive.

This has long been a major bane of Windows (after all, what's the point of locking a physical drive, which Rufus does, if Windows doesn't realize that it should also not allow anyone, including itself, to mount or lock a logical drive within said physical drive?), and I'm already taking all kind of provisions to prevent that from happening. So, before I go and try to add even more empiric code (and maybe some super dedicated code to prevent Windows from mounting said drive — I already go the trouble of temporarily disabling AutoRun, using the cumbersome LGP method, which is the one Microsoft advertises, for crying out loud!), I'd rather be certain that this is not yet another Insider bug that Microsoft will fix.

but there is no volume announcement during the image write in LE

This makes me wonder if the issue could be that Microsoft broke LGPs or AutoRun in Insider builds.

Anyway, once again, I'd rather have some certainty that this is a real issue that will make its way into Release, before going down another rabbit hole. I spent way too much time last time around trying to explain to people that, yes, Insider can and does introduce major bugs, and no, it's not because an application might be affected by it, whereas another similar application is fine, that the first application needs to be fixed post hate.

This is even more true as, for the time being, Alt-Z can be used as a workaround. This means, I'll keep this issue open as low priority for now.

@pbatard pbatard changed the title Compressed disk image not written after writing similar image multiple times Windows 10 Insider: Cannot repeatedly write an image with a mountable partition Jan 16, 2017
@ptr727
Copy link
Author

ptr727 commented Jan 17, 2017

Tested second write on Win10 Pro x64 release 10.0.14393: Works fine

Tested Alt-Z after first write on Win10 Pro x64 Insider Fast 10.0.15007: Alt-Z, drive wipe took a very very long time, but I could second write after wipe. There were a couple of retry errors in the zeroing log:

Format operation started
Requesting disk access...
Opened drive \\.\PHYSICALDRIVE3 for write access
Will use 'F:' as volume mountpoint
I/O boundary checks disabled
Zeroing drive...
write error: [0x00000002] The system cannot find the file specified.
  RETRYING...
write error: [0x00000015] The device is not ready.
  RETRYING...

I also tested using Win32DiskImager, and like the LE USB creator app, it works fine on both release and Insider builds of Win10...

@pbatard
Copy link
Owner

pbatard commented Jan 17, 2017

Thanks for the test.

One thing I forgot to mention is that you can cancel the Alt-Z after a short while, rather than let it complete through the end. It just needs to overwrite the EFI partition, and then, since Windows won't see it any more, it won't try to reaccess it, which seems to be the issue (i.e. it appears that the Insider version of Windows is quite aggressive in trying to remount a partition that it previously had access to, even if was properly unmounted and AutoRun was disabled).

I can't say I'm not curious as to why Rufus seems to be more affected by this than other apps. If anything, we are going further than what other applications do, in that we are formally unmounting the logical volume (FSCTL_DISMOUNT_VOLUME), whereas other utilities don't seem to go through that. We still have a lock to it though, so Windows shouldn't be attempting to mount it back... but it seems Insider does anyway, which, at first glance, could indicate an Insider bug with regards to lock exclusivity.

If the problem still is still present in the next few releases of Insider, I'll try to have a closer look.

@pbatard pbatard removed the deferred label Feb 14, 2017
@pbatard
Copy link
Owner

pbatard commented Feb 14, 2017

Okay, since Microsoft isn't showing much of any sign of altering this new behaviour in rs2_release, and I'd rather have a workaround in case they carry it into the release, I found that, paradoxically, the core of the problem has to do with the disk cleanup we issue before we attempt to write the image (which Win32 Disk Imager doesn't do). I'm still not sure what Microsoft did to make a standard disk cleanup operation that problematic (from what I can tell, they seem to be caching disk data, such as partition layout or mounted drives, a lot more aggressively than in previous Windows versions, which leads me to think that they might not have quite figured out yet how to make sure their cached data is in sync with the latest changes), but at least, I can add a workaround that tells Rufus not to clean up partition data when writing a disk image, which seems to do the trick.

Oh, and one thing I noticed in rs2_release is that Windows now allows you to mount ALL mountable partitions from a REMOVABLE drive, rather than just the first one. This is quite a massive change in a long standing annoying Windows behaviour...

@lock
Copy link

lock bot commented Apr 6, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue if you think you have a related problem or query.

@lock lock bot locked and limited conversation to collaborators Apr 6, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants