Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and improve CreateUSBStick #116

Merged
merged 20 commits into from Oct 30, 2017
Merged

Refactor and improve CreateUSBStick #116

merged 20 commits into from Oct 30, 2017

Conversation

wjt
Copy link
Member

@wjt wjt commented Oct 23, 2017

This is partly improving our own logic, and partly copying existing logic from Rufus upstream's FormatThread. (It would be nice to add support for our images upstream but I think we should first switch to consuming ISOs.)

This branch is not totally reliable but it is much more likely to succeed (in my testing) than the old incarnation. One fly in the ointment is that re-writing a USB stick that we previously wrote with this process fails very reliably at the ClearMBRGPT() stage. Writing the first sector-ful of zeros works fine, but writing successive sectors fails with weird errors like "file not found" in the same way that writing the second-stage GRUB used to fail. I think this is because overwriting the MBR causes Windows to rescan the drive. What I do not understand is that Rufus upstream does exactly this, and it successfully overwrites drives with our layout almost all the time. The nice thing is that, if you retry, the process works reliably.

Still outstanding:

  • Drop the DO NOT MERGE patch :-)
  • Maybe put a retry loop around ClearMBRGPT?
    • I tried this and it actually made things worse. The process got further but then the volume manager (I think) got stuck and I had to reboot. I think the real problem might be that we are assuming there is only one Volume{GUID} -type path for the USB stick but due to Win 10 1703 mounting all partitions there are actually three. There's a TODO in the code but perfect is the enemy of released.
  • Add the dozen or so new error codes to the big switch which controls whether "Try Again" appears on the final page. I don't really like this design so I might look at doing something better.

https://phabricator.endlessm.com/T13969

This header is included by format.c so it's relevant to our interests.
If you peer through the #ifdef soup, this is all there is to it. We will
make further changes so that it accepts a path rather than formatting
the first basic data partition on the drive, so best make a clean break.

Rather annoyingly, FormatEx accepts non-const strings as parameters.
Fine: we can copy them. (In practice Rufus itself passes literals for these
two parameters at one call site so it would probably be okay to just cast
away the const.)
@wjt wjt force-pushed the T13969 branch 2 times, most recently from 7c2af78 to 70d653d Compare October 25, 2017 14:24
@wjt wjt requested a review from ramcq October 25, 2017 14:24
@@ -4120,7 +4120,8 @@ DWORD WINAPI CEndlessUsbToolDlg::FileCopyThread(void* param)
// Format the partition
IFFALSE_GOTOERROR(FormatFirstPartitionOnDrive(DriveIndex, L"exFAT", EXFAT_CLUSTER_SIZE, dlg->m_cancelOperationEvent, EXFAT_PARTITION_NAME_IMAGES), "Error on FormatFirstPartitionOnDrive");
// Mount it
IFFALSE_GOTOERROR(MountFirstPartitionOnDrive(DriveIndex, driveDestination), "Error on MountFirstPartitionOnDrive");
IFFALSE_GOTOERROR(MountFirstPartitionOnDrive(DriveIndex, driveDestinationA), "Error on MountFirstPartitionOnDrive");
driveDestination = driveDestinationA;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tabs/spaces

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a commit fixing the mixed indentation for this whole function.

PARTITION_INFORMATION_EX exfatPartition;
PDRIVE_LAYOUT_INFORMATION_EX DriveLayout = (PDRIVE_LAYOUT_INFORMATION_EX)(void*)layout;
PDISK_GEOMETRY_EX DiskGeometry = (PDISK_GEOMETRY_EX)(void*)geometry;
int formatRetries = 5;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be deleted later, but unused in this commit at least.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squashed removing this unused variable into this commit.

@ramcq
Copy link

ramcq commented Oct 27, 2017

Very minor comments but basically OK to merge. Should be an improvement.

wjt and others added 17 commits October 30, 2017 11:13
There is no need to perform this juggling act: if you can address a
drive, you can format it. We know how to address a non-basic-data
partition on a disk, even though it doesn't get assigned a
Volume{GUID}-style path (on Windows versions prior to 1703 "Creators
Update", that is...).

Previously, on error we would unmount whichever partition was mounted at
the time; on success, eoslive would remain mounted. Instead, we now
unconditionally unmount the ESP at the end of the process, and leave the
image partition mounted (if indeed it is).
WriteFileWithRetry is from Rufus. Who knows if the retry will help, but
it can't hurt.
It's not the "MBR part[ition]"; it does have a proper name.
Our fork of Rufus performs these steps the other way round but upstream
commit 258a4f7 explains that calling
CREATE_DISK with PARTITION_STYLE_RAW leaves GPT artefacts on the disk, and
can render it unusable on Windows. I've seen this happen! You can recover
only by zeroing it more thoroughly (or writing a new partition table) on
Linux or Mac.

I have found that, if this step completes successfully, the rest of the
process is much much more likely to work. Unfortunately this step fails
pretty reliably for me when re-writing a combined USB; re-trying works
so hooray. Tempted to put a retry loop around ClearMBRGPT(). I do not
know why Rufus upstream does not have this problem.
There's some rationale in the comment. Some further rationale is that
Rufus actually does the same thing when writing its NTFS-UEFI partition (a
VFAT partition containing a UEFI bootloader that loads an NTFS driver,
then chainloads the real bootloader from an NTFS partition).

I believe that writing to the MBR causes Windows to rescan the drive, so
we continue to do that just before we are about to sleep and wait for
the logical volumes to re-appear.
Compared to our previous version of this change, this has a number of
advantages:

* handle negative return values from bled_uncompress_with_handles besides
  -1, since at least the unxz decompressor can return
* log the error code returned
* return FALSE from WriteDrive on error (even though no call site checks
  it)
* use ERROR_WRITE_FAULT rather than ERROR_NO_MEDIA_IN_DRIVE, which seems
  closer to the likely truth
* accepted upstreamed as 8863180:

  > [cmp] propagate decompression errors from bled
  >
  > If, for example, you have a truncated gz-compressed file and try to
  > write it to disk, bled_uncompress_with_handles() will return an error.
  > Previously, this was not reported back to the user.
Manually-applied cherry-pick of upstream commit
9dd06e9.

Signed-off-by: Will Thompson <wjt@endlessm.com>
This appears to improve the reliability of the process (and it's what
upstream does).
We do not care where the image partition is mounted; we just want it
mounted *somewhere*. Remounting it is relatively harmless but causes
Windows to pop up more spurious Explorer windows.
If not, a generic error is shown, which is not the end of the world. But
it's nice to get a bit of computer-aided assistance to do better.
This will allow us to see from analytics data what errors occur in the
wild, and the relative impact of each.
Previously, we'd just display whatever the previous suggestion was, or the
untranslated English placeholder string from the template if there was no
previous suggestion.
This means we don't have to remember to update this switch when we add new
recoverable errors.

I don't really look at the exceptionTracking data but since it's here...
@wjt wjt merged commit 4824a79 into master Oct 30, 2017
@wjt wjt deleted the T13969 branch October 30, 2017 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants