The ext3 file system in the partition for persistence has errors #1396
Comments
Thanks for logging an issue. As I explained in the e-mail I sent earlier, the "tool" we use to create the
Also, as opposed to what many other utilities do, and since Rufus doesn't have an installer in the first place anyway, we don't use external DLLs apart from the system ones (but of course, Windows does not have a system DLL that can be used to format a drive to
Therefore, our expectation has been that, considering that the only code that we changed was for the I/O backend, and that this I/O backend doesn't appear to have much of an issue when we're writing root inodes and other stuff, the various ext2fs initialization calls we make, which again are supposed to perform the exact same operations as they do on Linux, might result in a file system that is properly initialized.
If anything, this would lead us to think that the issue might be with the official code, whose Windows side probably hasn't received as much love as the Linux side, which might explain why the created file system trips e2fsck. Which means that we now have to analyse the internals of ext2fs file system creation, to figure out why code that, for all intents and purposes, we do expect to behave in the same manner regardless of the platform, might not, and this is going to take a while since we have to familiarize ourselves with exactly the kind of stuff we were trying to avoid (i.e. getting super-knowledgeable about ext2/ext3 intrinsics) when we chose to go with "the reference" for formatting an
Realistically then, unless someone helps us with that (for instance by providing some reports of "Rufus writes data X into inode Y at position Z whereas it should write data X' into inode Y' at position Z'", or even better, explaining precisely what we might be missing in the code to make our sequence of calls to what should be the same functions as
Also, as I mentioned elsewhere, whereas I did indeed see
All this to say that, whereas we will be investigating the issue to see if we can fix the e2fsck results, it's unlikely to happen as soon as you wish it will and that if some Linux folks, with knowledge of the ext2/ext3 intrinsics, want to shed some technical light as to what exactly e2fsck seems to complain about, and what we might be missing from our ext initialization (which, it should be pointed out, is not directly derived from |
Have you seen that e2fsprogs is part of Cygwin? It might help in debugging (to check if it is subject to the same bug, and in general how it behaves). I guess Cygwin is FOSS (freer than AOMEI's tool). https://cygwin.com/cgi-bin2/package-cat.cgi?file=x86%2Fe2fsprogs%2Fe2fsprogs-1.42.12-1&grep=hat |
Yes, that's where the Windows support comes from in e2fsprogs, coz it sure isn't compatible with MSVC and MinGW. My worry however is that cygwin is doing so much to fool an application into believing that it runs on Linux (that's basically the whole motto of cygwin) that, even if the issue doesn't manifest itself there, it won't tell much of anything except that code that was mostly designed for Linux-like environments appears to work well in a Linux-like environment... Oh, and if your idea is that Rufus could/should use cygwin behind the scenes, I'm going to have to disappoint you because any application that relies on cygwin has to rely on a (rather large) As I said, I don't see any easy shortcut here, so troubleshooting this issue is going to take time. |
I understand, that you don't want to rely on any dll, and I think you have good reasons for that. As I said, it might help in debugging. |
Debugging results
I installed Cygwin and e2fsprogs and fdisk into it. Then I could
So it seems to me that Cygwin can do it. It means that the source code is healthy, and you should be able to use it successfully. Maybe the compiler options/flags used by Cygwin are available (in a makefile or similar). Maybe there is file system corruption because the write operations are not flushed completely, and you should |
I appreciate the help but I would also appreciate if you weren't trying so hard to clutch at straws. Fixing issues is not "guesswork". It is deducing the cause from properly analysed effects, which still hasn't been done here. And for the record, we are opening the partition with Also, a much more relevant test, since of course you've been using Also, please be mindful that we're using At the moment, one of my first planned tests is going to be this: Create a virtual image in Linux (e.g. 32 MB) set with If you really want to help, you can actually carry out that test yourself, since, if you have a Windows platform, you can install VS2019 to run Rufus & debug Rufus, and I already have code to deal with the formatting of a 32 MB image in Rufus which all you have to do to use is uncomment the define for |
|
That's not exactly what I'm saying. What I am saying is that, to properly troubleshoot this issue it is much better to start by figuring out what needs to be done for the lowest common denominator ( I am not planning to drop
Well, if the issue is the inode/bitmap layout (which is what e2fsck seems to report) then I don't think testing
I would have been very surprised if cygwin produced an issue, as, if e2fsprogs does have a Windows implementation, and that one implementation is for cygwin, I'd expect there are quite a few people testing it for errors, as opposed to my MSVC/MinGW, which is something that I had to add myself for Rufus usage (and that I had to implement in a semi-blind manner because only part of the e2fsprogs source is compatible with Rufus) and therefore that I can only wish had as many eyeballs as e2fsprogs to check. I'm still kinda hoping the issue is a simple missing call during all the various steps that are required to set up an It might be interesting to know if the e2fsprogs folks have a test suite that they use to validate file system creation (I haven't really looked into that), as opposed to create the file system and see if e2fsck complains...
Which is not something I am requesting you to do. I am planning to do that when I have a chance. If you do have time to spare and want to help (and don't want to suffer the annoyance of having to contend with Windows), what I'd be more interested in is a technical explanation of what the e2fsck errors mean. For instance: "If e2fsck reports error X then it means that element Y, which is used to provide the file system data about <some specific thing>, is not set to its expected value Z". That's actually one of the other avenues of what I'm planning to do: look at the e2fsck implementation to understand in detail what it's really complaining about, as if we properly understand this data, it may become a lot easier to tell "Here, this is what's wrong!" |
In the FAQ, Rufus users are urged to "remind distro maintainers that an open issue is affecting them too, rather than assume that things will get fixed on their own". So although I am basically a satisfied 'customer' of Rufus, I would like to report how this issue may be affecting me. When Ubuntu 19.10 came out, I used Rufus to create a persistent live USB system. I got a workable system with Firefox and LibreOffice. However, even a 32GB USB stick did not have enough space for Steam (which should need <2 GB). I have described the situation in more detail in an Askubuntu question. Something seemed to be consuming huge amounts of space in the persistent partition. After posting on Askubuntu, I tried maxing out the persistent partition (same problem reoccurred) and creating the persistent 18.04LTS system recommended by Steam (triggering the expected casper bug). I have some screenshots showing oddities in disk utilization, but none of the low-level information requested above by @pbatard . This week I tried the answer recommended by @sudodus . By using mkusb+dus (run on a non-persistent 18.04 created in Rufus), I have been able to get persistent 18.04LTS on the same USB stick as before, and to successfully install Steam. Although this thread suggests that the two of you (@pbatard and @sudodus ) have had no previous contact, I have relied on both of your tools in order to get from Windows 7 to a working persistent Ubuntu USB, and I am grateful to both of you for the time that you have donated to your projects. Thank you! In my opinion, Rufus is much easier to use for users coming from an English-speaking Windows background. But at the moment, it seems only mkusb is reliable. I wish both of you the best of luck in finding ways to work (together or separately) for the benefit of Linux users. To finish, I would like to add an 'obiter dictum' on the FAQ comment chiding users for not reporting the Ubuntu casper bug. 
I guess that poor reporting will always be a particular headache for Rufus, and mkusb, because many of their users will also be encountering Ubuntu and/or Linux for the first time. In my case, I only started using Ubuntu as a sysadmin a few months ago, so my default assumption is that "I'm doing it wrong", rather than "I've encountered a bug"! |
I hope that all of us can co-operate for the benefit of Linux users :-) Do you remember Alexander the great and the Gordian knot? https://en.wikipedia.org/wiki/Gordian_Knot I suggest that you let an experimental Rufus create a partition just big enough to extract everything from the iso file, extract the files, create the BIOS bootloader and add the boot option 'persistent'. Then skip creating a partition behind it (no partition, no ext3 file system) and let Ubuntu 19.10 create it automatically. It will work rather well, but maybe not perfect yet. You can see details about it at the following link, https://help.ubuntu.com/community/Installation/iso2usb/diy and particularly If we find that it works well enough, this can be an option in Rufus (for Ubuntu 19.10 and later versions) alongside the current options. |
Thanks. User reports like yours are an essential way to gauge the severity of an issue, as it provides additional elements to try to replicate an issue and come up with something that can be replicated for troubleshooting. I'll start by restating that my main limiting factor to try to address the problem here is time. Unfortunately, even as it may look so externally, a file system intrinsic issue like this one is unlikely to be resolved in a "Oh, it behaves like this? Then I know exactly what section of the code needs to be patched and this shouldn't take more than 5 minutes to fix". Instead, what's likely to need to happen is something closer to "In order to fix this problem, I need to understand how Now, this being said, I tried to run a few tests similar to what @MatthewForrester described (32 GB drive with a 17 GB partition) and I can confirm the problem with space allocation. Curiously, during my first test, everything seemed to be good, even after copying the Ubuntu 19.10 ISO to the casper partition (in order to see how the amount of used/free space would be reported), until I started to copy a 5 GB Windows ISO and found that I ran out of space. Then, after running the same thing a second time I found that the used space jumped way beyond what was expected after copying the Ubuntu ISO. So this would seem to indicate that the file system is pointing to inodes that haven't been set properly, which can either mean we're creating the inodes properly, but then not properly setting the table or whatever ext# uses as a means of pointing to inodes, or we are setting that table properly, but we are missing an inode initialization step so that Linux doesn't attempt to read garbage data from those inodes. Or it could be an issue with the bitmap tables (which are used in what manner, I don't know yet) if this is what's being used to report used/free space. 
At any rate, as I pointed out above, there's no way around getting familiar with what exactly is going on behind the scenes, by, for instance, comparing a sane file system with the one created by Rufus, to try to understand where exactly the problem lies, and then try to venture a fix. And there's no shortcut to accomplishing that besides devoting the required time to it. |
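As one possible starting point for the comparison described above, the block counters stored in the superblock of a sane image and of a Rufus-created image could be read and compared. The following is only a sketch: the field offsets (magic at byte 56, block count at byte 4, free-block count at byte 12, log block size at byte 24, all little-endian, inside a superblock that starts 1024 bytes into the volume) follow the documented ext2 on-disk layout, while `parse_superblock`, `struct sb_summary` and `le32` are hypothetical names invented for this illustration and are not part of Rufus or e2fsprogs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT2_SB_OFFSET 1024u    /* superblock starts 1024 bytes into the volume */
#define EXT2_MAGIC     0xEF53u  /* value expected in s_magic */

/* Hypothetical summary of the few superblock fields of interest here. */
struct sb_summary {
    uint32_t blocks_count;       /* s_blocks_count (low 32 bits) */
    uint32_t free_blocks_count;  /* s_free_blocks_count (low 32 bits) */
    uint32_t block_size;         /* 1024 << s_log_block_size */
};

/* Decode a little-endian 32-bit value regardless of host endianness. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* sb must point to the superblock itself, i.e. the bytes read from
 * EXT2_SB_OFFSET onwards. Returns 0 on success, -1 if the data does
 * not look like an ext2/ext3 superblock. */
int parse_superblock(const uint8_t *sb, struct sb_summary *out)
{
    assert(sb != NULL && out != NULL);

    uint32_t magic = (uint32_t)sb[56] | ((uint32_t)sb[57] << 8);
    uint32_t log_block_size = le32(sb + 24);

    if (magic != EXT2_MAGIC || log_block_size > 6)
        return -1;

    out->blocks_count = le32(sb + 4);
    out->free_blocks_count = le32(sb + 12);
    out->block_size = 1024u << log_block_size;
    return 0;
}
```

Running this over both images would only show whether the superblock counters already disagree; the per-group descriptors and bitmaps, which is where e2fsck seems to point, would still need to be inspected separately.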
"... tell users that they should just format the casper-rw partition in linux. First of all this is super inconvenient ..." I'm sorry, but you did not read the link. Ubuntu 19.10 creates the partition and file system automatically the first time that the [persistent] live system is booted. It is not inconvenient, only a little delay (for this action to run). Furthermore, Debian 10 does not have this feature, so your ext3 file system is necessary there. |
Ah, I heard the Ubuntu maintainers talk about adding this but I didn't realize that they had done it for 19.10. Still, it's going to be too confusing for users to handle things differently or even subvert the expectation that, if you set a persistent partition in Rufus, then it should be formatted before you boot the OS. Plus my other worry is that the bugfix will be retrofitted to 18.04, but not the automated partition creation (or other Ubuntu-based distros might not follow with the automated formatting), in which case we're going to have another slew of issues. I'm still going to point out that what you are proposing is a workaround, whereas the proper way to address an issue is to fix it. If I had stated that I'm not planning to look into fixing this, it would make sense to try to go with a workaround. But not when my plan is and continues to be to try to address the issue when I have enough time to do so. |
Thank you for taking the time to respond to my post.
> I'll start by restating that my main limiting factor to try to address the problem here is time. Unfortunately, even as it may look so externally, a file system intrinsic issue like this one is unlikely to be resolved in a "Oh, it behaves like this? Then I know exactly what section of the code needs to be patched and this shouldn't take more than 5 minutes to fix".
As I know nothing about filesystems, I completely accept that a solution will take time. I have made some very minor contributions to another open-source community and I know that my heart sinks when I realize that I'm going to have to get my head around yet another subsystem in order to move ahead with the main project. I guess you must have a similar feeling.
Just a thought about how to manage this. As an interim step, would it perhaps be worth updating the FAQ so that Rufus users know that there is a problem? I may try to do this myself, but I won't be offended if you roll back since 'a little knowledge is a dangerous thing' and my text might be counterproductive. |
Nah, what I'm planning to do is add the EXPERIMENTAL tag back to the feature in the ChangeLog because that's where you want to advertise that a new feature may still have bugs before people try to use it. It actually used to be flagged EXPERIMENTAL until the last release, since it was still too new to declare as stable, so I'm just going to reinstate it as such for a while. |
OK, I won't touch the FAQ then. Thank you for your software and best of luck. |
👍 |
…d as EXPERIMENTAL * This is in relation to #1396 * Also fix a small typo
Just a small update for those who must be despairing to see progress on this. First of all to tell you that I am working behind the scenes to try to fix this issue, but, as indicated above, it does require time. I can however provide a few elements of what I've found so far:
Once again, I have no idea when I'll get a chance to dive more deeply into these, but, despite what it may look like, progress is being made on this issue, albeit slowly. |
With EXT4, FSArchiver has some logic for this: https://github.com/fdupoux/fsarchiver/blob/master/src/fs_ext2.c |
I appreciate the help, but I'm not using Also the code you pointed to is GPLv2 ONLY (not GPLv2 or later) which means that, even if I wanted, I couldn't use any of it in Rufus (GPLv3 or later)... which actually is probably part of the reason we're having trouble with Still the point I was making above was about |
Okay, I think I have finally pinpointed the issue. The issue has to do with this specific line of code:
    offset.QuadPart = block * channel->block_size + nt_data->offset;
As I mentioned above, it has to do with a 32-bit overflow when computing values, and more specifically with DUMB compilers not performing 64-bit computations when they really should. You see, in the line above, Alas, a 32-bit multiplication instead of a 64-bit one is exactly what's happening here. And it looks like both MSVC and gcc suffer from that illogical behaviour, because, unless you specifically tell the compiler that you do want to compute Damn, and I was always told that trying to micro-manage or micro-optimise what compilers do was a waste of time because they are supposed to be designed to do the smart thing. So much for that... Heck, it almost makes one wish we were still programming in pure assembly, because such an issue would certainly not happen then (since we'd very explicitly specify that we are working on 64-bit register for that multiplication)...
Oh and for those who wonder why we're not using a 64-bit block variable in the first place, or more specifically the 64-bit version of the
    errcode_t unix_write_blk(io_channel channel, unsigned long block,
                             int count, const void *buf)
    {
        return unix_write_blk64(channel, block, count, buf);
    }
Now, as to the reason why cygwin seems to be fine, it's because, from what I could see, it uses the UNIX I/O manager rather than the NT I/O manager for low level disk I/O, so of course, the 32-bit truncation of the multiplication, that (unless it's a side effect of adding the offset, which I don't believe it is) you do get in the original ext2fs code, does not apply since this specific code is actually not the one the official e2fsprogs uses on Windows/cygwin. Oh well, I already had a few fixes I picked, that I was planning to upstream to e2fsprogs once I had finalized |
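The truncation described above can be reproduced in isolation. In C, the width of a multiplication is determined by the types of its operands (after the usual arithmetic conversions), not by the type of the variable the result is assigned to, so two 32-bit factors yield a product that wraps modulo 2^32 before it is widened. The sketch below assumes 32-bit operands, as `unsigned long` and `int` are on Windows; `offset_narrow` and `offset_wide` are hypothetical helper names used only for this illustration, not functions from Rufus or e2fsprogs.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the faulty expression: both factors are 32-bit, so the
 * product wraps modulo 2^32 and only then is widened for the addition. */
uint64_t offset_narrow(uint32_t block, uint32_t block_size, uint64_t base)
{
    assert(block_size > 0);
    return block * block_size + base;
}

/* Widening one factor first forces the whole multiplication to 64-bit. */
uint64_t offset_wide(uint32_t block, uint32_t block_size, uint64_t base)
{
    assert(block_size > 0);
    return (uint64_t)block * block_size + base;
}
```

With 4096-byte blocks, block 1048576 sits exactly at the 4 GB mark: `offset_wide(1048576, 4096, 0)` gives 4294967296, whereas `offset_narrow(1048576, 4096, 0)` gives 0, so a write meant for the 4 GB mark would land back at the start of the partition.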
Congratulations on this good catch, Pete :-)
Best regards
Nio
|
Okay, I will close this issue with the next commit I push, since this should now be fixed. There's a TEST version of Rufus 3.9 you can use, that includes the fix, and which you can download here if you want to confirm for yourself that the problem reported above has been addressed. |
Hi Pete,
I made a persistent live Lubuntu 19.10 with this TEST version of Rufus.
It looks good.
- I tested the ext3 file system with e2fsck and it was healthy.
- When booted into the persistent live system, it worked as it should.
Congratulations! Now I feel happy with Rufus again :-)
Best regards
Nio
|
Great! Thanks for reporting. |
Hi Pete,
I noticed a regression. Our new kind of persistence no longer works in
the developing version of Ubuntu. I wrote a bug report. I am rather sure
that it will also affect Rufus, so I suggest that you check if it affects
you and in that case that you add heat to the bug report by clicking on
'affects me too'.
https://bugs.launchpad.net/ubuntu/+source/casper/+bug/1863672
You find the current daily iso files via the ubuntu iso tracker
http://iso.qa.ubuntu.com/qatracker/milestones/408/builds
Best regards
Nio
|
Thanks for the heads up. I'll try to test this when I get a chance. |
* So, as it happens, when assigning the product of two 32-bit variables into a 64-bit one, compilers default to being *DUMB* and, against all reasonable expectations, do not perform that multiplication as a 64-bit operation (even when the code is compiled as x64). Wow, that's really great decision making by compiler designers if I ever saw some... Whoever decided that C developers would much rather want truncation and 32-bit overflows, instead of the expected *LOGICAL* behaviour of conducting arithmetic operations as 64-bit when the result will be assigned to a 64-bit variable, needs to be condemned to a lifetime of trying to help elderly folks trying to conduct simple computing tasks as a punishment... Anyhoo, nt_write_blk()'s offset.QuadPart = block * channel->block_size + nt_data->offset was overflowing 32-bit as soon as block * channel->block_size went over the 4 GB mark, with the disastrous results one can expect. Considering that this is code we practically lifted verbatim from e2fsprogs, I guess e2fsprogs' NT I/O manager was never properly tested with anything larger than 4 GB. Awesome!
* We fix the above by doing what unix_io.c does and setting the 32-bit read/write_blk() calls to be wrappers around their 64-bit counterpart (since, once you deal with a 64-bit block variable, the computation is conducted as 64-bit).
* Also remove a bunch of stuff we don't need from config.h
* Closes pbatard#1396
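The wrapper approach mentioned in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the actual Rufus patch: `struct demo_channel`, `demo_blk64_to_offset` and `demo_blk_to_offset` are hypothetical stand-ins, and only the offset computation is modelled rather than any real disk I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the I/O channel state (not the e2fsprogs struct). */
struct demo_channel {
    uint32_t block_size;   /* bytes per block */
    uint64_t part_offset;  /* byte offset of the partition on the disk */
};

/* 64-bit entry point: block is already 64-bit, so the product is
 * computed in 64-bit arithmetic as well. */
uint64_t demo_blk64_to_offset(const struct demo_channel *ch, uint64_t block)
{
    assert(ch != NULL);
    return block * ch->block_size + ch->part_offset;
}

/* 32-bit entry point: a thin wrapper. The argument is widened to 64-bit
 * when it is passed on, before any multiplication takes place. */
uint64_t demo_blk_to_offset(const struct demo_channel *ch, uint32_t block)
{
    return demo_blk64_to_offset(ch, block);
}
```

The appeal of this design is that the arithmetic lives in exactly one place, so the 32-bit and 64-bit entry points cannot drift apart.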
Checklist
<FULL LOG> below.
Rufus version: x.y.z
- I have NOT removed any part of it.
Additionally (if applicable):
I ran a bad blocks check, by clicking Show advanced format options then Check device for bad blocks, and confirmed that my USB is not defective.
I also tried one or more of the following:
If using an image, I clicked on the (✓) button to compute the MD5, SHA1 and SHA256 checksums, which are therefore present in the log I copied. I confirmed, by performing an internet search, that these values match the ones from the official image.
I tested the checksum of the iso file separately at/after downloading.
Actually I got it via zsync, which has a built-in checksum test.
Issue description
When creating a persistent live drive, Rufus makes an ext3 file system in the partition for persistence. This file system has errors. It is still possible to use the file system, for example with Lubuntu 19.10, but
Part of the drive space in the file system is lost, wasted, compared to a healthy file system, when Lubuntu or other flavours of Ubuntu are booted. This might not be seen when connected to an operating system that is already running. The size of the lost drive space is small in a 4 GB pendrive, significantly bigger in a 16 GB pendrive, and huge in a 60 GB SSD. I used an SSD in several test runs because the loss is so big that it is very easy to see, and also because it is a high speed device, so that I could do several tests within a rather short time.
The errors in the file system may not create any corruption of files in the beginning, but I am afraid, that it may cause severe problems later on. When booting live-only I noticed that a log directory was mounted (and some of the errors are that some inodes have 2 links).
I tested various things in three different computers and another user tested it in a fourth computer. I ran standard Windows 10 fully up to date in two computers and a 'Windows Insider' version 10 fully up to date in the third computer. After some repetitions things started to look good, but when I specified the starting conditions in a strict way I could reproduce similar (not exactly the same) size of lost drive space.
In these cases between 20 GiB and 40 GiB were lost, already marked as used by df -h when booting into the persistent live system for the first time. A single test starting from exFAT showed 12 GiB lost drive space.
Thinking aloud
It seems to me that the tool to create the file system is buggy, or is using some buggy or incompatible system software (dll-file?). Or maybe you could fix it by compiling with some flags for stricter operation that is less optimized for speed. Or you could test if there is a version for ext2 or ext4 that is more robust. I think that ext4 gets much more attention than ext3 nowadays.
Repair
I tested that it was possible to repair the file system with e2fsck -f, and then there was no lost drive space, only the normal amount reserved for management (similar in size to that of the Linux tool mkusb).
Re-format
I also tested that it was possible to re-format the ext3 file system with AOMEI Partition Assistant (the freeware version), and that file system was working almost correctly: there was a "Resize inode not valid" message, but no other complaint, not the huge amount of errors caused by Rufus, and there was no obvious loss of drive space for the user.
Final words
I hope that you will fix this bug or find a way around it soon, and I am willing to help with testing when you have new versions of Rufus or simply new ways to set the conditions to make it work.
Good luck
Nio
Log