simh changes .dsk image files by silently adding signature #1059

Open
al20878 opened this issue Jul 28, 2021 · 109 comments

@al20878
Contributor

al20878 commented Jul 28, 2021

  • Context

This is a new feature, but a very BAD one: when a disk image is attached to a device in simh, the simulator appends some kind of signature to the disk image. WHY is this necessary? If this IS necessary -- can this signature be kept separately in something like a "Media Descriptor File" (.MDS) -- but NOT in the image itself? Please!

Just attach a "pristine" .dsk image to a simh disk drive and watch it get corrupted by an additional appended sector. There seems to be no way to opt out of this behavior. This is also a new behavior (so it must have been added recently). This behavior is unacceptable. It may be okay to do this for container files like VHD, but NOT for sector-by-sector disk image files, and certainly not silently.

  • the output of "sim> SHOW VERSION" while running the simulator which is having the issue

PDP-11 simulator V4.0-0 Current        git commit id: f1a2c81c
sim> show version
PDP-11 simulator V4.0-0 Current
    Simulator Framework Capabilities:
        32b data
        32b addresses
        Polled Ethernet Packet transports:PCAP:NAT:UDP
        Idle/Throttling support is available
        Virtual Hard Disk (VHD) support
        Asynchronous I/O support (Lock free asynchronous event queue)
        Asynchronous Clock support
        FrontPanel API Version 12
    Host Platform:
        Compiler: GCC 10.2.0
        Simulator Compiled as C arch: x64 (Release Build) on Jul 27 2021 at 23:28:27
        Build Tool: simh-makefile
        Memory Access: Little Endian
        Memory Pointer Size: 64 bits
        Large File (>2GB) support
        SDL Video support: No Video Support
        No RegEx support for EXPECT commands
        OS clock resolution: 1ms
        Time taken by msleep(1): 1ms
        Ethernet packet info: libpcap not installed
        OS: CYGWIN_NT-6.1 ANTON 3.2.0(0.340/5/3) 2021-03-29 08:42 x86_64 Cygwin
        tar tool: tar (GNU tar) 1.34
        curl tool: curl 7.77.0 (x86_64-pc-cygwin) libcurl/7.77.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.9 zstd/1.5.0 libidn2/2.3.1 libpsl/0.21.1 (+libidn2/2.3.1) libssh2/1.9.0 nghttp2/1.43.0 OpenLDAP/2.4.59 libmetalink/0.1.3
        git commit id: f1a2c81c
        git commit time: 2021-07-26T17:50:48-0700
  • how you built the simulator or that you're using prebuilt binaries

make pdp11
  • the simulator configuration file (or commands) which were used when the problem occurred.

  • the expected behavior and the actual behavior

The disk image files must not be modified from outside the guest operating system that is run under the simulator. You cannot assume that the only user of the image is the simulator itself, so the "addition", which is understood by simh, is actually a corruption of the original data.

  • you may also need to provide specific pointers to data files that may be necessary to demonstrate the problem

This garbage does not belong in the .dsk image file (written beyond the actual last sector of the image):

12C4B000  73 69 6D 68 50 44 50 2D 31 31 00 00 00 00 00 00  simhPDP-11......
12C4B010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B040  00 00 00 00 52 44 35 34 00 00 00 00 00 00 00 00  ....RD54........
12C4B050  00 00 00 00 00 00 02 00 00 09 62 58 00 00 00 02  ..........bX....
12C4B060  4D 6F 6E 20 4A 75 6C 20 32 36 20 31 34 3A 30 32  Mon Jul 26 14:02
12C4B070  3A 30 36 20 32 30 32 31 0A 00 00 00 00 01 00 00  :06 2021........
12C4B080  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B090  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B0A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B0B0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B0C0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B0D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B0E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B0F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B100  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B110  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B120  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B130  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B140  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B150  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B160  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B170  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B180  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B190  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B1A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B1B0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B1C0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B1D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B1E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
12C4B1F0  00 00 00 00 00 00 00 00 00 00 00 00 D7 9D 98 8C  ............×.˜Œ
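
For reference, the fields visible in this dump can be decoded with a short Python sketch. The offsets below are inferred purely from the hex dump above (signature at 0x00, simulator name at 0x04, drive type at 0x44, three big-endian 32-bit values at 0x54, a timestamp string at 0x60); they are assumptions, not taken from sim_disk.c, and the trailing four bytes (presumably a checksum) are not verified. The helper names are hypothetical.

```python
import os
import struct

FOOTER_SIZE = 512  # size of the appended block seen in the dump


def parse_simh_footer(tail: bytes):
    """Decode a 512-byte block if it starts with b"simh", else return None.

    Field offsets are inferred from the hex dump, not from simh sources.
    """
    if len(tail) != FOOTER_SIZE or tail[:4] != b"simh":
        return None

    def text(lo, hi):
        # NUL-padded ASCII field; cut at the first NUL and trim whitespace
        return tail[lo:hi].split(b"\0", 1)[0].decode("ascii", "replace").strip()

    # Three 32-bit big-endian integers, as they appear at 0x54..0x5F
    sector_size, sector_count, xfer_size = struct.unpack(">III", tail[0x54:0x60])
    return {
        "simulator": text(0x04, 0x44),
        "drive_type": text(0x44, 0x54),
        "sector_size": sector_size,
        "sector_count": sector_count,
        "transfer_element_size": xfer_size,
        "creation_time": text(0x60, 0x7C),
    }


def read_footer(path):
    """Read the last 512 bytes of a file and try to decode them."""
    size = os.path.getsize(path)
    if size < FOOTER_SIZE:
        return None
    with open(path, "rb") as f:
        f.seek(size - FOOTER_SIZE)
        return parse_simh_footer(f.read(FOOTER_SIZE))
```

With the values in this dump, the sector size decodes to 512 and the sector count to 615000, and 512 × 615000 = 314,880,000 = 0x12C4B000, which is exactly the file offset at which the dump begins.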
@al20878
Contributor Author

al20878 commented Jul 28, 2021

This is the revision that added this questionable feature.

commit 049ba3250521f0bc08822088bd023115da030711
Author: Mark Pizzolato <mark@infocomm.com>
Date:   Sat Apr 11 13:01:48 2020 -0700

    DISK: Add robust disk container validation

@markpizz
Member

Please explain "which part" of the original data is "corrupted", and how this negatively affects you and all the things you may do with such a container.

@al20878
Contributor Author

al20878 commented Jul 28, 2021

First and foremost, adding something to a disk image, changes its size (obviously). I use an SD card to store the drive images for the same disk controller back to back. I put these images on SD card with dd in the raw drive mode. Having an extra sector damages the boot sector of the next drive image stored on the SD! simh used to create under-allocated images (which would read 0 beyond the part stored), so dd'ing such images was not a problem, but now it is a real problem because they corrupt something else when they are bigger.

Next, as I mentioned before, the images created previously were sometimes under-allocated. If the guest OS merely reserved the sectors of such a disk by marking them used, but never wrote to them, the sectors were never added to the file, yet would still be treated as "0"s when read beyond the image boundary. And that was fine. With the current addition, the contents read might well be the "magic" added by simh, and that is not okay. One notable thing is the index file on ODS1, which was allocated but no blocks might have been "cleared" -- merely the clean bitmap is written at the beginning of the file. Still, the structure verification program(s) would go ahead and read those never stored sectors, and expect them to be unused (in accordance with the bitmap). With the new addition, the sector actually does look like a used file header, as it has a non-zero file number and a checksum (at the end of the sector) -- but they contain "garbage", as they look like neither a valid fnum or cksm nor the invalidated values of a deleted file, so the file header does actually look "corrupted"!
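
The read-past-end behavior described here, where sectors that were never written to the file read back as zeros, can be sketched in Python as follows. This illustrates the general technique only; `read_sector` is a hypothetical helper, not the code in sim_disk.c.

```python
def read_sector(image: bytes, lbn: int, sector_size: int = 512) -> bytes:
    """Return one sector of an under-allocated image.

    Bytes that lie beyond the end of the stored image are returned as
    zeros, so a short container behaves like a full-size, zero-padded one.
    """
    start = lbn * sector_size
    data = image[start:start + sector_size]
    # Pad a missing or partial sector with zeros up to the sector size
    return data + bytes(sector_size - len(data))
```

A reader built this way never sees anything but zeros past the stored data, which is the property the structure verification programs mentioned above rely on.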

Lastly, the code included in sim_disk.c for recognizing the filesystem is cool but incomplete. So simh cannot be "sure" the detection is valid even if it seems like it has figured it out (or worse -- not figured it out!). The logic is missing some corner cases for ODS1 and RT11 filesystems, and is overly simplified for ODS2 -- from what I can tell. I honestly do not understand why simh, all of a sudden, needs to be concerned with what is stored on disk -- it's the job of the guest OS -- so simh must not mess with the data, either. Also, the current fs recognition as coded won't even work correctly on any big-endian host system.

When I first saw simh starting to print a message on attach that it found a filesystem, I thought, "Oh, fine!" But the file command can do the same with the following added magic, for example:

$ cat ~/.magic
1008    string  DECFILE11A      Files-11 On-Disk Structure
>525    byte    x               Level %d
>526    string  x               \b, volume label is '%.12s'

Now simh acts upon that rather intrusively, and that is not fine anymore.
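
For readers unfamiliar with magic syntax, the rule above inspects three fixed file offsets. A rough Python equivalent is sketched below; the offsets (1008, 525, 526) are copied from the magic entry, while the function name and the trimming of trailing spaces are illustrative choices. Like the magic rule, it only reads the image and never modifies it.

```python
def detect_files11(image: bytes):
    """Mirror the magic rule: return (level, volume_label) or None.

    Offset 1008 must hold the string DECFILE11A; the byte at 525 is
    printed as the structure level, and up to 12 bytes at 526 as the label.
    """
    if image[1008:1018] != b"DECFILE11A":
        return None
    level = image[525]
    # '%.12s' stops at a NUL or after 12 characters
    label = image[526:538].split(b"\0", 1)[0].decode("ascii", "replace")
    return level, label.rstrip()  # trailing spaces trimmed for readability
```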

If you need to store meta-information about a container, you should be doing so in a separate file (or, if you like Windows that much and insist on using the same file -- in a separate NTFS stream), so it does not get in the way of the actual disk data. More so, if the user does not want you to create the container information files stored next to their disk images, they can also control that by (at least) changing permissions for the directory not to accept any new files. Finally, if they want to remove what simh added, they won't need to jump through the hoops and tweak the contents of the .dsk files -- they would merely need to delete those extra files (and also all at once, quickly, with something like rm *.mds). Or, maybe it's even better to store all that info under a ".simh/" subfolder, so it does not mix with the .dsk container file(s).

The .dsk file modification is a totally unacceptable technique, no matter what the intent is / was. When you insert a thumb drive into your computer, your system does not start modifying your data behind your back. Yes, Mac tends to create those annoying "thumb" folders, but they do not change any of your files, just because. They keep their stuff separately, and easily removable, too!

So I urge you to revise these (backward-incompatible) changes with the handling of the image files, and to never modify the .dsk containers from outside the guest OS (formatting a new drive with writing the STD144 track is the only exception -- and even that used to be done only with an explicit consent from the user, either via a command switch or a question asked by simh directly).

@markpizz
Member

first and foremost, adding something to a disk image, changes its size (obviously). I use an SD card to store the drive images for the same disk controller back to back. I put these images on SD card with dd in the raw drive mode. Having an extra sector damages the boot sector of the next drive image stored on the SD! simh used to create under-allocated images (which would read 0 beyond the part stored), so dd'ing such images was not a problem, but now it is a real problem because they corrupt something else when they are bigger.

I don't understand exactly what you're describing here. Are you saying that you've got n dd images from some physical disk merely stuffed one behind the other on an SD card that DOES NOT HAVE a file system on it? And when you attach this raw SD card to a simh disk, it is writing beyond the first dd'd image and overwriting the first 512 bytes of the second dd image? I'm wondering how you find such an SD disk useful. Specifically, how do you reference the second and subsequent simh dd images stuck one behind the other on that SD disk? If you actually have a file system on this SD disk, then each of these dd images is a separate file, and thus if sim_disk adds 512 bytes of data BEYOND the actual size of the simh unit that the container is being attached to, then no other disk images will be corrupted NOR will any of the data WITHIN the container (sized based on the drive it is attached to) be changed. The container size may indeed be expanded to the full container size of the simh device, but any such expansion will contain 0's and thus read operations to the simh device will return the same values as reads to the previously unexpanded drive.

Next, as I mentioned before, the images created previously were sometimes under-allocated. If the guest OS merely reserved the sectors of such a disk by marking them used, but never wrote to them, the sectors were never added to the file, yet would still be treated as "0"s when read beyond the image boundary. And that was fine. With the current addition, the contents read might well be the "magic" added by simh, and that is not okay. One notable thing is the index file on ODS1, which was allocated but no blocks might have been "cleared" -- merely the clean bitmap is written at the beginning of the file. Still, the structure verification program(s) would go ahead and read those never stored sectors, and expect them to be unused (in accordance with the bitmap). With the new addition, the sector actually does look like a used file header, as it has a non-zero file number and a checksum (at the end of the sector) -- but they contain "garbage", as they look like neither a valid fnum or cksm nor the invalidated values of a deleted file, so the file header does actually look "corrupted"!

As I said above, if the container is expanded to reflect the logical size of the simh unit it is attached to, the expanded content (readable from any OS running within the simulator) will contain 0's and thus return the same contents it did on prior versions. The extra 512 bytes are written past the logical size of the simulated disk drive and thus will never be visible to anything running within the simulator. If you think you've got an example case where this is not happening I want to see the details.

Lastly, the code included in sim_disk.c for recognizing the filesystem is cool but incomplete. So simh cannot be "sure" the detection is valid even if it seems like it has figured it out (or worse -- not figured it out!). The logic is missing some corner cases for ODS1 and RT11 filesystems, and is overly simplified for ODS2 --

You seem to have some specific knowledge about the functional failures in the current detection logic for the file systems you mention. Please provide example disk images with legitimate file systems that aren't properly detected and/or propose changes to fix what you've seen.

from what I can tell. I honestly do not understand why simh, all of a sudden, needs to be concerned with what is stored on disk -- it's the job of the guest OS -- so simh must not mess with the data, either.

It comes down to the general concept that AUTOSIZING of disks is the default, since that is the most flexible choice for most folks: they generally don't read every detail of exactly how to use each device. As you noted above, older disks which had legitimate file systems on them, but hadn't been written out to the full size of the disk, would AUTOSIZE to the wrong disk type. Detecting file systems on disks lets SCP know what the simulated operating system actually thought the disk size was when it was originally created, and thus allows AUTOSIZE to work correctly; otherwise the disk type would be chosen based on the container size, which might not be correct at all.

Also, the current fs recognition as coded won't even work correctly on any big-endian host system.

That may or may not be true for some cases, and certainly could be accommodated if there were any reasonable big endian systems available today. I've got an old Sparc box sitting around which I purchased most of 20 years ago specifically to have as a big-endian test system for simh development. That box hasn't been turned on for more than 10 years due to lack of demand and the horrible performance that it had the last time things were tried. If you can point to a big endian system which can be had cheaply we can look at this.

You really haven't described how or when the additional 512 bytes added to the end of disk container files actually causes "corruption" or otherwise actually impacts you.

@al20878
Contributor Author

al20878 commented Jul 28, 2021

The SD card goes into my FPGA that is running PDP11... You see, simh is not the only simulator out there, and your change makes it very hard to make the images interchangeable, like they were before. Yes, SD card does not have a file system of its own, it's barely an array of sectors. Drive images there follow each other one after the other, for a single controller -- all of the same size, so base address of the next one is its number times the size of the emulated drive. The additional sector messes that up (I did not know it was there until I noticed the boot sector [of the next drive] was gone). I know that I can use the transfer size with dd but it was never necessary, because the containers were correctly [under]sized. Like I said, your changes are not backward compatible.
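
The slot layout described here (image N starts at N times the drive size) and the effect of an oversized image can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the two sizes used in the usage note are the RL02 figures quoted elsewhere in this thread, taken as an example.

```python
def slot_offset(slot: int, drive_bytes: int) -> int:
    """Byte offset on the raw card where image number `slot` begins."""
    return slot * drive_bytes


def spill_into_next_slot(image_bytes: int, drive_bytes: int) -> int:
    """Number of bytes an image file would write past its own slot."""
    return max(0, image_bytes - drive_bytes)
```

For a 10,485,760-byte drive, `slot_offset(3, 10_485_760)` is 31,457,280, and a 10,486,272-byte image file spills 512 bytes into the following slot, which is one boot sector's worth. An under-allocated image spills nothing.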

How do you know that the containers were correctly sized by simh? What if there's a human error (a mere typo)? Simh would "label" the pristine image per the wrong attachment right away, and there's no way back -- the wrong data is already there (in that label) that was forced onto the disk contents!

I still don't understand whose decision was that to write something into the .dsk files -- community's? What I know for sure, is that it's the wrong thing to do. You can ask anybody. You should not be writing into user's data. Period. You want to keep your metadata -- keep it separately.

If the goal is the mere "autosizing" for the lazy folks, then I guess they won't object if simh did its bookkeeping separately from the images, like I suggested previously -- either side by side with them, or, better, in a "hidden" subfolder, like ".simh/" -- all those 512-byte extras together. You want them to be tied closely with the original -- the subfolder helps as you can then name each of them there arbitrarily, and use an encoded string for the file names that checksums some .dsk file properties like name, size, inode number, etc, even bootsector. So if the original image gets moved, overwritten etc, you won't be using the "wrong" metadata with the new image. I don't see how that can be any difficult, actually. But it won't break the compatibility, and will leave the entire file for the guest OS disposal, like it should be.
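
One possible shape for the separate-metadata idea is sketched below in Python. Everything here is hypothetical: the `.simh/` directory, the `.mds` extension, and the choice of SHA-256 over the file name, size, inode number, and first sector are just one way to realize the suggestion, not anything simh implements. The image itself is only opened for reading.

```python
import hashlib
import os


def sidecar_path(image_path: str, boot_sector_bytes: int = 512) -> str:
    """Derive a metadata file name under .simh/ next to the image.

    The name is a digest of a few image properties, so metadata recorded
    for one image is not picked up after the file is replaced.
    """
    st = os.stat(image_path)
    with open(image_path, "rb") as f:  # read-only: the image is never modified
        boot = f.read(boot_sector_bytes)

    h = hashlib.sha256()
    for part in (os.path.basename(image_path), str(st.st_size), str(st.st_ino)):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so adjacent fields cannot run together
    h.update(boot)

    folder = os.path.join(os.path.dirname(os.path.abspath(image_path)), ".simh")
    return os.path.join(folder, h.hexdigest()[:32] + ".mds")
```

Removing all recorded metadata would then amount to deleting the `.simh` folder, with no change to any .dsk file.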

@al20878
Contributor Author

al20878 commented Jul 29, 2021

if there were any reasonable big endian systems available today

Then I don't understand why you created your label using the byte conversions as if it was to be used between the CPUs of different endian-ness. That's inconsistent, at best.

@markpizz
Member

The SD card goes into my FPGA that is running PDP11... You see, simh is not the only simulator out there, and your change makes it very hard to make the images interchangeable, like they were before. Yes, SD card does not have a file system, it's barely an array of sectors. Drives there follow each other one after the other, for a single controller -- all of the same size, so base address of the next one is its number times the size of the emulated drive. The additional sector messes that up (I did not know it was there until I noticed the boot sector [of the next drive] was gone). I know that I can use the transfer size with dd but it was never necessary, because the containers were correctly [under]sized. Like I said, your changes are not backward compatible.

So, simh is NOT actually doing anything to the SD card; your dd of the disk image happened to take the container's contents plus the additional metadata. As you note, your process can be specific and only dd the part of the container file which actually contains disk data. This would not have even been a problem if each time you moved things to the SD card, you moved ALL of the disk images in order. I say this since what you've described suggests that each successive dd operation for the subsequent disk contents should be specifying the offset into the destination SD where each successive disk image belongs. The first file would have an extra 512 bytes, but writing the second disk image would overwrite those bytes with the data for the second disk image, etc. Note that simh DID NOT corrupt any disk data; your data move process did. Accommodating this extra data has 2 solutions mentioned here.

How do you know that the containers were correctly sized by simh? What if there's a human error (a mere typo)? Simh would "label" the drive per the wrong attachment right away, and there's no way back -- the wrong data is already there (in that label) that was forced onto the disk contents!

I'm not sure what problem you're describing here. If a user created a container and put a file system on it, the file system would be within the bounds of the container he created. If he happened to type something wrong when he did that, it is his error. If he subsequently realizes that he made a mistake, then he should be creating a new container...

Newly created containers are sized based on the disk type in question. For MANY YEARS simh 4.x has created containers that are the full size of the disk type it was creating. The AUTOSIZING file system detection logic was explicitly added to accommodate legacy disk containers created before the simh 4.x full size creation paradigm. The commit you reference above added the metadata beyond the data part of the container.

PDP-11 simulator V4.0-0 Current        git commit id: 5a158445
sim> sh rl
RL      RLV12, address=17774400-17774411, vector=160, BR5, 4 units
  RL0   2621KW, not attached, on line
        write enabled, RL01, autosize
        AUTO detect format
  RL1   2621KW, not attached, on line
        write enabled, RL01, autosize
        AUTO detect format
  RL2   2621KW, not attached, on line
        write enabled, RL01, autosize
        AUTO detect format
  RL3   2621KW, not attached, on line
        write enabled, RL01, autosize
        AUTO detect format
sim> set rl3 rl02
sim> att rl3 rl3-rl02.dsk
RL3: creating new file: rl3-rl02.dsk
Overwrite last track? [N]
sim> detach rl3
sim> diskinfo rl3-rl02.dsk
Container:              rl3-rl02.dsk
   Simulator:           PDP-11
   DriveType:           RL02
   SectorSize:          256
   SectorCount:         40960
   TransferElementSize: 2
   AccessFormat:        SIMH
   CreationTime:        Wed Jul 28 17:19:05 2021
Container Size: 10,485,760 bytes
sim> dir rl3-rl02.dsk
 Directory of d:\Projects\simh-id32\Visual Studio Projects

07/28/2021  05:19 PM        10,486,272 rl3-rl02.dsk
               1 File(s)        10,486,272 bytes
               0 Dir(s)
sim>

I still don't understand whose decision was that to write something into the .dsk files -- community's? What I know for sure, is that it's the wrong thing to do. You can ask anybody. You should not be writing into user's data. Period. You want to keep your metadata -- keep it separately.

The meta data is absolutely outside of the bounds of the user data in the disk container, and as you suggest is kept separately. The operating systems running in the simulator can not see or touch this data.

If the goal is the mere "autosizing" for the lazy folks, then I guess they won't object if simh did its bookkeeping separately from the images, like I suggested previously -- either side by side with them, or, better, in a "hidden" subfolder, like ".simh/" -- all those 512-byte extras together. You want them to be tied closely with the original -- the subfolder helps as you can then name each of them there arbitrarily, and use an encoded string for the file names that checksums some .dsk file properties like name, size, inode number, etc, even bootsector. So if the original image gets moved, overwritten etc, you won't be using the "wrong" metadata with the new image. I don't see how that can be any difficult, actually. But it won't break the compatibility, and will leave the entire file for the guest OS disposal, like it should be.

If you were tasked with solving this problem you clearly would have solved it differently, but given that the problem as it currently is implemented is easily tolerated there really isn't a big problem here. You could use Bob's 3.x code, which probably would meet your goal of interchanging media between simh and the FPGA, which certainly doesn't yet have all of the devices in the 4.x PDP11 simulator.

if there were any reasonable big endian systems available today

Then I don't understand why you created your label using the byte conversions as if it was to be used between the CPUs of different endian-ness. That's inconsistent, at best.

Byte conversions were used specifically to support the big-endian case. simh data (disks and other things) is supposed to be endian independent and allow interchange. I believe this goes back to when big-endian systems were relatively common. The code currently happens not to have been tested recently, but with sufficient motivation it could be tested and fixed if errors actually exist.

@al20878
Contributor Author

al20878 commented Jul 29, 2021

I do not have to transfer disk images to SD in order. I can do so randomly. I use simh to prep an image, then write it at a specific offset. The extra sector damages the integrity of the next image on SD. That is the problem. The container size changed from what it used to be (less than or equal to the size of the drive it represented) and has gotten bigger, and that's not good.

I'm not sure what problem you're describing here. If a user created a container and put a file system on it the file system

The user brought a .dsk file with a filesystem on it, created previously. simh, not knowing what was there, labeled it at a wrong offset because the user made a mistake and attached it to the wrong drive. The .dsk file is now ruined.

The meta data is absolutely outside of the bounds of the user data in the disk container,

It's not true, because simh cannot know for sure what is there on the disk. See above. Metadata has to be physically separate for the data operations to be safe. Let the guest OS handle the image data, throughout.

sim> diskinfo rl3-rl02.dsk
Container:              rl3-rl02.dsk
   Simulator:           PDP-11
   DriveType:           RL02
   SectorSize:          256
   SectorCount:         40960
   TransferElementSize: 2
   AccessFormat:        SIMH
   CreationTime:        Wed Jul 28 17:19:05 2021
Container Size: 10,485,760 bytes

So why can't this "pretty" information be pulled from some place other than the .dsk file itself? I was just suggesting the files (IMO the simplest), but it can be whatever. A registry, a database, but it has got to be separate from the original .dsk image. Not merely logically within, but outside of that file.

simh data (disks and other things) is supposed to be endian independent and allow interchange.

You contradict your own code here! The new "footer" (which I call the simh disk label here) is written in big-endian (network) format, yet the fs detection code, in the very same source file, does not even care to do anything endian-agnostic, and is all very much little-endian. Your 20 y.o. sparc machine can demonstrate that to you immediately: I can guarantee you, no filesystems will get recognized. If, however, as you mentioned previously, simh should no longer be concerned about the big-endian arch, then writing just the "footer" in big-endian form is rather weird.
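
The distinction being argued here, between decoding with an explicit byte order and decoding with whatever order the host happens to use, can be shown in a few lines of Python. This is a generic illustration with hypothetical helper names; it makes no claim about what sim_disk.c actually does.

```python
import struct
import sys


def be32(buf: bytes, off: int) -> int:
    """32-bit big-endian (network order) value, same result on any host."""
    return int.from_bytes(buf[off:off + 4], "big")


def le16(buf: bytes, off: int) -> int:
    """16-bit little-endian word (PDP-11 order), same result on any host."""
    return int.from_bytes(buf[off:off + 2], "little")


def native16(buf: bytes, off: int) -> int:
    """16-bit word in the HOST's byte order: the result depends on the
    machine running the code, which is the portability hazard."""
    return struct.unpack_from("=H", buf, off)[0]
```

The bytes `00 00 02 00` from the footer decode to 512 with `be32` on every host, and the bytes `34 12` decode to 0x1234 with `le16` on every host, whereas `native16` yields 0x1234 on a little-endian machine and 0x3412 on a big-endian one (`sys.byteorder` reports which applies).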

If you were tasked with solving this problem you clearly would have solved it differently, but given that the problem as it currently is implemented is easily tolerated there really isn't a big problem here.

Sadly, you seem unable to think outside the box, and fail to acknowledge there is a problem. There's very little I can do about it, but I tried. I'm sure this will be brought up again as folks gradually upgrade their simh binaries and realize the simulator is now messing up their (existing) images.

@AK6DN

AK6DN commented Jul 29, 2021

I wholly agree with Anthony's argument. SIMH should NOT modify disk images by default unless allowed by the user.
I have lots of legacy images, many that I mount READ ONLY. I don't want the O/S to change them, nor do I want SIMH to.
I have no problem if you add a new flag to the ATTACH that says allow this to happen. But it should be disabled by default.
Or add a new command to SIMH to enable it, like SET SCREWWITHMYIMAGES enabled or some such. It should be default disabled.

I haven't seen this issue yet (I am still back on an April commit).
I am still not sure who the target user is for this 'feature' and why it is necessary.
I use a similar approach to images as does Anthony, dd'ing raw .dsk files direct to physical blocks on an SD card.
I just expect the image .dsk to be my raw data.

My RX01/02 emulator expects .dsk images (on a FAT filesystem) to be EXACTLY 512,512 or 256,256 bytes to be valid images.
Is this going to screw with them too?

My 2c.

@al20878
Contributor Author

al20878 commented Jul 29, 2021

Don, thanks! If you attach your images read-only (as with the -r switch), they won't get the "footer" written to them (and that is the only exception). But if not, and even if you logically mount the image read-only in the guest OS, or if the OS does not ever write anything into the image, then any disk attached will get the 512-byte appendage right away, no questions asked. This also underscores the expected fact that simh does not actually need that "footer" in order to handle the drive perfectly fine.

Ironically, having the metadata separate as suggested, would have still let simh create the metadata even for read-only images (which it can't do now), to warn people on their next attach that they were doing something different:

Warning: Inconsistency detected: This image was last attached as ...... If you want to silence this message, please remove .simh/disk_image_098235763589WRYTDFHJGSF.mds

Yet the main point here was that aside from anything, USER DATA MUST BE RESPECTED. The .dsk containers are pure disk images (not necessarily even created by simh!), portable between different systems, and must remain so (or if not -- then at least given an option, with a big fat warning that it's changing, to politely decline). But this change seems to only cater to some "forgetful" users of simh, totally disregarding the fact of breaking the compatibility and creating a nuisance for others. Who cares, right?

Having the metadata separate would still be able to help the weak-minded individuals just the same with reminding them what attachment they used the last time, and would not even call for another set command (SET SCREWWITHMYIMAGES), or new attach command switch, or any user consent, for that matter.

So far the solution offered here was to switch to the earlier 3.x version. Hmm, really?

Here's what I did, instead:

$ truncate -s -512 mydamagedimage.dsk

for every disk image simh mutilated. And I inserted return SCPE_OK; as the very first executable statement in the store_disk_footer() function (located in sim_disk.c, around line 2237 as of the current revision). I'll carry this change forward in my local repo, thanks to git rebase capability, until this is rectified in a sane way.
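
A slightly more cautious variant of the truncate command above is sketched here in Python: it removes the last 512 bytes only when they begin with the `simh` signature shown in the hex dump earlier in this issue, so running it on an untouched image does nothing. The function name is hypothetical, and the check is deliberately simple: it does not validate the footer's trailing checksum, so a real final sector that happens to start with those four bytes would also be removed.

```python
import os

FOOTER_SIZE = 512


def strip_simh_footer(path: str) -> bool:
    """Truncate the trailing 512-byte block if it starts with b"simh".

    Returns True when the file was shortened, False when left untouched.
    """
    size = os.path.getsize(path)
    if size < FOOTER_SIZE:
        return False
    with open(path, "r+b") as f:
        f.seek(size - FOOTER_SIZE)
        if f.read(4) != b"simh":
            return False  # no signature: leave the image alone
        f.truncate(size - FOOTER_SIZE)
    return True
```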

@al20878 al20878 changed the title simh corrupts disk image files by adding some signature silently simh corrupts .dsk image files by adding some (unwanted) signature silently Jul 29, 2021
@Rhialto

Rhialto commented Jul 31, 2021

For dding disk images to raw sdcard storage, you can avoid the problem of the "footer" by specifying to dd a specific number of blocks to copy. Especially if you dd to somewhere "in the middle" of the sdcard that seems prudent to me, because you might be copying a wrong file, which happens to be much longer than the one you intended, and it would overwrite even more than just the boot block of the next image in the sd space.
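A rough sketch of the "copy a specific number of blocks" idea, written in Python rather than dd. The function name and parameters are illustrative assumptions; the destination is opened without truncation so it can be an existing file or raw device, and only block_count blocks are ever written.

```python
def copy_blocks(src_path, dst_path, block_count, block_size=512, dst_offset_blocks=0):
    """Copy exactly block_count blocks from the start of src_path into dst_path.

    Anything in the source beyond block_count blocks (for example an
    appended footer) is never read or written. Returns the bytes copied.
    """
    remaining = block_count * block_size
    # "r+b" writes in place: the destination must already exist and is not truncated.
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        dst.seek(dst_offset_blocks * block_size)
        while remaining > 0:
            chunk = src.read(min(remaining, 1024 * 1024))
            if not chunk:
                break  # source is shorter than requested
            dst.write(chunk)
            remaining -= len(chunk)
    return block_count * block_size - remaining
```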

@al20878
Contributor Author

al20878 commented Jul 31, 2021

dd a specific number of blocks to copy

I'm well aware of that, and if you read the thread you'd see that I said:

I know that I can use the transfer size with dd but it was never necessary, because the containers were correctly [under]sized.

But that is not exactly the point of this discussion here. The problem is that simh began meddling with the contents of your files on its own (and worst part, without any permission), which formerly was solely the job of the guest OS -- but that was always only with whatever you instructed the OS to do. It's your data, and only you modify it the way you want / need.

simh can poke around and "guess" the containers' contents all that it wants, and report that information back (as a "filesystem found") but it cannot act upon that the intrusive way, as it is implemented now (no warning, no question, no nothing). The only thing it might be concerned about, can be the size of the container exceeding the capacity of the drive it's being attached to. And even then, a mere warning will do by saying that data beyond the capacity limit won't be accessible. Not even an error, IMO.

What seems "prudent" to me is that simh should not try to be the authority of your data, especially when it cannot actually figure the contents with 100% accuracy (and frankly, it should not even be bothered to do such a job as a hardware simulator, what strictly speaking simh is!). If there's a need to "track" the user actions' "correctness" or consistency, simh is welcome to do so in a totally separate file space -- but again, NOT THE CONTAINERS themselves! Lastly, if the urge is that the containers must be keeping that information, then there should be a clear and explicit consent from the user for allowing the simulator to mess with the data:

sim> attach rq myimage.dsk
CAUTION: "myimage.dsk" needs to be converted to a new format that would include metadata about the container
itself added as an extra sector past the logical end of the drive (per the best estimate of the simulator). 
We believe, this adjustment will improve the robustness and consistency, but it will also make the container
incompatible with implementations that expect to see only pure raw data in the .dsk files.
For more information please see: (link to documentation).
DO YOU WANT TO PROCEED? [N]

How happy would you be, Rhialto, if you inserted a thumb drive with some pictures (that you wanted to show to your friends) into your computer, which would figure it out (Ah, pictures!), then would decide on its own that the balance of white was not optimal, and finally would quietly go ahead and fix them by adjusting the palette (not even the actual "scene" data, it's logically separate) for you without you even knowing? And, oh, all that with also changing the sizes of your pictures' files and dates that they were modified (i.e. taken) -- without you even touching them with any modifications of your own -- just showing them to your friends. How cool is that?

@markpizz
Member

How happy would you be, Rhialto, if you inserted a thumb drive with some pictures (that you wanted to show to your friends) into your computer, which would figure it out (Ah, pictures!), then would decide on its own that the balance of white was not optimal, and finally would quietly go ahead and fix them by adjusting the palette (not even the actual "scene" data, it's logically separate) for you without you even knowing? And, oh, all that with also changing the sizes of your pictures' files and dates that they were modified (i.e. taken) -- without you even touching them with any modifications of your own -- just showing them to your friends. How cool is that?

Please note:

  1. Your thumb drive with pictures example is irrelevant to this simh discussion since absolutely nothing visible to things running under simh is in any way affected by this change.
  2. YOU are the one attaching a particular disk container to a particular simulated drive and thus passes management of the container to simh.
  3. simh changes NOTHING about what ANY simulated OS running under it sees about the drive in question. You and whatever happens within your simulated OS remain the total authority of your data.
  4. Once you leave the simulator, YOU and ONLY YOU corrupt data some place else when you copy the extra data that simh now appends to the container.
  5. YOU wish that such a change wasn't made since YOUR EXTERNAL USE of the data in the container presumed nothing would be added (as changed 15 months ago). Sorry to require you to change your external use of the container.
  6. YOU have complete choice as to how you want to handle the contents of the container to manage the fact that there is additional meta data now in the container when you use the contents of that container in some COMPLETELY EXTERNAL way. It is ONLY your external use of that container that is negatively affected by the additional data.
  7. You could use the ZAP container.dsk command to remove any simh appended meta data from an unattached disk container file. ZAP won't affect any container that doesn't have additional simh metadata, as opposed to truncate -s -512 mydamagedimage.dsk, which will chop off the last 512 bytes of the container each time it is executed.

@al20878
Contributor Author

al20878 commented Jul 31, 2021

Please note:

Please note that YOU made my container incompatible with what it used to be. That simple.

thus passes management of the container to simh

Yes, but only on behalf of the operating system that is running under the simulator! I haven't given simh any permissions to change the container on its own volition, have I?

some COMPLETELY EXTERNAL way

It's called "file format", "compatibility", "interchangeability", whatever, and it's broken now thanks to this change, with some sort of a stubbornly-manic and centric superiority of simh over other software that can be also used to deal with this data, and complete disregard of the consequences of these actions.

@al20878
Contributor Author

al20878 commented Jul 31, 2021

I am done beating a dead horse here, since like I said, you are unable or unwilling to acknowledge the problem that the "managing" of the container in the way it is implemented currently, is unacceptable by any standard. I don't know who tasked you to code it the way you did, and who you discussed it with prior to doing so, but I am sure that had the discussion been brought to a broader light some 15 months ago, you would have heard a lot of objections, at least in the part of doing the additions to the containers all the way silently -- I am positive about this one!

You could use the ZAP container.dsk

I haven't checked how this is implemented (and I did not know it existed) but I'm quite sure it may not undo the changes in all cases, because if the "footer" was written past an artificial hole in the file, created as a result of reposition to the "logical end", the hole, and hence, the size of the file after ZAP, will remain unequal to the original. I'm not saying that the truncate command would be any better -- it won't be, but it worked in my case, and that's what I care for -- I knew which files were "touched" by simh, and I won't use it with other ones or twice.
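To make the hole scenario concrete, here is a small self-contained illustration. It does not reproduce what simh or ZAP actually do; it only assumes, as the paragraph above does, that something writes one sector at a "logical end" that lies beyond the current end of a short file, and that the sector is later removed by cutting 512 bytes.

```python
import os
import tempfile

SECTOR = 512


def append_at_logical_end(path, logical_size, payload):
    """Write payload at offset logical_size, extending the file across a hole if needed."""
    with open(path, "r+b") as f:
        f.seek(logical_size)
        f.write(payload)


def demo():
    """Return (original size, size after adding and then removing one trailing sector)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "short.dsk")
        with open(path, "wb") as f:
            f.write(b"\x01" * (2 * SECTOR))  # image shorter than the drive capacity
        original_size = os.path.getsize(path)
        # Hypothetical drive capacity of 8 sectors: the write lands past a gap.
        append_at_logical_end(path, 8 * SECTOR, b"\xff" * SECTOR)
        # Remove the trailing sector again.
        os.truncate(path, os.path.getsize(path) - SECTOR)
        return original_size, os.path.getsize(path)
```

Here demo() returns (1024, 4096): the trailing sector is gone, but the file stays at the padded length instead of returning to its original size.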

@al20878 al20878 changed the title simh corrupts .dsk image files by adding some (unwanted) signature silently simh corrupts .dsk image files by silently adding some (unwanted) signature Jul 31, 2021
@markpizz markpizz changed the title simh corrupts .dsk image files by silently adding some (unwanted) signature simh silently changes.dsk image files by silently adding signature Aug 18, 2021
@markpizz markpizz changed the title simh silently changes.dsk image files by silently adding signature simh changes.dsk image files by silently adding signature Aug 18, 2021
@ghost

ghost commented Sep 9, 2021

I agree that altering the contents of a file silently is a bad idea... but it wasn't my idea.

There's a simple solution: use V3.X.

@al20878 al20878 changed the title simh changes.dsk image files by silently adding signature simh changes .dsk image files by silently adding signature Sep 9, 2021
@al20878
Contributor Author

al20878 commented Sep 9, 2021

@markpizz I still believe that what you called "change" (by modifying the issue title) is actually a "corruption", because as a hardware simulator, simh has no idea (and should not be even concerned!) of what the image contains, and writing anything in there corrupts that original data, in general.

@gtackett

gtackett commented Oct 26, 2021

Please help me understand the prior comments on this issue:

@markpizz refers to the file in question as a container.
@al20878 refers to the same file as an image.

Maybe the two different terms could be implying different semantics:

  • An image file, as @al20878 probably sees it, should be an exact image (within the limits of the media, etc.) of a simulated storage medium.
  • I suspect that @markpizz sees a container file as a sort of wrapper including metadata around what @al20878 is calling an image file.

Does that sound accurate?

@al20878
Contributor Author

al20878 commented Oct 26, 2021

@gtackett:
These two things (that you want to label as an image and a container) used to be identical things for the .dsk files, which were supposed to be the pure medium data. Changing the notion and function (an image becoming a container, all of a sudden -- and worst, silently) is not a compatible change. That is what this issue is actually about.

@Rhialto

Rhialto commented Oct 27, 2021 via email

@al20878
Contributor Author

al20878 commented Oct 29, 2021

The solution is of course, like the saying goes, to add another level of indirection.

First off, the .dsk files have been well-established to only store the pure data -- it's not the simulator's business to enforce or check what filesystem there is, which device it has been mounted previously etc etc for a .dsk file. Therefore, if "another level of indirection" is needed, it should be implemented elsewhere, not within the .dsk files.

Have the container (with metadata inside) be one file.

It does not have to actually be the same file, the metadata can be kept separately from the pure data perfectly fine. For example, in a dot-file or in a .simh subdirectory. This way the .dsk file is always accessible and can be transferred between systems (simulators) like you would do so with the real physical hardware. The way the metadata is stored is actually irrelevant, because it would only pertain to what simh is doing with it.

@Rhialto

Rhialto commented Oct 30, 2021 via email

@al20878
Contributor Author

al20878 commented Oct 30, 2021

"the image, which would be the other file."

That's not that straightforward: the simh's attach command expects a .dsk file as an argument, meaning that the respective "container" (i.e. the metadata file) should be somehow deduced from that information: the simplest, it can be a "hidden" file named after the .dsk file, kept somewhere "parallel" to it.
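One way the "hidden file named after the .dsk file" idea could look, as a sketch only. The .simh subdirectory and the .mds suffix come from the earlier example in this thread; the JSON layout, the function names, and the metadata keys are invented for illustration and are not anything simh implements.

```python
import json
import os


def sidecar_path(dsk_path, subdir=".simh", suffix=".mds"):
    """Derive the metadata file location from the image path alone."""
    directory, name = os.path.split(os.path.abspath(dsk_path))
    return os.path.join(directory, subdir, name + suffix)


def save_metadata(dsk_path, metadata):
    """Store metadata next to the image without ever opening the image itself."""
    path = sidecar_path(dsk_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)
    return path


def load_metadata(dsk_path):
    """Return the stored metadata, or None if the image has no sidecar yet."""
    try:
        with open(sidecar_path(dsk_path), encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

With this layout the attach command would still take only the .dsk name, and an image that has lost its sidecar simply looks like one that was never attached before.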

But the point remains the same: a .dsk file may not be changed with any frivolous data that simh itself (i.e. without the instructions from the guest OS) wants to keep about the original .dsk file.

@markpizz
Member

What the simh program does with disk containers while they are being used by a simulator is completely in the domain of simh. If you or someone or some program want to access or otherwise manipulate the 'data' portion of the container external to simh there are supported ways to explicitly get to that. Feel free to either use the supported method or whatever else you may want to invent external to the simulator.

@markpizz
Member

@cheater said:

Can you walk me through how a potential user would get into simh without technical knowledge? I don't want you to write a tutorial of some sort, but there are a few questions in particular.

That is really not my problem. I'm not in the business of selling simh to folks. However they manage to find simh they do.

  • a person just installed simh on their computer. How do they find out what files are necessary to run the operating systems?

In general, I suspect that they don't start by installing simh on the computer. Something else triggers their interest in retro computing and they search around and come across something specific that interests them. Then they maybe install simh or maybe start poking around with the many sample use cases, or they hear about Oscar's PiDP-8 or PiDP-11 and just order one before they even mess with simh at all. Then they start from that. Again most generally don't care at all how simh manages to work. They just care about running the various operating systems or other software that can be found for these systems.

  • how do they obtain them?

Again, not my problem.

  • let's take a single simple example. What files are included in what they'll download? I assume it's more than the .dsk file.
    once they have the files, what sort of other actions are necessary to successfully run software? Let's say I wanted to do something simple, like run a text editor and store a text file on the disk.

I suspect many dig way back in their mind to when they first came into contact with the system(s) that are being simulated, they remember how to do something on the particular system.

@markpizz
Member

So, this discussion has been quiet for several days and no one has come along to describe how the current model negatively impacts something else they need to do, just the earlier opinions about how they would have done things differently.

@eschaton spoke with such authority when he said:

There already is a reasonable solution: Store the metadata in a sidecar file rather than in the same file as the raw disk image. This is how virtually every other system emulator handles raw disk images and is the general expectation when working with emulators. The fact that SIMH doesn’t work this way is surprising, a serious divergence from the norm, and causes way more problems than it solves. Just fix it already rather than continue to spend 10x the effort making excuses for it.

It isn't as if I decided out of thin air to store both the disk container data content AND the details about what it actually is in the same container. There are many examples of precisely this concept. Rather than just spouting out arbitrary claims (like @eschaton) without specific examples, I'll point at the following: VirtualBox has VDI, VMDK, and VHD files that can be disk containers with each having its necessary metadata embedded in the respective container. Microsoft's VM tools (Virtual PC, Virtual Server, Hyper-V) initially supported VHD and now also support VHDX. And, simh v4.x has supported VHD container files since approximately 2011. VMDK seems to come from VMware. All of these containers have metadata stored in the same files as the simulated disk data.

One key goal of the appended metadata was to very specifically not interfere with the use of those containers in prior simh versions (v3.x) at least, and that in fact is the case. The default behavior for essentially all disks in 3.x was to explicitly mention the drive type in the configuration file and not attempt to autosize to determine drive types. Given that paradigm, any appended metadata has absolutely no effect on the behavior of the drive.

It all comes down to this: a handful of folks have deeply felt beliefs that this is a completely horrible solution and the world might soon end or their desks burst into flames with it implemented this way. :-)

Sorry, there is a working solution now, I'm not looking for ideas about how to do this differently. Since the folks with deeply held beliefs have the option to not have to deal with the meta data yet they still object to the current implementation paradigm, they clearly think others will be somehow harmed if their own idea isn't adopted. I don’t know what else to say.

@wboerhout

  1. I will be using the sim> zap /vdisk/* command a lot. Like as the first line of all ini files;
  2. I know of at least two commercial emulators that use the "sidecar" option for storing metadata;
  3. The "working solution" would be even better with an "attach -[no]header" option with -noheader being the default.

@markpizz
Member

@AK6DN said:

I wholly agree with Anthony's argument. SIMH should NOT modify disk images by default unless allowed by the user.

If a unit has autosize disabled, then no metadata is added to the disk container. The desired behavior is how simh 3.x defaulted, so configuring a drive that way will avoid the "problem".

I have lots of legacy images, many that I mount READ ONLY. I don't want the O/S to change them, nor do I want SIMH to.

Disk containers attached read only (-R) do not get meta data added.

I have no problem if you add a new flag to the ATTACH that says allow this to happen. But it should be disabled by default.
Or add a new command to SIMH to enable it, like SET SCREWWITHMYIMAGES enabled or some such. It should be default disabled.

I'm considering two things:

  1. Any explicit setting of a particular drive type would then set the drive type and disable autosizing for that drive
  2. A SET NOAUTOSIZE command to disable autosizing for all drives.

@wboerhout said:

  1. I will be using the sim> zap /vdisk/* command a lot. Like as the first line of all ini files;

That will absolutely work.

Is your approach wholly based on principle or do you have an explicit need to move containers between different simulators (others not being simh) or an explicit need to attach unexpected sized disk containers to different drive types?

  2. I know of at least two commercial emulators that use the "sidecar" option for storing metadata;

I didn't say it was impossible. I just pointed out that metadata contained in the same container file as disk data wasn't unprecedented. Note that I gave explicit examples of these cases. Both you and @eschaton haven't actually provided any specific pointers to your references that manage metadata without setting your desks on fire.

@vrs42

vrs42 commented Feb 14, 2022

This has finally got me riled enough to figure out how to comment.

I have two major modes in which I use SIMH. One is where I am effectively doing development in a simulated environment. In that mode, the metadata isn't an issue; I'm modifying the disk images anyway, and their prior content is not of interest.

The other major mode though, is that I archive hundreds of imaged historical media. A few of these are "releases" of the above efforts, but the large majority are meant to form a historical archive.

It is absolutely useful to attach these to SIMH, for various forensic purposes. It is not acceptable to modify them. That said, I can do as Wilm mentions, and adjust my workflows to preserve these images. It is just a pain. (Mostly it creates a new workflow to ensure I didn't screw them up by accident.)
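A sketch of what such a "did I screw them up by accident" check might look like, assuming a manifest of known-good SHA-256 digests is kept somewhere outside the images. The function names and the manifest shape (a dict of path to hex digest) are illustrative choices, not part of any existing tool.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_images(manifest):
    """manifest maps image path -> expected hex digest; return the paths that no longer match."""
    return sorted(path for path, expected in manifest.items()
                  if sha256_of(path) != expected)
```

Running changed_images() after a session flags any image whose bytes differ from the archived state, whatever the cause of the change.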

One other comment is that I don't really buy the argument that only a few outspoken critics are complaining. I believe it is in the nature of tools like SIMH, that they are complex enough that the majority of the "users" are using the product in a canned way, created and maintained by a small cadre of individuals who have taken the time to learn the tool in depth. (I certainly am not an in-depth user of github, for example. I mostly use it by following someone's walk-through.) In that sense, I fear that a significant fraction of your hard-core user base (including me) is inconvenienced by this.

Vince

@wboerhout

I do need (well, want, no lives depend on it) to move containers between simh and a commercial emulator. But also, I need to move containers between Q-Bus or Unibus controllers on different simh instances that support Q-bus or Unibus. This is how I noticed that I need to do extra things where before I did not.

I have not been a fan of AUTOSIZE from the start, because I (used to) know disk sizes of most RD and RZ disks by heart, and I tend to set explicit disk types anyway. And, correct me if I'm wrong, set RQx NOAUTO does not prevent adding the header when the disk is created by the attach.

The metadata is not unimportant. But, growing up in the VAX era made backwards compatibility important to me. New code and new features should not break old scripts / workflows.

Wilm

@al20878
Contributor Author

al20878 commented Feb 14, 2022

I'm missing something conceptually in what you're saying.

Of course you are! The .dsk file can be a result of random writes to it. Meaning that there will be gaps (that read as 0s) in between the sections of the file that have actually been written (including the pure-zero sectors, which will be present).
Adding a signature and then stripping it may not result in a file of the same structure, because the end of it will probably get modified by the procedure. Yes, contents-wise you may be able to re-create a file that compares identically to the original, but the host filesystem's view of this file may be very different, so for example, the file may become subject to a backup as it is viewed as "modified" (even if the guest data it contains remain all the same).

I will be using the sim> zap /vdisk/* command a lot. Like as the first line of all ini files;

That will absolutely work.

No, it won't! It's the "attach" command that mutilates the .dsk file; so using "zap" as the first command in the .ini file won't prevent a new signature from being added later on in the startup sequence. Also beware that, depending on the simh codebase, "zap" can actually "bite off" some of the original sectors from the image, along with the trailing signature -- as there was a bug there, previously.

@eschaton spoke with such authority when he said:

And you are saying with another sort of authority that your solution is the best one. Well, if it were, we wouldn't be battling with you here, in an attempt to persuade you that you were wrong with the current approach. All those references to other formats used elsewhere are MOOT because they were designed from the get-go to contain heterogeneous information, pure data and meta-data altogether mixed-in yet logically well-isolated. The .dsk files are pure data only. There's no way to logically isolate the appendage you are authoritatively writing to it from the rest of the data!

and the world might soon end or their desks burst into flames with it implemented this way. :-)

Your sarcasm comes from inability to think outside the box. I warned you at the top of this thread that the issue was going to become recurrent, with more and more people getting frustrated with your implementation. There's nothing to laugh about now.

Again, not my problem.

That's what we heard a lot lately, actually, from different aspects of this discussion. Thanks for using the plain language, at last. Now we know where we all stand with all our suggestions.

Also, if that also applies to new users, why are you so adamant about catering to them with this idea of the resident metadata?

@drovak

drovak commented Feb 14, 2022 via email

@pghardy

pghardy commented Feb 14, 2022 via email

@markpizz
Member

@wboerhout said:

I do need (well, want, no lives depend on it) to move containers between simh and a commercial emulator.

And you've determined that the presence of the metadata actually negatively affects operation of this other emulator?

But also, I need to move containers between Q-Bus or Unibus controllers on different simh instances that support Q-bus or Unibus. This is how I noticed that I need to do extra things where before I did not.

There have been some bugs in the interpretation logic of the metadata - independent of where it is stored - that have been fixed. There are still potential issues mixing containers between RQ and SCSI, but that has nothing to do with moving between Unibus and Qbus. Disks attached to RQ devices work on any Qbus or Unibus system that has an RQ controller. Likewise for the other disk containers present on the common controllers these systems supported (RP, RL, etc.).

... correct me if I'm wrong, set RQx NOAUTO does not prevent adding the header when the disk is created by the attach.

That is true. Please explain what valuable user data exists in a newly created disk container file. Just in case you can think of some, feel free to ZAP and all of your data will be preserved. :-)

The metadata is not unimportant. But, growing up in the VAX era made backwards compatibility important to me. New code and new features should not break old scripts / workflows.

Unless you've got evidence that your unnamed commercial emulator misbehaves in the presence of the metadata, no workflow is broken. If you encounter bugs or other problems in the metadata handling in simh, then I'll be glad to fix real problems in the logic.


@drovak said:

If this is such a desirable addition to SimH, where are the other users that are coming out to defend its use?

Well, this reminds me of some words from a book we read to my kids: The words were: "I am the Lorax, I speak for the trees, for the trees have no tongues"

In this case, the trees are everyone who hasn't been involved enough to have to dig deeply into the simh world to get under the covers.

He then said:

.... 1. Disable it by default

That would completely remove benefit for the above mentioned trees and require that whole community to dig far more deeply than they ever need or expect to go.

  2. Use a sidecar file

That would again burden the trees to have distinctly deep internal simh knowledge to find the sidecar and carry it around to wherever the container gets moved to.

Meanwhile, I've struggled with why the community of negative speakers here are so much against adding a single line one time to one file on their system. Specifically change the simh.ini file in your home directory to contain: SET NOAUTOSIZE

Wait a minute, I just realized and checked that the whole concept of simh executing .ini files at startup has never been formally documented. Bob's original "SIMH User's guide" didn't specifically mention startup command file execution and as such the changes to that area never got added.

The key addition to the document was:

When a simulator starts execution, the following sequence of simh command
files are executed if they are found:
    1. If a file named simh.ini is located in your HOME directory, it is
        executed.
    2. If the simh.ini file in your HOME directory isn’t found, a file named
        simh.ini in your current working directory is executed if it exists.
    3. If the simulator is invoked with any arguments, then the arguments are
        presumed to be a command file and possible arguments to that command
        file which is executed.
    4. If the simulator is invoked without any arguments, then a command file
        with the same name as the simulator binary with .ini appended that is
        located in the current working directory is executed.

Note, that up to 2 separate command files may be executed on simulator
startup.  The simh.ini file allows the user to define local user
preferences that align with their personal goals for simulator execution
across all simulators that may be used on their system.

Steps 3 and 4 were inherited from simh v3.x.  Steps 1 and 2 were the result
of conversations with J. David Bryan in April of 2012.  Somehow it
never got documented in the ensuing 10 years.
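The startup sequence quoted above can be sketched as a small function. This is a reading of the four documented steps only, not simh's actual code; the function name, its parameters, and the injectable exists check are assumptions made for illustration.

```python
import os


def startup_command_files(home, cwd, sim_binary, args, exists=os.path.isfile):
    """Return the command files that would run at startup, in order (at most two)."""
    files = []
    home_ini = os.path.join(home, "simh.ini")
    cwd_ini = os.path.join(cwd, "simh.ini")
    if exists(home_ini):            # step 1: simh.ini in the HOME directory
        files.append(home_ini)
    elif exists(cwd_ini):           # step 2: only consulted when step 1 found nothing
        files.append(cwd_ini)
    if args:                        # step 3: first argument is taken as a command file
        files.append(args[0])
    else:                           # step 4: <simulator binary name>.ini in the working directory
        sim_ini = os.path.join(cwd, os.path.basename(sim_binary) + ".ini")
        if exists(sim_ini):
            files.append(sim_ini)
    return files
```

Under this reading, a SET NOAUTOSIZE line placed in the HOME simh.ini would run before any per-simulator command file.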

I'm not going to change the simh default behaviors around autosizing, but anyone in the burning desks club can modify their own default simply enough that it might seem like arguing for the sake of arguing. Who knows?

I am absolutely interested in any bugs in the interpretation and usage of the information in the metadata that folks encounter while using simh.


@vrs42 said:

.... The other major mode though, is that I archive hundreds of imaged historical media. A few of these are "releases" of the above efforts, but the large majority are meant to form a historical archive.

It is absolutely useful to attach these to SIMH, for various forensic purposes. It is not acceptable to modify them. That said, I can do as Wilm mentions, and adjust my workflows to preserve these images. It is just a pain. (Mostly it creates a new workflow to ensure I didn't screw them up by accident.)

If you attach any of these containers to SIMH, the operating system running in the respective simulator can readily change the contents (unless you attach them read only). Read Only attaches don't add any meta data.

If you really want to protect the source images (for archival sake), in simh, for essentially the past 10 years you could:

sim> SET RQn RA82 (or whatever)
sim> ATTACH RQn -C temp-working-copy.dsk archive.dsk

The above will copy the archive.dsk to the temp-working-copy.dsk container as a VHD (thus minimizing space consumed), but if you wanted the temp-working-copy.dsk container to be in SIMH format, merely add:

sim> SET RQn FORMAT=SIMH

before the ATTACH command.

Sure this is a change to your workflow, but it isn't due to meta data and you might not have known or considered it and it might actually be useful.


@pghardy said:

I don’t know how the changes will impact me going forward, but the changes have adversely affected my use of SimH in three ways:

  1. a CDROM image file from a VAX3900 had metadata added which prevented it being read on a virtual CD drive on a VAXstation 3100. I had to dredge through historic archives to find a non-corrupted version to restore.

This was a bug that got fixed as soon as you reported it, and the original CDROM contents were restorable with ZAP.

  2. Various disk image files which were originally much smaller than the maximum size because only 20% of the drive had been allocated and used by VMS, got enlarged to full device size. For one disk this probably doesn’t matter, but I like to keep multiple backup copies to save temporal state, and this now takes many gigabytes of unnecessary space.

Storage is cheap, but I agree that's not a great excuse to copiously waste it. You could migrate these containers to VHDs (see example above) and they would be almost the original truncated size (potentially smaller in some cases). The smaller potential comes from the fact that, unless you pay special attention, VMS's INITIALIZE command tends to locate the ODS2 home blocks around the disk and the index file near the middle of the volume. Any basic INITIALIZE activity beyond the beginning of the disk will leave 0 sections of the container just sitting there. When this data gets migrated to a VHD, any 1MB stretches of the disk that only contain 0's don't take up space in the VHD container.
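As a rough way to gauge the possible saving before migrating, the following sketch counts how many 1MB stretches of a raw image contain anything other than zeros. It is only an estimate built on the "1MB stretches of 0's take no space" description above; the real VHD layout has its own headers and allocation tables, and the function name is hypothetical.

```python
def count_nonzero_chunks(path, chunk_size=1024 * 1024):
    """Return (chunks containing any non-zero byte, total chunks) for a raw image."""
    total = 0
    nonzero = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += 1
            # A chunk is all zeros when its count of zero bytes equals its length.
            if chunk.count(0) != len(chunk):
                nonzero += 1
    return nonzero, total
```

Multiplying the first number by the chunk size gives an approximate lower bound on the data a sparse container would have to hold.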

  3. I’d like to move several SimH disk files from a VAX3900 to a MicroVAX 3100 simulation, in order to match historic licences. Originally only the 3900 with DU devices was available, so the disks were set up on that. However if I attach the disk file to the DK device, then the new sizing system objects.

I think you meant RZ rather than DK, where RZ is SCSI, or maybe you meant the RD controller on the MicroVAX2000/VAXStation2000.

As I said previously, more general purpose SCSI support is coming and SCSI should then interoperate with MSCP drives fairly well. Until that time, the above mentioned copy mechanism will allow easy access to the data on a new automatically created container.

Meanwhile, the MicroVAX2000/VAXStation2000 RD device supports various DEC drives (RD53, RD54, etc.). Oddly enough, even though these drives have the same names as the ones connected to the RQDX MSCP controllers, each of these same-named drives is a different size depending on whether it is on an RQ or an RD controller, and as such, the file systems are different sizes. This is a case of the metadata protecting you from yourself without you realizing it. There are ways to get around the problem, but actually understanding what is going on matters.

  • Mark

@al20878
Contributor Author

al20878 commented Feb 15, 2022

@pghardy

this should have been implemented (if at all)

The "if at all" part becomes more apparent now, when all the long explanations of @markpizz were laid out above:

The metadata may not even be needed / created and / or used in a plethora of use-cases except for "autosize" but then again, the simulator somehow creates the metadata even for that, out of "thin air", and so it's also not at all critical and can be recreated just like that on-the-fly, and then used solely in-core of the simulator. So why is it there then in our disk images?

It looks like @markpizz on his pathway to becoming an arborist, is obviously missing a few very important points:

  1. whoever is new to the simulator won't start using it by moving the disk images around, so maintaining (or keeping in-sync) the sidecar metadata is not an issue;
  2. adding metadata to existing images (in addition to other negative impacts people were mentioning above) also distorts the well-known drive capacities that more experienced users know their images and devices by;
  3. guest operating systems often include options to mount file-systems as logically read-only (while the device is actually not H/W write-protected) so whoever is doing that, does not expect their disk images to change (regardless if the image was attached read-only or not to the simulator) -- but simh will mutilate the image in the latter case, just because;
  4. at this point, it looks like the entire idea of the metadata was some sort of a self-invented bookkeeping exercise, and ironically, bookkeeping in a separate file can do a lot more magic: it can be done for all sorts of images, read-only or not, modified or not, even for CD-ROMs, tapes and what-not, and yet never disturb the pure user data! Seems like a win-win but obviously the wood thinks otherwise.

@markpizz

I'll be glad to fix real problems in the logic

There's a huge one already: good software never modifies user data behind their back. That's the number one law in the software development. Regardless of the intent.

@friesga

friesga commented Feb 15, 2022

For me this issue and its solution are perfectly clear. A raw disk image is in a certain format, let's call it the "raw disk format". When additional meta data is added to such an image, the format of that image changes and it no longer is an image in raw disk format. That means that there should be (at least) two formats: a raw disk format and a second "signature format".

If simh wants to add metadata to a raw disk image, it could (and should) ask the user whether they want to upgrade (i.e. copy) the image to the new "signature format". This way the original raw disk image is preserved and the advantages of the "signature format" are available to those who want them. The signature format could be versioned, allowing changes when the need for additional metadata arises.

@pghardy

pghardy commented Apr 1, 2022 via email

@asbestomolesto

To keep it SIMPLE, this is how this useless thing impacts our work at our Computer Museum:

  1. we have a VAX/VMS system, we want to preserve / study it, so we make a copy of the hard disk simply using dd_rescue

  2. we boot this image simply with simh: everything works FINE, no need for anything added, since this has been working well for years

  3. But NOW simh has decided to write useless stuff to our disk image. At the end or at the beginning, it doesn't matter, because

  4. we are used to editing a disk image via simh, making some adjustments, and then copying the disk image BACK to the ORIGINAL SYSTEM using dd or ddrescue or whatever.

  5. If we do that with the now "edited" disk image, the original VAX/VMS gives us a hard time, refusing to boot, spitting errors and so on.

So yes, this is a great problem for us, as this unwanted feature is creating only problems for our preservation / restoration efforts.

I think it's simple to understand.

Does simh need this kind of data?

Create an external file. Problem solved.

Asbesto
"Museo dell'Informatica Funzionante" Computer Museum

@cheater

cheater commented Apr 2, 2022

@markpizz there are now many stories here of users telling you how this unfeature messed up their files and made their work hard. Too many stories to count. You admitted that it's only useful in a very narrow situation - during autosizing. There is literally no one here defending your solution, other than you. You are coming off as a bad actor here.

If this is a project just for you, make the repository private and keep committing to it so you can use it. If it's for everyone, then listen to everyone, not just yourself.

@textfiles

Today I was dragged into this discussion by @cheater, who utilized twitter to begin pinging individuals with a scant connection to this project, including historians and documentarians, in some sort of attempt to accomplish something. I assume the something was to shame/brigand @markpizz into reversing a decision they seem rather firm on having made.

This went as well as one might expect - after not finding refuge under my rough take on the thread, cheater moved to harassment, assembling further people to come to my account and debate what the meaning of computer history is and how I can best present myself to the world. All of this predicated, of course, on the fact that I do not see the point of dragging in more individuals into what is absolutely a code and feature discussion.

The result is I am still getting direct messages on twitter attempting to neg-push/shame me into admitting it was a bad idea not to throw my weight behind criticizing/harassing @markpizz. I am primarily posting this comment should individuals with boundary issues continue to attempt to blindfold-recruit underinformed individuals to join the "fight". My advice is not to.

However, while I'm here.

I've had enough experiences with both open-source projects and computer history/preservation situations to know that there are occasionally cases where the feature sets of tools do not match the need of institutions. Institutions have two choices: contribute coders/funding to maintain a fork or feature set within the project to handle their needs, or add additional steps to their process to work around the perceived flaws of the tools.

The debate, therefore, is neither new, unusual, or something requiring a brigand/sea-lioning of anyone classified as standing in the way of the "battle" for preserving software.

Attempting to lay a "professional/unprofessional" patina on this pulls a lot of weight away from how any such preservation project should be conducted. Mistrusting open-source tools, maintaining pristine originals before conducting forensics/analysis and additionally logging the processes handled to ensure fixity are all basic operations in software preservation.

Either the SIMH project has contingency/strength to handle debates about project direction/management, or it does not. However that all shakes out:

Thanks to all the contributors to the SIMH project over the years.

@markpizz
Member

markpizz commented Apr 3, 2022

Thanks for the heads up.

@cheater has now been blocked from this organization and all discussions henceforth. He's never actually contributed anything but criticism. He attributes this lack of positive contribution to contractual commitments with his employer.

Meanwhile, the issues raised here are actively being worked on to best address both my goals and the broad needs of the user base and the various problems/bugs that have been reported here.

@pghardy mentioned these bugs/problems:

att rz0 Disks/HARDY1DISK0.VAXDSK
%SIM-ERROR: RZ device: Non-existent parameter - RA90
%SIM-ERROR: RZ0: Cannot set to drive type RA90
%SIM-ERROR: RZ0: 'Disks/HARDY1DISK0.VAXDSK' can only be attached Read Only

As I mentioned a while back, this issue (attaching containers to dissimilar controllers) was a feature that would be implemented in the future (very soon now), so in the interim, several suggestions were provided to create compatible containers or to avoid autosizing.

Why does SimH want to insist that the disk files be readonly? If they
are valid enough to read, (the disk blocks mount OK in VMS) they are
valid enough to write!

As I just said this was a quick compromise to allow access to the data before full support was available.

I tried using ZAP to remove the metadata, which seemed to work the first
time, but when I went back to the 3900 to prepare VMS for the different
device names, my ZAP hung processing one of the disks. I killed the
process, but then the disk wouldn’t mount in VMS  (no boot block). I
restored the disk image from backup, and have gone back for the moment
to just using the 3900.

Using ZAP was a reasonable workaround, but you probably did a ZAP -Z.
That causes the zap logic to walk backwards from the back of the file, removing sectors containing 0's. This check is done one sector at a time and thus will be slowed down to 1 sector removal per disk revolution, which on a large disk container with written data only near the front of the container can take a long time. It's reasonably fast on an SSD, but otherwise not. This was silently happening slowly, which is why it seemed to hang. The code that does this has already been changed in the repo and the Windows binaries so that progress is reported as this is happening and the appearance of a hang won't happen anymore.
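The backwards walk described above can be illustrated with a small sketch. This is not simh's ZAP code; it is a hypothetical Python helper (`trimmed_size`, with an assumed 512-byte sector) that only computes where the file would end once trailing all-zero sectors are dropped, and it reads many sectors per I/O instead of one at a time.

```python
import os

SECTOR_SIZE = 512  # assumed sector size for this sketch


def trimmed_size(path, sector_size=SECTOR_SIZE, sectors_per_read=2048):
    """Return the file size after dropping trailing all-zero sectors."""
    size = os.path.getsize(path)
    chunk_bytes = sector_size * sectors_per_read
    end = size
    with open(path, "rb") as image:
        # Walk from the back of the file towards the front, one chunk at a time.
        while end > 0:
            start = max(0, end - chunk_bytes)
            image.seek(start)
            chunk = image.read(end - start)
            stripped = chunk.rstrip(b"\0")
            if stripped:
                # Offset just past the last non-zero byte, rounded up to a
                # sector boundary so the sector holding it is kept whole.
                last = start + len(stripped)
                rounded = -(-last // sector_size) * sector_size
                return min(size, rounded)
            end = start
    return 0  # the whole file is zeros
```

The sketch deliberately does not truncate anything; a caller would decide separately whether to act on the returned size, and should work on a copy of the image.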

1) a CDROM image file from a VAX3900 had metadata added which prevented
it being read on a virtual CD drive on a VAXstation 3100. I had to
dredge through historic archives to find a non-corrupted version to
restore.

This bug was fixed shortly after he mentioned it specifically months ago. Since that initial fix, ISO 9660 (and any .iso file) attaches will avoid any metadata additions and will be done read only on any device type.

2) Various disk image files  which were originally much smaller than the
maximum size because only 20% of the drive had been allocated and used
by VMS, got enlarged to full device size. For one disk this probably
doesn’t matter, but I like to keep multiple backup copies to save
temporal state, and this now takes many gigabytes of unnecessary space.

If you created disk containers with any version of simh from the github.com/simh/simh repo any time in the past 10-12 years, your disk container would have been the size of the drive you were creating it on. Disk containers started at size 0 with simh 3.x.

Stay tuned for best practice to achieve minimal storage impact.

3) I’d like to move several SimH disk files from a VAX3900 to a MicroVAX
3100 simulation, in order to match historic licences. Originally only
the 3900 with DU devices was available, so the disks were set up on
that. However if I attach the disk file to the DK device, then the new
sizing system objects.

Stay tuned

@ghost

ghost commented Apr 3, 2022

I've already had my say on this issue. I agree with textfiles. If you don't like SimH V4 behavior on metadata, you have multiple ways to proceed:

  1. Fork the project. I've already done that, with Dave Bryan (the HP simulator author), by dropping back to V3.X.
  2. Modify the sources to disable the feature before compiling.
  3. Create a .INI file that turns off autosizing on all disks of interest. Based on earlier comments, that will disable metadata creation. (Mark, please confirm.)

Mark added the metadata capability because SimH V3 could and did create sub-sized disk containers that were only as long as the data written, and that could and did confuse the autosizing logic. It was primarily a problem for disks that didn't follow the DEC standard for bad blocks, such as MSCP disks or DECsystem-10 disks. SimH V3 made no attempt to protect users from themselves. V4 does.

One idea I have not heard suggested: decommit the autosize feature and the metadata feature in tandem. I added autosize to deal with the RK06/7 and RP04/6 on the VAX and PDP11 - where it doesn't fail, because the DEC STD 144 bad block table requires a full sized image. But it does fail in lots of other cases, and it isn't very useful. So why not just drop it, and drop the metadata feature too, and everyone lives happily ever after?

@rcornwell
Member

How about this as a solution. Make sure all disks support the -n option which creates a blank disk. You can also add a -l or -m option to label the disk with metadata indicating the type of disk and the size. On attach check if the metadata exists and use it. If it does not do nothing! Use the size and attempt to do the best you can. This means that people bringing disks from other simulators or actual hardware will not have their data changed without their permission. And when new disks are created the metadata can be added if the user wishes. Also if the controller does not support autosizing then no metadata will ever be attached.

@markpizz
Member

markpizz commented Apr 3, 2022

Thanks for these wonderful suggestions which you are welcome to use on any simulator you work on.

Meanwhile, the design for the subject of this issue is long done and about to see the light of day.

@sirocyl

sirocyl commented Apr 3, 2022

As I understand it, these footers were intended to solve a problem, and it's a problem that would only really occur:

  • at the stage of capturing/preserving/retrieving existing storage media (where a new image is created; the metadata would be valuable to capture and add at this stage, and valuable to SIMH to use);
  • by a librarian, technician or curator, whose job it is to correct the metadata of an existing image which, while having a complete image of the source media, is missing important information about its format or type specificities
  • by a simH user, at the stage of setting up a new virtual disk - and therefore, making a new, blank disk image.

For everything else (existing raw disk images, physical media as block devices) the auto-detect/auto-size functions work, right?

Meanwhile, the design for the subject of this issue is long done and about to see the light of day.

I'm curious to see what the solution is here, but I'm glad to see one has been worked out. 🙏

@rcornwell
Member

@markpizz That was uncalled for. I was trying to offer a suggestion that would make all parties happy. Perhaps if it is taking time to do these fixes, development and testing should be carried out on a branch, rather than in the master branch.

As a maintainer your job is to fix bugs, manage commits from contributors, and make enhancements as requested. You appear to be moving more into the role of author.

@markpizz
Member

markpizz commented Apr 3, 2022

@markpizz That was uncalled for. I was trying to offer a suggestion that would make all parties happy. Perhaps if it is taking time to do these fixes, development and testing should be carried out on a branch, rather than in the master branch.

The meta data functionality was in the master branch and working in simulators coming up on 2 years ago. The first comment about it came some 15 months afterwards. As I've said already, I'm not looking for design-from-scratch suggestions, so no insult was intended, just clarification of the state of things.

I've already provided a very simple means to allow the key complainers about the metadata design to avoid the meta data addition to their disk container files (SET NOAUTOSIZE) and a way to remove meta data from containers which may have inadvertently acquired it (ZAP 'container.file'). SET NOAUTOSIZE can be made a personal default by including this command in the ~/simh.ini. These accommodations have seemed not to be sufficient for the couple of primary complainers.

Note the complaints haven't actually been "too many stories to count", just because the same guy repeats his story many times does not actually add to the count. So far, there have been bugs identified and fixed (or being worked on) and 2 distinct stories about cases where someone was using a simh disk container outside of simh and needed to change their external use due to the meta data. The first case is easily fixed by either SET NOAUTOSIZE or by modifying the external procedure slightly.
The most recent of these cases by @asbestomolesto is less than clear that it really was a meta data problem since dd in from a physical disk to a container which then had 512 bytes of meta data appended to the end by simh and then dd out back to the same disk would have run out of places to write by the time meta data was encountered and thus wouldn't have been written to the output device.


As a maintainer your job is to fix bugs, manage commits from contributors, and make enhancements as requested. You appear to be moving more into the role of author.

I find it odd that in addition to the other roles you mention, you're now just realizing that I've taken on the role of author.

The codebase in the simh master branch (just in the scp.c, and sim_*.c,.h files) has expanded by a factor of between 3 and 4 since departing from v3. The vast majority of those changes were authored by me in addition to new code and enhancements directly to code in a number of simulators.

@asbestomolesto

The most recent of these cases by @asbestomolesto is less than clear that it really was a meta data problem since dd in from a physical disk to a container which then had 512 bytes of meta data appended to the end by simh and then dd out back to the same disk would have run out of places to write by the time meta data was encountered and thus wouldn't have been written to the output device.

Yes, my bad, I was not clear about that. I was talking about putting the disk image (used in simh) back not on the original disk (which obviously takes care of the "problem" if the added data is at the end) but on a hardware disk emulator (like scsi2sd, for example), in which you dump the disk image onto another modern medium (USB, SD card, etc.).

From what I understood, SET NOAUTOSIZE is helpful, so TY, I will try that :)

@al20878
Contributor Author

al20878 commented May 27, 2022

In the light of #1163 this is going to be my last post to this repo in the shape and form under the new "terms" (unless either they are dropped, or the project is forked out as FOSS again elsewhere). I accidentally learned about this new licensing fuss just yesterday as I was about to submit yet another bug report (and a patch for, ironically, sim_disk.c) but I'm not going to do that anymore...

The first comment about it came some 15 months afterwards.

Now this proves to be a complete lie. You had been told about it by others much earlier (but I was unaware, not on that list):

https://groups.io/g/simh/message/303

The "author" here cannot understand that the success of open source comes from users' feedback... And a lot of that feedback (resulting in hundreds of commits) comes in the form of bug reports and bugfixes, even if it looks like a discussion rather than a patch or a piece of code. (Ironically, if one has ever tried to submit the latter, they would have probably noticed that @markpizz, out of his extreme vanity and possessiveness, was to reword the patch and almost never to apply it in the form suggested but as his own redo.)

So I am letting Mark Pizzolato (who seems to remain in complete denial) have "his" simulator for his own pleasure (yet I'm quite sure he's not actually using any of it), and keep developing "features" as he pleases.

But I'll be looking elsewhere for a fork of this code, which has a truly open source atmosphere, and which is free of this stubborn, childish and pig-headed attitude, and a clinically insane inability to take any criticism. It's not surprising that many (including Bob Supnik) have decided to EXIT out of here.

You wanted to go postal, you have at it!
Bye Mark

@jfsimon1981

jfsimon1981 commented Jan 1, 2023

Hi,
Al,
I believe you can fork it and modify/implement what you need. That just means going back to the original commit's version.
(Although I understood that this behavior can be disabled by a parameter if required.)

@jfsimon1981

Mark, I have a question. Running a Unix V7 image which ran with v3.x, I get this error with the former dsk image:

%SIM-ERROR: RL0: The disk container '/home/jeanfrancois/got/simh/jfs/drives/pdp11-45_unix_v7_rl.dsk' is larger than simulated device (5242KW > 2621KW)

Can you please specify how to fix the dsk so that newer simh builds will properly read it?

Thanks
Jean-François

@markpizz
Copy link
Member

markpizz commented Jan 1, 2023

Well, it seems you're trying to attach a RL02 disk image to a drive set to be a RL01.

I suggest:

  1. Build from the latest source in the master branch of this repo.
  2. run the simulator and remove any potentially added meta data:
    sim> set noautosize
    sim> zap /home/jeanfrancois/got/simh/jfs/drives/pdp11-45_unix_v7_rl.dsk
    sim> set rl0 RL02
    sim> attach RL0 /home/jeanfrancois/got/simh/jfs/drives/pdp11-45_unix_v7_rl.dsk
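
For reference, the two sizes in the error message line up with the standard RL01 and RL02 cartridge geometry (2 surfaces, 40 sectors per track, 128 16-bit words per sector, and 256 or 512 cylinders). A minimal sketch of the arithmetic follows; the function names are hypothetical, and truncating to thousands of words is an assumption made only to match the "KW" figures shown in the message.

```python
# Geometry of the DEC RL01/RL02 cartridges, in 16-bit words.
WORDS_PER_SECTOR = 128   # 256 bytes per sector
SECTORS_PER_TRACK = 40
SURFACES = 2
CYLINDERS = {"RL01": 256, "RL02": 512}


def capacity_words(drive_type):
    """Total capacity of the given drive type in 16-bit words."""
    return CYLINDERS[drive_type] * SURFACES * SECTORS_PER_TRACK * WORDS_PER_SECTOR


def capacity_kw(drive_type):
    """Capacity in thousands of words, truncated (assumed message format)."""
    return capacity_words(drive_type) // 1000


if __name__ == "__main__":
    for name in ("RL01", "RL02"):
        print(name, capacity_words(name), "words,", capacity_kw(name), "KW")
```

A container of 5242KW is therefore exactly one RL02, which is why setting the unit to RL02 before attaching resolves the size complaint.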

@jfsimon1981

jfsimon1981 commented Jan 2, 2023

Indeed and i found it too in the pdfs.
Unix simulation running terminal is quite slow on PDP11 (i use default tto) is that intended ?
Can we discuss this topic somewhere else perhaps, what's the appropriate space ?
Thanks.
Running Unix V7 very happy to see this one going on ...

@markpizz
Member

markpizz commented Jan 2, 2023

Email me at mark@infocomm.com
