Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

container matching doesn't work with directory-only paths: fmt/1196 (SIARD) #123

Closed
ross-spencer opened this Issue Oct 1, 2018 · 10 comments

Comments

Projects
None yet
5 participants
@ross-spencer
Copy link
Contributor

ross-spencer commented Oct 1, 2018

Hi Richard,

As discussed I've attached what I think to be an accurate skeleton container file for fmt/1196. What do you think? (Also noted that SF might not have this capability to identify either as of yet!)

(Unzipping should provide you a SIARD zip)

fmt-1196-container-signature-id-31020.zip

cc. @Dclipsham. I can't get it to pass in DROID either, but l am reluctant to say the problem is there given the recent proximity of the DROID release. PS. David - If there were public samples that could be shared, that would be a great help!

@Dclipsham

This comment has been minimized.

Copy link

Dclipsham commented Oct 1, 2018

Hi Ross,
My test samples for building the signature came from the SFA GitHub - https://github.com/sfa-siard/SiardCmd/tree/master/testfiles - off hand I cannot see anything 'wrong' with your file, but it isn't IDing for me either. Note that in the SFA files the '2.1' directory is empty but in yours I can see a zero-byte file - possibly related?

@ross-spencer

This comment has been minimized.

Copy link
Contributor Author

ross-spencer commented Oct 1, 2018

@Dclipsham it sounds like there is a good chance of that! Thanks for giving the file a once-over. I created the new logic in a bit of a rush and writing a zip with the zero-byte file was the easiest way forward. I wasn't expecting it to impact the identification, but it could I guess? - Will try without.

@richardlehane

This comment has been minimized.

Copy link
Owner

richardlehane commented Oct 1, 2018

👍 @ross-spencer - another notch on the skeleton suite's belt! I hadn't envisioned this type of signature (matching on a directory rather than file) and will need to do a bit of a re-work of my container matching. So this will likely be a bug until the next major release. 👍 @Dclipsham for that extra detail, is good to know that these directories may be empty. 👎 siard team ;), what kind of insanity is this:
image

@richardlehane richardlehane self-assigned this Oct 1, 2018

@richardlehane richardlehane added bug and removed bug labels Oct 1, 2018

@richardlehane richardlehane added this to the 1.8.0 milestone Oct 1, 2018

@marhop

This comment has been minimized.

Copy link

marhop commented Oct 2, 2018

Hi guys,

I think this is a tricky issue with the skeleton file, not with Siegfried or DROID or the SIARD specification.

The SIARD ZIP provided by Ross contains just one entry:

  • 'header/siardversion/2.1/empty_skeleton_container_file'

Extracting this ZIP file creates the expected directory tree with the empty file at the bottom, yes. But still, inside the ZIP file there is only one entry with this unusual name containing slashes. That's different from the entry 'header/siardversion/2.1/' expected by the PRONOM signature. Although this directory is created when extracting the ZIP file, there is no entry of this name in the ZIP file.

Compare this to the attached ZIP file which is corretly identified as fmt/1196 (by DROID at least). It contains the following four entries, amongst them the one expected by the signature:

  • 'header/'
  • 'header/siardversion/'
  • 'header/siardversion/2.1/'
  • 'header/siardversion/2.1/empty_skeleton_container_file'

The differences between the two ZIP files can best be perceived when feeding them to the zipinfo tool or viewing them in a hex editor. (In the latter case, you may find this commented ZIP file example useful for orientation.)

Anyway, it should be noted that although the added file does not impede identification, it may lead to validation failure since the SIARD specification explicitly demands an empty directory. But I suppose we shouldn't worry about validation of skeleton files, so never mind. ;-)

PS: It stands to reason whether one of the two approaches to zipping nested directories is "more correct" than the other. I couldn't find anything specific in the ZIP format specification except that forward slashes are allowed in a file name (see section 4.4.17.1).

@richardlehane

This comment has been minimized.

Copy link
Owner

richardlehane commented Oct 2, 2018

thanks very much for this extra detail @marhop. I tested your sample with siegfried and it fails to ID properly, so a problem there too. Also, same issue with fido from my quick check.

@marhop

This comment has been minimized.

Copy link

marhop commented Oct 2, 2018

It does not? Hm, weird. I used DROID 6.4, signature file 94, container signature file 20180920 with 'open archives' option set to false. Does siegfried correctly identify a "real" SIARD 2.1 file like the one linked by David?

@PhillipTommerholt

This comment has been minimized.

Copy link

PhillipTommerholt commented Oct 2, 2018

Hi guys,

There are many aspects in this thread that I am not sure about, so please disregard if my comment has no relevance.

First thing I notice in the ZIP is that there is no content-folder.
See P_4.2-1 requirement in SIARD2.1

image

Can this help?

@richardlehane richardlehane added the bug label Oct 2, 2018

@richardlehane richardlehane changed the title Skeleton suite: Cannot confirm SIARD fmt/1196 is behaving container matching doesn't work with directory-only paths: fmt/1196 (SIARD) Oct 2, 2018

@richardlehane

This comment has been minimized.

Copy link
Owner

richardlehane commented Oct 2, 2018

@marhop good point - I checked the SIARD 2.1 examples referenced and they don't identify either. Is clearly an assumption I built into siegfried's container matching that I'll just need to rework. fido is also failing for these too so it might have a similar issue with its container algo

@PhillipTommerholt - thanks for this extra detail. It shouldn't matter in this context that the content folder is missing as the container signature only includes that header../ path

siard team - sorry for my little dig above, 👍 for building a very useful digipres tool!

@ross-spencer

This comment has been minimized.

Copy link
Contributor Author

ross-spencer commented Oct 2, 2018

Hi @marhop

PS: It stands to reason whether one of the two approaches to zipping nested directories is "more correct" than the other. I couldn't find anything specific in the ZIP format specification except that forward slashes are allowed in a file name (see section 4.4.17.1).

Yeah, I can see the difference when I unar in Linux. It wasn't something I was looking for! So thank you. I am relatively sure I can make this change using Python's ZIP library 🤞.

I'll see if I can remove the empty file too... that was just the first way the library was happy to create the nested structure in an otherwise empty zip... let's see what we can do!

Cheers,
Ross

@ross-spencer

This comment has been minimized.

Copy link
Contributor Author

ross-spencer commented Oct 6, 2018

This was a weird one to fix in Python.

I think that shutil was maybe meant to work given these issues (I really can't tell):

But I've modified the code to use ZipFile now and it looks good. Directory entries are being created as we would like:

zipinfo *
Archive:  fmt-1196-container-signature-id-31020.siard
Zip file size: 352 bytes, number of entries: 3
drwxr-xr-x  2.0 unx        0 b- stor 18-Oct-06 14:12 header/
drwxr-xr-x  2.0 unx        0 b- stor 18-Oct-06 14:12 header/siardversion/
drwxr-xr-x  2.0 unx        0 b- stor 18-Oct-06 14:12 header/siardversion/2.1/
3 files, 0 bytes uncompressed, 0 bytes compressed:  0.0%

The branch is here, but I might do some proper maintenance on it before I commit to master: https://github.com/exponential-decay/skeleton-container-test-suite-generator/tree/dev/fix-zip-generation

And the resultant SIARD skeleton file for reference (zipped):

fmt-1196-container-signature-id-31020.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.