Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test harness and api changes #31

Closed
wants to merge 27 commits into from
Closed

test harness and api changes #31

wants to merge 27 commits into from

Conversation

edsu
Copy link
Contributor

@edsu edsu commented Aug 27, 2012

I don't know if these changes will be too substantial for you, but I've refactored the Fido class and module to support the development of a test suite, which I have also begun.

The majority of the changes involved making it easy to instantiate a Fido class without having to also do all the stuff that was going on in main. I also removed the printing from the Fido class and put it into main.

Another significant change I made was introducing a new method identify_files which is a generator that lazily returns all matches, walking inside of zip file containers. identify_file now just returns a single identity.

The big thing with this pull request IMHO is the tests. I plan on improving them, so maybe it's worth taking a look?

PS. I turned the README.txt into a README.md for pretty display on Github :-)

@ghost ghost assigned techmaurice Aug 28, 2012
@techmaurice
Copy link
Contributor

Hi Ed,
Thank you very much, I will review the changes and come back to you.
Please note that there was a commit yesterday (August 27, 2012) which you might want to merge with your own fork.
Cheers,
Maurice

@edsu
Copy link
Contributor Author

edsu commented Aug 28, 2012

Will do, if you want to chat some time on the phone some time about this I would be open to it. There might even be some other parties who are interested.

For example we might want to consider pulling these changes into a new 'testing' branch or something instead of into 'main'?

… setup.py with addition of test_suite and minor formatting changes.
@edsu
Copy link
Contributor Author

edsu commented Aug 28, 2012

Just merged in the latest changes, and fixed minor conflict in the new setup.py

@techmaurice
Copy link
Contributor

Hi Ed, I will contact you by e-mail to discuss the changes and further development.

@edsu
Copy link
Contributor Author

edsu commented Aug 29, 2012

Well these comments go to my inbox. I would prefer to keep the electronic
convo out in the public if possible.

@edsu
Copy link
Contributor Author

edsu commented Sep 2, 2012

Ok, good point. I've brought -zip back, and call Fido.identify_files or Fido.identify_file in main.

@edsu
Copy link
Contributor Author

edsu commented Sep 3, 2012

By the way, I wrote up some meandering thoughts about this work at http://inkdroid.org/journal/2012/09/02/fido-test-suite/

@adamfarquhar
Copy link
Contributor

Ed – Really nice to see your blog on this. Have you seen David Tarrant’s thoughts on govdoc at http://www.openplanetsfoundation.org/blogs/2012-07-26-1-million-21000-reducing-govdocs-significantly ?

Cheers,

Adam.

From: Ed Summers [mailto:notifications@github.com]
Sent: 03 September 2012 05:44
To: openplanets/fido
Subject: Re: [fido] test harness and api changes (#31)

By the way, I wrote up some meandering thoughts about this work at http://inkdroid.org/journal/2012/09/02/fido-test-suite/


Reply to this email directly or view it on GitHub #31 (comment) .

https://github.com/notifications/beacon/J6T91GIPIyhU-8ti4GCGP71HMz-JIPP88TUMNC52GAEzwE-GelGzAcGCV9Oi0g4m.gif

Adam Farquhar
Head of Digital Scholarship
Scholarship and Collections
T:+44 (0)20 7412 7832

Adam.Farquhar@bl.uk
The British Library
London

NW1 2DB

http://www.bl.uk/
The British Library’s new interactive Annual Report and Accounts 2010/11

http://www.bl.uk/aboutus/annrep/2010to2011/index.htmlhttp://www.bl.uk/knowledge

http://www.bl.uk/emaildisclaimer.html

@edsu
Copy link
Contributor Author

edsu commented Sep 3, 2012

@adamfarquhar no I hadn't seen that -- thanks!

@techmaurice
Copy link
Contributor

Thanks Ed, still busy reviewing/testing your code. Must say the generator is quite cleverly done...

Read your blog about FIDO test suite. In fact the PUID's are easily resolvable @Pronom, e.g.:
For humans: http://www.nationalarchives.gov.uk/PRONOM/fmt/142
For machines: http://www.nationalarchives.gov.uk/PRONOM/fmt/142.xml

@edsu
Copy link
Contributor Author

edsu commented Oct 23, 2012

Just for anyone else following along, here is a private email I received from Maurice on October 17th:

Hi Ed,

I have reviewed and tested your commit but unfortunately there are a number of issues:

  1. Fido.py crashes after it can't match an object with a signature, container or extension. The identify_file generator yields 'None' while an 'Info' object is expected by print_match().
    This is probably related to the missing "fail" status (see 2.)
  2. Missing "fail" status from files which could not be identified

Note that these issues only appear when invoked from CLI. Did not test your API functionality.

Due to this I also have not been able to test the rest of the functionality.

I also do have some remarks about your test suite:
It does not test the status of a result, e.g. signature, container, extension or fail. It is very important to know the type of match since it tells us something about the reliability of the match and which logic is applied.

Example: a file that matches on "extension" in one signature-version could very well match on a "signature" if the PUID is updated. It can also be the other way around, a file that has been matched before with "signature" might be matching on "extension" in a later version.

Your work has not been in vain, I really like your solution with the generator functions.
Can I merge the generator code with the next commit? Of course your work will be mentioned.

Kind regards,
Maurice de Rooij

@edsu
Copy link
Contributor Author

edsu commented Oct 23, 2012

I responded to the above email with:

Rather than pulling these changes into master do you want to pull it in as a branch and add issue tickets for the problems you noticed?

@techmaurice
Copy link
Contributor

Pulled the changes into a separate branch and will issue tickets for the problems. Closing this pull request.

@edsu
Copy link
Contributor Author

edsu commented Oct 30, 2012

would you be ok with renaming the branch to something a bit more meaningful like testsuite-refactor or something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants