Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XML-hul code review fixes #634

Merged
merged 6 commits into from
Apr 8, 2022
Merged

Conversation

david-russo
Copy link
Member

@david-russo david-russo commented Jul 1, 2020

User-facing changes:

  • Made XML-HUL-1 sub-message text translatable and improved formatting
  • Correctly report schemaLocation attributes that define multiple namespace-to-location relationships
  • Correctly report noNamespaceSchemaLocation values as schema locations, instead of namespaces
  • Added warnings for unrecognized and malformed module parameters
  • Removed misleading module parameter example from default jhove.config
  • Fixed error preventing the processing of XML files with local schema paths
  • Fixed XML version reporting for documents beginning with byte-order marks (BOMs)

And the usual code cleanup and housekeeping:

  • Clarified, corrected and expanded documentation
  • Modernized code constructs (e.g. loops, auto-boxing)
  • Removal of unnecessary overrides and unused code
  • Clearer code formatting (e.g. for Property constructs)
  • Javadoc fixes

The module now correctly handles schemaLocations that define multiple
namespace-to-location relationships, instead of lumping them all into
the first namespace's schema location.

This change also correctly reports noNamespaceSchemaLocation values as
the schema's location, instead of its namespace URI.
All parameter parsing has been moved to a single location, and we now
log warnings for unrecognized parameters, as well as parameter
argument which include invalid URI syntax or unresolvable file paths.

A misleading schema example was also removed from the default config
files, which could be better illustrated in module documentation.
This prevented processing of any XML files with local schema URIs.
@david-russo david-russo added the bug A product defect that needs fixing label Jul 1, 2020
Includes:
- Clarified, corrected and expanded documentation
- Modernized code constructs (e.g. loops, auto-boxing)
- Removal of unnecessary overrides and unused code
- Clearer code formatting
- Javadoc fixes
@carlwilson carlwilson self-requested a review April 8, 2022 15:23
@carlwilson carlwilson added the P2 Medium priority issues to be scheduled in a future release label Apr 8, 2022
@carlwilson carlwilson added this to the JHOVE 1.26 milestone Apr 8, 2022
@carlwilson carlwilson merged commit 3f703ea into openpreserve:integration Apr 8, 2022
@leninoc
Copy link

leninoc commented Mar 16, 2023

Hi David and Carl,
we have updated JHOVE in Rosetta to JHOVE 1.26.1 last year and some users found out that the change in ErrorMessage and SubMessage in XML module 1.5.2 are causing issues. What we see is that details which were part of ErrorMessage in JHOVE 1.24 (xml mod 1.5.1) are not part of ErrorMessage anymore but were moved to SubMessage (see the red part in attached pdf). SAXParseException_XML-ErrorMessages_JHOVE26-1.pdf
This causes issues with Rosetta Submission Rules which can be based on text from the ErrorMessage, existing rules based on those details now moved to SubMessage do not work anymore.
Questions I have re this:

  • is this change a result of "Made XML-HUL-1 sub-message text translatable and improved formatting" fix listed in this Issue?
  • is similar change planned for other JHOVE modules or was this only XML module related?
  • Is there a chance to roll this change back?
    We are trying to find out the best way how to deal with this issue, so any context or details from you will help us.
    Thank you!
    Jan

@david-russo
Copy link
Member Author

david-russo commented Mar 20, 2023

Hi Jan,

Yes, that's the related change that's led to your issue. So, for context, I created the sub-message resource while making the "line" and "column" strings translatable and formattable. I moved all the exception details that could vary into the sub-message with the idea that the Error ID's main message/description could/should/would benefit from remaining static, and would still describe it well enough to be documented in the wiki.

I can't remember what specific benefits I might have had in mind... perhaps searchability... or possibly even just the performance benefits of static strings... but considering your use case, I see how this could make it more complicated to match specific SaxParseExceptions. And since it's a bit of a catch-all ID, the specific error detail probably warrants being kept at the main message level.

So I don't see any problems with returning the SaxParseException detail to the Error's main message. We could then probably get rid of that error's sub-message entirely, since the main message would need to be dynamically generated anyway.

@david-russo david-russo deleted the xml-fixes branch March 20, 2023 21:52
@carlwilson
Copy link
Member

Hi @leninoc, let me have a look at your particular issue and see if we can't do something a bit more helpful. Would a more specific ID help here at all or is it simply the main message that you can process?

@leninoc
Copy link

leninoc commented Mar 24, 2023

hi, we will discuss and i will get back to you both next week, thank you!

@asciim0
Copy link
Contributor

asciim0 commented Apr 13, 2023

So I think it makes no sense to have the ErrorMessage "more generic". The information is now actually semantically repetitve - you have the XML-HUL-1 in the ID which means "SAXParseException" ... and then you have "SAXParseException" again in the ErrorMessage. I'd be in favor of rolling it back. From a user perspective I expect the ID to be there to group files together based on a common class of error - and then use the ErrorMessage to take a more concrete look into the file itself.

@leninoc
Copy link

leninoc commented Apr 16, 2023

hi, yes I agree with @asciim0, rolling back makes most sense. I still would keep Submessage as they were in the module 1.5.1 where they had a value pointing to location of the issue in the file (SubMessage: Line = 422, Column = 1784) as this is important detail, although file specific.

@carlwilson
Copy link
Member

So we hear you and are currently preparing a new RC with this reverted, will be available very soon.

@leninoc
Copy link

leninoc commented Apr 21, 2023

hi @carlwilson, thank you for rolling back as requested. I tested 1.28 RC2 today and ErrorMessage now has the value/s it used to have back in 1.24, ie details from SubMessage were moved back to ErrorMessage so thats great -

RC2_153_02

But it seems that RC2 brought back the issue with repeating InfoMessages (#834)

RC2_153_01 I retested in 1.28 RC1 and no repeated InfoMessages there. Test xml file (zipped) attached 0002.zip

Can you please have a look? Sorry for extra work

@carlwilson
Copy link
Member

No need to apologise, this sort of feedback gets things fixed quickly. Not too late to look. Although I'm suprised my tests missed them I think i can imagine a scenario that would cause it in the case of a reversion. Will see what's causing the duplicates and a feasible solution. Plug the regression with an appropriate test and all should be well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A product defect that needs fixing P2 Medium priority issues to be scheduled in a future release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants