JDOM should implement a 'proxy' EntityResolver that handles w3.org lookups #26

Closed
rolfl opened this Issue · 1 comment

@rolfl
Collaborator

http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/

The 'constant' w3.org DTDs could be stored as resources, and other DTDs could be 'cached' in some store for re-use.

This will not only be good netizenship, but can also speed up a number of operations.

Perhaps it can be done as simply as an optionally available EntityResolver (although the implementation will not be as simple...).

Update - more detail on this issue:

As I am going through the JUnit tests, I am now at the point of testing the SAX and DOM builders. The issue I am having is that I do a lot of my work on the train as I commute, and I don't have a network connection.

This is a problem because the validating parsers need to fetch some DTDs and XML Schemas from the web (if they are web-referenced resources).

This is an age-old problem, but I can't think of a great solution. The ideal would be to run the JUnit tests without needing a network connection at all.

Of course, I could just use input documents that only reference local resources... (and I have) but, in the spirit of JDOM, is there an option for making this process easy in a general sense?

This is further compounded by there being some restrictions on some documents too, like the w3.org 'ban' on default Java user-agents: http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/

My experimentation indicates that w3.org has put a blanket 'tarpit' of 30 seconds on any connection, regardless of what User-agent you use. This is 'significant'.

Typical solutions to this problem are things like OASIS catalogs, etc. but that feels heavy-weight... or, is it?

So, what options are there? Any ideas?

I think the following are key issues (and OASIS does not solve them all):

  • access to local copies of unavailable resources (no network connection?).
  • general performance improvements by caching entities that have an appropriate 'expires' timeout, so that 'cached' resources need no network access.
  • improved 'internet-friendliness' by reducing unnecessary traffic to places like w3.org.
  • reduce the amount of 'expertise' a JDOM user needs to do 'the right thing'.
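
As a rough illustration of the caching point above, here is a minimal sketch of a SAX EntityResolver that keeps fetched entities in memory until a timeout passes. Everything beyond the standard `org.xml.sax` interfaces is an assumption made for the example: the class name `CachingEntityResolver`, the injected `fetcher` function (which stands in for whatever actually downloads the entity), the fixed `ttlMillis` (a real version would probably honour the HTTP 'expires' header instead), and the injected clock.

```java
import java.io.ByteArrayInputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical sketch: caches entity bytes by system ID with a fixed time-to-live.
final class CachingEntityResolver implements EntityResolver {

    // One cached entity plus the time at which it must be re-fetched.
    private static final class Entry {
        final byte[] content;
        final long expiresAtMillis;

        Entry(byte[] content, long expiresAtMillis) {
            this.content = content;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> fetcher; // assumed: returns null if unavailable
    private final long ttlMillis;
    private final LongSupplier clock;               // injected so expiry can be tested

    CachingEntityResolver(Function<String, byte[]> fetcher, long ttlMillis, LongSupplier clock) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null; // nothing to key the cache on; let the parser decide
        }
        long now = clock.getAsLong();
        Entry entry = cache.get(systemId);
        if (entry == null || now >= entry.expiresAtMillis) {
            byte[] fetched = fetcher.apply(systemId);
            if (fetched == null) {
                return null; // fall back to the parser's default resolution
            }
            entry = new Entry(fetched, now + ttlMillis);
            cache.put(systemId, entry);
        }
        InputSource source = new InputSource(new ByteArrayInputStream(entry.content));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

A resolver like this would be constructed with something such as `System::currentTimeMillis` as the clock. It does not address the first bullet on its own, since a cold cache still needs the network once.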

Can JDOM be easily configured to become a good netizen? Should it be done by default?

@rolfl
Collaborator

This is a separate issue from JDOM, and can be solved outside JDOM. Support can be provided at the native SAX EntityResolver or StAX XMLResolver level.
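
For anyone landing here, a minimal sketch of what the SAX-level approach might look like: an EntityResolver that answers from locally registered copies and otherwise returns null so the parser falls back to its default behaviour. The class name `LocalCopyEntityResolver` and its `register` method are made up for this example; only `EntityResolver` and `InputSource` come from the standard SAX API.

```java
import java.io.ByteArrayInputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical sketch: serves registered local copies of external entities.
final class LocalCopyEntityResolver implements EntityResolver {

    // Maps a system ID (for example a w3.org DTD URL) to the bytes of a local copy.
    private final Map<String, byte[]> localCopies = new ConcurrentHashMap<>();

    // Registers a local copy; the bytes could come from a classpath resource or a file.
    void register(String systemId, byte[] content) {
        localCopies.put(systemId, content.clone());
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        byte[] content = (systemId == null) ? null : localCopies.get(systemId);
        if (content == null) {
            // Returning null asks the parser to open the system ID itself.
            return null;
        }
        InputSource source = new InputSource(new ByteArrayInputStream(content));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

Such a resolver can be installed on a plain SAX parser with `XMLReader.setEntityResolver(...)`; JDOM's `SAXBuilder` exposes a `setEntityResolver(...)` method as well, so no change to JDOM itself should be needed.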

@rolfl rolfl closed this