JDOM should implement a 'proxy' EntityResolver that handles w3.org lookups #26

Closed
rolfl opened this issue Aug 27, 2011 · 1 comment

Comments

@rolfl
Collaborator

rolfl commented Aug 27, 2011

http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/

The 'constant' w3.org DTDs could perhaps be stored as bundled resources, and other DTDs could be 'cached' in some store for re-use.

This will not only be good netizenship, but can also speed up a number of operations.

Perhaps it can be done as simply as an optionally available EntityResolver (although the implementation will not be as simple...).
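
To make the idea concrete, here is a minimal sketch of such an optional resolver. It is an illustration only, not existing JDOM code: the class name `LocalCopyEntityResolver` and its `register` method are hypothetical, and the local copies are held in memory here (in practice they might be loaded from bundled classpath resources instead).

```java
import java.io.ByteArrayInputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical sketch: serves registered local copies instead of going to the network. */
public class LocalCopyEntityResolver implements EntityResolver {

    // systemId (e.g. a w3.org DTD URL) -> bytes of the local copy
    private final Map<String, byte[]> localCopies = new ConcurrentHashMap<>();

    /** Registers a local copy for the given system ID (assumed to be an absolute URL). */
    public void register(String systemId, byte[] content) {
        localCopies.put(systemId, content.clone());
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        byte[] content = (systemId == null) ? null : localCopies.get(systemId);
        if (content == null) {
            // Returning null tells the parser to fall back to its default
            // behaviour, which may open a network connection.
            return null;
        }
        InputSource source = new InputSource(new ByteArrayInputStream(content));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

An instance could then be handed to the parser, for example through `SAXBuilder.setEntityResolver(...)`, so that only unregistered system IDs ever reach the network.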

Update - more detail on this issue:

As I am going through the junit tests, I am now at the point of testing the SAX and DOM builders. The issue I am having is that I am doing a lot of my work on the train as I commute.... and I don't have a network connection.

This is a problem because the validating parsers need to fetch some DTDs and XML Schemas from the web (if they are web-referenced resources).

This is an age-old problem, but I can't think of a great solution. The ideal would be to run junit tests without having to have a network connection at all.

Of course, I could just use input documents that only reference local resources... (and I have) but, in the spirit of JDOM, is there an option for making this process easy in a general sense?

This is further compounded by there being some restrictions on some documents too, like the w3.org 'ban' on default Java user-agents: http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/

My experimentation indicates that w3.org has put a blanket 'tarpit' of 30 seconds on any connection, regardless of what User-agent you use. This is 'significant'.

Typical solutions to this problem are things like OASIS catalogs, etc. but that feels heavy-weight... or, is it?

So, what options are there? Any ideas?

I think the following are key issues (and OASIS does not solve them all):

  • access to local copies of unavailable resources (no network connection?).
  • general performance improvements by caching entities that have an appropriate 'expires' timeout... no network access for 'cached' resources.
  • improved 'internet-friendliness' reducing unnecessary bandwidth to places like w3.org
  • reduce the amount of 'expertise' a JDOM user needs to do 'the right thing'.
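
The caching point in the list above could be sketched roughly as follows. Everything here is an assumption made for illustration: the class name `CachingEntityResolver`, the injected `fetcher` function (standing in for whatever actually downloads the entity), and the fixed time-to-live, which is a simplification of honouring a real 'expires' value from the server.

```java
import java.io.ByteArrayInputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical sketch: caches fetched entities for a fixed time-to-live. */
public class CachingEntityResolver implements EntityResolver {

    private static final class Entry {
        final byte[] content;
        final long expiresAtMillis;

        Entry(byte[] content, long expiresAtMillis) {
            this.content = content;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> fetcher; // assumed: downloads by system ID, null if unavailable
    private final long ttlMillis;                   // assumed: fixed TTL instead of a server-supplied expiry
    private final LongSupplier clock;               // injectable clock, e.g. System::currentTimeMillis

    public CachingEntityResolver(Function<String, byte[]> fetcher, long ttlMillis, LongSupplier clock) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null;
        }
        long now = clock.getAsLong();
        Entry entry = cache.get(systemId);
        if (entry == null || entry.expiresAtMillis <= now) {
            // Cache miss or stale entry: fetch once and remember the result.
            byte[] fetched = fetcher.apply(systemId);
            if (fetched == null) {
                return null; // let the parser use its default resolution
            }
            entry = new Entry(fetched, now + ttlMillis);
            cache.put(systemId, entry);
        }
        InputSource source = new InputSource(new ByteArrayInputStream(entry.content));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

Within the TTL, repeated lookups of the same system ID never call the fetcher, which covers both the performance and the 'internet-friendliness' points. A persistent store would be needed for the cache to survive between JVM runs.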

Can JDOM be easily configured to become a good netizen? Should it be done by default?

@rolfl
Collaborator Author

rolfl commented Feb 1, 2012

This is a separate issue from JDOM, and can be solved outside JDOM. Support can be provided at the native SAX EntityResolver or StAX XMLResolver level.
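
As a rough illustration of solving this at the plain SAX level, with no JDOM involved, the sketch below installs an `EntityResolver` directly on an `XMLReader`. The class name `OfflineParseDemo`, the stub DTD content, and the choice of the XHTML 1.0 Strict DTD URL are all assumptions made for the example; a real setup would supply a full local copy of the DTD.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: parse a w3.org-referencing document without network access. */
public class OfflineParseDemo {

    static final String DTD_URL = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    // Stand-in for a real local copy of the DTD (assumption for this example).
    static final byte[] STUB_DTD =
            "<!ELEMENT html ANY>".getBytes(StandardCharsets.UTF_8);

    /** Parses the XML and returns how many times the local copy was served. */
    public static int parseOffline(String xml) throws Exception {
        int[] hits = {0};
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());
        reader.setEntityResolver((publicId, systemId) -> {
            if (DTD_URL.equals(systemId)) {
                hits[0]++;
                InputSource local = new InputSource(new ByteArrayInputStream(STUB_DTD));
                local.setSystemId(systemId);
                return local;
            }
            return null; // anything else uses the parser's default resolution
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return hits[0];
    }
}
```

The StAX equivalent would be an `XMLResolver` registered on the `XMLInputFactory`; it is not shown here.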

@rolfl rolfl closed this as completed Feb 1, 2012