XML DOCTYPE is disallowed when the feature set to true #2608

Closed
tomasonjo opened this issue Mar 6, 2022 · 0 comments
Guidelines

I am trying to open an xml.gz file with the help of apoc.load.xml, but I get the following error:

Failed to invoke procedure apoc.load.xml: Caused by: org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 10; DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.

I have no idea why this happens. It is independent of the compression config I use, and of whether I load from a local file or from the internet.

Expected Behavior (Mandatory)

apoc.load.xml should open the file and return some values.

Actual Behavior (Mandatory)

I get the following error when executing apoc.load.xml

Failed to invoke procedure apoc.load.xml: Caused by: org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 10; DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.

How to Reproduce the Problem

Simple Dataset (where it's possible)

The XML GZ files are available on https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/

Steps (Mandatory)

If you simply call the apoc.load.xml procedure, you get the following error:

WITH "https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/pubmed22n1219.xml.gz" AS url
CALL apoc.load.xml(url) YIELD value
UNWIND value AS article
RETURN article LIMIT 5

Failed to invoke procedure apoc.load.xml: Caused by: org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 10; DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.

Adding the compression config does not help:

WITH "https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/pubmed22n1219.xml.gz" AS url
CALL apoc.load.xml(url, "/", {compression:"GZIP"}) YIELD value
UNWIND value AS article
RETURN article LIMIT 5
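
The line and column in the error (lineNumber: 2, columnNumber: 10) suggest that decompression already succeeded and that the parser stopped at a DOCTYPE declaration on the second line, which would explain why the compression setting makes no difference. Below is a minimal Python sketch for checking this outside Neo4j. The sample document and the `find_doctype` helper are hypothetical, the sample is not the real content of the PubMed file, and the column convention (the position just past the `<!DOCTYPE` keyword) is an assumption about how the Java parser reports it.

```python
import gzip
import io

# Hypothetical sample standing in for the first lines of an .xml.gz file.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE PubmedArticleSet SYSTEM "pubmed.dtd">\n'
    '<PubmedArticleSet></PubmedArticleSet>\n'
)


def find_doctype(gz_bytes, max_lines=20):
    """Return (line, column) just past the first '<!DOCTYPE', or None.

    Only the first `max_lines` lines of the decompressed text are scanned.
    """
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if line_number > max_lines:
                break
            offset = line.find("<!DOCTYPE")
            if offset != -1:
                # 1-based column of the character following the keyword.
                return line_number, offset + len("<!DOCTYPE") + 1
    return None


compressed = gzip.compress(SAMPLE_XML.encode("utf-8"))
print(find_doctype(compressed))  # (2, 10) for the sample above
```

To run the same check on a real file, read the downloaded `.xml.gz` in binary mode and pass its bytes to `find_doctype`. A result of `(2, 10)` would be consistent with the parser rejecting the DOCTYPE declaration rather than failing to read the compressed stream.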

Specifications (Mandatory)

Currently used versions

Versions

  • OS: Ubuntu 20.04
  • Neo4j: 4.4.2
  • Neo4j-Apoc: 4.4.0.3