How to validate big xml? #1351
Hi @iLLysion, can you provide some examples of what you're seeing? I'm not sure I understand what you're asking, and code will be much clearer.
Here is a code sample: http://pastebin.com/5VckXdZm
Hi @iLLysion, I still don't understand what error you're seeing, or how the behavior of Nokogiri differs from what you expect. You'll need to help me understand in order to receive help.
I have a simple XML file like this: http://pastebin.com/69c50iS8
Same for me; I can't find a fast way to validate large files.
@iLLysion I'm not trying to be obtuse, but you still haven't provided a complete working piece of code demonstrating what you're seeing, which makes it extremely difficult to help you. Here, I've put together such a complete example, which I'll comment on below.

#! /usr/bin/env ruby
require 'nokogiri'
require 'tempfile'
require 'pp'
xsd_contents = <<EOXSD
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:annotation>
<xsd:documentation xml:lang="en">
Purchase order schema for Example.com.
Copyright 2000 Example.com. All rights reserved.
</xsd:documentation>
</xsd:annotation>
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
<xsd:element name="comment" type="xsd:string"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:sequence>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="USAddress">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
</xsd:sequence>
<xsd:attribute name="country" type="xsd:NMTOKEN"
fixed="US"/>
</xsd:complexType>
<xsd:complexType name="Items">
<xsd:sequence>
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="productName" type="xsd:string"/>
<xsd:element name="quantity">
<xsd:simpleType>
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="USPrice" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="partNum" type="SKU" use="required"/>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
<!-- Stock Keeping Unit, a code for identifying products -->
<xsd:simpleType name="SKU">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\\d{3}-[A-Z]{2}"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
EOXSD
valid_xml_contents = <<EOVALIDXML
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
<shipTo country="US">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<city>Mill Valley</city>
<state>CA</state>
<zip>90952</zip>
</shipTo>
<billTo country="US">
<name>Robert Smith</name>
<street>8 Oak Avenue</street>
<city>Old Town</city>
<state>PA</state>
<zip>95819</zip>
</billTo>
<comment>Hurry, my lawn is going wild!</comment>
<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item>
<item partNum="926-AA">
<productName>Baby Monitor</productName>
<quantity>1</quantity>
<USPrice>39.98</USPrice>
<shipDate>1999-05-21</shipDate>
</item>
</items>
</purchaseOrder>
EOVALIDXML
# missing <city></city>
invalid_xml_contents = <<EOINVALIDXML
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
<shipTo country="US">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<state>CA</state>
<zip>90952</zip>
</shipTo>
<billTo country="US">
<name>Robert Smith</name>
<street>8 Oak Avenue</street>
<state>PA</state>
<zip>95819</zip>
</billTo>
<comment>Hurry, my lawn is going wild!</comment>
<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item>
<item partNum="926-AA">
<productName>Baby Monitor</productName>
<quantity>1</quantity>
<USPrice>39.98</USPrice>
<shipDate>1999-05-21</shipDate>
</item>
</items>
</purchaseOrder>
EOINVALIDXML
# missing quotes in shipTo
malformed_xml_contents = <<EOMALFORMEDXML
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
<shipTo country="US>
<name>Alice Smith</name>
<street>123 Maple Street</street>
<state>CA</state>
<zip>90952</zip>
</shipTo>
<billTo country="US">
<name>Robert Smith</name>
<street>8 Oak Avenue</street>
<state>PA</state>
<zip>95819</zip>
</billTo>
<comment>Hurry, my lawn is going wild!</comment>
<items>
<item partNum="872-AA">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
<comment>Confirm this is electric</comment>
</item>
<item partNum="926-AA">
<productName>Baby Monitor</productName>
<quantity>1</quantity>
<USPrice>39.98</USPrice>
<shipDate>1999-05-21</shipDate>
</item>
</items>
</purchaseOrder>
EOMALFORMEDXML
# validate valid document
xsd = Nokogiri::XML::Schema.new xsd_contents
valid_xml = Nokogiri::XML valid_xml_contents
errors = xsd.validate valid_xml
raise unless errors.length == 0
# validate invalid document
xsd = Nokogiri::XML::Schema.new xsd_contents
invalid_xml = Nokogiri::XML invalid_xml_contents
errors = xsd.validate invalid_xml
pp errors
# => [#<Nokogiri::XML::SyntaxError: Element 'state': This element is not expected. Expected is ( city ).>,
# #<Nokogiri::XML::SyntaxError: Element 'state': This element is not expected. Expected is ( city ).>]
# validate malformed invalid document
xsd = Nokogiri::XML::Schema.new xsd_contents
malformed_xml = Nokogiri::XML malformed_xml_contents
# NOTE that here, nokogiri fixes malformed markup to be:
# <shipTo country="US> "/><name>Alice Smith</name>
errors = xsd.validate malformed_xml
pp errors
# => [#<Nokogiri::XML::SyntaxError: Element 'shipTo', attribute 'country': 'US>' is not a valid value of the atomic type 'xs:NMTOKEN'.>,
# #<Nokogiri::XML::SyntaxError: Element 'shipTo': Missing child element(s). Expected is ( name ).>,
# #<Nokogiri::XML::SyntaxError: Element 'name': This element is not expected. Expected is ( billTo ).>]
# validate valid file
xsd = Nokogiri::XML::Schema.new xsd_contents
valid_xml_file = Tempfile.new "valid"
valid_xml_file.write valid_xml_contents
valid_xml_file.close
errors = xsd.validate valid_xml_file.path
raise unless errors.length == 0
# validate invalid file
xsd = Nokogiri::XML::Schema.new xsd_contents
invalid_xml_file = Tempfile.new "invalid"
invalid_xml_file.write invalid_xml_contents
invalid_xml_file.close
errors = xsd.validate invalid_xml_file.path
pp errors
# => [#<Nokogiri::XML::SyntaxError: Element 'state': This element is not expected. Expected is ( city ).>,
# #<Nokogiri::XML::SyntaxError: Element 'state': This element is not expected. Expected is ( city ).>]
# validate malformed invalid file
xsd = Nokogiri::XML::Schema.new xsd_contents
malformed_xml_file = Tempfile.new "malformed"
malformed_xml_file.write malformed_xml_contents
malformed_xml_file.close
# note that the malformed xml file is still malformed
errors = xsd.validate malformed_xml_file.path
pp errors
# => [#<Nokogiri::XML::SyntaxError: Element 'purchaseOrder': Character content other than whitespace is not allowed because the content type is 'element-only'.>,
# #<Nokogiri::XML::SyntaxError: Element 'purchaseOrder': Character content other than whitespace is not allowed because the content type is 'element-only'.>,
# #<Nokogiri::XML::SyntaxError: Element 'purchaseOrder': Character content other than whitespace is not allowed because the content type is 'element-only'.>,
# #<Nokogiri::XML::SyntaxError: Element 'purchaseOrder': Character content other than whitespace is not allowed because the content type is 'element-only'.>]

You're conflating "validity" with "well-formedness". The two things are not the same. This example clearly shows errors when attempting to validate a malformed document, but they're not the same errors I get if I allow Nokogiri to fix the broken markup (making it well-formed) first. If you are seeing something different, then you need to provide me with a complete example of working code, including markup and schema. If you pass in a malformed document to
to make sure we're testing validation of invalid files. Related to #1351.
Thank you. You've answered my question.
Hello.
I need to validate an XML file against an XSD. When the file is not too big, I use document validation and it works fine.
The problem is with big files. For those I validate the file instead of the document, but this kind of validation doesn't detect errors such as an unclosed double quote in an attribute, and I don't know how many other cases it misses.