UTF-8 and CJK error #26

Open
nosgnoh opened this issue Dec 13, 2018 · 3 comments

nosgnoh commented Dec 13, 2018

Hi Kripken,

I have used your library in my project and ran into an issue, but I don't know whether it belongs to your library or to my code, so I am logging it here:

When I validate my XML file (UTF-8 encoded) against an XSD schema, validation fails if the XML file contains CJK characters. I have looked for a way to resolve this but have no ideas. This is my schema and XML file:

`

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:usdm="http://usdm.asia/usdm">

<xs:simpleType name="idType">
    <xs:restriction base="xs:string">
        <xs:pattern value="[^A-Z]+"/>
    </xs:restriction>
</xs:simpleType>

<xs:simpleType name="sortType">
    <xs:restriction base="xs:string">
        <xs:pattern value="[^A-Z]+"/>
    </xs:restriction>
</xs:simpleType>

<xs:simpleType name="NOType">
    <xs:restriction base="xs:string">
        <xs:pattern value="[^a-z]*"/>
    </xs:restriction>
</xs:simpleType>

<xs:simpleType name="richcontentType">
    <xs:restriction base="xs:string"/>
</xs:simpleType>

<xs:complexType name="reasonType">
    <xs:sequence>
        <xs:element name="richcontent" type="richcontentType" minOccurs="0" />
    </xs:sequence>
    <xs:attribute name="id" type="idType" use="required"/>
    <xs:attribute name="sort" type="sortType" use="required"/>
    <xs:attribute name="NO" type="NOType" use="required"/>
</xs:complexType>

<xs:complexType name="descType">
    <xs:sequence>
        <xs:element name="richcontent" type="richcontentType" minOccurs="0" />
    </xs:sequence>
    <xs:attribute name="id" type="idType" use="required"/>
    <xs:attribute name="sort" type="sortType" use="required"/>
    <xs:attribute name="NO" type="NOType" use="required"/>
</xs:complexType>

<xs:complexType name="reqspecType">
    <xs:sequence>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="group" type="groupType"/>
            <xs:element name="reqspec" type="reqspecType"/>
            <xs:element name="reason" type="reasonType" />
            <xs:element name="desc" type="descType"/>
        </xs:choice>
        <xs:sequence>
            <xs:element name="richcontent" type="richcontentType" minOccurs="0" />
            <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:element name="group" type="groupType"/>
                <xs:element name="reqspec" type="reqspecType"/>
                <xs:element name="reason" type="reasonType" />
                <xs:element name="desc" type="descType"/>
            </xs:choice>
        </xs:sequence>
    </xs:sequence>
    <xs:attribute name="id" type="idType" use="required"/>
    <xs:attribute name="sort" type="sortType" use="required"/>
    <xs:attribute name="NO" type="NOType" use="required"/>
</xs:complexType>

<xs:complexType name="groupType">
    <xs:sequence>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="group" type="groupType" />
            <xs:element name="reqspec" type="reqspecType"/>
        </xs:choice>
        <xs:sequence>
            <xs:element name="richcontent" type="richcontentType" minOccurs="0" />
            <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:element name="group" type="groupType" />
                <xs:element name="reqspec" type="reqspecType"/>
            </xs:choice>
        </xs:sequence>
    </xs:sequence>
    <xs:attribute name="id" type="idType" use="required"/>
    <xs:attribute name="sort" type="sortType" use="required"/>
    <xs:attribute name="NO" type="NOType" use="required"/>
</xs:complexType>

<xs:complexType name="usdmType">
    <xs:sequence>
        <xs:element name="group" type="groupType" minOccurs="0" />
    </xs:sequence>
    <xs:attribute name="version" type="xs:string" use="required"/>
</xs:complexType>

<xs:element name="usdm" type="usdmType"/>

</xs:schema>
`

XML:

<?xml version="1.0" encoding="utf-8"?>
<usdm version="0.0.0" xmlns:usdm="http://usdm.asia/usdm">
  <group id="0" sort="0" NO="ROOT.0">
    <richcontent>を</richcontent>
  </group>
</usdm>

From this page https://www.utf8-chartable.de/unicode-utf8-table.pl?start=12288&number=512&names=- I found that every character from
U+3081 | め | e3 82 81
to the end of the table fails validation when the file is UTF-8.
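
For reference, a minimal Python sketch (not part of the library) that reproduces the UTF-8 bytes the chart lists for め, and for the を used in the XML above. `utf8_hex` is a hypothetical helper name introduced only for this illustration.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return ch.encode("utf-8").hex(" ")

# U+3081 is the first failing character named in this report,
# U+3092 is the character used in the sample XML.
for ch in ("\u3081", "\u3092"):
    print(f"U+{ord(ch):04X} | {ch} | {utf8_hex(ch)}")
```

Both characters encode to three well-formed bytes, so the sample file itself should be valid UTF-8 as long as it is saved that way.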

Thanks for your attention!


kondr1 commented Jul 17, 2019

same


hmuus commented Apr 16, 2020

Similar here: UTF-8 with or without a BOM fails on some characters. Permitted line breaks throw an error too. The error message claims that the input is not valid UTF-8, but the characters and line breaks are allowed, so this looks like a bug.


Eccenux commented May 26, 2020

Similar for Polish characters. Simplified test case:

<?xml version="1.0" encoding="UTF-8"?>
<czytelnicy xsi:noNamespaceSchemaLocation="ImpCz.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <czytelnicy>Ząb</czytelnicy>
</czytelnicy>

XSD:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="czytelnicy"></xs:element>
</xs:schema>

Error shown on demo page:

file.xml:3: parser error : PCDATA invalid Char value 5
  <czytelnicy>Z�ąb</czytelnicy>
               ^
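
One possible reading of that message, offered only as a guess: ą is U+0105, and keeping just the low 8 bits of that code point gives 5, the value the parser complains about. The sketch below only does that arithmetic in Python. `low_byte` is a hypothetical helper, and nothing here confirms how the library actually passes strings to the parser.

```python
def low_byte(ch: str) -> int:
    """Return the code point of ch truncated to its lowest 8 bits (hypothetical helper)."""
    return ord(ch) & 0xFF

print(low_byte("\u0105"))       # ą, U+0105 -> 5
print(hex(low_byte("\u3081")))  # め, U+3081 -> 0x81
```

If truncation like this were happening, め would arrive as the single byte 0x81, which is not valid on its own in UTF-8. That would be consistent with the CJK failure reported at the top of this issue, but it remains an assumption until checked against the library's source.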
