fromXML doesn't terminate when potentially empty unbounded is nested inside unbounded #230

Closed
FranklinChen opened this Issue Dec 11, 2013 · 5 comments

@FranklinChen

steps

$ git clone git@github.com:FranklinChen/micase-to-chat.git
$ cd micase-to-chat/
$ git checkout -b nontermination origin/nontermination
$ sbt test

problem

I generated code from an XML schema, but when I run it on a sample XML document, parsing does not seem to terminate.

My test case appears to hang at this call:

val tei = scalaxb.fromXML[TEIu462](elem)

FranklinChen added a commit to FranklinChen/micase-to-chat that referenced this issue Jan 6, 2014

@ghost ghost assigned eed3si9n Jan 6, 2014

eed3si9n Jan 6, 2014

Owner

I started digging into the child elements, and the problem seems to come from parsing PARTICDESC.

  def e1 = {
    val elem = XML.load(getClass.getResource("/adv700ju023.xml"))
    val headerXml: Node = (elem \ "TEIHEADER").head
    val filedescXml: Node = (headerXml \ "FILEDESC").head
    val profiledescXml: Node = (headerXml \ "PROFILEDESC").head
    val langUsageXml: Node = (profiledescXml \ "LANGUSAGE").head
    val particDescXml: Node = (profiledescXml \ "PARTICDESC").head
    val bodyXml: Node = (elem \ "TEXT").head

    // bad
    // val tei = scalaxb.fromXML[TEIu462](elem)
    // bad
    // val header = scalaxb.fromXML[TEIHEADER](headerXml)
    // ok
    // val filedesc = scalaxb.fromXML[FILEDESC](filedescXml)
    // bad
    // val profiledesc = scalaxb.fromXML[PROFILEDESC](profiledescXml)
    // ok
    // val langUsage = scalaxb.fromXML[LANGUSAGE](langUsageXml)

    // bad
    val particDesc = scalaxb.fromXML[PARTICDESC](particDescXml)

    // ok 
    // val body = scalaxb.fromXML[TEXT](bodyXml)
    success
  }

The following is the definition:

  <xs:element name="PARTICDESC">
    <xs:complexType>
      <xs:choice>
        <xs:element maxOccurs="unbounded" ref="P"/>
        <xs:sequence>
          <xs:choice maxOccurs="unbounded">
            <xs:element ref="PERSON"/>
            <xs:element ref="PERSONGRP"/>
          </xs:choice>
          <xs:element minOccurs="0" ref="PARTICLINKS"/>
        </xs:sequence>
      </xs:choice>
      <!-- <xs:attribute ... -->
    </xs:complexType>
  </xs:element>

scalaxb transforms this into:

def parser(node: scala.xml.Node, stack: List[scalaxb.ElemName]): Parser[com.franklinchen.PARTICDESC] =
  phrase(
    rep(
      ((scalaxb.ElemName(None, "P")) ^^ 
        (x => scalaxb.DataRecord(x.namespace, Some(x.name), scalaxb.fromXML[com.franklinchen.P](x, scalaxb.ElemName(node) :: stack)))) ||| 
      ((rep(
          ((scalaxb.ElemName(None, "PERSON")) ^^ 
            (x => scalaxb.DataRecord(x.namespace, Some(x.name), scalaxb.fromXML[com.franklinchen.PERSON](x, scalaxb.ElemName(node) :: stack)))) | 
          ((scalaxb.ElemName(None, "PERSONGRP")) ^^ 
            (x => scalaxb.DataRecord(x.namespace, Some(x.name), scalaxb.fromXML[com.franklinchen.PERSONGRP](x, scalaxb.ElemName(node) :: stack))))
        ) ~ 
      opt(scalaxb.ElemName(None, "PARTICLINKS"))) ^^ 
        { case p1 ~ p2 => scalaxb.DataRecord(com.franklinchen.PARTICDESCSequence1(p1.toSeq,
        p2.headOption map { scalaxb.fromXML[com.franklinchen.PARTICLINKS](_, scalaxb.ElemName(node) :: stack) })) })
    ) ^^
    { case p1 =>
    com.franklinchen.PARTICDESC(p1.toSeq,
      ...) }

The repetition and choice structure differs a bit from the XSD, but that's intentional.
I don't think the nested rep(...) by itself is causing the infinite loop: it may be bad practice, but regular expressions like (x*)* are possible.
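
To make the question concrete, here is a minimal sketch with hand-rolled combinators. Everything in it (`NestedRepSketch`, `elem`, `rep`, `requireProgress`, `maxSteps`) is hypothetical and written for illustration; it is not scalaxb's generated code and not the scala-parser-combinators implementation. It assumes a `rep` that keeps applying its body for as long as the body succeeds. Under that assumption, an outer `rep` whose body can succeed without consuming input only stops if it checks for progress. Whether this is what actually happens inside the generated parser is not established here.

```scala
import scala.annotation.tailrec

object NestedRepSketch {
  // A parser consumes a prefix of the token list and returns (value, rest), or None on failure.
  type Parser[A] = List[String] => Option[(A, List[String])]

  // Matches exactly one token with the given element name.
  def elem(name: String): Parser[String] = {
    case head :: tail if head == name => Some((head, tail))
    case _                            => None
  }

  // Zero-or-more repetition; it always succeeds.
  // requireProgress = true stops as soon as the body succeeds without consuming input.
  // maxSteps exists only so this demonstration cannot hang (assumption of the sketch).
  def rep[A](p: Parser[A], requireProgress: Boolean, maxSteps: Int = 10000): Parser[List[A]] =
    in => {
      @tailrec
      def loop(acc: List[A], rest: List[String], steps: Int): Option[(List[A], List[String])] =
        if (steps >= maxSteps)
          throw new IllegalStateException(s"gave up after $maxSteps iterations")
        else p(rest) match {
          case Some((_, next)) if requireProgress && next.length == rest.length =>
            Some((acc.reverse, rest)) // body matched nothing: stop repeating
          case Some((a, next)) => loop(a :: acc, next, steps + 1)
          case None            => Some((acc.reverse, rest))
        }
      loop(Nil, in, 0)
    }

  // PERSON | PERSONGRP
  val personOrGroup: Parser[String] =
    in => elem("PERSON")(in).orElse(elem("PERSONGRP")(in))

  // rep(rep(PERSON | PERSONGRP)): the inner rep can succeed on no input at all.
  def nested(requireProgress: Boolean): Parser[List[List[String]]] =
    rep(rep(personOrGroup, requireProgress), requireProgress)
}
```

Calling `NestedRepSketch.nested(requireProgress = true)` on `List("PERSON", "PERSONGRP", "PARTICLINKS")` consumes the two person tokens and leaves `PARTICLINKS`. With `requireProgress = false` the outer loop keeps accepting empty inner matches until the `maxSteps` cap throws; without that cap it would spin forever. Regex engines can accept `(x*)*` because they guard against empty iterations in a similar way.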

eed3si9n Jan 6, 2014

Owner

If you want to get on with your day without waiting for scalaxb to figure this out, you can modify the schema slightly (line 341) and parse your data:

  <xs:element name="PARTICDESC">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="P"/>
        <xs:element ref="PERSON"/>
        <xs:element ref="PERSONGRP"/>
        <xs:element ref="PARTICLINKS"/>
      </xs:choice>
      <xs:attribute name="ID" type="xs:ID"/>
      <xs:attribute name="N"/>
      <xs:attribute name="LANG" type="xs:IDREF"/>
      <xs:attribute name="REND"/>
      <xs:attribute name="DEFAULT" default="NO">
        <xs:simpleType>
          <xs:restriction base="xs:token">
            <xs:enumeration value="YES"/>
            <xs:enumeration value="NO"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="TEIFORM" default="PARTICDESC"/>
    </xs:complexType>
  </xs:element>

and you get:

TEXT(BODY(List(DataRecord(U,U(List(DataRecord( so. i see that you're from Hartland Michigan 
        ), DataRecord(U,U(List(DataRecord( yes )),None,None,None,None,None,None,None,SMOOTH,None,Some(S2),U)), DataRecord(
         this is, right up the road 
      )),None,None,None,None,None,None,None,SMOOTH,None,Some(S1),U))

FranklinChen Jan 6, 2014

Thanks for looking at this! Since I auto-generated the schema from a DTD anyway, I may indeed just alter it manually.

@FranklinChen

Thanks!