UTF-8 BOM results in crash #31

Closed
eed3si9n opened this Issue Feb 17, 2011 · 1 comment

Owner

eed3si9n commented Feb 17, 2011

originally reported by @fredferrao.

steps

  1. compile a schema file that starts with a UTF-8 BOM.

problem

$ scalaxb -p foo consReciNFe_v2.00.xsd 
org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:174)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:388)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1414)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1039)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
    at scala.xml.factory.XMLLoader$class.loadXML(XMLLoader.scala:40)
    at scalaxb.compiler.CustomXML$.loadXML(Module.scala:330)
    at scala.xml.factory.XMLLoader$class.load(XMLLoader.scala:53)
    at scalaxb.compiler.CustomXML$.load(Module.scala:330)
    at scalaxb.compiler.xsd.Driver.readerToRawSchema(Driver.scala:79)
    at scalaxb.compiler.xsd.Driver.readerToRawSchema(Driver.scala:31)
    at scalaxb.compiler.Module$$anon$1.toRawSchema(Module.scala:77)
    at scalaxb.compiler.Module$$anon$1.toRawSchema(Module.scala:76)
    at scalaxb.compiler.Module$$anonfun$processReaders$1.apply(Module.scala:159)
    at scalaxb.compiler.Module$$anonfun$processReaders$1.apply(Module.scala:156)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
    at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
    at scala.collection.immutable.List.foreach(List.scala:45)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
    at scala.collection.immutable.List.map(List.scala:45)
    at scalaxb.compiler.Module$class.processReaders(Module.scala:156)
    at scalaxb.compiler.xsd.Driver.processReaders(Driver.scala:31)
    at scalaxb.compiler.Module$class.processFiles(Module.scala:98)
    at scalaxb.compiler.xsd.Driver.processFiles(Driver.scala:31)
    at scalaxb.compiler.Main$.start(Main.scala:78)
    at scalaxb.compiler.Main$.main(Main.scala:33)
    at scalaxb.compiler.Main.main(Main.scala)
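
The trace above shows the schema reaching the SAX parser through a character-based loader. A minimal sketch of the same class of failure without scalaxb, assuming the JDK's default SAX parser and a stand-in `<schema/>` document (the object and method names here are hypothetical, not part of scalaxb):

```scala
import java.io.{ByteArrayInputStream, InputStreamReader}
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

object BomRepro {
  // UTF-8 BOM (EF BB BF) and a tiny stand-in document (assumption: not the real schema).
  val bom: Array[Byte] = Array(0xEF, 0xBB, 0xBF).map(_.toByte)
  val xml: Array[Byte] = "<schema/>".getBytes("UTF-8")

  /** Parses the bytes through a Reader and reports whether the parser rejected them. */
  def failsToParse(bytes: Array[Byte]): Boolean = {
    // Decoding happens here, before the parser sees anything, so the parser
    // receives characters rather than raw bytes.
    val reader = new InputStreamReader(new ByteArrayInputStream(bytes), "UTF-8")
    val parser = SAXParserFactory.newInstance().newSAXParser()
    try {
      parser.parse(new InputSource(reader), new DefaultHandler)
      false
    } catch {
      case _: SAXParseException => true
    }
  }
}
```

With the BOM prepended, `BomRepro.failsToParse(BomRepro.bom ++ BomRepro.xml)` should report a failure, while the same document without the BOM parses.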

expectations

it works: the schema compiles and the BOM is ignored.

eed3si9n Feb 18, 2011

Owner

fixed.
c708f3a

override def toRawSchema(value: File) = {
  // Peek at the first bytes of the file to detect a byte order mark (BOM).
  val BOM_SIZE = 4
  val EF = 0xEF.toByte
  val BB = 0xBB.toByte
  val BF = 0xBF.toByte
  val FE = 0xFE.toByte
  val FF = 0xFF.toByte
  val bom = Array.ofDim[Byte](BOM_SIZE)
  // PushbackInputStream lets the non-BOM bytes be returned to the stream.
  val in = new java.io.PushbackInputStream(new java.io.FileInputStream(value), BOM_SIZE)
  // read returns -1 for an empty file; treat that as zero bytes read.
  val readSize = math.max(in.read(bom, 0, bom.length), 0)
  val (bomSize, encoding) = bom.toList match {
    case EF :: BB :: BF :: _ => (3, "UTF-8")
    case FE :: FF :: _       => (2, "UTF-16BE")
    case FF :: FE :: _       => (2, "UTF-16LE")
    case _                   => (0, "UTF-8")
  }
  // Push back everything after the BOM so the reader starts at the first real character.
  in.unread(bom, bomSize, readSize - bomSize)
  readerToRawSchema(new BufferedReader(new java.io.InputStreamReader(in, encoding)))
}

this goes back to Sun not fixing Bug ID 4508058: UTF-8 encoding does not recognize initial BOM.
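
To illustrate the behavior that bug report describes, here is a small sketch showing that Java's UTF-8 decoder keeps the BOM as the character U+FEFF instead of dropping it. The `BomDecode` object and the `stripBom` helper are hypothetical names for illustration only, and the `<a/>` payload is a stand-in.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object BomDecode {
  // EF BB BF followed by a tiny stand-in document.
  val bytes: Array[Byte] =
    Array(0xEF, 0xBB, 0xBF).map(_.toByte) ++ "<a/>".getBytes(UTF_8)

  // The decoder turns the three BOM bytes into a single U+FEFF character
  // at the front of the string rather than discarding them.
  val decoded: String = new String(bytes, UTF_8)

  // Hypothetical helper: drop one leading U+FEFF if present.
  def stripBom(s: String): String =
    if (s.startsWith("\uFEFF")) s.substring(1) else s
}
```

Stripping at the character level like this only covers UTF-8 input that has already been decoded; the committed fix works on the byte stream instead, which also lets it choose between UTF-8 and UTF-16 before any decoding happens.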


This issue was closed.
