When trying to use XMLEventReader with a UTF-8 encoded XML document that starts with a byte order mark (EF BB BF), an empty iterator is returned, and an error message is printed to stderr #95
Comments
Is this an issue with |
it's an issue with XMLEventReader. |
There is no attached file? |
Ah yes, sorry, I will attach the file. |
How do you know it isn't just an issue with
|
I tested Source.fromFile and it's working fine with this file. |
Attaching the file as a screen shot is useless. You'll need to attach the actual file, or link to the actual file, since the actual bytes matter. |
I updated my post with a link, because I couldn't upload an XML file. |
I see this was reported back in 2012 at |
closely related and perhaps relevant; here is a still-open ticket from 2009 on essentially the same issue, but in the context of scalac rather than scala-xml, and thus involving |
Judging from http://stackoverflow.com/questions/1835430/byte-order-mark-screws-up-file-reading-in-java and http://mindprod.com/jgloss/bom.html, I think this is just standard JVM stuff and is not scala-xml specific. If it were to be addressed, it would be addressed in … Both of the links I've provided suggest multiple strategies for working around this. |
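One of the workaround strategies those links describe is to drop the BOM bytes before the stream is decoded. Below is a minimal sketch of that idea using only the JDK; `BomSkip` and `skipUtf8Bom` are hypothetical names made up for illustration and are not part of scala-xml or scala.io.

```scala
import java.io.{InputStream, PushbackInputStream}

object BomSkip {
  // Hypothetical helper: returns a stream that yields the same bytes as `in`,
  // minus a leading UTF-8 byte order mark (EF BB BF) if one is present.
  def skipUtf8Bom(in: InputStream): InputStream = {
    val pushback = new PushbackInputStream(in, 3)
    // Read at most three bytes, stopping early at end of stream.
    val head = Iterator.continually(pushback.read()).take(3).takeWhile(_ != -1).toArray
    val hasBom = head.toSeq == Seq(0xEF, 0xBB, 0xBF)
    // If the prefix was not a BOM, put the bytes back so the caller sees them.
    if (!hasBom) pushback.unread(head.map(_.toByte))
    pushback
  }
}
```

Assuming the file from the report, the wrapped stream could be turned into a `Source` with `Source.fromInputStream(BomSkip.skipUtf8Bom(new java.io.FileInputStream("hasBOM.xml")))(scala.io.Codec.UTF8)` and handed to `XMLEventReader` in place of `Source.fromFile(...)`. That last step has not been checked against scala-xml here.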
not for me:
that looks OK, but appearances are misleading:
oops — that's garbage from the BOM. |
@SethTisue: Thank you for the answers. |
Hey @thatismypath a working scala snippet,
For sbt this is the dependency,
Interestingly enough, XML.loadFile("hasBOM.xml") seems to be working fine, not sure who is handling the BOM, whether the SAXParser which scala-xml uses or something else ? Regards. |
Thank you @biswanaths, it works perfectly with your code. Cool Regards. |
The XML returned from BizTalk contains a BOM character. Java can't handle this. We need to handle this ourselves. See here: #95 (comment) |
When trying to use XMLEventReader with a UTF-8 encoded XML document that starts with a byte order mark (EF BB BF), an empty iterator is returned, and an error message is printed to stderr.
test.scala
import scala.io.Source
import scala.xml.pull.XMLEventReader
val t = new XMLEventReader(Source.fromFile("hasBOM.xml"))
println(t)
Output of scala test.scala
file:/tmp/hasBOM.xml:1:1: < expected^
empty iterator
The attached file hasBOM.xml has CR+LF newlines, but LF or CR newlines do not change the behaviour.
https://drive.google.com/file/d/0BzQoj9XC6BxUUWZfdzgwb3J4OWc/view?usp=sharing
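For context on the `1:1: < expected` message: the JDK's UTF-8 decoder does not strip a leading BOM, so the first character that reaches a consumer is U+FEFF rather than `<`. The sketch below shows this with an in-memory byte array standing in for hasBOM.xml. The `<a/>` document and the `BomDemo` name are made up for illustration, and this only shows the decoding step, not what XMLEventReader does internally.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object BomDemo {
  // The BOM bytes from the report (EF BB BF) followed by a tiny, made-up document.
  val bytes: Array[Byte] =
    Array(0xEF, 0xBB, 0xBF).map(_.toByte) ++ "<a/>".getBytes(UTF_8)

  // Decoding as UTF-8 keeps the BOM as the single character U+FEFF.
  val decoded: String = new String(bytes, UTF_8)

  def main(args: Array[String]): Unit = {
    println(decoded.length)            // 5: U+FEFF plus the four characters of <a/>
    println(decoded.head == '\uFEFF')  // true: the first character is the BOM, not '<'
    println(decoded.drop(1))           // <a/>
  }
}
```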