api.GetHocrText() returns malformed XML #26

Closed
GoogleCodeExporter opened this issue May 29, 2015 · 11 comments

@GoogleCodeExporter

Control characters are inserted into the document, and XML parsers cannot 
handle it without first trying to strip them out. This problem was reportedly 
fixed in the main tesseract SVN a few days ago, and I think producing an update 
linked with SVN will fix it.
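Until an updated build is installed, the stripping workaround mentioned above can be sketched roughly as follows. This is only a minimal illustration, not part of python-tesseract: the helper names `strip_control_chars` and `parse_hocr` are hypothetical, and the code assumes the hOCR text returned by `api.GetHocrText()` is otherwise well-formed and contains no undeclared entities.

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 forbids the C0 control characters other than tab (0x09),
# line feed (0x0A) and carriage return (0x0D).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_chars(text):
    """Return text with characters that XML 1.0 disallows removed."""
    return _ILLEGAL_XML_CHARS.sub("", text)


def parse_hocr(hocr_text):
    """Hypothetical helper: clean hOCR text, then parse it as XML."""
    return ET.fromstring(strip_control_chars(hocr_text))
```

Passing the string from `api.GetHocrText()` to `parse_hocr` would then return an ElementTree root element. Stripping silently discards the offending characters, so it is a stopgap rather than a substitute for a fixed build.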

Using Python 2.7.3 under Windows 7 X64.

P.S. Are there any instructions for building from SVN with VS 2008? I see the 
binary under downloads but there's no information on how it was generated. 
Is it just libtesseract et al. wrapped with SWIG?

Original issue reported on code.google.com by stephen....@gmail.com on 9 Aug 2012 at 2:27

@GoogleCodeExporter
Author

If you are trying to use 64-bit Python, then the answer is no. I am 
still working on how to compile tesseract-ocr for 64-bit Windows.

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 3:48

@GoogleCodeExporter
Author

No; this is 32-bit python, and I have no interest in compiling/distributing 
anything exclusive to 64-bit machines. Apart from the occasional memory 
corruption from Tesseract and this issue, the package is working very well.

Original comment by stephen....@gmail.com on 10 Aug 2012 at 12:39

@GoogleCodeExporter
Author

Since the current release of tesseract is relatively old, a build from
SVN might not always be compatible with python-tesseract. Anyhow, I
will look into it and come back to you ASAP.

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 2:55

@GoogleCodeExporter
Author

Below is a build against tesseract-ocr svn737:
http://python-tesseract.googlecode.com/files/python-tesseract-0.7.5.win32-py2.7.exe

If it works, please buy me a coffee.

If not, please contact me.

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 7:30

@GoogleCodeExporter
Author

Well done; that seems to have fixed it. I'm more than happy to help feed your 
coffee addiction. Do you accept PayPal?

Also, if you would be willing to pass on any instructions for getting the SWIG 
portion to build properly under VS2008 (once Tesseract itself is built) I'd be 
happy to update my own copies on my development machine. Thanks again for the 
quick fix!

Steve

Original comment by stephen....@gmail.com on 10 Aug 2012 at 7:52

@GoogleCodeExporter
Author

Try the following procedure and let me know whether it works for you:

svn checkout http://python-tesseract.googlecode.com/svn/trunk/ python-tesseract
cd python-tesseract
python setup.py build
python setup.py install


Original comment by FreeT...@gmail.com on 10 Aug 2012 at 10:48

@GoogleCodeExporter
Author

[deleted comment]

@GoogleCodeExporter
Author

https://www.paypal.com/cgi-bin/webscr?cmd=_cart&business=VD2Y4PZSK7T86&lc=HK&item_name=To%20support%20the%20development%20of%20python%2dtesseract&amount=5%2e00&currency_code=USKD&button_subtype=products&add=1&bn=PP%2dShopCartBF%3abtn_cart_LG%2egif%3aNonHosted

Original comment by FreeT...@gmail.com on 10 Aug 2012 at 10:51

@GoogleCodeExporter
Author

Worked like a charm. Sending you a couple cups of coffee shortly. Thanks!
Steve

Original comment by stephen....@gmail.com on 13 Aug 2012 at 12:32

@GoogleCodeExporter
Author

Thank you for the coffees.

Original comment by FreeT...@gmail.com on 13 Aug 2012 at 5:31

@GoogleCodeExporter
Author

Original comment by FreeT...@gmail.com on 20 Aug 2012 at 8:47

  • Changed state: Done
