@破破的桥 Asking on someone's behalf about a text recognition problem. #43

Closed
haoawesome opened this issue Aug 1, 2014 · 8 comments

Comments

@haoawesome
Collaborator

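极客杨's OCR toolbox: Tesseract is currently the most widely used free, open-source OCR tool (backed by Google). Commercial products include ABBYY FineReader and Adobe; Chinese products include 文通 (Wintone) and 汉王 (Hanvon). Beyond conventional desktop use, Tesseract has also been ported to smartphones. Resource card stream: http://hao.memect.com/?tag=ocr-tools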

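@好东西传送门 Asking on someone's behalf about a text recognition problem. For text that mixes Chinese, English, and digits, as in the image below, traditional text recognition software performs very poorly. Is there a suitable low-cost way to convert images like this into text files while guaranteeing a certain recognition rate (say 90%)? And what if non-text photos are mixed in as well?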
http://www.weibo.com/1459358890/BgFoRwPgG

http://ww4.sinaimg.cn/bmiddle/56fc0caagw1ej06diuyz2j20b90m0afi.jpg

@haoawesome
Collaborator Author

https://code.google.com/p/tesseract-ocr/ is the most widely used open-source OCR software, licensed under Apache 2.0. Google has improved it extensively.

http://www.zhihu.com/question/19593313

@haoawesome
Collaborator Author

haoawesome changed the title from "Agent杨 presents: OCR tools" to "极客杨 presents: OCR tools" on Aug 1, 2014
haoawesome changed the title from "极客杨 presents: OCR tools" to "极客杨's OCR toolbox" on Aug 1, 2014
@haoawesome
Collaborator Author

memect:CardReady http://hao.memect.com/?tag=ocr-tools

haoawesome changed the title from "极客杨's OCR toolbox" to "@破破的桥 Asking on someone's behalf about a text recognition problem" on Aug 3, 2014
@haoawesome
Collaborator Author

memect:weiboReady http://www.weibo.com/5220650532/BgFEdjQG7

@haoawesome
Collaborator Author

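极客杨's OCR toolbox: Tesseract is currently the most widely used free, open-source OCR tool (backed by Google). Commercial products include ABBYY FineReader and Adobe; Chinese products include 文通 (Wintone) and 汉王 (Hanvon). The current hot topic is porting OCR to smartphones to open up new input channels: iOS has Tesseract-based implementations, and Android has the Qualcomm Vuforia API. Resource card stream: http://t.cn/RPiRyYc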

http://www.weibo.com/5220650532/BgFEdjQG7

haoawesome changed the title from "@破破的桥 Asking on someone's behalf about a text recognition problem" to "极客杨's OCR toolbox" on Aug 3, 2014
haoawesome reopened this on Aug 3, 2014
haoawesome changed the title from "极客杨's OCR toolbox" to "@破破的桥 Asking on someone's behalf about a text recognition problem." on Aug 3, 2014
haoawesome mentioned this issue on Aug 3, 2014
@haoawesome
Collaborator Author

@ S还未完成
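Uh, I've done this kind of work. When I first came to the US I did graphic design at a small company and had to type up and lay out menus like these. You can use the photo feature in the Google Translate app on your phone: take a photo, highlight the text you need with your finger, then tap the translate button. The text appears in both languages, and you can simply copy and paste it.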
http://www.weibo.com/1459358890/BgFoRwPgG

@haoawesome
Collaborator Author

The discussion mainly took place at http://www.weibo.com/1459358890/BgFoRwPgG

@haoawesome
Collaborator Author

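极客杨: The key is still parameter tuning. Main points: different languages have different initial settings; colored or gradient backgrounds greatly reduce recognition accuracy, so the image should first be converted to black-and-white or grayscale (OpenCV is worth trying). Two articles are recommended reading: one is an overview of Tesseract (2007), and the other reports on the problems Tesseract runs into when processing color images.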
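
To make the black-and-white preprocessing step concrete, here is a minimal sketch of grayscale conversion followed by binarization. It is not taken from the thread: the choice of Otsu's method for picking the threshold, the BT.601 luma weights, and the function names `to_grayscale`, `otsu_threshold`, and `binarize` are all assumptions made for illustration. It uses only the Python standard library and represents an image as nested lists, so it shows the idea rather than a production pipeline.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def to_grayscale(rgb: List[List[Pixel]]) -> List[List[int]]:
    """Convert an RGB image to 8-bit grayscale using the BT.601 luma weights."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb
    ]


def otsu_threshold(gray: List[List[int]]) -> int:
    """Pick the threshold t that maximizes between-class variance (Otsu's method).

    Pixels with value <= t form the dark class; the rest form the bright class.
    Returns 0 for an image in which every pixel has the same value.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class so far
    sum0 = 0  # intensity sum of the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(gray: List[List[int]], threshold: int) -> List[List[int]]:
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if v > threshold else 0 for v in row] for row in gray]
```

With OpenCV, the same two steps would typically be `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` followed by `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`. The binarized image can then be handed to Tesseract with an explicit language setting, for example `-l chi_sim+eng` for mixed Simplified Chinese and English, assuming those language data files are installed. A single global threshold may not cope well with strong gradients, in which case an adaptive threshold could be worth trying instead.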
